*2.3. Methods*

#### 2.3.1. Quantifying Yg and the Contribution of SDT to Yg

An RS process-based crop yield model (Section 2.3.2 driven by MODIS data was used to simulate Ya, limiting the potential yield by only SDT (Yp0) over the NCP. Yp was computed on the basis of the modeled Yp0 (see Section 2.3.4. Two Ygs and the contribution of suboptimum SDT to Yg were calculated as follows:

$$\text{Yg} = \text{Yp} - \text{Ya} \tag{3}$$

$$\mathbf{Y}\mathbf{g}\_0 = \mathbf{Y}\mathbf{p} - \mathbf{Y}\mathbf{p}\_0 \tag{4}$$

$$\mathbf{C}\mathbf{\varUpsilon}\mathbf{g}\_0 = \mathbf{\varUpsilon}\mathbf{g}\_0/\mathbf{\varUpsilon}\mathbf{g} \tag{5}$$

where Yg or Yg0 is the gap between Yp and Ya or potential yield limited by suboptimum SDT, and CYg0 is the contribution of Yg0 to Yg.

When mapping the mean annual Yg and CYg0 based on MODIS 1 km resolution Ya and Yg0, the two 1 km data were aggregated to 5 km-resolution rasters. Most 1 km pixels over the NCP were not continuously cropped with maize in time (Supplementary Figure S2), whereas most 5 km grids have more than one 1 km pixels continuously cropped with maize in 2010–2015. A 5 km grid represents all maize fields located on the grid. However, we noted that the aggregation may reduce the effects of random factors (uncertainties in RS data or meteorological inputs) on yield, as discussed in Section 4.1.

#### 2.3.2. A Brief Overview of PRYM–Maize

A process-based RS crop yield model, PRYM–Maize, which was developed in our previous study and running on a daily step, was used to simulate maize yield in conjunction with MODIS data in this study [35]. PRYM–Maize consists of two basic modules: water balance and productivity modules. The water balance module (WBM) follows the RS and water balance-based Penman-Monteith model version 2 (RS-WBPM2) [38], which was modified from Bai, et al. [40], to simulate water stress based on MODIS VIs and meteorological data. The productivity module consists of multiple sub-processes, photosynthesis, grain conversion, and respirations. The ecosystem-level photosynthesis rate was scaled from the leaf-level value based on a two-leaf canopy model and the MODIS-based LAI (Equation (1))). A full description of the model can be found in Zhang, Bai, Zhang and Shahzad [35] or Supplementary Text S1.

#### 2.3.3. Modifications to PRYM–Maize

In PRYM–Maize, we simulated the growth of maize grain on a daily step as a function of development stage (DVS) and daily net primary productivity (NPPdaily), as presented in Supplementary Text S1.3. The DVS index is required to compute the proportions of photosynthesis-produced dry matters that were allocated to different organs, namely, root, stem, leaf, and grain. In the previous version of PRYM–Maize, the DVS is calculated as a function of GDDs, as also presented in other CGMs [41–43]. However, GDDs are vulnerable to uncertainties in input air temperature data. Alternatively, GPP was potentially useful to indicate the crop DVS because it is more directly related to crop growth than are GDDs [24]. In addition, RS information represented by GPP may partly eliminate the error in input air temperature data. Huang, Ryu, Jiang, Kimm, Kim, Kang and Shim [24] proposed using normalized accumulated GPP (GPPnor\_acc), estimated using an RS-based model, to indicate the DVS of crop and found that the two factors were tightly correlated. This method was successfully used in simulating rice grain-filling across three flux sites over the Korean Peninsula [24]. In this study, we do not use GDDs but adopt GPPnor\_acc to compute the DVS. GPPnor\_acc was designed to vary from 0 to 1 in Huang, Ryu, Jiang, Kimm, Kim, Kang and Shim [24] as crops grow accompanied by accumulating GPP, and GPPnor = 0 and 1 correspond to emerging and maturing stages of crops. Such that

$$\text{GPP}\_{\text{nor\\_acc}}^{(t\_0)} = \frac{\sum\_{t=\text{EM}}^{t\_0} \text{GPP}^{(t)}}{\text{GPP}\_{\text{acc}}} \tag{6}$$

$$\text{GPP}\_{\text{acc}} = \sum\_{t=\text{EM}}^{\text{HV}} \text{GPP}^{(t)} \tag{7}$$

where GPPacc denotes the accumulated GPP along the entire growing season; GPP (*<sup>t</sup>*0) nor\_acc denotes normalized accumulated GPP for day *t*0 and is equivalent to DVS; and EM and HV denote the emerging and harvest date, respectively. Pixel-level EM and HV were retrieved from the RS NDVI time series (Section 2.3.5).

DVS varies from 0 to 2, indicating the development of crops from emerging to maturing stages; therefore, we estimated DVS by scaling GPPnor\_acc using 2, as follows:

$$\text{DVS}^{(t\_0)} = \mathbf{2} \times \text{GPP}^{(t\_0)}\_{\text{nor\\_acc}} \tag{8}$$

2.3.4. Simulating Potential LAI and Yp

Temporal variations in LAI of crops fully depend on simulations in CGMs; thus, it is feasible to simulate Yp using a calibrated crop model by removing all field managemen<sup>t</sup> limits for crop growth in the model. However, not all managemen<sup>t</sup> factors in an RS-based yield model can be adjusted to the optimum because RS-LAI, as a critical input of the RS-based model, does include environmental stress on crop growth. However, it is still possible to use an RS-based model to simulate Yp if the stress on crop growth represented by actual LAI is removed. Here, we define the LAI time series, representing no environmental stress on crops and leading to potential yielding, as potential LAI (LAIp).

RS-LAI derived from the MODIS VI (Equation (1)) may fail in representing the magnitude of LAIp but could capture the variation trend of LAIp in time because the DVS of crops, including maize, primarily depends on the effective accumulative temperature rather than field managements, as represented by existing CGMs [41,42,44]. Figure 2 is a scatter plot, with 45 data pairs, for maize yield vs. peak LAI observed during the growing season (PLAI). These data were collected from 21 published and 8 unpublished super-high yield experiments over China. These experiments attempted to optimize all field managements, and the growth of maize was primarily limited by air temperature and solar radiation. Under optimum field managements, maize yield positively correlates with PLAI. When PLAI is greater than 6.6, maize yield tends to saturate at a high level, ~20 t hm−2, and the correlations between PLAI and maize yield become insignificant (Figure 2). This implies that a PLAI greater than 6.6 may be adequate for maize yielding.

**Figure 2.** Maize yield against growing season maximum leaf area index (LAImax) over China. Red circles denote data retrieved from 21 published [45–65] and 8 unpublished trials. For one trial involving multiple treatments or maize spices, the data pair from the highest yield treatment of each species was picked up. Blue circles denote recent field trails that have not been published yet.

Based on the above analyses, we proposed using the following steps to calculate LAIp and Yp:

1. Derive a reference curve ( *C*RF = LAI(0) RF , LAI(2) RF , . . . , LAI(*J*HV) RF ) from the LAI time series within the reference region (Figure 1), where *J*HV denotes the number of days from EM to HV, LAIRF denotes a reference LAI value of a given day. *C*RF has a PLAI greater than 6.6, representing the roughly temporal variation in LAIp. We obtain *C*RF using two procedures: (i) Average the LAI time series of all maize pixels retrieved from MODIS VI-based LAI (Equation (1)) by date in 2010 within the reference region (Figure 2) to obtain an LAI curve (*C*0 = LAI(0) 0 , LAI(2) 0 , . . . , LAI(*J*HV) 0 ) for maize; (ii) calculate *C*RF from *C*0 as *C*RF = PLAIopt× *C*0 −min(*<sup>C</sup>*0) max(*<sup>C</sup>*0)−min(*<sup>C</sup>*0), where PLAIopt denotes the optimum PLAI for maize potentially yielding and is expected to be greater than 6.6. In this study, we made a conservative estimate for PLAIopt and set it to 7.5, and this estimate was supported by Liu, Hou, Xie, Ming, Wang, Xu, Liu, Yang and Li [50], who reported PLAI = 7.53 of the maize plants, achieving the highest maize yield (22.5 t hm−2) record in China.

2. Derive LAIp for each pixel by fitting the time series of RS-LAI (*C*RS = LAI(0) RS , LAI(2) RS , LAI(*J*HV)

. . . , RS ) of the pixel to *C*RF using linear regression analysis, as shown in Figure 3, where *J*HV denotes the number of days from EM to HV for a given pixel. In other words, we fitted *C*RF = *k* × *C*RS + *b* for each pixel to obtain *k* and *b* and then calculated LAIp as *k* × *C*RS + *b*. If *J*HV = *J*HV, *C*RF was linearly stretched in the

span 3. Run the RS process-based crop yield model to simulate Yp0 with input LAI substituted byLAIp, *f* N(N)=1,and *g*sm,2000=0.017 ms<sup>−</sup>1.

 of *C*RS.

4. We referred to the method of Lobell [28], which assumed optimal field managemen<sup>t</sup> (SDT in this study) can be achieved within a given domain with similar climate conditions, to compute Yp at each pixel as the 95th of Yp0 values of surrounding pixels. These surrounding pixels were selected using two criteria: (i) Within a 50 km buffer around the central pixel and (ii) differences in accumulated GDDs and total solar radiation during the growing season (June–September) between the surroundings and the center pixel were less than 50 ◦C d and 50 MJ, respectively. The buffer size here referred to the size of a zone represented by a meteorological site in Wart, et al. [66].

timeline to match the time

**Figure 3.** An example of deriving LAIp from RS-LAI and CREF. LAIp on the right panel (**b**) was derived by fitting the RS-LAI on the left panel (**a**) to CREF using the least-square method.

#### 2.3.5. Extracting Crop Phenology Using RS Data

Emerging (EM) and harvest (HV) dates of each pixel in each year were retrieved from the 8-day and 1 km time-series NDVI data using TIMESAT3.3. The Savitzky–Golay filter was used to denoise the NDVI time series. The EM and HV dates were estimated on the basis of the denoised NDVI time series, and they corresponded to the start and end of the season (SOS and EOS, respectively) defined in TIMESAT. The key parameters required were set as follows:


where "window size" means the total nearest days that are used to denoise the current data; "relative amplitude" indicates that the SOS or EOS was estimated as the time when NDVI increases or decreases to a given proportion, as specified by the season start or end value, of the relative amplitude of NDVI time series during a specific growing season.
