### 2.3.1. Downscaling of MODIS GPP Products Based on the EBK Interpolation

The EBK was used to downscale the MODIS GPP products from a 500 m spatial resolution to 30 m [29,30]. The EBK is superior to conventional spatial downscaling methods, which rely solely on the spectral data of images and take no account of image texture and structure, as well as to classical kriging methods, which ignore the error introduced by modeling semivariograms [31]. The EBK is a powerful non-stationary algorithm that divides an image into subsets and uses simulation to make the process of spatial interpolation automatic [32]. In this method, the following kriging equation (Equation (2)) was utilized to predict GPP values [30,33,34]:

$$Z\_{(x\_0)} = \sum\_{i=1}^{n} \lambda\_i \cdot Z\_{(x\_i)} + \sum\_{i=1}^{n} s\_i \cdot U\_{(x\_i)} \tag{2}$$

where *Z*(*x<sub>i</sub>*) (*i* = 1, . . . , *n*) is the GPP value at location *x<sub>i</sub>*, λ<sub>*i*</sub> (*i* = 1, 2, 3, . . . , *n*) represents the kriging weight generated using the parameters of cross-variograms, and *s<sub>i</sub>* (*i* = 1, 2, 3, . . . , *n*) is the kriging weight estimated on the basis of a cross-variogram between *Z*(*x<sub>i</sub>*) and *U*(*x<sub>i</sub>*). The *n* denotes the total number of observations. The variable *U*(*x*) is a standardized rank that was calculated as [30]:

$$U\_{(x\_i)} = \frac{R}{n} \tag{3}$$

where *R* represents the rank of the *R*th order statistic of GPP on the land surface at location *x<sub>i</sub>*. Downscaling the 500 m MODIS GPP products to the 30 m spatial resolution using EBK was performed using ArcGIS 10.3 Geostatistical Analyst (Environmental Systems Research Institute, Inc., Redlands, CA, USA).
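The standardized rank in Equation (3) is straightforward to compute. The sketch below is a minimal NumPy illustration (the function name is ours, and ties are broken by position rather than averaged, which differs slightly from a formal rank statistic):

```python
import numpy as np

def standardized_rank(gpp):
    """Standardized rank U(x_i) = R / n (Equation (3)), where R is the
    1-based rank of the GPP value at x_i and n is the sample count."""
    gpp = np.asarray(gpp, dtype=float)
    n = gpp.size
    # argsort of argsort yields 0-based ranks; +1 makes rank 1 the smallest value
    ranks = np.argsort(np.argsort(gpp)) + 1
    return ranks / n

# Example: three GPP observations
print(standardized_rank([10.0, 30.0, 20.0]))  # [0.333..., 1.0, 0.666...]
```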

### 2.3.2. Selecting the Phases of GPP

An important step in selecting the phases for CLQ evaluation was the determination of the relevant growth stages. In this study, the Pearson product-moment correlation, which quantifies the linear relationship between two variables [35,36], was applied to obtain the spectral variables with the highest coefficients at the significance level of 0.05. The correlation coefficient is calculated as:

$$r\_i = \frac{\sum\_{n=1}^{N} \left( R\_{ni} - \overline{R\_i} \right)\left( y\_n - \overline{y} \right)}{\sqrt{\sum\_{n=1}^{N} \left( R\_{ni} - \overline{R\_i} \right)^2 \sum\_{n=1}^{N} \left( y\_n - \overline{y} \right)^2}}, \tag{4}$$

where *r<sub>i</sub>* is the correlation coefficient between the *i*th growth stage and CLQ, *N* is the total number of CLQ samples, *R<sub>ni</sub>* is the value of the *i*th growth stage for the *n*th CLQ sample, *R̄<sub>i</sub>* is the average of the sample values in the *i*th growth stage, *y<sub>n</sub>* is the *n*th CLQ value, and *ȳ* is the average value of CLQ.
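Equation (4) is the standard Pearson formula and can be sketched in a few lines of NumPy (the function name is ours):

```python
import numpy as np

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient (Equation (4))."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    xd, yd = x - x.mean(), y - y.mean()   # deviations from the means
    return (xd * yd).sum() / np.sqrt((xd**2).sum() * (yd**2).sum())

# Perfectly linear data gives r = 1
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0
```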

Moreover, the Variance Inflation Factor (VIF) was applied to mitigate the collinearity among the GPP predictors; it is calculated as [37,38]:

$$\text{VIF} = \frac{1}{1 - R\_i^2} \tag{5}$$

where *R<sub>i</sub>*<sup>2</sup> is the determination coefficient between the *i*th predictor and the remaining independent variables. The larger the VIF, the greater the collinearity between the predictors. In general, values from 0 to 10, from 10 to 100, and equal to or greater than 100 imply no, strong, and serious collinearity, respectively.
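Equation (5) requires the determination coefficient *R<sub>i</sub>*<sup>2</sup> of each predictor regressed on the others. A minimal NumPy sketch (our own helper, using ordinary least squares with an intercept) could look like:

```python
import numpy as np

def vif(X):
    """VIF for each column of X (Equation (5)): regress column i on the
    remaining columns (with an intercept) to get R_i^2, then 1 / (1 - R_i^2)."""
    X = np.asarray(X, float)
    n, p = X.shape
    out = np.empty(p)
    for i in range(p):
        yi = X[:, i]
        Xi = np.column_stack([np.ones(n), np.delete(X, i, axis=1)])
        beta, *_ = np.linalg.lstsq(Xi, yi, rcond=None)
        resid = yi - Xi @ beta
        r2 = 1.0 - resid.var() / yi.var()   # determination coefficient R_i^2
        out[i] = 1.0 / (1.0 - r2)
    return out
```

Nearly collinear predictors drive the denominator toward zero, which is exactly why large VIF values flag columns to drop.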

### 2.3.3. Partial Least Squares Regression

In this study, to improve the evaluation of CLQ, GA-BPNN was proposed and compared with PLSR and SVR to predict CLQ using GPPs.

As a linear multivariate model, PLSR relates two data matrices, *X* and *Y*. Unlike traditional regression, however, PLSR can analyze predictors that are strongly correlated with one another [39,40]. The model is expressed as follows:

$$Y = X\beta + \varepsilon,\tag{6}$$

where *Y* is the response variable CLQ, *X* is the predictor GPP at different stages, β is the regression coefficient, and ε is the residual.
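Equation (6) can be illustrated with a single-component PLS fit. The sketch below is our own minimal NIPALS step on mean-centered data, not the full multi-component PLSR used in the study:

```python
import numpy as np

def pls1_fit_predict(X, y, Xnew):
    """One NIPALS component on mean-centered data: a minimal sketch of
    Equation (6), Y = X*beta + eps, not the full multi-component PLSR."""
    X, y = np.asarray(X, float), np.asarray(y, float)
    xm, ym = X.mean(axis=0), y.mean()
    Xc, yc = X - xm, y - ym
    w = Xc.T @ yc                 # weight vector: direction of max covariance with y
    w /= np.linalg.norm(w)
    t = Xc @ w                    # scores (latent variable)
    q = (t @ yc) / (t @ t)        # y-loading
    beta = w * q                  # regression coefficients for Equation (6)
    return (np.asarray(Xnew, float) - xm) @ beta + ym
```

Because the latent direction is chosen by covariance with *y* rather than by inverting *X*ᵀ*X*, the fit remains stable even when the columns of *X* (GPPs at different stages) are strongly correlated.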

### 2.3.4. Support Vector Regression

The SVR is a widely used supervised learning method for solving the problem of regression fitting. Different from the traditional process from induction to deduction, SVR greatly simplifies the usual regression process by making efficient "transductive inference" from training samples to predictions [41–43]. Traditional regression algorithms use training samples to generate a model (a global trend) that is then used to predict values of the dependent variable at unknown locations. In SVR, the values of the 2011–2015 MODIS-GPPs used as predictors *x* in this study are first mapped into an m-dimensional feature space (m = number of predictors), and a linear model with an epsilon-band-acceptable prediction surface is constructed in that feature space.

$$f(\mathbf{x}) = \omega \phi(\mathbf{x}) + b \tag{7}$$

where *f*(**x**) represents the CLQ estimate, φ(**x**) denotes the set of nonlinear transformations of GPP, *b* is the error term, and ω is the weight coefficient. After preprocessing of the data, the error term often has a zero mean and can be dropped. According to the structural risk minimization principle, ω and *b* are calculated by minimizing the following objective function *R*(**x**).

$$R(\mathbf{x}) = \frac{1}{2} \|\boldsymbol{\omega}\|^2 + \frac{1}{l} \sum\_{i=1}^{l} \left| f(\mathbf{x}\_i) - y\_i \right|\_\varepsilon \tag{8}$$

where |*f*(**x**<sub>*i*</sub>) − *y<sub>i</sub>*|<sub>ε</sub> is the insensitive loss function, ε is the error tolerance of the insensitive loss function, *l* denotes the number of samples, and ‖ω‖<sup>2</sup> reflects the flatness of the model in the m-dimensional feature space.
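The ε-insensitive loss in Equation (8) is zero inside the ε tube and grows linearly outside it, which can be sketched directly (function name ours):

```python
import numpy as np

def eps_insensitive_loss(pred, y, eps=0.1):
    """|f(x_i) - y_i|_eps from Equation (8): zero inside the eps tube,
    linear in the excess deviation outside it."""
    return np.maximum(np.abs(np.asarray(pred) - np.asarray(y)) - eps, 0.0)

# Deviations of 0.0 and 0.05 fall inside the tube; 1.0 is penalized by 0.9
print(eps_insensitive_loss([1.0, 1.05, 2.0], [1.0, 1.0, 1.0], eps=0.1))
# [0.  0.  0.9]
```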

The model complexity can be simplified by minimizing ‖ω‖<sup>2</sup>. By introducing non-negative slack variables ξ<sub>*i*</sub>, ξ<sub>*i*</sub><sup>\*</sup> (*i* = 1, . . . , *l*) and a penalty factor *C* to account for the deviation of training samples outside the insensitive loss band, SVR is formulated by minimizing the following objective function:

$$\begin{aligned} R\left(\omega, \xi\_i, \xi\_i^{\*}\right) &= \frac{1}{2}\|\omega\|^2 + C \sum\_{i=1}^{l}\left(\xi\_i + \xi\_i^{\*}\right) \\ \text{s.t.} \; &\begin{cases} y\_i - \omega\phi(\mathbf{x}\_i) - b \leq \varepsilon + \xi\_i \\ \omega\phi(\mathbf{x}\_i) + b - y\_i \leq \varepsilon + \xi\_i^{\*} \\ \xi\_i, \xi\_i^{\*} \geq 0 \end{cases}, \quad i = 1, \ldots, l \end{aligned} \tag{9}$$

The above problem can be translated into the following dual problem:

$$\begin{aligned} \min\_{\alpha\_i^{(\*)}} \; & \frac{1}{2}\sum\_{i,j=1}^{l}\left(\alpha\_i - \alpha\_i^{\*}\right)\left(\alpha\_j - \alpha\_j^{\*}\right)\phi(\mathbf{x}\_i)\phi(\mathbf{x}\_j) + \varepsilon\sum\_{i=1}^{l}\left(\alpha\_i^{\*} + \alpha\_i\right) - \sum\_{i=1}^{l} y\_i\left(\alpha\_i^{\*} - \alpha\_i\right) \\ & \text{s.t.} \begin{cases} \sum\_{i=1}^{l}\left(\alpha\_i - \alpha\_i^{\*}\right) = 0 \\ 0 \leq \alpha\_i^{(\*)} \leq C, \; i = 1, \ldots, l \end{cases} \end{aligned} \tag{10}$$

where α<sub>*i*</sub> − α<sub>*i*</sub><sup>\*</sup> is the transformation of the ω variable after introducing the Lagrange multipliers. The SVR function is obtained by solving the above problem:

$$f(\mathbf{x}) = \omega\phi(\mathbf{x}) + b = \sum\_{i=1}^{l}\left(\alpha\_i^{\*} - \alpha\_i\right) K(\mathbf{x}\_i, \mathbf{x}) + b \tag{11}$$

where *K*(**x**<sub>*i*</sub>, **x**) = φ(**x**<sub>*i*</sub>)φ(**x**) is the kernel function.
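Equation (11) can be sketched as follows; the RBF kernel and the `gamma` parameter are our assumptions for illustration, since this section does not state which kernel was used:

```python
import numpy as np

def rbf_kernel(xi, x, gamma=0.5):
    """A common kernel choice, K(x_i, x) = exp(-gamma * ||x_i - x||^2)."""
    return np.exp(-gamma * np.sum((np.asarray(xi) - np.asarray(x)) ** 2))

def svr_predict(x, support_x, dual_coef, b, gamma=0.5):
    """Equation (11): f(x) = sum_i (alpha_i* - alpha_i) K(x_i, x) + b,
    where dual_coef holds the differences (alpha_i* - alpha_i)."""
    return sum(c * rbf_kernel(sv, x, gamma)
               for c, sv in zip(dual_coef, support_x)) + b
```

Only the support vectors (samples with nonzero α<sub>*i*</sub><sup>\*</sup> − α<sub>*i*</sub>) contribute to the sum, which is what keeps the SVR prediction sparse.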

### 2.3.5. Genetic Algorithm-Back Propagation Neural Network

The GA-BPNN model is developed by combining a back propagation neural network (BPNN) with genetic algorithm (GA) optimization. The BPNN has been widely used to find solutions for nonlinear relationships. The training samples are used to train the multi-layer BPNN using the error back propagation (BP) algorithm. During the process of modifying the weights, however, the standard BP algorithm ignores the previous gradient direction, which often leads to oscillation and slow convergence of the learning process [44]. The GA mimics biological evolution processes and has the capacity to find globally optimal solutions, and it can thus be utilized to optimize the thresholds and weights of the BPNN [45]. Therefore, the combination of BPNN and GA results in an integrated model that provides the potential of improving the efficiency and accuracy of the predictions. The flow chart of GA-BPNN with its structure [46] is shown in Figure 2.
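The GA-BPNN coupling can be illustrated with a toy evolutionary search over the weights of a one-hidden-unit network. This is a deliberately minimal sketch (truncation selection plus Gaussian mutation with elitism; all names and parameters are ours): a full GA-BPNN would also use crossover and then fine-tune the surviving weights with BP.

```python
import numpy as np

rng = np.random.default_rng(42)

def fitness(w, X, y):
    """Negative MSE of a one-hidden-unit network; w = [w1, b1, w2, b2].
    A stand-in for the BPNN whose weights and thresholds the GA searches."""
    h = np.tanh(X * w[0] + w[1])                 # hidden layer activation
    return -np.mean((h * w[2] + w[3] - y) ** 2)  # higher fitness = lower error

def ga_optimize(X, y, pop=30, gens=40, sigma=0.3):
    """Truncation selection + Gaussian mutation with elitism; the best
    individual is never lost, so fitness is non-decreasing over generations."""
    P = rng.normal(size=(pop, 4))                # initial population of weight vectors
    for _ in range(gens):
        f = np.array([fitness(w, X, y) for w in P])
        elite = P[np.argsort(f)[-pop // 2:]]     # keep the best half
        P = np.vstack([elite, elite + rng.normal(scale=sigma, size=elite.shape)])
    f = np.array([fitness(w, X, y) for w in P])
    return P[np.argmax(f)], f.max()
```

In the full GA-BPNN, the GA-selected weights and thresholds would then be refined with standard back propagation, combining the GA's global search with BP's local convergence.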

**Figure 2.** The flow chart of GA-BPNN.

### **3. Results**

### *3.1. Downscaling of MODIS GPPs by EBK Interpolation*

To improve the accuracy of evaluating CLQ, the 2011–2015 MODIS-GPPs with the 500 m spatial resolution were downscaled to a 30 m spatial resolution using the EBK interpolation. Compared with the original standard MODIS-GPP for the same date, the downscaled MODIS-GPP for the 233rd to 289th day in 2013 shows more detailed information (Figure 3), indicating that the quality of the downscaled data is higher than that of the original standard data.

**Figure 3.** The spatial distribution maps of the cumulative GPP from the 233rd to 289th days in the study area in 2013: (**a**) 500 m original standard MODIS-GPPs and (**b**) 30 m MODIS-GPPs interpolated by the Empirical Bayes Kriging (EBK) method.

Table 2 shows the comparison of MODIS-GPPs at the spatial resolutions of 500 m and 30 m against the field observations (dry biomass) of 30 sample plots. The average MODIS-GPP was 475.09 g C/m<sup>2</sup> for the 500 m product and 472.73 g C/m<sup>2</sup> for the 30 m product, while the average field observation was 465.49 g/m<sup>2</sup>. Compared with the field observations, the 30 m spatial resolution MODIS GPP has a root mean square error (RMSE) of 7.43 and a normalized RMSE (NRMSE) of 1.59%, while the corresponding values for the 500 m MODIS GPP were 33.43 and 7.18%. The results imply that the MODIS-GPPs downscaled by EBK interpolation reflect the productivity of cultivated land more accurately than the unscaled MODIS-GPPs.
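The RMSE and NRMSE comparison can be reproduced with a short helper (ours; we assume the NRMSE is normalized by the mean of the field observations, which is consistent with 7.43 / 465.49 ≈ 1.59%):

```python
import numpy as np

def rmse_nrmse(pred, obs):
    """RMSE and NRMSE (% of the observation mean), as reported in Table 2.
    Normalization by the observation mean is our assumption."""
    pred, obs = np.asarray(pred, float), np.asarray(obs, float)
    rmse = np.sqrt(np.mean((pred - obs) ** 2))
    return rmse, 100.0 * rmse / obs.mean()
```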

**Table 2.** Comparison of the original 500 m spatial resolution MODIS-GPPs from the 289th day in 2013 with their downscaled 30 m products by the Empirical Bayes Kriging (EBK) method based on the dry biomass field observations of 30 sample plots.

| Plot# | Field Observations | 30 m MODIS-GPP Estimates | 30 m Absolute Error (%) | 500 m MODIS-GPP Estimates | 500 m Absolute Error (%) |
|---|---|---|---|---|---|
| 1 | 514.23 | 521.95 | 1.50 | 530.31 | 3.13 |
| 2 | 485.68 | 496.85 | 2.30 | 485.82 | 0.03 |
| 3 | 519.91 | 529.79 | 1.90 | 525.33 | 1.04 |
| 4 | 685.27 | 688.70 | 0.50 | 731.61 | 6.76 |
| 5 | 538.52 | 546.06 | 1.40 | 555.16 | 3.09 |
| 6 | 592.24 | 599.94 | 1.30 | 609.87 | 2.98 |
| 7 | 639.33 | 645.08 | 0.90 | 571.20 | 10.66 |
| 8 | 402.12 | 411.77 | 2.40 | 407.59 | 1.36 |
| 9 | 437.78 | 445.66 | 1.80 | 447.50 | 2.22 |
| 10 | 451.55 | 460.13 | 1.90 | 490.44 | 8.61 |
| 11 | 555.35 | 560.35 | 0.90 | 598.75 | 7.81 |
| 12 | 299.14 | 307.52 | 2.80 | 352.92 | 17.98 |
| 13 | 317.47 | 325.09 | 2.40 | 330.34 | 4.05 |
| 14 | 506.90 | 515.01 | 1.60 | 520.97 | 2.78 |
| 15 | 408.60 | 416.77 | 2.00 | 447.72 | 9.57 |
| 16 | 438.98 | 446.44 | 1.70 | 415.71 | 5.30 |
| 17 | 457.91 | 464.78 | 1.50 | 446.98 | 2.39 |
| 18 | 448.84 | 456.02 | 1.60 | 450.13 | 0.29 |
| 19 | 393.27 | 399.95 | 1.70 | 409.76 | 4.19 |
| 20 | 494.46 | 501.87 | 1.50 | 459.74 | 7.02 |
| 21 | 394.18 | 401.67 | 1.90 | 395.16 | 0.25 |
| 22 | 379.54 | 386.38 | 1.80 | 389.23 | 2.55 |
| 23 | 425.62 | 431.58 | 1.40 | 431.13 | 1.29 |
| 24 | 380.76 | 387.62 | 1.80 | 383.95 | 0.84 |
| 25 | 567.61 | 572.15 | 0.80 | 682.55 | 20.25 |
| 26 | 401.86 | 410.30 | 2.10 | 370.57 | 7.79 |
| 27 | 539.04 | 543.35 | 0.80 | 541.45 | 0.45 |
| 28 | 541.05 | 546.46 | 1.00 | 509.82 | 5.77 |
| 29 | 364.34 | 372.00 | 2.10 | 368.16 | 1.05 |
| 30 | 383.00 | 390.66 | 2.00 | 392.90 | 2.59 |
| Mean | 465.49 | 472.73 | 1.64 | 475.09 | 4.80 |
| Stdev | 91.77 | 91.00 | | 97.57 | |
| RMSE | | 7.43 | | 33.43 | |
| NRMSE (%) | | 1.59 | | 7.18 | |
