Estimation of Agronomic Characters of Wheat Based on Variable Selection and Machine Learning Algorithms

Wang, Dunliang; Li, Rui; Liu, Tao; Sun, Chengming; Guo, Wenshan

doi:10.3390/agronomy13112808


by Dunliang Wang ¹,²,³, Rui Li ¹,²,³, Tao Liu ²,³, Chengming Sun ²,³,* and Wenshan Guo ²,³

¹ Institute Agricultural Science Taihu Area Jiangsu, Suzhou 215155, China

² Jiangsu Key Laboratory of Crop Cultivation and Physiology, Agricultural College of Yangzhou University, Yangzhou 225009, China

³ Jiangsu Co-Innovation Center for Modern Production Technology of Grain Crops, Yangzhou University, Yangzhou 225009, China

* Author to whom correspondence should be addressed.

Agronomy 2023, 13(11), 2808; https://doi.org/10.3390/agronomy13112808

Submission received: 13 October 2023 / Revised: 9 November 2023 / Accepted: 10 November 2023 / Published: 13 November 2023

(This article belongs to the Section Precision and Digital Agriculture)


Abstract

Wheat is one of the most important food crops in the world, and its high and stable yield is essential for food security. Timely, non-destructive, and accurate monitoring of wheat growth is important for optimizing cultivation management, improving fertilizer use efficiency, and improving wheat yield and quality. In this study, color indices and vegetation indices were calculated from wheat canopy imagery obtained by a UAV remote sensing platform equipped with a digital camera and a hyperspectral camera. Three variable-screening algorithms, namely competitive adaptive re-weighted sampling (CARS), iteratively retaining informative variables (IRIVs), and the random forest (RF) algorithm, were used to screen the acquired indices, and three regression algorithms, namely gradient boosting decision tree (GBDT), multiple linear regression (MLR), and random forest regression (RFR), were then used to construct monitoring models of wheat aboveground biomass (AGB) and leaf nitrogen content (LNC). The results showed that the three variable-screening algorithms performed differently for different growth indicators: the optimal variable-screening algorithm was RF for AGB and CARS for LNC. In addition, the variable-screening algorithms selected more vegetation indices than color indices, and screening effectively avoided autocorrelation between the variables input into the model. This study indicates that constructing a model after variable screening can reduce the redundant information input into the model and achieve a better estimation of growth parameters. A suitable combination of variable-screening and regression algorithms needs to be considered when constructing models for estimating crop growth parameters in the future.

Keywords: wheat; UAV; variable selection; machine learning; vegetation index

1. Introduction

Wheat is an important cereal crop, and its sustainable production is essential to ensure food security in the context of rapid global population growth [1,2,3]. Nitrogen is one of the important nutrients required for crop growth and development and plays an indispensable role in crop growth. Lack of nitrogen fertilizer can limit crop photosynthesis, while the excessive application of nitrogen fertilizer can lead to problems such as resource waste, soil acidification, and environmental pollution [4,5]. As good indicators of nitrogen fertilizer application, leaf nitrogen content (LNC) and aboveground biomass (AGB) at the main growth stages (jointing, booting, and flowering) play an important role in evaluating the quality of nitrogen fertilizer application, assisting nitrogen fertilizer application, and reducing N loss [6]. Therefore, the quantification of LNC and AGB is the key foundation for producing high-yield and high-quality crops.

Remote sensing technology, which allows for the rapid, non-destructive, real-time monitoring of crop growth, is now maturing. Satellite data with various temporal, spatial, and spectral resolutions are widely used in various scales of crop yield prediction studies [7]. However, spectral data from satellite platforms are somewhat limited in terms of spatial resolution and temporal sampling, hampering the timely estimation of crop agronomic traits [8]. Unmanned aerial vehicles (UAVs) can obtain high temporal- and spatial-resolution imagery and achieve large-scale crop monitoring, making them an attractive technology for crop growth assessment in smart agriculture in recent years [9,10]. Sampled data from UAV platforms have improved temporal, spatial, and spectral resolution compared to satellite platforms [11]. Therefore, the high-throughput images acquired by the different sensors carried by UAVs offer great opportunities for crop phenotype monitoring [12,13].

Vegetation indices (VIs) extracted from UAV-based high-throughput imagery are a well-established means of monitoring crop agronomic traits [7,14]. Previous studies have demonstrated the potential of UAV RGB imagery and spectral imagery for monitoring biomass [15,16,17] and nitrogen status [18,19]. The optimal VIs can maximize sensitivity to agronomic traits and reduce the impact of environmental factors and sensor types on spectral data [20,21]. Hunt et al. [22] used small drones to acquire RGB images of farmland and found that the normalized green-red difference index (NGRDI) was sensitive to AGB before canopy closure. The red-blue ratio index (RBRI) extracted from UAV RGB images by Schirrmann et al. [23] was strongly associated with biomass, with a coefficient of determination (R²) ranging from 0.72 to 0.99. In addition, some studies have shown that spectral data from the red-edge and near-infrared bands are also useful for biomass estimation [24,25]. For example, commonly used VIs such as the green optimized soil adjusted vegetation index (GOSAVI), the modified soil adjusted vegetation index (MSAVI), and the normalized difference vegetation index (NDVI) have given satisfactory results in estimating the agronomic traits (e.g., AGB and nitrogen status) of crops such as wheat, maize, and rice in many studies [26,27,28,29]. Therefore, RGB and hyperspectral images from UAVs contain a large amount of color and spectral information that can be used to detect changes in crop growth, providing a means of quantifying LNC and AGB over large areas.

Traditional statistical analysis models, such as simple linear regression and multiple linear regression, are commonly used for the remote sensing inversion of crop agronomic traits. In recent years, machine learning algorithms have also been applied more and more widely in crop growth monitoring. Machine learning is a data-driven approach that can autonomously learn complex relationships in data [30], including the strongly non-linear relationships between remote sensing variables and agronomic traits [31]. Previous studies have shown that machine learning algorithms such as artificial neural networks (ANNs), random forest (RF), and support vector regression (SVR) can be adequately applied to canopy spectral data [18,32], avoiding the inherent multicollinearity problem of multiple linear regression [18]. Verrelst et al. [33] applied four machine learning algorithms, namely neural networks (NNs), kernel ridge regression (KRR), support vector regression (SVR) and Gaussian process regression (GPR), to estimate leaf chlorophyll content (LCC), leaf area index (LAI) and fractional vegetation cover (FVC); after comparing model performance, they found that the GPR model gave the best estimates. Zheng et al. [34] used UAV multispectral images to extract VIs and combined 13 regression algorithms (including simple linear regression, machine learning algorithms, and physical models) to construct and compare LNC estimation models for winter wheat. Their findings showed that simple linear regression algorithms and machine learning algorithms performed well for LNC assessment, but LNC inversion based on physical models was still challenging, and its accuracy was low.

Hyperspectral data contain a large amount of spectral data, but this also implies data redundancy issues. Choosing the appropriate variable-selection algorithm prior to modeling can reduce the model running time and improve the model estimation [35]. However, much of the current research has focused on the application of different modeling approaches to estimate crop agronomic traits, with few studies using variable-selection algorithms for hyperspectral remote sensing to monitor traits such as LNC and AGB, even though the variable selection is also an important factor affecting model results. To address this issue, this study aims to improve the accuracy of LNC and AGB estimation for wheat based on UAV hyperspectral and RGB imagery by combining different variable-selection algorithms and modeling approaches to better assist N fertilizer application and increase N fertilizer utilization efficiency.

This study used UAV RGB images and hyperspectral images to dynamically monitor wheat AGB and LNC under different growing conditions. The specific objectives were to (1) investigate the relationship between color features and spectral indices from UAV imagery and LNC and AGB; (2) filter feature variables by using feature-selection algorithms to remove redundant variables, combine multiple regression models for LNC and AGB estimation, and explore the effects of different variables on the estimated models; and (3) evaluate the accuracy and robustness of the LNC and AGB estimation models developed from combined RGB and hyperspectral imagery using statistical analysis.

2. Materials and Methods

2.1. Study Site and Experiment Design

The two-year field trial was conducted in Yizheng City, Jiangsu Province (32°30′ N, 119°13′ E), a typical wheat cultivation area in the middle and lower reaches of the Yangtze River in eastern China (Figure 1). The previous crop in both years was rice. In Exp.1, winter wheat was grown on clay loam soil with an average pH of 7.15, an organic matter content of 14.24 g/kg, an available nitrogen content of 65.23 mg/kg, an available phosphorus content of 43.43 mg/kg and an available potassium content of 112.37 mg/kg. In Exp.2, the soil of the experimental field had an average pH of 7.24, an organic matter content of 14.12 g/kg, an available nitrogen content of 72.52 mg/kg, an available phosphorus content of 63.60 mg/kg and an available potassium content of 102.76 mg/kg. The field trials involved three wheat varieties (V1–V3: Yangmai 23, Zhenmai 9 and Ningmai 13), two planting densities (D1: 450 grains/m² and D2: 600 grains/m²) and four N fertilizer application treatments (N1–N4: 0 kg/hm², 105 kg/hm², 210 kg/hm² and 315 kg/hm²), with two replicates. The N fertilizer application strategy for the three trials was 5:1:4 in three applications: basal (1 d before transplanting), tiller (7 d after transplanting) and spike (at the beginning of spike differentiation). The field trial consisted of 48 plots, each with an area of 18 m² (3 m × 6 m). In both years, wheat was mechanically sown in early November in rows 30 cm apart and harvested in early June. The phosphorus and potassium fertilizers (120 kg/hm² P₂O₅, 120 kg/hm² K₂O) were applied once before sowing. Other field management practices (e.g., weeding and pesticide application) followed local practices.

2.2. Field Data Acquisition

Ground sampling was carried out during the main growth period of wheat (Table 1). Twenty wheat plants were randomly selected at each plot for destructive sampling to determine AGB at the jointing, booting and flowering stages. Plant samples were separated by organ and oven-dried at 105 °C for 0.5 h, and then they were oven-dried at 80 °C to a constant weight, after which each organ was weighed for dry matter. The sum of the dry matter weight of each organ was used to determine the AGB of wheat in different plots. The samples of leaves were ground and sieved to determine the leaf nitrogen concentration of each plot using the Kjeldahl method [36].

2.3. UAV Image Acquisition

UAV missions were conducted prior to field data collection. A DJI Phantom 4 RTK (SZ DJI Technology Co.; Shenzhen, China) drone was selected as the RGB data collection platform to monitor wheat growth at the jointing, booting and flowering stages. The RGB camera carried by the DJI Phantom 4 RTK has a 20-megapixel CMOS sensor and a 24 mm focal length, and the ground resolution of the RGB images was 2.47 cm. The ground station software (DJI GS PRO) was used to design the flight path of the UAV, which flew at an altitude of 25 m, with forward and lateral overlaps of 80% and a flight speed of 3 m/s. The images were acquired as JPEG RGB images at 5472 × 3648 pixels, with an effective pixel count of 20 million.

A DJI M600 Pro hexacopter equipped with the Gaiasky-mini2-VN imager (Dualix Spectral Imaging Technology, Beijing, China) was used to obtain hyperspectral image data of the test field. The hyperspectral images captured by the Gaiasky-mini2-VN imager contain 176 channels over a wavelength range of 400–1000 nm, with a ground resolution of 8.5 mm. The UAV flight altitude was set to 100 m, the route coordinate points were planned manually, and the images were captured while hovering. The overlap between two adjacent images was 80%. The camera was calibrated prior to launch to adjust the exposure time. In addition, three gray-scale gradient calibration plates and five ground control points (GCPs) were placed on the ground to calibrate the images. The flights took place from 10:30 a.m. to 11:30 a.m. local time in clear weather with no strong winds.

2.4. UAV Image Processing

Before estimating wheat agronomic traits, the UAV images had to be processed to obtain relevant image data, including color indices and vegetation indices, as shown in Figure 2.

The RGB images were calibrated and stitched using Pix4Dmapper (Pix4D SA, Lausanne, Switzerland) software, and the resulting RGB orthophotos were saved as tagged image format (TIF) files (Figure 2). The UAV hyperspectral images were radiometrically calibrated using the SpecView software (Specim, Oulu, Finland), and reflectance was calibrated against captured reference grey cloth data. The method of radiometric calibration is shown in the following equation:

R_{ref} = \frac{DN_{r} - DN_{d}}{DN_{w} - DN_{d}}

(1)

where R_ref is the reflectance value of the calibrated image, DN_r is the digital number (DN) value of the raw image, DN_w is the DN value of the white background plate and DN_d is the DN value of the black background plate.

The method of reflectance calibration is shown in the following equation:

R_{fixed} = \frac{R_{ref} \times R_{st}}{R_{gray}}

(2)

where R_fixed is the spectral reflectance of the image after the effects of the atmosphere, water vapor, etc., have been removed, R_ref is the reflectance of the calibrated image, R_st is the spectral reflectance of the standard gray cloth, and R_gray is the spectral reflectance of the reference gray cloth captured in the image.
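The two calibration steps in Equations (1) and (2) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and all numeric values are hypothetical, and the study itself performed these steps in the SpecView software rather than in custom code.

```python
import numpy as np

def radiometric_calibration(dn_raw, dn_white, dn_dark):
    """Equation (1): convert raw digital numbers (DN) to relative reflectance."""
    dn_raw = np.asarray(dn_raw, dtype=float)
    return (dn_raw - dn_dark) / (dn_white - dn_dark)

def reflectance_calibration(r_ref, r_standard, r_gray):
    """Equation (2): rescale using the known reflectance of the standard gray cloth."""
    return np.asarray(r_ref, dtype=float) * r_standard / r_gray

# Made-up values for a single band (not taken from the study)
dn_raw = np.array([1200.0, 2400.0, 3000.0])
r_ref = radiometric_calibration(dn_raw, dn_white=4000.0, dn_dark=200.0)
r_fixed = reflectance_calibration(r_ref, r_standard=0.5, r_gray=0.55)
```

In practice the same arithmetic would be applied band by band to the whole image, with the white, dark and gray reference values measured separately for each band.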

Hyperspectral images were geometrically calibrated and stitched using HiSpectralStitcher software (Dualix Spectral Imaging Technology) in conjunction with the ground control point coordinates, and the resulting hyperspectral orthophotos were saved as TIF files (Figure 2). Finally, the stitched RGB and hyperspectral orthophotos were cropped plot by plot using ENVI 5.3 (EXELIS, Boulder, CO, USA) software, and the corresponding indices were extracted from the cropped regions.

2.5. Image Feature Extraction

2.5.1. Color Index Extraction

Color indices can reflect crop growth conditions. The DN values of the red, green and blue (R, G and B) channels of the RGB image are normalized to obtain r, g and b. The formula is as follows:

r = \frac{R}{R + G + B}

(3)

g = \frac{G}{R + G + B}

(4)

b = \frac{B}{R + G + B}

(5)

In this study, 16 color indices commonly used for estimating agronomic traits in wheat were selected, and the calculation of each color index is shown in Table 2.
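As a minimal sketch of Equations (3)–(5), the normalization can be applied per pixel and followed by a plot-level mean of any color index. The two indices below, excess green (ExG = 2g − r − b) and NGRDI = (g − r)/(g + r), are widely used definitions chosen here only for illustration; whether they appear in Table 2 is an assumption, and the pixel values are invented.

```python
import numpy as np

def normalized_rgb(rgb):
    """Equations (3)-(5): per-pixel r, g, b from an (H, W, 3) array of DN values."""
    rgb = np.asarray(rgb, dtype=float)
    total = rgb.sum(axis=-1, keepdims=True)
    total[total == 0] = np.nan  # avoid division by zero on pure-black pixels
    return rgb / total

# Hypothetical 1 x 2 pixel crop of one plot
plot = np.array([[[60, 120, 20], [50, 100, 50]]])
rgb_norm = normalized_rgb(plot)
r, g, b = rgb_norm[..., 0], rgb_norm[..., 1], rgb_norm[..., 2]

exg = 2 * g - r - b        # excess green index (illustrative choice)
ngrdi = (g - r) / (g + r)  # normalized green-red difference index
plot_means = {"ExG": float(np.nanmean(exg)), "NGRDI": float(np.nanmean(ngrdi))}
```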

2.5.2. Vegetation Index Extraction

Vegetation indices are linear or non-linear combinations of different remote-sensing spectral bands. In this paper, 18 commonly used vegetation indices were selected as features of the wheat canopy spectra; the specific names and calculation methods are shown in Table 3.
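To illustrate how such band combinations are computed from a hyperspectral spectrum, the sketch below evaluates NDVI and GNDVI using their standard definitions. The evenly spaced band centres, the chosen wavelengths (800, 670 and 550 nm) and the synthetic spectrum are all assumptions made for the example; the band definitions actually used in the study are those of Table 3.

```python
import numpy as np

# Assumption: 176 evenly spaced band centres across 400-1000 nm
wavelengths = np.linspace(400.0, 1000.0, 176)

def band_at(spectrum, target_nm):
    """Reflectance of the band whose centre is closest to target_nm."""
    idx = int(np.argmin(np.abs(wavelengths - target_nm)))
    return spectrum[..., idx]

def ndvi(spectrum, nir_nm=800.0, red_nm=670.0):
    nir, red = band_at(spectrum, nir_nm), band_at(spectrum, red_nm)
    return (nir - red) / (nir + red)

def gndvi(spectrum, nir_nm=800.0, green_nm=550.0):
    nir, green = band_at(spectrum, nir_nm), band_at(spectrum, green_nm)
    return (nir - green) / (nir + green)

# Synthetic plot-mean spectrum: low visible reflectance, green bump, high NIR plateau
spectrum = np.full(176, 0.05)
spectrum[wavelengths > 750] = 0.45
spectrum[(wavelengths > 520) & (wavelengths < 580)] = 0.10

ndvi_value = float(ndvi(spectrum))
gndvi_value = float(gndvi(spectrum))
```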

2.6. Variable Selection

UAV hyperspectral images are characterized by high dimensionality and strong collinearity. Therefore, this study applied three feature-selection algorithms, namely competitive adaptive re-weighted sampling (CARS), iteratively retaining informative variables (IRIVs), and the random forest (RF) algorithm, to select among the RGB and hyperspectral indices in order to reduce redundant information and improve model performance.

CARS is a variable-selection algorithm that mimics Darwin’s “survival of the fittest” principle [59,60]. In each sampling run, CARS fits a PLS model, uses an exponentially decreasing function to determine how many variables are retained, and then applies adaptive re-weighted sampling to keep the variables with large absolute regression coefficients in the PLS model. Tenfold cross-validation is then used to choose the variable subset whose PLS model has the lowest root-mean-square error of cross-validation (RMSECV), and this subset is taken as the optimal combination of characteristic variables [61].
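The exponentially decreasing function that controls how quickly CARS discards variables can be sketched as follows. The form r_i = a·exp(−k·i), with a and k chosen so that all p variables are kept in the first run and two in the last, is the one commonly given in the CARS literature; the number of sampling runs (50) is an assumption, since the settings used in the study are not stated, and 34 is simply the 16 color indices plus 18 vegetation indices.

```python
import numpy as np

def edf_retention_ratios(n_variables, n_runs):
    """Fraction of variables kept at each CARS sampling run.

    r_i = a * exp(-k * i), with a and k chosen so that r_1 = 1
    (all variables kept) and r_N = 2 / p (two variables kept).
    """
    a = (n_variables / 2.0) ** (1.0 / (n_runs - 1))
    k = np.log(n_variables / 2.0) / (n_runs - 1)
    runs = np.arange(1, n_runs + 1)
    return a * np.exp(-k * runs)

# 34 candidate indices; 50 sampling runs is an assumed setting
ratios = edf_retention_ratios(n_variables=34, n_runs=50)
n_kept = np.maximum(2, np.round(ratios * 34)).astype(int)
```

The schedule removes many variables in the early runs and only a few in the later ones, which is what gives CARS its fast coarse screening followed by refined selection.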

IRIVs is a variable-selection algorithm that uses a binary matrix shuffling filter (BMSF) to generate a large number of random variable combinations [62,63]. Each combination is modeled with PLSR, and the root-mean-square error of cross-validation (RMSECV) is used to assess the effectiveness of the different combinations [62,64]. Over multiple iterations, IRIVs retains the strongly and weakly informative variables and removes the interfering and uninformative variables, finally determining the best set of variables [65].

The RF algorithm can handle collinearity and nonlinear relationships between variables and evaluates the importance of each variable with good generalization performance [66,67]. RF applies the bagging method, using random samples drawn from the training set to build independent regression trees, and estimates the importance of each variable [68] with the following formula:

\mathrm{Importance}(x_{i}) = \sum_{t = 1}^{n} \frac{\mathrm{erroB}_{2} - \mathrm{erroB}_{1}}{n}

(6)

where erroB₁ is the out-of-bag error of a single decision tree for variable x_i, erroB₂ is the out-of-bag error of the same tree after noise has been added to variable x_i, the sum runs over the decision trees t = 1, …, n, and n is the number of decision trees.
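The importance measure in Equation (6) can be illustrated with a small numerical sketch. To keep the example dependency-free, each "tree" below is replaced by a least-squares model fitted on a bootstrap sample; this stand-in, the use of shuffling as the way of adding noise, and the synthetic data are all assumptions for illustration and are not the authors' MATLAB implementation.

```python
import numpy as np

def noise_importance(predictors, oob_sets, X, y, var_idx, rng):
    """Equation (6): mean increase in out-of-bag error after perturbing one variable.

    predictors : list of callables (one per tree) mapping X to predictions
    oob_sets   : list of index arrays holding the out-of-bag rows of each tree
    """
    diffs = []
    for predict, oob in zip(predictors, oob_sets):
        X_oob, y_oob = X[oob], y[oob]
        err_1 = np.mean((predict(X_oob) - y_oob) ** 2)               # erroB1
        X_noisy = X_oob.copy()
        X_noisy[:, var_idx] = rng.permutation(X_noisy[:, var_idx])   # noise by shuffling
        err_2 = np.mean((predict(X_noisy) - y_oob) ** 2)             # erroB2
        diffs.append(err_2 - err_1)
    return float(np.mean(diffs))

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 3.0 * X[:, 0] + 0.1 * rng.normal(size=200)  # only variable 0 is informative

predictors, oob_sets = [], []
for _ in range(20):
    boot = rng.integers(0, 200, size=200)            # bootstrap sample (bagging)
    oob = np.setdiff1d(np.arange(200), boot)         # rows never drawn are out of bag
    coef, *_ = np.linalg.lstsq(X[boot], y[boot], rcond=None)
    predictors.append(lambda Z, c=coef: Z @ c)       # stand-in for a regression tree
    oob_sets.append(oob)

importance = [noise_importance(predictors, oob_sets, X, y, j, rng) for j in range(3)]
```

Variables whose perturbation barely changes the out-of-bag error receive an importance near zero and are the natural candidates for removal.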

Detailed steps for implementing CARS, IRIVs and RF are given in references [30,60,65]. The operations related to the three feature-selection algorithms were conducted in MATLAB 2019a (MathWorks, Natick, MA, USA).

2.7. Modelling and Validation of Agronomic Traits in Wheat

Considering the possibly complex relationship between the UAV image feature variables and LNC and AGB, this study used one linear regression algorithm (multiple linear regression, MLR) and two ensemble machine learning algorithms (random forest regression, RFR; gradient boosting decision tree, GBDT) to estimate wheat LNC and AGB (Figure 3).

The GBDT algorithm uses all the samples in the training set to fit each regression tree [69]. The regression trees of the RFR algorithm are built independently and in parallel, whereas the trees of GBDT are built sequentially. Each new tree in GBDT is fitted along the steepest-descent direction of the loss function, i.e., to the negative gradient of the loss with respect to the current prediction. Therefore, the model obtained after the final iteration, which accumulates the contributions of all the trees, is used to compute the target estimate. For detailed information on the GBDT algorithm, refer to Friedman [70] and Wei et al. [71].
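The sequential nature of GBDT can be seen in a deliberately simplified sketch. It assumes a squared-error loss (so the negative gradient is the ordinary residual), a single input feature, and one-split trees (stumps); the number of trees, the learning rate and the synthetic data are arbitrary choices and do not reflect the model settings used in the study.

```python
import numpy as np

def fit_stump(x, residual):
    """Least-squares regression stump (one split) on a single feature."""
    best = None
    for threshold in np.unique(x)[:-1]:  # excluding the maximum keeps both sides non-empty
        left = x <= threshold
        left_mean, right_mean = residual[left].mean(), residual[~left].mean()
        sse = ((residual[left] - left_mean) ** 2).sum() + ((residual[~left] - right_mean) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]

def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return np.where(x <= threshold, left_mean, right_mean)

def fit_gbdt(x, y, n_trees=50, learning_rate=0.1):
    base = float(y.mean())                  # initial constant prediction
    prediction = np.full(len(y), base)
    stumps = []
    for _ in range(n_trees):
        residual = y - prediction           # negative gradient of the squared loss
        stump = fit_stump(x, residual)      # each new tree depends on all previous ones
        prediction = prediction + learning_rate * predict_stump(stump, x)
        stumps.append(stump)
    return base, learning_rate, stumps

def predict_gbdt(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)

x = np.linspace(0.0, 1.0, 40)
y = np.sin(2.0 * np.pi * x)
model = fit_gbdt(x, y)
rmse_start = float(np.sqrt(np.mean((y - y.mean()) ** 2)))
rmse_end = float(np.sqrt(np.mean((y - predict_gbdt(model, x)) ** 2)))
```

Unlike the independently grown trees of a random forest, no stump here can be fitted before the previous one has updated the prediction, and the final estimate is the accumulated sum of all stumps.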

MLR is a regression method that uses two or more independent variables to predict the dependent variable. This study used IBM SPSS Statistics 24 (IBM Corp., Armonk, NY, USA) for MLR model construction.

The RFR algorithm estimates wheat agronomic traits (LNC and AGB) by combining multiple regression trees [7]. The RFR algorithm starts by randomly selecting subsamples from the training set (60% of the recorded target samples), then fits the regression trees with the sub-samples, and ultimately calculates the target modeled values by averaging the values of all the regression trees. Detailed information about RFR can be found in the study of Wang et al. [72].

In this study, 2/3 of the dataset (n = 216) was used for model calibration, and the remaining 1/3 of the dataset (n = 72) was used for model validation. The coefficient of determination (R²), root-mean-square error (RMSE) and normalized root-mean-square error (NRMSE) were used to evaluate model accuracy.

R^{2} = \frac{\left(\sum_{i = 1}^{n} (x_{i} - \bar{x}) (y_{i} - \bar{y})\right)^{2}}{\sum_{i = 1}^{n} {(x_{i} - \bar{x})}^{2} \sum_{i = 1}^{n} {(y_{i} - \bar{y})}^{2}}

(7)

RMSE = \sqrt{\frac{\sum_{i = 1}^{n} {(y_{i} - y_{i}^{'})}^{2}}{n}}

(8)

NRMSE = \frac{RMSE}{\bar{x}}

(9)

where x_i is the measured value of sample i, x̄ is the mean value of x_i, y_i is the estimated value of sample i, ȳ is the mean value of y_i, y_i′ is the measured value of sample i, and n is the number of samples.
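A short sketch of the three accuracy metrics is given below. It assumes that R² is computed as the squared Pearson correlation between measured and estimated values, that the normalizing mean in NRMSE is the mean of the measured values, and that NRMSE is reported as a percentage; the function name and the four sample values are hypothetical.

```python
import numpy as np

def evaluate(measured, estimated):
    """R2 (squared Pearson correlation), RMSE, and NRMSE in percent."""
    measured = np.asarray(measured, dtype=float)
    estimated = np.asarray(estimated, dtype=float)
    dx = measured - measured.mean()
    dy = estimated - estimated.mean()
    r2 = (dx @ dy) ** 2 / ((dx @ dx) * (dy @ dy))           # Equation (7)
    rmse = np.sqrt(np.mean((estimated - measured) ** 2))    # Equation (8)
    nrmse = rmse / measured.mean()                          # Equation (9)
    return {"R2": float(r2), "RMSE": float(rmse), "NRMSE_percent": float(100 * nrmse)}

# Made-up values, e.g. AGB in t/hm²
metrics = evaluate(measured=[2.0, 4.0, 6.0, 8.0], estimated=[2.5, 3.5, 6.5, 7.5])
```

Because R² defined this way measures only linear association, it is usually read together with RMSE and NRMSE, which capture the absolute and relative size of the errors.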

3. Results

3.1. Statistics of Ground Truth Data

Table 4 shows the measured values of AGB and LNC in the field. For the training set, the coefficients of variation (CVs) of AGB were 21.53%, 19.18% and 16.98% at the jointing, booting and flowering stages, respectively, while the CVs of LNC were 27.47%, 27.68% and 27.89% at the same three stages. For the validation set, the CVs of AGB were 31.83%, 20.59% and 21.05% at the jointing, booting and flowering stages, respectively, and the CVs of LNC were 27.63%, 27.10% and 28.37%. AGB and LNC showed good consistency between the training set and the validation set, and all CVs were greater than 15%. Overall, the different treatments of the experiment resulted in significant differences in wheat growth, providing a reliable dataset for the subsequent construction of the AGB and LNC estimation models.
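For reference, the coefficient of variation reported above is the standard deviation expressed as a percentage of the mean. The sketch below assumes the sample standard deviation (ddof = 1), which the text does not specify, and uses made-up values.

```python
import numpy as np

def coefficient_of_variation(values):
    """CV in percent: sample standard deviation divided by the mean."""
    values = np.asarray(values, dtype=float)
    return 100.0 * values.std(ddof=1) / values.mean()

# Hypothetical plot-level measurements
cv = coefficient_of_variation([4.0, 5.0, 6.0])
```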

3.2. Correlation Analysis

The correlation coefficients (r) between wheat agronomic traits (AGB and LNC) and remote sensing indices (a: color indices, b: vegetation indices) are shown in Figure 4, with color depth representing the strength of the correlation. Figure 4 shows that the correlation coefficients differed between the types of remote sensing indices and between the agronomic traits. The correlations between the color indices and wheat AGB and LNC were weak. Among the color indices, B had the strongest correlation with wheat AGB, with a correlation coefficient of −0.42, and the color index r had the strongest correlation with wheat LNC, with a correlation coefficient of 0.43. Among the vegetation indices, WBI had the strongest correlation with wheat AGB, with a correlation coefficient of 0.51. There were highly significant correlations between all vegetation indices and wheat LNC, the strongest being with GNDVI, with a correlation coefficient of 0.89. These results suggest that vegetation indices were more strongly correlated with AGB and LNC than color indices were.

3.3. Development of Models

In this study, the selected 16 color indices and 18 vegetation indices were first screened using 3 methods (CARS, IRIVs, RF). After variable screening, three regression algorithms, GBDT, MLR, and RFR, were used to develop estimation models for wheat AGB and LNC, respectively. Table 5 lists the optimal variables selected by the different variable-screening algorithms, and it can be found that the variables selected by the three methods differ significantly. For AGB estimation, GNDVI was the only commonly selected variable among the three screening algorithms, and the optimal variables were mainly selected from vegetation indices. Similar to the selection results of AGB estimation, the three variable-selection algorithms selected the optimal variables for estimating LNC, and the optimal variables were mainly obtained from the vegetation indices. In addition, WBI, MRENDVI and TCARI were the commonly selected variables in the three screening algorithms.

3.4. Validation of Models

Based on the variable-screening results, the optimal variables were selected as input parameters, and the estimation models for wheat AGB and LNC were constructed with the GBDT, MLR, and RFR regression algorithms (Table 6). Figure 5 and Table 6 show the nine wheat AGB estimation models based on three variable-selection algorithms and three regression algorithms. The wheat LNC estimation models based on the different variable-selection results combined with the three regression algorithms are shown in Figure 6 and Table 6. The highest training accuracy for the wheat AGB estimation model was obtained by RF-RFR, with an R² of 0.79, an RMSE of 1.67 t/hm², and an NRMSE of 20.19%. The best accuracy for the wheat LNC estimation model was obtained by CARS-RFR, with an R² of 0.95, an RMSE of 2.32 mg/g and an NRMSE of 6.75%. The validation results showed the same pattern as the training results (Table 6). The results showed that the RFR regression algorithm was better suited to constructing the estimation models of wheat AGB and LNC, and that filtering with the variable-screening algorithms effectively prevented redundant information from being input into the models, thus reducing the occurrence of overfitting.

4. Discussion

4.1. Comparison of Different Variable-Screening Algorithms

Three variable-screening algorithms, CARS, IRIVs and RF, were used to screen the input parameters for constructing the wheat AGB and LNC estimation models. Screening variables is an important step in statistical analysis and data modeling, as it identifies the features that have the greatest impact on the target variable [73]. Variable-screening algorithms should be selected based on the target variables and data characteristics [74]. For example, Wang’s results show that the RF variable-screening algorithm performs well in processing wheat SPAD data [30]. The experimental results of Li et al. indicated that the accuracy of a wheat yield prediction model could be improved by using the LASSO variable-selection algorithm [75].

In this study, the variables screened using the RF algorithm produced the AGB estimation model with the best accuracy, while for the LNC estimation model it was the CARS algorithm that gave the best results. This may be because AGB and LNC change differently across growth stages: AGB increases throughout the whole growth period of wheat, whereas LNC increases during the vegetative growth stage and starts to decrease at the reproductive growth stage. Therefore, it is necessary to choose variable-selection algorithms based on data characteristics when selecting model input variables. Applying regression algorithms after variable selection can reduce redundant variables and improve the accuracy and robustness of agronomic trait estimation models.

4.2. Impact of Different Combinations of Algorithms on Estimation Models

This study involved three wheat varieties, two planting densities, four nitrogen applications, three growth stages, and ninety-six plots to estimate AGB and LNC, which resulted in a complex relationship between the two growth parameters of wheat (AGB and LNC) and the color and vegetation indices. Therefore, this study investigated the feasibility of a model development strategy combining multiple variable-selection algorithms and multiple regression algorithms to estimate wheat AGB and LNC. According to the results in Table 5, the input parameters selected by the variable-screening algorithms for both the AGB dataset and the LNC dataset were mainly vegetation indices, while color indices were selected less often, which is consistent with the results of the correlation analysis. This is because hyperspectral cameras acquire a greater number of bands and fuller spectral information. Therefore, the type of sensor is also important for the estimation of crop growth parameters.

Nine AGB estimation models and nine LNC estimation models based on different combinations of algorithms were developed and compared, respectively (Figure 7). As can be seen in Figure 7, for the same variable-selection algorithm, there are differences in the accuracy of the estimated models obtained by combining different regression algorithms. Among the nine AGB estimation models, the estimation model developed by RF-RFR has the highest accuracy. Among the nine LNC estimation models, the CARS-RFR model had the best estimation accuracy. The results of this study are similar to previous studies. By comparing the effects of four machine learning regression algorithms on maize AGB estimation models, Han et al. found that the AGB estimation model constructed using the RFR algorithm had the highest prediction accuracy [76]. Using UAV hyperspectral imagery and combining three machine learning algorithms to estimate rice LNC, Wang et al. found that the accuracy of LNC estimation models constructed using the three machine learning algorithms also differed [77]. In addition, the redundant information input into the model is also reduced to some extent by the variable screening, which also improves the efficiency of the model operation and reduces the possibility of overfitting in the constructed model.

In addition, deep learning algorithms were not used in the evaluation of wheat LNC and AGB in this study, mainly due to two reasons [78,79]: firstly, deep learning is usually suitable for processing large-scale data and complex problems; secondly, deep learning algorithms typically require a large amount of data to construct models. If the dataset is relatively small, the deep learning model may overfit, while traditional machine learning algorithms are usually more robust for models constructed from small data samples. Therefore, deep learning algorithms were not applied to estimate wheat agronomic traits in this study.

4.3. Limitations and Future Research

This study compared different variable-screening algorithms and regression algorithms for estimating different growth indicators of wheat. This preliminary study proposes monitoring methods for wheat AGB and LNC, but some limitations remain. As growth progressed, chlorophyll content decreased, the differences in the appearance of wheat (mainly leaf color) between N application rates diminished, and, as reproductive growth advanced, spikelets appeared and the canopy structure changed. Consequently, methods for monitoring wheat growth based on a single type of image will be limited, and the accuracy of estimation will be reduced. Therefore, it is essential to monitor wheat growth using multi-source data-fusion techniques.

Hyperspectral data have attractive characteristics in crop monitoring, but hyperspectral data obtained based on proximal platforms are not suitable for large-scale crop growth monitoring. Satellite platforms have the advantages of being low-cost, continuous, and large-scale in acquiring spectral data [80]. Compared to proximal platforms (e.g., handheld devices or drone platforms), satellite images have lower spatial resolution and are greatly affected by weather factors such as clouds and precipitation, making it difficult to provide real-time monitoring data like drones do [7,81]. However, with the gradual maturity of space-borne hyperspectral technology, the availability of hyperspectral data based on satellite platforms for large-scale estimation of crop agronomic traits has been greatly improved.

As sensors continue to be upgraded, their cost of use continues to decline. The types of sensors used in agriculture are also increasing, providing additional information for crop growth monitoring, such as thermal infrared images and elevation data. In future research, the selection of model input parameters and the combination of regression algorithms can be further evaluated by combining multi-source data from multiple platforms (such as proximal platforms and satellite platforms) to improve the accuracy of crop growth parameter monitoring.

This study used data from only two growing seasons in one region. Future studies should collect data from more regions to increase the diversity of the model input data and to further evaluate the conclusions drawn here.

5. Conclusions

In this study, wheat grown under different experimental treatments (planting densities, varieties, and nitrogen application rates) was used as the research subject. Wheat canopy images were acquired using a UAV carrying RGB and hyperspectral cameras, and estimation models of wheat AGB and LNC were constructed by combining different variable-screening algorithms and regression algorithms. The results indicate that a suitable combination of variable-screening and regression algorithms can effectively improve the accuracy of wheat AGB and LNC estimation models, and that the best combination differs between growth indicators. The RF-RFR model, which combines the RF variable-screening algorithm with the RFR regression algorithm, estimated wheat AGB with the highest accuracy. The CARS-RFR model, which combines the CARS variable-screening algorithm with the RFR regression algorithm, estimated wheat LNC with the highest accuracy. In summary, among the estimation models of AGB and LNC under different cultivation conditions, the models built with the RFR regression algorithm performed best.

Author Contributions

Conceptualization, D.W., R.L. and W.G.; Formal analysis, D.W.; Investigation, R.L., T.L. and C.S.; Methodology, D.W.; Writing—original draft, D.W. and R.L.; Writing—review and editing, D.W. and C.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (31701355, 31872852 and 32172111); the National Key Research and Development Program of China (2018YFD0300805); Suzhou Science and Technology Plan Project (SNG2021030, 2023SS13); and the Postgraduate Research and Practice Innovation Program of Jiangsu Province (XKYCX19_105).

Data Availability Statement

Data are contained within the article.

Acknowledgments

The authors wish to thank all those who helped in this research.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Yin, Q.; Zhang, Y.; Li, W.; Wang, J.; Wang, W.; Ahmad, I.; Zhou, G.; Huo, Z. Estimation of Winter Wheat SPAD Values Based on UAV Multispectral Remote Sensing. Remote Sens. 2023, 15, 3595.
2. Sabanci, K.; Aslan, M.F.; Ropelewska, E.; Unlersen, M.F.; Durdu, A. A Novel Convolutional-Recurrent Hybrid Network for Sunn Pest–Damaged Wheat Grain Detection. Food Anal. Methods 2022, 15, 1748–1760.
3. Zhu, J.; Yang, G.; Feng, X.; Li, X.; Fang, H.; Zhang, J.; Bai, X.; Tao, M.; He, Y. Detecting Wheat Heads from UAV Low-Altitude Remote Sensing Images Using Deep Learning Based on Transformer. Remote Sens. 2022, 14, 5141.
4. Li, X.; Hu, C.; Delgado, J.A.; Zhang, Y.; Ouyang, Z. Increased Nitrogen Use Efficiencies as a Key Mitigation Alternative to Reduce Nitrate Leaching in North China Plain. Agric. Water Manag. 2007, 89, 137–147.
5. Hao, T.; Zhu, Q.; Zeng, M.; Shen, J.; Shi, X.; Liu, X.; Zhang, F.; de Vries, W. Impacts of Nitrogen Fertilizer Type and Application Rate on Soil Acidification Rate under a Wheat-Maize Double Cropping System. J. Environ. Manag. 2020, 270, 110888.
6. Shu, M.; Shen, M.; Dong, Q.; Yang, X.; Li, B.; Ma, Y. Estimating the Maize Above-Ground Biomass by Constructing the Tridimensional Concept Model Based on UAV-Based Digital and Multi-Spectral Images. Field Crops Res. 2022, 282, 108491.
7. Maimaitijiang, M.; Sagan, V.; Sidike, P.; Hartling, S.; Esposito, F.; Fritschi, F.B. Soybean Yield Prediction from UAV Using Multimodal Data Fusion and Deep Learning. Remote Sens. Environ. 2020, 237, 111599.
8. Zaman-Allah, M.; Vergara, O.; Araus, J.L.; Tarekegne, A.; Magorokosho, C.; Zarco-Tejada, P.J.; Hornero, A.; Albà, A.H.; Das, B.; Craufurd, P.; et al. Unmanned Aerial Platform-Based Multi-Spectral Imaging for Field Phenotyping of Maize. Plant Methods 2015, 11, 35.
9. Aasen, H.; Honkavaara, E.; Lucieer, A.; Zarco-Tejada, P.J. Quantitative Remote Sensing at Ultra-High Resolution with UAV Spectroscopy: A Review of Sensor Technology, Measurement Procedures, and Data Correction Workflows. Remote Sens. 2018, 10, 1091.
10. Mancini, F.; Dubbini, M.; Gattelli, M.; Stecchi, F.; Fabbri, S.; Gabbianelli, G. Using Unmanned Aerial Vehicles (UAV) for High-Resolution Reconstruction of Topography: The Structure from Motion Approach on Coastal Environments. Remote Sens. 2013, 5, 6880–6898.
11. Sagan, V.; Maimaitijiang, M.; Sidike, P.; Maimaitiyiming, M.; Erkbol, H.; Hartling, S.; Peterson, K.T.; Peterson, J.; Burken, J.; Fritschi, F. UAV/Satellite Multiscale Data Fusion for Crop Monitoring and Early Stress Detection. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2019, XLII-2-W13, 715–722.
12. Guo, Y.; Wang, H.; Wu, Z.; Wang, S.; Sun, H.; Senthilnath, J.; Wang, J.; Robin Bryant, C.; Fu, Y. Modified Red Blue Vegetation Index for Chlorophyll Estimation and Yield Prediction of Maize from Visible Images Captured by UAV. Sensors 2020, 20, 5055.
13. Nex, F.; Remondino, F. UAV for 3D Mapping Applications: A Review. Appl. Geomat. 2014, 6, 1–15.
14. Li, R.; Wang, D.; Zhu, B.; Liu, T.; Sun, C.; Zhang, Z. Estimation of Nitrogen Content in Wheat Using Indices Derived from RGB and Thermal Infrared Imaging. Field Crops Res. 2022, 289, 108735.
15. Bendig, J.; Bolten, A.; Bennertz, S.; Broscheit, J.; Eichfuss, S.; Bareth, G. Estimating Biomass of Barley Using Crop Surface Models (CSMs) Derived from UAV-Based RGB Imaging. Remote Sens. 2014, 6, 10395–10412.
16. Bendig, J.; Yu, K.; Aasen, H.; Bolten, A.; Bennertz, S.; Broscheit, J.; Gnyp, M.L.; Bareth, G. Combining UAV-Based Plant Height from Crop Surface Models, Visible, and near Infrared Vegetation Indices for Biomass Monitoring in Barley. Int. J. Appl. Earth Obs. Geoinf. 2015, 39, 79–87.
17. Willkomm, M.; Bolten, A.; Bareth, G. Non-Destructive Monitoring of Rice by Hyperspectral in-Field Spectrometry and Uav-Based Remote Sensing: Case Study of Field-Grown Rice in North Rhine-Westphalia, Germany. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2016, XLI-B1, 1071–1077.
18. Inoue, Y.; Sakaiya, E.; Zhu, Y.; Takahashi, W. Diagnostic Mapping of Canopy Nitrogen Content in Rice Based on Hyperspectral Measurements. Remote Sens. Environ. 2012, 126, 210–221.
19. Maresma, Á.; Ariza, M.; Martínez, E.; Lloveras, J.; Martínez-Casasnovas, J.A. Analysis of Vegetation Indices to Determine Nitrogen Application and Yield Prediction in Maize (Zea Mays L.) from a Standard UAV Service. Remote Sens. 2016, 8, 973.
20. Wu, C.; Niu, Z.; Tang, Q.; Huang, W. Estimating Chlorophyll Content from Hyperspectral Vegetation Indices: Modeling and Validation. Agric. For. Meteorol. 2008, 148, 1230–1241.
21. Hunt, E.R.; Doraiswamy, P.C.; McMurtrey, J.E.; Daughtry, C.S.T.; Perry, E.M.; Akhmedov, B. A Visible Band Index for Remote Sensing Leaf Chlorophyll Content at the Canopy Scale. Int. J. Appl. Earth Obs. Geoinf. 2013, 21, 103–112.
22. Hunt, E.R.; Cavigelli, M.; Daughtry, C.S.T.; McMurtrey, J.E.; Walthall, C.L. Evaluation of Digital Photography from Model Aircraft for Remote Sensing of Crop Biomass and Nitrogen Status. Precision Agric. 2005, 6, 359–378.
23. Schirrmann, M.; Giebel, A.; Gleiniger, F.; Pflanz, M.; Lentschke, J.; Dammer, K.-H. Monitoring Agronomic Parameters of Winter Wheat Crops with Low-Cost UAV Imagery. Remote Sens. 2016, 8, 706.
24. Cheng, T.; Song, R.; Li, D.; Zhou, K.; Zheng, H.; Yao, X.; Tian, Y.; Cao, W.; Zhu, Y. Spectroscopic Estimation of Biomass in Canopy Components of Paddy Rice Using Dry Matter and Chlorophyll Indices. Remote Sens. 2017, 9, 319.
25. Gnyp, M.L.; Miao, Y.; Yuan, F.; Ustin, S.L.; Yu, K.; Yao, Y.; Huang, S.; Bareth, G. Hyperspectral Canopy Sensing of Paddy Rice Aboveground Biomass at Different Growth Stages. Field Crops Res. 2014, 155, 42–55.
26. Qi, J.; Chehbouni, A.; Huete, A.R.; Kerr, Y.H.; Sorooshian, S. A Modified Soil Adjusted Vegetation Index. Remote Sens. Environ. 1994, 48, 119–126.
27. Weiss, M.; Jacob, F.; Duveiller, G. Remote Sensing for Agricultural Applications: A Meta-Review. Remote Sens. Environ. 2020, 236, 111402.
28. Yue, J.; Yang, G.; Tian, Q.; Feng, H.; Xu, K.; Zhou, C. Estimate of Winter-Wheat above-Ground Biomass Based on UAV Ultrahigh-Ground-Resolution Image Textures and Vegetation Indices. ISPRS J. Photogramm. Remote Sens. 2019, 150, 226–244.
29. Zha, H.; Miao, Y.; Wang, T.; Li, Y.; Zhang, J.; Sun, W.; Feng, Z.; Kusnierek, K. Improving Unmanned Aerial Vehicle Remote Sensing-Based Rice Nitrogen Nutrition Index Prediction with Machine Learning. Remote Sens. 2020, 12, 215.
30. Wang, J.; Zhou, Q.; Shang, J.; Liu, C.; Zhuang, T.; Ding, J.; Xian, Y.; Zhao, L.; Wang, W.; Zhou, G.; et al. UAV- and Machine Learning-Based Retrieval of Wheat SPAD Values at the Overwintering Stage for Variety Screening. Remote Sens. 2021, 13, 5166.
31. Chlingaryan, A.; Sukkarieh, S.; Whelan, B. Machine Learning Approaches for Crop Yield Prediction and Nitrogen Status Estimation in Precision Agriculture: A Review. Comput. Electron. Agric. 2018, 151, 61–69.
32. Yao, X.; Huang, Y.; Shang, G.; Zhou, C.; Cheng, T.; Tian, Y.; Cao, W.; Zhu, Y. Evaluation of Six Algorithms to Monitor Wheat Leaf Nitrogen Concentration. Remote Sens. 2015, 7, 14939–14966.
33. Verrelst, J.; Muñoz, J.; Alonso, L.; Delegido, J.; Rivera, J.P.; Camps-Valls, G.; Moreno, J. Machine Learning Regression Algorithms for Biophysical Parameter Retrieval: Opportunities for Sentinel-2 and -3. Remote Sens. Environ. 2012, 118, 127–139.
34. Zheng, H.; Li, W.; Jiang, J.; Liu, Y.; Cheng, T.; Tian, Y.; Zhu, Y.; Cao, W.; Zhang, Y.; Yao, X. A Comparative Assessment of Different Modeling Algorithms for Estimating Leaf Nitrogen Content in Winter Wheat Using Multispectral Images from an Unmanned Aerial Vehicle. Remote Sens. 2018, 10, 2026.
35. Bolón-Canedo, V.; Sánchez-Maroño, N.; Alonso-Betanzos, A. Feature Selection for High-Dimensional Data. In Artificial Intelligence: Foundations, Theory, and Algorithms; Springer International Publishing: Cham, Switzerland, 2015; ISBN 978-3-319-21857-1.
36. Bradstreet, R.B. Kjeldahl Method for Organic Nitrogen. Anal. Chem. 1954, 26, 185–187.
37. Gitelson, A.A.; Stark, R.; Grits, U.; Rundquist, D.; Kaufman, Y.; Derry, D. Vegetation and Soil Lines in Visible Spectral Space: A Concept and Technique for Remote Estimation of Vegetation Fraction. Int. J. Remote Sens. 2002, 23, 2537–2562.
38. Jia, L.; Buerkert, A.; Chen, X.; Roemheld, V.; Zhang, F. Low-Altitude Aerial Photography for Optimum N Fertilization of Winter Wheat on the North China Plain. Field Crops Res. 2004, 89, 389–395.
39. Guijarro, M.; Pajares, G.; Riomoros, I.; Herrera, P.J.; Burgos-Artizzu, X.P.; Ribeiro, A. Automatic Segmentation of Relevant Textures in Agricultural Images. Comput. Electron. Agric. 2011, 75, 75–83.
40. Riehle, D.; Reiser, D.; Griepentrog, H.W. Robust Index-Based Semantic Plant/Background Segmentation for RGB-Images. Comput. Electron. Agric. 2020, 169, 105201.
41. Rasmussen, J.; Ntakos, G.; Nielsen, J.; Svensgaard, J.; Poulsen, R.N.; Christensen, S. Are Vegetation Indices Derived from Consumer-Grade Cameras Mounted on UAVs Sufficiently Reliable for Assessing Experimental Plots? Eur. J. Agron. 2016, 74, 75–92.
42. Wang, Y.; Wang, D.; Zhang, G.; Wang, J. Estimating Nitrogen Status of Rice Using the Image Segmentation of G-R Thresholding Method. Field Crops Res. 2013, 149, 33–39.
43. Baret, F.; Guyot, G. Potentials and Limits of Vegetation Indices for LAI and APAR Assessment. Remote Sens. Environ. 1991, 35, 161–173.
44. Gitelson, A.A.; Kaufman, Y.J.; Merzlyak, M.N. Use of a Green Channel in Remote Sensing of Global Vegetation from EOS-MODIS. Remote Sens. Environ. 1996, 58, 289–298.
45. Tucker, C.J.; Elgin, J.H.; McMurtrey, J.E.; Fan, C.J. Monitoring Corn and Soybean Crop Development with Hand-Held Radiometer Spectral Data. Remote Sens. Environ. 1979, 8, 237–248.
46. Haboudane, D.; Miller, J.R.; Pattey, E.; Zarco-Tejada, P.J.; Strachan, I.B. Hyperspectral Vegetation Indices and Novel Algorithms for Predicting Green LAI of Crop Canopies: Modeling and Validation in the Context of Precision Agriculture. Remote Sens. Environ. 2004, 90, 337–352.
47. Ceccato, P.; Gobron, N.; Flasse, S.; Pinty, B.; Tarantola, S. Designing a Spectral Index to Estimate Vegetation Water Content from Remote Sensing Data: Part 1: Theoretical Approach. Remote Sens. Environ. 2002, 82, 188–197.
48. Peñuelas, J.; Gamon, J.A.; Fredeen, A.L.; Merino, J.; Field, C.B. Reflectance Indices Associated with Physiological Changes in Nitrogen- and Water-Limited Sunflower Leaves. Remote Sens. Environ. 1994, 48, 135–146.
49. Fernández-Manso, A.; Fernández-Manso, O.; Quintano, C. SENTINEL-2A Red-Edge Spectral Indices Suitability for Discriminating Burn Severity. Int. J. Appl. Earth Obs. Geoinf. 2016, 50, 170–175.
50. Behmann, J.; Steinrücken, J.; Plümer, L. Detection of Early Plant Stress Responses in Hyperspectral Images. ISPRS J. Photogramm. Remote Sens. 2014, 93, 98–111.
51. Gitelson, A.; Merzlyak, M.N. Quantitative Estimation of Chlorophyll-a Using Reflectance Spectra: Experiments with Autumn Chestnut and Maple Leaves. J. Photochem. Photobiol. B Biol. 1994, 22, 247–252.
52. Gupta, R.K.; Vijayan, D.; Prasad, T.S. Comparative Analysis of Red-Edge Hyperspectral Indices. Adv. Space Res. 2003, 32, 2217–2222.
53. Ahmad, S.; Chandra Pandey, A.; Kumar, A.; Parida, B.R.; Lele, N.V.; Bhattacharya, B.K. Chlorophyll Deficiency (Chlorosis) Detection Based on Spectral Shift and Yellowness Index Using Hyperspectral AVIRIS-NG Data in Sholayar Reserve Forest, Kerala. Remote Sens. Appl. Soc. Environ. 2020, 19, 100369.
54. Metternicht, G. Vegetation Indices Derived from High-Resolution Airborne Videography for Precision Crop Management. Int. J. Remote Sens. 2003, 24, 2855–2877.
55. Sims, D.A.; Gamon, J.A. Relationships between Leaf Pigment Content and Spectral Reflectance across a Wide Range of Species, Leaf Structures and Developmental Stages. Remote Sens. Environ. 2002, 81, 337–354.
56. Hunt, E.R.; Daughtry, C.S.T.; Eitel, J.U.H.; Long, D.S. Remote Sensing Leaf Chlorophyll Content Using a Visible Band Index. Agron. J. 2011, 103, 1090–1099.
57. Herrmann, I.; Karnieli, A.; Bonfil, D.J.; Cohen, Y.; Alchanatis, V. SWIR-Based Spectral Indices for Assessing Nitrogen Content in Potato Fields. Int. J. Remote Sens. 2010, 31, 5127–5143.
58. Main, R.; Cho, M.A.; Mathieu, R.; O’Kennedy, M.M.; Ramoelo, A.; Koch, S. An Investigation into Robust Spectral Indices for Leaf Chlorophyll Estimation. ISPRS J. Photogramm. Remote Sens. 2011, 66, 751–761.
59. Shu, M.; Shen, M.; Zuo, J.; Yin, P.; Wang, M.; Xie, Z.; Tang, J.; Wang, R.; Li, B.; Yang, X.; et al. The Application of UAV-Based Hyperspectral Imaging to Estimate Crop Traits in Maize Inbred Lines. Plant Phenomics 2021, 2021, 9890745.
60. Xing, Z.; Du, C.; Shen, Y.; Ma, F.; Zhou, J. A Method Combining FTIR-ATR and Raman Spectroscopy to Determine Soil Organic Matter: Improvement of Prediction Accuracy Using Competitive Adaptive Reweighted Sampling (CARS). Comput. Electron. Agric. 2021, 191, 106549.
61. Li, H.-D.; Xu, Q.-S.; Liang, Y.-Z. libPLS: An Integrated Library for Partial Least Squares Regression and Linear Discriminant Analysis. Chemom. Intell. Lab. Syst. 2018, 176, 34–43.
62. Zhang, H.; Wang, H.; Dai, Z.; Chen, M.; Yuan, Z. Improving Accuracy for Cancer Classification with a New Algorithm for Genes Selection. BMC Bioinform. 2012, 13, 298.
63. Yun, Y.-H.; Bin, J.; Liu, D.-L.; Xu, L.; Yan, T.-L.; Cao, D.-S.; Xu, Q.-S. A Hybrid Variable Selection Strategy Based on Continuous Shrinkage of Variable Space in Multivariate Calibration. Anal. Chim. Acta 2019, 1058, 58–69.
64. Yun, Y.-H.; Wang, W.-T.; Tan, M.-L.; Liang, Y.-Z.; Li, H.-D.; Cao, D.-S.; Lu, H.-M.; Xu, Q.-S. A Strategy That Iteratively Retains Informative Variables for Selecting Optimal Variable Subset in Multivariate Calibration. Anal. Chim. Acta 2014, 807, 36–43.
65. Wei, L.; Yuan, Z.; Yu, M.; Huang, C.; Cao, L. Estimation of Arsenic Content in Soil Based on Laboratory and Field Reflectance Spectroscopy. Sensors 2019, 19, 3904.
66. Liaw, A.; Wiener, M. Classification and Regression by randomForest. R News 2002, 2, 18–22.
67. Sun, J.; Yang, J.; Shi, S.; Chen, B.; Du, L.; Gong, W.; Song, S. Estimating Rice Leaf Nitrogen Concentration: Influence of Regression Algorithms Based on Passive and Active Leaf Reflectance. Remote Sens. 2017, 9, 951.
68. Cen, H.; Wan, L.; Zhu, J.; Li, Y.; Li, X.; Zhu, Y.; Weng, H.; Wu, W.; Yin, W.; Xu, C.; et al. Dynamic Monitoring of Biomass of Rice under Different Nitrogen Treatments Using a Lightweight UAV with Dual Image-Frame Snapshot Cameras. Plant Methods 2019, 15, 32.
69. Yang, L.; Zhang, X.; Liang, S.; Yao, Y.; Jia, K.; Jia, A. Estimating Surface Downward Shortwave Radiation over China Based on the Gradient Boosting Decision Tree Method. Remote Sens. 2018, 10, 185.
70. Friedman, J.H. Greedy Function Approximation: A Gradient Boosting Machine. Ann. Statist. 2001, 29, 1189–1232.
71. Wei, Z.; Meng, Y.; Zhang, W.; Peng, J.; Meng, L. Downscaling SMAP Soil Moisture Estimation with Gradient Boosting Decision Tree Regression over the Tibetan Plateau. Remote Sens. Environ. 2019, 225, 30–44.
72. Wang, L.; Zhou, X.; Zhu, X.; Dong, Z.; Guo, W. Estimation of Biomass in Wheat Using Random Forest Regression Algorithm and Remote Sensing Data. Crop J. 2016, 4, 212–219.
73. Tran, T.N.; Afanador, N.L.; Buydens, L.M.C.; Blanchet, L. Interpretation of Variable Importance in Partial Least Squares with Significance Multivariate Correlation (sMC). Chemom. Intell. Lab. Syst. 2014, 138, 153–160.
74. Chandrashekar, G.; Sahin, F. A Survey on Feature Selection Methods. Comput. Electr. Eng. 2014, 40, 16–28.
75. Li, R.; Wang, D.; Zhu, B.; Liu, T.; Sun, C.; Zhang, Z. Estimation of Grain Yield in Wheat Using Source–Sink Datasets Derived from RGB and Thermal Infrared Imaging. Food Energy Secur. 2023, 12, e434.
76. Han, L.; Yang, G.; Dai, H.; Xu, B.; Yang, H.; Feng, H.; Li, Z.; Yang, X. Modeling Maize Above-Ground Biomass Based on Machine Learning Approaches Using UAV Remote-Sensing Data. Plant Methods 2019, 15, 10.
77. Wang, L.; Chen, S.; Li, D.; Wang, C.; Jiang, H.; Zheng, Q.; Peng, Z. Estimation of Paddy Rice Nitrogen Content and Accumulation Both at Leaf and Plant Levels from UAV Hyperspectral Imagery. Remote Sens. 2021, 13, 2956.
78. Yang, B.; Ma, J.; Yao, X.; Cao, W.; Zhu, Y. Estimation of Leaf Nitrogen Content in Wheat Based on Fusion of Spectral Features and Deep Features from Near Infrared Hyperspectral Imagery. Sensors 2021, 21, 613.
79. Fu, Y.; Yang, G.; Pu, R.; Li, Z.; Li, H.; Xu, X.; Song, X.; Yang, X.; Zhao, C. An Overview of Crop Nitrogen Status Assessment Using Hyperspectral Remote Sensing: Current Status and Perspectives. Eur. J. Agron. 2021, 124, 126241.
80. Lu, J.; Eitel, J.U.H.; Engels, M.; Zhu, J.; Ma, Y.; Liao, F.; Zheng, H.; Wang, X.; Yao, X.; Cheng, T.; et al. Improving Unmanned Aerial Vehicle (UAV) Remote Sensing of Rice Plant Potassium Accumulation by Fusing Spectral and Textural Information. Int. J. Appl. Earth Obs. Geoinf. 2021, 104, 102592.
81. Fu, Y.; Yang, G.; Song, X.; Li, Z.; Xu, X.; Feng, H.; Zhao, C. Improved Estimation of Winter Wheat Aboveground Biomass Using Multiscale Textures Extracted from UAV-Based Digital Images and Hyperspectral Feature Analysis. Remote Sens. 2021, 13, 581.

Figure 1. Map of study area and treatments.

Figure 2. Processing workflow for UAV-based images. (GCP: ground control point).

Figure 3. Methodology workflow in this study.

Figure 4. Pearson correlation analysis between wheat agronomic traits (AGB and LNC) and remote sensing indices ((a): color indices, (b): vegetation indices).

Figure 5. Scatterplots of measured versus predicted wheat AGB for each combination of variable-screening algorithm and regression algorithm. The diagonal line represents the 1:1 relationship. (a) CARS-GBDT; (b) CARS-MLR; (c) CARS-RFR; (d) IRIVs-GBDT; (e) IRIVs-MLR; (f) IRIVs-RFR; (g) RF-GBDT; (h) RF-MLR; (i) RF-RFR.

Figure 6. Scatterplots of measured versus predicted wheat LNC for each combination of variable-screening algorithm and regression algorithm. The diagonal line represents the 1:1 relationship. (a) CARS-GBDT; (b) CARS-MLR; (c) CARS-RFR; (d) IRIVs-GBDT; (e) IRIVs-MLR; (f) IRIVs-RFR; (g) RF-GBDT; (h) RF-MLR; (i) RF-RFR.

Figure 7. Estimation of model accuracy of AGB (a) and LNC (b) by different combination algorithms.

Table 1. The flight details during the sampling periods in 2020 and 2021. LNC and AGB represent the leaf nitrogen content and aboveground biomass.

Experiment	Dates	Stage	Field Measurements
Exp.1	17 March 2020	Jointing	LNC, AGB
	02 April 2020	Booting	LNC, AGB
	25 April 2020	Flowering	LNC, AGB
Exp.2	24 March 2021	Jointing	LNC, AGB
	09 April 2021	Booting	LNC, AGB
	29 April 2021	Flowering	LNC, AGB

Table 2. Color indices and calculation method.

Color Indices	Calculation	Reference
Visible Atmospherically Resistant Index	$\mathrm{VARI} = (g - r) / (g + r - b)$	[37]
Excess Red Index	$\mathrm{ExR} = 1.4 \times r - g$	[38]
Excess Green Index	$\mathrm{ExG} = 2 \times g - r - b$	[38]
Green Leaf Index	$\mathrm{GLI} = (2 \times g - r - b) / (2 \times g + r + b)$	[39]
Excess Green Minus Excess Red	$\mathrm{ExGR} = 3 \times g - 2.4 \times r - b$	[40]
Normalized Green-Red Difference Index	$\mathrm{NGRDI} = (g - r) / (g + r)$	[41]
Normalized Green-Blue Difference Index	$\mathrm{NGBDI} = (g - b) / (g + b)$	[41]
Modified Green-Red Vegetation Index	$\mathrm{MGRVI} = (g^{2} - r^{2}) / (g^{2} + r^{2})$	[16]
Red-Green-Blue Vegetation Index	$\mathrm{RGBVI} = (g^{2} - b \times r) / (g^{2} + b \times r)$	[16]
Red-Green Ratio Index	$\mathrm{RGRI} = r / g$	[42]

Note: r, g, b calculation method is shown in the Equations (3)–(5).
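The formulas in Table 2 can be sketched in a few lines of Python. This is only an illustrative sketch: Equations (3)–(5) are not reproduced in this part of the article, so the normalization below (each channel divided by the sum R + G + B) is an assumption, and the function names `normalized_rgb` and `color_indices` are hypothetical.

```python
def normalized_rgb(R, G, B):
    """Normalize raw channel values so that r + g + b = 1.

    Assumption: this is the usual chromatic-coordinate form; the article's
    own Equations (3)-(5) should be used if they differ.
    """
    total = R + G + B
    if total == 0:
        raise ValueError("R + G + B must be non-zero")
    return R / total, G / total, B / total


def color_indices(R, G, B):
    """Return the ten color indices of Table 2 for one pixel or plot mean.

    No guard is included for degenerate inputs that make a denominator zero.
    """
    r, g, b = normalized_rgb(R, G, B)
    exg = 2 * g - r - b          # Excess Green
    exr = 1.4 * r - g            # Excess Red
    return {
        "VARI": (g - r) / (g + r - b),
        "ExR": exr,
        "ExG": exg,
        "GLI": (2 * g - r - b) / (2 * g + r + b),
        "ExGR": exg - exr,       # equals 3g - 2.4r - b
        "NGRDI": (g - r) / (g + r),
        "NGBDI": (g - b) / (g + b),
        "MGRVI": (g**2 - r**2) / (g**2 + r**2),
        "RGBVI": (g**2 - b * r) / (g**2 + b * r),
        "RGRI": r / g,
    }


if __name__ == "__main__":
    # Hypothetical plot-mean digital numbers for a green canopy.
    for name, value in color_indices(60, 120, 20).items():
        print(f"{name}: {value:.4f}")
```

Computing ExGR as ExG minus ExR makes the relationship between the two rows of the table explicit; it is algebraically the same as the closed form listed in Table 2.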

Table 3. Vegetation index and calculation method.

Vegetation Indices	Calculation	Reference
Ratio Vegetation Index	$\mathrm{RVI} = NIR / R$	[43]
Normalized Difference Vegetation Index	$\mathrm{NDVI} = (NIR - R) / (NIR + R)$	[43]
Green Normalized Difference Vegetation Index	$\mathrm{GNDVI} = (NIR - G) / (NIR + G)$	[44]
Difference Vegetation Index	$\mathrm{DVI} = NIR - R$	[45]
Re-normalized Difference Vegetation Index	$\mathrm{RDVI} = (NIR - R) / \sqrt{NIR + R}$	[46]
Water Index	$\mathrm{WI} = ρ_{970} / ρ_{900}$	[47]
Water Band Index	$\mathrm{WBI} = ρ_{950} / ρ_{900}$	[48]
Modified Red-Edge Simple Ratio Index	$\mathrm{mRESR} = (ρ_{750} - ρ_{445}) / (ρ_{705} - ρ_{445})$	[49]
Modified Red-Edge NDVI	$\mathrm{mRENDVI} = (ρ_{750} - ρ_{705}) / (ρ_{750} + ρ_{705} - 2 \times ρ_{445})$	[50]
Normalized Pigment Chlorophyll Index	$\mathrm{NPCI} = (R - B) / (R + B)$	[48]
Red Edge NDVI	$\mathrm{RENDVI} = (ρ_{750} - ρ_{710}) / (ρ_{710} - ρ_{680})$	[51]
Relative Index	$\mathrm{RI} = ρ_{735} / ρ_{720}$	[52]
Vogelmann Red Edge Index	$\mathrm{VREI} = ρ_{740} / ρ_{720}$	[53]
Atmospherically Resistant Vegetation Index	$\mathrm{ARVI} = (NIR - 2 \times R + B) / (NIR + 2 \times R - B)$	[54]
Plant Senescence Reflectance Index	$\mathrm{PSRI} = (ρ_{650} - ρ_{500}) / ρ_{750}$	[55]
Modified Chlorophyll Absorption in Reflectance Index	$\mathrm{MCARI} = [(RE - R) - 0.2 \times (RE - G)] \times (RE / R)$	[56]
Transformed Chlorophyll Absorption Ratio	$\mathrm{TCARI} = 3 \times [(RE - R) - 0.2 \times (RE - G) \times (RE / R)]$	[57]
Optimized Soil Adjusted Vegetation Index	$\mathrm{OSAVI} = 1.16 \times (NIR - R) / (NIR + R + 0.16)$	[58]

Note: In the table, R, G, B, NIR, and RE are the spectral reflectance of the red band (650 nm), green band (560 nm), blue band (450 nm), near-infrared band (840 nm), and red-edge band (730 nm), respectively. $ρ_{i}$ indicates the spectral reflectance of band i.
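As a rough illustration of Table 3, the indices can be computed from a canopy reflectance spectrum stored as a mapping from wavelength to reflectance. The sketch below assumes that the subscript of ρ is a wavelength in nm and that the spectrum contains exactly the bands the table refers to; the function name `vegetation_indices` and the example spectrum are hypothetical and are not data from this study.

```python
import math


def vegetation_indices(rho):
    """Compute the Table 3 indices from rho: {wavelength_nm: reflectance}."""
    # Broad-band shorthand as defined in the table note.
    R, G, B, NIR, RE = rho[650], rho[560], rho[450], rho[840], rho[730]
    return {
        "RVI": NIR / R,
        "NDVI": (NIR - R) / (NIR + R),
        "GNDVI": (NIR - G) / (NIR + G),
        "DVI": NIR - R,
        "RDVI": (NIR - R) / math.sqrt(NIR + R),
        "WI": rho[970] / rho[900],
        "WBI": rho[950] / rho[900],
        "mRESR": (rho[750] - rho[445]) / (rho[705] - rho[445]),
        "mRENDVI": (rho[750] - rho[705])
                   / (rho[750] + rho[705] - 2 * rho[445]),
        "NPCI": (R - B) / (R + B),
        "RENDVI": (rho[750] - rho[710]) / (rho[710] - rho[680]),
        "RI": rho[735] / rho[720],
        "VREI": rho[740] / rho[720],
        "ARVI": (NIR - 2 * R + B) / (NIR + 2 * R - B),
        "PSRI": (rho[650] - rho[500]) / rho[750],
        "MCARI": ((RE - R) - 0.2 * (RE - G)) * (RE / R),
        "TCARI": 3 * ((RE - R) - 0.2 * (RE - G) * (RE / R)),
        "OSAVI": 1.16 * (NIR - R) / (NIR + R + 0.16),
    }


# Hypothetical reflectance values for a healthy green canopy (illustration only).
example_spectrum = {
    445: 0.03, 450: 0.03, 500: 0.04, 560: 0.10, 650: 0.05, 680: 0.04,
    705: 0.15, 710: 0.18, 720: 0.25, 730: 0.32, 735: 0.35, 740: 0.38,
    750: 0.42, 840: 0.50, 900: 0.50, 950: 0.48, 970: 0.47,
}

if __name__ == "__main__":
    for name, value in vegetation_indices(example_spectrum).items():
        print(f"{name}: {value:.4f}")
```

With a real hyperspectral cube, each `rho[...]` lookup would be replaced by the band nearest to the stated wavelength, which is a choice the table itself does not specify.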

Table 4. Descriptive statistics of AGB and LNC from calibration and validation datasets across wheat growth stages.

Stages	AGB (t/hm²)					LNC (mg/g)
Stages	Number	Range	Mean	SD	CV (%)	Number	Range	Mean	SD	CV (%)
	Calibration
Jointing stage	72	1.98–6.47	4.12	0.89	21.53	72	19.24–49.74	37.12	10.20	27.47
Booting stage	72	5.25–12.37	8.68	1.67	19.18	72	16.94–49.40	35.51	9.83	27.68
Flowering stage	72	8.22–16.59	12.06	2.05	16.98	72	15.01–42.74	30.30	8.45	27.89
	Validation
Jointing stage	24	2.06–6.31	4.05	1.29	31.83	24	19.35–47.74	36.70	10.14	27.63
Booting stage	24	5.48–11.31	8.59	1.77	20.59	24	18.68–46.51	35.52	9.62	27.10
Flowering stage	24	8.42–16.55	11.97	2.52	21.05	24	15.92–42.70	30.60	8.68	28.37

Note: SD means standard deviation. CV means coefficient of variation, the ratio of the standard deviation to the mean, expressed as a percentage.
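The CV column of Table 4 can be reproduced approximately from the reported SD and mean. The short sketch below shows the calculation; whether the article used the sample (n − 1) or population standard deviation is not stated, so the use of `statistics.stdev` is an assumption, and both function names are hypothetical.

```python
import statistics


def coefficient_of_variation(values):
    """CV (%) of raw measurements: 100 * SD / mean.

    Assumption: sample standard deviation (n - 1 in the denominator).
    """
    return 100.0 * statistics.stdev(values) / statistics.mean(values)


def cv_from_summary(sd, mean):
    """CV (%) from already-summarized SD and mean, as listed in Table 4."""
    return 100.0 * sd / mean


if __name__ == "__main__":
    # Jointing-stage AGB, calibration set: SD = 0.89, mean = 4.12 (Table 4).
    print(f"{cv_from_summary(0.89, 4.12):.2f} %")
```

For the jointing-stage calibration AGB, the rounded SD and mean give a CV of about 21.6%, close to the 21.53% in the table; the small gap presumably comes from the table's CV being computed before rounding.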

Table 5. The results of the variables selected in each model.

Variable	CARS-AGB	IRIVs-AGB	RF-AGB	CARS-LNC	IRIVs-LNC	RF-LNC
R			√
G		√	√
B			√
r
g		√
b	√
VARI
ExR
ExG
GLI
ExGR				√
NGRDI
NGBDI	√
MGRVI
RGBVI
RGRI				√
RVI
NDVI		√
GNDVI	√	√	√	√		√
DVI	√	√		√	√
RDVI				√		√
WI		√	√
WBI			√	√	√	√
mRESR		√			√
mRENDVI				√	√	√
NPCI				√		√
RENDVI		√	√			√
RI	√		√	√	√
VREI				√
ARVI		√				√
PSRI		√	√		√	√
MCARI		√		√	√
TCARI		√	√	√	√	√
OSAVI	√					√

Note: √ means that this variable is selected in the model.

Table 6. Comparison of performance of models developed based on different algorithms for estimation of agronomic traits (AGB and LNC).

Agronomic Traits	Algorithms	Calibration			Validation
Agronomic Traits	Algorithms	R²	RMSE (t/hm² for AGB; mg/g for LNC)	NRMSE (%)	R²	RMSE (t/hm² for AGB; mg/g for LNC)	NRMSE (%)
AGB	CARS-GBDT	0.61	2.26	27.22	0.52	1.89	23.03
	CARS-MLR	0.43	2.74	33.03	0.48	1.96	23.91
	CARS-RFR	0.65	2.16	26.01	0.58	1.76	21.47
	IRIVs-GBDT	0.66	2.10	25.32	0.49	1.95	23.74
	IRIVs-MLR	0.51	2.54	30.66	0.41	2.09	25.50
	IRIVs-RFR	0.75	1.80	21.75	0.67	1.56	19.06
	RF-GBDT	0.70	1.98	23.87	0.55	1.84	22.40
	RF-MLR	0.53	2.50	30.14	0.44	2.05	24.99
	RF-RFR	0.79	1.67	20.19	0.68	1.54	18.74
LNC	CARS-GBDT	0.94	2.50	7.28	0.81	3.51	10.25
	CARS-MLR	0.91	2.90	8.44	0.78	3.74	10.91
	CARS-RFR	0.95	2.32	6.75	0.87	2.88	8.42
	IRIVs-GBDT	0.93	2.65	7.73	0.80	3.57	10.40
	IRIVs-MLR	0.88	3.37	9.83	0.75	4.01	11.68
	IRIVs-RFR	0.92	2.87	8.37	0.83	3.34	9.75
	RF-GBDT	0.87	3.61	10.51	0.76	3.94	11.49
	RF-MLR	0.82	4.19	12.23	0.69	4.45	12.98
	RF-RFR	0.89	3.31	9.64	0.78	3.77	10.99
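The three accuracy metrics in Table 6 can be sketched as follows. The metric equations themselves lie outside this part of the article, so the definitions here are assumptions: R² as one minus the ratio of residual to total sum of squares, and NRMSE as RMSE divided by the mean of the measured values. The NRMSE assumption appears consistent with the tables (for example, a calibration RMSE of 2.26 t/hm² divided by a pooled mean AGB of about 8.29 t/hm² from Table 4 gives roughly 27%). The function name `regression_metrics` is hypothetical.

```python
import math


def regression_metrics(measured, predicted):
    """Return R2, RMSE, and NRMSE (%) for paired measured/predicted values."""
    n = len(measured)
    if n == 0 or n != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")

    mean_measured = sum(measured) / n
    ss_res = sum((y - p) ** 2 for y, p in zip(measured, predicted))
    ss_tot = sum((y - mean_measured) ** 2 for y in measured)

    rmse = math.sqrt(ss_res / n)
    return {
        "R2": 1.0 - ss_res / ss_tot,            # assumed definition
        "RMSE": rmse,                           # same unit as the trait
        "NRMSE": 100.0 * rmse / mean_measured,  # assumed: normalized by mean
    }


if __name__ == "__main__":
    # Hypothetical AGB values in t/hm2, for illustration only.
    measured = [2.0, 4.0, 6.0, 8.0]
    predicted = [3.0, 3.0, 7.0, 7.0]
    print(regression_metrics(measured, predicted))
```

If the article instead reports R² as the squared Pearson correlation between measured and predicted values, the two definitions would coincide only for an unbiased linear fit, so reported values may differ slightly from this sketch.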

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
