*Article* **Winter Wheat Yield Prediction Using an LSTM Model from MODIS LAI Products**

**Jian Wang 1, Haiping Si 1, Zhao Gao <sup>2</sup> and Lei Shi 1,\***


**Abstract:** Yield estimation using remote sensing data is a research priority in modern agriculture. The rapid and accurate estimation of winter wheat yields over large areas is an important prerequisite for food security policy formulation and implementation. In most county-level yield estimation processes, multiple input data are used for yield prediction as much as possible, however, in some regions, data are more difficult to obtain, so we used the single-leaf area index (LAI) as input data for the model for yield prediction. In this study, the effects of different time steps as well as the LAI time series on the estimation results were analyzed for the properties of long short-term memory (LSTM), and multiple machine learning methods were compared with yield estimation models constructed by the LSTM networks. The results show that the accuracy of the yield estimation results using LSTM did not show an increasing trend with the increasing step size and data volume, while the yield estimation results of the LSTM were generally better than those of conventional machine learning methods, with the best R2 and RMSE results of 0.87 and 522.3 kg/ha, respectively, in the comparison between predicted and actual yields. Although the use of LAI as a single input factor may cause yield uncertainty in some extreme years, it is a reliable and promising method for improving the yield estimation, which has important implications for crop yield forecasting, agricultural disaster monitoring, food trade policy, and food security early warning.

**Keywords:** winter wheat; yield estimation; LSTM; LAI; deep learning

#### **1. Introduction**

Wheat is an important crop in China, and its yield is directly related to the development of the national economy. Timely, accurate, and wide-ranging monitoring and forecasting of wheat yields is of great practical significance for national economic development and food policy formulation [1,2]. Due to its large coverage area and short detection period, satellite remote sensing provides a new technical tool for large-scale crop estimation and is rapidly becoming the most widely used technology in crop estimation.

At present, the methods of crop yield estimation using remote sensing technology can be broadly classified into three categories according to the characteristics of the models used: (1) the empirical modeling method; (2) the mechanistic modeling method; and (3) the semiempirical (semi-mechanistic) modeling method. The empirical model directly uses spectral vegetation indices or canopy remote sensing inversion parameters to establish relationships with crop yields, which are characterized by their simplicity and ease, involving fewer crop yield formation mechanisms, and the relationships are generally established using conventional machine learning methods, such as support vector machine and random forest, with NDVI or leaf area index (LAI) as input parameters [3–8]. Such relationships are usually localized and difficult to generalize to other agricultural areas. Semi-empirical models and semi-mechanical models are also known as parametric models, among which the light energy utilization efficiency model is the most widely used [6,9–11], but some

**Citation:** Wang, J.; Si, H.; Gao, Z.; Shi, L. Winter Wheat Yield Prediction Using an LSTM Model from MODIS LAI Products. *Agriculture* **2022**, *12*, 1707. https://doi.org/10.3390/ agriculture12101707

Academic Editor: Koki Homma

Received: 25 August 2022 Accepted: 12 October 2022 Published: 17 October 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

237

parameters are difficult to quantitatively simulate. Mechanical models fully consider the mechanism of crop yield formation, but their solution process is complex and requires more input parameters, and some necessary parameters in the operation are difficult to obtain at a regional scale; meanwhile, most of these models estimate crop yields at the field scale [12–15]. Although the models work well for archiving yields, the accuracy may be reduced when scaled to the national level. Many a priori parameters are required in regional estimation. Due to the heterogeneity of the ground surface, the accuracy of the ground parameters is generally low, especially in the case of small farmland in China, resulting in low regional accuracy. Moreover, the computational process is complicated and requires many parameters, which can be limited in practical use.

In recent years, deep learning has been successfully applied to several fields, such as image recognition and language translation [11,16–22]. Compared with traditional machine learning methods, deep learning techniques often achieve better performance. CNN and recurrent neural network (RNN) are more widely used models in neural networks and have also been applied to crop yield estimation and prediction [19,23–29]. LSTM is a special kind of RNN [30,31], due to its recursive structure and gating mechanism that regulates the entry and exit of information into and out of cells, as well as its processing of sequential data. The LSTM has feedback connections and can handle the input sequences of arbitrary length and is often preferred in the classification, processing, and prediction based on time series data. Several studies used LSTM for crop yield prediction with impressive results. LSTM not only captures trends in the data but also describes the dependencies of the time series data. Tian et al. built an LSTM model by integrating the meteorological data and two remote sensing indices (vegetation temperature condition index (VTCI) and LAI) to estimate wheat yield in Guanzhong Plain [32]. Jeong et al. used water-related indices and the maximum temperature as inputs for rice yield prediction using an LSTM model, which showed reliable early prediction accuracy [16]. Sun et al. used a CNN-LSTM model to predict the end-of-season and in-season yields of soybean in the county. The input data for the model included meteorological data and MODIS surface temperature (LST) [24]. The LSTM model was shown to have high prediction accuracy for crop yield estimation, but all of the above methods for estimating crop yield use multiple data as input parameters, and it is difficult to obtain non-remote sensing data in some regions [33–35]. There are a large number of mature remote sensing products for LAI data, so this study mainly considered using LAI remote sensing products as single model input data and what kind of accuracy could be achieved when using deep learning algorithms for yield estimation.

In this study, LSTM was used to estimate the winter wheat yield at the county scale based on the relationship between time series LAI products and winter wheat yield. Considering the simplicity of obtaining LAI data, the model input parameters were only the leaf area index to verify the accuracy that could be achieved under the influence of a single factor. Moreover, the time step of the input remote sensing data of the model was considered so as to determine the accuracy in different time data.

#### **2. Materials and Methods**

#### *2.1. Study Area*

Henan Province is located in eastern central China (31◦23 ~36◦22 N, 110◦21 ~116◦39 E). Figure 1 is a schematic diagram of the location of the study area. Henan Province is located between the warm temperate zone and the subtropical zone. The terrain of Henan Province is high in the west and low in the east, with mountains above 1000 m above sea level in the west and plains below 100 m in the east. Mountains and hills account for 44.3% of the total area, and plains account for 55.7%. The total wheat output of Henan Province ranks first in China, accounting for more than 28% of the country's total wheat output. The sowing time of wheat varies from north to south by nearly two weeks, and there is a large gap in yield in different regions. The topography of Henan Province is complex and diverse, the topography is low in the east and high in the west, with significant differences; the surface morphology is complex and diverse. Due to the influence of landforms and

monsoons, Henan Province has a wide variety of soil types and large differences in climate resources, resulting in significant differences in crop yields in different regions. According to the production conditions of producing areas and the climatic characteristics of the wheat growth period, the wheat-growing areas in Henan Province can be divided into five regions.

**Figure 1.** The geographic location of the study area and classification results (The red box on the left is the location of Henan Province. (**a**)—false color composite with the MODIS09 data; and (**b**)—land classification, green represents the winter wheat region).


5. The wheat area in the central and eastern part of Henan Province includes the irrigated land in the middle- and high-yield wheat areas in the north-central part of Zhumadian, Luohe, Zhoukou, Shangqiu, and Pingdingshan Mountain.

The topography and climate within each of the above production areas are relatively consistent; so, we developed a winter wheat yield model for each production area. The actual yield data for winter wheat in this study were provided by the Henan Provincial Bureau of Statistics.

#### *2.2. MODIS LAI*

LAI is defined as the area of unilateral green leaves per unit of ground area in a broadleaf canopy and half of the total needle surface area per unit ground area in a coniferous canopy. The LAI product selected for this study was MCD15A2H. MCD15A2H is an 8-day composite product with a total of 46 scenes per year and a spatial resolution of 500 m. The inversion algorithm for the Moderate Resolution Imaging Spectroradiometer (MODIS) LAI product was a look-up table constructed based on a three-dimensional radiative transfer model. When the main algorithm failed, a backup algorithm using an empirical relationship between NDVI and the canopy LAI was triggered to estimate the LAI for each pixel, and the look-up table was used to compare whether the observed and simulated canopy top BRFs were within a given biologically relevant range. All canopy/soil patterns and corresponding LAI values that differed between the modeled and observed BRFs within a given level of uncertainty were considered acceptable solutions. The product used for this study was version 6, and the final result was the true LAI.

All satellite data are archived in HDF-EOS format, and the MODIS Reprojection Tool (MRT) software provided by NASA enables the user to read the HDF-EOS format. This software supports the performing geo-transformations to different coordinate systems or cartographic projections and writing the output to other file formats (GeoTIFF). All data are initially projected onto an integer sine wave (ISIN) mapping grid. These data were corrected to UTM coordinates using MRT and resampled using the nearest neighbor algorithm. Non-wheat fields were masked using a land cover classification, and then the corresponding areas were cut out of the remotely sensed imagery using SHP data based on the extent of each city. The MODIS LAI products include the quality control (QC) information designed to help users make the best use of these data. Each QC layer has a large amount of quality information associated with each pixel, whether the pixel is labeled as cloudy, clear, or in cloud shadow. In a subsequent study, MODIS LAI selected highquality pixels derived using the master algorithm under cloud-free conditions. Figure 2 shows the time series results of the MODIS LAI for the entire winter wheat growing region in Henan, and the maximum value of the LAI was approximately 2. Considering that LAI is at a low level in the early and late stages of wheat growth, and that these growth stages do not have a particularly strong influence on yield formation, we used LAI from flowering to maturity each year as input data.

**Figure 2.** The mean curve of MODIS LAI in Henan Province from 2003 to 2016. The red line is the error bars plotted using standard deviation; blue dots represent the mean; between the two green dotted lines is the LAI for one year.

#### *2.3. Mapping of Wheat Distribution*

This study focused on winter wheat, which first required the extraction of winter wheat distribution areas from remote sensing images. We used the product produced by Dong [36]. The product was produced by first synthesizing monthly NDVI maxima and using FROM-GLC as a priori knowledge to obtain crop distribution information; then, we used the standard seasonal growth curve of winter wheat combined with the time-weighted dynamic time warping (TWDTW) to determine the area of winter wheat. The accuracy of the final product was higher than 89.30% and 90.59% for producers and users, respectively. To maintain consistency with the resolution of MODIS data, the classification results were finally resampled to 500 m using the mode resampling method. Mode resampling selects the value with the highest frequency of occurrence among all sampling points, and the results maintain the real state of the ground surface to some extent. The classification results are shown in Figure 1b.

#### *2.4. LSTM*

The recurrent neural network (RNN) is a type of neural network with short-term memory capabilities. In a cyclic neural network, a neuron cannot only receive information from other neurons but also its own information, forming a network structure with loops. Compared with feedforward neural networks, recurrent neural networks are more in line with the structure of biological neural networks. Recurrent neural networks have been widely used in tasks such as speech recognition, language modeling, and natural language generation. The parameter learning of recurrent neural networks can be learned by the backpropagation algorithm over time. The backpropagation algorithm with time transmits the error information step by step in the reverse order of time. When the input sequence is relatively long, there will be the problem of gradient explosion and disappearance. In order to solve this problem, people have made many improvements to the cyclic neural network. The most effective means of improvement is to introduce a gating mechanism, one of which is called a long short-term memory network (LSTM). LSTM is a variant of a cyclic neural network, which can effectively solve the problem of gradient explosion or the disappearance of a simple cyclic neural network.

The ingenuity of LSTM is that, by increasing the input threshold, the forgetting threshold, and output threshold, the weight of the self-loop is changed. As such, when the model parameters are fixed, the integration scale at different times can be dynamically changed, thereby avoiding the problem of gradient disappearance or gradient expansion.

The LSTM network introduces a gating mechanism to control the path of information transmission. The three "gates" are the input gate *it*, the forget gate *ft*, and the output gate *ot*. The functions of these three gates are as follows:

(1) The forgetting gate *ft* controls how much information needs to be forgotten to control the internal state *ct*−<sup>1</sup> at the last moment;

(2) The enter gate *it* controls the candidate state *<sup>c</sup><sup>t</sup>* at the current moment and how much information needs to be saved;

(3) The output gate *ot* controls how much information of the internal state *ct* at the current moment needs to be output to the external state *ht*.

When *ft* = 0 and *it* = 1, the memory unit clears the historical information and writes the candidate state vector *<sup>c</sup>t*. However, the memory unit *ct* is still related to the historical information at the previous moment. When *ft* = 1 and *it* = 0, the memory unit will copy the content of the previous moment without writing new information.

Figure 3 shows the cyclic unit structure of the LSTM network. The calculation process is: (1) first use the external state *ht*−<sup>1</sup> at the previous moment and the input *xt* at the current moment to calculate the three gates and the candidate state *<sup>c</sup>t*; (2) combine the forget gate *ft* and the input gate *it* to update the memory unit *ct*; and (3) combine the output gate *ot* to transfer the information of the internal state to the external state *ht*.

**Figure 3.** Cyclic cell structure of the LSTM network.

By means of LSTM cyclic units, the whole network can be built up with long-distance temporal dependencies. It can be succinctly described as

$$
\begin{bmatrix}
\overline{c}\_t \\
 o\_t \\
\dot{h}\_t \\
f\_t
\end{bmatrix} = \begin{bmatrix}
tanh \\
\sigma \\
\sigma \\
\sigma
\end{bmatrix} \left( w \begin{bmatrix}
\chi\_t \\
h\_{t-1}
\end{bmatrix} + b \right) \tag{1}
$$

$$c\_t = f\_t \bigodot c\_{t-1} + i\_t \bigodot \overline{c}\_t \tag{2}$$

$$h\_t = o\_t \widehat{\mathbb{C}\lambda} \tan \mathbf{h}(c\_t) \tag{3}$$

In the LSTM network, a memory unit c can capture a key piece of information at a certain moment and can store this key information for a certain time interval. The lifetime of information stored in memory unit c is longer than that of short-term memory h but much shorter than that of long-term memory. Figure 4 shows the workflow from LAI data processing to LSTM model building and yield estimation.

**Figure 4.** Overall structure of the LSTM model for wheat yield estimation.

#### *2.5. Model Evaluation Metrics*

We chose root mean square error (RMSE), coefficient of determination (R2), and mean absolute percentage error (MAPE), the three metrics in Equations (4)–(6). To evaluate the performance of the model, the difference between the predicted and actual statistical values of the model was calculated. A smaller RMSE indicates a better performance of the model, and a larger R2 indicates a higher regression accuracy of the model. The lower the value of MAPE and RMSE, the higher the accuracy of the obtained predictive model. MAPE measures the error in percentage and specifies the average percentage deviation between the forecast value and the actual implementation [37]. Usually, the fit of the model is perfect when the MAPE value is below 10% and when it is in the range from 10% to 20%, the model fit is good. In the range of 20–30%, the error level is acceptable and when it exceeds 30%, the model is a poor fit and should be rejected [38].

$$\text{RMSE} = \sqrt{\frac{1}{n} \sum\_{i=1}^{n} (y\_i - y\_i)^2} \tag{4}$$

$$\mathbf{R}^2 = 1 - \frac{\sum\_{i=1}^n (\mathcal{g}\_i - y\_i)^2}{\sum\_{i=1}^n (\mathcal{y}\_i - \mathcal{y}\_i)^2},\tag{5}$$

$$\text{MAPE} = \frac{100\%}{n} \sum\_{i=1}^{n} \left| \frac{\mathcal{Y}\_i - \mathcal{Y}\_i}{\mathcal{Y}\_i} \right| \,\tag{6}$$

where *yi* represents the actual yield, *y*ˆ*<sup>i</sup>* is the predicted yield, and *n* is the sample size.

#### **3. Results and Discussion**

#### *3.1. Yield Estimation under Different Time Steps and Input Combinations*

We used winter wheat LAI time series data from 2003 to 2015 as training samples and modeled them with LSTM to obtain the winter wheat yield prediction results for 2016 and compared the predicted yield with the statistical yield in 2016 to verify the prediction accuracy of the model. Moreover, considering the characteristics of the LSTM model and the requirements of the yield prediction task, the input data required for training were processed as subsequently described, and the winter wheat growth data from March to the end of May each year were selected so that the input data for one year had 12 LAI values containing data from the flowering stage to the maturity stage. The fitted data of the model changed at different input steps, which also had an impact on the accuracy of the prediction results. To obtain the input step with the highest accuracy, the input steps from 1 to 6 were compared. The overall RMSE for Henan Province under the six step-size scenarios is shown in Figure 5. The solid red line in the figure is the 1:1 line, and the estimation results were mostly evenly distributed on both sides with the solid line as the center, indicating that the model had a good prediction for the yield. However, when the yield exceeded 7000 kg/ha, the estimation accuracy of the model tended to decrease, and the yield prediction tended to be underestimated, which is consistent with the systematic trend of underestimation at high yields in previous studies, mainly due to the relatively low proportion of data from high production areas in the sample. We first constructed LSTM models for five winter wheat production areas and finally obtained the winter wheat yield estimation results for the whole of Henan Province. A–E in Table 1 show the RMSE, R<sup>2</sup> and MAPE of the five grain-producing regions at different step sizes, and F shows the statistical results for the whole Henan Province. Figure 6 shows the RMSE histograms of the LSTM model for the five regions and the whole of Henan Province under different step lengths. When the step size was short, the predicted yields of the LSTM model for different production areas had great instability compared with the statistical yields in the field, but when the step size increased to 3 and 4, the accuracy started to improve and stabilized. However, the accuracy decreased again when the step size was 6. The LSTM model achieved optimal values of R2 and RMSE at a step size of 4, with results of 88% and 532.16 kg/ha, respectively, while the value of MAPE was 4.44%, which was at a very high level of fit. Although the problem of

LSTM was effectively solved in terms of gradient disappearance or explosion compared with the general RNN networks, the winter wheat yields were not closely related to yields from many years ago, and to some extent, were only strongly correlated with historical yields from the last three or four years, and there may be a loss of accuracy if the current results are fitted with data from a longer period of time ago.

**Figure 5.** Comparison of yield estimation results for LSTM models based on different time steps: (**A**) one time step; (**B**) two time steps; (**C**) three time steps; (**D**) four time steps; (**E**) five time steps; and (**F**) six time steps. The probability distributions of the statistical and estimated yields are plotted on the right and top of the Y axis, respectively.

**Table 1.** The accuracy evaluation of the LSTM prediction model was compared between different time steps when 12 LAI data were input.


The respective highest accuracy estimation results in different regions differed significantly in time steps. For the whole of Henan Province, the overall accuracy may not be optimal if the same step was used for yield estimation. The lowest value of RMSE can be seen in Figure 6 in the southwestern production area, with an RMSE of approximately 400, and the highest in time steps was in southwestern Henan, with approximately 700. There was a correlation between the winter wheat yield and time series LAI data, but the interannual correlation was to some extent not a more accurate result by modeling with more data, and there are many factors affecting the wheat yield, including variation in variety, temperature, and topography, which can all cause yield fluctuations, although all these factors can affect the LAI time series to some extent. However, when using time series LAI data alone as input parameters, it is not better to use more data, and it is not better to set a higher time step; these should be quantitatively and individually tested for different regions and not generalized.

**Figure 6.** The model performance (predicted RMSE) using different time steps for the whole growing season. (**A**) one time step; (**B**) two time steps; (**C**) three time steps; (**D**) four time steps; (**E**) five time steps; and (**F**) six time steps.

#### *3.2. Yield Estimation with Different Input Time Series Data*

In the process of winter wheat yield estimation, it is always desirable to predict the yield as early as possible. Therefore, we considered shortening the time range of the LAI data used and the used data from the plucking stage to the filling stage for modeling so that their yield could be predicted 16 days before maturity. The LAI data used in this study were only six per year, which halved the amount of data compared to previous studies, making it easier to obtain the data, especially the high spatial resolution satellite data. The results of the comparison between the predicted yield and the actual statistical yield are shown in Figure 7, from which there was no significant decrease in the prediction accuracy for all of Henan compared to the 12 data, and the accuracy of individual time steps increased. The best time step for the overall accuracy occurred at 2, where the R2 and RMSE were 87% and 522.32 kg/ha, respectively, while MAPE was 5.67%. A–E in Table 2 shows the RMSE, R2 and MAPE of the five grain-producing regions at different step sizes, and F shows the statistical results for the whole Henan Province; Figure 8 shows the histogram of RMSE. However, it can also be seen that the value of RMSE gradually increased with increasing step size. This may be due to the fact that LAI data were strongly correlated only in the last two years, and the larger the time difference, the lower the correlation between LAI and yield; moreover, the interannual LAI was not strongly correlated with early and late yield formation, so in the process of yield estimation using time series LAI, a higher accuracy was obtained by using data from the nodulation to filling stage. It has been shown that the accumulation of dry matter in winter wheat is mainly concentrated at the nodulation and gestation stages, and the LAI in this period was closely correlated with the yield formation of winter wheat, which is also more consistent with the results of this study. It should also be noted that the overall RMSE accuracy was the best except for the case of step size 2. The RMSE of the remaining steps was significantly lower compared to the model with 12 data; therefore, using as many growing period data as possible would also make the model more robust when using time-series LAI for yield estimation.

**Figure 7.** Comparison of the yield estimation results for LSTM models based on different time steps (**A**) one time step; (**B**) two time steps; (**C**) three time steps; (**D**) four time steps; (**E**) five time steps; and (**F**) six time steps. The probability distributions of the statistical and estimated yields are plotted on the right and top of the Y axis, respectively.

**Table 2.** The accuracy evaluation of the LSTM prediction model was compared between different time steps when 6 LAI data were input.


**Figure 8.** The model performance (predicted RMSE) using different time steps for only one period of the growing season. (**A**) one time step; (**B**) two time steps; (**C**) three time steps; (**D**) four time steps; (**E**) five time steps; and (**F**) six time steps.

#### *3.3. Performance Comparison with Machine Learning Methods*

Here, four machine learning methods (random forest, support vector regression, partial least squares regression, and XGBoost) were used to construct county-level wheat yield

models for each agroecological zone. Random forest (RF) is a supervised machine learning algorithm based on integration learning [39]. Different subsets are randomly drawn from the provided data and used to build several different decision trees and integrate the results of one decision tree according to Bagging's rules. Support vector regression (SVR) is a regression algorithm that is a variant of SVM in regression analysis [40]. SVR also considers maximization intervals but considers points within the decision boundary so that as many sample points as possible lie within the interval. The partial least squares regression (PLSR) [41] algorithm is a regression modeling method for multiple dependent variables Y on multiple independent variables X. The algorithm considers extracting as many principal components as possible from Y and X in building the regression and maximizing the correlation between the principal components extracted from X and Y, respectively. XGBoost [42] is a scalable machine learning system that adds to the objective function of each iteration regular term to further reduce the risk of overfitting, XGBoost is an all-in-one machine learning algorithm.

The yield prediction of winter wheat in Henan Province was constructed using the four methods mentioned above. The prediction model first used winter wheat LAI time series data from 2003 to 2015, followed by yield prediction for 2016, and the R2 and RMSE of the prediction results are shown in Table 3. For the whole of Henan Province, the best performance among the four methods was the SVR with R2, RMSE and MAPE of 0.76, 725.8 kg/ha and 6.33%, respectively, and the worst was PLSR with R2, RMSE and MAPE of 0.7 and 809.1 kg/ha and 7.74%, respectively. Compared with these machine learning methods, the prediction results of the LSTM had better accuracy and performance both for individual wheat growing areas and for the whole of Henan province.

**Table 3.** Accuracy evaluations comparison among different methods.


The prediction accuracy of the four machine learning methods was the lowest in the southwest Henan region, which is a mountainous and hilly area with complex topography, a small winter wheat growing area, and large yield variation, and there may be a lack of yield accuracy in this region using conventional methods to construct the model. Compared with these, using LSTM model for winter wheat yield prediction had a superior performance. Compared to algorithms such as SVR and RF, the accuracy of estimation was not sufficient, because the ability to analyze complex nonlinear relationships between long time series variables is not as good as LSTM, resulting in poor model performance. These machine learning methods do not consider the time correlation between winter wheat yields in the modeling process, and the estimation for each year's yield is conducted independently, while LSTM takes into account the time series correlation of yields and also can better handle the nonlinear relationship, so it has higher accuracy compared to machine learning. Overall, a better performance capability can be obtained using LSTM models for forecasting time series data.

#### **4. Conclusions**

In this study, considering the complexity of data collection, we used LAI as a single input variable and five models, including four machine learning models (RF, SVR, PLSR, and XGBOOST) and one deep learning model (LSTM) to predict the winter wheat yield in Henan Province in 2016. In general, the LSTM model had superior performance compared with the machine learning models. Moreover, considering the characteristics of the LSTM, the time step of the modeled data as well as the growth period data were analyzed, and the

time step needs to be analyzed for different growth regions, while using only the necessary growth period data can also obtain a high prediction accuracy; however, for the robustness of the model, the more growth period data are used accordingly, the better. To date, winter wheat yield prediction based on remote sensing images has been carried out at the county level. However, the determination of crop yield remains a challenge because the variability and uncertainty within the region is unknown. The results of our study on winter wheat yield prediction at a regional scale using publicly available data, using LAI as an input variable for determining crop yield, can potentially be applied to crop yield estimation in regions with sparse observational data and worldwide.

**Author Contributions:** Conceptualization, J.W.; resources, Z.G.; writing—original draft preparation, J.W.; writing—review and editing, J.W., L.S. and H.S. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the National Natural Science Foundation of China (NO.42101362, 31501225); the Natural Science Foundation of Henan Province of China (NO.222300420463).

**Institutional Review Board Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


**Shanjun Luo 1, Xueqin Jiang 2,\*, Weihua Jiao 3, Kaili Yang 1, Yuanjin Li <sup>1</sup> and Shenghui Fang 1,\***

<sup>1</sup> School of Remote Sensing and Information Engineering, Wuhan University, Wuhan 430079, China


**Abstract:** A precise forecast of rice yields at the plot scale is essential for both food security and precision agriculture. In this work, we developed a novel technique to integrate UAV-based vegetation indices (VIs) with brightness, greenness, and moisture information obtained via tasseled cap transformation (TCT) to improve the precision of rice-yield estimates and eliminate saturation. Eight nitrogen gradients of rice were cultivated to acquire measurements on the ground, as well as six-band UAV images during the booting and heading periods. Several plot-level VIs were then computed based on the canopy reflectance derived from the UAV images. Meanwhile, the TCT-based retrieval of the plot brightness (B), greenness (G), and a third component (T) indicating the state of the rice growing and environmental information, was performed. The findings indicate that ground measurements are solely applicable to estimating rice yields at the booting stage. Furthermore, the VIs in conjunction with the TCT parameters exhibited a greater ability to predict the rice yields than the VIs alone. The final simulation models showed the highest accuracy at the booting stage, but with varying degrees of saturation. The yield-prediction models at the heading stage satisfied the requirement of high precision, without any obvious saturation phenomenon. The product of the VIs and the difference between the T and G (T − G) and the quotient of the T and B (T/B) was the optimum parameter for predicting the rice yield at the heading stage, with an estimation error below 7%. This study offers a guide and reference for rice-yield estimation and precision agriculture.

**Keywords:** yield estimation; rice; unmanned aerial vehicle (UAV); tasseled cap transformation; precision agriculture

#### **1. Introduction**

As the largest grain crop in the world and a staple food for over half of the global population, the research on rice is of crucial importance for agricultural systems and food production [1]. Rice-yield data are vital reference indicators for species selection and breeding, determined by the combination of genes and the growth environment. The accurate prediction of rice yields, and especially at the regional level, is of great relevance to guaranteeing food security and sustainable agricultural development, and it is concerned with the elaboration of major policies for national livelihoods [2].

The conventional methods for crop-yield estimates include field sampling [3] and the crop-growth model [4]. The field survey is a devastating assessment method. Although the accuracy of the results can be maintained through a comprehensive investigation, it is undoubtedly a laborious and lengthy task [5]. Crop-growth models incorporate multiple data sources and approaches, which greatly compound their complexity due to the many model input parameters [6]. The remote estimation of yields is a technology that can be used to develop a connection between crop spectra and yield data. Remote sensing (RS) provides a convenient way to efficiently acquire spectral data of vegetation canopies in

**Citation:** Luo, S.; Jiang, X.; Jiao, W.; Yang, K.; Li, Y.; Fang, S. Remotely Sensed Prediction of Rice Yield at Different Growth Durations Using UAV Multispectral Imagery. *Agriculture* **2022**, *12*, 1447. https:// doi.org/10.3390/agriculture12091447

Academic Editors: Jibo Yue, Chengquan Zhou, Haikuan Feng, Yanjun Yang and Ning Zhang

Received: 10 August 2022 Accepted: 9 September 2022 Published: 13 September 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

a nondestructive manner, which carries considerable valuable information regarding the interaction between the canopy and solar radiation, such as the vegetation absorption and scattering [7]. Vegetation-canopy spectra are intimately associated with crop growth, and especially in the visible ranges affected by pigmentation and the near-infrared (NIR) bands subject to the cell tissue and canopy structure [8]. Therefore, the vegetation indices (VIs) derived from these bands have been frequently adopted to estimate the vegetation phenotypic parameters, such as the leaf-area index (LAI), biomass, chlorophyll content, and nitrogen content [9,10]. In general, remote estimates of crop yields using VI-based methods have become mainstream [11].

As for the data source, it is an influential factor in crop-yield estimations. Yield estimations using ground-based measurement spectra are hardly adequate for large areas, and the real-time forecasting requirements and performance are regionally limited [12]. Satellite imagery can be appropriate and cost-effective data for crop monitoring at the regional scale [13]. However, cloud coverage is pervasive during the pivotal crop-growing season, and thus, sufficient spatial- and temporal-resolution data may not be available for precision agriculture. The emergence and development of unmanned aerial vehicles (UAVs) and lightweight sensors can be complementary between satellites and groundbased sensors [14]. Because UAVs have easy access to dynamic data, they have enormous potential to solve and refine strategies for the challenges encountered in agriculture. Despite some shortcomings, such as the flight time, load capacity, and weather situation, UAVs are expected to be applied with high frequency in agriculture from now on due to the valued information gained and effective implementation [15]. The spectral information gained from UAV-based multispectral or hyperspectral data has been broadly applied for cropgrowth monitoring and parameter estimation [16]. Moreover, multispectral images, free from the information redundancy and complicated processing of hyperspectral data, have a red-edge band that RGB digital images lack, and their centimeter-level spatial resolutions make multispectral sensors preferred devices in precision agriculture [17].

The reproductive stages of rice can be divided into the tillering, jointing, booting, heading, filling, and maturity stages. During the booting and heading stages, the rice plant progressively accomplishes the conversion from nutritional to reproductive growth, and the appropriate parameters for the yield estimation differ at different stages [18]. In practical terms, it is imperative to access early and precise rice-yield data prior to harvest for market decisions and policymaking. In the early stages of rice growth, the leaves are not yet fully grown, and the variations in the later growth process make it difficult to estimate the yield accurately. However, it may be too late to use data collected at the later stages for yield estimation, as some effective measures need to be scheduled in advance. Moreover, the appearance of rice spikes during mid-to-late growth can interfere with the spectral characteristics of rice, as the color of the spikes eventually turns yellow, causing the overall spectral pattern of the rice to deviate from the normal green vegetation. Zhou et al. revealed that the presence of spikes increased the challenge of yield prediction in the late reproductive stage of rice [19]. Duan et al. also noted that the reduced predictive ability during the heading stage may be related to the uneven penetration of spikes into the sensor field [20]. Hence, the booting and heading stages are suitable for rice-yield estimation, but the heading stage needs to overcome the effect of the spikes.

The tasseled cap transformation (TCT), a viable pioneer of feature-detection algorithms, is a linear-conversion technique that is commonly utilized in the areas of vegetation, soil, and land-cover mapping [21]. The vast majority of the variations in the spectra of a single scene can be interpreted in terms of the brightness, greenness, and humidity retrieved from TCT [22]. Therefore, TCT was exploited to extract the brightness, greenness, and moisture components of the rice fields to provide potentially valuable variables for yield estimation.

In our experiment, the canopy spectral data of the paddy field was remotely measured from both the ground and UAV-mounted platforms, which had quite high spatial resolution, and thus, well-reflecting variations of field. Meanwhile, the LAI and chlorophyll-content data (SPAD) in the same period were obtained. Unlike previous studies, we compared the

yield-estimation performance of the ground and UAV-based parameters at different periods, and we combined the TCT parameters to improve the accuracy of the rice-yield estimation without saturation. With rice grown under different nitrogen-fertilizer treatments, our objectives were: (1) to compare the ability of the rice-yield estimation at the booting and heading stages; (2) to accurately estimate the rice yield by ground measurements and UAV data; (3) to explore improving the VI-based approach for rice-yield estimation by integrating the brightness, greenness, and wetness retrieval from TCT.

#### **2. Materials and Methods**

#### *2.1. Study Area and Experimental Design*

The study area was located in Wuxue City, Hubei Province, China (Figure 1a). It has a humid subtropical monsoon climate with long plant production cycles and abundant rainfall. It is suitable for the comprehensive development of agriculture, forestry, animal husbandry, and fishing. As shown in Figure 1b, the identical rice variety was cultivated in 24 plots with different N-fertilizer-application levels, with a whole area of about 480 m<sup>2</sup> and a total of 7920 rice plants. The ridges between each plot were covered with white plastic film to isolate the mixing of water in the field. There were eight N-fertilizer gradients, and three replications, applying N0, N3, N5.5, N8.5, N11, N14, N16.5, and N19.5 (unit: kg/ha). Two important growth periods (the booting and heading stages) were selected for the UAV flight experiments. In the former period, no spikes appeared, and in the later period, almost all spikes were clearly present. The conditions were strictly identically controlled, except for differences in the amount of the nitrogen-fertilizer application. Field maintenance, including weeding and pest control, was performed by professionals throughout the growing season.

**Figure 1.** Study area and rice-plot settings: (**a**) experimental-area location; (**b**) nitrogen-gradient layout of rice plots.

#### *2.2. Ground-Data Acquisition*

The LAI, canopy chlorophyll content (CCC), and canopy height (CH) are significant indicators for characterizing crop yields [23,24]. Therefore, the SunScan canopy analysis system (Delta Inc., Cambridge, UK) was applied to measure the LAI of each plot. The five-point sampling method was performed at the four corners and center of each plot, and the average value was taken as the canopy LAI of each plot. At each location where the LAI was measured, three rice plants were selected, and the leaf chlorophyll was measured in the upper, middle, and lower parts of the plant using the SPAD-502 chlorophyll meter (Konica, Minolts Sensing Inc., Osaka, Japan); the mean value was recorded as the leaf chlorophyll content of each plot. The CCC is generally expressed using the product of the LAI and SPAD [25]. At each location where the LAI and SPAD were observed, three rice plants were randomly selected, and the height of the rice was measured using a millimeter ruler; the final CH was the mean value of all the readings. Rice seeds were collected by hand

harvesting when the rice was fully mature. The seeds were then separated and left to dry in the sun until there were no variations in their weights. All the sun-dried seeds in each plot were weighed independently to obtain the rice yield of each plot.

#### *2.3. Canopy Reflectance Derived from UAV Images*

The UAV flights were implemented before the ground measurements. As shown in Figure 2, a multirotor UAV (S1000, SZ DJI Technology Co., Ltd., Shenzhen, China) equipped with a six-band MCA camera (Mini-MCA 6, Tetracam, Inc., Chatsworth, CA, USA) was employed to collect multispectral images of the rice at the booting (13 August 2015) and heading (29 August 2015) stages. The drone flights were conducted between 11:00 a.m. and 1:00 p.m. local time, thus ensuring minimal variation in the solar zenith angle. The multispectral camera has a center band of 490@10, 550@10, 670@10, 720@10, 800@20, or 900@20 nm. The UAV is also equipped with a three-axis gimbal to ensure that the camera is always shooting vertically downward. Four gray plates (reflectances of 6%, 12%, 24%, and 48%) were placed in the camera field of view for the radiometric calibrations to obtain the reflectance data. To prevent reflectance errors caused by solar-illumination variations, panoramic photographs of the entire study area were taken with a UAV flight altitude of 60 m and an image spatial resolution of approximately 0.03 m.

**Figure 2.** UAV reflectance-data-acquisition system: (**a**) UAV; (**b**) fixed reflectance grey plates for radiometric calibration; (**c**) Mini-MCA 6 multispectral camera.

The classical linear-radiometric-calibration method was utilized to transform the DN values of the multispectral images into reflectances to ensure the comparability of the data from different periods [26]. The reflectance was computed as follows:

$$\mathbf{R\_i = DN\_i \times G\_i + O\_i \ (i = 490, 550, 670, 720, 800, and 900)}\tag{1}$$

$$
\begin{pmatrix} 0.06\\ 0.12\\ 0.24\\ 0.48 \end{pmatrix} = \begin{pmatrix} \text{DN}\_{0.06} \\ \text{DN}\_{0.12} \\ \text{DN}\_{0.24} \\ \text{DN}\_{0.48} \end{pmatrix} \times \text{G}\_{\text{i}} + \text{O}\_{\text{i}} \tag{2}
$$

where Ri represents the calculated reflectance of the ith band, DNi is the digital number of the ith band in the original multispectral images, and Gi and Oi represent the gain and offset values of the ith band, respectively.

For the 24 rice plots, we determined the maximum region of interest (ROI) suitable for each plot (equal to 10,000 pixels), and then the plot-level reflectance was the average of all the pixels in that rectangle.

#### *2.4. VI Calculation Based on UAV Data*

Several commonly available VIs obtained using combinations of visible, red-edge, and NIR bands are shown in Table 1. These VIs were selected for their good performance in crop-yield estimation and inversion of the phenotypic parameters.

**Table 1.** The common spectral indices selected in this paper.


#### *2.5. Tasseled Cap Transformation*

TCT, which is a quadrature conversion, provides the projection of feature messages, such as the soil and vegetation in the spectral domain, into the tasseled-cap space, following the structural characteristics of the distribution of ground information in multispectral remote sensing [34]. After the TCT was performed, the spectral dimensions could be reduced, and the information was concentrated in a few feature spaces. Its defining equation is given in Equation (3):

$$\mathbf{y} = \mathbf{A}\mathbf{x} + \mathbf{b},\tag{3}$$

where y is the vector after the TCT; A is the unit quadrature matrix and the coefficient matrix of the TCT; x is the gray value of the image, or the apparent reflectance of the sensor; b is served as an offset vector to avoid negative values after the transformation.

When TCT was performed on six-band UAV images, the results were composed of three factors: the brightness (B), greenness (G), and third component (T). All three of these variables are intimately associated with the surface landscape. The B component represents the variation information of the reflectance, which is a weighted sum of six bands and reflects the overall brightness variation of the surface object. The G component is vertical to the B component, and it also shows the contrast between the visible band (especially the red band) and NIR band, showing the variation in the greenness of the ground vegetation, which is closely related to the ground-vegetation cover, LAI, and biomass. The T variable reflects the moisture characteristics of the soil and vegetation. Similar to calculating the plot-level reflectance, the plot-level TCT parameters were obtained by defining a rectangle (ROI).

#### *2.6. Accuracy Evaluation Using Leave-One-Out Cross-Validation*

The yield-prediction model was assessed by employing the leave-one-out crossvalidation (LOO-CV) method to reduce the reliance on a single random fraction of the calibration and validation dataset [35]. In this paper, the iterative process was repeated 22 times to ensure that each piece of data was engaged in the validation (the yields of two plots were removed as a result of serious problems during harvesting). The adjusted R2, RMSE, and MRE were selected as the final accuracy metrics [11].

#### **3. Results**

#### *3.1. Rice-Yield Estimation using Ground Measurements at Different Stages*

In this study, the rice yield at the booting and heading stages was predicted based on the plot-level LAI, CH, and CCC measured on the ground. Rice-yield data from 24 plots were compared and analyzed, two of which were removed due to obvious errors, and the remaining 22 yield data, along with the corresponding ground data, were applied for modeling analysis. The Shapiro–Wilk test was chosen to check the normality of the data before modeling the rice yield.

In Table 2, the ground measurements (yield, LAI, CH, and CCC) approximately followed a normal distribution (*p* > 0.05). Then, the phenotypic parameters and yield of the rice measured on the ground at the booting and heading stages were fitted by least-squares regression (Figure 3). It was found that the LAI was a good fit for the yield at the booting stage (R2 = 0.569), but relatively poor at the heading stage (R<sup>2</sup> = 0.468).

**Table 2.** Data description and normality test.


**Figure 3.** Fitting of ground data to yield at different growth stages: (**a**–**c**) booting stage; (**d**–**f**) heading stage.

By comparing the yield-estimation performance of the CH, LAI, and CCC, the results of a Pearson correlation analysis showed that the correlation (r) between the LAI and yield was improved by integrating SPAD data at the booting stage (Table 3), and the yield fit was significantly improved (R<sup>2</sup> = 0.622 vs. 0.569) (Figure 3). However, at the heading stage, the correlation decreased after combining SPAD data (R2 = 0.468 vs. 0.347). The yield estimation with the CH was the poorest of the two periods, and especially at the heading stage. Therefore, the ground-measured LAI and SPAD data (CCC) were not suitable for predicting the yield of rice at the heading stage, but they had a good fit at the booting stage.


**Table 3.** Accuracy comparison of different regression models.

\*\* indicates that the correlation is significant at the 0.01 level (two-tailed).

#### *3.2. Rice-Yield Estimation Using TCT Parameters*

The change in rice from booting to heading is a process from nutritional to reproductive growth. Spikes basically do not appear on the surface of the rice field during the former period. In contrast, spikes progressively emerge after about two weeks. In addition to the different apparent information, there is also significant spectral diversity in rice at these two stages (Figure 4). The spectral characteristics of the rice at the booting stage were consistent with those of typical green vegetation, but they changed significantly at the heading stage. The reflectance of the rice canopy during the heading stage was significantly higher in the visible–NIR range. The appearance of rice spikes has a great influence on the spectral properties of rice.

**Figure 4.** Spectra and field photos of rice at different stages: (**a**) spectral curves of rice at different stages; (**b**) actual view of rice at booting stage; (**c**) actual view of rice at heading stage.

Given the obvious change in the color and texture of the rice canopy caused by the appearance of panicles, paddy-field images at the booting and heading stages were obtained through a UAV equipped with a six-band Mini-MCA camera. Subsequently, the brightness,

greenness, and third-component maps of the spectral-dimension reduction were retrieved by TCT (Figure 5).

**Figure 5.** TCT-component images: (**a**) brightness at booting stage; (**b**) greenness at booting stage; (**c**) third component at booting stage; (**d**) brightness at heading stage; (**e**) greenness at heading stage; (**f**) third component at heading stage.

It can be noted that the TCT-component diagrams at the booting and heading stages showed a similar variation pattern. The brightness-component maps showed the overall variation in the rice reflectance throughout the experimental area, which was remarkably brighter than the greenness and third-component ones. In a single growth stage, the brightness of each plot varied with the different nitrogen-gradient conditions: the less nitrogen fertilizer applied, the darker the image. The gray distributions of the greenness and third-component images were contrary to that of the brightness-component image, which showed that the more nitrogen application applied, the darker the image. On account of the lighter color of the panicle compared with the leaf, the uneven occurrence of panicles was reflected by different greenness performances during the same growth period. For the third-component maps, the color at the booting stage was darker than that at the heading stage, reflecting the water status of the paddy fields and rice.

After completing the normality test (Table 2), a strong correlation (r > 0.5) was shown between the TCT parameters and rice yield during the booting period (Table 3). However, in the latter stage, the correlation between the yield and the brightness and greenness components was significantly lower. A linear fit of the yield and TCT parameters revealed a satisfactory result for the third component at both stages (R<sup>2</sup> values more than 0.5), but saturation was present at the booting stage (Figure 6). Meanwhile, no saturation was observed when using the brightness and greenness to predict the yield, but the performance was poor (R2 values below 0.5 at the booting stage, and below 0.2 at the heading stage).

**Figure 6.** Fitting of TCT parameters to yield at different growth stages: (**a**–**c**) booting stage; (**d**–**f**) heading stage.

#### *3.3. Rice-Yield Estimation Combining TCT Parameters and VIs*

The correlation analysis of the yield vs. UAV-based VIs and TCT-based parameters (VI × Brightness, VI × Greenness, and VI × Third Component) was performed to compare the precision of the yield prediction at different growth stages (Figure 7). The results suggested that there was a strong correlation between the VIs and the yield at the booting stage (r > 0.7), while at the heading stage, except for the EVI2, NDRE, and SAVI, the correlation of the VIs vs. the yield decreased to some extent, and especially the CIgreen vs. yield. Multiplied by the TCT parameters, some of the VIs had a stronger correlation with the yield, which was more obvious at the heading stage. At the heading stage, the brightness improved the correlation between the CIred edge, CIgreen, NDRE, WDRVI, and MTCI and the yield. The greenness only improved the correlation of the CIred edge and CIgreen vs. the yield, and the third component basically improved the correlation between all the listed VIs and the yield. Based on the correlation analysis, the yield estimation of the rice was carried out on 22 samples at the booting and heading stages: (1) yield vs. Vis; (2) yield vs. VI × Brightness; (3) yield vs. VI × Greenness; (4) yield vs. VI × Third Component. The adjusted R2 and RMSE were used to evaluate the performance of the yield prediction.

The yield-estimation results of the rice at the booting stage are shown in Table 4. The best-estimated yield parameter in the VIs was the WDRVI, with an adjusted R2 of 0.634. The prediction results of the VI × Brightness and VI × Third Component were improved to a certain degree. In contrast, the performance of the VI × Greenness was worse. After combining the TCT parameters, the optimal yield-estimation variable was the CIgreen. Several models with the best fitting effect were selected for analysis (shown in Figure 8). Except for the CIgreen × Brightness, the other models (WDRVI, CIgreen × Greenness, CIgreen × Third Component) were saturated with different degrees, of which the WDRVI was the most obvious one.

**Figure 7.** Correlation coefficients between parameters of VI and TCT combinations and yield (\*\* indicates that the correlation is significant at the 0.01 level).


**Table 4.** Yield-estimation models incorporating TCT parameters and VIs at the booting stage.

**Figure 8.** Well-performing yield-estimation models incorporating TCT parameters and VIs at the booting stage.

The rice-yield-prediction results at the heading stage are shown in Table 5. Compared with the booting stage, the parameters with the best fitting performance changed, indicating that the sensitivity of the VIs to the yield varied after the emergence of panicles. The VI with the best fitting performance in this period was the SAVI (adjusted R2 = 0.600). Multiplied by the TCT parameters, the estimated result of the VI × Third Component had a certain degree of improvement. However, the fitting effects of the VI × Brightness and VI × Greenness were worse. Similarly, the models with good fitting effects were selected for analysis (Figure 9). Except for the SAVI, the other models (WDRVI × Brightness, CIred edge × Greenness, NDRE × Third Component) did not show significant saturation. Therefore, the most appropriate parameter to predict the rice yield at the heading stage was the NDRE × Third Component (Adjusted R2 = 0.612, RMSE = 0.272).

**Table 5.** Yield-estimation models incorporating TCT parameters and VIs at the heading stage.


**Figure 9.** Well-performing yield-estimation models incorporating TCT parameters and VIs at the heading stage.

The TCT parameters were transformed to further improve the accuracy of the riceyield estimation and reduce the model saturation. On the one hand, Figure 6 reveals that the third component was the best fit for the yield at the booting and heading stages, while the brightness and greenness components were poorly fitted for the yield. On the other hand, the yield was positively correlated with the brightness and third component, and negatively correlated with the greenness component. In addition, the fitting model of the third component and the yield had an obvious saturation phenomenon at the booting stage, which did not exist in the other components. Therefore, two new parameters of the difference between the third component and greenness (T − G) and the quotient of the third component and brightness (T/B) were constructed to fuse the various features of the TCT parameters. The correlation between the new parameters and the yield was significantly enhanced at the booting and heading stages (Figure 7). The rice-yield-prediction results of the VIs incorporating the new TCT parameters are shown in Tables 4 and 5. At the booting stage, the VIs incorporating T – G and T/B had high yield-estimation accuracy (RMSE < 0.276 in the VI × (T − G) model, and RMSE < 0.269 in the VI × (T/B) model). Figure 10 shows the yield-simulation models of the VIs incorporating the newly constructed parameters, and there was high accuracy in all the models at both periods (R2 values are

more than 0.6). Nevertheless, all the models at the booting stage had obvious saturation, but none at the heading stage. This indicated that the VIs combining the information of the brightness, greenness, and wetness had good suitability for estimating the rice yield at the heading stage: high accuracy and low saturation.

**Figure 10.** Well-performing yield-estimation models incorporating TCT combination parameters and VIs at different stages: (**a**–**d**) booting stage; (**e**–**h**) heading stage.

At length, the LOO-CV method was used to verify the model of the rice-yield estimation at the heading stage, and the results are shown in Figure 11. The model estimation errors of the NDRE × (T − G), NDVI × (T − G), SAVI × (T/B), and EVI2 × (T/B) were less than 7%.

**Figure 11.** Accuracy-assessment results of the VI × (T − G) and VI × (T/B) models at the heading stage: (**a**) NDRE × (T − G); (**b**) NDVI × (T − G); (**c**) SAVI × (T/B); (**d**) EVI2 × (T/B).

#### **4. Discussion**

The main purpose of this paper is to improve the accuracy of rice-yield estimation and reduce the saturation of the models by using the information on the brightness, greenness, and wetness obtained from TCT and combining the UAV-based VIs. The results demonstrated that the VIs incorporating the TCT parameters had good potential to solve these two problems. In crop-yield-estimation studies, an increasing number of parameters have been used in conjunction with VIs. For example, variables such as the canopy texture [36], canopy height [24,37], canopy coverage [24], and temperature [38] are frequently fused by machine-learning methods to improve the crop-yield-estimation accuracy [39]. However, this approach is too complex, and the models have limited robustness. In this paper, we combine the advantages of the VIs and TCT parameters in a simple way through a quadratic operation, which is both easy and has significant accuracy improvement.

The reason for selecting the research period of rice in this paper was that the morphology was not completely stable at the tillering stage and jointing stage, and the leaves and

stems changed greatly in a short time. Furthermore, the filling stage and ripening stage were close to the harvesting stage, and thus the yield data obtained was of little value. At the booting and heading stages, the rice gradually completed the transition from vegetative growth to reproductive growth. At the booting stage, there were almost no panicles in the rice canopy, while at the heading stage, with the continuous growth of the rice, the panicles gradually appeared until they covered the whole canopy. Other than that, there was no significant change in this stage relative to the booting stage (Figure 4b,c). Wang et al. proposed that the single-growth-stage model (RNDVI) (880, 712) at the booting stage was most suitable for the yield estimation of rice, with an R2 of 0.75 [18]. Duan et al. pointed out a new method integrating UAV-based VIs and abundance information retrieved from spectral mixture analysis to improve the yield-estimation precision of rice at the heading stage [11]. Zhang et al. put forward that the estimation of the grain yield during the early to mid-growth stages was significant for the initial diagnosis of rice and the quantitative regulation of topdressing [40]. Kawamura et al. demonstrated that the booting stage might be the optimum time for in-season rice-grain assessment [41]. Zhou et al. held that the booting stage was determined as the optimal period for grain-yield estimation using Vls at a single stage for both digital images and multispectral images [19]. Therefore, based on the principle of prediction possibility and time advance, the optimum growth period for rice-yield simulation was determined to be the booting stage, but the heading stage also had great potential for high-precision estimation.

We tried to compare the effects of different data sources on the rice-yield estimation in various stages by collecting the ground data (CH, LAI, and CCC) and UAV remote-sensing images at the booting and heading stages. Peng et al. remotely predicted the yield of oilseed rape based on LAI estimation, with good performance [42]. Hence, the rice yield was first estimated by LAI data. The results showed that the predictive ability of the LAI at the booting stage was significantly better than that at the heading stage (R2 = 0.569 vs. 0.468) (Figure 3). Liu et al. utilized the LAI integrated with SPAD (LAI × SPAD) data to demonstrate the potential of estimating rice yields [25]. The LAI × SPAD data and rice yield in this paper were also used for regression analysis, and the results showed that they significantly enhanced the ability to predict the rice yield at the booting stage, with an obvious improvement compared with the LAI (Figure 3). However, at the heading stage, the CCC reduced the prediction ability of the rice yield, even worse than the simulation ability of the LAI, which indicated that the appearance of panicles at the heading stage weakened the predicted potential of the LAI and SPAD. Liu et al. also deemed that the CCC completely derived from the green leaves of rice had a good correlation with the yield [25]. Consequently, it was reasonable to speculate that the main reason for the decline in the yield-estimation ability at the heading stage was the emergence of panicles because the SunScan canopy analysis system was used in the LAI measurement. According to its measuring principle, panicles and stems were also a part of the LAI output information, which was probably unrelated to the yield estimation.

With the improvements in RS technology, more crop-canopy images with different spatial scales can be obtained, including multispectral and hyperspectral images [17,43]. VIs calculated by the combination of different bands is one of the most used methods for yield estimation [44]. The eight plot-level VIs (NDVI, CIred edge, CIgreen, EVI2, NDRE, WDRVI, MTCI, and SAVI) were extracted from the multispectral images at the booting and heading stages of rice. Then, the VIs and yield data were fitted by the least-squares method. The results showed a good performance at the booting stage, with a minimum RMSE of the WDRVI of 0.254 (Table 4). The simulation results of the other VIs listed at this stage were also satisfactory (RMSE < 0.283). However, there was a most prominent problem of vulnerability to saturation in the VI-based simulation models. The apparent saturation phenomenon exhibited that most of the WDRVI values were concentrated around 0.75 (Figure 8). At the heading stage, the ability of the VIs to predict the yield decreased significantly. Except for the SAVI and EVI2, the simulation accuracies of the rest of the VIs were very poor (RMSE > 0.3) (Table 5). The CIgreen of the fitting model, in particular, had

an adjustment R<sup>2</sup> of 0.305. The calculation of the CIgreen combined with the reflectance of the green band, and the appearance of panicles, largely reflected the green characteristics of the rice, thus affecting the correlation with the yield. According to the simulation results of the SAVI, the saturation phenomenon was still very distinct (Figure 9). In general, whether it was the booting stage or heading stage, the rice yield simulated by the VIs was inevitably saturated.

The appearance of panicles at the heading stage will lead to changes in the ricecanopy color and other characteristics, which, in turn, have a direct impact on the canopy reflectance. Therefore, the TCT method was used to extract the brightness, greenness, and wetness information of the rice at the booting and heading stages to improve the yieldestimation accuracy and eliminate the saturation. It was found from the TCT-component maps (Figure 5) that the brightness image at the booting stage was darker than that at the heading stage, while the greenness and wetness images at the heading stage were darker than those at the booting stage. This is because the reflectance of the rice canopy at the heading stage was significantly higher than that at the booting stage, and the brightness map was a direct mirror of the reflectance. Due to the appearance of panicles, the greenness of the rice-canopy leaves at the heading stage was replaced by the light color of some of the panicles, resulting in a decrease in the greenness. According to the water requirement of rice, it is necessary to irrigate enough water at the booting stage, and at the heading stage, the water in the paddy fields should be drained off irregularly. Thereby, the wetness of rice fields at the heading stage would decrease. A correlation analysis and regression analysis were performed on the TCT parameters and yield data—(Table 3 and Figure 6). The results showed that the brightness and greenness had poor simulation effects on the yield, while the wetness had a better effect. However, the saturation appeared in the simulation model of the wetness at the booting stage, but it did not exist at the heading stage. In a word, the direct use of the brightness, greenness, and wetness information was not enough to accurately simulate the rice yield.

Combining the different advantages of the VIs and TCT parameters to simulate the rice yield (high precision and low saturation), the method of VIs multiplied by TCT components was employed in this paper. Although some of the parameters were well simulated, the simulation accuracy of the VI × Greenness models at the booting stage, the VI × Brightness, and the VI × Greenness models at the heading stage were lower than those of the VI models. Moreover, there were different degrees of saturation in the fitting models at the booting stage, but none at the heading stage. In combination with the characteristics of different TCT parameters (correlation and saturation), VI × (T − G) and VI × (T/B) were established to estimate the yield of rice at different stages. The models at the booting stage were still saturated, while the simulation models at the heading stage showed high precision with no obvious saturation, and estimation errors below 7%. Consequently, the VIs, which combined the information of the brightness, greenness, and wetness, were suitable for estimating the rice yield at the heading stage.

In this paper, we developed a new approach to estimating rice yields at the booting and heading stages using the integration of VIs and the brightness, greenness, and wetness information retrieved from UAV multispectral images. This method is simple and feasible, but it has crucial reference significance for the yield estimation of rice and similar crops. Moreover, the theoretical and technical support was provided for the crop-yield estimation with evident changes in the canopy over time. In the future, we will further set up more dense nitrogen-fertilizer gradients to explore the best nitrogen-application amount for rice. In terms of the LAI measurement, some new instruments (for example, the LI-3100C table leaf-area meter, LI-COR, USA) will be used to avoid the impact of the panicles and stems on the output of the LAI, and more realistic LAI data will be obtained to improve and validate the accuracy of rice-yield estimations by ground-measurement data. Concurrently, this method will be applied to satellite data and other crops to enable the rapid, nondestructive, and high-precision estimation of crop yields over larger areas.

#### **5. Conclusions**

In this study, we developed a technique to improve the estimation of rice yields at the booting and heading stages using UAV-based VIs and TCT-based parameter data. The groundmeasurement data could only be used to predict the rice yield at the booting stage, and the prediction ability was lost at the heading stage due to the uneven occurrence of panicles. The UAV-based VIs had similar prediction performances to the ground measurements. Although the accuracy was high at the booting stage, the yield-estimation models were seriously saturated. To improve the prediction accuracy and reduce the saturation of the models, TCT was applied to eliminate the effect of the panicle emergence at the heading stage on the yield estimation. The TCT-component images at the booting and heading stages of the paddy fields were produced based on the six-band UAV images, including the brightness, greenness, and third component (wetness). It was more accurate to use the integration of the plot-level VIs and TCT-parameter information to estimate the rice yield than using VIs alone. Among all the parameters, the CIgreen × (T − G), NDRE × (T − G), WDRVI × (T/B), and NDVI × (T/B) at the booting stage, and NDRE × (T − G), NDVI × (T − G), SAVI × (T/B), and EVI2 × (T/B) at the heading stage, were the most accurate indicators for the rice-yield estimation under different nitrogen-fertilizer treatments, with estimation errors below 7%. The VIs, which combined the brightness, greenness, and third component, were more suitable for estimating the rice yield at the heading stage, with their advantages of high accuracy and low saturation. This paper can provide theoretical and technical support for crop-phenotype-parameter extraction and precision agriculture.

**Author Contributions:** Conceptualization, S.L. and X.J.; methodology, S.L.; software, W.J.; validation, S.F.; writing—original draft preparation, S.L.; writing—review and editing, S.L., K.Y. and Y.L.; funding acquisition, S.F. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the National Key Research and Development Project (2016YFD0101105), the Phenomics Research and New Variety Creation of Hybrid Rice Based on UAV Remote Sensing (2020BBB058), and the National High-tech Research and Development Program (863 Program) (2013AA102401).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


### *Article* **Weed Detection in Peanut Fields Based on Machine Vision**

**Hui Zhang, Zhi Wang, Yufeng Guo, Ye Ma, Wenkai Cao, Dexin Chen, Shangbin Yang and Rui Gao \***

College of Information and Management Science, Henan Agricultural University, Zhengzhou 450046, China **\*** Correspondence: ronnierrui@henau.edu.cn; Tel.: +86-13673626972

**Abstract:** The accurate identification of weeds in peanut fields can significantly reduce the use of herbicides in the weed control process. To address the identification difficulties caused by the cross-growth of peanuts and weeds and by the variety of weed species, this paper proposes a weed identification model named EM-YOLOv4-Tiny incorporating multiscale detection and attention mechanisms based on YOLOv4-Tiny. Firstly, an Efficient Channel Attention (ECA) module is added to the Feature Pyramid Network (FPN) of YOLOv4-Tiny to improve the recognition of small target weeds by using the detailed information of shallow features. Secondly, the soft Non-Maximum Suppression (soft-NMS) is used in the output prediction layer to filter the best prediction frames to avoid the problem of missed weed detection caused by overlapping anchor frames. Finally, the Complete Intersection over Union (CIoU) loss is used to replace the original Intersection over Union (IoU) loss so that the model can reach the convergence state faster. The experimental results show that the EM-YOLOv4-Tiny network is 28.7 M in size and takes 10.4 ms to detect a single image, which meets the requirement of real-time weed detection. Meanwhile, the mAP on the test dataset reached 94.54%, which is 6.83%, 4.78%, 6.76%, 4.84%, and 9.64% higher compared with YOLOv4-Tiny, YOLOv4, YOLOv5s, Swin-Transformer, and Faster-RCNN, respectively. The method has much reference value for solving the problem of fast and accurate weed identification in peanut fields.

**Keywords:** weed identification; YOLOv4-Tiny; attention mechanism; multiscale detection; precision agriculture

#### **1. Introduction**

Peanut is one of the leading oil crops in the world and is vital to global oil production. However, weed competition [1] is an essential factor restricting peanut production, reducing peanut production by 5–15% owing to annual grass damage. Research has shown that peanut production in farmlands with 20 weeds per square meter is 48.31% less than a no-weed control group. In addition, weeds facilitate the breeding and spread of diseases and insect pests, resulting in the frequent emergence of peanut diseases and insect pests [2]. The conventional weeding method of spraying pesticides incurs a significant amount of pesticide waste and causes irreversible pollution to the farmland. Owing to the development of precision agriculture [3], the investigation of site-specific weed management [4] for weed prevention and control has intensified gradually. An efficient detection and identification method for peanuts and weeds is necessary to achieve accurate weed control and management in the farmland.

Currently, many methods are proposed for weed detection, including remote sensing analysis [5], spectral identification [6], and machine vision identification [7]. The equipment required for remote sensing analysis and spectral identification methods is expensive and difficult to promote in agricultural production. The machine vision identification method has been widely used in weed identification because of its low cost and high portability. Bakhshish et al. [8] used Fourier descriptors and invariant moment features to form a shape feature set and implemented weed detection based on artificial neural networks. Rojas et al. [9] extracted the texture features of weeds using the gray-level co-occurrence matrix. They used principal component analysis to reduce the dimensionality of the

**Citation:** Zhang, H.; Wang, Z.; Guo, Y.; Ma, Y.; Cao, W.; Chen, D.; Yang, S.; Gao, R. Weed Detection in Peanut Fields Based on Machine Vision. *Agriculture* **2022**, *12*, 1541. https:// doi.org/10.3390/agriculture12101541

Academic Editor: Michele Pisante

Received: 6 August 2022 Accepted: 22 September 2022 Published: 24 September 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

features and finally used a support vector machine algorithm to complete the classification. Although these methods achieve the identification of crops and weeds, they rely excessively on the manual design and selection of image features, are susceptible to environmental factors such as lighting, and have poor stability and low recognition accuracy.

The development of deep learning technology [10] has enabled convolutional neural networks to reveal deeper features in images, which possess stronger generalization ability than manually selected features. Gai et al. [11] proposed an improved YOLOv4 model for fast and accurate detection of cherry fruit in complex environments. Khan et al. [12] established a weed identification system for pea and strawberry fields based on an improved Faster-RCNN, whose maximum average accuracy for weed recognition was 94.73%. Sun et al. [13] used YOLOv3 to identify Chinese cabbages in a vegetable field. They employed image processing methods to tag plants around Chinese cabbages as weeds. To detect weeds in a carrot field, Ying et al. [14] incorporated deep separable convolutions and an inverted residual block structure into YOLOv4 and replaced its backbone network with MobileNetV3-Small, which improved the recognition speed of the model; however, the average recognition accuracy was only 86.62%. The studies mentioned above indicate that although deep learning can solve the problem of manual feature design in conventional image processing methods, the following issues remain: 1) although using a deep-seated network model for weed detection improves the recognition accuracy, the recognition speed cannot satisfy real-time requirements owing to its large volume; 2) improving the recognition speed by trimming the model network renders the model insensitive to smaller target recognition and reduces its recognition accuracy.

In this study, peanuts and six types of weeds were used as recognition objects, and a weed recognition model based on the improved YOLOv4-Tiny [15] was developed to address the issues above. First, based on YOLOv4-Tiny, CSPDarkNet53-Tiny [16] was used as the backbone network of the model to ensure real-time detection performance; next, a multiscale detection model was implemented by introducing the detailed information of shallow-layer features in an FPN [17] to improve the ability of smaller target recognition. In addition, an ECA [18] module was used to calibrate the effective feature layer to enhance key information pertaining to weeds in the image. Finally, the soft-NMS [19] function was used in the output prediction layer to replace the NMS [20] function to filter the prediction box.

#### **2. Materials and Methods**

#### *2.1. Materials*

#### 2.1.1. Data Acquisition

The weed images used in this study were obtained from peanut fields in more than 20 areas in Henan Province, China. A Fuji Finepixs4500 camera was used to capture artificial images with a resolution of 2017 × 2155 in JPG format; 855 images were obtained, including those of a single weed, sparsely distributed weeds, and overgrown weeds. The images were captured at 7:00, 13:00, and 17:00 via high-angle overhead shots from approximately 70 cm relative to the ground. Based on investigation and screening, the weed types selected were Portulaca oleracea, Eleusine indica, Chenopodium album, Amaranth blitum, Abutilon theophrasti, and Calystegia. No imbalance was indicated between any two types of weeds. The shape and color of the six weeds are shown in Figure 1.

**Figure 1.** Shape and color of six weeds. (**a**) Portulaca oleracea, (**b**) Eleusine indica, (**c**) Chenopodium album, (**d**) Amaranth blitum, (**e**) Abutilon thophrasti, (**f**) Calystegia hederacea.

#### 2.1.2. Data Enhancement and Annotation

Overfitting in the training set caused by excessively small data sizes was prevented using the following methods: image horizontal and vertical flip, brightness increase and decrease (randomly increase or decrease the original brightness by 10%–20%), and Gaussian noise addition (variance σ = 0.05) for random image enhancement [21]. Figure 2 shows an example of the effect of data enhancement. The data enhancement method was only used in the training set. The expanded dataset contained 3355 images. Information regarding weeds and peanuts in the image was annotated using the LabelImg software. The annotation format was Pascal VOC2007, and the file type was .xml. The dataset was categorized into training and test sets. The number of pictures in each dataset is shown in Table 1.

**Figure 2.** Data enhancement.


**Table 1.** Dataset after data enhancement.

*2.2. Methods*

2.2.1. EM-YOLOv4-Tiny Network

YOLOv4-Tiny comprises four components: an input layer, a backbone network, an FPN, and an output prediction layer. The images received were uniformly scaled to a size of 416×416. The features were extracted from CSPDarkNet53-Tiny and then sent to the FPN for feature fusion. The location and category information of the target was obtained in the output prediction layer. CSPDarkNet53-Tiny primarily comprises a CBL module and a cross-stage partial (CSP) module [22]. The CBL module comprises a convolutional layer, batch normalization, and a Leaky Relu [23] activation function in series. It is the smallest module in the overall network structure and is used for feature control splicing and sampling. The CSP module is an improved residual network structure that can segment the input feature map into two components: the main component stacks the residual, and the other is fused in series with the main component after some processing. CSPDarkNet53- Tiny contains three CSP modules: CSP1, CSP2, and CSP3. As the dimensions of the output feature map are reduced, the location information in the CSP module becomes increasingly vague, the detailed information becomes increasingly scarce, and the ability to detect smaller targets is gradually weakened. To solve these problems, a path connected to the CSP2 layer in the FPN was added, while the output characteristics of the CSP2 layer were fused with the upsampling results in the channel dimension to form an output focused on the detection of smaller targets. The EM-YOLOv4-Tiny network structure is shown in Figure 3.

**Figure 3.** EM-YOLOv4-Tiny network Structure, where Conv is convolution, BN is batch normalization, Leak Relu is activation function, Maxpool is maximum pooling, ResUnit is the residual unit, Upsample is upsampling, ECA is efficient channel attention module, Contact is the feature fusion method of adding channel numbers, Yolo Head is the prediction anchor, CBL is series fusion module of Conv, BN, and Leak Relu, and CSP is cross-stage partial module.

To further improve the detection accuracy, the ECA module was used repetitively to process the effective features in the FPN. The attention module suppressed the background information in the image and enhanced the key information through weight calibration [24]. Regarding the predicted output, the EM-YOLOv4-Tiny network yielded three outputs of different scales, namely 13 × 13, 26 × 26, and 52 × 52.

#### 2.2.2. ECA Attention Mechanisms

Multiscale prediction for hierarchical detection was utilized in this study to detect smaller targets. Although shallow features have smaller receptive fields, which can enable better detection of smaller targets, they result in considerable irrelevant noises, thus affecting the network's ability to assess the importance of information obtained from an image. By introducing the ECA attention module into the neck section of YOLOv4-Tiny, the weed features in the image could be further enhanced while irrelevant background weights were suppressed.

In the ECA network, the input features were first pooled globally, and a single numerical value was used to represent the characteristics of each channel. Next, a fast one-dimensional convolution [25] of size k was performed to assign weights for each channel to realize information exchange between channels. Finally, the weight proportion of each channel was generated using the sigmoid function [26], and features with channel attention were obtained by merging with the original input features. More details about the ECA network can be found in Appendix A.

#### 2.2.3. Use of Complete Intersection over Union Loss

Owing to the scale invariance and non-negativity of the IoU [27], the latter is typically set as the bounding box loss function in conventional target detection networks. Specifically, IoU refers to calculating the ratio of the prediction box and the real box, which can better reflect the quality of the regression box. However, using IoU as the loss function still has some problems. On the one hand, when the positions of two bounding boxes do not intersect (IoU = 0), the loss function will become non-differentiable. On the other hand, when the overlap rate of prediction frames is the same, IoU cannot accurately reflect the location information of both.

Therefore, the CIoU [28] was used in this study as the loss function for training. Additionally, the overlap degree and the distance between the prediction and real boxes were considered comprehensively, and the aspect ratio of the prediction box was added as a penalty term to stabilize the regression results. More details about CIoU loss can be found in Appendix B.

#### 2.2.4. Soft-NMS Algorithm for Filtering Prediction Boxes

For the output and prediction of YOLOv4-Tiny, the NMS algorithm filters redundant prediction boxes around the target to be detected. The NMS algorithm deletes prediction boxes whose confidence is below the preset threshold, filters boxes that belong to the same category, and obtains the highest score in a specific area; hence, it effectively eliminates redundant bounding boxes. However, in cases involving dense weed growth or severe mutual occlusion between weeds and peanuts, the NMS algorithm deletes prediction boxes that belong to other targets, thus resulting in missed detections. To solve this issue, the soft-NMS instead of the original NMS was used in this study. When multiple prediction boxes appeared around a weed, their scores were multiplied by a weighting function to weaken those that overlapped with the box with the highest score. In this regard, the Gaussian [29] function was used as the weighting function, and the calculation is as follows:

$$\mathbf{Score\_{i}} = \mathbf{Score\_{i}} \cdot \mathbf{e}^{-\frac{\mathrm{lsI}\left(\mathbf{C\_{i}},\mathbf{B}\right)}{\sigma}} \tag{1}$$

where Scorei represents the score of the current box, Ci represents the current bounding box, and B represents the prediction box with the highest score. The greater the overlap between the prediction box and the box with the highest score, the stronger the weakening ability of the weighting function and the lower the score assigned to it.

#### 2.2.5. Model Performance Evaluation Indices

In this study, indices typically used in multiclass target detection models, such as precision, recall rate, mean average precision (mAP), and F1 value, were used to evaluate the model performance.

Precision indicates the proportion of correct detections in all the prediction boxes, and Recall indicates the proportion of correctly detected label boxes in all label boxes.

$$\text{Precision} = \frac{\text{TP}}{\text{TP} + \text{FP}'} \tag{2}$$

$$\text{Recall} = \frac{\text{TP}}{\text{TP} + \text{FN}'} \tag{3}$$

where TP represents the number of correctly detected weeds; FP represents the number of incorrectly detected weeds; and FN represents the number of missed detections of weeds.

AP represents the average precision of a class of detected objects, and mAP is the mean average value of AP for all classes.

$$\text{AP} = \int\_{1}^{0} \text{Precision} \,\text{d Recall},\tag{4}$$

$$\text{mAP} = \frac{1}{N} \sum\_{1}^{N} AP(k) \tag{5}$$

The F1 value can be regarded as a harmonic mean of Precision and Recall, as follows:

$$\text{F1} = 2 \times \frac{Precision \times Recall}{Precision + Recall} \tag{6}$$

The evaluation indices selected in this study were calculated based on a threshold of 0.5. In the follow-up experiments, the mAP was used as the primary performance evaluation index of the model.

#### 2.2.6. Model Training

The software and hardware environment of model training and testing are shown in Table 2. In order to further improve the recognition accuracy of the model, this study used a transfer learning method to initialize the weights of the model. Before model training, the EM-YOLOv4-Tiny network was pretrained with the Pascal VOC dataset, and the weight file with the highest map in the training results was used as the pretraining weight to initialize the model. Meanwhile, the K-means [30] algorithm was used to cluster the anchor boxes in the dataset, and a total of 9 anchor boxes with different sizes were obtained: (19, 31), (56, 62), (90, 82), (103, 158), (149, 125), (175, 217), (250, 171), (241, 291), and (320, 335). This makes the true size of the anchor frame closer to the size of the weed to be detected. During training, the number of samples in each batch was set to 16, and the loading of the entire training set was considered an iteration. The adaptive moment estimation algorithm was used to optimize the model, the initial learning rate was set to 0.001, and the cosine annealing algorithm was employed for attenuation. After 150 iterations, the model converged.


**Table 2.** Training and test environment configuration table.

#### **3. Results**

*3.1. Performance Evaluation of EM-YOLOv4-Tiny*

Based on the standard of the MS COCO dataset provided by Microsoft, weeds with a resolution lower than 32×32 were defined as smaller targets. Several types of weeds exist in peanut fields, with some being smaller in morphological appearance than others. The standard YOLOv4-Tiny network tends to misdetect when identifying smaller targets. Based on the comparison results of EM-YOLOv4-Tiny and YOLOv4-Tiny using the same test set as shown in Table 3, the recognition precision rates of the EM-YOLOv4-Tiny for smaller targets and all targets were 89.65% and 94.54%, respectively, which surpassed the precision rates of the original network by 10.12% and 6.83%, respectively. The improved network combined the location and detailed information of the shallow-layer feature and improved the ability to identify smaller weeds via the addition of a channel attention mechanism, which suppresses the abundant noise in smaller receptive fields. The recognition performances before and after the network improvement are shown in Figure 4. The EM-YOLOV4-Tiny network included a new scale output in the neck section while the backbone network structure of the model remained unchanged, and the average inference time of each image increased to only 4.4 ms, indicating that the proposed network maintained a high inference speed while improving the recognition precision.

**Table 3.** Comparison of detection results of YOLOv4-Tiny and EM-YOLOv4-Tiny.


**Figure 4.** Comparison of detection results of YOLOv4-Tiny and EM-YOLOv4-Tiny, where (**a**–**c**) represent the recognition effect of the YOLOv4-Tiny model, and (**d**–**f**) represent the recognition effect of the EM-YOLOv4-Tiny model.

#### *3.2. Performance Comparison of Improved Methods*

To further demonstrate the effectiveness of the improved method in enhancing the model performance, different modules were benchmarked against the original YOLOv4- Tiny target detection network. The results are shown in Table 4.


**Table 4.** Influence of different improved modules on YOLOv4-Tiny network.

scale3 represents an improved strategy for employing multiscale detection in the network.

After obtaining the anchor box using the K-means clustering algorithm, the mAP and F1 values of the model were 1.2% and 2% higher than the original values, respectively, indicating a better match in size between the anchor box and the target to be detected. When using the soft-NMS algorithm to filter the prediction box, the recognition precision decreased. Still, the recall rate increased by approximately 10%, indicating the effectiveness of soft-NMS in improving missed detections. When a new functional layer was added to focus on detecting smaller targets, the detection time increased slightly, but the mAP and F1 values increased by approximately 3%. When the ECA attention mechanism was introduced into the network, the noise caused by shallow features was reduced, and Recall increased by 3%. In general, the proposed methods improved the weed detection performance of the network.

#### *3.3. Performance Comparison of Different Attention Mechanisms*

To further verify the advantages of the channel attention mechanism used in this study, under the same experimental conditions, the SE attention mechanism and CBAM attention machine were used as controls at the same location as the network. The experimental results are shown in Table 5.


**Table 5.** Performance comparison after using different attention modules.

Base represents the combined model obtained by using methods of K-Means, multiscale strategy, and soft-NMS, and its result can be found in Table 4.

Compared with the ECA attention network, the SE network uses a full connection to realize information exchange between channels, which increases the computational load and causes feature loss due to dimensionality reductions. The CBAM network is a convolutional block attention module that introduces location information in the channel dimension using the global maximum pool. However, it is limited to local range information instead of long-range dependent information. As shown in Table 5, after different attention mechanisms were added, each performance index improved compared with those of the original model. Among them, the ECA attention module outperformed the others; its mAP was higher than those of the other two attention modules by 2.22% and 1.39%, respectively, implying that the ECA network is more suitable for the model used in this study.

Similarly, in order to further explore the impact of the attention module on the weed detection model, the grad cam method was used in this study to visually analyze the features of the networks before and after adding the attention mechanism. From the detection results in Figure 5, we know that when the attention mechanism module is not added, the network will appear to pay attention to the background information when performing the detection. In contrast, the network incorporating the attention mechanism pays more attention to the information of the object to be detected through the recalibration of the weights. Comparing the feature visualization results of the three attention networks, the ECA network used in this study shades darker on the small target weeds in the images, indicating more attention to the information of small targets.

**Figure 5.** Visual heat map of attentional mechanisms, where (**a**) represents the original image; (**b**) represents the results of using the base model; (**c**) represents the results of using the base model and the SE attention mechanism; (**d**) represents the results of using the base model and the CBAM attention mechanism; (**e**) represents the results of using the base model and the ECA attention mechanism.

#### *3.4. Comparison of Performance with Different Network Models*

To verify the efficiency and practicability of the proposed model, several classical target detection models, such as YOLOv4, YOLOv5s, and the Faster-RCNN, were used to test the efficiency of weed detection. In the comparison experiments, strict control was exerted over the parameters. Specifically, 416×416 images were used uniformly as the input to the training network, and identical training and test sets were used throughout the experiments. The results are shown in Table 6.


**Table 6.** Performance comparison results of multiple target detection networks.

As shown in Table 6, the average recognition accuracy of all types of networks for weeds exceeded 85%. The mAP of the EM-YOLOv4-Tiny network proposed herein was 94.54%, and its F1 value was 0.9, which is higher than those of the other four target detection networks. Because the test set contained a few smaller target weeds, the Faster-RCNN network did not construct an image pyramid and was insensitive to the detection of smaller targets, resulting in a low Recall and a mAP of only 87.71%. Compared with YOLOv4, the proposed network introduced multiscale detection and the attention mechanism based on

YOLOv4-Tiny, whose mAP and F1 were 4.78% and 10% higher than those of the YOLOv4 network, respectively. Moreover, the volume and number of parameters of the proposed model were much smaller than those of the original YOLOv4 network, indicating that the improved network preserved the merit of lightness. The lightweight YOLOv5s and EM-YOLOv4-Tiny exhibited similar model volumes and testing times; however, the mAP of YOLOv5s was only 87.78%, which was similar to that of the original YOLOv4-Tiny. Although the lightweight network had a simple structure, it was susceptible to overlooking occluded and smaller targets during detection.

Transformer-based target detection networks like Swin-Transformer and DETR were also trained and tested on the dataset in this study. The recognition accuracy is generally better than that of the CNN-based network. Still, the size of the model and the slow detection speed is not conducive to the deployment and development of embedded devices. It is worth mentioning that the Transformer structure is on an unstoppable trend to overtake the CNN structure in the existing studies. In future research, this study will also consider incorporating the Transformer structure into EM-YOLOv4-Tiny, working to improve the accuracy of the model further.

#### *3.5. Comparison of Performances under Different Scenarios*

To evaluate the robustness of the model in different scenarios, three different datasets were prepared based on the different growth densities of peanuts and weeds: single weed, sparsely distributed weeds, and overgrown weeds. The test results obtained using the proposed network on the three datasets are shown in Table 7 and Figure 6. The experimental results show that the proposed model performed favorably in terms of weed detection under different growing conditions and accurately located peanuts and various weeds via boundary box regression. The average recognition accuracies of the three datasets mentioned above were 98.48%, 98.16%, and 94.3%, respectively, with a mean value of 96.98%. When the density of peanuts and weeds was high, the model accurately identified occluded weeds while demonstrating excellent recognition of small target weeds.


**Table 7.** Performance comparison results of models in different scenarios.

**Figure 6.** Effect of model recognition under different scenarios.

#### **4. Discussion**

#### *4.1. Deep Learning for Weed Detection*

In this study, the target detection technology based on deep learning was used to detect weeds in peanut fields and achieved good results. In similar weed detection work, many researchers [31,32] used unmanned aerial vehicles (UAVs) with intelligent sensors to detect weeds in the field. The UAV can cover a large area in a short time and generate a weed map of the field to guide the weeding device to the designated area for weeding. However, producing a weed map is very challenging due to the similarity of the crops and the weeds. In contrast, deep learning technology can automatically learn the discriminant characteristics between crops and weeds through a deep convolution neural network, which can better solve the problem of weed detection in a complex environment. Hussain [33] used the improved YOLOv3-Tiny network model to detect two kinds of weeds in the wild blueberry field, and the F1 values of the two kinds were 0.97 and 0.90, respectively. This also shows the great potential of the deep learning method in the field of weed detection. However, the actual agricultural production environment is often changeable and uncontrollable. The proposed method may also have certain limitations when the application scenario changes, such as a large increase in weed species and extreme weather. Although deep learning technology has a strong learning and adaptive ability, it must be combined with many other technologies to contribute to agricultural development.

#### *4.2. Challenge of Small Target Detection*

Small target detection has always been a research hotspot in the field of target detection. Multiscale detection and feature fusion are the most commonly used methods to solve the problem of small target detection. In this study, the idea of multiscale detection and the attention mechanism were introduced into YOLOv4-Tiny, which improved the recognition ability of the model for small target weeds. The multiscale feature learning method improves the sensitivity of the original network to small target detection by fusing the details of shallow features. The attention module recalibrates the input features with weights, which makes up for the defect that the receptive field of shallow features is small and easily produces noise. However, the existing feature fusion methods, such as concatenation, cannot fully take into account the feature information of the context, which also leads to the model missing or falsely detecting weeds on some small targets. In agricultural production, many application scenarios for small target detection will also exist. The pests are too small and mostly have protective colors, making pest detection a challenge in the pest control process. The accurate identification and positioning of small fruits and vegetables is also key to fruit and vegetable picking. Therefore, small target detection remains a more significant challenge in agriculture. Fortunately, the detection regarding small targets has been ongoing. Wei et al. [34] used a Path Aggregation Feature Pyramid Network (PAFPN) structure to fuse the multiscale features obtained by the Attention Mechanism Network to get high-level multiscale semantic features. The global feature fusion method, like PAFPN, is better than the local feature fusion method in small target detection. Therefore, in subsequent research we will consider adding appropriate feature fusion algorithms to our own networks to further improve the recognition ability of the model for small targets.

#### *4.3. Limitations and Shortcomings*

Although the network proposed in this study can better identify weeds in peanut fields, some noteworthy problems still need further research. First of all, the data in this study only include weeds in the peanut seedling stage, and the collected area is only in Henan Province, China. Future research will focus on collecting weed data in peanuts in other growth stages and will cover as many regions as possible. Secondly, although the network in this paper improves the recognition accuracy of the model compared with the original YOLOv4-Tiny network, it also increases the volume of the model to a certain extent. Zhang et al. [35] used the deep separation convolutional network to replace the original convolutional network, which not only improved the accuracy of the model but also reduced the number of parameters and calculations of the network. In subsequent research, we plan to introduce this method into the network of this paper. Finally, the improvement strategy of the multiscale detection and the attention mechanism has been proved to be highly practical in this study. Still, other advanced research continues, such as on the Transformer [36], the Generative Adversarial Network [37], and so on, which have attracted more and more attention. It is worth further exploring the introduction of these technologies into our own network and improving the detection performance of the model.

#### **5. Conclusions**

To rapidly and accurately identify various types of weeds in peanut fields, a weed recognition method named EM-YOLOv4-Tiny was proposed. Based on YOLOv4-Tiny, multiscale detection and the attention mechanism were introduced, the CIoU was used as the loss function for training, and the soft-NMS method was used to screen the prediction box to improve the model performance in identifying small targets. The proposed model shows better recognition accuracy than Faster-RCNN, YOLOv5s, YOLOv4, and Swin-Transformer. In addition, the volume of the EM-YOLOv4-Tiny model was 28.7 M, and the single detection time was 10.9 ms, which rendered the model suitable for the embedded development of intelligent weeding robots.

In future work, this research will transplant the constructed model to a suitable embedded device for testing and select an intelligent spraying device to complete the precise weeding in the peanut field. In addition, the model will also be used in applications on smartphones so that farmers can better understand field information and make timely decisions.

**Author Contributions:** Conceptualization, H.Z.; methodology, H.Z. and Z.W.; software, Y.G.; validation, H.Z., Z.W., and Y.M.; formal analysis, Y.G.; investigation, Y.M.; resources, Y.G.; data curation, Z.W., Y.M., D.C., and W.C.; writing—original draft preparation, Z.W.; writing—review and editing, H.Z. and Z.W.; visualization, S.Y.; supervision, H.Z.; project administration, R.G.; funding acquisition, H.Z. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the Key Research and Development program of Henan Province (No. 212102110028) and Henan Provincial Science and Technology Research and Development Plan Joint Fund (No. 22103810025).

**Institutional Review Board Statement:** Not applicable.

**Data Availability Statement:** The datasets used and/or analyzed during the current study are available from the corresponding author upon reasonable request.

**Acknowledgments:** The author would like to thank all contributors to this study.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Appendix A**

The ECA network structure is shown in Figure A1. In the ECA network, a fast one-dimensional convolution with a convolution kernel k was performed to realize local cross-channel interactions, which reduced the computational workload and complexity of the entire connection layer. A positive interaction occurred between the channel dimension C and the convolution kernel size k, i.e., a larger C resulted in a larger k. The relationship between the two can be expressed as follows:

$$\mathbf{C} = \mathcal{Q}(\mathbf{k})\tag{A1}$$

C is typically measured in an exponential multiple of 2. Therefore, the relationship between the two can be more reasonably expressed as follows:

$$\mathbf{C} = \mathcal{Q}(\mathbf{k}) = \mathcal{Q}^{(\gamma \times \mathbf{k} - \mathbf{b})},\tag{A2}$$

Here,

$$\mathbf{k} = \boldsymbol{\varphi}(\mathbf{C}) = \left| \frac{\log\_2(\mathbf{C})}{\mathbf{r}} + \frac{\mathbf{b}}{\boldsymbol{\chi}} \right|\_{\text{odd}} \tag{A3}$$

where |n|odd represents the odd number closest to n, with γ and b being 2 and 1, respectively.

**Figure A1.** ECA network structure, where C is the channel dimension of the input data, H is the height of the input data, and W is the width of the input data. GAP denotes global average pooling, and k denotes the size of the convolution kernel using fast one-dimensional convolution.

#### **Appendix B**

As shown in Figure A2, the CIoU bounding box regression loss function directly minimizes the normalized distance between the predicted box and the real target box, taking into account the overlapping area of the detection box as well as the distance from the center point of the detection box. The measurement parameter of the consistency of the aspect ratio between the detection frame and the real target frame is also added to make the model more inclined to optimize in the direction of the dense overlapping area.

**Figure A2.** CIoU diagram, where r represents the center point distance d of the two detection boxes, and d represents the distance between the diagonals of the smallest rectangle containing the two detection boxes.

The loss function of the CIoU is calculated as follows:

$$\text{CIo} \text{I} \text{I}\_{\text{Loss}} = 1 \text{ -- CIo} \text{U} = 1 \text{ -- IoU} + \frac{\rho^2(b, c)}{d^2} + \text{av} \tag{A4}$$

where d represents the distance between the diagonals of the smallest rectangle containing the two boxes; b and c represent the coordinates of the central points of the real and prediction boxes, respectively; *ρ*2(b, c) is the function for solving the Euclidean distance between the two mentioned points; and av is the penalty term for border scale.

The a in Equation (7) is the parameter used to balance the ratio, and v is the parameter that measures whether the ratio of the true frame is consistent with the predicted frame. The calculation of both is as follows:

$$\mathbf{v} = \frac{4}{\pi^2} \left\{ \arctan \frac{w^c}{h^c} - \arctan \frac{w^b}{h^b} \right\}^2 \tag{A5}$$

$$\mathbf{a} = \begin{cases} \begin{array}{c} 0 \\ \frac{v}{(1 - \operatorname{IoI}) + v'} \end{array} \text{if } IoI < 0.5\\ \begin{array}{c} \frac{v}{(1 - \operatorname{IoI}) + v'} \text{if } IoI \ge 0.5' \end{array} \end{cases} \tag{A6}$$

where *wc* and *hc* represent the width and height of the prediction box, and *wb* and *hb* represent the width and height of the real box.

#### **References**


**Xueqin Jiang 1,2,†, Shanjun Luo 2,†, Qin Ye 1,\*, Xican Li <sup>3</sup> and Weihua Jiao <sup>4</sup>**


**Abstract:** Soil is one of the most significant natural resources in the world, and its health is closely related to food security, ecological security, and water security. It is the basic task of soil environmental quality assessment to monitor the temporal and spatial variation of soil properties scientifically and reasonably. Soil moisture content (SMC) is an important soil property, which plays an important role in agricultural practice, hydrological process, and ecological balance. In this paper, a hyperspectral SMC estimation method for mixed soil types was proposed combining some spectral processing technologies and principal component analysis (PCA). The original spectra were processed by wavelet packet transform (WPT), first-order differential (FOD), and harmonic decomposition (HD) successively, and then PCA dimensionality reduction was used to obtain two groups of characteristic variables: WPT-FOD-PCA (WFP) and WPT-FOD-HD-PCA (WFHP). On this basis, three regression models of principal component regression (PCR), partial least squares regression (PLSR), and back propagation (BP) neural network were applied to compare the SMC predictive ability of different parameters. Meanwhile, we also compared the results with the estimates of conventional spectral indices. The results indicate that the estimation results based on spectral indices have significant errors. Moreover, the BP models (WFP-BP and WFHP-BP) show more accurate results when the same variables are selected. For the same regression model, the choice of variables is more important. The three models based on WFHP (WFHP-PCR, WFHP-PLSR, and WFHP-BP) all show high accuracy and maintain good consistency in the prediction of high and low SMC values. The optimal model was determined to be WFHP-BP with an R2 of 0.932 and a prediction error below 2%. This study can provide information on farm entropy before planting crops on arable land as well as a technical reference for estimating SMC from hyperspectral images (satellite and UAV, etc.).

**Keywords:** soil moisture content; spectral processing technology; hyperspectral; principal component analysis; feature parameters extraction

#### **1. Introduction**

Soil moisture content (SMC) is the carrier of material and energy cycle in the soil system, which has an important influence on soil characteristics, vegetation growth and distribution, and the regional ecosystem [1,2]. Meanwhile, the SMC is related to soil nutrient contents by facilitating organic matter decomposition [3], enhancing carbon sequestration [4], and resulting in an increase in crop yield [5]. In agriculture, a timely and effective grasp of the distribution and future trend of soil moisture in the field is of great significance to effectively save water resources, improve the utilization efficiency of agricultural water and sustainable utilization of water and soil resources, and effectively monitor and control farmland drought in real time [6,7].

**Citation:** Jiang, X.; Luo, S.; Ye, Q.; Li, X.; Jiao, W. Hyperspectral Estimates of Soil Moisture Content Incorporating Harmonic Indicators and Machine Learning. *Agriculture* **2022**, *12*, 1188. https://doi.org/ 10.3390/agriculture12081188

Academic Editor: Jose L. Gabriel

Received: 25 July 2022 Accepted: 8 August 2022 Published: 10 August 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

285

The traditional artificial SMC measurement method, which is based on point and laboratory measurement, has high precision but the limited scope, a large workload, low efficiency, and high cost, and is difficult to meet the actual needs of SMC monitoring [8,9]. Remote sensing and satellite data have been widely used in monitoring soil and crop systems, such as soil organic matter [10], crop evapotranspiration [11], water stress [12], and yield monitoring [13]. In the case of soil moisture, researchers have reported that hyperspectral imagery has more advantages over regular satellite-based multi-spectral imagery owing to the higher information level stored in the hyperspectral images [14]. Accordingly, hyperspectral remote sensing (HRS) technology has been widely used in SMC monitoring research due to its advantages of large area, non-contact, and timeliness, making up for the shortcomings of traditional methods [15]. HRS can be used for large-scale non-destructive monitoring by analyzing the spectral variation characteristics of different soil properties, which is more suitable for assessing and mapping the spatial variation of soil properties [16]. As a robust stoichiometric means, soil spectroscopy has been proven to be an effective alternative to wet chemistry in soil environmental quality monitoring [17]. However, there are obvious spectral noise and serious scattering phenomena in the original soil spectral data obtained by HRS [18]. There is inevitably noise unrelated to SMC in the soil hyperspectral, which will increase the detection difficulty of SMC. In addition, HRS contains huge amounts of data. Therefore, more thorough denoising and variable optimization become the key to establishing a model with higher accuracy [19].

In the aspect of hyperspectral data preprocessing, many studies have been carried out, such as reciprocal, logarithm, and first differential studies [20–22]. Because the soil spectral curve is the comprehensive expression of the interaction and superposition of various substances, the determination of characteristic bands is not only difficult, but also has a high degree of uncertainty and weak denoising. Subsequently, scholars used spectral denoising methods to process hyperspectral data, such as Savitzky–Golay filtering, median operation, moving average, etc. However, for white noise, especially random and low-frequency signals, these methods are difficult to remove noise without affecting the effective signal [23]. The wavelet packet transform (WPT) can compress the signal while retaining the original information and has been gradually used in the estimation of soil properties and achieved certain results. For example, Gu et al. found that the high-frequency coefficient generated by wavelet transform and random forest algorithm can be used to invert soil organic matter content [24]. Given the above spectral pretreatment technologies, some new methods for estimating SMC still need to be explored.

In the study of SMC estimation, the estimation accuracy of SMC depends on the selection of characteristic variables and the estimation model. At present, there are two kinds of models for estimating soil composition based on soil spectral properties: the physical model based on mechanism information and the statistical model based on experience. In the mechanism model method, the quantitative change mechanism of soil reflectance caused by different water content is very complex, and its inversion effect and adaptability of results are limited [8]. However, the widely used statistical model has the advantages of being simple and direct and can obtain accurate and stable results. At present, the estimation of soil characteristics by soil spectra mostly adopts stepwise multilinear regression [25,26], principal component regression [27], neural network regression [16,28], support vector machine regression [17,29], and partial least squares regression [30,31]. The relationship between SMC and soil hyperspectral is complex and has great nonlinearity and randomness. Its spectral characteristics are difficult to be explained by several bands. Therefore, the simple regression model has certain deficiencies in dealing with nonlinear, heteroscedasticity, multicollinearity, and other complex problems, and it is difficult to obtain good estimation accuracy [32]. In SMC estimation, these methods inevitably lead to missing or redundant information, which directly affects the results. There is a need to explore approaches that can overcome these obstacles, such as machine learning. The neural network model has a strong nonlinear approximation ability, can effectively establish the global nonlinear mapping relationship between input and output [33–35], and

has advantages in data fitting, function approximation, and other aspects [36,37]. Good results have been achieved by using the neural network model to estimate soil composition. For example, Pellegrini et al. obtained satisfactory results by using the artificial neural network in estimating soil microbial biomass [16].

In this paper, the hyperspectral data of different types of soils were measured to analyze the variation trend of reflectance with different SMC. Meanwhile, some spectral processing technologies and PCA were employed to extract characteristics variables for estimating SMC of mixed soils. This non-destructive estimation technique is simple, fast, and time efficient. Finally, the PCR, PLSR, and back propagation (BP) regression models were constructed and compared with the spectral-index models. The use of machine learning makes full use of its nonlinear learning characteristics to achieve accurate estimation of SMC under different conditions. Our objectives are (1) to compare the role of characteristic parameters obtained by different spectral processing techniques in estimating SMC, (2) to compare the performance of different regression algorithms in estimating SMC, and (3) to compare the importance of the selection of characteristic variables with the selection of regression models and to construct the SMC high-precision prediction model suitable for mixed soil scenarios.

#### **2. Materials and Methods**

#### *2.1. Study Area*

The study area is located in Hengshan County, Northern Shaanxi Province, China. As shown in Figure 1, the sampling areas are located in the Loess Plateau of Northern Shaanxi, adjacent to the Mu Us Desert in the north and the Loess hill in the south. The region has a temperate semi-arid continental monsoon climate with a year-round average daily temperature of 8.6◦, and the general characteristics of temperature and rainfall are large inter-annual and inter-monthly variations. The soil types mainly include sandy and loessial soil (SS and LS). The sampling points of different soil types are evenly distributed in the whole study area as far as possible. The main tributaries in the area include the Wuding River, Dali River, etc. Due to these geographical factors, the experimental area is not only rich in soil types, but also has great differences in SMC, which is of great significance for the study of SMC estimation.

**Figure 1.** Study region and soil sampling area (the blue dots show the sampling areas).

#### *2.2. Soil Spectral Measurement*

The collected soil samples are quickly measured for spectral data in the laboratory. The soil spectral reflectance was measured using the ASD Field Spec FR spectrometer (Analytical Spectral Devices, Inc., Boulder, CO, USA), with a wavelength range of 350–2500 nm. The soil samples were placed in a black vessel (with a diameter of 8 cm and a depth of 2 cm) in turn, and their surface was scraped flat. A 50 W halogen lamp was used as the light source, and the distance between the light source and the experimental sample is 0.5 m. The distance between the spectrometer probe and the soil sample was 0.2 m. Before each spectral measurement, the diffuse reflection standard reference plate was used for calibration. Four spectral curves were collected for each soil sample, and their arithmetic mean value was taken as the spectral data of the soil sample.

#### *2.3. Determination of SMC*

To obtain more accurate and regionally representative SMC data, the destructive sampling approach was recommended [38]. The areas with flat terrain, exposed surface, and no shelter were selected as the sampling areas. About 20 sampling points were determined in total in the sampling areas (Figure 1). In addition, different soil types were considered in sampling, and a total of 84 soil samples were collected. The soil samples were collected from the surface soil with a depth of 0.2 m. They were brought back to the laboratory through aluminum boxes to avoid water evaporation. The soil samples placed in the aluminum box were dried in the oven (105 ◦C) until the weight did not change, and the SMC was measured by the drying method. The calculation formula is as follows:

$$\text{SMC} = \frac{M\_1 - M\_2}{M\_2 - M\_3} \times 100\% \tag{1}$$

where *M*<sup>1</sup> is the total weight of the aluminum box and soil before drying, *M*<sup>2</sup> is the total weight of the aluminum box and soil sample after drying, and *M*<sup>3</sup> is the weight of each aluminum box after drying.

#### *2.4. Spectral Indices Construction*

Since the strong absorption of water leads to changes in reflectance, spectral indices with some physical significance calculated from the reflectance of different bands have been proposed for predicting SMC. Due to the unambiguous physical significance, some spectral indices have been proposed to predict SMC. However, these parameters inevitably remain somewhat regional and generalized. To compare with the method presented in this study, we selected some common two- and three-band spectral indices (Table 1).


**Table 1.** The common spectral indices selected in this paper.

#### *2.5. Spectral Processing Technologies*

Spectral preprocessing is very useful for feature extraction and noise removal [30]. For example, WPT can perform a more detailed decomposition and reconstruction of high and low-frequency information of hyperspectral data [19]. This information processing result has no redundancy or omission, which is more conducive to spectral information noise reduction and original information retention, so it is widely used. In this research, the decomposition and reconstruction of the spectral data by WPT were performed according to the following steps.

(i) Wavelet packet analysis. The wavelet master function used in the study was Db10 [42], by which the noise-bearing spectra were decomposed.


Spectral measurements are susceptible to factors, such as observation angle and illumination, making the signal-to-noise ratio of spectral data comparatively poor. After differential processing, not only can the influence of changes in illumination conditions on the target spectra be reduced, but also the background can be partially eliminated, thus better strengthening the spectral variance and highlighting the target characteristics. The first-order differential (FOD) treatment can improve the spectral sensitivity and eliminate the influence of the partial environmental background to reveal the spectral characteristics of the soil interior. The FOD was calculated as follows.

$$\text{Ref}'(\lambda\_i) = \left[ \text{Ref}(\lambda\_{i+1}) - \text{Ref}(\lambda\_{i-1}) \right] / (\lambda\_{i+1} - \lambda\_{i-1}),\tag{2}$$

where λ*i*−1, λ*i*, and λ*i*+1 are the wavelengths of adjacent bands and Ref is the first-order differential value.

However, none of these traditional methods can obtain robust and noiseless characteristic variables. Harmonic decomposition (HD) transforms hyperspectral data from the time domain to the frequency domain in the form of sine and cosine phase superposition, and finally obtains parameters such as residual term, amplitude, and phase. The calculation method is shown in Figure 2. These variables can reveal the average value and variation of the energy, and the position of the maximum value in different bands of the spectra.


**Figure 2.** Illustration of the harmonic decomposition algorithm in pseudo-code.

#### *2.6. Model Construction and Validation*

After the correlated characteristic variables (WF and WFH) were obtained by spectral processing technologies (WPT, FOD, and HD), they need to be dimensionally reduced to remove redundancy. The principal component analysis (PCA) method can recombine original variables into a group of new comprehensive variables unrelated to each other to achieve feature extraction and dimension reduction [43]. When performing PCA, the components whose cumulative variance contribution rate exceeds 95% of the variable is taken as the new characteristic variable in this study.

It is very important to determine the regression model based on the relationship between independent and dependent variables for accurate estimation of SMC. Principal component regression (PCR) is one of the common methods to solve the problem of collinearity in logistic regression analysis [44]. It integrates the information of variables with high correlation into the principal component with low correlation through principal component transformation and then replaces the original variable to participate in regression calculation. Partial least squares regression (PLSR) is more commonly used as a linear multiple regression analysis method [45]. By analyzing the relationship between the prediction matrix X (independent variable) and the response matrix Y (dependent variable), the initial input data are projected into a potential space, and then many potential variables are extracted by using orthogonal structure, and the linear relationship between these new variables and Y is found. This method does not directly consider the regression modeling of the dependent variable and independent variable, but comprehensively screens the information in the variable system, and selects several new components with the best explanatory ability for the system for regression modeling. Through such information screening, the noise that has no explanatory effect on the dependent variable is eliminated. Backpropagation (BP) neural network is a widely used nonlinear modeling method in the artificial neural network, which is suitable for data prediction [46]. The learning process is composed of forwarding propagation and backpropagation. In the forward propagation process, input data are gradually processed from the input layer to the output layer through the hidden layer. If the data error obtained by the output layer is not within the allowed range, the error is backpropagated and the weight of each neuron is adjusted layer by layer by the gradient descent method. Until the error meets the specified requirements, it has a better estimation effect for complex nonlinear prediction. In this paper, we choose these three methods to conduct regression modeling for spectral characteristic parameters and SMC and compare their advantages and disadvantages.

Hyperspectral estimation of SMC based on spectral processing technologies and PCA mainly includes the following four steps (Figure 3):


$$\text{RMSE} = \sqrt{\sum\_{i=1}^{n} \frac{\left(y\_i - \hat{y}\_i\right)^2}{n}},\tag{3}$$

$$\text{MAE} = \frac{1}{n} \sum\_{i=1}^{n} \left| y\_i - \hat{y\_i} \right|, \tag{4}$$

where *yi* is the true value, *y*ˆ*<sup>i</sup>* is the predicted value, and n is the number of samples.


**Table 2.** Descriptive statistics of SMC in soils.

**Figure 3.** Flowchart indicating experimental methodology (WPT: wavelet packet transform; FOD: firstorder differential; HD: harmonic decomposition; WFP: WPT-FOD-PCA; WFHP: WPT-FOD-HD-PCA).

#### **3. Results**

#### *3.1. Comparison of Hyperspectral Characteristics of Soils with Different SMC*

Some spectral curves over the whole moisture content range were randomly selected for comparison. Hyperspectral curves of different soil types (LS, SS, and MS) are shown in Figure 4. The spectral curves of different soil types have similar shapes and the absorption characteristics of water at 1450 nm and 1960 nm dominate the spectral characteristic curves of soil. For LS, the reflectance of all observation bands generally decreases with the increase of SMC (Figure 4a). However, for SS and MS, the variation of spectral reflectance with SMC does not show a consistent variation law (Figure 4b,c). For these three different soil types, the sensitivity of spectral reflectance to SMC is low in visible and near-infrared bands, and the change is more obvious in other bands. Moreover, the characteristic of mineral absorption at 2200 nm is obvious when SMC is low but disappears gradually with the increase of SMC.

**Figure 4.** Hyperspectral curves of different soil types: (**a**) LS; (**b**) SS; (**c**) MS.

#### *3.2. Estimation of SMC by Conventional Spectral Indices*

The SMC data in the calibration set were adopted as the dependent variables, and five commonly available spectral indices (EVI, TVI, DSI, NDMI, and SARVI) were applied as independent variables to construct the inverse models using linear regression and the PLSR method, and the validation results were shown in Figure 5. The results showed that the selected spectral indices had limited accuracy in predicting the SMC of mixed soil types. Except for TVI, the remaining four indices exhibited varying degrees of overestimation or underestimation at different SMC. Although TVI did not demonstrate overestimation or underestimation (the regression line was close to the 1:1 line), the model errors were large and the points deviating from the 1:1 line were more clustered. Compared with the individual spectral indices inversion results, the PLSR model based on five indices had a higher accuracy (R<sup>2</sup> over 0.8 and error below 4%). In addition, the model did not exhibit local overestimation or underestimation.

#### *3.3. Correlation Analysis between Spectral Data and SMC*

The correlation analysis between the original spectral data and the processed data of the original spectra (including WPT, FOD, and HD) and SMC was performed. The results are shown in Figure 6. SS, LS, and MS indicate the Pearson correlation coefficient (r) between the original spectra of different soil types (LS, SS, and MS) and the corresponding SMC. WO and WF represent the correlation between the WPT of original spectral data and FOD after WPT and SMC, respectively. The original spectral reflectance of SS is highly correlated with SMC except for the visible bands (|r| > 0.6, *p* < 0.01). The correlation between LS and SMC becomes much weaker (|r| < 0.5, *p* < 0.01). For MS, the correlation is between SS and LS (about 0.5, *p* < 0.01). Therefore, for the estimation of SMC of MS, parameters with higher correlation need to be extracted. Compared with the original spectra, WO does not significantly improve the correlation with SMC. Although WF cannot improve the correlation with SMC in all bands, it can produce parameters with a strong

correlation in many characteristic bands. Finally, 180 characteristic bands were selected from WF data with |r| > 0.6 to estimate SMC.

**Figure 5.** The comparison of measured and estimated SMC: (**a**) EVI; (**b**) TVI; (**c**) DSI; (**d**) NDMI; (**e**) SARVI; (**f**) PLSR model of five spectral indices.

**Figure 6.** The Pearson correlation coefficient between spectra and SMC.

#### *3.4. Harmonic Characteristic Parameter Acquisition*

The feature parameters of harmonic spectra (WFH: remainder, amplitude, and phase) were acquired by decomposing the selected WF data of MS. The correlation between these extracted components and SMC was computed. To keep consistent with the number of characteristic parameters of the selected WF data, the number of harmonic decompositions was determined to be 180. Figure 7 demonstrates the correlation between harmonic characteristic parameters and SMC of MS.

**Figure 7.** The Pearson correlation coefficient between harmonic characteristic parameters and SMC.

The result of correlation analysis reveals that the variables at the beginning and end of the decomposition numbers have a strong correlation with SMC (|r| close to 0.8, *p* < 0.01). The figure is roughly symmetrical in the center. Furthermore, the correlation coefficient shows a periodic change of alternating positive and negative values. Except for the beginning and the end, the correlation between other characteristic parameters close to the middle and SMC is weak (|r| < 0.5, *p* < 0.01). Since the correlation of characteristic parameters is periodic, half of the parameters (A0/2, Ah=1,2,4, Bh=1,2,3, C h=1,2,3, and ϕh=1) with high correlation with SMC (|r| > 0.7) were selected.

#### *3.5. Dimension Reduction of Characteristic Parameters Based on PCA*

After extracting the characteristic parameters through a series of spectral processing techniques (including WPT, FOD, and HD), WF and WFH data were obtained. Since many relevant characteristic parameters are included (180 of WF and 11 of WFH), it is necessary to simplify these parameters. To reduce the redundancy of variables and the input of the models, WF and WFH were processed by the PCA method, and the first five variables of the PCA results (PCA1-5) were chosen as the input characteristic variables of the SMC estimation models. The results of PCA are shown in Table 3.

**Table 3.** The PCA results in eigenvalue and variance contribution rate.


It turns out that the contribution rates of cumulative variance of the first five principal components of WF and WFH were 95.915% and 99.147%, respectively. The PCA performance of WFH data is better than that of WF data. PCA1-5 of WFH data roughly includes the harmonic characteristic variable information before processing, which not only retains a large amount of original data information, but also effectively compresses the original data. According to all PCA results, two characteristic variables were established: WFP (PCA of WF) and WFHP (PCA of WFH).

#### *3.6. SMC Estimation and Model Validation Using Spectral Processing Technologies and Harmonic Indicators*

Three regression estimation models (PCR, BP, and PLSR) were selected to explore the validity of characteristic variables and the accuracy of the soil moisture estimation models. Based on the modeling of WFP and WFHP, six SMC prediction models were established: WFP-PCR, WFHP-PCR, WFP-BP, WFHP-BP, WFP-PLSR, and WFHP-PLSR. For the BP neural network model, the topology of the model was finally determined as 5-3-1 after several debugging. That is, the number of nodes in the input layer is 5, the number of hidden layers is 5, and the output result layer is 1. Meanwhile, the times of iterations, adaptive learning rate, momentum factor, and the learning error were set as 3000, 0.01, 0.9, and 0.001, respectively. The precision and error of the modeling set and validation set are shown in Table 4. The WFHP has better performance than WFP for the PCR, PLSR, and BP models in calibration and validation datasets. For the same regression model, BP neural network has the highest accuracy than PCR and PLSR. In all similar models, the accuracy of the validation set is slightly lower than that of the modeling set.

To further observe the effect of different variables and different methods on the estimation of different SMC, the scatter diagram of the estimated and measured value of SMC in the validation dataset is shown in Figure 8. Each row represents different regression models of similar characteristic variables (WFP or WFHP), and each column represents the same regression model of different characteristic variables (PCR, PLSR, or BP). The red dotted line indicates the 1:1 line. It can be found that the WFP-based models are prone to underestimation when the SMC exceeds 10% (below the 1:1 line), while the WFHP-based models can accurately estimate SMC in the whole range (almost overlaps with the 1:1 line). For the same characteristic variable, the effect of PLSR and BP is significantly better than that of PCR (closer to the 1:1 line).



**Figure 8.** The comparison of measured and estimated SMC: (**a**) WFP-PCR; (**b**) WFP-PLSR; (**c**) WFP-BP; (**d**) WFHP-PCR; (**e**) WFHP-PLSR; (**f**) WFHP-BP.

Compared with the traditional spectral indices prediction results (Figure 5), the validation accuracy of all models, except the WFP-PCR model, was higher with an error below 3%

(Table 4 and Figure 8). This indicated that there was a great potential for spectral variables based on spectral processing techniques upon SMC estimation for mixed soil types.

#### **4. Discussion**

Traditional soil moisture measurements using neutron scattering, drying method, and resistance method have been part of many agricultural studies [47–49]. While these measurements provide accurate results, they are tedious, time consuming, and laborious, making it difficult to scale in large areas [50]. Compared with traditional soil moisture monitoring methods, remote sensing has incomparable advantages such as a large area and being a macroscopic, real-time, and dynamic method [30]. The hyperspectral sensor can detect the subtle changes in surface characteristics, and hyperspectral quantitative inversion provides an effective technical means for dynamic monitoring of regional SMC [9,19]. However, obtaining the best characteristic variables of SMC estimation of mixed soil types has always been difficult in modeling. In SMC estimation, the original soil spectral reflectance data contain much noise and a lot of redundant information, which cannot be used directly to estimate SMC. There are many differences in spectral characteristics of different soil types. For example, in SS spectral analysis, the reflectance of all bands decreases with the increase of SMC overall (Figure 4), showing a strong negative correlation (Figure 5). In LS, except for SMC, the variation rule of reflectance is not obvious due to the difference in organic matter content, grain size distribution, mineral composition, and soil color [51], thus reducing the correlation with SMC. However, the small content of these substances in SS has a small impact on reflectance. Therefore, it is difficult to establish a general SMC estimation model. In most cases, it is necessary to carry out the spectral transformation on the original soil spectral reflection data, such as reciprocal, logarithm, FOD, etc. to extract characteristic bands or parameters to obtain feature variables [52]. However, these methods have a low noise reduction function and cannot deal with data background and noise well, which directly affects the accuracy of subsequent estimation.

In this paper, the results of several traditional spectral indices for estimating SMC showed that both univariate linear regression models and multivariate PLSR models had significant errors. Therefore, it is necessary to explore the variables and methods for SMC estimation in mixed soil types.

Through correlation analysis, it can be found that the correlation between WF and SMC is significantly higher than that of the original spectra and SMC (Figure 6). It shows that the FOD spectra can eliminate some effects of background and atmosphere, but still cannot achieve satisfactory results. In this paper, the HD method was adopted. The soil spectra were converted to frequency spectra to obtain harmonic characteristic parameters based on Fourier transform theory to effectively reduce the uncertainty of spectral parameter calculation. Furthermore, harmonic parameters can better reflect soil spectral changes caused by subtle changes in soil internal components. Compared with traditional spectral parameters, harmonic characteristic parameters (remainder, amplitude, and phase) are more correlated with SMC (Figure 7). Finally, 11 harmonic characteristic parameters with high correlation (|r| > 0.7) were selected. Based on the FOD and HD, the PCA method was applied to reduce the dimensionality of data and two kinds of feature parameters were gained: WFP and WFHP.

In parameter estimation studies using empirical models, PLSR, BP, and PCR all showed good effects [16,28,30]. To explore the applicability of the two types of characteristic parameters extracted in this paper (WFP and WFHP), these three models were used for comparison of estimation. The results show that WFPH is superior to WFP in SMC estimation in these three models (Table 4 and Figure 8). When selecting the same characteristic parameters (WFP or WFHP), the effects of PLSR and BP models are significantly better than PCR. The advantage of the PLSR model is that it can strengthen the error convergence ability of the model when the sample size is not particularly sufficient, while BP is a nonlinear distribution that can better reflect SMC and is mainly good at nonlinear prediction. Soil spectra are a comprehensive reflection of various soil properties, and the selection of estimation

model alone cannot effectively solve the problem of accurate estimation of SMC. Therefore, it is necessary to explore some common and stable characteristic parameters to establish a more robust and suitable SMC inversion model. The harmonic characteristic parameters constructed in this paper can transform complex signals in the time domain into simplified signals in the frequency domain, which can not only suppress or eliminate ground object background noise, but also highlight the spectral characteristics of the ground object with low order harmonic components to achieve the effect of data compression. Therefore, the SMC prediction ability of the three models (BP, PLSR, and PCR) was effectively improved. Moreover, the advantage of harmonic variables in predicting SMC also reflects that they can accurately predict different SMC, including low and high values, while WFP parameters are underestimated at high values of SMC (Figure 8).

To further check the performance of the optimal model (WFHP-BP) in this paper for SMC estimation in different soil types, the validation models for single soil types are shown in Figure 9. It can be found that the estimation accuracy of SMC is better than that of mixed soil types in both SS and LS, and neither of them shows local overestimation and underestimation. This may be because single soil types are more consistent physically or chemically and thus receive less interference from other factors. Since the BP neural network model has a nonlinear learning capability, the estimated values of SMC for different soil types did not appear to be overestimated or underestimated.

**Figure 9.** The comparison of measured and estimated SMC of different types of soil.

This study provided effective parameters and methods for nondestructive estimation of SMC in mixed soil types, and future research should be devoted to using satellite imagery as an alternative to ground-based measurements because of its large area, economy, time savings, and high temporal resolution, which can provide a data source for real-time field SMC mapping.

#### **5. Conclusions**

In this paper, a feature extraction method based on spectral processing technologies (WPT, FOD, and HD) and PCA was proposed, and three regression prediction methods (PCR, PLSR, and BP) were combined to compare the accuracy and applicability of SMC estimation for mixed soil. It is observed that for SS with less impurity, the variation of spectral reflectance can well describe the difference in SMC. However, for LS and MS, the spectral reflectance cannot be directly used to predict the SMC due to the influence of organic matter content, grain size distribution, mineral composition, and soil color. After WPT and FOD transformation using the original spectral data, two sets of data can be obtained after HD: WF and WFH. Meanwhile, the PCA method was utilized to reduce the dimensionality of these two datasets to obtain two sets of characteristic parameters: WFP and WFHP. The results of three regression models (WFP-PCR, WFHP-PCR, WFP-PLSR, WFHP-PLSR, WFP-BP, and WFHP-BP) indicated that the WFHP-based models showed better performance than that of WFP-based models. Among the different regression methods, BP neural network has the highest accuracy as a result of its nonlinear prediction ability. The best prediction model is WFHP-BP (R<sup>2</sup> = 0.932, RMSE = 2.311, MAE = 1.834 for the validation dataset). Moreover, harmonic variables have advantages in predicting SMC values in a larger range. This study can provide a theoretical basis and technical support for establishing SMC inversion models suitable for various types and a large range of soils. Future research should focus more on the use of satellite remote sensing data and on proposing physical or chemical indicators of soils that are more suitable for SMC estimation.

**Author Contributions:** Conceptualization, S.L. and X.J.; methodology, X.J.; software, S.L.; validation, Q.Y.; writing—original draft preparation, X.J. and S.L.; writing—review and editing, X.L.; visualization, W.J.; funding acquisition, Q.Y. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the National Natural Science Foundation of China (Grant No.52078350), the National Key Research and Development Program of China (Grant No.2018YFB0505400), and the Natural Science Foundation of Shanghai (Grant No.22ZR1465700).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


MDPI St. Alban-Anlage 66 4052 Basel Switzerland www.mdpi.com

*Agriculture* Editorial Office E-mail: agriculture@mdpi.com www.mdpi.com/journal/agriculture

Disclaimer/Publisher's Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Academic Open Access Publishing

mdpi.com ISBN 978-3-0365-9799-7