2.1. Overview of the Study Area
The test area was located in the ecological unmanned farm of Shandong University of Technology, Zhutai Town, Linzi District, Zibo City, Shandong Province (36°57′15″ N, 118°12′50″ E, Figure 1). The altitude of the test site is about 27 m. The site is dominated by plains and has flat terrain. The area has a temperate, semi-humid, continental monsoon climate, with an annual average temperature of about 13.2 °C, an average annual rainfall of 650~800 mm, about 2100 annual sunshine hours, and an annual effective accumulated temperature of about 2600 °C. These conditions are suitable for the growth of crops such as maize and wheat (the average accumulated temperature and precipitation data in this study are derived from https://www.cma.gov.cn/ and cover 1 January 1981 to 1 January 2010).
The two-year trial took place on different plots in the same area, and three maize varieties were used. In 2020, Jinyangguang 6 was used for the single-variety test, and five 1 m × 1 m sampling plots were set up using the five-point sampling method. In 2021, Jinyangguang 6, Chunyu 985 and Nongxing 207 were used for the multi-variety experiment, with three replicates per variety, and five 1 m × 1 m sampling plots were set up in each replicate using the equidistant sampling method. Machine sowing was used in both years. The maize row spacing was 60 cm, and the plant spacing was 22.5 cm. Organic fertilizer and compound fertilizer were applied as base fertilizers before sowing. Unified field management was applied throughout the experiment.
2.2. Multi-Source Data Collection and Preprocessing of Summer Maize
To ensure the validity of the model, we obtained multi-source remote sensing data from the two plots, covering multiple periods and multiple varieties. Multi-source data for the summer maize horn stage (20 July), tasseling stage (9 August) and bubbling stage (21 August) were collected by ground sensors and UAV. In total, we acquired hyperspectral and multispectral data, visible light data, leaf area data, relative chlorophyll content data and field accumulated temperature data for the study area.
The Yaxin1242 leaf area meter (Beijing Yaxin Liyi Technology Co., Ltd., Beijing, China, Figure 2a) can quickly measure parameters such as crop leaf area and perimeter. It is convenient and fast, and does not require calibration before use. Therefore, we selected the Yaxin1242 to measure the true value of the leaf area. When collecting the test data, two well-growing plants were selected from each test plot as the target plants. According to its total number of leaves, each plant was divided into upper, middle and lower layers, and a healthy leaf was selected from each layer as the collection object (Figure 2b). Each leaf was measured three times with the Yaxin1242 leaf area meter, and the values were recorded in the sampling record sheet. We also recorded the total number of plants in the sample area, the total number of leaves, and the GPS information of the sampling plants. The GPS information was collected with an XAG XRTK3 (Guangzhou Jifei Technology Co., Ltd., Guangzhou, China), a device based on GNSS-RTK. In use, it connects to the base station via the cloud and has a measurement accuracy of 1~2 cm, so the position of each ground calibration point can be recorded accurately.
A PSR1100-f non-imaging hyperspectral measuring instrument (Spectral Evolution, Haverhill, MA, USA, Figure 2c) was used to collect hyperspectral data of the maize at the sampling points. Its measurement range was 320~1100 nm, and the sampling interval was 1.5 nm. Using the format conversion tool (SED-to-CSV converter), the data can be resampled to a 1 nm resolution and stored directly as a .csv file. The collection time was from 11:00 to 14:00. Before collection, a standard white plate was used for calibration to remove the influence of the dark current. The sampling plants were selected in the same way as those for leaf area collection. During collection, the probe was held 0.5 m from the crop canopy and perpendicular to the ground. Each sampling point was measured five times, and the average of the spectral curves was used as the value of that point.
The SPAD502 Plus (Konica Minolta, Japan, Figure 2d) measures relative chlorophyll content from the light transmittance of leaves. The instrument is easy to carry and use, and its measurements are accurate and reliable, so it was used to measure chlorophyll content in the field. The trial was conducted over two years, and the test area was the same in both years. For each measurement, the same maize plants as in the leaf area measurement were selected, and each leaf was measured five times. We used the average value as the SPAD value of the leaf.
The MS600 Pro multispectral camera (Yusense, Inc., Qingdao, China, Figure 2e) contains six single-band channels. The center wavelengths and spectral resolutions of the channels were 450 nm@35 nm, 555 nm@25 nm, 660 nm@… nm, 710 nm@10 nm, 840 nm@30 nm and 940 nm. The pixel resolution was 1280 × 960, and the storage format was .tif. The UAV multispectral data were acquired at 10:00–12:00 on the same day as the ground sampling, under cloud-free and windless conditions. The flying height was 70 m, the flight speed was 4 m/s, the heading overlap was 80% and the lateral overlap was 70%. The camera was triggered by timed exposure. Using timed exposure, we also acquired two sets of standard whiteboard images, before and after each flight.
We used the DJI Phantom 4 RTK (DJI-Innovations Inc., Shenzhen, China, Figure 2f) to obtain high-precision visible light data in the sampling area. Its camera (model DJI FC6310R) has a 1-inch CMOS sensor and stores visible light images in .jpg format at a pixel resolution of 5472 × 3648. High-precision point cloud data were then generated from the UAV data.
All the hyperspectral data were smoothed and filtered using locally weighted regression (Lowess), and the first derivative of the original spectrum was also calculated. Then, Pix4Dmapper was used to stitch the visible light and multispectral data to obtain multispectral raster images and DSM data for each period. The LAI data of each sampling area were obtained by Formula (1).
where $\mathrm{LAI}_j$ is the leaf area index of each plot; $j$ is the plot number ($j$ = 1, 2, 3, 4, …); $i$ is the leaf number ($i$ = 1, 2, 3, 4, …); $N_i$ is the total number of leaves in each layer; $S_i$ is the mean area of the representative leaves of each layer (m²); $n$ is the total number of plants in the sample point; and $S$ is the plot area (m²). A total of 150 datasets were obtained over the two years (each dataset includes the hyperspectral, multispectral, plant height, accumulated temperature, LAI and SPAD data of one sampling point). Finally, we compared the spectral curve of each sampling point with the original RGB image, the data of adjacent sampling points and the other data. This showed that, for eight sampling points, the spectral curve was abnormally high or low even though the crop variety and growing conditions were consistent, or the spectral curve had too many jagged peaks. Therefore, these eight spectral curves were eliminated, and 142 sets of valid data were finally obtained.
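The LAI calculation referred to as Formula (1) can be illustrated with a minimal Python sketch. It assumes the form $\mathrm{LAI}_j = n \sum_i N_i S_i / S$, which is what the symbol definitions suggest but is not confirmed as the exact expression used here; it also assumes that $i$ runs over the upper, middle and lower leaf layers. The function name and the example values are hypothetical.

```python
def leaf_area_index(leaves_per_layer, mean_leaf_area_m2, n_plants, plot_area_m2):
    """Assumed form of Formula (1): LAI = n * sum(N_i * S_i) / S.

    leaves_per_layer  -- N_i, number of leaves in each layer of one plant
    mean_leaf_area_m2 -- S_i, mean area of the representative leaf per layer (m^2)
    n_plants          -- n, number of plants in the sampling plot
    plot_area_m2      -- S, ground area of the sampling plot (m^2)
    """
    if len(leaves_per_layer) != len(mean_leaf_area_m2):
        raise ValueError("one leaf count and one mean area are needed per layer")
    if plot_area_m2 <= 0:
        raise ValueError("plot area must be positive")
    # Leaf area of one plant: leaves in each layer times that layer's mean leaf area.
    area_per_plant = sum(n_i * s_i for n_i, s_i in zip(leaves_per_layer, mean_leaf_area_m2))
    # Scale to all plants in the plot and normalise by ground area.
    return n_plants * area_per_plant / plot_area_m2


# Hypothetical 1 m x 1 m plot with 7 plants and three leaf layers per plant.
lai = leaf_area_index([4, 5, 4], [0.03, 0.05, 0.04], n_plants=7, plot_area_m2=1.0)
print(round(lai, 2))
```

Keeping the per-layer counts and areas as separate inputs mirrors how the field sheet records one representative leaf per layer.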
2.3. Extraction of Multi-Source Sensitive Features of Summer Maize
The effective plant height was calculated using the method developed by Niu et al. [23]. Since the land in this area is relatively flat, the bare ground height can be calculated directly from the DSM image. First, we randomly sampled 20 groups of bare-ground values from the DSM image of each period using Python and the Geospatial Data Abstraction Library (GDAL). Then, we took their average as the true bare-ground elevation. Finally, we subtracted this value from the DSM value of the plants to obtain the effective plant height (canopy height model, CHM). Gao et al. extracted the plant height of wheat by subtracting the bare-ground DSM value from the UAV DSM data, which demonstrated the reliability of the method [24]. Comparing the plant height measured at the sampling points with the plant height calculated by the above method, we found a good correlation between the measured plant height and the CHM, with a regression $R^2$ above 0.8. This indicates that the CHM extracted from the remote sensing images by the above method is highly accurate.
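The bare-ground subtraction described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the processing script used in the study: the DSM is represented as a nested list that has already been read from the raster (the GDAL reading step is omitted), the list of bare-ground cells is supplied by hand, and the function names are hypothetical.

```python
import random
from statistics import mean


def bare_ground_height(dsm, bare_cells, n_samples=20, seed=0):
    """Average the DSM over randomly drawn bare-ground cells (row, col)."""
    rng = random.Random(seed)
    picks = rng.sample(bare_cells, min(n_samples, len(bare_cells)))
    return mean(dsm[r][c] for r, c in picks)


def canopy_height_model(dsm, ground_height):
    """Subtract the bare-ground elevation from every DSM cell."""
    return [[z - ground_height for z in row] for row in dsm]


# Hypothetical 2 x 3 DSM (m); the first three cells listed are bare soil.
dsm = [[27.00, 27.10, 29.20],
       [27.05, 29.00, 29.40]]
bare_cells = [(0, 0), (0, 1), (1, 0)]

ground = bare_ground_height(dsm, bare_cells)
chm = canopy_height_model(dsm, ground)
print(round(ground, 2), round(chm[0][2], 2))
```

Using a single averaged ground value is only reasonable because the terrain is flat; on sloping land a ground surface model would be needed instead.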
The effective accumulated temperature (growing degree days, GDD) is the sum of the effective temperatures over a given crop growth period, where the effective temperature is the difference between the daily temperature and the lower-limit temperature for crop growth. Generally, the lower-limit temperature for maize growth in North China is 8~10 °C [25,26,27], so this study set the lower-limit temperature of maize to 10 °C. We obtained the daily temperature of each period from the weather station located in the field, and used the following formula to calculate the accumulated temperature:
where $\mathrm{GDD}_i$ is the effective accumulated temperature at each stage (°C); $T_i$ is the daily average temperature during the growth period (°C); $T$ is the lower-limit temperature of maize (°C); and $n$ is the number of growing days.
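As a minimal sketch, and assuming the formula is the plain sum $\sum_{i=1}^{n} (T_i - T)$ implied by the definitions above, the accumulated temperature can be computed as follows. The function name and the example temperatures are hypothetical, and no clamping of days colder than the lower limit is applied because the text does not mention it.

```python
def growing_degree_days(daily_mean_temps_c, base_temp_c=10.0):
    """Sum of (daily mean temperature - lower-limit temperature) over the period.

    Assumption: days below the lower limit are NOT clamped to zero,
    because the source text describes a plain sum of differences.
    """
    return sum(t - base_temp_c for t in daily_mean_temps_c)


# Four hypothetical daily mean temperatures (deg C) from the field weather station.
gdd = growing_degree_days([24.5, 26.0, 27.5, 25.0])
print(gdd)  # 63.0
```

The default of 10 °C matches the lower-limit temperature selected in this study; passing a different `base_temp_c` covers the 8~10 °C range cited for North China.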
For the multispectral UAV data, based on previous studies [12,18,19,28,29,30,31,32,33,34,35], this study calculated six vegetation indices related to chlorophyll content: NDVI, RVI, GNDVI, DVI, SAVI and RDVI. We also calculated ten vegetation indices related to leaf area index: WDRVI, GRVI, NDVI, RVI, PVI, PBI, EVI, OSAVI, MSR and TGDVI. All the vegetation index calculation equations are shown in Table 1. Then, we screened the LAI- and SPAD-sensitive vegetation indices.
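To make the index calculation concrete, the sketch below computes the six chlorophyll-related indices from band reflectances using their widely used textbook definitions. Table 1 remains the authoritative source for the equations in this study; the band assignment (green, red, near-infrared), the soil adjustment factor of 0.5 for SAVI, and the function name are assumptions made for illustration.

```python
import math


def vegetation_indices(green, red, nir, soil_factor=0.5):
    """Common definitions of six indices from surface reflectance values (0-1)."""
    return {
        "NDVI": (nir - red) / (nir + red),
        "RVI": nir / red,
        "GNDVI": (nir - green) / (nir + green),
        "DVI": nir - red,
        # soil_factor is the SAVI adjustment term L; 0.5 is an assumed default.
        "SAVI": (1 + soil_factor) * (nir - red) / (nir + red + soil_factor),
        "RDVI": (nir - red) / math.sqrt(nir + red),
    }


# Hypothetical canopy reflectances for one sampling point.
vi = vegetation_indices(green=0.10, red=0.05, nir=0.45)
for name, value in vi.items():
    print(name, round(value, 3))
```

The same function can be applied pixel by pixel to the stitched multispectral raster before the sampling point values are extracted.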
Hyperspectral data can express detailed information, but they contain considerable redundancy. Therefore, based on previous studies, this study used correlation analysis to screen the acquired hyperspectral data, selecting the features that were highly correlated with LAI and SPAD as sensitive features. These include the red edge position $\lambda_r$, the position of the highest point of near-infrared reflectance $\lambda_{nir}$, the position of the yellow edge $\lambda_y$, the first-order differential value in the yellow edge $D_y$, the first-order differential value in the blue edge $D_b$, the first-order differential value in the red edge $D_r$, the green peak reflectance $R_g$ and the red valley reflectance $R_r$ [36,37]. The definition of each spectral feature is shown in Table 2. The sensitive features were further optimized to build the inversion model.
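The edge features listed above are derived from the first derivative of the spectrum. The sketch below shows one way to obtain the red edge position and amplitude, assuming the common convention that the red edge is the maximum of the first derivative within 680~760 nm; Table 2 gives the definitions actually used, so the window limits, the function names and the synthetic spectrum are all illustrative assumptions.

```python
import math


def first_derivative(wavelengths, reflectance):
    """Central finite differences; one-sided differences at the two ends."""
    n = len(wavelengths)
    deriv = []
    for k in range(n):
        lo, hi = max(k - 1, 0), min(k + 1, n - 1)
        deriv.append((reflectance[hi] - reflectance[lo]) / (wavelengths[hi] - wavelengths[lo]))
    return deriv


def edge_feature(wavelengths, deriv, start_nm, end_nm):
    """Return (position, amplitude) of the largest first derivative in a window."""
    in_window = [(d, w) for w, d in zip(wavelengths, deriv) if start_nm <= w <= end_nm]
    amplitude, position = max(in_window)
    return position, amplitude


# Synthetic spectrum on a 1 nm grid with a smooth red edge centred at 720 nm.
wavelengths = list(range(650, 801))
reflectance = [0.05 + 0.45 / (1.0 + math.exp(-(w - 720) / 10.0)) for w in wavelengths]

deriv = first_derivative(wavelengths, reflectance)
red_edge_position, red_edge_amplitude = edge_feature(wavelengths, deriv, 680, 760)
print(red_edge_position, round(red_edge_amplitude, 4))
```

Calling `edge_feature` with other window limits would give the blue edge and yellow edge features in the same way.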
Using the UAV visible light data and the GPS information of the sampling area, sampling point vector files were constructed with Python and GDAL. By comparison, we found that 120 pixels can cover the entire plant at a sampling point, so each vector feature was set as a rectangle containing 120 pixels. This method was used to extract the vegetation index data of the sampling points in each period.
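A simplified version of this extraction step is sketched below. It assumes a north-up raster described by a GDAL-style six-element geotransform without rotation, a 10 × 12 pixel rectangle (one possible shape for 120 pixels), and averaging as the way the pixels are aggregated; none of these details are stated in the text, and the function names and coordinates are hypothetical.

```python
import math


def world_to_pixel(x, y, geotransform):
    """Convert map coordinates to (row, col) for a north-up geotransform."""
    origin_x, pixel_w, _, origin_y, _, pixel_h = geotransform
    col = math.floor((x - origin_x) / pixel_w)
    row = math.floor((y - origin_y) / pixel_h)  # pixel_h is negative for north-up rasters
    return row, col


def window_mean(raster, row, col, n_rows=10, n_cols=12):
    """Mean of the raster over an n_rows x n_cols rectangle around (row, col)."""
    top, left = row - n_rows // 2, col - n_cols // 2
    values = [raster[r][c]
              for r in range(top, top + n_rows)
              for c in range(left, left + n_cols)
              if 0 <= r < len(raster) and 0 <= c < len(raster[0])]
    return sum(values) / len(values)


# Hypothetical 20 x 20 index raster whose value equals its row number.
raster = [[float(r) for _ in range(20)] for r in range(20)]
geotransform = (500000.0, 0.5, 0.0, 4090000.0, 0.0, -0.5)

row, col = world_to_pixel(500005.2, 4089994.9, geotransform)
mean_value = window_mean(raster, row, col)
print(row, col, mean_value)
```

Cells that fall outside the raster are skipped, so rectangles near the image border are averaged over fewer than 120 pixels.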
Correlation analysis effectively expresses the closeness of the relationship between variables and targets, so the LAI- and SPAD-sensitive features were selected by correlation analysis. When the p value is less than 0.05, we consider the correlation between the parameters to be significant. We also used the correlation coefficient (R value) to grade the correlation. It is generally considered that, for significant parameters, an R value between 0.3 and 0.5 indicates a weak correlation, an R value between 0.5 and 0.8 indicates a moderate correlation, and an R value above 0.8 indicates a high correlation. Therefore, parameters with a correlation coefficient greater than 0.5 are usually selected to build regression models.
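The grading rule above can be expressed as a short Python sketch. It computes the Pearson correlation coefficient and assigns the grade from its absolute value, which is an assumption so that negative correlations are graded in the same way. The significance test (p < 0.05) is omitted here and would need a statistics library. The function names, the label used for values below 0.3, and the example data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_grade(r):
    """Grade |r| using the 0.3 / 0.5 / 0.8 thresholds given in the text."""
    magnitude = abs(r)
    if magnitude > 0.8:
        return "high"
    if magnitude >= 0.5:
        return "moderate"
    if magnitude >= 0.3:
        return "weak"
    return "not graded"  # below the ranges named in the text


# Hypothetical feature values and LAI measurements for five sampling points.
feature = [1.0, 2.0, 3.0, 4.0, 5.0]
lai_values = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(feature, lai_values)
print(round(r, 3), correlation_grade(r))
```

Under the selection rule described above, a feature would be kept for regression when its grade is "moderate" or "high" and its correlation is significant.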
2.4. Construction and Evaluation Method of Summer Maize Growth Parameter Inversion Model
Although deep learning models can achieve good inversion accuracy, their training time and equipment requirements are high, and their generalization is poor. Therefore, machine learning models and statistical linear regression models were selected in this study to obtain inversion models with good generalization ability and accuracy.
Multiple linear regression (MLR) explains one dependent variable through multiple independent variables, and its predictions are more realistic than those obtained with a single parameter [28]. In this paper, we used environmental data, multispectral data, hyperspectral data and multi-source data to construct the model. To express the influence of each parameter in detail, we used the forward selection method to create the LAI and SPAD inversion models. Starting from an empty model, the forward method introduces variables one at a time and uses the F value of the model at a given significance level (α = 0.25) to decide whether each variable should enter. During model testing, the model $R^2$ is calculated after each variable is introduced. After all eligible variables have been introduced, the model with the largest $R^2$ is output as the final model.
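The forward procedure can be sketched as follows. This is an illustrative implementation under stated assumptions rather than the software used in the study: the entry test compares a partial F statistic with a fixed critical value `f_enter` (about 1.3 is assumed here as an approximation of the α = 0.25 level for samples of this size) instead of computing a p value, and all function names and the synthetic data are hypothetical.

```python
import random


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[pivot] = m[pivot], m[c]
        for r in range(c + 1, n):
            factor = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= factor * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def residual_sum_of_squares(X, y, cols):
    """Least-squares fit of y on an intercept plus the chosen columns."""
    rows = [[1.0] + [row[c] for c in cols] for row in X]
    p = len(cols) + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    beta = solve_linear(xtx, xty)
    return sum((t - sum(b * v for b, v in zip(beta, r))) ** 2 for r, t in zip(rows, y))


def forward_select(X, y, f_enter=1.3):
    """Add the best remaining variable while its partial F reaches f_enter."""
    selected, remaining = [], list(range(len(X[0])))
    rss_current = residual_sum_of_squares(X, y, selected)
    while remaining:
        best = min(remaining, key=lambda c: residual_sum_of_squares(X, y, selected + [c]))
        rss_new = residual_sum_of_squares(X, y, selected + [best])
        dof = len(y) - len(selected) - 2  # residual degrees of freedom after adding the term
        partial_f = (rss_current - rss_new) / (rss_new / dof)
        if partial_f < f_enter:
            break
        selected.append(best)
        remaining.remove(best)
        rss_current = rss_new
    return selected


# Synthetic data: y depends on columns 0 and 2; column 1 is unrelated.
rng = random.Random(0)
X = [[rng.random() for _ in range(3)] for _ in range(60)]
y = [3.0 * row[0] + row[2] + rng.gauss(0.0, 0.1) for row in X]
selected = forward_select(X, y)
print(selected)
```

Because each added variable can only lower the residual sum of squares, the last model produced by this loop is also the one with the largest $R^2$ among the models visited.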
Random forest regression (RF) is based on decision trees. By voting over or averaging the outputs of many weakly correlated decision trees, the model combines weak learners into a strong one, and the combined output is taken as the final result [29]. This method adapts well to different data, trains quickly and resists overfitting, and it is often used for data classification and regression. In this study, the number of RF model iterations was set to 200 and the step size to 1. The depth of the decision trees was set to 3, which gave the best training effect while avoiding unnecessary training time. Then, environmental data, multispectral data, hyperspectral data and multi-source data were used as input parameters to construct the RF inversion model.
Partial least-squares regression (PLSR) is widely used in various fields because of its reliability and adaptability in multivariate data processing. The method is based on principal component analysis and principal component regression, and it handles multicollinear variables well. The regression model is mainly used to predict target changes [38]. In this research, we used the leave-one-out method for cross-validation. Then, the inversion models of LAI and SPAD were constructed based on the environmental data, multispectral data, hyperspectral data and multi-source data.
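The leave-one-out scheme can be illustrated independently of the regression method. In the sketch below a one-variable least-squares line stands in for PLSR purely to keep the example short, so it shows only the cross-validation loop and not the PLSR model itself; the function names and data are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares line; a simple stand-in for the PLSR model."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    return slope, mean_y - slope * mean_x


def leave_one_out_predictions(x, y):
    """Predict each sample from a model trained on all the other samples."""
    predictions = []
    for k in range(len(x)):
        x_train = x[:k] + x[k + 1:]
        y_train = y[:k] + y[k + 1:]
        slope, intercept = fit_line(x_train, y_train)
        predictions.append(slope * x[k] + intercept)
    return predictions


# Hypothetical, exactly linear data: every held-out point is recovered.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.0, 5.0, 7.0, 9.0, 11.0]
loo = leave_one_out_predictions(x, y)
print([round(p, 6) for p in loo])
```

With 142 valid samples, this scheme trains 142 models, each evaluated on the single sample that was held out.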
In this paper, LAI and SPAD inversion models of summer maize were constructed with the above methods by combining the sensitive features of maize in each period. Models were built on both single-source data and fused multi-source data. Model accuracy was evaluated using the root-mean-square error (RMSE) and the coefficient of determination ($R^2$), and the LAI and SPAD prediction effects were analyzed through the generated prescription map.
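For reference, the two accuracy metrics can be computed as in the sketch below. It assumes the usual definitions, with $R^2 = 1 - SS_{res}/SS_{tot}$; if the squared correlation coefficient was used instead, the values can differ slightly. The function names and the example numbers are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical measured and predicted LAI values.
observed = [2.0, 3.0, 4.0, 5.0]
predicted = [2.5, 2.5, 4.5, 4.5]
print(rmse(observed, predicted), r_squared(observed, predicted))
```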