#### 2.3.2. Calculation of Disease Severity

At the ear scale, disease severity is mainly quantified as the ratio of the diseased area of the ear to the whole ear area. Therefore, Fusarium head blight lesions were segmented from the whole ear to measure the relative lesion area. First, the third channel of the original red green blue (RGB) image (Figure 3a) was binarized and then processed with morphological erosion and dilation to remove the wheat tip and stalk from the image (Figure 3b).
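The tip and stalk removal step can be sketched as follows. This is a minimal NumPy sketch, not the authors' code. It assumes an 8-bit H×W×3 array, treats pixels whose third-channel value exceeds a hypothetical `threshold` as ear pixels, and implements erosion followed by dilation (a morphological opening) with a square structuring element of hypothetical width `size`. Structures thinner than the element, such as a tip or stalk, disappear during erosion and are not restored by dilation, while the ear body is.

```python
import numpy as np


def binary_erode(mask: np.ndarray, size: int = 3) -> np.ndarray:
    """Binary erosion with a size x size square structuring element."""
    mask = np.asarray(mask, dtype=bool)
    r = size // 2
    padded = np.pad(mask, r, mode="constant", constant_values=False)
    h, w = mask.shape
    out = np.ones_like(mask)
    # A pixel survives only if every pixel under the element is foreground.
    for dy in range(size):
        for dx in range(size):
            out &= padded[dy:dy + h, dx:dx + w]
    return out


def binary_dilate(mask: np.ndarray, size: int = 3) -> np.ndarray:
    """Binary dilation with a size x size square structuring element."""
    mask = np.asarray(mask, dtype=bool)
    r = size // 2
    padded = np.pad(mask, r, mode="constant", constant_values=False)
    h, w = mask.shape
    out = np.zeros_like(mask)
    # A pixel is set if any pixel under the element is foreground.
    for dy in range(size):
        for dx in range(size):
            out |= padded[dy:dy + h, dx:dx + w]
    return out


def remove_tip_and_stalk(rgb: np.ndarray, threshold: float = 100, size: int = 3) -> np.ndarray:
    """Return an ear mask: binarize channel index 2, then erode and dilate.

    `threshold` and `size` are hypothetical values to be tuned per dataset.
    """
    channel = np.asarray(rgb)[..., 2].astype(float)
    mask = channel > threshold
    return binary_dilate(binary_erode(mask, size), size)
```

The structuring element size sets the width of structures that are removed; in practice it would be chosen larger than the stalk width in pixels.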

**Figure 3.** Extraction of diseased spots from wheat ears: (**a**) original image; (**b**) image after removal of the wheat tip and stalk; (**c**) image of the extracted diseased spots.

In the RGB image, the three components are represented in a three-dimensional Cartesian coordinate system. They are highly correlated and non-uniform, so the difference between healthy and diseased areas is small and difficult to segment. The YDbDr color space separates brightness (Y) from the two color-difference components (Db and Dr), which makes it more suitable for distinguishing green healthy areas from red-yellow diseased areas. Therefore, the RGB image obtained after wheat tip and stalk removal was converted to the YDbDr color space, and a threshold segmentation method was applied to extract the disease spots on the ear (Figure 3c).
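The color conversion and threshold segmentation can be sketched as below, using the standard SECAM YDbDr coefficients. The text does not state which component or threshold the authors used; thresholding Dr at a hypothetical `dr_threshold` is an assumption. With these coefficients, pure green gives a positive Dr (1.116) while yellow and red give negative values (-0.217 and -1.333), so lesion pixels are taken as ear pixels with Dr below the threshold.

```python
import numpy as np

# RGB -> YDbDr transform (SECAM coefficients), applied to RGB scaled to [0, 1].
RGB_TO_YDBDR = np.array([
    [0.299, 0.587, 0.114],     # Y  (brightness)
    [-0.450, -0.883, 1.333],   # Db (blue color difference)
    [-1.333, 1.116, 0.217],    # Dr (red color difference)
])


def rgb_to_ydbdr(rgb: np.ndarray) -> np.ndarray:
    """Convert an H x W x 3 8-bit RGB image to an H x W x 3 YDbDr image."""
    rgb = np.asarray(rgb, dtype=float) / 255.0
    # Each pixel row vector p is mapped to M @ p.
    return rgb @ RGB_TO_YDBDR.T


def segment_lesions(rgb: np.ndarray, ear_mask: np.ndarray, dr_threshold: float = 0.0) -> np.ndarray:
    """Mark ear pixels whose Dr falls below a hypothetical threshold as lesions."""
    dr = rgb_to_ydbdr(rgb)[..., 2]
    return np.asarray(ear_mask, dtype=bool) & (dr < dr_threshold)
```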

The severity of FHB is expressed as the ratio of the number of pixels in the disease spot region to the number of pixels in the whole wheat ear region, as shown in Equation (2):

$$SI = \frac{S_{\text{lesion}}}{S_{\text{all}}} \tag{2}$$

where *SI* represents the severity of FHB, *S*<sub>lesion</sub> is the number of pixels in the disease spot area, and *S*<sub>all</sub> is the number of pixels in the whole wheat ear region.
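Given boolean masks for the ear and the lesions, Equation (2) reduces to a ratio of pixel counts. The sketch below is illustrative; restricting lesion pixels to the ear region is an assumption that guards against spurious detections outside the ear.

```python
import numpy as np


def severity_index(lesion_mask: np.ndarray, ear_mask: np.ndarray) -> float:
    """SI = S_lesion / S_all, counted in pixels (Equation 2)."""
    lesion_mask = np.asarray(lesion_mask, dtype=bool)
    ear_mask = np.asarray(ear_mask, dtype=bool)
    s_all = np.count_nonzero(ear_mask)
    if s_all == 0:
        raise ValueError("ear mask is empty")
    # Only lesion pixels that lie inside the ear region are counted.
    s_lesion = np.count_nonzero(lesion_mask & ear_mask)
    return s_lesion / s_all
```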

#### 2.3.3. Characteristic Band Selection

The RF algorithm was used to select the characteristic wavelengths that are sensitive to FHB. RF, proposed by Breiman [16], is an ensemble learning algorithm based on multiple classification and regression trees (CARTs) and is often used to select characteristic wavelengths in hyperspectral data analysis [17,18]. In this algorithm, bootstrap resampling is used to generate the training set of each tree; the splitting attribute at each node is chosen according to the minimum Gini index principle, and each CART is grown step by step. The class of a sample is then determined by a vote over all decision trees. Samples that do not appear in a tree's training set are designated as "out-of-bag" (OOB) data and are used to estimate the accuracy of the algorithm.
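The bootstrap and out-of-bag split for a single tree can be illustrated with a short sketch; the function name is hypothetical. Drawing n indices with replacement leaves, on average, a fraction (1 - 1/n)<sup>n</sup> ≈ e<sup>-1</sup> ≈ 0.368 of the samples out of the bag.

```python
import numpy as np


def bootstrap_with_oob(n_samples: int, rng: np.random.Generator):
    """Draw one bootstrap training set and return (in-bag indices, OOB indices)."""
    # Sample n indices with replacement: some samples repeat, others are left out.
    in_bag = rng.integers(0, n_samples, size=n_samples)
    # Samples never drawn form the out-of-bag set used to estimate accuracy.
    oob = np.setdiff1d(np.arange(n_samples), in_bag)
    return in_bag, oob
```

Repeating this once per tree yields the *t* training datasets referred to in Equation (6).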

The Gini index is an impurity-based attribute splitting criterion. The smaller the impurity, the more homogeneous the class labels within a node and the more information is obtained [19]. The Gini impurity index *G* is calculated with Equation (3):

$$G(a) = 1 - \sum_{i=1}^{c} P_i^2 \tag{3}$$

where *c* is the number of sample categories and *P*<sub>i</sub> is the probability that a sample corresponding to attribute *a* belongs to category *c*<sub>i</sub> (the *i*-th category).
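Equation (3) can be evaluated directly from the class labels of the samples reaching a node, with each *P*<sub>i</sub> estimated as a class frequency. This is an illustrative sketch; the label strings are hypothetical.

```python
from collections import Counter


def gini_impurity(labels) -> float:
    """G = 1 - sum_i P_i^2, with P_i estimated as the class frequency (Equation 3)."""
    n = len(labels)
    if n == 0:
        raise ValueError("no samples")
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())
```

A node with an even mix of two classes has impurity 0.5, and a pure node has impurity 0.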

Because the Gini impurity index is negatively related to the available information, this study used the Gini purity index, which equals 1 − *G*(*a*) and is positively related to the useful information, to reflect more intuitively the contribution of each feature to classification. It is calculated as follows:

$$G_{\text{purity}}(a) = \sum_{i=1}^{c} P_i^2 \tag{4}$$

Using Equation (4), the Gini purity index of feature *f* is obtained as follows:

$$G(f) = \sum_{i=1}^{k} \frac{n_i}{N} G_{\text{purity}}(a_i) \tag{5}$$

where *N* is the number of samples, *k* is the number of categories of attribute *a*, *a*<sub>i</sub> is the *i*-th category of the attribute, and *n*<sub>i</sub> is the number of samples in that category. The greater the purity of a feature, the stronger its ability to discriminate between samples. The importance of a feature is measured as follows:
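Equations (4) and (5) can be sketched together: samples are grouped by attribute category, the Gini purity of each group is computed, and the purities are weighted by group size. For a continuous reflectance value, the categories would in practice be the branches of a threshold split; that interpretation, and the function names, are assumptions rather than details given in the text.

```python
from collections import Counter, defaultdict


def gini_purity(labels) -> float:
    """G_purity = sum_i P_i^2 (Equation 4)."""
    n = len(labels)
    return sum((count / n) ** 2 for count in Counter(labels).values())


def feature_purity(attribute_values, labels) -> float:
    """G(f) = sum_i (n_i / N) * G_purity(a_i) over the categories a_i (Equation 5)."""
    groups = defaultdict(list)
    for value, label in zip(attribute_values, labels):
        groups[value].append(label)
    n_total = len(labels)
    return sum(len(g) / n_total * gini_purity(g) for g in groups.values())
```

A feature that separates the classes perfectly scores 1.0, while one that leaves each group evenly mixed between two classes scores 0.5.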

$$S(v) = \frac{1}{t} \sum_{u=1}^{t} G(f_{uv}) \tag{6}$$

where *t* is the number of training datasets (one bootstrap sample per tree) in the RF, *G*(*f*<sub>uv</sub>) is the purity of the *v*-th feature dimension in the *u*-th training dataset (*v* = 1, 2, 3, ..., *k*), and *k* is the overall dimension of the sample. Finally, the required characteristic wavelengths were selected according to the positive maximum and negative minimum values of the importance score.
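Equation (6) averages each feature's purity over the *t* trees. The sketch below assumes the per-tree purities are already available as a *t* × *k* array; the ranking step simply returns the highest-scoring wavelengths, which is a simplification of the selection rule described in the text, and the wavelength values are hypothetical. scikit-learn's `RandomForestClassifier.feature_importances_` (mean decrease in impurity) is a related but not identical measure.

```python
import numpy as np


def importance_scores(purity_per_tree) -> np.ndarray:
    """S(v) = (1/t) * sum_u G(f_uv); row u holds the purities from tree u (Equation 6)."""
    return np.asarray(purity_per_tree, dtype=float).mean(axis=0)


def top_wavelengths(wavelengths, scores, n: int = 2) -> np.ndarray:
    """Return the n wavelengths with the highest importance score (simplified rule)."""
    order = np.argsort(scores)[::-1][:n]
    return np.asarray(wavelengths)[order]
```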

#### *2.4. Construction of the Proposed New Spectral Disease Index for Identifying Wheat FHB*

Previous studies [6,20] have shown that disease spectral indices in the normalized wavelength difference form are very sensitive to spectral changes caused by powdery mildew, stripe rust, and aphids. Therefore, this study combined the normalized wavelength difference form with the characteristic wavelengths to construct a dedicated FDI for each growth stage, calculated with Equation (7):

$$\text{FDI} = \frac{R_{\lambda_1} - R_{\lambda_2}}{R_{\lambda_1} + R_{\lambda_2}} \tag{7}$$

where *R*<sub>λ1</sub> and *R*<sub>λ2</sub> represent the reflectance at wavelengths λ<sub>1</sub> and λ<sub>2</sub>, respectively.
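Equation (7) can be computed from a sampled reflectance spectrum as below. Picking the band nearest to each target wavelength is an assumption about how λ<sub>1</sub> and λ<sub>2</sub> map onto the instrument's band centers, and the wavelengths in any call are placeholders, not the characteristic wavelengths found in this study.

```python
import numpy as np


def reflectance_at(wavelengths, reflectance, target_nm: float) -> float:
    """Reflectance of the band whose center is nearest to target_nm."""
    idx = int(np.argmin(np.abs(np.asarray(wavelengths, dtype=float) - target_nm)))
    return float(np.asarray(reflectance, dtype=float)[idx])


def fdi(wavelengths, reflectance, lambda1: float, lambda2: float) -> float:
    """Normalized difference (R_l1 - R_l2) / (R_l1 + R_l2) (Equation 7)."""
    r1 = reflectance_at(wavelengths, reflectance, lambda1)
    r2 = reflectance_at(wavelengths, reflectance, lambda2)
    return (r1 - r2) / (r1 + r2)
```

The index is bounded in [-1, 1] for non-negative reflectance, which keeps values comparable across samples with different overall brightness.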

#### *2.5. Traditional Spectral Indices for Wheat FHB Detection*

Pigment content provides information about the physiological state of leaves; consequently, spectral indices that characterize plant pigment content are closely related to physiological and biochemical changes in plants and are often used for non-destructive detection of plant diseases and insect pests. Sixteen commonly used spectral indices (Table 1) were selected and compared with the proposed FDI to evaluate its ability to identify and distinguish infected ears.


**Table 1.** Traditional spectral indices tested in the study.

#### *2.6. Linear Regression Model and Validation*

A linear regression model was used to describe the relationship between the spectral indices (FDI and the existing spectral indices) and the severity index (SI) at different growth stages. The model was evaluated with the root mean square error (RMSE) and the coefficient of determination (*R*<sup>2</sup>). RMSE is the square root of the mean squared difference between the predicted and measured values. *R*<sup>2</sup> measures the proportion of the variation in the dependent variable that is explained by the independent variable; the closer *R*<sup>2</sup> is to 1, the closer the regression line is to the observations and the better the fit.
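Both evaluation metrics can be written in a few lines. This sketch uses the common definition *R*<sup>2</sup> = 1 − SS<sub>res</sub>/SS<sub>tot</sub>; the text does not specify which *R*<sup>2</sup> formula was used for the test dataset, so that choice is an assumption.

```python
import numpy as np


def rmse(measured, predicted) -> float:
    """Square root of the mean squared difference between measured and predicted values."""
    m = np.asarray(measured, dtype=float)
    p = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((m - p) ** 2)))


def r_squared(measured, predicted) -> float:
    """R^2 = 1 - SS_res / SS_tot (proportion of variance explained)."""
    m = np.asarray(measured, dtype=float)
    p = np.asarray(predicted, dtype=float)
    ss_res = np.sum((m - p) ** 2)
    ss_tot = np.sum((m - m.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)
```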

To distribute the samples evenly, they were sorted by SI in descending order and divided into a training dataset and a test dataset at a 3:1 ratio: from each consecutive group of four samples, one was assigned to the test dataset and the remaining three to the training dataset.

In the model, FDI was the independent variable and SI the dependent variable, and the relationship between them at each growth stage was determined by regression analysis. For a single growth stage, the FDI of each training sample was calculated and a linear regression equation between FDI and the corresponding SI was established, giving the *R*<sup>2</sup> and RMSE of the training dataset. The SI of each test sample was then predicted from its FDI with this equation, and the *R*<sup>2</sup> and RMSE of the test (prediction) dataset were obtained by comparing the measured SI with the predicted SI. In addition, the samples of the combined stage were modeled and predicted, and the regression equation of the combined stage was used to predict the test samples of the late flowering and early filling stages.
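The split and fit procedure can be sketched as follows. The text does not say which sample of each group of four goes to the test dataset; the hypothetical `test_position` parameter (default: the first) makes that choice explicit. `np.polyfit` with `deg=1` performs an ordinary least-squares line fit.

```python
import numpy as np


def split_train_test(si, test_position: int = 0):
    """Sort by SI (descending); every 4th sample, offset by test_position, goes to test."""
    order = np.argsort(np.asarray(si, dtype=float))[::-1]
    is_test = np.arange(order.size) % 4 == test_position
    return order[~is_test], order[is_test]


def fit_and_predict(fdi, si, train_idx, test_idx):
    """Fit SI = slope * FDI + intercept on training samples and predict test SI."""
    fdi = np.asarray(fdi, dtype=float)
    si = np.asarray(si, dtype=float)
    slope, intercept = np.polyfit(fdi[train_idx], si[train_idx], deg=1)
    return slope, intercept, slope * fdi[test_idx] + intercept
```

Sorting before the systematic 3:1 split keeps both datasets spread across the full range of SI, rather than leaving that to chance as a random split would.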
