3.2.1. Terahertz Time-Domain Spectral Analysis

THz time-domain spectroscopy yields the mean power spectrum and the mean absorbance spectrum of the samples. Figure 5e shows the mean power spectra of the four tomato leaf mildew grades over 0.1–2.0 THz, with clear absorption peaks at approximately 0.43 THz and 1.27 THz and a faint absorption peak at approximately 0.53 THz. Figure 5c shows the mean absorbance curves of the four grades over the same range, with a clear absorption peak at approximately 0.79 THz. For the grade 3 leaves, a relatively clear absorption peak was also observed at approximately 1.89 THz. However, the leaves of the other three grades did not show this peak, indicating that it may be an artifact caused by the equipment itself and hence should not be directly interpreted as an absorption peak of the sample. The identification of each sample should therefore be achieved by mathematical modeling.

**Figure 7.** Running process of the genetic algorithm. (**a**) Selected times of each wavelength point during genetic iteration, (**b**) schematic diagram of characteristic bands screened by the genetic algorithm.

Figure 8 shows the THz frequency-domain images at 0.4 THz derived from the data distribution. The difference between the diseased and healthy areas of the leaves is reflected in the color, which encodes the magnitude of the frequency-domain values. This indicates that the processed THz feature image can reveal changes in the crop visually.
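As an illustration of how a single-frequency image such as the one at 0.4 THz can be obtained, the following minimal sketch takes a raster-scanned data cube with one time-domain trace per pixel, applies a Fourier transform along the time axis, and keeps the amplitude at the bin closest to the target frequency. The function name `frequency_image`, the cube layout, and the uniform sampling interval `dt` are assumptions made for this example and do not describe the instrument software used in this study.

```python
import numpy as np

def frequency_image(cube, dt, f_target):
    """Amplitude image at the FFT bin closest to f_target.

    cube     : array (rows, cols, n_samples), one time-domain trace per pixel
               (assumed layout for this sketch)
    dt       : sampling interval in seconds (assumed uniform)
    f_target : frequency of interest in Hz
    """
    cube = np.asarray(cube, dtype=float)
    # Real-input FFT along the time axis of every pixel.
    spectrum = np.fft.rfft(cube, axis=-1)
    freqs = np.fft.rfftfreq(cube.shape[-1], d=dt)
    # Index of the frequency bin nearest to the requested frequency.
    k = int(np.argmin(np.abs(freqs - f_target)))
    return np.abs(spectrum[..., k]), float(freqs[k])
```

The returned amplitude map can then be rendered with any color map; the second return value reports the frequency of the bin actually used, which may differ slightly from the requested one depending on the scan length.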

**Figure 8.** Terahertz images of tomato leaves with different disease grades.

3.2.2. Screening of the Terahertz Time-Domain Spectrum Characteristic Frequency Band

PCA forms principal components as linear combinations of the original spectral bands, and the characteristic wavelengths can be determined from the absolute values of the principal component loadings. A loading is the correlation coefficient between a principal component and an original wavelength variable, and reflects how closely the principal component is related to that variable [21]. The loading curves of the first three principal components of the tomato leaf mildew samples are shown in Figure 9. The absolute values of the loadings were large at the peaks and troughs of each principal component curve, and the corresponding wavelengths were taken as the characteristic wavelengths. After smoothing the power spectrum, five characteristic wavelengths were obtained in this way: 0.413 THz, 0.752 THz, 1.394 THz, 1.457 THz, and 1.622 THz. Using the same method, six characteristic wavelengths were obtained from the smoothed absorbance spectrum: 0.249 THz, 0.567 THz, 0.813 THz, 1.243 THz, 1.771 THz, and 1.892 THz.
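The loading-based screening described above can be sketched as follows. This is a minimal example under stated assumptions, not the processing code used in this study: the loadings are taken as the unit-norm principal axes from an SVD of the mean-centered spectra (which differ from correlation-type loadings only by a per-component scaling), the smoothing step is omitted, and the helper names `pca_loadings` and `loading_extrema` are hypothetical.

```python
import numpy as np

def pca_loadings(X, n_components=3):
    """Loadings (n_components, n_features) and explained-variance ratios.

    X : array (n_samples, n_features), one spectrum per row.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                  # mean-center each band
    # Rows of Vt are the principal axes of the centered data.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    ratio = s**2 / np.sum(s**2)
    return Vt[:n_components], ratio[:n_components]

def loading_extrema(freqs, loading):
    """(frequency, loading) pairs at interior peaks and troughs of a curve."""
    loading = np.asarray(loading, dtype=float)
    d = np.diff(loading)
    # A sign change of the first difference marks a local extremum.
    idx = np.where(d[:-1] * d[1:] < 0)[0] + 1
    return [(float(freqs[i]), float(loading[i])) for i in idx]
```

In practice the candidate list would be ranked by absolute loading and only the most prominent peaks and troughs kept; on noisy spectra, smoothing before the extremum search avoids a large number of spurious candidates.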

**Figure 9.** Loading curves of the first three principal components of tomato leaf mildew samples. (**a**) absorbance dimension, (**b**) power dimension.

To further compare the visualized images at different frequencies, THz frequency-domain imaging was performed at the five characteristic frequencies of the power spectrum, as shown in Figure 10. The images of the samples were relatively distinct at 0.413 THz, 0.752 THz, and 1.394 THz. At 0.413 THz, the image of the sample was the clearest and the recognition effect was the best. At 1.457 THz and 1.622 THz, however, the sample images became blurred.

**Figure 10.** Terahertz time-domain spectral characteristic image.

The PCA method was used to establish identification models for the different tomato leaf mildew grades in the power spectrum dimension and the absorbance dimension of the THz time-domain spectrum. Table 2 shows the PCA results for the spectral data in both dimensions after preprocessing with the SG smoothing algorithm. As shown in Table 2, the cumulative variance contribution of the first two principal components (PC1 and PC2) to the grade variable of tomato leaf mildew was above 85% [22]. Hence, PC1 and PC2 were selected for the analysis.
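The 85% criterion can be illustrated with a short sketch that computes the cumulative variance contribution of the principal components and returns the smallest number of components reaching a given threshold. The function name `n_components_for` and the default threshold argument are assumptions made for this example.

```python
import numpy as np

def n_components_for(X, threshold=0.85):
    """Smallest number of PCs whose cumulative variance reaches threshold.

    X : array (n_samples, n_features), one spectrum per row.
    Returns (k, cumulative), where cumulative[i] is the variance fraction
    explained by the first i + 1 components.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    # Squared singular values are proportional to the component variances.
    s = np.linalg.svd(Xc, compute_uv=False)
    cumulative = np.cumsum(s**2 / np.sum(s**2))
    # First index at which the cumulative fraction is >= threshold.
    k = int(np.searchsorted(cumulative, threshold)) + 1
    return min(k, len(cumulative)), cumulative
```

A result of `k = 2` corresponds to the situation reported here, in which PC1 and PC2 together exceed the threshold while PC1 alone does not.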

**Table 2.** Prediction accuracy under each model.


Figure 11 shows that the confidence ellipses of the absorbance data for the different grades of tomato leaf mildew were intertwined, with a discrimination rate of 19.8%. This is because the recognition rate of the grade 1 tomato leaves was 84.9%, while the recognition rates of the leaves classed as grades 0, 3, and 5 were lower. The confidence ellipses of the power spectrum data for the different grades were also intertwined, with a discrimination rate of 24.7%. These results show that the recognition rate of tomato leaf mildew using SG smoothing preprocessing combined with the PCA model was low, and that the PCA method could not fully mine the spectral information of tomato leaves with different disease grades. Hence, other algorithms are needed to build models with higher prediction accuracy.

**Figure 11.** Scatter diagram of tomato leaf mildew sample distribution. (**a**) absorbance scatter, (**b**) power scatter.

#### *3.3. Single-Model Analysis*

After the GA and PCA algorithms were used to reduce the dimension of the data and screen the characteristic variables, a prediction model of tomato leaf mildew disease was developed from the screened feature variables using the BPNN method. Before the model was established, PCA was carried out and the principal component score vectors were extracted to form the input for pattern recognition. During model training, the number of principal component variables affects both the accuracy and the stability of the model. Too few principal component factors lead to excessive loss of information and reduce the accuracy of the model. Too many introduce an excessive amount of redundant information, which both degrades the robustness of the model and lengthens the data processing time [22]. Therefore, it is important to select an appropriate number of principal component factors when establishing the model.
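To make the structure of such a model concrete, the following is a minimal sketch of a one-hidden-layer backpropagation network trained on principal component scores. It is an illustrative stand-in rather than the network used in this study: it assumes a tanh hidden layer, a softmax output with cross-entropy loss, and full-batch gradient descent, and all names and hyperparameter values (`train_bpnn`, `predict_bpnn`, `n_hidden`, `lr`, `epochs`) are hypothetical. Class labels are assumed to be integer indices, so grades such as 0, 1, 3, and 5 would first be mapped to 0 to 3.

```python
import numpy as np

def train_bpnn(X, y, n_hidden=8, n_classes=4, lr=0.1, epochs=1000, seed=0):
    """Train a one-hidden-layer network by backpropagation.

    X : array (n_samples, n_features), e.g. principal component scores
    y : integer class indices in [0, n_classes)
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.5, (d, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.5, (n_hidden, n_classes))
    b2 = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]                      # one-hot targets
    for _ in range(epochs):
        # Forward pass.
        H = np.tanh(X @ W1 + b1)
        Z = H @ W2 + b2
        Z = Z - Z.max(axis=1, keepdims=True)      # numerical stability
        P = np.exp(Z)
        P = P / P.sum(axis=1, keepdims=True)      # softmax probabilities
        # Backward pass for the mean cross-entropy loss.
        dZ = (P - Y) / n
        dW2, db2 = H.T @ dZ, dZ.sum(axis=0)
        dA = (dZ @ W2.T) * (1.0 - H**2)           # derivative of tanh
        dW1, db1 = X.T @ dA, dA.sum(axis=0)
        # Gradient descent update.
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2
    return W1, b1, W2, b2

def predict_bpnn(params, X):
    """Predicted class index for each row of X."""
    W1, b1, W2, b2 = params
    return np.argmax(np.tanh(X @ W1 + b1) @ W2 + b2, axis=1)
```

Repeating the training with the first 1, 2, 3, ... principal component scores as input and recording the training and prediction recognition rates is one way to examine how the number of principal component factors affects the model.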

Figure 12 shows the recognition results of BPNN model training and prediction for different numbers of principal component factors. Initially, the recognition rates of the training and prediction sets generally increased with the number of principal component factors. Once the number of principal component factors reached 7, the recognition rates stabilized and then even decreased moderately.

**Figure 12.** Recognition results of training and prediction under different principal component factors.

Figure 13a shows the BPNN performance graph, in which the minimum MSE was 0.6792. Figure 13b shows the BPNN training status graph, in which the actual number of training iterations was 189. Figure 13c–e shows the BPNN regression analysis graphs. When the classification index of a test sample falls within the threshold of the training set classification index, the recognition result is correct; otherwise, the classification is incorrect. The regression fit of the proposed model was R = 0.9367 for the near-infrared hyperspectral data, R = 0.9573 for the THz absorbance dimension, and R = 0.9431 for the THz power spectrum dimension. Based on the actual and predicted classification diagrams of all the test sets, the BPNN model was able to identify almost all tomato leaves with leaf mildew.

To evaluate the detection accuracy of the model, the recognition results were assessed with the recognition accuracy *P*, an indicator used to measure the detection signal-to-noise ratio; that is, the percentage of 'correct' detection results among all detection results. It is calculated as follows [23]:

$$P = \frac{\mathit{TP}}{\mathit{TP} + \mathit{FP}} \tag{6}$$

where *TP* represents the correctly identified tomato leaf mildew samples, and *FP* represents the incorrectly identified tomato leaf mildew samples.
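As a worked illustration of Equation (6), the sketch below computes *P* separately for each grade from lists of actual and predicted grades. It assumes that, for a given grade, *TP* counts the samples predicted as that grade whose actual grade matches and *FP* counts the samples predicted as that grade whose actual grade differs; the function name `precision_per_grade` and the example labels are hypothetical and are not data from this study.

```python
def precision_per_grade(y_true, y_pred, grades):
    """Recognition accuracy P = TP / (TP + FP) for each grade."""
    result = {}
    for g in grades:
        # Samples predicted as grade g that really are grade g.
        tp = sum(1 for t, p in zip(y_true, y_pred) if p == g and t == g)
        # Samples predicted as grade g that belong to another grade.
        fp = sum(1 for t, p in zip(y_true, y_pred) if p == g and t != g)
        result[g] = tp / (tp + fp) if (tp + fp) else float("nan")
    return result

# Made-up labels for illustration only.
actual    = [0, 0, 1, 1, 3, 3, 5, 5]
predicted = [0, 0, 1, 3, 3, 3, 5, 1]
print(precision_per_grade(actual, predicted, grades=[0, 1, 3, 5]))
```

In this made-up example, two of the three samples predicted as grade 3 are correct, so *P* for grade 3 is 2/3, while only one of the two samples predicted as grade 1 is correct, giving *P* = 0.5.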

In this study, tomato leaf mildew was divided into four grades, so the prediction accuracy of each grade was used as the evaluation index. The results are shown in Table 3.

**Figure 13.** (**a**) Performance diagram of the backpropagation neural network, (**b**) training status of the backpropagation neural network, (**c**) regression analysis of the backpropagation neural network using the near-infrared hyperspectrum, (**d**) regression analysis of the backpropagation neural network using the THz absorbance, (**e**) regression analysis of the backpropagation neural network using the THz power spectrum.

**Table 3.** Prediction accuracy of each model.


The results show that, for the models established from the characteristic variables, the overall detection accuracy of the samples exceeded 90%. For the Level 1 samples, the highest and lowest detection accuracies were 96% and 92%, respectively, with an average of 94.67%. The recognition of Level 1 was better than that of Level 3 but slightly lower than that of Level 5. Every model achieved its highest detection accuracy for the Level 0 samples. Overall, the PCA-BPNN model in the power spectrum dimension was the optimal model. Its prediction accuracy for grades 0, 1, 3, and 5 was 100%, 96%, 95.45%, and 94.74%, respectively, with an overall prediction accuracy of 96.67%.
