#### 2.4.1. Data Smoothing

The Savitzky-Golay (SG) smoothing algorithm is commonly used in data preprocessing because it is simple, convenient, fast, and efficient [16]. The algorithm takes a window with an odd number of points, fits a polynomial to the points in the window by the least squares method, and replaces the original value at the center of the window with the fitted value; the window is then translated point by point along the spectrum so that the entire data set is smoothed. In this study, the SG smoothing algorithm was used to preprocess the data with a window width of 7 points. The algorithm can effectively reduce interference signals and improve both modeling efficiency and accuracy. Figure 5 compares the spectral data of the tomato leaf mildew samples before and after this preprocessing.

**Figure 5.** Data of tomato leaf mildew samples before and after SG smoothing preprocessing. (**a**) Near-infrared primary spectrum, (**b**) near-infrared spectrum after SG smoothing, (**c**) THz absorbance spectrum, (**d**) THz absorbance spectrum after SG smoothing, (**e**) THz power spectrum, (**f**) THz power spectrum after SG smoothing.
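
As a minimal sketch of the smoothing step described in Section 2.4.1, the following Python code applies SG smoothing with the 7-point window used in this study. The polynomial order (2), the edge-padding strategy, the synthetic test signal, and the names `sg_smooth` and `spectrum` are assumptions made for illustration; the source does not specify them.

```python
import numpy as np


def sg_smooth(y, window=7, polyorder=2):
    """Savitzky-Golay smoothing via a least-squares polynomial fit per window.

    window=7 follows the text; polyorder=2 is an assumed value.
    """
    y = np.asarray(y, dtype=float)
    if window % 2 == 0 or window <= polyorder:
        raise ValueError("window must be odd and larger than polyorder")
    half = window // 2
    # Positions of the window points relative to the center point.
    offsets = np.arange(-half, half + 1)
    # Design matrix with columns 1, t, t^2, ... for the polynomial fit.
    design = np.vander(offsets, polyorder + 1, increasing=True)
    # Row 0 of the pseudo-inverse yields the fitted value at t = 0,
    # i.e. the weights that replace the center point of each window.
    weights = np.linalg.pinv(design)[0]
    # Assumption: repeat the edge values so the output keeps the input length.
    padded = np.pad(y, half, mode="edge")
    # Slide the window along the signal and apply the weights.
    return np.correlate(padded, weights, mode="valid")


# Hypothetical noisy spectrum standing in for a measured one.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 200)
spectrum = np.exp(-((x - 0.5) ** 2) / 0.01) + 0.05 * rng.normal(size=x.size)
smoothed = sg_smooth(spectrum, window=7)
```

Because each window is fitted by a low-order polynomial rather than simply averaged, peak shapes tend to be preserved better than with a moving average of the same width.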

#### 2.4.2. Characteristic Band Screening

The collected spectral data contain a large amount of redundant and collinear information, which interferes with the extraction of effective spectral information and makes the resulting model overly complex and computationally expensive. In this paper, a genetic algorithm (GA) and principal component analysis (PCA) were used to select characteristic wavelengths in order to reduce the influence of information redundancy and collinearity, simplify the model, and reduce the amount of calculation. The GA is an intelligent optimization method that simulates the evolutionary process of natural selection in organisms [17]. When the GA was run to screen the near-infrared hyperspectral characteristic bands in the current study, the crossover probability was set to 0.5, the population size to 30, and the mutation probability to 0.01. The characteristic wavelengths were taken as those selected with the highest frequency over 100 GA iterations.
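
The GA screening above can be illustrated with a short Python sketch that uses the stated settings (population size 30, crossover probability 0.5, mutation probability 0.01, 100 iterations). The fitness function (in-sample regression error plus a small per-band penalty), tournament selection, single-point crossover, elitism, the way selection frequency is counted, and the synthetic data are all assumptions for illustration; the source does not describe these details.

```python
import numpy as np


def ga_band_frequency(X, y, pop_size=30, p_cross=0.5, p_mut=0.01,
                      n_iter=100, seed=0):
    """Count how often each band appears in the best chromosome per iteration."""
    rng = np.random.default_rng(seed)
    n_bands = X.shape[1]

    def fitness(mask):
        # Hypothetical fitness: negative RMSE of a linear fit on the selected
        # bands, minus a small penalty for every band that is kept.
        if not mask.any():
            return -np.inf
        A = np.column_stack([X[:, mask], np.ones(len(y))])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        rmse = np.sqrt(np.mean((A @ coef - y) ** 2))
        return -rmse - 0.001 * mask.sum()

    # Each chromosome is a boolean mask: True means the band is selected.
    pop = rng.random((pop_size, n_bands)) < 0.5
    counts = np.zeros(n_bands, dtype=int)
    for _ in range(n_iter):
        scores = np.array([fitness(ind) for ind in pop])
        best = pop[np.argmax(scores)].copy()
        counts += best
        # Tournament selection between two randomly drawn individuals.
        a = rng.integers(pop_size, size=pop_size)
        b = rng.integers(pop_size, size=pop_size)
        parents = np.where((scores[a] >= scores[b])[:, None], pop[a], pop[b])
        # Single-point crossover on consecutive pairs of parents.
        children = parents.copy()
        for i in range(0, pop_size - 1, 2):
            if rng.random() < p_cross:
                cut = rng.integers(1, n_bands)
                children[i, cut:] = parents[i + 1, cut:]
                children[i + 1, cut:] = parents[i, cut:]
        # Bit-flip mutation applied independently to every gene.
        children ^= rng.random(children.shape) < p_mut
        # Elitism: carry the best chromosome over unchanged.
        children[0] = best
        pop = children
    return counts


# Synthetic example: only bands 3 and 8 carry information about y.
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 12))
y = 2.0 * X[:, 3] - X[:, 8] + 0.05 * rng.normal(size=60)
counts = ga_band_frequency(X, y)
ranked_bands = np.argsort(counts)[::-1]  # most frequently selected first
```

In practice the fitness would more plausibly be a cross-validated error of the downstream model, and the frequency threshold for keeping a band would be chosen from the resulting histogram.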

PCA is a multivariate statistical method for analyzing correlations among multiple variables. It converts a group of possibly correlated variables into a group of linearly uncorrelated variables through an orthogonal transformation [18]. The new variables obtained through PCA reduce the number of variables while preserving as much of the original feature information as possible. Therefore, PCA is a suitable method for the dimension reduction and feature extraction of THz time-domain spectral data.
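
The orthogonal transformation described above can be sketched in Python with a singular value decomposition of the mean-centered data. The number of retained components, the synthetic data, and the names `pca`, `scores`, and `components` are assumptions for illustration and are not taken from the source.

```python
import numpy as np


def pca(X, n_components):
    """Project X onto its first n_components principal components."""
    X = np.asarray(X, dtype=float)
    # Center each variable (column) so the components describe variance.
    Xc = X - X.mean(axis=0)
    # Rows of Vt are orthonormal directions ordered by decreasing variance.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    # Scores are the new, linearly uncorrelated variables.
    scores = Xc @ components.T
    # Fraction of the total variance carried by each retained component.
    explained_ratio = (s ** 2 / np.sum(s ** 2))[:n_components]
    return scores, components, explained_ratio


# Hypothetical data: 40 samples, 6 variables, two of them strongly correlated.
rng = np.random.default_rng(0)
data = rng.normal(size=(40, 6))
data[:, 1] = 2.0 * data[:, 0] + 0.01 * rng.normal(size=40)
scores, components, explained_ratio = pca(data, n_components=3)
```

A common way to choose the number of components is to keep the smallest number whose cumulative `explained_ratio` exceeds a chosen threshold.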
