*3.3. Fungal Spore Recognition Model*

Raman spectra contain a large number of features, many of them redundant, which greatly slows down modeling and analysis, so dimensionality reduction was required beforehand. After dimensionality reduction, the Raman spectra were modeled and analyzed with SVM and BPANN to establish a classification model.

The SCARS algorithm uses the stability of each variable as its selection criterion: the greater the stability value, the more likely the variable is to be selected. As a result, the frequency bands selected in each iteration tend to be consistent, which keeps variable selection both stable and fast. Candidate band subsets are evaluated by Monte Carlo cross-validation, which yields an RMSECV value in each cycle, and the subset with the minimum RMSECV is taken as the optimal combination. Because the sampling is repeated many times, obtaining a good combination of characteristic bands requires comparing repeated trials. When the number of cyclic samplings was set to 25, the result tended to be stable. An example of the running result is shown in Figure 8; the algorithm selected 69 bands.

**Figure 8.** Example of running result of SCARS algorithm.
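
The two quantities that drive the selection described above, the stability of each band and the RMSECV of a candidate subset, can be illustrated with a minimal sketch. It assumes the usual stability definition (absolute mean of a band's regression coefficient divided by its standard deviation over Monte Carlo subsamples) and, for brevity, uses closed-form ridge regression as a stand-in for the PLS model. The function names, the ridge penalty, the 80% sampling ratio, and the synthetic data are all illustrative assumptions, and the iterative shrinking of the band set performed by the full algorithm is omitted.

```python
import numpy as np

def stability_scores(X, y, n_runs=25, sample_ratio=0.8, ridge=1e-3, seed=0):
    """Stability of each band: |mean(b_j)| / std(b_j) over Monte Carlo subsamples.

    Ridge regression is an assumed stand-in for the PLS model.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_bands = X.shape
    n_sub = int(sample_ratio * n_samples)
    coefs = np.empty((n_runs, n_bands))
    for r in range(n_runs):
        idx = rng.choice(n_samples, size=n_sub, replace=False)
        Xs = X[idx] - X[idx].mean(axis=0)
        ys = y[idx] - y[idx].mean()
        # Closed-form ridge solution: (X^T X + lambda * I)^-1 X^T y
        A = Xs.T @ Xs + ridge * np.eye(n_bands)
        coefs[r] = np.linalg.solve(A, Xs.T @ ys)
    return np.abs(coefs.mean(axis=0)) / (coefs.std(axis=0) + 1e-12)

def mc_rmsecv(X, y, bands, n_runs=25, sample_ratio=0.8, ridge=1e-3, seed=0):
    """Monte Carlo cross-validation RMSE for one candidate subset of bands."""
    rng = np.random.default_rng(seed)
    n_samples = X.shape[0]
    n_train = int(sample_ratio * n_samples)
    sq_err = []
    for _ in range(n_runs):
        perm = rng.permutation(n_samples)
        tr, te = perm[:n_train], perm[n_train:]
        mu_x, mu_y = X[tr][:, bands].mean(axis=0), y[tr].mean()
        Xtr = X[tr][:, bands] - mu_x
        A = Xtr.T @ Xtr + ridge * np.eye(len(bands))
        b = np.linalg.solve(A, Xtr.T @ (y[tr] - mu_y))
        pred = (X[te][:, bands] - mu_x) @ b + mu_y
        sq_err.extend((pred - y[te]) ** 2)
    return float(np.sqrt(np.mean(sq_err)))

# Synthetic demo (not the spore data): only bands 3 and 7 carry signal.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 30))
y = 2.0 * X[:, 3] - 1.5 * X[:, 7] + 0.1 * rng.normal(size=200)

scores = stability_scores(X, y)
top = np.argsort(scores)[::-1][:2]           # two most stable bands
rmse_sel = mc_rmsecv(X, y, top)              # RMSECV of the stable subset
rmse_rand = mc_rmsecv(X, y, np.array([0, 1]))  # RMSECV of an arbitrary subset
```

In this toy setting the informative bands receive by far the largest stability values, and the subset built from them gives a much lower RMSECV than an arbitrary subset, which is the comparison used to pick the final band combination.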

PCA was applied to both the full and the shortened spectral intervals, yielding different sets of principal components (PCs). For the raw spectral data, the top 15 PCs are the most significant, with a cumulative contribution rate of over 95%. Figure 9 shows the distribution of the top three PCs, which gives a preliminary classification of the four types of disease spores.
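
As an illustration of how a cumulative contribution rate determines the number of retained components, the following sketch computes PCA through a singular value decomposition and keeps the smallest number of components whose cumulative explained variance reaches a threshold. The function name `pca_by_contribution`, the 95% default, and the synthetic low-rank data are assumptions made for the example; they are not taken from the study.

```python
import numpy as np

def pca_by_contribution(X, threshold=0.95):
    """Return (k, cumulative_ratio, scores) for the smallest k reaching threshold."""
    Xc = X - X.mean(axis=0)                      # center each spectral variable
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    ratio = s ** 2 / np.sum(s ** 2)              # contribution rate of each PC
    cum = np.cumsum(ratio)                       # cumulative contribution rate
    # First index whose cumulative rate is >= threshold, converted to a count
    k = int(np.searchsorted(cum, threshold) + 1)
    scores = Xc @ Vt[:k].T                       # samples projected onto k PCs
    return k, cum, scores

# Synthetic demo: 200 "spectra" with 500 variables and 5 latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 5))
loadings = rng.normal(size=(5, 500))
X = latent @ loadings + 0.01 * rng.normal(size=(200, 500))

k, cum, scores = pca_by_contribution(X)
```

The rows of `scores` are the reduced representation passed on to a classifier, and its first three columns are what a three-component scatter plot would display.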

After the dimensionality of the disease spore Raman spectra was reduced, the data were input into the classification models. The data reduced by PCA form a 200 × 15 matrix, and the data reduced by SCARS form a 200 × 65 matrix. In this study, two classification algorithms, SVM and BPANN, were used to classify the dimensionality-reduced data. For the SVM, the radial basis function (RBF) kernel was selected, and the optimal values of the penalty parameter c and the kernel parameter g were obtained through grid search. The optimal parameters are 0.00094, and the accuracy rates of the test set and prediction set are 94.38% and 86.63%, respectively. For BPANN, the hidden layer transfer function was set to tansig, the output layer transfer function to purelin, the network training function to trainbfg, the number of training iterations to 1000, and the target error to 0.0001 [29]. The discriminative accuracies of the training set and prediction set are 88.46% and 87.61%, respectively. The Raman data reduced by PCA and SCARS were then classified with these models. Since the random division of the test set and the prediction set and other factors affect the classification performance of the model, each model was run five times and its average accuracy was used. The results are shown in Table 2. SCARS-BPANN achieved the highest accuracy, 94.94% on the calibration set and 94.31% on the prediction set.
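
The combination of a grid search over the RBF-SVM parameters with averaging over five random splits can be sketched as follows. This is only an illustration under stated assumptions: it uses scikit-learn rather than the software used in the study, maps the penalty parameter c to `C` and the kernel parameter g to `gamma`, and runs on synthetic four-class data of size 200 × 15. The grid values, the 75/25 split, and the standardization step are illustrative choices as well.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the reduced data: 200 samples, 15 features, 4 classes.
X, y = make_classification(
    n_samples=200, n_features=15, n_informative=8, n_redundant=0,
    n_classes=4, n_clusters_per_class=1, class_sep=2.0, random_state=0,
)

# Assumed search grid: powers of two for C (c) and gamma (g).
param_grid = {
    "svc__C": 2.0 ** np.arange(-5, 11, 4),
    "svc__gamma": 2.0 ** np.arange(-15, 4, 4),
}

def run_once(seed):
    """One random split: tune C and gamma by 5-fold CV, then score both sets."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    search = GridSearchCV(model, param_grid, cv=5)
    search.fit(X_tr, y_tr)  # refits the best model on the whole training split
    return search.score(X_tr, y_tr), search.score(X_te, y_te), search.best_params_

# Five runs with different random splits, then average the accuracies.
results = [run_once(seed) for seed in range(5)]
train_acc = float(np.mean([r[0] for r in results]))
test_acc = float(np.mean([r[1] for r in results]))
```

Averaging over several splits reduces the dependence of the reported accuracy on any single random partition, which is the motivation given above for running each model five times.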

**Figure 9.** Top 3 PCA distribution results.
