*5.2. Dimensionality Reduction of a PCA-Based Training Set Sample*

Reducing the dimensionality of the training set sample data with the PCA method not only shrinks the training set and speeds up model training, but also helps to eliminate outliers and noise in the signal. Following the dimensionality reduction principle and analysis steps of the PCA method described above, the variance contribution rates δ*<sup>i</sup>* of the principal components of the PS1 and PS2 training set samples are calculated according to Equation (16). The variance contribution rates δ*<sup>i</sup>* of the first 18 principal components are shown in Table 3.


**Table 3.** Variance contribution of partial principal components.

Using Equation (17), the cumulative contribution rate η*<sup>k</sup>* of the principal components of the PS1 and PS2 training set samples is then plotted against the number of principal components, as shown in Figure 4.

**Figure 4.** The cumulative contribution rate for the sample data. (**a**) The cumulative contribution rate η*<sup>k</sup>* of PS1 training set samples; (**b**) The cumulative contribution rate η*<sup>k</sup>* of PS2 training set samples.
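As a minimal sketch of the two quantities used above, the following assumes the standard PCA definitions, which Equations (16) and (17) are taken to follow: the variance contribution rate of the i-th component is its eigenvalue divided by the sum of all eigenvalues, and the cumulative contribution rate of the first k components is the running sum of those rates. The function names and the eigenvalues are hypothetical and serve only as illustration; they are not the values behind Table 3 or Figure 4.

```python
from itertools import accumulate


def variance_contribution_rates(eigenvalues):
    """Return delta_i = lambda_i / sum(lambda), with eigenvalues sorted descending."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    if total <= 0:
        raise ValueError("eigenvalues must have a positive sum")
    return [lam / total for lam in ordered]


def cumulative_contribution_rates(deltas):
    """Return eta_k = delta_1 + ... + delta_k for k = 1..len(deltas)."""
    return list(accumulate(deltas))


# Hypothetical covariance-matrix eigenvalues, for illustration only.
eigenvalues = [4.0, 3.0, 2.0, 1.0]
deltas = variance_contribution_rates(eigenvalues)
etas = cumulative_contribution_rates(deltas)

for i, (delta, eta) in enumerate(zip(deltas, etas), start=1):
    print(f"PC{i}: delta = {delta:.3f}, eta = {eta:.3f}")
```

Plotting `etas` against the component index would give a curve of the kind shown in Figure 4: it rises monotonically and approaches 1 (100%) as more components are included.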

However, there is no general method for selecting the optimal number of principal components to retain. To preserve as much of the original information as possible, components are retained until the variance contribution rate δ*<sup>i</sup>* of the remaining components is close to 0 and the cumulative contribution rate η*<sup>k</sup>* is close to 100%. Table 3 and Figure 4 show that, after PCA dimensionality reduction, the information in the PS1 data is concentrated in the first 15 principal components, and the information in the PS2 data is concentrated in the first 13. To keep the model training data balanced between the two data sets, the number of principal components k retained for both the PS1 and PS2 data sets is set to 15. The 1449 × 100 data set can thus be compressed to 1449 × 15. The reduced data set is then used for modeling and learning, and is divided into training samples and test samples in the proportions listed in Table 4.
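The selection rule described above can be sketched as follows. This is only one plausible formalization, not necessarily the authors' exact procedure: it assumes that "close to 100%" is expressed as a threshold `eta_min` on the cumulative contribution rate, and that the shared k is the larger of the per-set values (as with 15 for PS1 and 13 for PS2 giving k = 15). The function names, the threshold, and the contribution rates are hypothetical.

```python
from itertools import accumulate


def components_needed(deltas, eta_min=0.99):
    """Smallest k whose cumulative contribution rate eta_k reaches eta_min."""
    for k, eta in enumerate(accumulate(deltas), start=1):
        if eta >= eta_min - 1e-12:  # small tolerance for floating-point sums
            return k
    return len(deltas)  # threshold never reached: keep every component


def shared_component_count(deltas_by_set, eta_min=0.99):
    """Largest per-set k, so that all data sets are reduced to the same width."""
    return max(components_needed(d, eta_min) for d in deltas_by_set.values())


# Hypothetical variance contribution rates, for illustration only.
deltas_by_set = {
    "PS1": [0.50, 0.25, 0.125, 0.0625, 0.0625],
    "PS2": [0.75, 0.125, 0.125],
}

k_ps1 = components_needed(deltas_by_set["PS1"], eta_min=0.9)
k_ps2 = components_needed(deltas_by_set["PS2"], eta_min=0.9)
k = shared_component_count(deltas_by_set, eta_min=0.9)
print(f"PS1 needs {k_ps1}, PS2 needs {k_ps2}, shared k = {k}")
```

Taking the maximum rather than the minimum means that neither data set loses components it needs to reach the threshold; the data set that needs fewer components simply carries a few low-variance ones.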


