*2.6. Selection of Important Wavelengths*

The principal components were determined using partial least squares regression (PLSR) models established by 5-fold cross-validation, and these models were used to select the most significant wavelengths over successive iterations. The process of selecting the important wavelengths for one quality parameter in Round I is shown in Figure 2.
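For reference, the root mean square error of cross-validation (RMSECV) used throughout this section is conventionally defined as below. The symbols n (number of samples), y_i (measured value of sample i) and ŷ_(i) (prediction for sample i from the model calibrated on the folds that exclude it) are introduced here for illustration and do not appear in the original text.

$$\mathrm{RMSECV} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_{(i)}\right)^2}$$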

Here, Spec(I-1)in is the spectral data matrix composed of the wavelengths selected in the previous iteration, and Yk is the matrix of measured values of the kth quality parameter. NI and CI in the figure denote the number of rows of the binary matrix in the Ith iteration and the number of optimal wavelengths selected in the (I-1)th iteration, respectively. Num\_totalI is the total number of uninformative and interfering wavelengths.

According to the number of wavelengths selected in the (I-1)th iteration, a binary matrix shuffler filter, MIin, with CI columns and NI rows was generated for Round I. The value of MIin(i,j) indicates whether wavelength j is used to construct the prediction model for the ith wavelength combination. The root mean square error of cross-validation (RMSECV) was calculated separately for each of the NI wavelength combinations and stored as RMSECVIin(i,j). The binary matrix MIex was obtained by inverting the elements of MIin, which changes the inclusion state of the corresponding wavelength in the sample spectra. A new PLSR prediction model was established, and its root mean square error, RMSECVIex(i,j), was calculated when the inclusion state of wavelength j was changed in the ith wavelength combination.
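The construction of the two error matrices can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: rows are taken to be wavelength combinations and columns to be wavelengths, each column of the shuffler matrix is assumed to contain an equal number of ones and zeros, and `score` is a hypothetical callable standing in for "build a PLSR model on the selected wavelengths and return its RMSECV". The names `shuffler_matrix`, `invert`, `in_ex_errors` and `toy_score` are invented for this example.

```python
import random

def shuffler_matrix(n_rows, n_cols, rng):
    """Binary matrix: n_rows wavelength combinations x n_cols wavelengths.

    Assumption: every column has (about) half ones and half zeros in random
    order, so each wavelength is included in half of the combinations.
    """
    half = n_rows // 2
    columns = []
    for _ in range(n_cols):
        col = [1] * half + [0] * (n_rows - half)
        rng.shuffle(col)
        columns.append(col)
    # Transpose so that rows are combinations and columns are wavelengths.
    return [list(row) for row in zip(*columns)]

def invert(matrix):
    """Flip every 0/1 element (the M_ex matrix in the text)."""
    return [[1 - bit for bit in row] for row in matrix]

def in_ex_errors(m_in, m_ex, score):
    """Return (rmsecv_in, rmsecv_ex), both n_rows x n_cols.

    For element (i, j), only the state of wavelength j in combination i is
    swapped to its value in m_ex. The "in" matrix always stores the error of
    the model that includes wavelength j, the "ex" matrix the one without it.
    """
    rmsecv_in, rmsecv_ex = [], []
    for row, row_flipped in zip(m_in, m_ex):
        base = score(row)
        errs_in, errs_ex = [], []
        for j, bit in enumerate(row):
            changed = row[:j] + [row_flipped[j]] + row[j + 1:]
            other = score(changed)
            errs_in.append(base if bit == 1 else other)
            errs_ex.append(other if bit == 1 else base)
        rmsecv_in.append(errs_in)
        rmsecv_ex.append(errs_ex)
    return rmsecv_in, rmsecv_ex

def toy_score(mask):
    # Made-up error surface for demonstration only:
    # wavelength 0 lowers the error, wavelength 2 raises it, wavelength 1 is neutral.
    return 1.0 - 0.1 * mask[0] + 0.05 * mask[2]

rng = random.Random(0)
m_in = shuffler_matrix(n_rows=6, n_cols=3, rng=rng)
m_ex = invert(m_in)
rmsecv_in, rmsecv_ex = in_ex_errors(m_in, m_ex, toy_score)
```

In a real run, `toy_score` would be replaced by a cross-validated PLSR fit on the columns of the spectral matrix selected by `mask`.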

**Figure 2.** Iteration process of Round I.

The values of RMSECVIin(i,j) and RMSECVIex(i,j) for the ith wavelength combination, with and without wavelength j, were calculated according to MIin and MIex. RMSECVIex(i,j) and RMSECVIin(i,j) were compared using the Mann–Whitney U test at a significance level of 0.05. The difference between the two mean values for wavelength j was defined as DmeanI(j). The wavelengths were classified into four types according to the test result PI(j) and DmeanI(j), as shown in Table 1. Strongly informative wavelengths can be used to drive prediction models, in contrast to weakly informative wavelengths. Interfering wavelengths introduce noise into the model and significantly lower its performance, whereas uninformative wavelengths play the same role as interfering wavelengths but have a smaller effect on model performance.
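The per-wavelength comparison can be sketched as below. This is an illustrative sketch only. The sign convention for the mean difference (mean error with the wavelength minus mean error without it) is an assumption, chosen so that a negative value means the wavelength lowers the error. The p-value uses the normal approximation to the Mann–Whitney U statistic with tie correction and no continuity correction, which may differ slightly from the exact test used in the study. The function names and the sample error values are invented for the example.

```python
import math
from statistics import mean

def mann_whitney_p(a, b):
    """Two-sided Mann-Whitney U p-value via the normal approximation.

    Ties receive average ranks and the variance is tie-corrected;
    no continuity correction is applied.
    """
    pooled = list(a) + list(b)
    n1, n2, n = len(a), len(b), len(pooled)
    order = sorted(range(n), key=pooled.__getitem__)
    ranks = [0.0] * n
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[order[j + 1]] == pooled[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1          # mean of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var <= 0:                             # all values identical
        return 1.0
    z = (u1 - mu) / math.sqrt(var)
    return math.erfc(abs(z) / math.sqrt(2))

def compare_wavelength(errs_in, errs_ex, alpha=0.05):
    """Return (d_mean, p_value, significant) for one wavelength."""
    p_value = mann_whitney_p(errs_in, errs_ex)
    # Assumed convention: negative d_mean means including the wavelength helps.
    d_mean = mean(errs_in) - mean(errs_ex)
    return d_mean, p_value, p_value < alpha

# Made-up RMSECV values for one wavelength across six combinations.
errs_in = [0.41, 0.39, 0.40, 0.42, 0.38, 0.40]
errs_ex = [0.52, 0.55, 0.50, 0.53, 0.51, 0.54]
d_mean, p_value, significant = compare_wavelength(errs_in, errs_ex)
```

In practice, a library routine such as `scipy.stats.mannwhitneyu` would normally be preferred over a hand-written test; the standalone version is shown only to keep the example dependency-free.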

**Table 1.** Variable classification rules.


When DmeanI(j) was smaller than 0, the corresponding wavelength was carried into a new iteration. When no uninformative or interfering wavelengths remained (Num\_totalI equal to 0), the iteration stopped, and the RMSECV value was calculated using the spectra at the strongly and weakly informative wavelengths together with their quality values.

Reverse elimination was then performed. Each time a strongly or weakly informative wavelength was eliminated, a new set of PLSR models was established and the corresponding RMSECV' value was obtained. If RMSECV' was smaller than RMSECV, the corresponding wavelength was eliminated; the remaining wavelengths were defined as the important ones.
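One possible reading of the reverse elimination step is sketched below. It is a greedy variant, assumed here for illustration: at each pass the wavelength whose removal gives the lowest RMSECV' is dropped, provided RMSECV' is smaller than the current RMSECV, and the loop stops when no removal improves the error. The text does not specify the order in which wavelengths are tried, so this ordering is an assumption. `score` is again a hypothetical stand-in for a cross-validated PLSR fit, and the wavelength labels and effect sizes are made up.

```python
def backward_eliminate(wavelengths, score):
    """Greedy reverse elimination.

    score(subset) is a hypothetical callable returning the RMSECV of a
    model built on the given list of wavelengths.
    """
    kept = list(wavelengths)
    best = score(kept)
    while len(kept) > 1:
        # RMSECV' for every single-wavelength removal from the current set.
        trials = [(score(kept[:k] + kept[k + 1:]), k) for k in range(len(kept))]
        trial_err, k = min(trials)
        if trial_err >= best:        # no removal lowers the error: stop
            break
        best = trial_err
        del kept[k]
    return kept, best

# Made-up contributions to the error, for demonstration only.
effect = {950: -0.30, 1200: -0.20, 1450: 0.05}

def toy_score(subset):
    return 1.0 + sum(effect[w] for w in subset)

important, final_rmsecv = backward_eliminate([950, 1200, 1450], toy_score)
```

With these toy values, removing the 1450 entry lowers the error while removing either of the other two raises it, so only the first removal is accepted.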
