*2.4. Data Processing*

The PCC samples were divided into two sets by systematic sampling: 86 samples formed the calibration set used to establish the prediction models, and 20 samples not involved in the calibration formed the validation set used for external validation of the developed models. The anthocyanidin contents of the calibration set samples, determined by HPLC, were imported into the chemometric software accompanying the instrument and processed together with the NIR spectra to obtain a calibration (cal.) file. Prediction models were built by partial least squares regression (PLSR) over three wavelength ranges: the full band (400–1100 nm and 1100–2498 nm), the visible band (400–800 nm), and the near-infrared band (800–1100 nm and 1100–2498 nm). The scatter-correction options applied to the spectral data were no scatter correction (None), standard normal variate transformation plus de-trending (SNV+Detrend), standard normal variate transformation only (SNV Only), de-trending only (Detrend Only), standard multiplicative scatter correction (Standard MSC), and weighted multiplicative scatter correction (Weighted MSC). Two derivative treatments, no derivative and first-order derivative, were also employed. The models built under the different preprocessing methods were compared, and the model with the cross-validation coefficient of determination (1-VR) closest to 1 and the lowest standard error of cross-validation (SECV) was selected as the best one. These two statistics broadly reflect how well the calibration model predicts unknown samples. Subsequently, the samples of the validation set were analyzed to test the predictive ability of the selected model. The criterion was that the higher the external coefficient of determination (RSQ) and the lower the standard error of prediction (SEP), the more accurate the model.
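To illustrate the scatter corrections and the derivative treatment named above, the following is a minimal Python sketch using their textbook definitions. It is not the routine implemented in the instrument software. The function names, the simple gap-difference derivative, and the toy spectra are assumptions made for illustration only.

```python
from statistics import mean, stdev

def snv(spectrum):
    """Standard normal variate: centre and scale one spectrum
    by its own mean and sample standard deviation."""
    m = mean(spectrum)
    s = stdev(spectrum)
    return [(x - m) / s for x in spectrum]

def msc(spectra):
    """Standard multiplicative scatter correction: regress each spectrum
    on the mean spectrum, then remove the fitted offset and slope."""
    n_points = len(spectra[0])
    ref = [mean(s[i] for s in spectra) for i in range(n_points)]
    ref_mean = mean(ref)
    sxx = sum((r - ref_mean) ** 2 for r in ref)
    corrected = []
    for s in spectra:
        s_mean = mean(s)
        slope = sum((r - ref_mean) * (x - s_mean) for r, x in zip(ref, s)) / sxx
        intercept = s_mean - slope * ref_mean
        corrected.append([(x - intercept) / slope for x in s])
    return corrected

def first_derivative(spectrum, gap=1):
    """First-order derivative approximated by a gap difference
    (an assumed, simplified form; real software usually also smooths)."""
    return [spectrum[i + gap] - spectrum[i] for i in range(len(spectrum) - gap)]

# Toy data (hypothetical): one base shape with different offsets and slopes,
# mimicking additive and multiplicative scatter effects.
base = [0.2, 0.4, 0.9, 0.5, 0.3]
toy_spectra = [[a * x + b for x in base]
               for a, b in [(1.0, 0.0), (1.5, 0.1), (0.8, -0.05)]]

snv_spectra = [snv(s) for s in toy_spectra]
msc_spectra = msc(toy_spectra)
```

On these toy spectra, which differ only by an offset and a positive scale factor, both SNV and MSC map all three spectra onto a common curve, which is the effect the scatter corrections are meant to achieve.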

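The model-selection statistics can be sketched in the same way. The definitions below are commonly used ones and are assumptions here, since the exact formulas applied by the instrument software are not given in the text: SECV as the root mean square of the cross-validation residuals, 1-VR as one minus the ratio of residual to reference variance, SEP as the bias-corrected standard deviation of the prediction residuals, and RSQ as the squared Pearson correlation. All names and the toy values are hypothetical.

```python
from math import sqrt
from statistics import mean, pvariance

def secv(reference, cv_predicted):
    """Root mean square of the cross-validation residuals."""
    residuals = [p - r for r, p in zip(reference, cv_predicted)]
    return sqrt(sum(e * e for e in residuals) / len(residuals))

def one_minus_vr(reference, cv_predicted):
    """1 - VR: share of the reference variance explained in cross-validation."""
    return 1.0 - secv(reference, cv_predicted) ** 2 / pvariance(reference)

def sep(reference, predicted):
    """Bias-corrected standard error of prediction on the validation set."""
    residuals = [p - r for r, p in zip(reference, predicted)]
    bias = mean(residuals)
    return sqrt(sum((e - bias) ** 2 for e in residuals) / (len(residuals) - 1))

def rsq(reference, predicted):
    """Squared Pearson correlation between reference and predicted values."""
    r_mean, p_mean = mean(reference), mean(predicted)
    sxy = sum((r - r_mean) * (p - p_mean) for r, p in zip(reference, predicted))
    sxx = sum((r - r_mean) ** 2 for r in reference)
    syy = sum((p - p_mean) ** 2 for p in predicted)
    return sxy * sxy / (sxx * syy)

# Toy values (hypothetical, not data from this study).
reference = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.1, 1.9, 3.2, 3.9, 5.1]

stats = {
    "SECV": secv(reference, predicted),
    "1-VR": one_minus_vr(reference, predicted),
    "SEP": sep(reference, predicted),
    "RSQ": rsq(reference, predicted),
}
```

Under these definitions, a candidate model is preferred when 1-VR is closer to 1 and SECV is lower in cross-validation, and it is judged more accurate when RSQ is higher and SEP is lower on the validation set.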