*2.4. Data Processing*

The 160 sample spectra in each fruit diameter group were divided into a calibration set (120 samples) and a prediction set (40 samples) using the Kennard-Stone (K-S) algorithm. The NIR spectral data matrix of each fruit diameter group is 160 × 1044 (samples × spectral variables). To reduce errors caused by non-experimental factors, the spectra were pretreated in Unscrambler (Version 9.7, CAMO, Oslo, Norway) using several methods: multiplicative scatter correction (MSC), standard normal variate transformation (SNV), and Savitzky-Golay (S-G) smoothing. The partial least squares (PLS) method was then used to establish the apple SSC detection model.
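The split and two of the pretreatments can be illustrated with a short NumPy sketch. This is not the Unscrambler implementation used in the study: the function names `kennard_stone`, `snv`, and `msc` are hypothetical, the spectra are random placeholders with the same 160 × 1044 shape, and S-G smoothing is left out to keep the example dependency-free.

```python
import numpy as np

def kennard_stone(X, n_select):
    """Return (selected, remaining) row indices chosen by the Kennard-Stone rule.

    Assumes n_select >= 2 and that the rows of X are not all identical.
    """
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances between all spectra.
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    dist = np.sqrt(np.maximum(d2, 0.0))
    # Start with the two samples that are farthest apart.
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    selected = [int(i), int(j)]
    remaining = [k for k in range(len(X)) if k not in selected]
    while len(selected) < n_select:
        # Distance from each remaining sample to its nearest selected sample.
        nearest = dist[np.ix_(remaining, selected)].min(axis=1)
        # Add the remaining sample that is farthest from the selected set.
        pick = remaining[int(np.argmax(nearest))]
        selected.append(pick)
        remaining.remove(pick)
    return selected, remaining

def snv(X):
    """Standard normal variate: center and scale each spectrum (row) separately."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=1, keepdims=True)
    std = X.std(axis=1, ddof=1, keepdims=True)
    return (X - mean) / std

def msc(X, reference=None):
    """Multiplicative scatter correction against a reference (default: mean spectrum)."""
    X = np.asarray(X, dtype=float)
    ref = X.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    out = np.empty_like(X)
    for k, row in enumerate(X):
        # Fit row ≈ slope * ref + intercept, then remove both effects.
        slope, intercept = np.polyfit(ref, row, 1)
        out[k] = (row - intercept) / slope
    return out

# Placeholder data only: random numbers standing in for 160 spectra x 1044 variables.
rng = np.random.default_rng(0)
spectra = rng.normal(size=(160, 1044))

cal_idx, pred_idx = kennard_stone(spectra, 120)  # 120 calibration, 40 prediction
spectra_snv = snv(spectra)
```

Because K-S picks samples that are spread out in spectral space, the calibration set tends to span the variation present in the data, which is the usual reason for preferring it over a random split.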

Partial least squares regression (PLSR) is widely used in NIR spectral analysis. It decomposes the spectral matrix X and the concentration matrix Y simultaneously, which strengthens the relationship between them and helps to obtain the best calibration model. The PLS regression model is given by Equation (1):

$$Y = bX + \varepsilon \tag{1}$$

where *b* denotes the vector of regression coefficients and *ε* denotes the model residuals.
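To make the role of *b* in Equation (1) concrete, the following is a minimal sketch of a single-response PLS fit (PLS1, NIPALS-style deflation) in NumPy. It is an illustration under stated assumptions, not the Unscrambler routine used in the study: the names `pls1_fit` and `pls1_predict`, the number of latent variables, and the synthetic data are all hypothetical. The code mean-centers X and y, so it returns an explicit intercept in addition to *b*, and it writes the prediction as X·b with samples in rows.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Fit PLS1 and return (b, intercept) such that y ≈ X @ b + intercept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean  # centered copies, deflated below
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk                # weight vector: covariance of X with y
        w /= np.linalg.norm(w)
        t = Xk @ w                   # scores of the current latent variable
        tt = t @ t
        p = Xk.T @ t / tt            # X loadings
        c = yk @ t / tt              # y loading (scalar for PLS1)
        Xk = Xk - np.outer(t, p)     # remove the explained part of X
        yk = yk - c * t              # remove the explained part of y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    # Regression coefficients expressed in the original variable space.
    b = W @ np.linalg.solve(P.T @ W, q)
    intercept = y_mean - x_mean @ b
    return b, intercept

def pls1_predict(X, b, intercept):
    """Predict the response for new spectra (one spectrum per row)."""
    return np.asarray(X, dtype=float) @ b + intercept

# Synthetic placeholder data: 120 "spectra" with 20 variables and a noisy response.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(120, 20))
y_demo = (X_demo[:, :3] @ np.array([1.0, -2.0, 0.5]) + 11.0
          + rng.normal(scale=0.1, size=120))

b, b0 = pls1_fit(X_demo, y_demo, n_components=3)  # 3 latent variables is arbitrary
y_hat = pls1_predict(X_demo, b, b0)
```

In practice the number of latent variables is a tuning choice, commonly made by cross-validation on the calibration set; the source does not state how it was chosen.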

Model performance was evaluated using the correlation coefficient of prediction (Rp) and the root mean square error of prediction (RMSEP), defined in Equations (2) and (3), respectively.

$$R_{\text{p}} = \sqrt{1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \overline{y})^2}} \tag{2}$$

$$\text{RMSEP} = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2} \tag{3}$$

where *n* is the number of samples in the prediction set, *y<sub>i</sub>* is the reference value of the *i*-th sample in the prediction set measured by the standard method, *ŷ<sub>i</sub>* is the value of the *i*-th sample in the prediction set predicted from its NIR spectrum by the model, and *ȳ* is the mean SSC of all apples in the prediction set.
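As a worked example of the two metrics, the sketch below computes Rp and RMSEP from paired reference and predicted values, using squared deviations and the *n* − 1 denominator of Equation (3). The function names `rp` and `rmsep` are hypothetical, and the four SSC values are made up for illustration; they are not measurements from this study.

```python
import math

def rp(y_true, y_pred):
    """Correlation coefficient of prediction, as the square root of 1 - SS_res / SS_tot.

    Raises ValueError if the model fits worse than the mean (negative argument).
    """
    mean = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean) ** 2 for a in y_true)
    return math.sqrt(1.0 - ss_res / ss_tot)

def rmsep(y_true, y_pred):
    """Root mean square error of prediction with an n - 1 denominator."""
    n = len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    return math.sqrt(ss_res / (n - 1))

# Made-up SSC values, for illustration only.
y_ref = [10.0, 12.0, 14.0, 16.0]   # reference method
y_nir = [11.0, 12.0, 13.0, 16.0]   # model prediction

# SS_res = 1 + 0 + 1 + 0 = 2 and SS_tot = 9 + 1 + 1 + 9 = 20
print(round(rp(y_ref, y_nir), 4))     # sqrt(1 - 2/20) -> 0.9487
print(round(rmsep(y_ref, y_nir), 4))  # sqrt(2/3)      -> 0.8165
```

A higher Rp and a lower RMSEP both indicate closer agreement between predicted and reference values.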

### **3. Results and Analysis**

*3.1. Sample Chemical Index Statistics Results*

The 160 apple samples in each fruit size group were divided into a calibration set (120) and a prediction set (40) by the K-S algorithm, and the SSC values of the resulting sets are presented in Table 1. In each fruit size group, the SSC range of the calibration set was wider than that of the prediction set, which favors reliable prediction of apple SSC.


**Table 1.** SSC values for apples of different fruit sizes.
