Pears are one of the most popular fruits in the world. They are sweet, juicy, and delicious, with some acidity, and they are also rich in nutrition, containing a variety of vitamins and cellulose. Taste and texture differ among pear varieties. More than 60% of the world’s pears are produced in China [
1]. Consumers pay attention to the external quality of pears, including size, color, and shape, as well as to the internal quality of pears, including the sugar content, acidity, and taste. After harvest, the detection and grading of the fruit’s internal quality always plays an important role in its commercialization [
2]. The soluble solid content (SSC) not only affects the internal quality and price of fresh fruit, but also determines the fruit maturity and harvest time [
2]. The titratable acidity (TA) is often used to estimate the ripening time of pears: as fruits approach ripeness, their acidity decreases and their taste becomes sweeter. The taste index (TI), defined as the ratio of SSC to TA, can be used to determine the taste and ripening stage of pears. Traditional methods of detecting the internal quality of fruit are reliable, but they are destructive, time-consuming, and polluting, so they cannot measure internal quality rapidly and nondestructively. Therefore, the conventional physicochemical analysis methods currently used to evaluate the internal quality of fruit do not meet consumers’ requirements for consistent, high-quality fruit.
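As a minimal illustration of the taste index defined above, the ratio can be computed as follows. The function name and the numeric readings are hypothetical and are not measurements from this study; SSC and TA are assumed to be expressed in consistent units.

```python
def taste_index(ssc, ta):
    """Return the taste index TI = SSC / TA.

    ssc: soluble solid content; ta: titratable acidity.
    Both are assumed to be in consistent units (assumption).
    """
    if ta <= 0:
        raise ValueError("TA must be positive to form the ratio SSC/TA")
    return ssc / ta


# Hypothetical readings for one fruit, for illustration only.
example_ti = taste_index(12.0, 0.25)
```

For a fixed SSC, a lower TA gives a higher TI, which matches the observation that fruit tastes sweeter as acidity falls during ripening.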
Over the past few decades, various studies have been conducted using near-infrared (NIR) or visible-NIR (vis-NIR) spectroscopy as rapid and nondestructive methods for determination of the internal quality in fresh fruit. Li et al. [
3] used vis-NIR spectroscopy to measure the SSC and firmness of pear fruit in the wavelength range of 400–1800 nm. The correlation coefficient of prediction (Rp), root mean square error of prediction (RMSEP), and residual predictive deviation (RPD) were 0.9486, 0.3244%, and 3.1598 for SSC and 0.8955, 1.1077%, and 2.2469 for firmness. These results showed that vis-NIR spectroscopy could be applied as a fast and accurate alternative method for the nondestructive determination of SSC and firmness of pears. The research team first used long-wave infrared hyperspectral imaging in the wavelength range of 1000–2500 nm to measure the SSC in pears [4]. The Monte Carlo-uninformative variable elimination-successive projections algorithm-partial least square (MC-UVE-SPA-PLS) model, using 18 selected characteristic variables, achieved an Rp of 0.88 and an RMSEP of 0.35. The team also used NIR and portable vis-NIR spectroscopy for nondestructive determination of SSC in pears, combining an informative variable selection algorithm with two calibration algorithms, linear multiple linear regression (MLR) and nonlinear least-square support vector machine (LS-SVM) regression, to build the MC-UVE-SPA-MLR [5] and MC-UVE-SPA-LS-SVM [6] models. They also established multicultivar models for the determination of SSC in pears [
2] and conducted a comparative study for the quantitative determination of SSC, pH, and firmness of pears by vis-NIR spectroscopy [
7]. Tian et al. [
8] developed a fruit surface feature classification and multivariate regression analysis for the nondestructive prediction of SSC in pears, based on the vis-NIR transmission spectra of “Korla” pears, with a portable spectrometer instrument; the Rp and RMSEP were 0.9368 and 0.5256%, respectively. Lee and Han [
9] nondestructively detected the sugar content of Korean pears using NIR diffuse-reflectance spectroscopy, and the prediction accuracy was evaluated to be about 0.24%. Wang et al. [
10] used vis-NIR spectroscopy combined with chemometric methods for the nondestructive detection of the juiciness of pears; the external verification determination coefficient (Rv) was 0.93, and the root mean square error of cross validation (RMSECV) was 0.97%. Yu et al. [
11] used optical properties and diffuse reflectance in the 900–1700 nm spectral region for prediction and comparison of models for SSC determination in “Ya” pears. Xu et al. [
12] developed an application for the online determination of sugar content in pears based on variable selection in vis- and NIR spectra, with Rv = 0.880 and RMSEP = 0.459% for the validation set. Rittiron et al. [
1] used NIR spectroscopy in the short wavelength region (700–1100 nm) for rapid and nondestructive detection of water-core and sugar content in Asian pears for commercial trade. Travers et al. [
13] predicted and compared preharvest pear dry matter (DM) and SSC, based on two NIR spectral ranges (680–1000 nm and 1100–2350 nm). Models based on the longer-wavelength NIR range were more successful for both parameters (DM/SSC: Rv = 0.78–0.84; RMSECV = 0.78/0.44; latent variables (LVs) = 6/7). Adebayo et al. [
14] used absorption and reduced scattering coefficients based on the vis- and shortwave (SW)-NIR wavelength range for nondestructive analysis of fruit flesh firmness and SSC in pears. Nicolaï et al. [
15] used time-resolved and continuous wave NIR reflectance spectroscopy to predict the SSC and firmness of pears. Sun et al. [
16] used online vis- and NIR spectroscopy for simultaneous measurement of brown core and SSC in pears. Wang et al. [
17] developed multicultivar models to predict the SSC and firmness of European pears (
Pyrus communis L.) using portable vis-NIR spectroscopy. Yu et al. [
18] developed a deep learning method for predicting firmness and SSC of postharvest Korla fragrant pears using vis-NIR hyperspectral reflectance imaging. Choi et al. [
19] used a portable, nondestructive tester, integrating vis-NIR reflectance spectroscopy, to detect the sugar content in Asian pears. Passos et al. [
20] nondestructively detected soluble solids content for “Rocha” pears based on vis-SWNIR spectroscopy under real-world sorting facility conditions. Liu et al. [
21] used NIR diffuse reflectance spectroscopy combined with variable selection algorithms for optimized prediction of sugar content in snow pears. Sheng et al. [
22] nondestructively measured lignin content in Korla fragrant pears, based on NIR spectroscopy, and the Rp, RMSEP, and RPD were 0.87, 1.36%, and 2.03, respectively. Fan et al. [
23], Zhang et al. [
24], Lee et al. [
25], and Li et al. [
19,
26] used vis-NIR and NIR hyperspectral imaging technology for fast and nondestructive prediction of SSC and firmness. With competitive adaptive reweighted sampling-successive projections algorithm-partial least square (CARS-SPA-PLS) models, the Rp and RMSEP of the prediction sets were 0.876 and 0.491% for SSC and 0.867 and 0.721% for firmness. The technology was also used to predict the internal sugar content of Dangshan pears, with an Rp and RMSEP of 0.8971 and 0.3937% for CARS-PLS and 0.8969 and 0.3482% for genetic algorithm (GA)-SPA-PLS. It was further used to detect physical damage of pears and to identify the type of wax on pears, with an identification accuracy of 99.07% for the calibration set and 95.83% for the prediction set. In the 400–1000 nm range, it was used for nondestructive variety discrimination and prediction of SSC and firmness of pears, with a correlation coefficient (r) of 0.9977 for firmness and 0.9924 for SSC.
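The studies above are compared mainly through Rp, RMSEP, and RPD. The sketch below shows one common way to compute these figures from reference and predicted values. It assumes RPD is the sample standard deviation of the reference values divided by RMSEP, a convention that the cited works may not all share; the function name is illustrative.

```python
import numpy as np


def prediction_metrics(y_true, y_pred):
    """Return Rp, RMSEP, and RPD for a prediction set.

    Assumption: RPD = SD(reference values) / RMSEP, with the sample
    standard deviation (ddof=1). Other definitions exist.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    # Root mean square error of prediction, in the units of y.
    rmsep = float(np.sqrt(np.mean(residuals ** 2)))
    # Pearson correlation between reference and predicted values.
    rp = float(np.corrcoef(y_true, y_pred)[0, 1])
    # Spread of the reference values relative to the prediction error.
    rpd = float(np.std(y_true, ddof=1) / rmsep)
    return {"Rp": rp, "RMSEP": rmsep, "RPD": rpd}
```

Because RPD scales the error by the spread of the reference values, it allows a rough comparison between data sets with different SSC ranges, which RMSEP alone does not.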
The partial least square regression (PLSR) model is widely used for calibration and prediction in spectroscopy because of its simple operation and high prediction accuracy. Variable selection is an important step in multivariate modeling because it can increase the predictive ability of the model and reduce its complexity [33]. The regression coefficient β is an important parameter of the PLSR model, and several classical variable selection algorithms are based on it, including Monte Carlo uninformative variable elimination (MCUVE), competitive adaptive reweighted sampling (CARS), and bootstrapping soft shrinkage (BOSS). In MCUVE, useful variables are selected according to the stability of each variable [29, 32, 33, 34]. A large number of sample subsets are generated by Monte Carlo sampling, and a PLSR model is built for each subset. The stability of each variable is obtained from the mean and standard deviation of its PLSR regression coefficient across these models. When the stability of a variable is less than the threshold, the variable is considered useless and is eliminated. The number of variables selected by MCUVE is still large, however, and the threshold value has a significant impact on the results of variable selection. In the CARS algorithm, variables are selected according to the absolute value of the regression coefficient [
10, 29, 31, 32, 34, 35, 36]. Variables with smaller absolute regression coefficients are removed. The absolute value of the regression coefficient changes with the sample space, however, so useful variables may be among those eliminated. The BOSS algorithm [
29, 32] is a spectral line selection method based on the variable space. A large number of variable subspaces are generated by weighted bootstrap sampling (WBS) combined with the Monte Carlo algorithm, and a PLSR submodel is built on each. The absolute values of the regression coefficients are then calculated and normalized to update the weight of each variable, so that variables with larger weights have a greater chance of being selected in the next iteration. The BOSS algorithm, however, considers only a single characteristic factor, the regression coefficient, in the variable space. Therefore, the selected variables may not be optimal. To address these shortcomings of existing variable selection methods, we propose in this work a variable selection method based on the combination of variable stability and cluster analysis algorithm (VSCAA) to select the infrared spectrum variables.
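The MCUVE stability criterion described above can be sketched as follows. This is an illustration under stated assumptions, not the implementation used in this work: stability is taken as the mean of each regression coefficient divided by its standard deviation over the Monte Carlo runs, the PLS1 coefficients come from a small NIPALS routine written for this example, and the function names, sampling ratio, and number of runs are all hypothetical choices.

```python
import numpy as np


def pls1_coefficients(X, y, n_components):
    """Regression coefficients of a single-response PLS model (NIPALS)."""
    Xk = X - X.mean(axis=0)          # centered, deflated in the loop
    yk = y - y.mean()
    n_features = X.shape[1]
    W = np.zeros((n_features, n_components))   # weight vectors
    P = np.zeros((n_features, n_components))   # X loadings
    q = np.zeros(n_components)                 # y loadings
    for a in range(n_components):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)
        t = Xk @ w                   # score vector
        tt = t @ t
        p = Xk.T @ t / tt
        q[a] = yk @ t / tt
        Xk = Xk - np.outer(t, p)     # deflate X
        yk = yk - q[a] * t           # deflate y
        W[:, a], P[:, a] = w, p
    # B = W (P^T W)^{-1} q
    return W @ np.linalg.solve(P.T @ W, q)


def mcuve_stability(X, y, n_components=3, n_runs=200, ratio=0.8, seed=0):
    """Stability of each variable: mean(beta) / std(beta) over MC runs."""
    rng = np.random.default_rng(seed)
    n_samples = X.shape[0]
    subset_size = int(ratio * n_samples)
    coefs = np.empty((n_runs, X.shape[1]))
    for i in range(n_runs):
        # Draw a random sample subset without replacement.
        idx = rng.choice(n_samples, size=subset_size, replace=False)
        coefs[i] = pls1_coefficients(X[idx], y[idx], n_components)
    return coefs.mean(axis=0) / coefs.std(axis=0, ddof=1)


def select_variables(stability, threshold):
    """Indices of variables whose absolute stability reaches the threshold."""
    return np.flatnonzero(np.abs(stability) >= threshold)
```

The dependence on `threshold` in `select_variables` is the sensitivity noted above: a small change in the cutoff can change how many wavelengths are retained.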
In summary, development of a nondestructive method combined with a variable selection algorithm for the detection of the SSC, TA, and TI properties of pears is necessary. Therefore, the objectives of this study were as follows: (1) set up an ultra-compact NIR spectroscopy system to collect the spectral data; (2) preprocess the raw spectra by different preprocessing methods to eliminate the light-scattering effects; (3) utilize and compare SiPLS, SPA, BOSS, GA, and VSCAA methods as well as their combination to select the feature variables and enhance the model’s prediction ability; and (4) evaluate the performance of the PLS model based on the independent verification datasets.
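Objective (2) concerns preprocessing to reduce light-scattering effects. As one hedged example of such a step, the sketch below applies standard normal variate (SNV) scaling to each spectrum. SNV is chosen here only as a widely used scatter correction; this section does not state which preprocessing methods were actually compared, and the function name is illustrative.

```python
import numpy as np


def snv(spectra):
    """Standard normal variate: center and scale each spectrum (row).

    spectra: array of shape (n_samples, n_wavelengths).
    Assumption: no spectrum is perfectly flat (nonzero row std).
    """
    spectra = np.asarray(spectra, dtype=float)
    row_mean = spectra.mean(axis=1, keepdims=True)
    row_std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - row_mean) / row_std
```

SNV removes additive offsets and multiplicative scaling that act on a whole spectrum, so two spectra that differ only by such effects become identical after the transform.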