To facilitate the visualization of the results obtained with the four models, we display the RMSEP and RMSEC for the multiple data sets in figures and summarize the best prediction metrics for each data set in individual tables. In the next paragraphs we highlight the most relevant results.
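Throughout this section, RMSEC and RMSEP denote the root-mean-square error measured on the calibration samples and on the held-out prediction samples, respectively. A minimal sketch of the computation (variable names are illustrative):

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root-mean-square error between measured and predicted SSC (Brix)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# RMSEC uses the calibration samples, RMSEP the held-out prediction samples:
# rmsec = rmse(y_cal, model.predict(X_cal))
# rmsep = rmse(y_val, model.predict(X_val))
```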
3.1. PLS and MLR Results
From Figure 8 we can see that, in general, the SNV correction to the spectra (open/white symbols) tends to yield a slightly higher RMSEP than the original counterparts, with the exception of data set E (and, to a lesser extent, the Small data set). To understand why SNV brings the predictions of subset E down to acceptable values, it is important to realize that this subset has a large proportion of small and large fruit sizes, and that the largest prediction errors are found at these sizes. In our optical setup there are significant changes in the distance between the incidence and collection spots for both small and large pears, which in turn induce changes in the spectra. The SNV transformation is effective in compensating for these changes. On the other hand, the derivative transformation, which is usually assumed to be as efficient as or better than SNV at compensating for multiplicative scattering effects, is not as effective in this case (only derivative + SNV is effective). Another characteristic is that the data types with the lowest RMSEP are those that include temperature and fruit size as extra features, in particular those based on the absorbance 1st derivative.
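SNV itself is a simple per-spectrum standardization; a minimal sketch, assuming the spectra are stored as a samples × wavelengths array:

```python
import numpy as np

def snv(spectra):
    """Standard Normal Variate: center and scale each spectrum individually,
    removing additive offsets and multiplicative scatter effects."""
    X = np.asarray(spectra, dtype=float)
    mean = X.mean(axis=1, keepdims=True)
    std = X.std(axis=1, ddof=1, keepdims=True)
    return (X - mean) / std
```

Because SNV removes any per-spectrum offset and scale, a spectrum and an affinely distorted copy of it (a·x + b) become identical after the transform, which is the mechanism that compensates for the path-length changes described above.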
The bottom panel of Figure 8 shows a comparison of the mean RMSEP for the 12 pre-processed data types using the full spectra and excluding the Chl bands.
Table 2 summarizes the most relevant results. We find that the best overall result for the PLS model, although by a very small margin, comes from computing the absorbance 1st derivative of the spectra without the Chl bands, corrected by SNV and extended with temperature and fruit size information. Globally, the addition of the two extra variables slightly improves the SNV-treated models and slightly degrades the others. This is probably related to the fact that applying SNV to the spectra without the Chl bands produces a much more regular/uniform set of spectra than applying it to the full spectra. The variability associated with the Chl peak at 680 nm and with the beginning of the spectra at 500 nm decreases the effectiveness of the SNV correction.
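The winning pipeline can be sketched as follows. The text does not specify how the 1st derivative was computed, so the Savitzky–Golay filter and its window/polynomial settings below are assumptions for illustration only:

```python
import numpy as np
from scipy.signal import savgol_filter

def abs1d_snv2(spectra, temperature, size, window=11, poly=2):
    """Sketch of an abs1d_snv2-style pipeline: 1st derivative of the
    absorbance (Savitzky-Golay here, an assumed choice), SNV correction,
    then temperature and fruit size appended as two extra features."""
    d1 = savgol_filter(np.asarray(spectra, float), window, poly, deriv=1, axis=1)
    d1 = (d1 - d1.mean(axis=1, keepdims=True)) / d1.std(axis=1, keepdims=True)
    return np.hstack([d1, np.column_stack([temperature, size])])
```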
The results based on the Big and Small sets indicate a higher predictability of the PLS using the absorbance 1st derivative extended with temperature and fruit size. For these two data sets, the full-spectra version provides the best results. Although the RMSEP of the Small data set is lower than that of the Big data set, the Prediction Gain of the latter is much more significant. This is explained by the fact that the range of SSC available in the Small set is much narrower than that of the Big set.
The RMSEP and RMSEC obtained using the MLR model show a pattern similar to that found in Figure 8 for the PLS model, with just a couple of slight increases here and there.
Table 3 shows the summary of the calibration and prediction metrics for the EV set and the Big set.
We also note that a comparison of the results obtained with PLS and MLR is useful to understand whether the PLS feature selection pre-processing is good enough to decrease the collinearity in the data (collinearity degrades MLR performance). We tested this by applying the MLR model to the full 1024 features and found that the results were worse. Another useful piece of information that emerges from this comparison is whether the relationship between spectra and SSC can be interpreted as linear. The similarity in performance between the two models for the EV and Big data sets indicates that this is the case. PLS pulls ahead in terms of performance, but by a margin so small that, for most practical applications, it can be considered statistically irrelevant.
3.2. SVM and MLP Results
Table 4 and the left panels of Figure 9 show the error metrics found using the SVM model. Looking at the RMSEC panel, we can see that this model fits all data types almost equally well in terms of calibration, with slightly better results for the absorbance 1st derivative extended with temperature and fruit size. The RMSEP panel shows a prediction pattern that resembles that of the PLS model. For the EV subsets, D and E present the largest errors, with the SNV version of E showing better performance. The Big and Small data sets maintain calibration integrity, as can be seen from the close RMSEC and RMSEP values. This calibration model shows the best EV metrics of the benchmark, with an RMSEP of 1.09 Brix found using the absorbance 1st derivative extended with temperature and fruit size on the spectra without the Chl bands. This is a marginal improvement over PLS for the same variable. The SVM also provides the best metrics for the IV (Big) and Small sets, with RMSEPs of 0.82 Brix (on abs1d2) and 0.62 Brix (on abs1d_snv2), respectively.
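The SVM regressions discussed here follow the standard ε-SVR formulation; a minimal scikit-learn sketch on synthetic stand-in data (the kernel and hyper-parameter values below are illustrative, not the ones used in the paper):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(1)
X = rng.normal(size=(120, 10))             # stand-in for preprocessed spectra
y = X[:, :3].sum(axis=1) + 0.05 * rng.normal(size=120)

# Feature scaling matters a lot for RBF-kernel SVR, hence the pipeline
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.1))
model.fit(X[:90], y[:90])
rmsep = float(np.sqrt(np.mean((y[90:] - model.predict(X[90:])) ** 2)))
```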
The right panels of Figure 9 present the error metrics for the MLP NN model, and Table 5 shows the summary of the best data types. For this model, the best pre-processed data type for the Big set is the absorbance 1st derivative augmented with temperature and fruit size information using the full spectral range. As for the EV set, the results show an increased RMSEP of 1.15, obtained using the SNV-corrected absorbance 1st derivative augmented with temperature and fruit size. The Small set shows an RMSEP of 0.70, but since its Brix range is narrower than in the other sets, this translates into a worse PG. The predictions made using the MLP show the same general pattern obtained for the previous calibration models. Nevertheless, the difference between RMSECs and RMSEPs hints that this MLP model might be suffering from some degree of overfitting or could be better optimized.
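The overfitting diagnostic used here, comparing the calibration error against the prediction error, can be sketched as follows (synthetic data; the network size is illustrative):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def overfit_gap(model, X_cal, y_cal, X_val, y_val):
    """RMSEC vs RMSEP: a large positive gap (RMSEP >> RMSEC) hints at overfitting."""
    rmsec = float(np.sqrt(np.mean((y_cal - model.predict(X_cal)) ** 2)))
    rmsep = float(np.sqrt(np.mean((y_val - model.predict(X_val)) ** 2)))
    return rmsec, rmsep

rng = np.random.default_rng(2)
X = rng.normal(size=(150, 8))
y = X @ rng.normal(size=8) + 0.1 * rng.normal(size=150)
mlp = MLPRegressor(hidden_layer_sizes=(16,), max_iter=5000, random_state=0)
mlp.fit(X[:100], y[:100])
rmsec, rmsep = overfit_gap(mlp, X[:100], y[:100], X[100:], y[100:])
```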
3.3. Discussion and Remarks
Of the three predefined validation strategies, the lowest prediction errors were obtained using the Small set with the SVM calibration model (RMSEP = 0.62 Brix). This was somewhat expected for the reasons pointed out in the introduction, i.e., it is a much more uniform data set and its SSC distribution is narrower than in the others. A quantitative comparison with other works is hard because no similar data on ‘Rocha’ pear have been published, and even for different pear cultivars, authors used different calibration models, data pre-processing techniques and spectral ranges. That being said, we can attempt a qualitative comparison. Our results obtained with the PLS model (RMSEP = 0.68 Brix) are on par with some of the results present in the literature based on this same type of model. Using PLS applied to spectral data sampled over a broad wavelength range (350–1800 nm), ref. [12] reported an RMSEP = 0.66 Brix for a set of 80 ‘Fengshui’ pears. Other authors that implemented more sophisticated pre-processing/modelling techniques achieved better metrics. For example, ref. [14] recently reported an RMSEP ≈ 0.48 Brix using PLS on 240 ‘European’ pears with spectra pre-processed by Orthogonal Signal Correction; ref. [24] reported an RMSEP = 0.526 Brix, also using PLS, on a set of 120 ‘Korla’ pears with a feature selection pre-processing step based on Competitive Adaptive Reweighted Sampling, developed by [47]. Furthermore, besides PLS, SVM has also been used and has shown great potential. In [26], the authors used a combination of Monte Carlo Uninformative Variable Elimination, the Successive Projections Algorithm and Least Squares SVM on 240 pears (from 3 different varieties) to achieve an RMSEP as low as 0.32 Brix. Although these results obtained with more advanced analysis techniques seem promising, at this point we cannot be sure whether the merit is due to the algorithmic improvements or to the small size and statistical homogeneity of those sets.
For the Internal Validation strategy (Big set), most models reported an RMSEP below ∼0.9 Brix, with the exception of the MLR model, which fares a bit worse at 0.91 Brix. Overall, the SVM calibration model provides the best metrics, followed by the MLP. Using IV, all models report the best predictions based on the absorbance 1st derivative extended with temperature and size information. This is also where PG and R² are higher. A comparison between the RMSEC and RMSEP values for the SVM, PLS and MLR models indicates that these models are not overfitting. In contrast, looking at Table 5 and Figure 9, we can see clear differences between the RMSEC and RMSEP obtained using the MLP. This difference hints that the MLP calibration model is probably suffering from a slight overfitting on the Big data set for most of the data types. A thorough (computationally expensive) grid search or a Bayesian hyper-parameter optimization could potentially lead to improved results in this Internal Validation scenario. Our MLP results are similar to those reported by [16], who used a more complex NN architecture, an Extreme Learning Machine, and Successive Projections Algorithm pre-processing on 100 pears (Zaosu and Huangguan varieties) to achieve an RMSEP = 0.89 Brix. We view our results as an opportunity for future model improvement because we have a much higher volume of data, we can test different optimization strategies for different data types, and we can probe more sophisticated NN architectures. For example, Convolutional Neural Networks (CNNs) have recently been used in the VIS-NIR spectroscopy area for compound quantification, with interesting results [48,49,50].
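A sketch of the kind of exhaustive search mentioned above, using scikit-learn's GridSearchCV over a small MLP hyper-parameter grid (the data and the grid values are illustrative assumptions, not the paper's):

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(3)
X = rng.normal(size=(80, 6))               # stand-in for a preprocessed data type
y = X @ rng.normal(size=6) + 0.1 * rng.normal(size=80)

grid = {
    "hidden_layer_sizes": [(16,), (32,)],  # illustrative grid, not the paper's
    "alpha": [1e-4, 1e-2],                 # L2 regularization strength
}
search = GridSearchCV(
    MLPRegressor(max_iter=2000, random_state=0),
    grid, scoring="neg_root_mean_squared_error", cv=3,
)
search.fit(X, y)
```

A Bayesian alternative would replace the exhaustive grid with a sequential optimizer that proposes new hyper-parameter points from a surrogate model of the CV error.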
As mentioned earlier, we believe that External Validation provides a more robust way to measure how a model would behave if deployed in the real world. In this case, for the EV data set, the best results were obtained using the SVM model on the SNV-corrected absorbance 1st derivative extended with temperature and fruit size (abs1d_snv2), using the spectral range that excludes the Chl bands, with an RMSEP = 1.09 Brix. The PLS and MLR models provided very similar results. When we look at the predictions for the individual EV subsets (A, B, C, D, E), we find that for most subsets the SNV correction gives a higher RMSEP than its non-SNV counterpart. However, when a systematic bias appears in some chunk of the data (of experimental or biological origin), the SNV correction provides a major advantage. We see this in the prediction of EV subset E (see Figure 8). In this specific case, the training data (ABCD) are largely derived from average-size fruit, while the first half of E consists mostly of average sizes and the second half of big and small fruit (where the prediction error is higher). As we explained earlier, SNV alleviates this discrepancy. Another factor that could be contributing to the higher prediction errors found for subsets D and E is the harvest season. The predictions of D and E are based on calibrations made with 3 subsets from the 2010 harvest (ABC) and only 1 from the 2011 harvest (D or E). This supports the empirical knowledge that fruit chemistry changes from season to season.
The development of model optimization methods based on heterogeneous data sets would also be advantageous. Moreover, according to some of the cited literature, the use of a Least Squares SVM variant and of NN architectures recently developed for deep learning could lead to better results. The downside of this last remark is that SVM and NN optimizations are very complex (and sometimes tricky) to implement. Their hyper-parameter space is large, and their optimization methods depend heavily on the behaviour of the data. Sometimes, a slight rescaling of the data can have a big impact on the accuracy of these models. On the other hand, PLS and MLR can be easily optimized, providing results on par with those of the more complex models. This tradeoff is perhaps one of the main reasons for the broad adoption of PLS.