**3. Results and Discussion**

### *3.1. Geographical Classification of EVOO Using NMR Spectroscopy*

Figure 1 and Table 1 show 1H NMR signals of the major and some minor compounds together with their chemical shifts and their assignments to protons of the different functional groups [10,24–28]. Figure 2 and Table 2 show the major peaks obtained using 13C NMR and identified using the literature [10,13,14,17,27–33].

**Figure 1.** The major peaks of interest obtained from the nuclear magnetic resonance (NMR) of extra virgin olive oils (EVOOs) using the zg30 pulse sequence (black) and NOESY pulse sequence (red).

**Figure 2.** The major peaks of interest obtained using the 13C NMR of EVOOs (black line Maltese EVOOs, red line non-Maltese EVOOs).

Whilst the chemical shifts of the major constituents are well known and easily identified, the 1H and 13C signals of the minor oil components are only observed when their signals do not overlap with those of the main components and when their concentrations are high enough to be detected [11]. Minor constituents which are expected to yield NMR signals include mono- and diglycerides, sterols, tocopherols, aliphatic alcohols, hydrocarbons, fatty acids, pigments, and phenolic compounds [32]. Figure 1 shows the most common 1H NMR signals of the major and some minor compounds together with their chemical shifts and their assignments to protons of the different functional groups. The main identified compounds include cycloartenol at 0.29 and 0.54 ppm; β-sitosterol at 0.62 and 0.67 ppm; stigmasterol at 0.69 ppm; wax at 0.98 ppm; squalene at 1.66 ppm; sn-1,2 diglyceryl group protons at 3.71 and 5.28 ppm; two unknown terpenes at 4.53, 4.65, and 4.95 ppm; hexanal at 9.7 ppm; and phenolic protons at 6.95 and 6.72 ppm. These compounds have already been observed and identified by other authors [10,11,14,17,30]. In the case of 13C NMR, the minor constituents observed were restricted to chemical shifts corresponding to squalene, with a shoulder peak at 26.6 ppm and another minor peak at 28.2 ppm attributed to the allylic methylene group [26].


**Table 1.** Chemical shifts and the corresponding chemical functional group observed for 1H NMR.

**Table 2.** Chemical shifts and the corresponding chemical functional group observed for 13C NMR.


The discriminatory models for the traceability of EVOOs from the Maltese islands coupled 1H and 13C NMR spectroscopy with chemometrics. In order to overcome instrumental limitations and to account for scattering and other minor variations which would hinder the performance of the classification model, different kinds of spectral pretreatments were tested and compared. A total of 10 spectral pretreatment methods were used. In each case, after pretreatment, a PCA was carried out in order to reduce the number of variables to a small set of principal components whilst retaining as much of the information in the larger set as possible. PCA enabled a preliminary assessment of which pretreatment offered the highest variability and possible sample grouping by geographical origin, and also enabled the identification of outliers and the modelling of noise.
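The pretreatment-then-PCA step can be illustrated with a minimal sketch. It assumes standard normal variate (SNV) scaling, one of the pretreatments named in this section, followed by extraction of the first principal component by power iteration. The function names `snv` and `first_pc`, the use of the sample standard deviation, and the power-iteration approach are illustrative assumptions; the software and settings actually used in the study are not stated here.

```python
import math

def snv(spectrum):
    """Standard normal variate: centre and scale ONE spectrum by its own
    mean and sample standard deviation (n - 1 denominator, an assumption)."""
    n = len(spectrum)
    mean = sum(spectrum) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in spectrum) / (n - 1))
    return [(v - mean) / sd for v in spectrum]

def first_pc(rows, n_iter=100):
    """First principal component of column-centred data by power iteration.
    Returns (loading vector, sample scores, fraction of variance explained)."""
    n, p = len(rows), len(rows[0])
    col_means = [sum(r[j] for r in rows) / n for j in range(p)]
    X = [[r[j] - col_means[j] for j in range(p)] for r in rows]
    total_ss = sum(v * v for r in X for v in r)
    # Start from the centred spectrum with the largest norm; a constant
    # start vector would fail because SNV-treated rows each sum to zero.
    start = max(X, key=lambda r: sum(v * v for v in r))
    norm = math.sqrt(sum(v * v for v in start))
    loading = [v / norm for v in start]
    for _ in range(n_iter):
        scores = [sum(x * l for x, l in zip(r, loading)) for r in X]
        loading = [sum(X[i][j] * scores[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(v * v for v in loading))
        loading = [v / norm for v in loading]
    scores = [sum(x * l for x, l in zip(r, loading)) for r in X]
    explained = sum(s * s for s in scores) / total_ss
    return loading, scores, explained

# Synthetic (not measured) spectra: two peak shapes, each recorded with a
# different multiplicative scale and additive offset.
shape_a = [1.0, 2.0, 4.0, 2.0, 1.0, 1.0]
shape_b = [1.0, 1.0, 2.0, 4.0, 2.0, 1.0]
raw = [[s * v + o for v in shape]
       for shape in (shape_a, shape_b)
       for s, o in ((1.0, 0.0), (2.5, 3.0), (0.7, -1.0))]
treated = [snv(r) for r in raw]
loading, scores, explained = first_pc(treated)
print([round(s, 3) for s in scores], round(explained, 3))
```

In this toy case SNV removes the scale and offset differences, so the only variation left for PCA is the difference between the two peak shapes. Plotting the loading vector against chemical shift is the kind of inspection described for the PC loadings in this section.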

Figure 3 shows some of the spectral pretreatments employed and the corresponding PCA plots for the first two principal components. In the case of 1H NMR, although clustering was observed with most of the spectral pretreatments, it did not fully discriminate the EVOOs of Maltese origin from those obtained from other Mediterranean countries. Only a weak clustering resembling the geographical origin was observed using PCA. For 1H NMR, the raw data are presented in Figure 3, as these were considered the most representative for highlighting clustering in PCA. Other spectral transformations can be viewed in the Supplementary Materials, Figures S1–S3. In the case of 13C NMR, the clustering obtained using the OSC and SNV spectral transformations closely resembled the geographical origin of the EVOOs.

**Figure 3.** The principal component analysis (PCA) biplots (black boxes = Maltese, red dots = non-Maltese) and loading plots for PC1 (black line) and PC2 (red line) for the untreated raw data for the zg30 (**A**) and NOESY (**B**) spectra, and for the 13C NMR orthogonal scatter corrections (OSC) (**C**) and 13C NMR standard normal variate (SNV) (**D**) spectra.

Inspection of the PC loadings revealed a spectral form, which suggests that the variation observed was due to the actual NMR spectra and not to noise. For zg30, the chemical shifts at 0.8 and 1.2–1.25 ppm, and for the NOESY experiment those at 0.5–1.25 ppm, seem to have a larger influence on the separation along the first and second principal components. These observations suggest that the phytosterol content, namely β-sitosterol, campesterol, and cycloartenol, together with 1-eicosanol and α-tocopherol, which show chemical shifts between 0.5 and 1.25 ppm, has a greater influence on the variation observed along the first two principal components. In the case of zg30, other peaks observed in the 4.7–4.9 ppm range also seem to be influential, especially in the first PC; these peaks correspond to terpenic compounds present in EVOOs. Alonso-Salces et al. [17,30] identified three peaks at 4.57, 4.65, and 4.70 ppm, which were attributed to unknown terpenes during their study on the unsaponifiable fraction of EVOOs. For 13C NMR, inspection of the PC loading plots showed that the previously identified chemical shifts accounted for most of the variation, with the peak at 14 ppm, assigned to the terminal –CH3 of all acyl chains, explaining most of the variation in the SNV spectra.

### *3.2. Application of PLS-DA for the Discrimination of Maltese EVOOs*

The Maltese and the non-Maltese samples were grouped in ascending order so that the first 30 samples would represent Maltese EVOOs whilst the rest corresponded to non-Maltese EVOOs. A Venetian blinds cross-validation method was then employed, which selected every s-th sample from the data by making data splits such that all samples are left out exactly once (s = 5). This sampling method excluded 20% of the dataset, which was retained as the testing set. The remaining 80% of the dataset was used to build the training set. In the case of PLS-DA, an inspection of the variable importance in projection (VIP) scores was carried out. Variables with a VIP score below 0.8 were removed, and an adjusted PLS model was built after the removal of these variables. The goodness of fit of the adjusted model was evaluated and compared to that of the original model. Table 3 shows the accuracy (% correct classification during training) and the predictability (% correct classification during testing) obtained using different spectral pretreatments for the two NMR methods. For the zg30 experiment, spectra pretreated with deresolve, SNV, and quantile normalisation showed the best model performance, with a % accuracy ranging from 87.9 to 93.1% and a % predictability ranging from 72.7 to 81.8%, whilst for the NOESY experiment, spectra treated using normalisation and Savitzky-Golay showed the best performance, with an accuracy of 94.8% and a predictability of 90.9%. In the case of the zg30 experiment, all the spectral pretreatments showed an improvement in the % predictability when compared to the raw data, whilst in the NOESY experiment, spectra treated using SNV and detrending functions showed a lower % predictability and % accuracy when compared to the non-pretreated raw data. This observation suggests that, in the case of NOESY, the signal suppression of the major peaks improves the signal-to-noise ratio, and the resulting spectra can be used without the need for extensive pretreatment. Results obtained by Longobardi et al. [18] showed that the presaturation of the dominating lipid signals resulted in an increased receiver gain, which in turn resulted in a signal-to-noise gain close to 10 compared to the zg30 spectra. In the case of 13C NMR, higher rates of accuracy and predictability were observed when compared to the 1H NMR methods, with a % predictability ranging from 66.7 to 100% and with OSC reaching 100% correct classification in both the training and validation sets. The higher rates of predictability of the 13C NMR spectra were attributed to a higher signal-to-noise ratio and fewer coupling interactions, which result in a cleaner signal; this is supported by the % predictability of the raw untreated 13C spectra relative to that of the 1H spectra.
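The Venetian blinds splitting described above can be sketched as follows. This is an illustration only: the function name `venetian_blinds` is hypothetical, and the total of 60 samples is a placeholder, since the text states only that the first 30 samples are Maltese.

```python
def venetian_blinds(n_samples, n_splits=5):
    """Return a list of (train_idx, test_idx) pairs. Split k holds out
    samples k, k + s, k + 2s, ... so every sample is left out exactly once."""
    splits = []
    for k in range(n_splits):
        test_idx = list(range(k, n_samples, n_splits))
        held_out = set(test_idx)
        train_idx = [i for i in range(n_samples) if i not in held_out]
        splits.append((train_idx, test_idx))
    return splits

# Placeholder sample count (assumption): 30 Maltese followed by 30 non-Maltese.
n_total = 60
labels = ["Maltese"] * 30 + ["non-Maltese"] * (n_total - 30)

for train_idx, test_idx in venetian_blinds(n_total, n_splits=5):
    n_maltese = sum(labels[i] == "Maltese" for i in test_idx)
    print(len(train_idx), len(test_idx), n_maltese)
```

Because the samples are sorted by class before splitting, taking every fifth sample draws from both classes in roughly their overall proportion, so each held-out set of about 20% contains Maltese and non-Maltese oils.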

The next step was to build another PLS model, this time using only variables which had a VIP score > 0.8. Table 3 also shows the results obtained by using the adjusted PLS model for 13C and 1H NMR. An improvement in the overall % accuracy and predictability of the model was observed. Furthermore, the models obtained using only VIP > 0.8 variables showed an increase in both %X and %Y explained, and a higher % accuracy and % predictability, indicating enhanced model performance. In the case of the zg30 experiment, it was found that normalised spectra and Savitzky-Golay derived spectra had the optimal performance, whilst detrended and SNV spectra had optimal performance when the whole data set was used. In the case of the NOESY experiment, the models obtained using VIP > 0.8 showed an increase in performance when compared to those obtained with the whole data set.
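The VIP-based variable selection can be illustrated with a minimal sketch. It assumes a single-response PLS model (classes coded 1 and 0) fitted with the NIPALS algorithm and the commonly used VIP definition, in which each variable's squared weights are averaged over components in proportion to the response variance that each component explains. The function name `pls1_vip`, the class coding, the number of components, and the synthetic data are all illustrative assumptions; the study's actual software and settings are not stated here.

```python
import math

def pls1_vip(X, y, n_components=2):
    """Fit PLS1 by NIPALS on centred data; return one VIP score per variable."""
    n, p = len(X), len(X[0])
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    E = [[row[j] - col_means[j] for j in range(p)] for row in X]  # X residual
    y_mean = sum(y) / n
    f = [v - y_mean for v in y]                                   # y residual

    weights, explained = [], []
    for _ in range(n_components):
        # Weight vector: direction of maximum covariance between E and f.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:          # nothing left in y to explain
            break
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]
        tt = sum(v * v for v in t)
        load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        # Deflate both blocks before extracting the next component.
        E = [[E[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        weights.append(w)
        explained.append(q * q * tt)  # y sum of squares explained by this component

    total = sum(explained)
    return [math.sqrt(p * sum(ss * wa[j] ** 2
                              for ss, wa in zip(explained, weights)) / total)
            for j in range(p)]

# Synthetic (not measured) data: only the first variable separates the classes.
y = [1, 1, 1, 1, 0, 0, 0, 0]
columns = [
    [2.0, 2.2, 1.9, 2.1, 0.1, -0.1, 0.0, 0.2],
    [0.5, 0.4, 0.6, 0.5, 0.4, 0.5, 0.6, 0.5],
    [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0],
    [0.3, 0.1, 0.2, 0.4, 0.2, 0.3, 0.1, 0.4],
]
X = [list(row) for row in zip(*columns)]

vip = pls1_vip(X, y, n_components=2)
keep = [j for j, v in enumerate(vip) if v > 0.8]  # threshold used in the text
print([round(v, 2) for v in vip], keep)
```

With this definition the squared VIP scores average to 1 across variables, which is why thresholds slightly below 1, such as the 0.8 used here, are a common choice for discarding weakly contributing variables.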

**Table 3.** The PLS-DA analysis on both the entire (**a**) 1H NMR and (**b**) 13C NMR spectra and selected variables having a VIP > 0.8 for the different spectral pretreatments. The results obtained on the training dataset are given in terms of % accuracy of correct classification whilst for the testing data set these are given in terms of % predictability of correct classification.


These observations indicate that different spectral pretreatments respond differently to variable selection techniques, since each of them attempts to maximise spectral variations and corrections; the removal of even a small number of predictors can therefore have a devastating effect on model performance. In the case of 13C NMR, variable selection greatly improved the discrimination, with most of the pretreated spectra reaching 100% accuracy and predictability. The noticeable increase in model performance has been attributed to the removal of redundant variables, which corrects for overfitting by excluding noise variables from the data and thereby prevents them from affecting the model. Reducing the number of variables around which the model is built also increases the model's reliability.
