## *2.3. Vibrational Spectroscopy*

Before data collection, all olive oil samples were heated to 65 °C in a lab oven (Precision Standard Incubator, PR205125G, Thermo Fisher Scientific, Waltham, MA, USA) so that every sample was liquefied to the same extent.

FT-IR Spectroscopy: Spectra of each oil sample were acquired using a portable 5500a series compact Fourier-transform IR spectrometer (Agilent Technologies Inc., Santa Clara, CA, USA) equipped with a temperature-controlled, 5-reflection ZnSe crystal attenuated total reflectance (ATR) accessory, which was set to 65 °C to prevent fat solidification during spectral collection. A thermoelectrically cooled deuterated triglycine sulfate (dTGS) detector was used to measure the amount of light absorbed by the sample. A 75 µL oil aliquot was deposited onto the heated crystal. Spectra were collected over the range of 4000–700 cm<sup>−1</sup> at 4 cm<sup>−1</sup> resolution, co-adding 64 scans to improve the signal-to-noise ratio. Spectral data were displayed in terms of absorbance and viewed using Resolutions Pro Software (Agilent, Santa Clara, CA, USA). Data collection was done in duplicate.

Raman Spectroscopy: Olive oil samples were heated (65 °C) in a lab oven before the analysis. Three milliliters of each olive oil sample were placed in a quartz cuvette (Hellma Analytics, Müllheim, Germany) with a 10-mm light path for analysis using a WP 1064 compact benchtop Raman spectrometer (Wasatch Photonics, Durham, NC, USA). The spectrometer was equipped with an indium gallium arsenide (InGaAs) detector and a laser source operating at 1064 nm. Raman spectra were collected from 250 to 1850 cm<sup>−1</sup> at a resolution of 4 cm<sup>−1</sup> with an integration time of 3000 ms, and 3 scans were co-added and averaged to improve the signal-to-noise ratio. A background spectrum was acquired between samples to eliminate environmental variations. Spectral data were displayed in terms of the light scattered by the sample and viewed using Enlighten™ software (Wasatch Photonics, Durham, NC, USA). Spectral data collection was done in duplicate.
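
As a rough guide to what co-adding contributes, and assuming the noise in individual scans is uncorrelated and of constant variance (an idealization that is not stated in the text), averaging $N$ scans improves the signal-to-noise ratio by a factor of $\sqrt{N}$:

$$
\mathrm{SNR}_N = \sqrt{N}\,\mathrm{SNR}_1
$$

Under that assumption, the 64 co-added FT-IR scans would correspond to an eightfold gain over a single scan, and the 3 co-added Raman scans to a gain of about $\sqrt{3} \approx 1.7$.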

## *2.4. Multivariate Data Analysis*

The spectral data were imported as GRAMS (.spc) and Excel (.xls) files and analyzed using Pirouette® multivariate statistical analysis software (version 4.5, Infometrix Inc., Bothell, WA, USA). FT-IR spectral data were transformed by smoothing (35 points) and taking the Savitzky–Golay second derivative (35 points with a second-order polynomial filter). Raman spectral data were preprocessed by mean-centering and transformed by taking the Savitzky–Golay second derivative (35 points with a second-order polynomial filter). Samples with high residuals and leverage were re-evaluated and excluded if needed. The remaining samples were randomly divided into two subsets: a calibration set (80% of the total sample size) and a validation set (the remaining 20%).
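
The preprocessing steps described above can be sketched in Python with NumPy. This is only an illustration under stated assumptions, not the Pirouette implementation used in the study: the function names, the synthetic spectra, the random seed, and the choice to drop the half-window at each end of the spectrum are all hypothetical. The filter weights are derived from a local least-squares polynomial fit, which is the definition of the Savitzky–Golay filter; `scipy.signal.savgol_filter` with `deriv=2` provides a ready-made equivalent.

```python
import math
import numpy as np

def savgol_coefficients(window, polyorder, deriv):
    """Weights that estimate the deriv-th derivative at the window centre
    from a least-squares polynomial fit over `window` equally spaced points."""
    half = window // 2
    offsets = np.arange(-half, half + 1, dtype=float)
    # Columns are offsets**0, offsets**1, ..., offsets**polyorder
    design = np.vander(offsets, polyorder + 1, increasing=True)
    # Row `deriv` of the pseudo-inverse yields the fitted coefficient a_deriv;
    # the deriv-th derivative of the polynomial at offset 0 is deriv! * a_deriv.
    return np.linalg.pinv(design)[deriv] * math.factorial(deriv)

def second_derivative(spectra, window=35, polyorder=2, delta=1.0):
    """Savitzky-Golay second derivative of each row; edges are dropped."""
    weights = savgol_coefficients(window, polyorder, 2) / delta ** 2
    rows = np.atleast_2d(spectra)
    # np.convolve flips its kernel, so reverse the weights to correlate
    return np.array([np.convolve(r, weights[::-1], mode="valid") for r in rows])

def mean_center(X):
    """Subtract the mean spectrum (column means) from every sample."""
    return X - X.mean(axis=0)

def split_calibration_validation(n_samples, calibration_fraction=0.8, seed=0):
    """Random split of sample indices into calibration and validation sets."""
    order = np.random.default_rng(seed).permutation(n_samples)
    n_cal = int(round(calibration_fraction * n_samples))
    return order[:n_cal], order[n_cal:]

# Synthetic stand-ins for measured spectra (10 samples, 1700 points)
rng = np.random.default_rng(0)
wavenumbers = np.linspace(700.0, 4000.0, 1700)
spectra = rng.normal(size=(10, wavenumbers.size)).cumsum(axis=1)

d2 = second_derivative(spectra, delta=wavenumbers[1] - wavenumbers[0])
centered = mean_center(d2)
cal_idx, val_idx = split_calibration_validation(len(spectra))
```

A 35-point window with a second-order polynomial is a comparatively wide filter, so the derivative is heavily smoothed; narrower windows would retain sharper bands at the cost of more noise.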

Classification analyses of olive oils were performed using soft independent modeling of class analogy (SIMCA), a supervised pattern recognition technique that uses prior knowledge of the category membership of samples to assign new, unknown samples to one of the known classes based on their pattern of measurements [35]. The optimal number of principal components (PCs) for each class in the training set was determined by cross-validation, thereby lessening the effect of noise-laden PCs on the class model [35]. The class boundary surrounding each class in the multivariate space represented the mean residual standard deviation of the training samples for that class, based on an F-statistic value set at a 95% confidence level. Interclass distances measure class separation in the multivariate space, and an interclass distance above 3.0 between groups of objects is regarded as significant for identifying 2 groups of samples as different classes [36]. Lastly, class membership was predicted by comparing the residual variance of an unknown with the average residual variance of the classes in the model using an F-test [37]. SIMCA assigns an unknown sample to the class for which it has the smallest residual but does not force an assignment: if the residual variance of the unknown exceeds the upper limit for every modeled class in the dataset, the sample is not assigned to any class because it is either an outlier or comes from a class not represented in the model [37].
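
The SIMCA logic described above (one PCA model per class, an F-ratio of residual variances, and an interclass distance) can be sketched with NumPy as follows. This is a minimal illustration under several assumptions rather than the Pirouette algorithm: the function names, the synthetic two-class data, and the degrees of freedom used to normalize the residual variances are hypothetical choices (implementations differ on the latter), and `f_crit` is a placeholder that in practice would be taken from the F distribution at the chosen confidence level. The interclass distance uses one commonly quoted form, sqrt((s_ab² + s_ba²) / (s_aa² + s_bb²)) − 1.

```python
import numpy as np

def fit_class_model(X, n_pcs):
    """PCA model of one class: mean, loadings, and training residual variance."""
    mean = X.mean(axis=0)
    Xc = X - mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)  # rows of Vt are PCs
    loadings = Vt[:n_pcs].T
    resid = Xc - Xc @ loadings @ loadings.T
    n, p = X.shape
    # Assumed degrees of freedom; other conventions exist
    s0_sq = (resid ** 2).sum() / ((n - n_pcs - 1) * (p - n_pcs))
    return {"mean": mean, "loadings": loadings, "s0_sq": s0_sq, "n_pcs": n_pcs}

def residual_variance(model, X):
    """Residual variance of each row of X after projection onto a class model."""
    Xc = np.atleast_2d(X) - model["mean"]
    P = model["loadings"]
    resid = Xc - Xc @ P @ P.T
    return (resid ** 2).sum(axis=1) / (Xc.shape[1] - model["n_pcs"])

def classify(models, x, f_crit):
    """Label with the smallest F ratio, or None if every ratio exceeds f_crit."""
    ratios = {k: residual_variance(m, x)[0] / m["s0_sq"] for k, m in models.items()}
    best = min(ratios, key=ratios.get)
    return best if ratios[best] <= f_crit else None

def interclass_distance(model_a, Xa, model_b, Xb):
    """Distance between two class models; values above 3 suggest separation."""
    s_ab = residual_variance(model_b, Xa).mean()  # class A fitted to model B
    s_ba = residual_variance(model_a, Xb).mean()  # class B fitted to model A
    return np.sqrt((s_ab + s_ba) / (model_a["s0_sq"] + model_b["s0_sq"])) - 1.0

# Synthetic two-class example (hypothetical data, 20 variables per sample)
rng = np.random.default_rng(1)
load_a, load_b = rng.normal(size=(2, 20)), rng.normal(size=(2, 20))
Xa = rng.normal(size=(25, 2)) @ load_a + 0.05 * rng.normal(size=(25, 20))
Xb = 5.0 + rng.normal(size=(25, 2)) @ load_b + 0.05 * rng.normal(size=(25, 20))

models = {"A": fit_class_model(Xa, 2), "B": fit_class_model(Xb, 2)}
f_crit = 3.0  # placeholder critical value, not derived from an F table

new_a = rng.normal(size=2) @ load_a + 0.05 * rng.normal(size=20)
label_new = classify(models, new_a, f_crit)
label_far = classify(models, np.full(20, 50.0), f_crit)
distance_ab = interclass_distance(models["A"], Xa, models["B"], Xb)
```

Returning `None` rather than the nearest class mirrors the "soft" behavior described in the text: a sample that fits no class model is left unassigned.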

Partial least squares regression (PLSR) models were developed using the infrared and Raman spectra and the reference values obtained for fatty acid composition, free fatty acids, peroxide value, pyropheophytins, and total polar compounds. Separate PLSR models were developed for the infrared and Raman systems for each of the compounds of interest. PLSR combines features of principal component analysis (PCA) and multiple regression to handle highly collinear data and to predict a set of dependent variables from a (very) large set of independent variables, or predictors [38,39]. The PLSR algorithm extracts a set of orthogonal factors, called "latent variables", that explain most of the variance in X (spectra) and Y (concentration), yielding a model that diminishes the potential impact of large, irrelevant variations in the X matrix [39]. Leave-one-out cross-validation was applied to determine the optimal number of factors, preventing over- or under-fitting and improving the modeling performance and the quality of the prediction [38]. The quality of the final model was evaluated based on the number of latent variables, loading vectors, standard error of cross-validation (SECV), coefficient of determination (R-value), standard error of prediction (SEP), and outlier diagnostics; outliers were identified using residual and Mahalanobis distances. The performance of the models was assessed by calculating the specificity and sensitivity from the counts of true positives (TP, predicted result and actual label are both positive), false positives (FP, predicted result is positive while the actual label is negative), true negatives (TN, predicted result and actual label are both negative), and false negatives (FN, predicted result is negative while the actual label is positive) [40].
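
The factor selection and performance metrics described above can be sketched with NumPy. This is a minimal illustration, not the Pirouette procedure: it assumes a single response (PLS1 fitted by the NIPALS algorithm), computes SECV as the root mean square of the leave-one-out errors (other definitions divide by different degrees of freedom), and picks the factor count with the lowest SECV, which is only one possible selection rule. All function names and the synthetic data are hypothetical.

```python
import numpy as np

def pls1_fit(X, y, n_factors):
    """Fit PLS1 by NIPALS; returns (x_mean, y_mean, regression vector)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_factors):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)          # weight vector
        t = Xk @ w                      # scores (latent variable)
        tt = t @ t
        p = Xk.T @ t / tt               # X loadings
        c = yk @ t / tt                 # y loading
        Xk = Xk - np.outer(t, p)        # deflate X
        yk = yk - c * t                 # deflate y
        W.append(w); P.append(p); q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)
    return x_mean, y_mean, coef

def pls1_predict(model, X):
    x_mean, y_mean, coef = model
    return y_mean + (X - x_mean) @ coef

def secv_by_factor(X, y, max_factors):
    """Leave-one-out SECV for models with 1..max_factors latent variables."""
    n = len(y)
    secv = []
    for k in range(1, max_factors + 1):
        errors = np.empty(n)
        for i in range(n):
            keep = np.arange(n) != i    # leave sample i out
            model = pls1_fit(X[keep], y[keep], k)
            errors[i] = y[i] - pls1_predict(model, X[i:i + 1])[0]
        secv.append(np.sqrt(np.mean(errors ** 2)))
    return np.array(secv)

def sensitivity_specificity(tp, fp, tn, fn):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)

# Synthetic data: 30 "spectra" driven by 3 latent constituents (hypothetical)
rng = np.random.default_rng(0)
T = rng.normal(size=(30, 3))
L = rng.normal(size=(3, 40)) * np.array([[5.0], [1.0], [0.2]])
X = T @ L + 0.01 * rng.normal(size=(30, 40))
y = T @ np.array([1.0, 2.0, -1.0])

secv = secv_by_factor(X, y, max_factors=5)
best_n_factors = int(np.argmin(secv)) + 1
sens, spec = sensitivity_specificity(tp=8, fp=1, tn=9, fn=2)
```

Plotting `secv` against the number of factors is a common way to check that the chosen model sits near the point where additional latent variables stop reducing the cross-validation error.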
