#### *2.1. Samples*

A total of 235 commercially blended EVOO samples were collected over four years (2009–2012) from different Italian regions (Tuscany, Sicily and Apulia), other European countries (Spain, Portugal) and non-European countries (Tunisia, Turkey, Chile, Australia). All samples were stored in sealed dark glass bottles at room temperature, protected from light, until laboratory analysis. A detailed description, including the Italian region and/or country of origin, the cultivar composition (blend type) and the number of olive oil samples for each country, is summarized in Table 1. Indicative average climate data, chosen essentially because they are readily accessible, are also reported and tentatively correlated with the oil characteristics. These data refer to a representative location for each area (Florence-Parentola for Tuscany, Bari for Apulia, Catania for Sicily, Seville for Spain, Lisbon for Portugal, Sydney for Australia, Santiago for Chile, Tunis for Tunisia and Istanbul for Turkey). Average precipitation (monthly cumulative rainfall) and temperature were calculated for each studied area over the same four-year period (2009–2012).
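The averaging step can be sketched as below. This is a minimal illustration, not the authors' procedure: it assumes that the monthly rainfall totals and monthly mean temperatures for one location are available as hypothetical `(year, month, rainfall_mm, temp_c)` tuples, and that Table 1 reports a single mean over the 48 months of 2009–2012 for each location. The helper name `climate_summary` is hypothetical.

```python
from statistics import mean


def climate_summary(records, first_year=2009, last_year=2012):
    """Average monthly cumulative rainfall (mm) and mean temperature (°C).

    Assumption: `records` holds one hypothetical (year, month, rainfall_mm,
    temp_c) tuple per calendar month for a single location.
    """
    # Keep only the months inside the study window
    window = [r for r in records if first_year <= r[0] <= last_year]
    n_expected = 12 * (last_year - first_year + 1)
    if len(window) != n_expected:
        # A missing month would bias the average, so fail loudly
        raise ValueError(f"expected {n_expected} monthly records, got {len(window)}")
    avg_rain = mean(r[2] for r in window)  # mean of the monthly totals
    avg_temp = mean(r[3] for r in window)  # mean of the monthly means
    return round(avg_rain, 1), round(avg_temp, 1)
```

Checking the record count before averaging is a deliberate choice: a silently missing month would shift the mean, particularly for strongly seasonal rainfall.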

**Table 1.** Origin (Italian region and/or country), blend type and number of olive oil samples. Average monthly cumulative rainfall (mm) and average temperature (°C) are reported for each area, calculated over a four-year period (2009–2012).


Values of average monthly cumulative rainfall (mm) and temperature (°C) are taken from the Climate Change Knowledge Portal (http://sdwebx.worldbank.org/climateportal/).

#### *2.2. Nuclear Magnetic Resonance Spectroscopy*

For each NMR sample, 20 mg of olive oil was accurately weighed, dissolved in 0.9 mL of deuterated chloroform (CDCl3) and transferred directly to a 5 mm NMR tube. All 1H NMR spectra were recorded on a 499.84 MHz spectrometer operating at 11.7 T (Varian NMR UNITY INOVA Narrow Bore, UNIX-based Sun Microsystems workstation, Varian NMR Instruments, Palo Alto, CA, USA). Experiments (pulse program s2pul) were run at 298.15 K, using a 12 µs pulse at 56 dB (90° flip angle), an acquisition time of 5.82 s (64 k data points), a spectral width of 5500 Hz (11 ppm) and 16 transients. Prior to Fourier transformation, the free induction decays (FIDs) were zero-filled to 128 k points and a −0.15 Hz line-broadening factor was applied.
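The acquisition parameters above are mutually consistent, as a quick arithmetic check shows. The sketch assumes Varian conventions (the point counts np and fn include both real and imaginary points, and the acquisition time is at = np/(2·sw)) and an approximate 1H gyromagnetic ratio γ/2π of 42.577 MHz/T; `acquisition_summary` is a hypothetical helper, not part of any vendor software.

```python
# Approximate 1H gyromagnetic ratio divided by 2*pi, in MHz per tesla
GAMMA_1H_MHZ_PER_T = 42.577


def acquisition_summary(sfrq_mhz, sw_hz, np_points, fn_points):
    """Derived acquisition quantities under assumed Varian conventions.

    np_points and fn_points count real + imaginary points (assumption).
    """
    return {
        "field_T": sfrq_mhz / GAMMA_1H_MHZ_PER_T,        # B0 = nu0 / (gamma / 2pi)
        "sw_ppm": sw_hz / sfrq_mhz,                      # Hz divided by MHz gives ppm
        "acq_time_s": np_points / (2.0 * sw_hz),         # at = np / (2 * sw)
        "digital_res_hz": sw_hz / (fn_points / 2.0),     # Hz per real spectral point
    }


summary = acquisition_summary(499.84, 5500.0, 64_000, 131_072)
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

With np = 64 000 the sketch reproduces the reported 5.82 s; reading "64 k" as 65 536 points would instead give about 5.96 s. The remaining values come out at roughly 11.74 T, 11.0 ppm and 0.084 Hz per point after zero-filling to 128 k.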

#### *2.3. Multivariate Data Processing*

The data were Fourier-transformed and phase- and baseline-corrected with ACD/NMR software (Advanced Chemistry Development, ACD/Spectrus software, version 2016.1.1, Toronto, ON, Canada). Chemical shifts were expressed as δ values relative to residual CHCl3 (δ 7.27 ppm) as internal reference. Spectra were segmented by variable-size intelligent bucketing with a bucket width of 0.04 ppm and a 50% looseness factor. The interval containing the solvent signals (7.60–6.90 ppm) was removed, and the remaining integrals (buckets) of each spectrum were normalized to their total sum. A total of 221 variables per 1H NMR spectrum was obtained and considered for statistical analysis. Each bucket row represents the entire NMR spectrum, and thus all the molecules detected in the sample, although the spectra are dominated by the resonances of the functional groups of the fatty acids. The data table built from all the aligned, bucket-reduced spectra (one row per sample) was used for multivariate data analysis. Pareto scaling, which divides the mean-centered data by the square root of the standard deviation of each variable, was then applied.

Multivariate statistical analysis and graphics were obtained using SIMCA-P (version 14, Sartorius Stedim Biotech, Umeå, Sweden). Two procedures were applied to the bucket-reduced NMR spectra: principal component analysis (PCA) and orthogonal partial least squares discriminant analysis (OPLS-DA). PCA was used as a preliminary, unsupervised step: it reduces the dimensionality of the data and reveals correlations among the samples. The principal components (PC1, PC2, …, PCn) are linear combinations of the original variables (here, the NMR buckets) that account for most of the variation in the data set. Hence, when the original variables are strongly correlated, the number of useful PCs is much smaller than the number of original variables [23–25].
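The preprocessing chain described above (solvent-region removal, bucketing, total-sum normalization, Pareto scaling) followed by PCA can be sketched with NumPy. This is a simplified stand-in for the ACD/NMR and SIMCA-P workflow, not a reproduction of it: fixed-width 0.04 ppm buckets replace ACD's intelligent bucketing (which shifts bucket edges, within the looseness factor, to avoid splitting peaks), each spectrum is normalized to a total of 1, and PCA is computed by singular value decomposition of the scaled matrix. All function names are hypothetical.

```python
import numpy as np


def bucket_spectra(ppm, spectra, width=0.04, exclude=(6.90, 7.60)):
    """Fixed-width bucketing (simplification of intelligent bucketing).

    ppm: (points,) chemical-shift axis; spectra: (samples, points) intensities.
    """
    keep = (ppm < exclude[0]) | (ppm > exclude[1])   # drop the CHCl3 region
    ppm, spectra = ppm[keep], spectra[:, keep]
    idx = np.floor((ppm - ppm.min()) / width).astype(int)
    n = idx.max() + 1
    sums = np.zeros((n, spectra.shape[0]))
    np.add.at(sums, idx, spectra.T)                   # integrate each bucket
    occupied = np.bincount(idx, minlength=n) > 0      # discard empty buckets
    centers = ppm.min() + (np.arange(n) + 0.5) * width
    return centers[occupied], sums[occupied].T        # (samples, buckets)


def normalize_total(X):
    """Scale each spectrum (row) so that its buckets sum to 1."""
    return X / X.sum(axis=1, keepdims=True)


def pareto_scale(X):
    """Mean-center each bucket and divide by the square root of its SD."""
    centered = X - X.mean(axis=0)
    scale = np.sqrt(X.std(axis=0, ddof=1))
    scale[scale == 0] = 1.0                           # leave constant buckets unscaled
    return centered / scale


def pca(X, n_components=2):
    """PCA of already-centered data via SVD: scores, loadings, explained ratio."""
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    explained = S**2 / np.sum(S**2)
    scores = U[:, :n_components] * S[:n_components]
    return scores, Vt[:n_components].T, explained[:n_components]
```

Compared with unit-variance scaling, Pareto scaling still lets large, high-variance signals (such as the fatty-acid resonances) weigh more, while reducing their dominance relative to plain mean-centering.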
Although PLS-DA is a widely used supervised multivariate analysis (MVA) technique for discriminating samples according to known classes (such as cultivar and/or geographical origin) [22,26], we preferred OPLS-DA in this study. OPLS-DA, which has been applied in several metabolomics studies, is a modification of PLS-DA that filters out variation not directly related to the discriminating response, by separating the variance useful for prediction from the non-predictive variance, which is made orthogonal to the response. The result is a model with improved interpretability; moreover, OPLS-DA condenses the predictive information into a single component, which facilitates the interpretation of the spectral data. R²(cum) and Q²(cum) describe the goodness of the model at the minimum number of components cumulatively (cum) required to account optimally for the data variability. R²(cum) is the fraction of the total variation explained by the model and thus quantifies the goodness of fit, whereas Q²(cum), estimated by sevenfold cross-validation, quantifies the goodness of prediction [26–28].
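The idea behind OPLS-DA and the R²/Q² statistics can be illustrated with a compact NumPy sketch of single-response O-PLS (one predictive component plus y-orthogonal components, following the O-PLS algorithm of Trygg and Wold) and sevenfold cross-validation. It is not SIMCA-P's implementation: the two classes are assumed to be dummy-coded as 0/1, only R² of the response (R²Y) is computed, Q² is taken as 1 − PRESS/SS with SS the total sum of squares of the centered response, and assigning every seventh sample to the same fold is an assumption. Function names are hypothetical.

```python
import numpy as np


def opls_fit(X, y, n_orth=1):
    """Single-y O-PLS: one predictive component plus n_orth orthogonal ones."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, yc = X - x_mean, y - y_mean
    w = E.T @ yc
    w /= np.linalg.norm(w)                     # predictive weight vector (unit norm)
    orth = []
    for _ in range(n_orth):
        t = E @ w
        p = E.T @ t / (t @ t)                  # loading of the current predictive score
        w_o = p - (w @ p) * w                  # part of p orthogonal to w
        w_o /= np.linalg.norm(w_o)
        t_o = E @ w_o                          # score orthogonal to y
        p_o = E.T @ t_o / (t_o @ t_o)
        E = E - np.outer(t_o, p_o)             # filter out y-orthogonal variation
        orth.append((w_o, p_o))
    t = E @ w
    b = (t @ yc) / (t @ t)                     # inner regression coefficient
    return {"x_mean": x_mean, "y_mean": y_mean, "w": w, "b": b, "orth": orth}


def opls_predict(model, X):
    """Apply the stored orthogonal filter, then the predictive component."""
    E = X - model["x_mean"]
    for w_o, p_o in model["orth"]:
        E = E - np.outer(E @ w_o, p_o)
    return model["y_mean"] + model["b"] * (E @ model["w"])


def r2_q2(X, y, n_orth=1, n_folds=7):
    """R2Y on the full data and Q2 = 1 - PRESS/SS from n-fold cross-validation."""
    ss_tot = np.sum((y - y.mean()) ** 2)
    fitted = opls_predict(opls_fit(X, y, n_orth), X)
    r2 = 1.0 - np.sum((y - fitted) ** 2) / ss_tot
    folds = np.arange(len(y)) % n_folds        # every seventh sample in one fold
    press = 0.0
    for k in range(n_folds):
        test = folds == k
        model = opls_fit(X[~test], y[~test], n_orth)
        press += np.sum((y[test] - opls_predict(model, X[test])) ** 2)
    return r2, 1.0 - press / ss_tot
```

Because each orthogonal weight vector is orthogonal to the predictive weight vector, removing an orthogonal component leaves Xᵀy unchanged, so the predictive weights need not be recomputed; the same filter is applied to held-out samples before they are predicted.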
