#### *2.1. EVOO Samples*

The study analyzed a total of 92 samples of Italian and foreign extra virgin olive oil (EVOO) from different cultivars, both monovarietal (65) and blend (27), produced in two harvest years (2018 and 2019) (Figure 1).

**Figure 1.** Monocultivar and blend extra virgin olive oil (EVOO) samples.

The tested samples were bought from large retailers and directly from mills. Some samples were acquired specifically from mills in Apulia, Calabria, and Sicily to ensure their origin. Other samples were sent, on a voluntary basis, directly by producers willing to participate in the research.

#### *2.2. The Open Source IoT Spectrometer*

The analyzed samples were stored and kept during the analyses at a controlled temperature of 16 °C. The samples from the 2018 harvest campaign were analyzed between March and May 2019, while those produced in the 2019 harvest campaign were analyzed between February and March 2020. The samples were scanned with a VIS-NIR spectrometer, which acquired the spectral reflectance signatures of the EVOO samples for subsequent qualitative evaluation. From each oil container (bottle or can) of the same sample, 12 spectral readings were acquired and afterwards averaged. The device used was the ultra-compact VIS-NIR spectrophotometer Lumini C (Myspectral Ltd., Cambridge, MA, USA) (Figure 2), able to measure spectral reflectance or absorbance. The device is small, light, low-cost, and open source. The spectral range covered 340–890 nm with an optical resolution of 8 nm and a wavelength accuracy of 0.5 nm. The spectrophotometer is powered through a USB cable and stores data on devices connected by cable or on an internal micro SD card using a dedicated slot. To acquire the spectral signature appropriately for the reflectance characteristics of the sample, the acquisition can be set at different integration times. The system is equipped with its own internal illumination system.
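The averaging of the 12 repeated readings per container can be illustrated with a minimal Python sketch. This is not the acquisition software used in the study: the function name `average_readings` is hypothetical, the readings are synthetic values generated only for illustration, and the length of 288 spectral values per reading is taken from the statistical analysis described later.

```python
import numpy as np


def average_readings(readings):
    """Average repeated spectral readings (rows) into a single spectrum."""
    readings = np.asarray(readings, dtype=float)
    if readings.ndim != 2:
        raise ValueError("expected a 2-D array: readings x spectral values")
    return readings.mean(axis=0)


# Synthetic stand-in data (not real measurements): 12 readings x 288 values.
rng = np.random.default_rng(0)
base = np.linspace(0.2, 0.8, 288)
readings = base + rng.normal(0.0, 0.01, size=(12, 288))

mean_spectrum = average_readings(readings)
print(mean_spectrum.shape)
```

Averaging along the reading axis keeps one value per spectral variable, so each container contributes a single row to the data matrix.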

**Figure 2.** VIS-NIR ultra-compact spectrophotometer Lumini C Myspectral using standard cuvette holder for absorbance spectrophotometry.

The software provided with the spectrophotometer, as commonly happens with open source technologies, offered few features and did not originally provide an appropriate system for logging and archiving multiple acquisitions. For this reason, a specific app was developed to manage and simplify the acquisition procedures. A screenshot of the app is reported in Figure 3.

The app was engineered around two kinds of functions. The first (upper side of Figure 3) sets the configuration parameters of the instrument: the IP address used to connect the tablet to the device, the type of tool (in this case, Lumini C), the exposure time expressed in milliseconds (ms), and the name of the sample to be archived. The second (lower side of Figure 3) graphically represents the acquired spectrum for each scan. When a new sample name is entered, the graphic area is reset, ready to display the new spectra. This helps in case of incomplete or bad acquisitions, since it avoids losing sample values during the acquisition campaign. The app was developed in the Android environment and is based on a client-server paradigm: on the client side there is the app, and on the server side there are the database for real-time storage of the spectra and the node.js server to which the Lumini C is connected (Figure 4). The app implements control mechanisms for the data stored in the database; these are essential because data originally stored onboard on a microSD card are now stored in a remote database. Through this mechanism, data loss is minimized. In case of communication problems among the devices, the app reports the problem and does not display the spectrum just acquired, allowing for a new scan.
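The store-then-display control described above can be sketched as follows. This is only an illustration in Python, not the Android app or the node.js server: the names `InMemoryStore`, `StorageError`, and `acquire_and_store` are hypothetical, and an in-memory dictionary stands in for the remote database.

```python
from dataclasses import dataclass, field


class StorageError(Exception):
    """Raised when the (simulated) remote database cannot be reached."""


@dataclass
class InMemoryStore:
    """Stand-in for the remote database; `online` simulates connectivity."""
    online: bool = True
    records: dict = field(default_factory=dict)

    def save(self, sample_name, spectrum):
        if not self.online:
            raise StorageError("database unreachable")
        self.records.setdefault(sample_name, []).append(list(spectrum))


def acquire_and_store(store, sample_name, spectrum):
    """Return (displayed, message); a scan is shown only if it was stored."""
    try:
        store.save(sample_name, spectrum)
    except StorageError as exc:
        # Report the problem and do not display, so the scan can be repeated.
        return False, f"scan not stored ({exc}); please rescan"
    return True, "scan stored"


store = InMemoryStore()
print(acquire_and_store(store, "sample_01", [0.1, 0.2, 0.3]))
store.online = False  # simulate a communication problem
print(acquire_and_store(store, "sample_01", [0.1, 0.2, 0.3]))
```

Tying the display to a successful write means that every spectrum shown to the operator is also archived, which is the behavior the text attributes to the app.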

**Figure 3.** Screenshot of the Lumini app control CREA-IT for spectrophotometric acquisitions of EVOO samples.

**Figure 4.** Block diagram of the Lumini C acquisition system via Android app.

#### *2.3. Statistical Analysis*

The multivariate matrix of Italian and foreign EVOO samples was analyzed with a 50–50 multivariate analysis of variance (MANOVA) procedure [38], a generalized multivariate ANOVA method based on principal component analysis (PCA) of standardized data. The MANOVA was conducted in order to highlight significant differences between the Italian and foreign VIS-NIR matrices. Adjusted *p*-values were computed by rotation testing based on 99,999 simulated datasets. The contribution of the variables was extracted for each rotation test [39].

An artificial intelligence approach was then applied in order to evaluate the possibility of classifying Italian EVOOs and distinguishing them from the foreign ones on the basis of the 288 spectral transmittance values acquired through the VIS-NIR device. To do this, a multilayer feed forward artificial neural network (MLFN) was designed using a single hidden layer architecture with sigmoid hidden and SoftMax output neurons. The ANN was trained with the Bayesian regularization back propagation algorithm [40,41], as implemented in the MATLAB 9.7 R2019b Deep Learning Toolbox (The MathWorks, Inc., MA, USA). The dataset was partitioned using 60 percent of the samples (55) as a training set and the rest as a test set (37). The test set was used to validate the model. This partitioning (equal for each group) was optimally chosen with the Euclidean distances calculated by the algorithm reported by Kennard and Stone [42], selecting samples without a priori knowledge of a regression model. The cost function was minimized using the root mean squared (RMS) normalized error performance function with a 10<sup>−8</sup> threshold on the gradient.

In order to extract the spectral transmittance values, among the 288 acquired, that were most informative in distinguishing Italian EVOOs from foreign ones, an analysis of feature importance was also conducted. The hidden layer matrix (10 nodes × 288 variables) was analyzed a posteriori considering its elementwise absolute value. The maximum value for each variable (i.e., column) was extracted from the matrix, obtaining a 1 × 288 row vector. The larger the value, the more relevant the contribution of the variable to the ANN model. The top 40 most significant spectral variables were chosen.
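As an illustrative sketch only (Python with NumPy rather than the MATLAB toolbox used in the study), the sample selection and the feature-ranking step described above can be written as follows. The function names `kennard_stone` and `top_variables` are hypothetical, the data and weights are synthetic, and the selection is shown on a single matrix, whereas the study applied the partitioning within each group.

```python
import numpy as np


def kennard_stone(X, n_select):
    """Return the indices of n_select rows of X chosen by Kennard-Stone."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and the number of rows")
    # Pairwise Euclidean distances between all samples.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    # Start from the two most distant samples.
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    selected = [int(i), int(j)]
    remaining = [k for k in range(n) if k not in selected]
    while len(selected) < n_select:
        # For each remaining sample, distance to its nearest selected sample;
        # pick the sample for which that distance is largest.
        min_d = dist[np.ix_(remaining, selected)].min(axis=1)
        pick = remaining[int(np.argmax(min_d))]
        selected.append(pick)
        remaining.remove(pick)
    return selected


def top_variables(hidden_weights, k):
    """Rank input variables by the maximum |weight| over the hidden nodes."""
    W = np.abs(np.asarray(hidden_weights, dtype=float))
    importance = W.max(axis=0)              # one value per input variable
    order = np.argsort(importance)[::-1]    # largest first
    return order[:k], importance


# Synthetic stand-ins (not the study data): 92 spectra x 288 values,
# and a 10 x 288 hidden-layer weight matrix.
rng = np.random.default_rng(0)
X = rng.normal(size=(92, 288))
train_idx = kennard_stone(X, 55)
test_idx = sorted(set(range(92)) - set(train_idx))
W = rng.normal(size=(10, 288))
top40, importance = top_variables(W, 40)
print(len(train_idx), len(test_idx), top40.shape)
```

Kennard-Stone uses only distances between samples, so the training set spans the spectral space without reference to any fitted model, which matches the "without a priori knowledge" property noted in the text.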

## **3. Results and Discussion**

## *3.1. Artificial Intelligence Modeling Based on VIS-NIR Spectra*

The MANOVA (50–50 MANOVA procedure) reported significant differences (*p* < 0.001) between the two Italian and foreign EVOO VIS-NIR matrices. The results of the analysis are reported in Table 1.


**Table 1.** MANOVA results based on Italian and foreign EVOO samples.

DF, degrees of freedom; exVarSS, explained variances based on sums of squares; nPC, number of principal components used for testing; nBU, number of principal components used as buffer components; exVarPC, variance explained by nPC components; exVarBU, variance explained by (nPC+nBU) components; *p*-value, the result from 50–50 MANOVA testing.

The trained ANN had a hidden layer of 10 nodes, and the algorithm converged after 976 iterations. Table 2 reports the characteristics and principal results of the ANN model used to predict Italian vs. foreign EVOO on the basis of 288 VIS-NIR spectral transmittance data. All 55 EVOOs in the training set were correctly classified. In testing, only five out of 37 samples were misclassified. These five samples consisted of two Italian commercial monocultivars (Coratina from Apulia and Taggiasca from Liguria) and three foreign blends from Greece, Argentina, and Croatia. Overall, 87 out of 92 samples (94.6%) were correctly classified.
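The percentages implied by these counts can be checked with a few lines of Python. The counts come from the text above; the helper name `accuracy` is introduced here only for illustration.

```python
def accuracy(total, wrong):
    """Percentage of correctly classified samples."""
    return 100.0 * (total - wrong) / total


train_acc = accuracy(55, 0)          # training set: no misclassifications
test_acc = accuracy(37, 5)           # test set: 5 of 37 misclassified
overall_acc = accuracy(55 + 37, 5)   # all 92 samples together

print(f"{train_acc:.1f} {test_acc:.1f} {overall_acc:.1f}")  # 100.0 86.5 94.6
```

The overall figure of 94.6% pools training and test samples; the test-set figure alone (32 of 37) is about 86.5%.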

**Table 2.** Characteristics and principal results of the multilayer feed forward artificial neural network (MLFN) model (training and internal test) in predicting the classification of Italian vs. foreign EVOO: number of cases, training time, number of trials, and percentage of bad predictions.


The confusion matrix of the test set is reported in Table 3.

**Table 3.** Confusion matrix of the test set of the MLFN model used in predicting the classification of Italian vs. foreign EVOO. The correctly classified samples are reported on the main diagonal of the matrix.


Overall, VIS-NIR spectroscopy analyses showed significant differences between Italian and foreign samples. In the ANN analysis, only five samples out of 37 were misclassified, namely two Italian commercial monocultivars (Coratina from Apulia and Taggiasca from Liguria) and three foreign blends (from Greece, Argentina, and Croatia). The two Italian samples were probably misclassified because of their uncertain geographical origin, considering that they are commercial oils. All the samples bought directly from the mills (noncommercial) were correctly classified. The off-diagonal elements of the test confusion matrix (Table 3) are reported in Table 4.

**Table 4.** Off diagonal elements of the test confusion matrix reported in Table 3.


Generally, machine learning relies on the amount of data for good modeling: more data correspond to a more robust and better-performing model. For this reason, even though the accuracy of the model on the test set is almost 90% (32 of 37 samples, 86.5%) and the convergence threshold of 10<sup>−8</sup> on the RMS error gradient is very strict, the small size of the dataset (92 samples) is not enough to validate the model. On the other hand, the high accuracy obtained despite the small dataset supports the reliability of the correlation observed [43].

The present work considered 67 Italian EVOOs and 25 foreign ones (two harvesting years: 2018 and 2019). However, it must be considered that other studies using different methods to authenticate EVOO geographical origin were developed using a number of samples comparable to, and sometimes lower than, that presented in this work. As reported by Bucci et al. [44], the data set for the statistical analysis was constructed on the results of the chemical analyses performed on 153 EVOOs (years of harvesting: 1997–1999), but finally only the samples produced in 1999 (53 oils) were analyzed in the laboratory. Portarena et al. [45] analyzed the isotopic composition and carotenoid content of 38 EVOOs from seven regions along the Italian coast using isotope ratio mass spectrometry (IRMS) and resonance Raman spectroscopy (RRS). The correlation between color and pigment content is well known in the literature [46]: the crushing of very green olives produces a typical green colored oil due to the high chlorophyll content; if olives are more mature, carotenoids will prevail, giving a yellow-gold colored oil. Additionally, as maturation progresses, the content and profile of phenolic compounds will also be affected: crushing green olives will result in an oil characterized by a higher content of phenolic acids, phenolic alcohols, oleuropein, and secoiridoids, whereas oils produced with dark brown olives will have a high content of anthocyanins, water-soluble plant pigments that take on different colors: red, blue, or violet [47,48].
