**3. Results**

#### *3.1. RGB Image Processing*

The experimental phase with the EE yielded a total of 750 tequila images (10 photos for each of the 25 tequila samples, measured in triplicate). The ROI was defined automatically and kept fixed for all analyzed sample images, as described in Section 2.4. Figure 3 shows four representative samples and their corresponding captured images: one tequila sample per class plus the blank solution. The black dotted lines within the UV-cuvette image mark the selected ROI. The histograms show the distribution of reddish, greenish, and bluish pixels, corresponding to the RGB components of the images. Both the distribution of these color components and their intensities differ clearly between tequila types. It can therefore be assumed that the information captured in the images, acquired with fixed camera parameters (exposure time, aperture, and ISO) and under the same lighting conditions, is representative enough to build a classifier model that identifies the different categories of tequila.

**Figure 3.** Representative images of (**A**) Blank solution, (**B**) Silver tequila, (**C**) Aged tequila, and (**D**) Extra-aged tequila samples. From left to right: the image captured by the Electronic Eye (EE), the region of interest (ROI), and the Red, Green, and Blue (RGB) intensity histogram.

Color has long been one of the important factors in food quality measurement [39]. The RGB model is well suited to this purpose because it readily captures color variations in digital images. The acquired images were therefore organized as a matrix of dimension 30 × 75, where the rows correspond to the total number of repetitions (3 tests with 10 repetitions each) and the columns represent the 25 tequila samples analyzed in triplicate. The intensities of the RGB components are summarized in Table 2, which also includes the absorbance of each RGB component and the total absorbance of each sample, obtained through Equation (2).

**Table 2.** Electronic Eye RGB component intensities and absorbance values for the tequila samples. Total absorbance values are also reported.


A relationship can be established between the absorbance and the sample content of each image provided by the Electronic Eye. Following the RGB model, the absorbance of an image was calculated from the average of each color component. As shown in Table 2, the average and standard deviation of each color intensity component were obtained, together with the related absorbance, from the images of the different tequila samples.
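As an illustration of how per-channel absorbances could be derived from the average ROI intensities, the following minimal Python sketch assumes a Beer-Lambert-style definition, A = −log10(I_sample / I_blank), with the blank solution as the reference. Equation (2) is not reproduced in this section, so the formula actually used may differ; the function name and the intensity values are hypothetical.

```python
import math

def channel_absorbance(sample_mean, blank_mean):
    """Absorbance of one color channel relative to the blank.

    Assumed form (not taken from the paper): A = -log10(I_sample / I_blank).
    """
    if sample_mean <= 0 or blank_mean <= 0:
        raise ValueError("mean intensities must be positive")
    return -math.log10(sample_mean / blank_mean)

# Hypothetical mean ROI intensities (0-255 scale), for illustration only.
blank_rgb = (200.0, 200.0, 200.0)
sample_rgb = (190.0, 185.0, 170.0)

# One absorbance per channel, in (R, G, B) order.
absorbances = [channel_absorbance(s, b) for s, b in zip(sample_rgb, blank_rgb)]
```

Under this assumed definition, a channel that is darker than the blank yields a positive absorbance, and a channel equal to the blank yields zero.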

The average absorbance of the Silver tequila samples is 0.0644 ± 0.0034, while for the Aged and Extra-aged tequila samples it is 0.0785 ± 0.0024 and 0.0931 ± 0.0019, respectively. The variability within each class can be attributed to the characteristics of each brand's product, as well as to its particular aging process. The lowest absorbance values, found in Silver tequila, are associated with its colorless, pure tone.

Depending on the aging process, the tone can be yellowish for Aged tequilas or amber for Extra-aged tequilas. Accordingly, as the intensity of the tequila tone increases, the absorbance values also increase.

Regarding the intensity of the RGB components, the Silver tequila samples showed a prevalence of all three components. In the Aged tequila samples, however, the red and blue components predominate, whereas in the Extra-aged tequila samples the blue component is the most prominent and has the greatest intensity. These differences have been associated with the shades present in the samples, which depend on the brand even among tequilas of the same type.

The variability among the measurements obtained for each tequila sample within the same class is minimal, with deviations on the order of 0.0001–0.0005, which demonstrates the repeatability of the designed EE.

To visualize the behavior of the RGB absorbances of the different tequila samples, radar plots were constructed. Figure 4 shows the RGB average absorbance of the complete set of tequilas grouped into the three categories under study. Characteristic fingerprints related to the optical properties of each type of tequila can be observed. This evident pattern for each tequila class (i.e., Silver, Aged, and Extra-aged) is expected to help the planned classifier models interpret this information. The idea behind pattern recognition is to identify the regularities present in data by means of a computational model that uses machine learning algorithms.

**Figure 4.** Radar plots for analyzed tequila samples with their respective absorbance values. (**A**) Silver, (**B**) Aged, and (**C**) Extra-aged.

#### *3.2. EE Preliminary Recognition Model*

Before modeling, the RGB average absorbances were normalized to the interval from 0 to 1 to reduce illumination effects and for convenience of data treatment. Afterward, PCA was performed to build a preliminary recognition model, with the expectation of observing sample clustering driven by the absorbances and related to tequila class. The PCA plot with the three significant PCs is shown in Figure 5. The accumulated explained variance was ca. 99.96%, with characteristic clusters that partially discriminate the different kinds of tequila. Most of the Silver tequilas appear grouped in the upper right region of the plot, the Aged tequilas are concentrated in the center, and the Extra-aged ones appear grouped in the left region. However, apart from the marked dispersion of these last two categories, there is a clear overlap between some of their samples.
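The normalization step can be sketched as a min-max scaling. This is a minimal Python sketch, assuming that each absorbance column (R, G, B) is rescaled independently; the text does not state whether the scaling was applied per column or globally, and the function name and values below are hypothetical.

```python
def min_max_normalize(rows):
    """Rescale each column of a list of rows to the interval [0, 1]."""
    columns = list(zip(*rows))
    scaled_columns = []
    for col in columns:
        lo, hi = min(col), max(col)
        span = hi - lo
        # A constant column carries no information; map it to 0.0.
        scaled_columns.append([(v - lo) / span if span else 0.0 for v in col])
    # Transpose back to one row per sample.
    return [list(row) for row in zip(*scaled_columns)]

# Hypothetical (R, G, B) average absorbances for three samples.
absorbances = [
    [0.020, 0.021, 0.023],
    [0.024, 0.026, 0.029],
    [0.028, 0.030, 0.035],
]
normalized = min_max_normalize(absorbances)
```

After scaling, the smallest value in each column maps to 0 and the largest to 1, so all three components contribute on a comparable scale to the subsequent PCA.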

Although aging mechanisms have been widely studied for different alcoholic beverages such as wine and spirits [51,52], there is still no scientific report that addresses them for tequila. Considering that color is one of the physicochemical characteristics affected during this process, it is reasonable to assume that the absorbances obtained with the electronic eye are also related to the aging of the analyzed tequila samples.

In this sense, the clustering regions observed in the PCA are reasonable, given that the samples were grouped within their proper class. In addition, each cluster is related to a different aging period. Consequently, the dispersion in the Aged and Extra-aged tequila clusters can be related to the aging times that each producer stipulates for its product. In contrast, the dispersion in the Silver tequila cluster is minimal because these samples do not undergo an aging process.

**Figure 5.** PCA score plot of the first three components obtained after analysis of tequila samples. Some clustering according to the different tequila classes can be seen.

Thus, it is highly probable that the set of analyzed samples includes tequilas with different aging times despite belonging to the same category. This may be because each tequila producer must comply with the minimum aging time set by Mexican regulations but can also establish longer aging periods, without violating the standard's provisions, to offer a product with better organoleptic characteristics than its competitors.

For this reason, and to confirm the initial groupings observed by PCA, the next step was to apply LDA as a supervised pattern recognition method.

#### *3.3. Tequila Categories Discrimination*

The transformed data obtained by PCA were used as input to LDA. Since this is a supervised method, classification success was evaluated using LOOCV. In this scheme, each sample is classified by means of the discriminant function derived from the remaining samples (all cases except the case itself). This process was repeated as many times as the number of samples in the data set (i.e., 25 times), leaving out a different sample each time and treating it as the validation sample. With this approach, every sample is used exactly once for validation. As can be observed in Figure 6, clear discrimination between the three categories of tequila was achieved. The clusters in the figure show that the tequila samples are grouped according to their aging process. The Silver tequilas are clearly grouped in the left region of the plot, while the Aged and Extra-aged tequilas have class centroids located in the middle and right regions, respectively.
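The leave-one-out scheme described in this subsection can be sketched as follows. To keep the example free of external dependencies, a nearest-centroid classifier stands in for LDA; this is a deliberate substitution and not the method used in the study. The function names, data points, and labels are hypothetical.

```python
def nearest_centroid_predict(train_x, train_y, query):
    """Stand-in classifier: assign the class whose centroid is closest."""
    sums, counts = {}, {}
    for x, y in zip(train_x, train_y):
        acc = sums.setdefault(y, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
        counts[y] = counts.get(y, 0) + 1

    def sq_dist(label):
        # Squared Euclidean distance from the query to the class centroid.
        return sum((s / counts[label] - q) ** 2
                   for s, q in zip(sums[label], query))

    return min(sums, key=sq_dist)

def leave_one_out(xs, ys, predict):
    """Hold out each sample once and predict it from all remaining samples."""
    predictions = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        predictions.append(predict(train_x, train_y, xs[i]))
    return predictions

# Hypothetical 2-D scores for three well-separated classes.
xs = [[0.0, 0.1], [0.1, 0.0], [0.1, 0.1],
      [1.0, 1.1], [1.1, 1.0], [1.1, 1.1],
      [2.0, 2.1], [2.1, 2.0], [2.1, 2.1]]
ys = ["Silver"] * 3 + ["Aged"] * 3 + ["Extra-aged"] * 3

predicted = leave_one_out(xs, ys, nearest_centroid_predict)
correct = sum(p == t for p, t in zip(predicted, ys))
```

Because the model is rebuilt for every held-out sample, a data set of 25 samples yields 25 fitted models, which matches the count given in the text.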

**Figure 6.** LDA score plot of the obtained functions after analysis of tequila samples, according to their category. Dotted lines represent classification clusters. In addition, the centroid of each class is plotted (x).

The average classification results obtained from the 25 LDA models built are reported in Table 3. As expected from the LDA plot, the Silver and Aged samples were classified with high rates (100% and 91.67%, respectively). In contrast, the Extra-aged class reached only 78.40% correct classification. The overall classification rate for the three classes, which corresponds to the mean of the three class rates, was 90.02%. To evaluate the efficiency of the modeling, accuracy, precision, sensitivity, and specificity values were also calculated. The sensitivity averaged over the three classes was 0.90, whereas the specificity was 0.96.
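The per-class figures of merit mentioned above can be derived from a confusion matrix in a one-vs-rest fashion. The following is a minimal Python sketch; the matrix is hypothetical (it is not the data behind Table 3), with rows as true classes and columns as predicted classes, and the function name is an assumption.

```python
def per_class_metrics(matrix):
    """One-vs-rest sensitivity, specificity, and precision per class.

    matrix[i][j] = number of samples of true class i predicted as class j.
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    results = []
    for i in range(n):
        tp = matrix[i][i]                                # true positives
        fn = sum(matrix[i]) - tp                         # missed members of class i
        fp = sum(matrix[r][i] for r in range(n)) - tp    # others labeled as class i
        tn = total - tp - fn - fp                        # everything else
        results.append({
            "sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp),
            "precision": tp / (tp + fp),
        })
    return results

# Hypothetical counts for (Silver, Aged, Extra-aged).
confusion = [
    [8, 0, 0],
    [0, 10, 1],
    [0, 1, 5],
]
metrics = per_class_metrics(confusion)
accuracy = sum(confusion[i][i] for i in range(3)) / sum(map(sum, confusion))
```

The sketch assumes every class has at least one true and one predicted sample; otherwise the ratios are undefined and would need explicit handling.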


**Table 3.** Average classification results of EE for the discrimination of different tequila samples according to expected categories employing PCA-LDA.

Many studies have established that the overall classification rate is not the best criterion for measuring classifier performance when the number of samples per class is imbalanced [53]. Therefore, to corroborate that the results obtained from the LDA modeling are significant, another criterion is needed that reflects the performance of the classifier more reliably under such imbalance. A well-known alternative to accuracy is Cohen's kappa coefficient [54]. Its calculation compares the reference data with the predicted data through the main diagonal of the confusion matrix; see definition (3).

$$\kappa = \frac{N \sum_{i=1}^{n} m_{i,i} - \sum_{i=1}^{n} (G_i C_i)}{N^2 - \sum_{i=1}^{n} (G_i C_i)} \tag{3}$$

where *i* is the class number, *n* is the number of classes, *N* is the total number of classified values compared to truth values, $m_{i,i}$ is the number of values belonging to the truth class *i* that have also been classified as class *i* (i.e., values found along the main diagonal of the confusion matrix), $C_i$ is the total number of predicted values belonging to class *i*, and $G_i$ is the total number of truth values belonging to class *i*.
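Definition (3) translates directly into code. The sketch below is a minimal Python implementation of that formula for a square confusion matrix; the function name and the example matrix are hypothetical and do not reproduce the data of the study.

```python
def cohen_kappa(matrix):
    """Cohen's kappa from a confusion matrix, following definition (3).

    matrix[i][j] = number of samples of truth class i predicted as class j.
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)              # N
    diagonal = sum(matrix[i][i] for i in range(n))       # sum of m_ii
    truth = [sum(matrix[i]) for i in range(n)]           # G_i (row totals)
    predicted = [sum(matrix[r][i] for r in range(n))     # C_i (column totals)
                 for i in range(n)]
    chance = sum(g * c for g, c in zip(truth, predicted))  # sum of G_i * C_i
    return (total * diagonal - chance) / (total ** 2 - chance)

# Hypothetical counts for three classes (rows: truth, columns: predicted).
confusion = [
    [8, 0, 0],
    [0, 10, 1],
    [0, 1, 5],
]
kappa = cohen_kappa(confusion)
```

For this hypothetical matrix, N = 25, the diagonal sums to 23, and the chance term is 8·8 + 11·11 + 6·6 = 221, so kappa = (575 − 221) / (625 − 221) = 354/404.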

Thus, kappa is an indicator that usually takes values between 0 and 1, the first representing agreement no better than chance and the second total agreement; negative values are also possible and indicate agreement worse than chance. According to the widely used scheme of Landis and Koch, a value <0 indicates no agreement, 0–0.20 slight, 0.21–0.40 fair, 0.41–0.60 moderate, 0.61–0.80 substantial, and 0.81–1 almost perfect agreement.
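The interpretation bands quoted above can be encoded as a small lookup. This is a minimal Python sketch with a hypothetical function name; values that fall between two quoted bands (e.g., 0.205) are assigned to the higher band, which is an assumption made here for simplicity.

```python
def interpret_kappa(kappa):
    """Map a kappa value to the agreement label of the scheme quoted above."""
    if kappa < 0:
        return "no agreement"
    # Upper bound of each band and its label, in increasing order.
    bands = [
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
        (1.00, "almost perfect"),
    ]
    for upper, label in bands:
        if kappa <= upper:
            return label
    raise ValueError("kappa cannot exceed 1")

label = interpret_kappa(0.87)
```

With the mean kappa of 0.87 reported in this section as input, the lookup returns the "almost perfect" band.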

In this regard, kappa values were calculated for each of the 25 LDA models built during the LOOCV process, giving an overall mean kappa coefficient of 0.87, which falls in the "almost perfect agreement" range. This high agreement indicates that the data are reliable. In other words, the RGB absorbances used to identify the tequila samples are representative enough to be modeled. Likewise, although the tequila classes are imbalanced, the LDA models do not favor the Aged tequila class, which has the greatest number of samples, over the Extra-aged tequila class, which has the fewest.

Additionally, the obtained results suggest that the use of LOOCV did not produce an over-optimistic estimate of the performance of the LDA classifiers.
