### *2.3. Experimental Procedure*

Each tequila bottle was sampled immediately after opening. A 1 mL aliquot of spirit was used directly, without pretreatment, to fill UV-cuvettes that were free of dust and dirt so that trustworthy images could be obtained. Additionally, a cuvette containing the same volume of deionized water was used as a blank solution. All experiments were carried out at room temperature (25 °C). The first sample measured with the EE was the blank solution, in order to establish the system's reference signal. Subsequently, the UV-cuvettes containing the different tequila samples were measured one by one. The captured digitized images were recorded and stored using the programmed control software. Throughout the experimental stage, the chamber remained closed during image capture to prevent the entry of external light and to obtain good-quality images.

Meanwhile, the white light source remained on while the camera module acquired the image and sent it to the Raspberry Pi computer. Each sample was analyzed in triplicate, with 10 repetitions each time, to assess the repeatability and reproducibility of the measurements. The EE system completes the full measurement process in 10 s.

### *2.4. Image Analysis*

Digital images were obtained after placing a UV-cuvette with a tequila sample in the lab-made EE system described above. In all cases, the camera settings were fixed (exposure time of 1/16 s, aperture of f/2, and ISO 100). The images captured by the EE for each tequila sample across the three categories involved were saved as separate files in *jpeg* format on the Raspberry Pi memory; the average size per image is 2.7 MB (8-megapixel resolution, 2592 × 1944 pixels). Although the compressed *jpeg* format implies a loss of information relative to the *raw* format, some works have reported that the RGB values obtained from *jpeg* files contain information comparable to that in much larger raw files [38,39]. Likewise, *jpeg* files retained the relevant color information and were easier to handle owing to their smaller file size, particularly when multivariate calibration techniques were used to interpret them [39,40]. In our case, using the *jpeg* format also allowed efficient use of hardware resources (in terms of data storage and computational power requirements); moreover, this format is the closest to the images produced by the human visual system, since it is transformed using color-matching functions [41].

For the image analysis process, a preprocessing step is required that consists of selecting and clipping a region of interest (ROI). The ROI was chosen to coincide with the viewing window of the UV-cuvette; this cropped area of the image and its position relative to the sample holder are always constant. In this way, the complete set of images was cropped and saved as separate files with a new size of 1244 × 231 pixels.
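Because the ROI position is fixed, the crop reduces to constant array slicing. The sketch below illustrates this with NumPy (the study used MATLAB; the ROI offsets `row0` and `col0` here are hypothetical, since only the final ROI size is reported):

```python
import numpy as np

# Stand-in for a full 8 MP frame: 1944 rows x 2592 columns x 3 RGB channels
frame = np.zeros((1944, 2592, 3), dtype=np.uint8)

# Assumed (hypothetical) top-left corner of the cuvette viewing window
row0, col0 = 500, 600

# Fixed-size crop: 231 rows x 1244 columns, matching the reported ROI
roi = frame[row0:row0 + 231, col0:col0 + 1244, :]
print(roi.shape)  # (231, 1244, 3)
```

Since the cuvette sits in the same position for every measurement, the same slice indices apply to the whole image set.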

A digital image is a numeric representation of a two-dimensional collection of data: it contains a fixed number of rows and columns of pixels, where each pixel is specified by the red, green, and blue coordinates of a pixel array. This conceptualization of the image is related to the trichromatic theory of color vision based on the work of Maxwell, Young, and Helmholtz [37]. This theory states that there are three types of photoreceptors in the human eye, approximately sensitive to the red, green, and blue regions of the spectrum, corresponding to the three types of cone cells, generally referred to as L, M, and S (long, medium, and short wavelength sensitivity). These cells are responsible for the perception of color; analogously, in the RGB color model, the image is represented by color intensities that indicate how much red, green, and blue is present in the image [42]. Hence, each component varies from 0 to 255 [43]: if all components are zero, the result is black; if all components are 255, the result is white.

Likewise, because the obtained images are true-color images, they can be represented as 3D matrices associated with the RGB components, making it possible to observe their tonal distribution through histograms and to evaluate the corresponding absorbance [44]. The critical steps followed for EE acquisition and elaboration of the RGB image regions are illustrated in Figure 2.

**Figure 2.** A generalized block diagram of the image acquisition and processing performed by the Electronic Eye.
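The per-channel tonal distribution described above can be sketched in a few lines of NumPy (again, the study itself used MATLAB; the pixel values here are random stand-ins, not measured data):

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in ROI with the reported dimensions (231 x 1244, 3 channels, 0-255)
img = rng.integers(0, 256, size=(231, 1244, 3), dtype=np.uint8)

# 256-bin histogram of each RGB component (the tonal distribution)
hists = [np.bincount(img[..., c].ravel(), minlength=256) for c in range(3)]

# Mean R, G, B intensities of the ROI, a simple summary of its color
means = img.reshape(-1, 3).mean(axis=0)
print([int(h.sum()) for h in hists], means.round(1))
```

Each histogram sums to the total pixel count of the ROI (231 × 1244), which serves as a quick sanity check on the extraction.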

The corresponding absorbances associated with the RGB components for the available image set were evaluated using the Lambert–Beer law. This law expresses the proportional relationship between the absorbance and the concentration of certain compounds present in the sample under analysis. The equation representing this law is a crucial element in evaluating the absorbance of a sample [45].

$$A_{\lambda} = -\log\left(\frac{I_1}{I_0}\right) = \varepsilon b C \tag{1}$$

where *A*λ is the absorbance defined via the incident intensity *I*0 (light incident on the sample) and the transmitted intensity *I*1 (light that emerges from the sample), λ is the wavelength of the light source, *C* is the concentration of the absorbing sample expressed in mol·L−1, *b* is the optical path length (thickness of the cell), and ε is the molar absorptivity coefficient.

Because Equation (1) relates the absorbance directly to the concentration of the compounds present in the sample under analysis, it formed part of the implemented algorithms.
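As a worked example of Equation (1), suppose a sample in a 1 cm cuvette transmits 40% of the incident light (all numbers here are hypothetical, chosen only to illustrate the arithmetic):

```python
import math

# Hypothetical intensities: 40% of the incident light is transmitted
I0, I1 = 100.0, 40.0
A = -math.log10(I1 / I0)   # absorbance, Eq. (1)

# Assumed path length (cm) and concentration (mol/L), for illustration only
b, C = 1.0, 1e-4
epsilon = A / (b * C)      # molar absorptivity implied by Eq. (1)
print(round(A, 4))  # 0.3979
```

Note that the base-10 logarithm is used, consistent with the conventional definition of absorbance.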

Experimentally, when light leaves the sample, passes through the camera lens, and reaches the image sensor, some light intensity is lost. This occurs because, once a beam of light passes through the transparent UV-cuvette containing the sample, its intensity varies due to the phenomena of absorbance, reflection, and transmission [46]. Therefore, it is possible to compare the light intensity transmitted by a standard (in our case, a blank solution) with that transmitted by the sample of interest. This procedure yields an experimental absorbance, as shown in (2):

$$A_{\lambda,experimental} = \log\left(\frac{I_{solvent}}{I_{analyte\ solution}}\right) \tag{2}$$

where the experimental absorbance *A*λ,experimental is evaluated from *Isolvent*, related to the blank solution (deionized water in this work), considered the standard sample, and *Ianalyte solution*, corresponding to each tequila sample to be analyzed.
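Applied channel-wise to the mean RGB intensities, Equation (2) yields one experimental absorbance per color component. A minimal sketch, assuming hypothetical intensity values on the 0–255 scale:

```python
import numpy as np

# Mean R, G, B intensities of the blank (deionized water) and of a
# tequila sample; these values are illustrative, not measured data.
I_solvent = np.array([220.0, 215.0, 210.0])
I_analyte = np.array([180.0, 160.0, 120.0])

# Eq. (2), evaluated independently for each RGB channel
A_rgb = np.log10(I_solvent / I_analyte)
print(A_rgb.round(3))  # [0.087 0.128 0.243]
```

A channel that is strongly absorbed by the sample (here, blue) produces the largest experimental absorbance, which is the feature the subsequent multivariate analysis exploits.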

### *2.5. Data Processing and Modelling*

Image data processing and modeling were performed using specific routines written by the authors in MATLAB® 2020a, based on standard preprogrammed functions from the Statistics and Machine Learning Toolbox (v11.7). Before any data processing and modeling task, information on the brightness and tonality characteristics of the acquired images was obtained to corroborate the optical adjustment of the equipment. For this purpose, histograms of each RGB component were computed for every available image. Subsequently, the experimental RGB absorbances were calculated (as described in Section 2.4). These calculated values were used as input for two different analysis methods: Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA). Because LDA is a supervised classification method, classification accuracy was evaluated using a Leave-One-Out Cross-Validation (LOOCV) procedure. In this iterative method, all available observations except one are used as the training set, the excluded observation is used for validation, and the process is repeated until every observation has been left out once.
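The LOOCV loop can be sketched as follows (the study used MATLAB and LDA; here a simple nearest-centroid classifier stands in for the classifier, and the data are synthetic, purely for illustration):

```python
import numpy as np

def loocv_accuracy(X, y, classify):
    """Leave-one-out cross-validation: each sample is held out once."""
    hits = 0
    for i in range(len(X)):
        mask = np.arange(len(X)) != i          # train on all but sample i
        hits += classify(X[mask], y[mask], X[i]) == y[i]
    return hits / len(X)

def nearest_centroid(X_train, y_train, x):
    # Stand-in classifier for illustration (the study used LDA)
    labels = np.unique(y_train)
    cents = np.array([X_train[y_train == c].mean(axis=0) for c in labels])
    return labels[np.argmin(np.linalg.norm(cents - x, axis=1))]

rng = np.random.default_rng(1)
# Two well-separated synthetic classes in a 3-feature space (e.g. RGB absorbances)
X = np.vstack([rng.normal(0, 0.2, (10, 3)), rng.normal(1, 0.2, (10, 3))])
y = np.array([0] * 10 + [1] * 10)
print(loocv_accuracy(X, y, nearest_centroid))  # 1.0 for these separated classes
```

LOOCV is attractive for small image sets like this one because every observation contributes to both training and validation.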

As is known, PCA is an analysis method based on an orthogonal linear transformation that summarizes almost all of the variance contained in a dataset in a smaller number of directions (PCs) with new coordinates (scores) [47]. In most cases, PCA reveals clustering of the data according to their similarities, so a preliminary recognition model can be built that shows the different classes involved in the measurements. Nevertheless, a proper classification task requires a supervised learning approach. In this regard, LDA is one of the most widely used classification procedures and has proved successful in many applications [48]. The idea behind LDA is to find a linear transformation that best discriminates among classes; the method operates by maximizing between-class variability relative to within-class variability. Classification is then performed in the transformed space based on a metric such as the Euclidean distance. A typical implementation computes a scatter matrix, which must be non-singular; this criterion cannot be satisfied when the matrix is singular, a situation that frequently occurs in applications using image databases for pattern recognition, where the number of measurements per sample exceeds the number of samples in each class. To tackle this problem, a two-stage approach based on PCA plus LDA can be implemented. Both methods project the data into a smaller subspace: PCA finds the PCs that maximize the variance in the dataset (without considering the class labels), while LDA finds the components that maximize between-class separation. Detailed information about this improved LDA method can be found in [49,50].
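The two-stage PCA-plus-LDA idea can be sketched in NumPy (a minimal two-class sketch with synthetic data; the study's own implementation was in MATLAB, and all names here are illustrative):

```python
import numpy as np

def pca_project(X, k):
    """Stage 1: project the data onto the first k principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

def fisher_direction(Z, y):
    """Stage 2: two-class Fisher LDA in the PCA subspace, w = Sw^-1 (m1 - m0)."""
    m0, m1 = Z[y == 0].mean(axis=0), Z[y == 1].mean(axis=0)
    Sw = np.cov(Z[y == 0].T) + np.cov(Z[y == 1].T)  # pooled within-class scatter
    return np.linalg.solve(Sw, m1 - m0)

rng = np.random.default_rng(2)
# 20 samples, 50 correlated features: more features than samples per class,
# so the within-class scatter in the original space would be singular.
base = np.vstack([rng.normal(0, 1, (10, 5)), rng.normal(3, 1, (10, 5))])
X = base @ rng.normal(0, 1, (5, 50))
y = np.array([0] * 10 + [1] * 10)

Z = pca_project(X, 3)        # PCA makes the scatter matrix well-conditioned
w = fisher_direction(Z, y)   # LDA then maximizes between-class separation
scores = Z @ w
print(scores[y == 1].mean() > scores[y == 0].mean())  # True: classes separate
```

Reducing to a low-dimensional PCA subspace first guarantees the within-class scatter matrix is invertible, which is exactly the role the two-stage approach plays for image data.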
