1. Introduction
The data set obtained from a multifrequency large amplitude pulse voltammetry (MLAPV) electronic tongue device comes from several types of sensors, whose signal magnitudes can differ in scale [1]. These signals are also high-dimensional [2], which can cause problems in machine learning models, both in pattern recognition and in the accuracy of data classification [3]. Correct processing of these data sets is therefore necessary to classify liquid substances accurately.
In 2020, Leon-Medina et al. [2] developed a methodology that seeks to improve classification accuracy through non-linear feature extraction from signals acquired with electronic tongue sensor array devices. The methodology comprises five stages: (1) data unfolding, (2) normalization, (3) non-linear dimensionality reduction, (4) classification by means of a supervised machine learning model and (5) cross validation [2]. Each stage is carried out by algorithms executed in Matlab®, and these algorithms have parameters that must be configured. Applying the methodology yields the classification accuracy and the confusion matrix of the classification model used, together with its performance metrics.
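To make the two main outputs concrete, the following minimal sketch computes a confusion matrix and the classification accuracy from true and predicted class labels. It is only an illustration in Python, not the Matlab® code of the methodology; the function names and the six labelled samples are hypothetical and do not correspond to any result reported here.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Count samples per (true class, predicted class) pair.

    Rows are true classes and columns are predicted classes,
    both ordered as in `labels`.
    """
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal (correctly classified)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Made-up labels for illustration only (not measured data).
labels = ["beer", "red wine", "black tea"]
y_true = ["beer", "beer", "red wine", "red wine", "black tea", "black tea"]
y_pred = ["beer", "red wine", "red wine", "red wine", "black tea", "beer"]

cm = confusion_matrix(y_true, y_pred, labels)
acc = accuracy(cm)
print(cm)
print(round(acc, 3))
```

In this toy case four of the six samples fall on the diagonal of the matrix, so the accuracy is 4/6.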
Because of the number of stages and the many configuration options of the algorithm parameters, a tool was needed to facilitate the application of this methodology, guiding the user through the different stages and making the configuration of the algorithms more user-friendly. One of the main advantages of a graphical user interface (GUI) is that it makes an implemented system easy to use, understand and evaluate [4].
Section 2 describes the two tests performed with the developed GUI, the datasets used in each one and the operation of the GUI. Section 3 presents the main findings obtained in the two tests when applying the data processing methodology through the GUI. Finally, Section 4 presents the main conclusions on data processing through the GUI.
2. Materials and Methods
The responses measured by an electronic tongue system are currents discretized in time. A measurement is obtained at each time instant for each of the electrodes that make up the device, so each electrode yields a matrix of size I × K, where I is the number of experimental tests and K is the number of time instants in the signal collected by that electrode. Because the electronic tongue system has an array of sensors, with J denoting the number of electrodes, the complete data form a three-dimensional matrix of size I × J × K. A data unfolding procedure is executed to convert this matrix into a two-dimensional matrix of size I × (J · K) [2].
Figure 1 shows an illustrative graph of the Data Unfolding process.
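The unfolding step can be sketched in a few lines of Python. This is a minimal illustration, not the Matlab® implementation used in the tool: the function name `unfold` is hypothetical, and the column order (the K points of electrode 1, then those of electrode 2, and so on) is an assumption about the layout.

```python
def unfold(data):
    """Unfold an I x J x K nested list into an I x (J*K) matrix.

    For each experiment, the K-point signals of the J electrodes are
    concatenated electrode by electrode (assumed column order).
    """
    return [
        [value for electrode in experiment for value in electrode]
        for experiment in data
    ]


# Toy array with I = 2 experiments, J = 3 electrodes, K = 4 time instants.
# Each value encodes its own indices as 100*i + 10*j + k.
I, J, K = 2, 3, 4
data = [
    [[100 * i + 10 * j + k for k in range(K)] for j in range(J)]
    for i in range(I)
]

unfolded = unfold(data)
print(len(unfolded), len(unfolded[0]))  # rows, columns
print(unfolded[0])
```

Each row of the result holds one experiment, so the number of rows stays at I while the number of columns becomes J · K.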
In this work, two tests with the developed tool are performed using two different datasets. These tests are described below:
For the first test, a dataset obtained with an MLAPV electronic tongue developed by Liu et al. [5] is used. The electronic tongue consisted of a platinum pillar auxiliary sensor, an Ag/AgCl reference sensor, and six working electrodes made of different materials: gold, platinum, palladium, titanium, tungsten, and silver. In the experiment, the fourth electrode (titanium) was damaged, so it was not considered in the data analysis [5]. Seven liquids or aqueous matrices were used to collect the first dataset: (1) red wine, (2) Chinese liquor, (3) beer, (4) black tea, (5) oolong tea, (6) maofeng tea and (7) pu’er tea. Each liquid was measured at three concentrations (14%, 25% and 100%) of the original solution mixed with distilled water, with three replicates per concentration, that is, 9 samples for each liquid [2], for a total of 63 samples. With 2050 measurement points per sensor and 5 sensors in the electronic tongue, the unfolding procedure (described above, see Figure 1) yields a dataset matrix of size 63 × 10,250.
The second test uses a dataset obtained from the study by Zhang et al. [6]. This second dataset contains the data collected from an MLAPV electronic tongue with five working electrodes made of gold, silver, palladium, tungsten and silver. The auxiliary electrode is a platinum pillar and the reference electrode is Ag/AgCl [7]. For this study, 13 liquids or aqueous matrices were used (number of samples in parentheses): beer (19), red wine (8), white liqueur (6), black tea (9), Maofeng tea (9), pu’er tea (9), Oolong tea (9), coffee (9), milk (9), cola (6), vinegar (9), medicine (6), and salt (6), for a total of 114 samples [6]. As in the first dataset, there are 2050 measurement points per sensor and 5 sensors in the electronic tongue, so after the unfolding procedure the second dataset has a size of 114 × 10,250.
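The sizes of both unfolded datasets follow directly from the counts given above, as the short Python check below illustrates. The helper `unfolded_shape` and the constant names are hypothetical; all numbers are taken from the dataset descriptions.

```python
POINTS_PER_SENSOR = 2050  # measurement points per sensor (from the text)
N_SENSORS = 5             # sensors considered in the analysis (from the text)


def unfolded_shape(n_samples, n_sensors=N_SENSORS, n_points=POINTS_PER_SENSOR):
    """Return (rows, columns) of the unfolded matrix I x (J*K)."""
    return (n_samples, n_sensors * n_points)


# First dataset: 7 liquids x 3 concentrations x 3 replicates.
first_samples = 7 * 3 * 3

# Second dataset: samples per liquid as listed in the text.
second_counts = {
    "beer": 19, "red wine": 8, "white liqueur": 6, "black tea": 9,
    "Maofeng tea": 9, "pu'er tea": 9, "Oolong tea": 9, "coffee": 9,
    "milk": 9, "cola": 6, "vinegar": 9, "medicine": 6, "salt": 6,
}
second_samples = sum(second_counts.values())

print(unfolded_shape(first_samples))   # (63, 10250)
print(unfolded_shape(second_samples))  # (114, 10250)
```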
The developed GUI is an application built in Matlab® App Designer and is made up of seven tabs. Only the first tab is enabled when the GUI starts, as shown in Figure 2a. Using the Browser button in the Data Selection section, the user selects the file containing the dataset previously arranged with the unfolding process. The data are then loaded into the GUI with the Load button, after which the size of the dataset is displayed in the GUI; Figure 2b illustrates this process.
With the dataset loaded, the Browser button in the Class Labels Selection section is enabled so that the Class Labels file can be selected in the same way as the dataset. Once this vector is loaded, the number of classes is displayed, see Figure 3a.
After the data files are selected, the Normalization tab is enabled, in which the method for data normalization can be selected, see Figure 3b. Once the data are normalized, the Dimensionality Reduction tab is enabled, where the Feature Extraction technique [8] used to reduce the dimensionality of the data can be selected. This tab also has a Parameters section for configuring the parameters of the selected dimensionality reduction technique, see Figure 3c. Once the data are in low dimensionality, the Plot tab is enabled for the selection and visualization of the variables in 2D and scatter plots, see Figure 3d. Simultaneously, the Classification/Validation tab is enabled; it offers four classifiers, along with parameters that can be configured depending on the selected classifier, see Figure 3e. Once the classification stage is executed, the Cross Validation section is enabled, which contains three validation techniques, see Figure 3f. At the end of the procedure, the Results tab is enabled, where the classification performance metrics [9] and the confusion matrix are shown, see Figure 3g. At the same time, the History tab is enabled, which presents a summary of the techniques and methods used in the data processing, see Figure 3h.
Figure 3 shows the sequence of enabling the GUI tabs throughout the data processing in each of the stages.
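The enabling sequence described above can be viewed as a small state machine. The Python sketch below is only a hedged illustration of that logic, not the App Designer code of the tool: the stage names, the `TabState` class and its `complete` method are hypothetical, and only the tab and section names come from the description.

```python
# Each stage, in order, with the tabs or sections it unlocks when completed.
STAGES = [
    ("load_data", ["Normalization"]),
    ("normalize", ["Dimensionality Reduction"]),
    ("reduce", ["Plot", "Classification/Validation"]),
    ("classify", ["Cross Validation"]),  # a section, not a tab
    ("cross_validate", ["Results", "History"]),
]


class TabState:
    """Tracks which tabs and sections are enabled as stages are completed."""

    def __init__(self):
        self.enabled = ["Data Selection"]  # only the first tab at start-up
        self._next = 0

    def complete(self, stage):
        """Mark `stage` as done and return what it unlocks.

        Stages must be completed in order; anything else raises ValueError.
        """
        if self._next >= len(STAGES):
            raise ValueError("all stages are already completed")
        expected, unlocked = STAGES[self._next]
        if stage != expected:
            raise ValueError(f"expected stage {expected!r}, got {stage!r}")
        self.enabled.extend(unlocked)
        self._next += 1
        return list(unlocked)


state = TabState()
for name, _ in STAGES:
    print(name, "->", state.complete(name))
print(state.enabled)
```

Keeping the unlock rules in a single ordered table makes the dependency between stages explicit, which mirrors how each tab only becomes available once the previous stage has produced its output.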
Regarding the plots in the GUI, the data loaded in the GUI, the normalized data and the data after the dimensionality reduction process can all be visualized. In the corresponding tabs, a table or graph view can be obtained for each experiment. Figure 4 shows the original data together with the graphs produced for sample 7 in the Normalization and Dimensionality Reduction stages.