Article

Feature Generation with Genetic Algorithms for Imagined Speech Electroencephalogram Signal Classification

by
Edgar Lara-Arellano
,
Andras Takacs
*,
Saul Tovar-Arriaga
and
Juvenal Rodríguez-Reséndiz
Facultad de Ingeniería, Universidad Autónoma de Querétaro, Querétaro 76010, Mexico
*
Author to whom correspondence should be addressed.
Submission received: 29 January 2025 / Revised: 27 March 2025 / Accepted: 8 April 2025 / Published: 10 April 2025

Abstract

This work presents a method for classifying EEG (electroencephalogram) signals generated when a person concentrates on specific words, a paradigm known as "imagined speech". Imagined speech is essential to enhancing problem-solving, memory, and language development. In addition, imagined speech is beneficial because of its applications in therapy, such as managing anxiety or improving communication skills. EEG measures the electrical activity of the brain. EEG signal classification is difficult because the machine learning (ML) algorithm has to learn how to categorize the signal linked to each imagined word. This work proposes a novel method for generating a specific feature vector that achieves classification accuracy superior to the results found in the state of the art. The method leverages a genetic algorithm to create an optimal feature combination for the classification task and machine learning model. The algorithm can efficiently explore a large feature space and identify the most relevant features for the task. The proposed method achieved an accuracy of 96% using eight electrodes for EEG signal recordings.

1. Introduction

Imagined speech, also known as silent speech or inner speech, is thinking in the form of sound: “hearing” within one’s head silently, without the intentional movement of any limbs such as the lips, tongue, or hands [1].
Processing imagined speech is important because it helps us understand how the brain generates and controls inner thoughts, supports communication skills, and aids in developing brain–computer interfaces (BCI) [2].
Imagined speech improves cognitive processes such as memory by activating neural mechanisms associated with internal rehearsal and working memory. Studies suggest that subvocal articulation, whether expressed or imagined, facilitates encoding and retrieval by strengthening the neural circuits involved in language processing. This phenomenon is especially relevant in memory processes, where inner speech helps to organize and retain information [3].
Advances in this field may lead to more effective therapeutic strategies for individuals with speech disorders, as interventions can be designed to target specific neural pathways associated with particular aspects of speech. A speech or language disorder is characterized by difficulties forming or creating sounds crucial for effective communication. These disorders can manifest in various ways, the most common being articulatory, phonological, voice, and resonance disorders. Articulatory disorders involve difficulties physically producing certain sounds, often due to poor tongue, lips, or jaw positioning [4].
On the other hand, phonological disorders originate in the cognitive aspect of sound production. In these disorders, the person has difficulty understanding the sound rules of their language, leading to error patterns when speaking. Voice disorders affect speech quality, pitch, or volume, often due to problems with the vocal cords. In contrast, resonance disorders arise from abnormal airflow through the oral or nasal cavities, which affects sound clarity [5].
Worldwide, approximately 1% of the population has some form of speech, language, or communication difficulty, a condition often referred to as speech and language impairment (SLI). This statistic translates to millions of people experiencing barriers to expressing themselves or understanding others, impacting their social interactions, educational experience, and overall quality of life. The loss or impairment of these fundamental communication skills is widely considered a significant barrier limiting these individuals’ ability to interact and integrate into society [6].
Therefore, advances in this field may lead to more effective therapeutic strategies; furthermore, understanding these neural underpinnings lays the groundwork for developing new assistive technologies, such as BCIs, which could restore or enhance the communication abilities of individuals with severe speech impairments. Exploring these neurological mechanisms continues to be a vital area of research, promising to improve the lives of millions of individuals affected by speech and language disorders [7].
Imagined speech has existed since the emergence of language; however, the phenomenon is now primarily investigated through signal processing and detection within EEG data.
This task is difficult due to several interrelated factors: it demands significant cognitive resources, leading to potential overload; distractions from other thoughts can disrupt it; incorporating sensory experiences adds unnecessary noise; the lack of auditory and vocal feedback makes it hard to gauge coherence; coordinating various brain regions complicates the process; and individual differences in the vividness of inner speech further influence the ease of imagining speech. These elements create a challenging and often noisy cognitive environment for processing inner dialogue.
Due to the high complexity of EEG signals in imagined speech, classification is difficult. For this reason, signal databases are composed of simple, imagined vocal structures. The authors of [8,9,10] used a database with five vowels (a, e, i, o, u) in English, reporting a classification accuracy of 35.7%, 87%, and 79.7%, respectively. The work of [11] proposes four words, two of which are in English (Yes, No) and two in Hindi (Haan, Na), and reports 73.4%. The dataset in [12] uses two English words (Yes, No), and they achieved an accuracy of 63.16%.
The simpler the words are, the easier it is for the classifier model to distinguish them and, therefore, the higher the classification accuracy. Conversely, the more complex the words, the more difficult the task and, therefore, the lower the accuracy.
Within the state-of-the-art, we observe a variety of features being used. For instance, ref. [9] utilizes 4 features (mean, variance, standard deviation, skewness), ref. [10] employs 32 features, and ref. [13] uses 44 features.
It is necessary to characterize the signals through feature extraction in order to obtain information about them. It is hard to identify changes in the signal directly when a person thinks about one word or another; hence, feature extraction must be applied. Furthermore, the appropriate characteristics differ depending on the nature of the signals and the classifier algorithm, so there is no standard for the type and quantity of features to extract. Traditional methods are based on digital signal processing. This is the case in ref. [12], where Artificial Neural Networks (ANN) were used, and in [14], where the authors used Deep Neural Networks (DNN) to analyze the features extracted from the signal. Also, in [15], the authors used the Ant Colony Optimization (ACO) algorithm for the feature extraction of EEG signals.
This feature selection can be optimized through Genetic Algorithms (GA), as proposed in [16], where a GA was used as a signal processing method for feature extraction. Another example is [17], where the author used a GA for the optimal classification of epileptic seizures in EEG using wavelet analysis. A further example comes from [18], where the authors applied a GA to EEG electrode selection for person identification.
Furthermore, in the development of robust descriptors for specific environments, the authors of [19] proposed a modular framework that employs a GA to optimize descriptor configurations by balancing size, efficiency, and invariance to transformations such as lighting, scaling, and rotation. Their approach incorporates binary-coded genomes to control image-processing parameters and descriptor assembly, with modules selectively activated to achieve dimensionality reduction. By iteratively refining the descriptors, their framework demonstrated superior classification performance in sparse image regions compared to standard descriptors, while maintaining competitive invariance to environmental changes. This work highlights the potential of GA-driven optimization for scene-specific feature extraction.
Once the selection of features has been optimized, a classifier model is used. Among the classical machine learning (ML) models used in similar works, [20] used Support Vector Machines (SVM), while in [21], the authors applied Naive Bayes, and in [22], the results were calculated with Random Forest.
Other authors applied Deep Learning (DL) techniques: [12] used Recurrent Neural Networks (RNN), ref. [8] used a Convolutional Neural Network (CNN) in conjunction with Transfer Learning (TL), and in [23], the authors used a combination of CNN + RNN.
The authors of [13] proposed a processing technique with 44 statistical characteristics extracted from EEG signals. This inspired the present work, which uses 214 features capable of characterizing the EEG signals from an imagined speech in greater depth. Subsequently, a GA was implemented to choose and load the features with the most significant information to the classifier model. State-of-the-art methodologies yield accuracies ranging from 35.7% to 87%, while our proposed approach demonstrates significant improvement, as shown in our results.
Figure 1 presents a bibliometric analysis of research trends in imagined speech, highlighting the integration of CNNs and SVMs as dominant classification models. Furthermore, it reveals a strong connection between feature extraction, signal processing, and classification, indicating a trend toward innovative techniques to improve imagined speech classification tasks.
The main contributions of the present work are as follows:
  • The development of feature generation to extract information from the raw signals and their respective frequency bands.
  • The implementation of GA to find the selection of electrodes that allow a more efficient classification.
  • The implementation of GA to select the most relevant features that maximize classification accuracy.
  • The comparison of ten different classification models.
  • The dataset used was added to a public repository and will allow other techniques to be compared with this methodology.

Organization

The following sections describe the methodology used to classify imagined speech signals. The document is organized as follows: Section 2 contains the description of the dataset, the computational resources, the signal preprocessing techniques, the implementation of feature extraction, the GA optimization, and the development and implementation of the classifier models. Section 3 includes the data classification analysis and a discussion of the evaluation metrics. Section 4 describes and interprets the article's findings and future work on EEG systems for imagined speech classification tasks. Finally, Appendix A describes the equations employed for feature extraction in depth.

2. Methodology

The architecture of this work (Figure 2) filters the signals, applies feature extraction, and optimizes feature combinations with a GA to maximize the accuracy of a classifier model. These processes are described in the following sections.

2.1. Dataset

For the project, the EEGIS (Electroencephalogram Imagined Speech) dataset [24] was used, where the inclusion criterion was that the participants were between the ages of 20 and 30. The participants had an average age of 24.8 years, with a standard deviation of 2.04. The signals were obtained using a 14-channel Emotiv Epoc+, as shown in Figure 3a, and these were recorded at 128 Hz. The channels are specified in Figure 3b.
A graphical interface was developed that randomly plays the words through speakers while the user imagines speaking the word for 8 s, during which the device records EEG signals. The subject was required to remain still with their eyes closed. Then, the recorded 8 s signals were divided into 1 s segments (128 frames each) with an overlap of 48 frames. The dataset contains 4044 instances for the nine classes (8 Spanish words and a blank state of mind). However, it was filtered only to use two simple words, /Sí/ and /No/, with 444 and 480 instances, respectively; therefore, there are 924 instances, so we have a tensor with the form (924, 14, 128). Figure 4 shows an example of this. A class balance was not performed since they are practically balanced. The data were divided into training and test sets with a proportion of 80–20%.
The rationale behind this ratio is as follows. Focus on real-world performance: the 80–20% ratio is widely accepted and used in many real-world machine learning classification tasks, and it allowed us to evaluate the model's generalization capability on a held-out test set representative of typical use cases in production environments. Computational efficiency: cross-validation can be computationally expensive and time-consuming, especially with a large dataset. Given the aforementioned preprocessing steps and the implementation of a GA (explained in depth later), which involves 100 GA individuals, 100 generations, and 10 models to compare (particularly the kNN classifier model, which searches across 7 k values), roughly 700,000 different models have to be trained, resulting in a highly demanding process for the setup described above. The 80–20% ratio therefore provided a balance between sufficient model evaluation and reasonable training time.
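As a concrete illustration of the segmentation described in this section, the following sketch (Python/NumPy; the function name segment_recording and the synthetic input are illustrative, not the published implementation) splits one 8 s, 14-channel recording into 1 s chunks of 128 frames with a 48-frame overlap:

```python
import numpy as np

def segment_recording(recording, win=128, overlap=48):
    """Split a (channels, samples) EEG recording into overlapping 1 s windows.

    With 128-sample windows and a 48-sample overlap the hop size is 80 samples,
    so an 8 s recording (1024 samples at 128 Hz) yields 12 chunks per channel.
    """
    hop = win - overlap
    n_channels, n_samples = recording.shape
    starts = range(0, n_samples - win + 1, hop)
    # Result: (n_chunks, n_channels, win), matching the (instances, 14, 128) layout.
    return np.stack([recording[:, s:s + win] for s in starts])

# Example with synthetic data: one 8 s, 14-channel recording sampled at 128 Hz.
rng = np.random.default_rng(0)
chunks = segment_recording(rng.standard_normal((14, 8 * 128)))
print(chunks.shape)  # (12, 14, 128)
```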

2.2. Computational Resources

  • CPU: Intel Core i5 (11th gen).
  • GPU: NVIDIA RTX 3060 Ti.
  • RAM: 16 GB.
  • VRAM: 8 GB.
  • OS: Windows 11.
  • Libraries: numpy, pandas, scipy, sklearn, matplotlib, antropy, and signalityca.

2.3. Split Signals into Frequency Bands

A 4th-order band-pass Butterworth filter was used with cut-off frequencies (Table 1) to separate each of the 14 channels into five frequency bands associated with brain rhythms (Figure 5). The shape of the data tensor (924, 128, 14) became 4D (924, 128, 14, 6), where six corresponds to the raw signals plus the five frequency bands.
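A minimal sketch of this filtering stage, assuming SciPy's butter and zero-phase filtfilt (the article does not state whether zero-phase filtering was applied); the cut-off frequencies follow Table 1:

```python
import numpy as np
from scipy.signal import butter, filtfilt

FS = 128  # sampling rate in Hz
BANDS = {"delta": (0.5, 4), "theta": (4, 8), "alpha": (8, 13),
         "beta": (13, 30), "gamma": (30, 40)}

def split_into_bands(chunk, fs=FS, order=4):
    """Return the raw 1 s chunk (channels, samples) plus its five band-passed copies.

    Output shape: (channels, samples, 6), i.e., the raw signal followed by delta...gamma.
    """
    layers = [chunk]
    for low, high in BANDS.values():
        b, a = butter(order, [low, high], btype="bandpass", fs=fs)
        layers.append(filtfilt(b, a, chunk, axis=-1))  # zero-phase filtering
    return np.stack(layers, axis=-1)

# Example: filter one (14, 128) chunk into a (14, 128, 6) tensor.
banded = split_into_bands(np.random.default_rng(1).standard_normal((14, 128)))
print(banded.shape)  # (14, 128, 6)
```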

2.4. Feature Extraction

Feature vectors of size 214 were constructed from six layers of the same signal: the original signal and its five frequency bands. Specifically, 34 features were derived from the raw signal, while 36 were extracted from each frequency band; altogether, these add up to 214 features, forming the feature maps. The features were extracted from all 14 channels; therefore, the shape of the feature maps was (14, 214), as shown in Figure 6.
The features employed are the following:
  • Mean [25];
  • Standard deviation [26];
  • Coefficient variation [27];
  • Median [26];
  • Mode [26];
  • Max [28];
  • Min [28];
  • First quartile [29];
  • Third quartile [29];
  • Interquartile range [29];
  • Kurtosis [26];
  • Skewness [26];
  • Detrended fluctuation analysis [30];
  • Activity Hjorth param [31];
  • Mobility Hjorth param [31];
  • Complexity Hjorth param [31];
  • Permutation entropy [32];
  • Approximate entropy [33];
  • Spectral entropy [34];
  • Higuchi fractal dimension [35];
  • Total power spectral density [36];
  • Centroid power spectral density [37];
  • Determinism [38];
  • Trapping time [38];
  • Diagonal line entropy [38];
  • Average diagonal line length [38];
  • Recurrence rate [38];
  • Spectral edge frequency 25 [39];
  • Spectral edge frequency 50 [39];
  • Spectral edge frequency 75 [39];
  • Hurst exponent [40];
  • Singular valued decomposition entropy [41];
  • Petrosian fractal dimension [42];
  • Katz fractal dimension [43];
  • Relative band power [44];
  • Band amplitude [45].
These 36 features are described in depth in the Appendix A.
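To make the assembly of the (14, 214) feature maps concrete, the sketch below computes a representative subset of the listed features for every layer with NumPy/SciPy. It applies the same subset to the raw signal and the five bands for simplicity, whereas the full method uses 34 features for the raw signal and 36 per band; the remaining nonlinear measures (entropies, fractal dimensions, recurrence metrics) would be added analogously, for example with the antropy library listed in the computational resources. Helper names are illustrative.

```python
import numpy as np
from scipy import stats

def layer_features(x, fs=128):
    """A representative subset of the per-layer features (the full method uses up to 36)."""
    diff1, diff2 = np.diff(x), np.diff(x, n=2)
    mobility = np.sqrt(diff1.var() / x.var())
    freqs = np.fft.rfftfreq(x.size, d=1 / fs)
    psd = np.abs(np.fft.rfft(x)) ** 2
    return [
        x.mean(), x.std(), np.median(x), x.max(), x.min(),
        np.percentile(x, 25), np.percentile(x, 75), stats.iqr(x),
        stats.kurtosis(x), stats.skew(x),
        np.mean(x ** 2),                                  # Hjorth activity
        mobility,                                         # Hjorth mobility
        np.sqrt(diff2.var() / diff1.var()) / mobility,    # Hjorth complexity
        psd.sum(),                                        # total power spectral density
        (freqs * psd).sum() / psd.sum(),                  # PSD centroid
    ]

def extract_features(banded_chunk):
    """banded_chunk: (channels, samples, layers) -> feature map (channels, n_features)."""
    return np.array([[f for layer in range(ch.shape[-1])
                      for f in layer_features(ch[:, layer])]
                     for ch in banded_chunk])

fmap = extract_features(np.random.default_rng(2).standard_normal((14, 128, 6)))
print(fmap.shape)  # (14, 90) with this subset; (14, 214) with the full feature set
```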

2.5. Optimization with Genetic Algorithm

We seek a combination of features that maximizes the machine learning model's accuracy. Because the number of possible combinations is enormous, a GA, a metaheuristic, is used to optimize the selection of features. The GA generates a population of genomes and evolves them through reproduction across generations. In this case, the genomes are 228-length binary vectors, where the first 14 elements indicate which electrodes are activated and the remaining 214 elements indicate which features are activated. The selection is evaluated with a classic machine learning classification model, where the model's accuracy (Equation (2)) is the objective function, converting this into a maximization problem. The parameters of the GA are as follows:
  • Number of individuals: 100;
  • Type of gen: dichotomic {0,1};
  • Number of genes: 228;
  • Type of selection: rank;
  • Type of crossover: homogenous;
  • Mutation: bit flip;
  • Fitness function: accuracy.

2.5.1. Explanation of the GA and Its Inputs

As mentioned above, a Genetic Algorithm (GA) is applied to optimize the selection of electrodes and features to maximize classification accuracy. A GA is a metaheuristic search algorithm inspired by Darwinian evolution. It efficiently finds an approximately optimal solution within a vast search space, which, in this case, consists of 2^228 possible binary combinations of electrodes and features, far too many for an exhaustive computational search.

2.5.2. Individuals in the GA: Not the Subjects

It is important to clarify that the term "individuals" in the context of the GA does not refer to the 10 male subjects in our dataset. Rather, in a GA, each individual represents a candidate solution; in this case, a binary vector of length 228 encoding which electrodes and features are selected (1) or not (0).

2.5.3. GA Works as Follows

  • Initialization:
    -
    A population of 100 binary vectors is randomly generated (a common population size in GA).
  • Selection and Reproduction:
    -
    These 100 individuals form pairs and recombine (cross over) to create offspring.
    -
    Since each pair produces two offspring, the population temporarily doubles to 200 individuals.
  • Evaluation and Survival of the Fittest:
    -
    Each individual (binary vector) defines a specific subset of electrodes and features used to train and evaluate a classification model.
    -
    Individuals are then ranked based on the model’s accuracy.
    -
    The 100 best individuals survive to the next generation, while the 100 worst are discarded.
  • Iterative Optimization:
    -
    This process is repeated over several generations, refining the selection of electrodes and features until the algorithm converges on an optimal subset.
  • Final Solution Selection:
    -
    In the last generation, the highest-ranked individual (binary vector) is selected as the final solution.
    -
    This vector indicates, with 0 and 1, which electrodes and features provide the most valuable information to the model.
Using the GA, we efficiently navigate the large search space and identify the most relevant features and electrodes without requiring exhaustive calculations.
Upon reaching the stopping criterion, the GA returns the best genome (binary array) and the fitness obtained (the accuracy achieved with the feature activations encoded in that array).
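The following simplified sketch illustrates the loop described above (random initialization, uniform crossover, bit-flip mutation, and rank-based survival of the best 100 out of 200). It assumes the fitness is the held-out accuracy of a k = 7 kNN classifier trained on the selected electrodes and features; it is a sketch of the procedure, not the exact implementation used in this work.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

N_ELECTRODES, N_FEATURES = 14, 214
GENOME_LEN = N_ELECTRODES + N_FEATURES  # 228 binary genes

def fitness(genome, feature_maps, labels):
    """Accuracy of a kNN model trained only on the electrodes/features the genome activates.

    feature_maps: array of shape (instances, 14, 214); labels: array of shape (instances,).
    """
    electrodes = genome[:N_ELECTRODES].astype(bool)
    features = genome[N_ELECTRODES:].astype(bool)
    if not electrodes.any() or not features.any():
        return 0.0  # an empty selection cannot be evaluated
    X = feature_maps[:, electrodes][:, :, features].reshape(len(feature_maps), -1)
    X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.2, random_state=0)
    scaler = StandardScaler().fit(X_tr)
    model = KNeighborsClassifier(n_neighbors=7).fit(scaler.transform(X_tr), y_tr)
    return model.score(scaler.transform(X_te), y_te)

def evolve(feature_maps, labels, pop_size=100, generations=100, p_mut=0.01, seed=0):
    rng = np.random.default_rng(seed)
    pop = rng.integers(0, 2, size=(pop_size, GENOME_LEN))    # random initialization
    for _ in range(generations):
        parents = rng.permutation(pop)
        mask = rng.random(parents.shape) < 0.5                # uniform (homogeneous) crossover
        children = np.where(mask, parents, np.roll(parents, 1, axis=0))
        flip = rng.random(children.shape) < p_mut             # bit-flip mutation
        children = np.where(flip, 1 - children, children)
        pool = np.vstack([pop, children])                     # population temporarily doubles
        scores = np.array([fitness(g, feature_maps, labels) for g in pool])
        pop = pool[np.argsort(scores)[::-1][:pop_size]]       # keep the 100 fittest genomes
    return pop[0], fitness(pop[0], feature_maps, labels)
```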

2.6. Data Standardization

The data must be processed, normalized, or standardized before entering the machine learning model to obtain better performance. The proposed model uses standardization (Equation (1)).
$$z = \frac{x_i - \mu}{\sigma}$$
where $x_i$ is the data point to standardize, $\mu$ is the mean of all the data points in the column, and $\sigma$ is the standard deviation of the data in the column. Standardization places all the data attributes on a common scale without distorting differences in the ranges of values or losing information.
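A minimal sketch of Equation (1) applied column-wise; fitting μ and σ on the training set only (and reusing them on the test set) is an assumption made here to avoid information leakage:

```python
import numpy as np

def standardize(X_train, X_test):
    """Column-wise z-score: fit mu and sigma on the training set, apply to both sets."""
    mu, sigma = X_train.mean(axis=0), X_train.std(axis=0)
    sigma[sigma == 0] = 1.0  # guard against constant columns
    return (X_train - mu) / sigma, (X_test - mu) / sigma
```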

2.7. Flatten Data

As mentioned, the feature maps have the shape (14, 214). However, the GA selects the best features. Therefore, the matrices now have the shape of (M, N) where M is the number of electrodes, and N is the number of features the genome indicates. Since a classic ML model needs a data vector as input, the data point must be converted into the correct shape, flattening the feature matrices of (M, N) to (1, M × N) feature vectors, as shown in Figure 7.
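For instance, for the eight-electrode, 103-feature solution reported in the Results, flattening reduces to a single reshape (a sketch, with illustrative names):

```python
import numpy as np

def flatten_maps(selected_maps):
    """Turn (instances, M, N) feature matrices into (instances, M*N) vectors for classic ML models."""
    return selected_maps.reshape(selected_maps.shape[0], -1)

vectors = flatten_maps(np.zeros((924, 8, 103)))
print(vectors.shape)  # (924, 824)
```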

2.8. Classifier Model

The problem to be solved lies in the classification of EEG signals with an AI algorithm that knows how to discern when an EEG signal is associated with the thought of a particular word. The proposed method uses a classifier model to identify (predict) the imagined word by recognizing patterns and characteristics in the EEG signal. Separating databases into training and testing sets is a common step when working with classifier models. With the training set, the model recognizes, learns, and adjusts to the patterns of the given signals. On the other hand, the testing set contains examples of new signals for the model since it has not seen them. This set validates the accuracy of the model and its classification capacity. As mentioned at the beginning, this work seeks to demonstrate that a classic machine learning algorithm can classify imagined speech. For this reason, 10 different classifier models were compared:
  • k-Nearest Neighbors (kNN);
  • Logistic Regression (LogReg);
  • Random Forest (RF);
  • Decision Tree (DT);
  • Gradient Boosting Machine (GBM);
  • AdaBoost;
  • Naïve Bayes (NB);
  • Linear Discriminant Analysis (LDA);
  • Multi-Layer Perceptron (MLP);
  • Support Vector Machine (SVM).
For each classifier model, the default parameters were used. However, for kNN, a heuristic is applied to each individual in every GA generation to search for the optimal value of k from {5,7,13,15,17,21,37} that maximizes ACC. The final model’s performance was evaluated using the metrics accuracy, precision, recall, and F1-score, which can be found in (2)–(5), respectively.
$$\text{accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
$$\text{precision} = \frac{TP}{TP + FP}$$
$$\text{recall} = \frac{TP}{TP + FN}$$
$$\text{F1-score} = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$$
where the elements of (2)–(5) are as follows: TP (true positives), TN (true negatives), FP (false positives), and FN (false negatives).
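A sketch of the ten-model comparison with scikit-learn and the four metrics of Equations (2)–(5). The article states that default parameters were used; max_iter is raised here for LogReg and MLP only so that the example converges without warnings, and binary labels {0, 1} are assumed:

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier, AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

MODELS = {
    "kNN": KNeighborsClassifier(n_neighbors=7),
    "LogReg": LogisticRegression(max_iter=1000),
    "RF": RandomForestClassifier(),
    "DT": DecisionTreeClassifier(),
    "GBM": GradientBoostingClassifier(),
    "AdaBoost": AdaBoostClassifier(),
    "NB": GaussianNB(),
    "LDA": LinearDiscriminantAnalysis(),
    "MLP": MLPClassifier(max_iter=1000),
    "SVM": SVC(),
}

def evaluate(X_train, y_train, X_test, y_test):
    """Train each model and report the four metrics defined in Equations (2)-(5)."""
    results = {}
    for name, model in MODELS.items():
        y_pred = model.fit(X_train, y_train).predict(X_test)
        results[name] = {
            "accuracy": accuracy_score(y_test, y_pred),
            "precision": precision_score(y_test, y_pred),
            "recall": recall_score(y_test, y_pred),
            "f1": f1_score(y_test, y_pred),
        }
    return results
```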

3. Results

We can observe the GA’s performance for the ten classifier models in Figure 8. In particular, we can observe the performance of the kNN model, Figure 8a, where it started with an average accuracy below 70%; despite this, the population evolved, achieving an accuracy above 96% after 20 generations.
The GA’s output was the best genome, i.e., a binary vector of size 228, where the positions containing 1s indicate that the electrode or the feature associated with that position was activated. The best solution found by the GA was as follows: best genome = [1, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 1]. As the Feature Extraction section mentions, the first 34 elements correspond to features extracted from the original signal. The last 180 elements are five groups of 36 features. Each group corresponds to features in the delta, theta, alpha, beta, and gamma bands, respectively. Also, the algorithm returns the best k value for the k-NN algorithm. In this case, k = 7 was the optimal value.
In addition, when the genome was decoded, it showed the selection of electrodes found by the GA. In total, eight electrodes provided the most information to the model: AF3, T7, O1, O2, P8, T8, F4, and F8, as shown in Figure 9.
Furthermore, the decoded features are the ones shown in Table 2.
The Genetic Algorithm's outputs were validated in a separate experiment, in which the data were split into training and test sets with a 20–80% proportion.
The features used were the ones indicated by the GA, together with the k = 7 value for the k-NN classifier. The confusion matrices of the classifier models can be observed in Figure 10.
The metrics specified in Equations (2)–(5) were calculated from the confusion matrices of each of the models, and these can be seen in Table 3.
In addition, in Table 3, we can observe that the classification performance varied across different models with notable differences in recall, precision, F1-score, and accuracy. The k-Nearest Neighbors (kNN) classifier model showed the highest overall performance, achieving a recall of 0.97, an F1-score of 0.94, and an accuracy of 0.96, using eight electrodes and 103 features, which is expected since it was the only model that included a heuristic search within the GA to find the best k parameter. In contrast, models such as Random Forest (RF) and Multi-Layer Perceptron (MLP) exhibited lower recall values of 0.75 and 0.78, respectively, with corresponding accuracy scores of 0.80. Despite using only five electrodes, the Support Vector Machine (SVM) achieved a recall of 0.89, although its F1-score remained at 0.76. Similarly, Logistic Regression (LogReg) and Linear Discriminant Analysis (LDA) showed balanced performances with accuracy scores of 0.87 using five and nine electrodes, respectively. The Gradient Boosting Machine (GBM), with six electrodes and 99 features, showed moderate performance, with a recall of 0.81 and an F1-score of 0.82. These results clearly show the relationship between the number of electrodes used and classification effectiveness, where some models achieved high accuracy with fewer electrodes. In contrast, others required a higher number of features to optimize performance.
Table 4 presents the differences between this work and similar works in the state of the art, such as the types of words utilized, the processing techniques, and the classifier models. The last column shows the performance of each methodology, indicating that our technique achieved the highest accuracy.

4. Discussion

The proposed method demonstrates excellent performance in recognizing and distinguishing the simple Spanish words /Sí/ (Yes) and /No/ (No). Through rigorous testing, this work achieved higher accuracy than existing methods in the current literature. The proposed approach effectively handles fundamental words and outperforms previous techniques in terms of precision. This highlights the method's robustness and efficiency and showcases its potential for applications that require accurate recognition of these essential responses in Spanish, making it a significant contribution to the field. Moreover, in future work, we intend to use the full EEGIS dataset, which contains more complex words of the language, and to test additional classifier models, both ML and DL, that may surpass this work.
Although the proposed approach effectively handles core words and outperforms previous techniques in terms of accuracy, it is important to emphasize this work’s limitations.
First, the limited vocabulary size is a key limitation. While our model shows promising results within the predefined word set, it does not yet scale to the full complexity of natural language. Expanding the vocabulary requires more data collection and model tuning, which is an important direction for future research.
Second, inference time represents a challenge for real-time implementation. The processing flow (feature extraction and classification) takes a few seconds per instance, even when running on a research computer. That introduces latency that can disrupt the fluidity of real-time interaction. Optimizing the algorithm’s computational efficiency, leveraging hardware acceleration (e.g., GPUs or dedicated edge computing devices), and exploring lightweight model architectures could help to mitigate this issue in future iterations.
Finally, the headset configuration presents practical limitations. The electrodes, which use a saline solution, are prone to evaporation over time, potentially degrading signal quality. Furthermore, prolonged use can cause skin irritation at the contact points. To address these issues, alternative electrode materials can be explored, headset ergonomics can be improved, or automated hydration mechanisms can be incorporated to maintain signal stability over extended periods.
Despite these limitations, our study lays the groundwork for future advancements. Implementing this system in a real-time solution would require the further optimization of computational efficiency, robustness in signal acquisition, and user comfort. These challenges will be the focus of our future work as we move toward practical, real-world applications.

Author Contributions

Conceptualization, E.L.-A. and A.T.; methodology, E.L.-A. and A.T.; software, E.L.-A.; validation, J.R.-R. and S.T.-A.; formal analysis, J.R.-R. and S.T.-A.; investigation, E.L.-A. and A.T.; resources, J.R.-R.; writing—original draft preparation, review and editing, all authors. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by CONAHCYT grant number 1270170.

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki and approved by the Applied Research Ethics Committee of the Faculty of Engineering of the Universidad Autónoma de Querétaro (protocol code CEAIFI-026-2024-TP, 24 April 2024).

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The EEGIS—Electroencephalogram Imagined Speech dataset [24]—supporting the findings of this study is publicly available on Mendeley Data and can be found at the following link: https://doi.org/10.17632/73g4fw884c.1.

Acknowledgments

We would like to thank the Research and Graduate Division of the Faculty of Engineering of the Universidad Autónoma de Querétaro for allowing E.L.-A. to carry out this master’s research. We would also like to thank the Neurodiagnosis and Rehabilitation Unit “Moisés López González” for the support provided during the research. We would also like to thank the National Council of Humanities, Sciences and Technologies (CONAHCYT) for granting Eng. Edgar Lara-Arellano the master’s scholarship that made possible his full-time dedication to this research.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
ACC: Accuracy;
ANN: Artificial Neural Network;
ACO: Ant Colony Optimization;
CNN: Convolutional Neural Network;
DL: Deep Learning;
DNN: Deep Neural Networks;
DT: Decision Tree;
EEG: Electroencephalogram;
FP: False positives;
FN: False negatives;
GBM: Gradient Boosting Machine;
kNN: k-Nearest Neighbors;
LDA: Linear Discriminant Analysis;
LogReg: Logistic Regression;
ML: Machine Learning;
MLP: Multi-Layer Perceptron;
NB: Naïve Bayes;
RF: Random Forest;
RNN: Recurrent Neural Networks;
SVM: Support Vector Machine;
TL: Transfer Learning;
TN: True negatives;
TP: True positives.

Appendix A

Proposed features:

Appendix A.1. Mean

The mean, Equation (A1), is a statistical measure that provides the average value of a set of data points [25].
$$\text{Mean} = \frac{1}{N} \sum_{i=1}^{N} x_i$$
where
  • x i denotes each frame sample;
  • N is the number of frames.

Appendix A.2. Standard Deviation

The standard deviation, Equation (A2), is a measure of the variability in a dataset around its mean, also known as a measure of the dispersion of the dataset [26].
$$\text{Standard Deviation} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \text{Mean})^2}$$
where
  • x i denotes each frame sample;
  • N is the number of frames;
  • Mean is the average of all samples.

Appendix A.3. Coefficient Variation

The coefficient of variation, Equation (A3), is the dispersion of the data points but normalized to the mean [27].
$$\text{Coefficient of Variation (CV)} = \frac{\text{Standard Deviation}}{\text{Mean}}$$
where:
  • Standard Deviation is the standard deviation of the signal,
  • Mean is the average of the signal.

Appendix A.4. Median

The median, Equation (A4), is the middle data point in an organized dataset [26].
$$\text{Median} = \begin{cases} \text{middle value}, & \text{if the number of data points is odd} \\[4pt] \dfrac{x_{N/2} + x_{N/2+1}}{2}, & \text{if the number of data points is even} \end{cases}$$

Appendix A.5. Mode

The mode, Equation (A5), is the data point with the most occurrence in a dataset [26].
Mode = Value with highest frequency of occurrence

Appendix A.6. Max

The max (maximum), Equation (A6), is the data point with the greatest value [28].
$$\text{Max} = \max(x_1, x_2, \ldots, x_N)$$

Appendix A.7. Min

The min (minimum), Equation (A7), is the data point with the lowest value [28].
$$\text{Min} = \min(x_1, x_2, \ldots, x_N)$$

Appendix A.8. First Quartile

The first quartile (Q1), Equation (A8), is the value below which 25% of the data in a sorted dataset lie [29].
Q 1 = Median of the lower half of the dataset

Appendix A.9. Third Quartile

The third quartile (Q3), Equation (A9), is the value below which 75% of the data in a sorted dataset lie [29].
Q 3 = Median of the upper half of the dataset

Appendix A.10. Interquartile Range

The interquartile range (IQR), Equation (A10), is the difference between Q3 and Q1 [29].
$$\text{IQR} = Q_3 - Q_1$$

Appendix A.11. Kurtosis

Kurtosis, Equation (A11), is a statistical measure that describes how the data are distributed between the center and the tails of the distribution [26].
$$\text{Kurtosis} = \frac{1}{N} \sum_{i=1}^{N} \left( \frac{x_i - \text{Mean}}{\text{Standard Deviation}} \right)^{4} - 3$$

Appendix A.12. Skewness

Skewness, Equation (A12), indicates whether a dataset is biased towards one direction by measuring the asymmetry of its distribution [26].
$$\text{Skewness} = \frac{1}{N} \sum_{i=1}^{N} \left( \frac{x_i - \text{Mean}}{\text{Standard Deviation}} \right)^{3}$$

Appendix A.13. Detrended Fluctuation Analysis

Detrended Fluctuation Analysis (DFA), Equation (A13), enables the identification of the self-similarity of signals.
$$F(\tau) \propto \tau^{H}$$
where
  • F ( τ ) is the fluctuation function that describes the signal’s variance over a window of size τ ;
  • H is the Hurst parameter, which characterizes the scaling behavior;
  • τ is the window size over which the fluctuations are calculated [30].

Appendix A.14. Activity Hjorth Param

Activity, Equation (A14), is one of the Hjorth parameters, and it measures signal power, reflecting brain activity.
$$\text{Activity} = \frac{1}{N} \sum_{i=1}^{N} x_i^2$$
where
  • x i is the EEG signal amplitude value at time point i;
  • N is the number of frames in the signal [31].

Appendix A.15. Mobility Hjorth Param

Mobility, Equation (A15), is one of the Hjorth parameters, and it measures mean frequency, providing relations between brain activity and neural processes.
$$\text{Mobility} = \sqrt{\frac{\operatorname{Var}(dx)}{\operatorname{Var}(x)}}$$
where
  • x denotes the EEG signal;
  • d x stands for the first derivative of the signal;
  • Var ( x ) corresponds to the variance of the signal;
  • Var ( d x ) corresponds to the variance of the first derivative of the signal [31].

Appendix A.16. Complexity Hjorth Param

Complexity, Equation (A16), is one of the Hjorth parameters, and it measures the change in frequency.
$$\text{Complexity} = \frac{\text{Mobility}(dx)}{\text{Mobility}(x)}$$
where:
  • x denotes the EEG signal;
  • d x is the first derivative of the signal;
  • Mobility ( x ) corresponds to the mobility of the signal;
  • Mobility ( d x ) corresponds to the mobility of the first derivative [31].
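A small sketch computing the three Hjorth parameters defined in Equations (A14)–(A16) with NumPy; activity is taken as the mean square, as written in Equation (A14):

```python
import numpy as np

def hjorth_parameters(x):
    """Activity, mobility, and complexity as defined in Equations (A14)-(A16)."""
    dx = np.diff(x)
    ddx = np.diff(x, n=2)
    activity = np.mean(x ** 2)
    mobility = np.sqrt(np.var(dx) / np.var(x))
    complexity = np.sqrt(np.var(ddx) / np.var(dx)) / mobility
    return activity, mobility, complexity
```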

Appendix A.17. Permutation Entropy

Permutation Entropy (PE), Equation (A17), in a given time series, captures the relations between values and enables the extraction of the probability distribution of the ordinal patterns.
$$\text{PE} = - \sum_{p} P(\pi_p) \log\!\left( P(\pi_p) \right)$$
where P ( π p ) denotes the probability of each unique ordinal pattern π p occurring in the embedded time series, which is constructed by mapping segments of the time series into patterns based on the ordering of their values [32].

Appendix A.18. Approximate Entropy

Approximate Entropy (ApEn), Equation (A19), measures the amount of regularity and the unpredictability of fluctuations in a time-series dataset.
Equation: The Approximate Entropy ApEn for a signal of length N, embedding dimension m, and tolerance r is calculated as follows:
$$\phi^{m}(r) = \frac{1}{N - m + 1} \sum_{i=1}^{N - m + 1} \log\!\left( \frac{\text{number of patterns within tolerance } r \text{ of } X_i^m}{N - m + 1} \right)$$
Then, the approximate entropy ApEn ( m , r , N ) is defined by the difference:
$$\text{ApEn}(m, r, N) = \phi^{m}(r) - \phi^{m+1}(r)$$
where X i m represents an m-length segment of the time series starting at position i. r is the tolerance, typically a fraction of the standard deviation of the time series, which defines the range within which patterns are considered similar. ϕ m ( r ) quantifies the average similarity between m-length patterns across the time series [33].

Appendix A.19. Spectral Entropy

Spectral Entropy (SE), Equation (A21), is the measure of spectral power distribution and forecastability based on Shannon entropy.
Equation: The Spectral Entropy SE for a normalized power spectral density P ( f ) over N frequency components is calculated as follows:
$$P(f) = \frac{|X(f)|^2}{\sum_{i=1}^{N} |X(f_i)|^2}$$
where | X ( f ) | 2 is the power at frequency f and the denominator normalizes the power spectral density to make P ( f ) a probability distribution across frequencies.
The Spectral Entropy SE is then defined by [34]
$$\text{SE} = - \sum_{i=1}^{N} P(f_i) \log\!\left( P(f_i) \right)$$
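A sketch of this spectral entropy computation; estimating the power spectral density with scipy.signal.welch (rather than a raw periodogram) is an assumption made here:

```python
import numpy as np
from scipy.signal import welch

def spectral_entropy(x, fs=128):
    """Shannon entropy of the normalized power spectral density (the two equations above)."""
    _, psd = welch(x, fs=fs, nperseg=min(len(x), 128))
    p = psd / psd.sum()   # normalized PSD behaves as a probability distribution
    p = p[p > 0]          # drop zero bins to avoid log(0)
    return -np.sum(p * np.log(p))
```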

Appendix A.20. Higuchi Fractal Dimension

The Higuchi Fractal Dimension (HFD), Equation (A24), is a method for estimating the box-counting dimension of the graph of a time series.
The length of the curve at scale k for a particular m is given by
$$L_m(k) = \frac{1}{\left\lfloor \frac{N - m}{k} \right\rfloor k} \sum_{i=1}^{\left\lfloor \frac{N - m}{k} \right\rfloor} \left| x_{m + i k} - x_{m + (i-1) k} \right|$$
where L m ( k ) represents the average length of the time series at scale k and starting point m and k is the scale parameter, determining the spacing between points in the resampled time series.
The overall length at scale k, denoted L ( k ) , is the average of L m ( k ) over all m values:
$$L(k) = \frac{1}{k} \sum_{m=1}^{k} L_m(k)$$
Finally, the Higuchi Fractal Dimension HFD is estimated by fitting a line to the points ( log ( k ) , log ( L ( k ) ) ) and calculating the slope, which gives the following fractal dimension [35]:
HFD = slope of the line fitting ( log ( k ) , log ( L ( k ) ) )

Appendix A.21. Total Power Spectral Density

Total Power Spectral Density (PSD), Equation (A25), measures the power in a time series or a signal as a function of frequency.
$$\text{PSD}_{\text{total}} = \sum_{i=1}^{N} P(f_i)$$
where P ( f i ) represents the power at each frequency f i and N is the total number of frequency bins [36].

Appendix A.22. Centroid Power Spectral Density

Centroid Power Spectral Density (Centroid PSD), Equation (A26), indicates where the center of mass of the power spectrum is located.
$$f_{\text{centroid}} = \frac{\sum_{i=1}^{N} f_i \cdot P(f_i)}{\sum_{i=1}^{N} P(f_i)}$$
where f i is the frequency at the i-th bin, P ( f i ) represents the power at frequency f i , and N is the total number of frequency bins [37].

Appendix A.23. Determinism

Determinism (DET), Equation (A27), calculates the proportion of recurring patterns in a time series dataset and shows deterministically ordered patterns.
$$\text{DET} = \frac{\sum_{l = l_{\min}}^{N} l \cdot P(l)}{\sum_{i,j} R(i,j)}$$
where l represents the length of each diagonal line in the recurrence plot, P ( l ) is the frequency of diagonal lines of length l, l min is a minimum line length threshold (to avoid counting very short, trivial recurrences), and i , j R ( i , j ) is the total number of recurrence points [38].

Appendix A.24. Trapping Time

Trapping Time (TT), Equation (A28), measures the time that the signal spends in a state of recurrence, which is useful for interpreting neural activities.
$$\text{TT} = \frac{1}{M} \sum_{i=1}^{M} T_i$$
where M is the total number of recurrences and T i is the length of the i-th trapping time, i.e., the duration the signal spends in the recurrent state before leaving [38].

Appendix A.25. Diagonal Line Entropy

Diagonal Line Entropy (DLE), Equation (A29), measures the complexity of brain activity by evaluating recurrent patterns. It computes the entropy of the distribution of diagonal line lengths.
$$\text{DLE} = - \sum_{l=1}^{L} P(l) \log\!\left( P(l) \right)$$
where P ( l ) is the probability of finding a diagonal line of length l and L is the maximum length of diagonal lines considered in the recurrence plot [38].

Appendix A.26. Average Diagonal Line Length

Average Diagonal Line Length (ADLL), Equation (A30), measures the regularity or persistence of specific patterns of brain activity.
$$\text{ADLL} = \frac{1}{M} \sum_{i=1}^{M} l_i$$
where M is the total number of diagonal lines in the recurrence plot and l i is the length of the i-th diagonal line [38].

Appendix A.27. Recurrence Rate

Recurrence Rate (RR), Equation (A31), measures the stability of signals, calculating how often the system revisits similar states.
$$\text{RR} = \frac{\sum_{i,j} R(i,j)}{N^2}$$
where R ( i , j ) is the recurrence matrix, with R ( i , j ) = 1 if points i and j are similar within a specified threshold and N is the total number of points in the time series [38].

Appendix A.28. Spectral Edge Frequency 25

Spectral Edge Frequency 25 (SEF 25), Equation (A32), is the frequency below which 25% of the total power of the signal is contained.
$$\int_{0}^{f_{\text{SEF25}}} P(f)\, df = 0.25 \sum_{f=0}^{f_{\max}} P(f)$$
where P ( f ) is the power spectral density at frequency f, f max is the highest frequency in the power spectrum, and f SEF 25 is the frequency below which 25% of the total power is contained [39].

Appendix A.29. Spectral Edge Frequency 50

Spectral Edge Frequency 50 (SEF 50), Equation (A33), is the frequency below which 50% of the total power of the signal is contained.
$$\int_{0}^{f_{\text{SEF50}}} P(f)\, df = 0.50 \sum_{f=0}^{f_{\max}} P(f)$$
where P ( f ) is the power spectral density at frequency f, f max is the highest frequency in the power spectrum, and f SEF 50 is the frequency below which 50% of the total power is located [39].

Appendix A.30. Spectral Edge Frequency 75

Spectral Edge Frequency 75 (SEF 75), Equation (A34), is the frequency below which 75% of the total power of the signal is contained.
$$\int_{0}^{f_{\text{SEF75}}} P(f)\, df = 0.75 \sum_{f=0}^{f_{\max}} P(f)$$
where P ( f ) is the power spectral density at frequency f, f max is the maximum frequency in the power spectrum, and f SEF 75 is the frequency below which 75% of the total power is concentrated [39].
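The three spectral edge frequencies can be obtained from the cumulative spectral power. The sketch below assumes a Welch PSD estimate; spectral_edge_frequency is an illustrative helper name:

```python
import numpy as np
from scipy.signal import welch

def spectral_edge_frequency(x, fs=128, fraction=0.75):
    """Frequency below which the given fraction of total spectral power lies (SEF 25/50/75)."""
    freqs, psd = welch(x, fs=fs, nperseg=min(len(x), 128))
    cumulative = np.cumsum(psd) / psd.sum()
    return freqs[np.searchsorted(cumulative, fraction)]

# SEF 25, 50, and 75 of one chunk:
x = np.random.default_rng(3).standard_normal(128)
print([spectral_edge_frequency(x, fraction=f) for f in (0.25, 0.50, 0.75)])
```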

Appendix A.31. Hurst Exponent

The Hurst exponent (H), Equation (A38), calculates autocorrelations in a time series dataset by measuring the degree of persistence, randomness, or anti-persistence in a signal.
  • Calculate the cumulative sum of deviations from the mean, Y ( t ) :
    $$Y(t) = \sum_{i=1}^{t} \left( x(i) - \bar{x} \right)$$
    where x ( i ) is the i-th data point and x ¯ is the mean of the time series.
  • Calculate the rescaled range R / S for different window sizes n:
    $$R(n) = \max\!\left( Y(t) \right) - \min\!\left( Y(t) \right)$$
    $$S(n) = \sqrt{\frac{1}{n} \sum_{t=1}^{n} \left( x(t) - \bar{x} \right)^2}$$
  • The Hurst exponent is then estimated from the relationship between the rescaled range R / S and the window size n:
    $$R/S \propto n^{H}$$
    where R ( n ) is the range of the cumulative sum, S ( n ) is the standard deviation of the time series for window size n, and H is the Hurst exponent, which is determined by the slope of the log–log plot of R / S versus n [40].

Appendix A.32. Singular Valued Decomposition Entropy

Singular Value Decomposition Entropy (SVDE), Equation (A40), characterizes the information content or regularity of a signal by measuring the uncertainty or disorder in the distribution in a time series dataset.
  • Given a time series x ( t ) , first construct a time delay embedding matrix X with embedding dimension m and time delay τ :
    $$X = \begin{bmatrix} x(1) & x(2) & \cdots & x(N-m+1) \\ x(2) & x(3) & \cdots & x(N-m+2) \\ \vdots & \vdots & \ddots & \vdots \\ x(m) & x(m+1) & \cdots & x(N) \end{bmatrix}$$
  • Apply Singular Value Decomposition (SVD) to the matrix X:
    $$X = U \Sigma V^{T}$$
    where U is an orthogonal matrix containing the left singular vectors, Σ is a diagonal matrix containing the singular values, and V T is an orthogonal matrix containing the right singular vectors.
  • Compute the entropy of the singular values σ 1 , σ 2 , , σ k (diagonal elements of Σ ):
    $$\text{SVDE} = - \sum_{i=1}^{k} p(\sigma_i) \log\!\left( p(\sigma_i) \right)$$
    where p ( σ i ) is the normalized probability distribution of the singular values σ i and k is the number of singular values considered in the decomposition (often corresponding to the rank of the matrix X) [41].

Appendix A.33. Petrosian Fractal Dimension

The Petrosian Fractal Dimension (PFD), Equation (A43), characterizes properties of brain activity, specifically to identify and distinguish states of physiological function.
  • For a given time series x 1 , x 2 , , x N , compute the number of sign changes N s c , which is the number of times the signal crosses its local mean:
    $$N_{sc} = \sum_{i=2}^{N} \mathbf{1}\left\{ \operatorname{sgn}(x_i - x_{i-1}) \neq \operatorname{sgn}(x_{i-1} - x_{i-2}) \right\}$$
    where sgn ( x ) is the sign function, and 1 is the indicator function.
  • Calculate the Petrosian Fractal Dimension using the following formula:
    $$\text{PFD} = \frac{\log_{10} N}{\log_{10} N + \log_{10}\!\left( \frac{N}{N + 0.4\, N_{sc}} \right)}$$
    where N s c is the number of sign changes, and N is the total number of data points in the time series [42].

Appendix A.34. Katz Fractal Dimension

The Katz Fractal Dimension (KFD), Equation (A45), analyzes and compares complex waveforms, measuring the irregularity or complexity of neural activity and enabling the distinction of different cognitive states.
  • Define the time series as x 1 , x 2 , , x N .
  • Compute the total length of the curve using the following formula:
    $$L = \sum_{i=2}^{N} \left| x_i - x_{i-1} \right|$$
    where L is the total length of the time series.
  • Calculate the Katz Fractal Dimension using the following formula:
    $$\text{KFD} = \frac{\log_{10}(L / d_0)}{\log_{10}(N)}$$
    where L is the total length of the time series, $d_0 = |x_1 - x_N|$ is the distance between the first and last points of the time series, and N is the total number of data points in the series [43].

Appendix A.35. Relative Band Power

Relative Band Power, Equation (A48), measures the distribution of power across different brainwave frequency bands, which are associated with various cognitive and mental states.
  • Calculate the total power of the signal across all frequencies. This can be carried out by summing the power spectral density (PSD) across the entire frequency range f min to f max :
    $$P_{\text{total}} = \int_{f_{\min}}^{f_{\max}} P(f)\, df$$
    where P ( f ) is the power spectral density at frequency f, and f min and f max are the limits of the frequency range (e.g., 0 to 40 Hz for an entire EEG).
  • For each frequency band b, calculate the band power:
    $$P_b = \int_{f_{\min}^{b}}^{f_{\max}^{b}} P(f)\, df$$
    where P b is the power within the band, and f min b and f max b are the frequency limits of the band (for example, [ 1 , 4 ] Hz for Delta, [ 4 , 8 ] Hz for Theta, etc.).
  • Finally, compute the relative band power for each frequency band b by dividing the band power by the total power:
    $$R_b = \frac{P_b}{P_{\text{total}}}$$
    where R b is the relative band power for the band b, P b is the power within the frequency band b, and P total is the total power across all frequencies [44].
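A sketch of the three steps above, assuming a Welch PSD estimate and the 0.5–40 Hz total range suggested in the text:

```python
import numpy as np
from scipy.signal import welch
from scipy.integrate import simpson

def relative_band_power(x, band, fs=128, total_range=(0.5, 40)):
    """Band power divided by total power, both estimated from the Welch PSD."""
    freqs, psd = welch(x, fs=fs, nperseg=min(len(x), 128))
    in_total = (freqs >= total_range[0]) & (freqs <= total_range[1])
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    p_total = simpson(psd[in_total], x=freqs[in_total])
    p_band = simpson(psd[in_band], x=freqs[in_band])
    return p_band / p_total

x = np.random.default_rng(4).standard_normal(128)
print(relative_band_power(x, band=(8, 13)))  # relative alpha power
```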

Appendix A.36. Band Amplitude

Band amplitude, Equation (A49), measures the magnitude of the time series within a particular frequency range.
  • Calculate the amplitude: Once the signal has been filtered, calculate the instantaneous amplitude of the filtered signal. This can be completed by computing the envelope of the filtered signal using methods such as the Hilbert transform:
    $$A_b(t) = \sqrt{x_b(t)^2 + \dot{x}_b(t)^2}$$
    where x b ( t ) is the filtered signal for band b, x b ˙ ( t ) is the time derivative of the filtered signal x b ( t ) , and A b ( t ) is the instantaneous amplitude of the signal in the frequency band b.
  • Compute the mean amplitude: The mean amplitude over a period or window of time can be calculated to summarize the overall strength of oscillations within the band:
    $$A_b = \frac{1}{T} \int_{0}^{T} A_b(t)\, dt$$
    where A b is the average amplitude in the frequency band b, and T is the total duration of the signal or the time window over which the average is computed [45].
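A sketch of the band-amplitude computation; it realizes the instantaneous amplitude with the analytic-signal envelope from the Hilbert transform mentioned above, rather than the derivative form of the displayed equation:

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def mean_band_amplitude(x, band, fs=128, order=4):
    """Mean instantaneous amplitude in a band, using the analytic-signal envelope."""
    b, a = butter(order, band, btype="bandpass", fs=fs)
    x_band = filtfilt(b, a, x)
    envelope = np.abs(hilbert(x_band))   # instantaneous amplitude A_b(t)
    return envelope.mean()               # discrete analogue of the time average above

x = np.random.default_rng(5).standard_normal(128)
print(mean_band_amplitude(x, band=(13, 30)))  # mean beta-band amplitude
```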

References

  1. Brigham, K.; Kumar, B.V.K.V. Imagined Speech Classification with EEG Signals for Silent Communication: A Preliminary Investigation into Synthetic Telepathy. In Proceedings of the 2010 4th International Conference on Bioinformatics and Biomedical Engineering, Chengdu, China, 18–20 June 2010; pp. 1–4. [Google Scholar] [CrossRef]
  2. Lee, S.H.; Park, J.H.; Kim, D.S. Imagined Speech and Visual Imagery as Intuitive Paradigms for Brain-Computer Interfaces. arXiv 2024, arXiv:2411.09400. [Google Scholar]
  3. Jäncke, L.; Langer, N.; Hänggi, J. Diminished Whole-brain but Enhanced Peri-sylvian Connectivity in Absolute Pitch Musicians. J. Cogn. Neurosci. 2012, 24, 1447–1461. [Google Scholar] [CrossRef]
  4. Geva, S.; Jones, P.S.; Crinion, J.T.; Price, C.J.; Baron, J.C.; Warburton, E.A. The neural correlates of inner speech defined by voxel-based lesion–symptom mapping. Brain 2011, 134, 3071–3082. [Google Scholar] [CrossRef]
  5. Kummer, A.W. Perceptual assessment of resonance and velopharyngeal function. In Proceedings of the Seminars in Speech and Language; Thieme Medical Publishers: New York, NY, USA, 2011; Volume 32, pp. 159–167. [Google Scholar]
  6. Broomfield, J. The nature of referred subtypes of primary speech disability. Child Lang. Teach. Ther. 2004, 20, 135–151. [Google Scholar] [CrossRef]
  7. DeWitt, I. Phoneme and word recognition in the auditory ventral stream. Proc. Natl. Acad. Sci. 2012, 109, E505–E514. [Google Scholar] [CrossRef]
  8. Cooney, C.; Folli, R.; Coyle, D. Optimizing Layers Improves CNN Generalization and Transfer Learning for Imagined Speech Decoding from EEG. In Proceedings of the International Conference on Systems, Man and Cybernetics (SMC), Bari, Italy, 6–9 October 2019. [Google Scholar]
  9. Min, B.; Kim, J.; Park, H.J.; Lee, B. Vowel Imagery Decoding toward Silent Speech BCI Using Extreme Learning Machine with Electroencephalogram. BioMed Res. Int. 2016, 2016, 2618265. [Google Scholar] [CrossRef] [PubMed]
  10. Morooka, T.; Ishizuka, K.; Kobayashi, N. Electroencephalographic Analysis of Auditory Imagination to Realize Silent Speech BCI. In Proceedings of the 2018 IEEE 7th Global Conference on Consumer Electronics, GCCE, Nara, Japan, 9–12 October 2018; Institute of Electrical and Electronics Engineers Inc.: Piscataway, NJ, USA, 2018; pp. 73–74. [Google Scholar] [CrossRef]
  11. Balaji, A.; Haldar, A.; Patil, K.; Ruthvik, S.; Baths, V. EEG-based Classification of Bilingual Unspoken Speech using ANN. In Proceedings of the 2017 39th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Jeju Island, Republic of Korea, 11–15 July 2017. [Google Scholar]
  12. Sereshkeh, A.R.; Trott, R.; Bricout, A.; Chau, T. EEG Classification of Covert Speech Using Regularized Neural Networks. IEEE/ACM Trans. Audio Speech Lang. Process. 2017, 25, 2292–2300. [Google Scholar] [CrossRef]
  13. Nava, G.H. Predicción de eventos epilépticos mediante técnicas de aprendizaje profundo usando señales EEG 2023. Available online: https://ri-ng.uaq.mx/handle/123456789/9749 (accessed on 21 June 2024).
  14. Panachakel, J.T.; Ramakrishnan, A.G.; Ramakrishnan, A.G. Decoding Imagined Speech using Wavelet Features and Deep Neural Networks. In Proceedings of the 2019 IEEE 16th India Council International Conference (INDICON), Rajkot, India, 13–15 December 2019. [Google Scholar] [CrossRef]
  15. Fernandez-Fraga, S.; Aceves-Fernandez, M.; Pedraza-Ortega, J.; Tovar-Arriaga, S. Feature Extraction of EEG Signal upon BCI Systems Based on Steady-State Visual Evoked Potentials Using the Ant Colony Optimization Algorithm. Discret. Dyn. Nat. Soc. 2018, 2018, 2143873. [Google Scholar] [CrossRef]
  16. Li, Y.; Wu, L.; Wang, T.; Gao, N.; Wang, Q. EEG Signal Processing Based on Genetic Algorithm for Extracting Mixed Features. Int. J. Pattern Recognit. Artif. Intell. 2019, 33, 1958008. [Google Scholar] [CrossRef]
  17. Ocak, H. Optimal classification of epileptic seizures in EEG using wavelet analysis and genetic algorithm. Signal Process. 2008, 88, 1858–1867. [Google Scholar] [CrossRef]
  18. Albasri, A.; Abdali-Mohammadi, F.; Fathi, A. EEG electrode selection for person identification thru a genetic-algorithm method. J. Med Syst. 2019, 43, 297. [Google Scholar] [CrossRef]
  19. Takacs, A.; Toledano-Ayala, M.; Dominguez-Gonzalez, A.; Pastrana-Palma, A.; Velazquez, D.T.; Ramos, J.M.; Rivas-Araiza, E.A. Descriptor Generation and Optimization for a Specific Outdoor Environment. IEEE Access 2020, 8, 52550–52565. [Google Scholar] [CrossRef]
  20. Koizumi, K.; Ueda, K.; Nakao, M. Development of a Cognitive Brain-Machine Interface Based on a Visual Imagery Method. In Proceedings of the 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Honolulu, HI, USA, 18–21 July 2018; pp. 1062–1065. [Google Scholar]
  21. García-Salinas, J.S.; Villaseñor-Pineda, L.; Reyes-García, C.A.; Torres-García, A.A. Transfer learning in imagined speech EEG-based BCIs. Biomed. Signal Process. Control 2019, 50, 151–157. [Google Scholar] [CrossRef]
  22. Lee, S.H.; Lee, M.; Jeong, J.H.; Lee, S.W. Towards an EEG-based intuitive BCI communication system using imagined speech and visual imagery. In Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, Bari, Italy, 6–9 October 2019; Institute of Electrical and Electronics Engineers Inc.: Piscataway, NJ, USA, 2019; pp. 4409–4414. [Google Scholar] [CrossRef]
  23. Saha, P.; Abdul-Mageed, M.; Fels, S. Deep Learning the EEG Manifold for Phonological Categorization from Active Thoughts. In Proceedings of the ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK, 12–17 May 2019. [Google Scholar]
  24. Lara, E.; Rodríguez, J.; Takacs, A. EEGIS—Electroencephalogram Imagined Speech Dataset. Mendeley Data 2024. [Google Scholar]
  25. Mukherjee, S.P.; Banerjee, P.K.; Misra, B.P. Measures of central tendency: The mean. J. Pharmacol. Pharmacother. 2011, 2, 140–142. [Google Scholar]
  26. Livingston, E.H. The mean and standard deviation: What does it all mean? J. Surg. Res. 2004, 119, 117–123. [Google Scholar] [CrossRef]
  27. Abdi, H. Coefficient of variation. Encycl. Res. Des. 2010, 1, 169–171. [Google Scholar]
  28. Edelbaum, T.N. Theory of maxima and minima. In Mathematics in Science and Engineering; Elsevier: Amsterdam, The Netherlands, 1962; Volume 5, pp. 1–32. [Google Scholar]
  29. Altman, D.G.; Bland, J.M. Statistics notes: Quartiles, quintiles, centiles, and other quantiles. BMJ 1994, 309, 996. [Google Scholar] [CrossRef]
  30. Hu, K.; Ivanov, P.C.; Chen, Z.; Carpena, P.; Stanley, H.E. Effect of trends on detrended fluctuation analysis. Phys. Rev. E 2001, 64, 011114. [Google Scholar] [CrossRef]
  31. Hjorth, B. Eeg analysis based on time domain properties. Electroencephalogr. Clin. Neurophysiol. 1970, 29, 306–310. [Google Scholar] [CrossRef]
  32. Bandt, C.; Pompe, B. Permutation Entropy: A Natural Complexity Measure for Time Series. Phys. Rev. Lett. 2002, 88, 174102. [Google Scholar] [CrossRef] [PubMed]
  33. Pincus, S.M. Approximate entropy as a measure of system complexity. Proc. Natl. Acad. Sci. USA 1991, 88, 2297–2301. [Google Scholar] [CrossRef] [PubMed]
  34. Gibson, J. What is the interpretation of spectral entropy? In Proceedings of the 1994 IEEE International Symposium on Information Theory, Trondheim, Norway, 27 June–1 July 1994; p. 440. [Google Scholar] [CrossRef]
  35. Higuchi, T. Approach to an irregular time series on the basis of the fractal theory. Physica 1988, 31, 277–283. [Google Scholar] [CrossRef]
  36. Youngworth, R.N.; Gallagher, B.B.; Stamper, B.L. An overview of power spectral density (PSD) calculations. Opt. Manuf. Test. VI 2005, 5869, 206–216. [Google Scholar] [CrossRef]
  37. Massar, M.L.; Fickus, M.; Bryan, E.; Petkie, D.T.; Terzuoli, A.J. Fast computation of spectral centroids. Adv. Comput. Math. 2011, 35, 83–97. [Google Scholar] [CrossRef]
  38. Webber, C.; Zbilut, J. Recurrence Quantification Analysis of Nonlinear Dynamical Systems. In Nonlinear Dynamics and Time Series; Springer: Berlin/Heidelberg, Germany, 2005. [Google Scholar]
  39. Schwender, D.; Daunderer, M.; Mulzer, S.; Klasing, S.; Finsterer, U.; Peter, K. Spectral edge frequency of the electroencephalogram to monitor depth of anaesthesia. Br. J. Anaesth. 1996, 77, 179–184. [Google Scholar] [CrossRef] [PubMed]
  40. Mandelbrot, B.B.; Wallis, J.R. Robustness of the rescaled range R/S in the measurement of noncyclic long run statistical dependence. Water Resour. Res. 1969, 5, 967–988. [Google Scholar] [CrossRef]
  41. Liu, H.; Li, Z.; Zhang, J. Singular Value Decomposition Entropy and Its Application to Time-Series Analysis. Remote Sens. 2022, 14, 5983. [Google Scholar] [CrossRef]
  42. Petrosian, A. Kolmogorov complexity of finite sequences and recognition of different preictal EEG patterns. In Proceedings of the IEEE Symposium on Computer-Based Medical Systems, Lubbock, TX, USA, 9–10 June 1995; pp. 212–217. [Google Scholar] [CrossRef]
  43. Katz, M.J. Fractals and the analysis of waveforms. Comput. Biol. Med. 1988, 18, 145–156. [Google Scholar] [CrossRef]
  44. Torrente, F.A.; Cortés, J.M.; Nunez, P.; Gaitán, E. Relative Band Power Estimation of Brain Waves Using EEG Signals: An Overview and a New Method. Int. J. Environ. Res. Public Health 2023, 20, 1447. [Google Scholar] [CrossRef]
  45. Baccigalupi, A.; Liccardo, A. The Huang Hilbert Transform for evaluating the instantaneous frequency evolution of transient signals in non-linear systems. Measurement 2016, 86, 1–13. [Google Scholar] [CrossRef]
Figure 1. Bibliometric network using keywords such as imagined speech, EEG, and classification in Scopus.
Figure 2. System architecture.
Figure 3. (a) Emotiv EPOC+. (b) Electrode positions.
Figure 4. Chunk examples taken from the EEGIS dataset.
Figure 5. Example of filtered chunks.
Figure 6. Feature map tensor diagram.
Figure 7. Flatten feature maps into feature vectors diagram.
Figure 8. GA performance for the ten classifier models. (a) kNN. (b) Log Reg. (c) RF. (d) DT. (e) GBM. (f) AdaBoost. (g) NB. (h) LDA. (i) MLP. (j) SVM.
Figure 9. Electrode combination found by the GA. Active electrodes are colored green.
Figure 10. Confusion matrices of the classifier models. (a) kNN. (b) Log Reg. (c) RF. (d) DT. (e) GBM. (f) AdaBoost. (g) NB. (h) LDA. (i) MLP. (j) SVM.
Table 1. Cut-off frequencies for each band.
Band | Lower Cut-Off Frequency | Upper Cut-Off Frequency
Delta | 0.5 Hz | 4 Hz
Theta | 4 Hz | 8 Hz
Alpha | 8 Hz | 13 Hz
Beta | 13 Hz | 30 Hz
Gamma | 30 Hz | 40 Hz
Table 2. Feature activation found by the GA for the kNN classifier.
Feature | Raw Signal | Delta | Theta | Alpha | Beta | Gamma
MeanXXX
Standard deviationXXX
Coefficient variationXX
MedianXXX
ModeXXX
MaxXXXX
MinXX
First quartileXX
Third quartileXX
Interquartile rangeXXXX
Kurtosis
SkewnessXXXXX
Detrended fluctuation analysisXXXX
Activity Hjorth paramXXXX
Mobility Hjorth paramXXX
Complexity Hjorth paramXXXX
Permutation entropyXXXXXX
Approximate entropyXXX
Spectral entropyXXX
Higuchi fractal dimensionXX
Total power spectral densityXXX
Centroid power spectral densityXXXX
DeterminismX
Trapping timeXX
Diagonal line entropyXXXX
Average diagonal line lengthXX
Recurrence rateXX
Spectral edge frequency 25X
Spectral edge frequency 50XXXX
Spectral edge frequency 75XXXX
Hurst exponentX
Singular valued decomposition entropyXX
Petrosian fractal dimensionXX
Katz fractal dimensionXXX
Relative band powerX
Band amplitudeXXXXX
Total features activated | 15 | 23 | 14 | 16 | 20 | 15
Table 3. Model’s performance metrics.
Classifier | Electrodes (Max: 14) | Features (Max: 214) | Recall | F1-Score | Precision | Accuracy
kNN | 8 | 103 | 0.97 | 0.94 | 0.95 | 0.96
LogReg | 5 | 107 | 0.87 | 0.85 | 0.85 | 0.87
RF | 7 | 106 | 0.75 | 0.76 | 0.84 | 0.80
Decision Tree | 8 | 101 | 0.82 | 0.81 | 0.81 | 0.82
GBM | 6 | 99 | 0.81 | 0.82 | 0.87 | 0.84
AdaBoost | 5 | 107 | 0.80 | 0.77 | 0.78 | 0.80
Naïve Bayes | 7 | 101 | 0.87 | 0.81 | 0.82 | 0.84
LDA | 9 | 106 | 0.88 | 0.84 | 0.85 | 0.87
MLP | 9 | 112 | 0.78 | 0.78 | 0.79 | 0.80
SVM | 5 | 110 | 0.89 | 0.76 | 0.78 | 0.82
Table 4. Comparing this work with related works in the state of the art.
Article | Dataset (Words) | Processing | Classifier | Accuracy
Our work | /Sí/, /No/ | Feature Extraction, Genetic Algorithm | kNN | 96%
[8] | /a/, /e/, /i/, /o/, /u/ | ICA | CNN + TL | 35.7%
[9] | /a/, /e/, /i/, /o/, /u/ | Statistical characteristics | ELM, LDA, SVM | 87%
[10] | /a/, /e/, /i/, /o/, /u/ | Statistical characteristics | SVM, DT, LDA, QDA, PCA | 79.7%
[11] | /Yes/, /No/, /Haan/, /Na/ | Digital filters | ANN, SVM, RF, ADA | 73.4%
[12] | /Yes/, /No/ | ICA, digital filters, and DWT | LDA, SVM, kNN, NB, MLP | 63.16%
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
