5.1. First Stage
Table 5 shows the hyperparameters obtained for each of the recurrent networks optimized using GWO.
A notable robustness is observed in the LSTM model, with 102 neurons distributed between two layers and a total of 76,861 trainable parameters. The bidirectional network has only 27 neurons in total but 48,183 trainable parameters, while the GRU has 43 neurons and 33,425 trainable parameters. The lighter nature of the GRU may be the reason why it takes more epochs to reach convergence. Despite the difference in neuron density between the GRU and the bidirectional network, there is no considerable disparity in complexity. This observation shows that a larger number of neurons does not necessarily result in an intrinsically more complex network. Concerning learning rates, a high rate such as that adopted by the bidirectional model (0.0117) suggests faster adaptation of the weights, although with possible oscillations during the process. Meanwhile, more conservative rates, such as those adopted by the LSTM (0.00346) and the GRU (0.00554), suggest a more cautious approach toward convergence.
The batch size, another crucial hyperparameter, varies between architectures. The GRU uses a large batch of 329, probably to speed up training through simultaneous data processing. This benefit carries some risk, as larger batch sizes may compromise convergence accuracy. Despite this, all architectures, including the LSTM with a batch size of 188 and the bidirectional network with 199, achieved a flawless accuracy of 100% during testing.
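To make these configurations concrete, the following sketch (written in TensorFlow/Keras purely for illustration; the framework, input window shape, and number of classes are assumptions not stated here, and the 28/74 layer split is taken from the layer-by-layer values given later for the second-stage comparison) builds the first-stage LSTM with the reported hyperparameters. The trainable-parameter count returned by count_params() depends on the input dimension as well as on the number of units, which is why a larger neuron count does not necessarily mean a more complex network.

    import tensorflow as tf
    from tensorflow.keras import layers, models

    TIMESTEPS, CHANNELS, N_CLASSES = 250, 8, 5   # assumed EMG window shape and five movement classes

    def build_lstm(lr=0.00346):
        # Two stacked LSTM layers (28 + 74 units = 102 neurons), as reported for the first stage
        m = models.Sequential([
            layers.Input(shape=(TIMESTEPS, CHANNELS)),
            layers.LSTM(28, return_sequences=True),
            layers.LSTM(74),
            layers.Dense(N_CLASSES, activation="softmax"),
        ])
        m.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=lr),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
        return m

    model = build_lstm()
    print(model.count_params())  # grows with the input dimension, not only with the unit count
    # model.fit(X_train, y_train, batch_size=188, epochs=10, validation_data=(X_val, y_val))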
Figure 1 shows the final block diagram for each of the three trained models.
Table 6 shows the temporal analysis of the different recurrent neural network architectures studied. Clear differences are observed among the training, validation, and prediction times.
The LSTM network proved to be the most efficient in terms of training time, requiring only 31.47 s. This result is particularly interesting given its high neuronal density and relatively large number of trainable parameters. The moderate learning rate (0.00346) and batch size (188) could contribute to this rapid convergence and efficient training. Regarding validation time, the LSTM was also slightly faster than the GRU, needing only 0.81 s. LSTM was remarkably effective for prediction, with a time of only 0.12 ms.
On the other hand, the GRU, despite being less dense and having fewer parameters than the LSTM, required a longer training time of 51.28 s. Given its lighter architecture, this longer duration is related to the need for more epochs to converge. The validation time of the GRU was slightly longer than that of the LSTM, registering 0.85 s. Despite this marginal difference, it is relevant to mention that the GRU's prediction time, while still relatively fast, was slower than that of the LSTM, taking 0.134 ms.
Finally, the bidirectional architecture, which uses an underlying LSTM structure to process sequences in both directions, showed the longest training time of the three, at 81.60 s. This increase in time is associated with the bidirectional nature of the model, which processes forward and backward information, intrinsically increasing the computational load. Despite its compact configuration of neurons, its validation time was the longest, requiring 1.23 s. In terms of prediction, it also showed the longest time, at 0.2 ms.
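As a rough illustration of how such times can be measured (an assumed procedure, not necessarily the exact instrumentation used here), the snippet below wraps the Keras calls from the earlier sketch in a wall-clock timer; X_train, X_val, and X_test are assumed to hold the corresponding EMG windows, and the single-window latency is averaged over many calls.

    import time

    t0 = time.perf_counter()
    model.fit(X_train, y_train, batch_size=188, epochs=10, verbose=0)
    training_time_s = time.perf_counter() - t0          # cf. 31.47 s reported for the LSTM

    t0 = time.perf_counter()
    model.evaluate(X_val, y_val, verbose=0)
    validation_time_s = time.perf_counter() - t0        # cf. 0.81 s

    n_runs = 1000
    t0 = time.perf_counter()
    for _ in range(n_runs):
        model.predict(X_test[:1], verbose=0)            # one EMG window per call
    prediction_time_ms = (time.perf_counter() - t0) / n_runs * 1e3   # cf. 0.12 ms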
Figure 2 presents the error evolution of the different recurrent networks through the GWO optimization method. The GRU network, shown in Figure 2b, starts with the highest error, approximately 17.5%. However, its rapid convergence is notable, reaching an error of 0% in the third iteration. In contrast, the bidirectional network, shown in Figure 2c, starts with the lowest error, 1.6%, in its first iteration, thanks to an appropriate combination of hyperparameters found by the algorithm. Despite this, it requires six iterations to reduce the error to 0%, showing a more gradual reduction than the other architectures, a direct consequence of its low starting error. The LSTM, presented in Figure 2a, starts with an error of 6% and shows a rapid decrease until the third iteration, after which the reduction becomes more gradual, reaching 0% in the eighth iteration.
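For context, the search behind these error curves can be summarized by the standard GWO position update. The minimal sketch below (NumPy) encodes each wolf as a hyperparameter vector; the search bounds and the placeholder fitness function are illustrative assumptions, since in this study the fitness corresponds to the validation error of the RNN trained with the decoded hyperparameters.

    import numpy as np

    rng = np.random.default_rng(0)
    # Each wolf encodes [neurons layer 1, neurons layer 2, batch size, epochs, learning rate];
    # these bounds are illustrative assumptions.
    lower = np.array([1.0, 1.0, 32.0, 5.0, 1e-4])
    upper = np.array([128.0, 128.0, 400.0, 30.0, 2e-2])

    def fitness(wolf):
        # Placeholder: in the study, this would train the RNN with the decoded
        # hyperparameters and return its validation classification error.
        return float(np.sum(((wolf - lower) / (upper - lower) - 0.5) ** 2))

    n_wolves, n_iter = 10, 10
    wolves = rng.uniform(lower, upper, size=(n_wolves, lower.size))

    for t in range(n_iter):
        scores = np.array([fitness(w) for w in wolves])
        alpha, beta, delta = wolves[np.argsort(scores)[:3]]   # the three best wolves lead the pack
        a = 2.0 - 2.0 * t / n_iter                            # exploration factor decreases linearly
        for i, w in enumerate(wolves):
            new_pos = np.zeros_like(w)
            for leader in (alpha, beta, delta):
                r1, r2 = rng.random(w.size), rng.random(w.size)
                A, C = 2.0 * a * r1 - a, 2.0 * r2
                new_pos += leader - A * np.abs(C * leader - w)  # move toward each leader
            wolves[i] = np.clip(new_pos / 3.0, lower, upper)    # average of the three guided moves

    best = wolves[np.argmin([fitness(w) for w in wolves])]      # best hyperparameter vector found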
Figure 3 illustrates the training and validation accuracy behavior of the three recurrent neural network models (LSTM, GRU, and bidirectional), each optimized with the GWO algorithm. Consistently across all three models, an increase in classification accuracy across iterations is observed, indicative of an absence of overfitting. The LSTM model shows a rapid increase in accuracy that soon stabilizes, maintaining a slight advantage in training accuracy over validation, suggesting effective generalization without falling into memorization. Although the GRU model follows a similar upward trend in accuracy, it presents a distinctive peak in the validation curve that could be attributed to temporary overfitting or variations in the test data. However, this model also stabilizes its accuracy, demonstrating its ability to adapt and generalize as training progresses. The bidirectional network maintains the general behavior observed in the LSTM and GRU, with the training and validation accuracy curves advancing closely together throughout the process.
Figure 4 presents the evolution of the average error in the wolf population throughout the iterations, illustrating how the global solutions improve as the search advances. A distinctive feature of metaheuristic algorithms is their ability to offer multiple solutions at the end of the iterative process. Each solution, corresponding to an individual in the population, fits the desired objective but with different properties. At the end of iteration 10, several RNN configurations reported an error of less than 1%, each with a different set of hyperparameters. For this study, the networks with the fastest response times in the evaluation stage of each topology were chosen. However, it is possible to select networks according to other criteria, such as the minimum number of neurons or the shortest training time, depending on the specifics of the problem addressed.
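A minimal sketch of this selection rule, with hypothetical field names, is given below; swapping the key in the final step implements the alternative criteria (fewest neurons, shortest training time).

    def select_network(population):
        # population: list of dicts such as
        # {"hyperparams": {...}, "error_pct": 0.5, "pred_time_ms": 0.13,
        #  "n_neurons": 43, "train_time_s": 51.3}  (illustrative fields)
        feasible = [p for p in population if p["error_pct"] < 1.0]   # error below 1% after iteration 10
        return min(feasible, key=lambda p: p["pred_time_ms"])        # fastest response in evaluation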
Figure 4a, corresponding to the LSTM, reveals a start with the highest average error, approximately 65%, followed by a gradual decrease that converges to the lowest error in iteration 9. This behavior suggests a constant and balanced optimization of the prediction for the LSTM population. In contrast, Figure 4b, corresponding to the GRU, exhibits a more irregular evolution, with an initial error close to 60%, reaching the minimum average error at iteration 8. This slightly oscillating behavior suggests that the GWO algorithm faces more difficulty in finding solutions that significantly reduce the error for this topology. Finally, Figure 4c shows that the bidirectional networks start with a lower average error, around 41%, and converge faster, achieving the minimum error in iteration 5. Their smooth and rapid trajectory suggests that GWO identifies favorable solutions more easily for this topology.
This study used an SVM with a Gaussian kernel as a reference model. Since the SVM cannot process the raw signals directly, a proper characterization of these signals was required. The features proposed in ref. [10] were used for this purpose and are listed in Table 7.
It is relevant to highlight that the dataset and features used in this study are the same as those used in [10], where these features were carefully selected for this database. Using these features, the SVM model achieved an accuracy of 93%.
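As a rough sketch of this baseline (scikit-learn assumed), each EMG window is first reduced to a feature vector and then classified by an RBF-kernel SVM; the extract_features() function below uses generic time-domain descriptors purely as placeholders for the feature set of ref. [10] listed in Table 7.

    import numpy as np
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    def extract_features(window):
        # Placeholder time-domain descriptors; the actual features follow ref. [10].
        return np.array([np.mean(np.abs(window)),            # mean absolute value
                         np.sqrt(np.mean(window ** 2)),       # root mean square
                         np.sum(np.abs(np.diff(window)))])    # waveform length

    # X_raw: EMG windows, y: movement labels (assumed to exist)
    # X_feat = np.array([extract_features(w) for w in X_raw])
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))   # Gaussian (RBF) kernel
    # clf.fit(X_feat_train, y_train); clf.score(X_feat_test, y_test)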
Table 8 presents the fundamental comparisons between the RNN-based models and the SVM.
Table 9 provides a detailed analysis of the performance of the SVM classifier in the testing stage for the five movements. Regarding sensitivity, class 1 shows the best performance with 85.2%, closely followed by class 2 with 81.9%. Class 3 also performs well, with 80.2%. However, the sensitivity decreases noticeably for classes 4 and 5, with 63.9% and 72.1%, respectively, indicating that the SVM classifier has difficulty correctly identifying these classes compared to the first three. Regarding specificity, which evaluates the classifier's ability to correctly identify negatives, a generally high performance is observed in all classes. Class 1 achieves a specificity of 95.6%, and classes 2 and 3 also exhibit high specificity, 93.2% and 95.9%, respectively. Although classes 4 and 5 present lower specificity, 83.2% and 82.9%, these values are still relatively high. It is important to contrast these results with the performance achieved by the LSTM, GRU, and bidirectional models which, by achieving 100% accuracy, also achieve 100% sensitivity and specificity. The lower performance of the SVM, particularly its sensitivity for classes 4 and 5, could indicate limitations in its ability to handle certain characteristics of these data or a need for more specific tuning of the model.
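For reference, both metrics are computed per class in the usual one-vs-rest manner, where TP, FN, TN, and FP denote the true positives, false negatives, true negatives, and false positives of that class:

Sensitivity = TP / (TP + FN),  Specificity = TN / (TN + FP).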
Table 8 shows an interesting comparison of the training and response times of the models. Noteworthy is the fact that the SVM has the shortest training time. However, this efficiency is offset by a longer response time in the classification phase. This behavior is attributed to the need to extract features from the data before feeding them into the classifier, an additional step that imposes a delay and affects its response time. In contrast, the RNN networks work directly with the raw data, eliminating the feature extraction step and offering a faster response.
Another relevant aspect is classification efficiency. Even though all models are trained using the same database and identical preprocessing, the SVM has a lower classification rate. This discrepancy is due to the added complexity of selecting appropriate features. While the effectiveness of the RNNs depends on the quality and complexity of the input data, the SVM depends not only on the selected features but also on the interaction and synergy between them. This analysis highlights the fundamental differences between feature-based approaches and those based on raw data, underscoring the strengths and limitations inherent to each methodology in the context of EMG signal classification.
5.2. Second Stage
Table 10 shows the hyperparameters obtained for each of the recurrent networks optimized using GWO in the second stage.
For the LSTM model, the number of neurons in the first layer increased from 28 to 31, suggesting a need for greater capacity to adapt to the variability in the data in the second stage, where a larger number of individuals was included in the testing set. However, there was a significant reduction in the number of neurons in the second layer, from 74 to 13, which could indicate an attempt to simplify the model to prevent overfitting. The batch size increased from 188 to 206, while the training epochs increased from 10 to 14, indicating that the model required more iterations over the data to reach convergence. Additionally, there was a slight increase in the learning rate.
In this second stage, a significant change is observed in the configuration of the GRU model. The number of neurons in the first layer was slightly reduced to 22, while the second layer was increased to 101 neurons. This redistribution of model capacity suggests a change in modeling strategy, possibly due to differences in variability. With a larger number of individuals in the testing set in the second stage, the model may have needed stronger internal layers to generalize better over the unseen data, thus avoiding overfitting to the peculiarities of the training set. The batch size increased slightly to 339, and the training epochs decreased to 19. These changes in the training hyperparameters suggest a search for balance between the stability and the speed of convergence of the model. A larger batch size may contribute to a more stable gradient estimation during training, while the reduction in the number of epochs suggests that the model was able to achieve a good fit to the data more efficiently. Finally, the learning rate increased from 0.00554 to 0.00731, indicating a more aggressive adjustment of the model weights during training. This increase can be interpreted as an attempt to speed up the training process.
For the bidirectional model, the number of neurons in both layers increased slightly in the second stage, reaching 15 in each. This change suggests an adjustment of the model in response to the increased variability in the data introduced by the change in the distribution of the training, validation, and testing sets. It is important to note that a relatively simple structure is maintained despite this increase in model capacity. The batch size was kept constant at 199, indicating that the amount of data processed in each training iteration, carried over from the first stage, remained adequate. However, the training epochs decreased slightly to 15, suggesting that the model was able to fit the data more efficiently in the second stage despite the potential additional complexity. One of the most notable changes was the learning rate, which increased from 0.0117 to 0.0177. This faster weight adjustment can be seen as an effort to speed up the training process and achieve faster convergence.
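Reusing the first-stage sketch, the second-stage LSTM could be rebuilt with the updated values; the layer sizes, batch size, and epochs below follow the reported hyperparameters, while the learning rate is a placeholder, since only a slight increase over 0.00346 is stated.

    def build_lstm_stage2(lr=0.004):                       # placeholder learning rate (assumed)
        m = models.Sequential([
            layers.Input(shape=(TIMESTEPS, CHANNELS)),
            layers.LSTM(31, return_sequences=True),        # first layer: 28 -> 31
            layers.LSTM(13),                               # second layer: 74 -> 13
            layers.Dense(N_CLASSES, activation="softmax"),
        ])
        m.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=lr),
                  loss="sparse_categorical_crossentropy", metrics=["accuracy"])
        return m

    # build_lstm_stage2().fit(X_train, y_train, batch_size=206, epochs=14,
    #                         validation_data=(X_val, y_val))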
Figure 5 shows the final block diagram for each of the three trained models.
Table 11 shows the temporal analysis of the different architectures of recurrent neural networks studied in stage two.
In evaluating the training, validation, and prediction times of the different recurrent neural network architectures, distinctive patterns and significant changes are observed between the two stages of the study. The LSTM model proved the most time-efficient, with 31.47 s for training, 0.81 s for validation, and 0.12 ms for prediction in the first stage. In the second stage, these times increased to 52.76 s, 1 s, and 0.21 ms, respectively. This increase can be attributed to the larger number of training epochs, which implies a higher computational cost.
On the other hand, despite being generally slower than the LSTM, the GRU model maintained reasonable times and experienced a less pronounced increase between the two stages. In the first stage, the GRU recorded 51.28 s, 0.85 s, and 0.134 ms for training, validation, and prediction, respectively, and in the second stage, these times increased to 57.90 s, 1.2 s, and 0.24 ms. This behavior may be related to the adjustments to the number of neurons and the learning rate observed in the hyperparameters.
The bidirectional neural network, for its part, showed the highest times in both stages, underlining its computationally more intensive nature due to information processing in two directions. In the first stage, the times were 81.60 s for training, 1.23 s for validation, and 0.2 ms for predictions, while in the second stage, these increased dramatically to 115.16 s, 2.6 s, and 0.34 ms, respectively. This increase in times can be justified by the increase in the complexity of the model, reflected in the number of neurons and the learning rate.
Figure 6 presents the evolution of the best solution per iteration of the GWO algorithm applied to the data from the second stage. In this case, distinct behaviors can be observed for each of the neural network architectures evaluated. The LSTM network, Figure 6a, starts with an error close to 14%, higher than that recorded in the first stage. However, this network shows a remarkable ability to quickly adjust its parameters, resulting in an accelerated decrease in error. This can be attributed to the reduction in the number of individuals used in the training and validation phases, which decreases the variability in these datasets and facilitates the learning process. The GRU network, shown in Figure 6b, exhibits a similar initial behavior in both stages, with a comparable starting error. However, during the second stage, the error decreases more gradually, reaching a minimum in the fourth iteration in both phases of the experiment. Finally, in Figure 6c, the bidirectional network presents a less abrupt error decay during the second stage, reaching the minimum error in iteration 9, in contrast to the first stage, where the minimum error is achieved in iteration 6.
Figure 7 illustrates an encouraging behavior of the models during the training and validation phases, highlighting the absence of overfitting, since a concurrent increase in accuracy is observed in both phases. However, it is particularly interesting to note the behavior of the bidirectional network between iterations 8 and 11, where a brief decrease in accuracy occurs, as shown in Figure 7c. This small valley in accuracy could be attributed to a slightly high learning rate, which may have caused oscillations in the model's convergence. The crucial point, however, is the ability of the bidirectional network to recover, eventually achieving a classification accuracy close to 99%. This demonstrates the notable resilience and robustness of the model and its ability to overcome temporary setbacks during training.
Table 12 summarizes the accuracy achieved by each model in the testing phase of each experimental stage. During the first stage, the LSTM, GRU, and bidirectional models achieved an impressive 100% accuracy, highlighting their ability to capture and learn the complexity of arm movement patterns from EMG signals. However, in the transition to the second stage, a slight decrease in the accuracy of all models was observed. The LSTM model, which initially achieved perfect classification, experienced a slight drop, recording 98.46% accuracy. This decrease is attributed to the variability introduced by the new distribution of individuals in the training and testing phases. The GRU model, despite having maintained an accuracy of 100% in the first stage, showed a more pronounced decrease in the second, reaching 96.38% accuracy. This reduction is due to its simpler structure compared to the LSTM, which makes it more susceptible to variations in the data. Despite its ability to process information in both directions and capture more complex contexts, the bidirectional network was not immune to the variability between stages and experienced a decrease in accuracy, registering 97.63% in the second stage. Although this decrease is notable, the bidirectional network maintained relatively high performance, demonstrating its robustness and ability to adapt.
The results presented in Table 13 reveal the performance of the three implemented models in terms of sensitivity and specificity across the five classes. In general, all models exhibit high sensitivity and specificity in all classes, with most values exceeding 95%. This demonstrates a strong ability of the models to correctly identify instances of each class (sensitivity) and to properly exclude instances that do not belong to that class (specificity). Furthermore, there is notable consistency in performance across the different classes for each model, suggesting good generalization under various classification conditions.
Analyzing each model individually, the LSTM achieves the highest sensitivity and specificity rates in almost all classes, with values greater than 97%. The GRU model, although achieving a sensitivity of 100% in class 1 and a specificity of 100% in class 3, shows slightly lower performance in the other classes compared to the LSTM, most notably in classes 4 and 5, where the sensitivity drops below 95%. The bidirectional model shows behavior similar to the LSTM and GRU. Regarding the per-class analysis, classes 1 to 4 are those that the three models identify most accurately, whereas class 5 is the most challenging in terms of sensitivity, especially for the GRU model. This could suggest greater complexity or similarity to other classes that makes its precise identification difficult.