1. Introduction
The increase in computing power has brought many computing devices into the daily life of human beings. A broad spectrum of applications and interfaces has been developed so that humans can interact with them. The interaction with these systems is easier when it is performed in a natural way (i.e., just as humans interact with each other using voice or gestures). Hand Gesture Recognition (HGR) is a significant element of Human–Computer Interaction (HCI), which studies computer technology designed to interpret commands given by humans.
HGR models are human–computer systems that determine what gesture was performed and when a person performed it. Currently, these systems are used in several applications, such as intelligent prostheses [1,2,3], sign language recognition [4,5], rehabilitation devices [6,7], and device control [8].
HGR models acquire data using, for example, gloves [9], vision sensors [10], inertial measurement units (IMUs) [11], surface electromyography sensors, and combinations of sensors, such as surface electromyography sensors and IMUs [12]. Although there are different options for data acquisition, all of them have limitations: gloves and vision sensors cannot be used by amputees; gloves can constrain normal movement, especially when manipulating objects; vision sensors suffer from occlusion, changes of illumination, and changes in the distance between the hands and the sensors; and IMUs and surface electromyography sensors generate noisy data [13,14]. Even though all these devices collect data related to the execution of a hand movement, surface electromyography sensors also capture the intention of the movement. This means that these sensors can also be used by amputees, who cannot execute the movements but have the intention to do so [15,16].
Surface electromyography, which we will refer to from now on as EMG, is a technique that records the electrical activity of skeletal muscles with surface sensors. This electrical activity arises from two states of a skeletal muscle. The first state is when a skeletal muscle is at rest, where each muscular cell (i.e., muscle fiber) has an electric potential of approximately −80 mV [15]. The second state is when a skeletal muscle contracts, producing the electric potential that occurs in a motor unit (MU), which is composed of muscle fibers and a motor neuron. These electric potential differences are produced when a motor neuron activates a neuromuscular junction by sending two intracellular action potentials in opposite directions. These potentials then propagate by depolarizing and re-polarizing each of the muscle fibers [16]. The sum of the intracellular action potentials of all muscle fibers of a motor unit is called a motor unit action potential (MUAP). Therefore, when a skeletal muscle is contracted, the EMG is a linear summation of several trains of MUAPs [15].
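As a concrete illustration, the MUAP-train summation can be sketched numerically. The following toy simulation (pure Python; the MUAP shape, firing intervals, and number of MUs are hypothetical, not taken from the reviewed studies) builds an EMG-like signal by convolving each MU's impulse train with a MUAP template and summing over the motor units:

```python
import random

random.seed(0)

def convolve(signal, kernel):
    """Discrete linear convolution of two sequences."""
    n = len(signal) + len(kernel) - 1
    out = [0.0] * n
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out

# Toy biphasic MUAP template and impulse trains for N = 3 motor units.
muap = [0.0, 0.6, 1.0, 0.2, -0.8, -0.4, 0.0]
n_samples = 200
trains = []
for _ in range(3):
    train = [0.0] * n_samples
    t = random.randint(0, 20)
    while t < n_samples:            # each MU fires at random intervals
        train[t] = 1.0
        t += random.randint(15, 40)
    trains.append(train)

# EMG = linear summation of the MUAP trains of all active MUs.
emg = [0.0] * (n_samples + len(muap) - 1)
for train in trains:
    for i, v in enumerate(convolve(train, muap)):
        emg[i] += v
```

Where two MUAPs overlap in time, their amplitudes add linearly, which is exactly the summation the model describes.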
There are two types of muscle contractions: static and dynamic. In a static contraction, the lengths of the muscle fibers do not change and the joints are not in motion, but the muscle fibers still contract, for example, when someone holds his/her hand still to make the peace sign. In a dynamic contraction, in contrast, the lengths of the muscle fibers change and the joints are in motion, for example, when someone waves his/her hand to make the hello gesture [17].
EMG signals can be modeled as a stochastic process that depends on the two types of contraction described above. First, the mathematical model for a static contraction (MMSC) is a stationary process because the mean and covariance remain approximately the same over time, and the EMG depends solely on muscle force [18]. Consider Equation (1):

x(t) = Σ_{i=1}^{N} p_i(t) * h_i(t),    (1)

where N is the number of active MUs, p_i(t) is the train of impulses that indicates the active moments of each MU, h_i(t) are the MUAPs of each MU, and * denotes convolution. However, the MMSC can be viewed as a non-stationary process when factors such as muscular fatigue and temperature affect the EMG [19].
Second, the mathematical model for a dynamic contraction (MMDC) is a non-stationary process, and its mathematical model is similar to amplitude modulation (AM):

x(t) = s(t)·w(t) + n(t),    (2)

where s(t) is a function that indicates the intensity of the EMG signal (i.e., the information signal), w(t) is a unit-variance Gaussian process representing the stochastic aspect of the EMG (i.e., the carrier signal), and n(t) is the noise from the sensors and biological signal artifacts [17,20].
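The AM-like model can likewise be sketched with a toy simulation (pure Python; the envelope shape, sampling rate, and noise level are hypothetical):

```python
import math
import random

random.seed(1)
fs = 1000                      # sampling rate in Hz (hypothetical)
n = fs                         # one second of signal

# s(t): slowly varying intensity envelope (information signal).
s = [0.5 + 0.5 * math.sin(2 * math.pi * i / fs) for i in range(n)]
# w(t): unit-variance Gaussian process (carrier signal).
w = [random.gauss(0.0, 1.0) for _ in range(n)]
# n(t): low-amplitude sensor/artifact noise.
noise = [random.gauss(0.0, 0.05) for _ in range(n)]

# AM-like EMG model of a dynamic contraction: x(t) = s(t) * w(t) + n(t).
x = [s[i] * w[i] + noise[i] for i in range(n)]
```

The envelope s(t) is what a smoothing/feature-extraction stage ultimately tries to recover from the noisy carrier.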
The mathematical models of EMG are not used in HGR due to the difficulty of parameter estimation in non-stationary processes. However, machine learning (ML) methods are widely used because ML can infer a solution for non-stationary processes [21] using several techniques, for example, covariate shift techniques [21,22], class-balance change [22], and segmentation into short stationary intervals [23].
HGR using ML is just one approach to myoelectric control [24], which uses EMG signals to extract control signals to command external devices [25,26], for example, prostheses [1], drones [8], input devices for a computer [27], etc. Other approaches include conventional amplitude-based control and the direct extraction of neural code from EMG signals. In conventional amplitude-based control, one EMG channel controls one function of a device (e.g., hand open is assigned to one channel, and hand closed to a second channel); when the amplitude of this EMG exceeds a predefined threshold, the function is activated [28,29,30,31]. In the direct extraction of neural code from EMGs, the motor neuron spike trains are decoded from EMG signals and translated into commands [32,33,34,35].
For many applications, HGR models are required to work in real time. A human–computer system works in real time when a user performs an action on the system and the system responds fast enough that the response is perceived as instantaneous [25]. Moreover, the response time of a real-time human–computer system is relative to its application and user perception [36]. For this reason, the controller delay, which is the response time of an HGR model, has been widely researched. For instance, a user does not perceive any delay when the controller delay is less than 100 ms in the control of devices, such as a key or a switch [36,37]. In HGR using EMGs, Hudgins et al. [38] stated that the acceptable computational complexity is limited by the controller delay of the system, which must be kept below 300 ms to reduce the user-perceived lag. This optimal controller delay was generally agreed upon by many researchers [39,40]. However, several other optimal controller delays have been reported in the scientific literature, namely 500 ms [41], and 100–125 ms [42] using a box and blocks test, which is a target achievement test.
Most of the real-time HGR models are evaluated using metrics for machine learning, such as accuracy, recall, precision, F-score, error, etc. However, this evaluation fails to reflect the performance exhibited in online scenarios, as it does not account for the adaptation of users to non-stationary signal features [43,44,45,46,47]. For example, Hargrove et al. [48] demonstrated that the inclusion of transient contractions (i.e., non-stationary signals) in the training data decreases the accuracy but improves the user performance in a real-time virtual clothespin task. Therefore, in order to evaluate real-life performance, real-time HGR models can be evaluated using target achievement tests, such as the box and blocks test [42,49], the target achievement control test [50], and Fitts' law test [51], which is an international standard in HCI (ISO 9241-9).
Currently, there are many primary studies regarding real-time HGR models using EMG and ML, which, in several cases, do not use standardized concepts, such as types of models, real-time processing, types of hand gestures, and evaluation metrics. This standardized knowledge is essential for reproducibility and requires a Systematic Literature Review (SLR) of the current primary studies. To the best of our knowledge, there is no SLR regarding these HGR models. Therefore, we developed this SLR to present the state of the art of real-time HGR models using EMG and ML. Based on this SLR, we make three contributions to the field of HCI. First, we define a standard structure for real-time HGR models. Second, we standardize concepts, such as the types of models, data acquisition, segmentation, preprocessing, feature extraction, classification, postprocessing, real-time processing, types of gestures recognized, and evaluation metrics. Finally, we discuss future work based on the research gaps we identified.
Following this introduction, the article is organized as follows: in Section 2, we describe the methodology used to execute this SLR; in Section 3, we outline the results and the discussion of the data extracted from the primary studies; and Section 4 and Section 5 contain the conclusions and future work, respectively.
3. Results and Discussion
The data extracted from the 65 SPS (see Table 3) are presented and analyzed in five subsections: the study overview subsection and four further subsections, one per RQ (see Section 2.1). Although some SPS presented more than one HGR model, we selected the model with the best performance in the evaluation; therefore, we used 65 HGR models for this review.
3.1. Study Overview
The study overview shows a general vision of the settings used in the SPS. Among other data, we decided to extract the publication year and the type of publication.
Figure 2a shows the number of SPS per year, which has increased steadily since 2013. Moreover, Figure 2b shows that most of the SPS were presented at conferences (see also Table 3).
3.2. Results of the RQ1 (What Is the Structure of Real-Time HGR Models Using EMG and ML?)
We found that the structures of the 65 real-time HGR models are not uniform across the studies. However, they have some stages in common, such as Data Acquisition (DA), Segmentation (SEGM), Preprocessing (PREP), Feature Extraction (FE), Classification (CL), and Postprocessing (POSTP). We assembled these frequent stages into a standard structure, illustrated in Figure 3. Note that some SPS did not use all stages of the standard structure because Segmentation, Preprocessing, Feature Extraction, and Postprocessing are optional stages (i.e., a model is still feasible without them). Table 6 shows the stages of the standard structure used by each SPS.
Aside from the structure of the models, we identified two types of models: individual models and general models. Individual models are trained on the gestures (data) of one person and recognize the gestures of that same person. General models are trained with the data of several people and recognize the gestures of any person. We found 44 SPS that developed individual models (SPS 1, SPS 2, SPS 3, SPS 5, SPS 6, SPS 8, SPS 9, SPS 10, SPS 13, SPS 15, SPS 16, SPS 24, SPS 25, SPS 27, SPS 28, SPS 30, SPS 33, SPS 34, SPS 36, SPS 37, SPS 38, SPS 39, SPS 41, SPS 42, SPS 43, SPS 44, SPS 45, SPS 47, SPS 48, SPS 49, SPS 51, SPS 52, SPS 53, SPS 55, SPS 56, SPS 57, SPS 58, SPS 59, SPS 60, SPS 61, SPS 62, SPS 63, SPS 64, and SPS 65), and 11 SPS that developed general models (SPS 7, SPS 11, SPS 17, SPS 22, SPS 23, SPS 26, SPS 31, SPS 32, SPS 35, SPS 40, and SPS 46). The 10 remaining studies do not indicate the type of HGR model. Out of the 11 general models, SPS 35 is the only one that was evaluated using EMG data from people who did not participate in the training phase. The other 10 general models only used EMG data from people who participated in the training; therefore, it is not possible to conclude that these 10 models are able to recognize the gestures of any person.
3.2.1. Data Acquisition
In the Data Acquisition stage, EMGs are acquired from EMG sensors, which can be part of homemade or commercial devices.
Table 7 shows the number of sensors, the sampling rates, and the acquisition devices used in the HGR models. We found that 27 HGR models used eight sensors: 21 of them (SPS 2, SPS 3, SPS 4, SPS 7, SPS 8, SPS 9, SPS 13, SPS 17, SPS 18, SPS 19, SPS 20, SPS 34, SPS 35, SPS 36, SPS 40, SPS 44, SPS 46, SPS 47, SPS 52, SPS 56, and SPS 61) used the commercial Myo armband, which has eight sensors with a sampling rate of 200 Hz, and the other six (SPS 5, SPS 25, SPS 27, SPS 59, SPS 62, and SPS 63) used homemade devices with a design similar to the Myo armband; their sampling rates are 1000 Hz, 960 Hz, 1000 Hz, 1000 Hz, 1200 Hz, and 1000 Hz, respectively.
Additionally, the EMG sampling rate of 16 HGR models (SPS 1, SPS 5, SPS 10, SPS 11, SPS 26, SPS 27, SPS 30, SPS 31, SPS 32, SPS 37, SPS 38, SPS 39, SPS 43, SPS 48, SPS 49, and SPS 55) is 1000 Hz because these SPS indicate that, according to the Nyquist sampling theorem, the sampling rate must be at least twice the highest frequency of the EMG, and approximately 95% of the signal power in the EMG is below 400–500 Hz [114,115,116].
Table 7 also shows the use of commercial devices, including the Myo armband from Thalmic Labs Inc., the MA300 from Motion Lab Systems Inc., the Bio Radio 150 from Cleveland Medical Devices Inc., the ME6000 from Mega Electronics Ltd., the Analog Front End (ADS1298) from Texas Instruments, the Telemyo 2400T G2 from Noraxon, and the EMG-USB2 from OT Bioelettronica. Furthermore, two models (SPS 43 and SPS 45) use high-density EMG sensors.
3.2.2. Segmentation
EMGs are partitioned into multiple segments or windows using different techniques, such as gesture detection and sliding windowing (see Table 7). Gesture detection computes the beginning and the end of a hand gesture and returns only the EMG that corresponds to muscle contraction; therefore, the segment lengths are variable, as they depend on the duration of the hand gestures. The sliding windowing techniques partition the EMG into fixed adjacent segments (i.e., adjacent sliding windowing) or fixed overlapping segments (i.e., overlapping sliding windowing) (see Figure 4). Increasing the window length, up to a certain point, increases the controller delay but also increases the accuracy of the models, as more data are collected for recognition [25,40].
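A minimal sketch of the two sliding windowing variants (pure Python; the window length and stride values are hypothetical):

```python
def sliding_windows(signal, window_len, stride):
    """Partition a signal into fixed-length segments.

    stride == window_len -> adjacent sliding windowing;
    stride <  window_len -> overlapping sliding windowing.
    """
    return [signal[i:i + window_len]
            for i in range(0, len(signal) - window_len + 1, stride)]

emg = list(range(1000))                       # stand-in for 1 s of EMG at 1 kHz
adjacent = sliding_windows(emg, window_len=200, stride=200)
overlapping = sliding_windows(emg, window_len=200, stride=50)
```

With overlap, the model produces a decision every stride interval (here every 50 samples) instead of every full window, trading computation for responsiveness.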
3.2.3. Preprocessing
HGR models use preprocessing techniques that transform the EMG into an input signal for Feature Extraction, or directly for the ML algorithm if the structure of the HGR model does not include Feature Extraction (see Table 6). For example, a common preprocessing technique is a notch filter at 50 or 60 Hz that eliminates the AC frequency of the powerlines (SPS 10). Other examples include Offset Compensation, Pre-smoothing, Filtering, Rectification, Amplification, and the Teager–Kaiser Energy Operator (see Table 7). Offset Compensation eliminates noise by compensating for the average value of the EMG:

y_i = x_i − x̄,

where x_i are the raw EMG values, x̄ is the average value of the signal, and y_i are the EMG values after offset compensation. Pre-smoothing computes the mean of the last m values of the EMG and sets this mean as the current value of the signal:

y_i = (1/m) Σ_{j=i−m+1}^{i} x_j,

where x_j are the raw EMG values and y_i is the current value, based on the mean of the m previous values of the raw EMG. Filtering removes unwanted frequencies or an unwanted frequency band from the raw EMG. Rectification transforms negative values into positive values (e.g., with the absolute value function). The Teager–Kaiser Energy Operator increases the signal-to-noise ratio to improve the detection of the muscle activity onset of a gesture [117]. The most used preprocessing technique is filtering (see Table 7).
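The offset compensation, pre-smoothing, and rectification steps described above can be sketched as follows (pure Python; the sample values are hypothetical):

```python
def offset_compensation(x):
    """Subtract the signal mean (DC offset) from every sample."""
    mean = sum(x) / len(x)
    return [v - mean for v in x]

def pre_smooth(x, m):
    """Replace each sample with the mean of the last m samples (causal)."""
    out = []
    for i in range(len(x)):
        lo = max(0, i - m + 1)
        window = x[lo:i + 1]
        out.append(sum(window) / len(window))
    return out

def rectify(x):
    """Full-wave rectification: absolute value of every sample."""
    return [abs(v) for v in x]

emg = [1.0, -2.0, 3.0, -4.0, 2.0]
centered = offset_compensation(emg)     # mean of the result is ~0
smoothed = pre_smooth(emg, m=3)
rectified = rectify(emg)
```

Note that pre-smoothing is a causal moving average, so it can run sample-by-sample in a real-time pipeline without look-ahead.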
3.2.4. Feature Extraction
Feature extraction techniques map the EMG into a feature set. These techniques extract features in different domains, such as time, frequency, time–frequency, space, and fractal. Table 8 shows the domains of the feature extraction techniques used by the models. Most of the real-time HGR models use time-domain features because the controller delay for their computation is smaller than for features in other domains (see Table 9). The mean absolute value is the most used feature in the 65 studies analyzed.
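A sketch of three common time-domain features, including the mean absolute value (the zero-crossing and waveform-length definitions follow the usual Hudgins-style formulations; the threshold and sample values are hypothetical):

```python
def mean_absolute_value(window):
    """MAV: average of the rectified samples in a window."""
    return sum(abs(v) for v in window) / len(window)

def waveform_length(window):
    """WL: cumulative length of the waveform over the window."""
    return sum(abs(window[i] - window[i - 1]) for i in range(1, len(window)))

def zero_crossings(window, threshold=0.0):
    """ZC: number of sign changes whose amplitude step exceeds a threshold."""
    return sum(1 for i in range(1, len(window))
               if window[i - 1] * window[i] < 0
               and abs(window[i] - window[i - 1]) > threshold)

window = [0.5, -0.2, 0.3, -0.4, 0.1]
features = [mean_absolute_value(window),
            waveform_length(window),
            zero_crossings(window)]
```

In a typical model, one such feature vector is computed per sliding window and per channel, then concatenated and fed to the classifier.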
3.2.5. Classification
In this stage, classifiers generate class labels (i.e., the gestures recognized) from a feature set of the EMG. The classifiers used are support vector machines (SPS 7, SPS 10, SPS 14, SPS 15, SPS 18, SPS 23, SPS 25, SPS 27, SPS 28, SPS 30, SPS 38, SPS 39, SPS 49, SPS 52, SPS 53, SPS 55, and SPS 59), feedforward neural networks (SPS 2, SPS 16, SPS 17, SPS 22, SPS 24, SPS 26, SPS 29, SPS 32, SPS 35, SPS 36, SPS 44, SPS 42, SPS 46, SPS 47, SPS 56, SPS 60, and SPS 61), linear discriminant analysis (SPS 5, SPS 11, SPS 13, SPS 31, SPS 37, SPS 45, SPS 48, SPS 57, SPS 63, SPS 64, and SPS 65), convolutional neural networks (CNN) (SPS 4, SPS 20, SPS 43, and SPS 62), CNN with transfer learning (SPS 34), radial basis function networks (SPS 40), temporal convolutional networks (SPS 41), k-nearest neighbors and dynamic time warping (SPS 8 and SPS 9), collaborative-representation-based classification (SPS 19), k-nearest neighbors (SPS 1), k-nearest neighbors and decision trees (SPS 12), binary tree-support vector machine (SPS 21), vector auto-regressive hierarchical hidden Markov models (SPS 6), Gaussian mixture models and hidden Markov models (SPS 3), quadratic discriminant analysis (SPS 33), fuzzy logic (SPS 50), recurrent neural networks (SPS 51), generalized regression neural networks (SPS 54), and a one-vs-one classifier (SPS 58). The most commonly used ML algorithms are support vector machines, feedforward neural networks, and linear discriminant analysis.
3.2.6. Postprocessing
To improve the accuracy of the HGR models, the postprocessing techniques adapt the output of the ML algorithm to the final application. Only 15 out of 65 SPS used postprocessing techniques, such as majority voting (SPS 2, SPS 11, SPS 21, SPS 37 and SPS 43), elimination of consecutive repetitions (SPS 8, SPS 9, SPS 36, and SPS 51), threshold (SPS 35, and SPS 44), and velocity ramps (SPS 60, SPS 63, SPS 64, and SPS 65).
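Two of these postprocessing techniques, majority voting and the elimination of consecutive repetitions, can be sketched as follows (pure Python; the label stream and window size are hypothetical):

```python
from collections import Counter

def majority_vote(labels, k):
    """Smooth a label stream: each output is the most frequent label
    among the last k predictions (a common postprocessing step)."""
    out = []
    for i in range(len(labels)):
        window = labels[max(0, i - k + 1):i + 1]
        out.append(Counter(window).most_common(1)[0][0])
    return out

def drop_consecutive_repetitions(labels):
    """Collapse runs of identical predictions into a single label."""
    out = []
    for lab in labels:
        if not out or out[-1] != lab:
            out.append(lab)
    return out

preds = ["fist", "fist", "open", "fist", "fist", "open", "open"]
smoothed = majority_vote(preds, k=3)
events = drop_consecutive_repetitions(smoothed)
```

Majority voting suppresses isolated misclassifications at the cost of a small extra output latency, while dropping consecutive repetitions converts the per-window label stream into discrete gesture events for the final application.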
Many works analyze some of the stages shown in Section 3.2 to determine the best structure to improve the accuracy of HGR models, for example, the data acquisition [39,48,118,119], optimal window length [120], filtering [121,122], feature extraction [123], and classification [124,125] stages. However, the results are inconclusive because the structure of the HGR models depends on the environment in which the models are developed (i.e., the data sets used, the people who participated in the evaluation, the application of the models, etc.).
3.3. Results of the RQ2 (What Is the Controller Delay and Hardware Used by Real-Time HGR Models Using EMG and ML?)
3.3.1. Controller Delay of the HGR Models
The controller delay is the sum of two values: the data collection time (DCT) (i.e., the window length) and the data analysis time (DAT) [39,42]. In real-time processing, the DCT and DAT should be as short as possible, but the DCT should also allow the HGR model to collect enough EMG data to recognize a hand gesture. For instance, in prosthesis control, the optimal DCT using four EMG sensors with a sampling rate of 1 kHz should be between 150–250 ms [120].
An HGR model using EMG is considered to work in real time when the response time (i.e., controller delay) is less than the optimal controller delay. Several optimal controller delays have been reported in the scientific literature, namely 300 ms [39], 500 ms [41], and 100 ms for fast prosthetic prehensors and 125 ms for slower prosthetic prehensors [42].
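The decomposition of the controller delay can be written out directly (a sketch; the window length, sampling rate, and analysis time are hypothetical, checked here against the 300 ms bound discussed in Section 1):

```python
def controller_delay_ms(window_len_samples, sampling_rate_hz, analysis_time_ms):
    """Controller delay = data collection time (window length in ms)
    plus data analysis time."""
    dct_ms = 1000.0 * window_len_samples / sampling_rate_hz
    return dct_ms + analysis_time_ms

# Hypothetical model: 200-sample window at 1 kHz, 40 ms of processing.
delay = controller_delay_ms(window_len_samples=200,
                            sampling_rate_hz=1000,
                            analysis_time_ms=40)
is_real_time = delay < 300          # against the 300 ms bound
```

With overlapping windows, only the first decision pays the full DCT; subsequent decisions arrive every stride interval plus the DAT.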
In accordance with the Inclusion and Exclusion Criteria (see Section 2.3.1), all 65 HGR models claim to be real-time models. However, some SPS did not report the controller delay (i.e., DCT and DAT) of their HGR models. Table 10 shows the DCT and DAT of the SPS.
3.3.2. Hardware Used
The controller delay of the HGR models not only depends on their structure but also on the hardware used to process them. For example, an HGR model may not work in real time if the user perceives delays in the HGR response because the device has limited processing capabilities; the same HGR model may be considered to work in real time on another device with better processing capabilities. For this reason, when a model is described, it is fundamental to indicate the hardware characteristics of the devices used to run the HGR model.
Table 10 shows the two types of hardware used, which are personal computers and embedded systems. Ten HGR models were processed in personal computers, such as laptops, desktops, etc., five HGR models were processed in embedded systems, and the remaining models did not indicate the hardware used.
3.4. Results of the RQ3 (What Is the Number and Type of Gestures Recognized by Real-Time HGR Models Using EMG and ML?)
3.4.1. Number of Gestures Recognized
The number of gestures recognized is the number of classes of an HGR model. Different HGR models may have the same number of gestures but recognize different gestures. For example, two HGR models each recognize four gestures, but the classes of the first model are thumb up, okay, wrist valgus, and wrist varus (SPS 14), whereas the classes of the second model are hand extension, hand grasp, wrist extension, and thumb flexion (SPS 22). Hence, to compare these models, it is important to also consider the difference in the gestures.
3.4.2. Type of Gestures Recognized
According to the type of movement, hand gestures are classified as static and dynamic. A static gesture is made when the skeletal muscles are in constant contraction (i.e., there is no movement during the gesture), whereas in a dynamic gesture the skeletal muscles are in contraction, but the contraction is not constant, which means that there is movement during the gesture.
The EMG data generated by a gesture have two states: transient and steady. The EMG data in the transient state are generated during the transition from one gesture to another, and the EMG data in the steady state are generated while a gesture is maintained [38]. Moreover, the offline classification of hand gestures using EMG data in the steady state is more accurate than in the transient state, as the variance of the EMG data over time varies more in the transient state (i.e., a non-stationary process) than in the steady state [40]. However, including EMG data in the transient state in the training phase improves subject performance in a real-time virtual clothespin task [46,48]. Figure 5 presents the EMG data of a person who made a long-term gesture (i.e., a gesture that lasts a long time) after a relaxed position or rest gesture. In this figure, the EMG data in the transient state are generated during the transition from the rest gesture to the peace sign, and the EMG data in the steady state are generated while the peace sign is maintained. Short-term gestures (i.e., gestures that last only a short time) generate more EMG data in the transient state than in the steady state, as most of the time is spent in transitions from one gesture to another (see Figure 6).
The durations of the gestures used by the models are shown in Table 11. This table shows seven aspects of the gestures recognized by the HGR models reviewed in this SLR: the number of classes, the number of gestures per person in the training set (NGpPT), the number of people who participated in the training (NPT), the number of gestures per person in the evaluation set (NGpPE), the type of gestures recognized, the state of the EMG data used, and the duration of the gestures (DG). NGpPT, NPT, and DG show the EMG data used to train the individual and general models. We found that 63 out of 65 HGR models recognized static gestures, and only one HGR model recognized both dynamic and static gestures (SPS 25); moreover, no HGR model recognized only dynamic gestures. Additionally, six SPS used EMG data in the steady state, two SPS used EMG data in the transient state, three SPS used EMG data in both states, and the remaining HGR models did not indicate the state of the EMG data. There were 31 out of the 65 HGR models that considered the rest gesture (i.e., the hand does not make any movement) as a class.
Finally, 5 out of the 65 HGR models (SPS 59, SPS 60, SPS 62, SPS 63, and SPS 64) recognized static gestures simultaneously to control multiple degrees of freedom of a prosthesis, which replicates simultaneous movements, such as wrist rotation and grasp to turn a doorknob. The remaining HGR models recognized gestures sequentially.
3.5. Results of the RQ4 (What Are the Metrics Used to Evaluate Real-Time HGR Models Using EMG and ML?)
According to the type of evaluation (see Section 1), we divide the SPS into two groups: HGR models evaluated using metrics for machine learning (56 models), and HGR models evaluated using target achievement tests (nine models).
3.5.1. HGR Models Evaluated Using Metrics for Machine Learning (from SPS 1 to SPS 56)
These 56 HGR models used 13 evaluation metrics (see Table 12): accuracy (Equation (9)), recall (Equation (10)), precision (Equation (11)), accuracy per user (Equation (12)), recall per user (Equation (13)), precision per user (Equation (14)), median of the accuracy per user (Equation (15)), standard deviation of the accuracy per user (Equation (16)), standard deviation of the accuracy per class (Equation (17)), standard deviation of each user's accuracy (Equation (18)), standard deviation of the recalls of each class (Equation (19)), classification error (Equation (20)), and Kappa index (Equation (21)). Accuracy is the most used metric; Table 12 shows the evaluation metrics used by these 56 models. The formulas of these evaluation metrics are given in Equations (9)–(21),
where n_{jk}^{i} is the number of gestures made by user i that were recognized by the model as class j but actually were class k, U is the set of test users, P is the set of predicted classes, A is the set of actual classes, u is the total number of test users, and g is the number of classes.
We identified five machine-learning metrics that evaluate the entire HGR model. The first one is accuracy, which is the fraction of gestures recognized correctly among all the test data. Second, the recall is the fraction of gestures recognized correctly for a class among the test data of this class. Third, the precision is the fraction of gestures recognized correctly of a class among the gestures recognized by the HGR model as this class. Fourth, the standard deviation of the accuracy per user is the amount of dispersion of the recognition accuracies per user. Finally, the standard deviation of the accuracy per class is the amount of dispersion of the recalls of a particular model.
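A sketch of the three basic metrics computed from a confusion matrix (rows: predicted class, columns: actual class; the matrix values are hypothetical, and the per-user variants are omitted):

```python
def accuracy(conf):
    """Fraction of gestures recognized correctly among all the test data.
    conf[j][k]: count recognized as class j that actually were class k."""
    total = sum(sum(row) for row in conf)
    correct = sum(conf[c][c] for c in range(len(conf)))
    return correct / total

def recall(conf, k):
    """Fraction of class-k gestures that were recognized as k."""
    actual_k = sum(conf[j][k] for j in range(len(conf)))
    return conf[k][k] / actual_k

def precision(conf, j):
    """Fraction of gestures recognized as j that actually were j."""
    predicted_j = sum(conf[j])
    return conf[j][j] / predicted_j

# Toy 2-class confusion matrix.
conf = [[8, 2],
        [1, 9]]
```

The per-user variants listed above apply the same formulas to the confusion counts of a single user before aggregating across users.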
These metrics can produce biased results for two reasons: an incorrect definition of a true positive, and an unbalanced test. In order to determine the recognition accuracy, a gesture is considered as a true positive (i.e., the gesture is recognized correctly) when the HGR model determines what gesture was performed and when this gesture was performed by a person. However, only SPS 51 is evaluated in this way. Eleven HGR models (SPS 2, SPS 5, SPS 6, SPS 7, SPS 8, SPS 9, SPS 19, SPS 20, SPS 34, SPS 35, and SPS 36) determine the classification accuracy because they only took into consideration what gesture was performed by a person as a true positive, and the remaining models do not show what they consider a true positive.
In addition, a test set is balanced when it has the same number of samples per class and the same number of samples per user (see Table 13). For example, if an HGR model is evaluated using a set that has more data for user A, the accuracy of this model and the accuracy of user A tend to be the same.
There are five SPS (SPS 2, SPS 5, SPS 8, SPS 9, and SPS 18) in which the evaluation was performed with data acquired without feedback (i.e., the correctness of classification was not provided in the evaluation), thus people cannot adjust their movements to the HGR model. Eight SPS were performed with data acquired with feedback from the HGR model (SPS 1, SPS 4, SPS 11, SPS 12, SPS 13, SPS 17, SPS 20, and SPS 29), and the remaining SPS do not indicate information about feedback.
Table 13 shows the recognition accuracies, the number of people who participated in the evaluation, the type of data set (i.e., balanced or unbalanced), and the use of cross-validation by the 56 HGR models. The largest number of people is 80 (SPS 23). Three HGR models were evaluated using EMG data from amputees (SPS 6, SPS 21, and SPS 48). Moreover, 19 HGR models use cross-validation, a technique used to minimize the probability of biased results in small data sets (see Table 13).
3.5.2. HGR Models Evaluated Using Target Achievement Tests (from SPS 57 to SPS 65)
These nine HGR models used three target achievement tests, including the motion test (SPS 60), the target achievement control test (TAC) (SPS 60, SPS 63, and SPS 65), and Fitts' law test (FLT) (SPS 59, SPS 61, SPS 62, SPS 64, and SPS 65). These three tests used ten metrics: throughput (SPS 57, SPS 58, SPS 59, SPS 61, SPS 62, SPS 64, and SPS 65), path efficiency (SPS 57, SPS 58, SPS 59, SPS 60, SPS 61, SPS 62, SPS 64, and SPS 65), overshoot (SPS 57, SPS 58, SPS 59, SPS 61, SPS 62, SPS 64, and SPS 65), average speed (SPS 57), completion rate (SPS 57, SPS 58, SPS 60, SPS 61, SPS 63, SPS 64, and SPS 65), stopping distance (SPS 58), completion time (SPS 60, SPS 63, and SPS 65), real-time accuracy (SPS 60), length error (SPS 63), and reaction time (SPS 64) (see Table 14).
The motion test was proposed to evaluate the myoelectric capacity of patients with targeted muscle reinnervation [128]. In this test, patients must maintain a gesture until the HGR model has made a predetermined number of correct predictions. In TAC, the patients control a virtual prosthesis to reach a target and hold it for a dwell time, which is generally 1 s [50]. The patients have a trial time to reach the target, which is generally 15 s. FLT is a test similar to TAC, but the users control a circular cursor with two or three degrees of freedom. FLT states that there is a trade-off between speed and accuracy [51,108], which is defined by:

MT = a + b·ID,    (22)

where MT is the movement time, a and b are empirical constants, and ID is the index of difficulty of a target (see Equation (23)), which is calculated using the distance (D) from an initial point to the target and the width (W) of the target:

ID = log2(D/W + 1).    (23)

Throughput is a metric proposed by Fitts, defined as the ratio between the ID and the MT (see Equation (24)), which summarizes the performance of a control system:

TP = ID/MT.    (24)

The results of FLT are reliable when the test combines a variety of IDs [129].
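The FLT quantities can be computed directly (a sketch using the Shannon formulation of the index of difficulty, as in ISO 9241-9; the distance, width, and movement time values are hypothetical):

```python
import math

def index_of_difficulty(distance, width):
    """Shannon formulation of Fitts' index of difficulty, in bits."""
    return math.log2(distance / width + 1)

def throughput(distance, width, movement_time_s):
    """Throughput (bits/s): ratio between the ID and the movement time."""
    return index_of_difficulty(distance, width) / movement_time_s

id_bits = index_of_difficulty(distance=300, width=100)          # = 2 bits
tp = throughput(distance=300, width=100, movement_time_s=0.5)   # = 4 bits/s
```

Because throughput divides ID by the observed movement time, it rewards both fast and accurate control, which is why FLT results should pool trials over a range of IDs.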
The people who participated in these tests received feedback (i.e., the correctness of classification was provided in the evaluation). Four out of these nine HGR models were evaluated with four amputees (SPS 63), two amputees (SPS 59, and SPS 64), and one amputee (SPS 65).
In order to achieve concluding results, it is necessary to consider the sample size, which is the number of people who participated in the evaluation
(see
Table 11) times the number of gestures per person
(see
Table 13), to allow us to obtain statistically significant results. Using the typical values of a statistical hypothesis test (confidence level of 95%, margin of error of 5%, and population portion of 50%), we estimated
according to the Normal Distribution using the Central Limit Theorem (
25), and
according to the Hoeffding’s inequality (
26), which is widely used in machine learning theory.
where z is the critical value of the normal distribution for a confidence level of 95%, e is the margin of error, p is the population proportion, and c is the confidence level. Therefore, the sample size of the test set, n_p × n_g gestures, must be in the order of hundreds of thousands. None of the works presented so far considered these values to achieve a significant result. In the scientific literature, many EMG data sets are available [
130], but, to the best of our knowledge, the data set with the largest number of gestures per person is 30 [
131], and the one with the largest number of people is 40 [
84,
132].
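The two estimates above can be reproduced numerically. This is a small sketch, assuming z = 1.96 as the two-sided critical value for a 95% confidence level; it is not the authors' code, only an illustration of Equations (25) and (26).

```python
import math

def n_people(z: float = 1.96, margin: float = 0.05, p: float = 0.5) -> int:
    """Sample size from the normal approximation: n = z^2 * p * (1 - p) / e^2."""
    return math.ceil(z**2 * p * (1 - p) / margin**2)

def n_gestures(confidence: float = 0.95, precision: float = 0.05) -> int:
    """Sample size from Hoeffding's inequality: n >= ln(2 / (1 - c)) / (2 * e^2)."""
    return math.ceil(math.log(2 / (1 - confidence)) / (2 * precision**2))

people = n_people()               # 385 participants
gestures = n_gestures()           # 738 gestures per participant
total = people * gestures         # ~284,000 gestures in the test set
```

The product of the two estimates is what places the required test set in the order of hundreds of thousands of gestures.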
4. Conclusions
This SLR analyzes works that propose HGR models using surface EMG and ML. Following the Kitchenham methodology, we introduced four RQs based on the main goal of this SLR, which was to analyze the state-of-the-art of these models. To answer these four RQs, we presented, analyzed, and discussed the data extracted from 65 selected primary studies. Below are our findings in regard to the four RQs.
Structure: The structure of the models studied varies from one work to another. However, we were able to examine the structure of these models using a standard structure composed of six stages: data acquisition, segmentation, preprocessing, feature extraction, classification, and postprocessing. Under this standard structure, we studied the types of HGR models, the number of EMG sensors, the sampling rate, the sensors, the segmentation and preprocessing techniques, the extracted features, the domain of the extracted features, and the ML algorithm. The most used structure is: eight EMG sensors, a sampling rate between 200 Hz and 1000 Hz, overlapping sliding windowing (segmentation), filtering (preprocessing), mean absolute value (feature extraction), and support vector machines and feedforward neural networks (classification).
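Two of the stages named above, overlapping sliding windowing and mean absolute value (MAV) extraction, can be sketched in a few lines. This is an illustrative sketch only: the window and stride lengths are assumptions (250 ms windows with 50 ms steps at 200 Hz), and the random signal stands in for real EMG data.

```python
import numpy as np

def sliding_windows(emg: np.ndarray, window: int, stride: int) -> np.ndarray:
    """Split a 1-D EMG signal into overlapping windows (stride < window)."""
    starts = range(0, len(emg) - window + 1, stride)
    return np.stack([emg[s:s + window] for s in starts])

def mean_absolute_value(windows: np.ndarray) -> np.ndarray:
    """MAV feature: mean of |x| over each window."""
    return np.abs(windows).mean(axis=1)

# 2 s of signal at 200 Hz; 50-sample windows every 10 samples
emg = np.random.randn(400)
windows = sliding_windows(emg, window=50, stride=10)
features = mean_absolute_value(windows)   # one MAV value per window
```

In the reviewed pipeline, each feature vector produced this way would then be passed to the classifier (e.g., an SVM or a feedforward neural network).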
Controller delay and hardware: The controller delay of gesture recognition models is the sum of two values: data collection time (DCT) and data analysis time (DAT). A recognition model works in real time when this sum is less than an optimal controller delay. However, the works analyzed report several optimal controller delays for different applications, suggesting that the optimal controller delay is relative to the user's perception and the application of a recognition model.
Number and types of gestures recognized: The 65 works analyzed propose models that recognize different numbers and types of gestures: 31 works took the rest gesture into consideration as a class to be recognized; only one model recognized both static and dynamic gestures; and the remaining models recognized static gestures only. No model recognized dynamic gestures only, as most of the EMG data generated by dynamic gestures are in the transient state. Recognizing gestures using EMG data in the transient state is more complex than in the steady state because the former behaves as a non-stationary process. The classification of hand gestures using EMG data in the steady state is more accurate than in the transient state, and only nine works recognized short-term gestures (i.e., using EMG data in the transient state).
Metrics and results: We divided the SPS according to the types of evaluation, which are machine-learning metrics and target achievement tests. Fifty-six SPS evaluated their models using machine-learning metrics. We found 13 machine-learning metrics and three target achievement tests. The training and testing protocols vary among the works, making the comparison of their performance very difficult. Moreover, taking into consideration that many works do not describe these protocols and the whole structure of the model, one key point is the significance and reproducibility of the results. Using the normal distribution for the number of people, and Hoeffding's inequality for the number of gestures per person, we estimated that the sample size of the test set must be in the order of hundreds of thousands to obtain a result with a confidence level of 95% and a precision of 5%. None of the works analyzed utilize a test set of this magnitude, and therefore the confidence and reproducibility of their results are questionable. Based on the definition of a true positive, only one of the HGR models that used machine-learning metrics was evaluated using the recognition accuracy; the remaining models were evaluated using the classification accuracy, as they only took into consideration what gesture was performed by a person as a true positive.
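The distinction between classification accuracy (what gesture) and recognition accuracy (what gesture and when) can be illustrated with a small sketch. The timing-tolerance criterion below is one possible formalization chosen for illustration, not the exact definition used by the reviewed works.

```python
def classification_accuracy(y_true: list, y_pred: list) -> float:
    """A true positive only requires the predicted label to match (what)."""
    hits = sum(t == p for t, p in zip(y_true, y_pred))
    return hits / len(y_true)

def recognition_accuracy(y_true: list, y_pred: list,
                         t_true: list, t_pred: list,
                         tolerance: float = 0.25) -> float:
    """A true positive requires the label to match AND the predicted
    gesture time to fall within a tolerance of the true time (what and when)."""
    hits = sum(
        label_t == label_p and abs(time_t - time_p) <= tolerance
        for label_t, label_p, time_t, time_p
        in zip(y_true, y_pred, t_true, t_pred)
    )
    return hits / len(y_true)
```

Under this formalization, a model that always predicts the correct label but with poorly localized timing scores 100% on classification accuracy and far less on recognition accuracy, which is why the two metrics are not comparable.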