1. Introduction
In 2017, the global count of unilateral upper limb amputees exceeded 11.3 million, with an additional 11.0 million individuals experiencing bilateral upper limb amputations [1]. Within Canada, around 6800 individuals live with an amputation proximal to the wrist [2]. A recent study [2] compared the utility outcomes and costs associated with two interventions for treating hand amputations: hand vascularized composite allotransplantation and myoelectric hand prostheses. The conclusion was that treating unilateral amputations with myoelectric prostheses was more cost-effective.
Myoelectric prosthetic hands have emerged as a pivotal avenue for restoring both gesture and prehensile capabilities in upper-limb amputees, offering a non-invasive alternative to permanent surgical interventions [2]. Prevalent control systems in prostheses often utilize a trigger-based mechanism relying on one or two surface electromyography (EMG) channels [3,4]. This setup maps single muscle contraction events to predefined movement sequences, necessitating explicit user commands for mode switching [3,4]. This sequential switching introduces latency in execution times, requiring multiple distinct commands to transition between different grip modes [3,4]. The non-intuitive nature of this grip-switching process, coupled with awkward control dynamics and a lack of sufficient feedback, has been identified as a primary contributor to the low acceptance rates observed for myoelectric prosthesis devices [5,6]. Addressing these challenges is essential for enhancing the usability and acceptance of myoelectric prosthetic devices within the user community [5,6].
Several endeavors to classify sEMG (surface electromyography) signals from human forearm muscles have been documented in previous works. To mitigate intuitiveness concerns, prototype solutions within the existing literature focus on deciphering user gesture intentions by targeting distinct flexor and extensor muscles in the forearm [7]. The voluntary contractions of the remaining forearm muscles after amputation can be identified through machine learning classifiers, such as artificial neural network (ANN), linear discriminant analysis (LDA), and support vector machine (SVM) classifiers. SVM is often chosen due to its mathematical interpretability and global optimization. It performs well even with a small training set [8]. SVM’s elasticity parameter, also known as the box constraint hyperparameter C, controls the maximum penalty imposed on margin-violating observations and aids in preventing overfitting [8]. Palkowski and Redlarski [9] employed two EMG sensors on the forearm, sampling data at 16 Hz, to discern six whole hand and wrist gestures using an SVM classifier. Lee et al. [10] successfully classified ten hand gestures using features obtained by three EMG sensors and achieved an accuracy exceeding 90% for each participant. However, their machine learning model underwent training and testing on participant datasets without cross-participant data amalgamation to assess the model’s generalization capability. This restrictive testing approach introduces bias into the classification accuracy, raising questions about the applicability of this technology in prosthetic hands. Other previous works [8,11,12] achieved high accuracy of over 90% for classifying multiple whole hand and wrist gestures, like wrist opening/closing, ulnar and radial deviation, and flexion-extension. However, whole hand and wrist gestures are less challenging to classify and offer a limited functional application for upper limb amputees seeking the restoration of manual dexterity.
Targeting specific muscles effectively depends on where the electrodes are placed and how they are configured, a critical consideration for feature-based approaches that remains relatively unexplored in the literature.
This project aimed to improve hand gesture classification accuracy, identify strategic sensor placements, and select practical gestures for potential integration with myoelectric prosthetic hands. To this end, the study investigated the development of a multiple kernel learning (MKL)-based SVM classifier for classifying five intricate hand gestures crucial for amputees: power grasp (clenching of the wrist), hook grip (four-digit grasp), fine pinch (using index finger and thumb), coarse pinch (using all five digits), and point gesture (flexing of digits 3, 4, and 5). The classifier was built using data collected across five distinct sensor configurations, each utilizing one or two EMG sensors on the forearm as part of a minimalistic data collection approach, with the goal of identifying the sensor placement that yields the highest classification accuracy.
2. Materials and Methods
2.1. Experimental Procedure
sEMG data were collected from the right hand of eight participants without neurological/musculoskeletal impairment or diagnosis (age: 21 ± 2 years (mean ± SD), body height: 169.5 ± 2.8 cm (mean ± SD), body weight: 57.9 ± 9.5 kg (mean ± SD), 7 males and 1 female). All participants were acquainted with the experimental procedures. Informed consent was obtained from all subjects involved in the study. The study was conducted in accordance with the Declaration of Helsinki and approved by the Institutional Ethics Committee of the University of Alberta (AB T6G 2N2, approved on 25 April 2022).
The sEMG signals were acquired using bipolar MyoWare 2.0 Muscle sensors [13] (SparkFun Electronics, Niwot, CO, USA) chosen for their potential for integration into low-cost hand prostheses. The previous version of this sensor has been frequently used in the literature because of its low cost, easy-to-customize features, and favorable performance reports in validation studies, showing it to be comparable to more expensive commercial EMG systems [14,15]. MyoWare 2.0 has three electrodes: mid-muscle, end-muscle, and reference [13]. Before being acquired by the microcontroller, the differential signal passed through an instrumentation amplifier with unity gain and a high common-mode rejection ratio (CMRR, the ratio of the differential gain to the common-mode gain of the amplifier stage) of 140 dB to suppress common-mode noise sources, such as the 50 Hz line noise. Subsequently, the signal was filtered by a first-order band-pass filter with cut-off frequencies at 20 Hz and 498 Hz [13]. Following this, the signal underwent rectification and smoothing, achieved by a low-pass envelope detection circuit (3.6 Hz) embedded in the sensor hardware, resulting in a linear envelope of the EMG signal. The enveloped EMG output was then acquired by the microcontroller (Arduino UNO) with a sampling frequency of 780 Hz (SD = 5 Hz) to address concerns regarding undersampling in previous studies [16].
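As a worked check on the amplifier figure quoted above, the decibel form of the CMRR definition can be inverted. This assumes the usual voltage-gain convention (a factor of 20 in the logarithm), which the text does not state explicitly; the symbols A_d and A_cm for the differential and common-mode gains are introduced here for illustration.

```latex
\begin{align}
\mathrm{CMRR}_{\mathrm{dB}} &= 20 \log_{10}\!\left(\frac{A_d}{A_{cm}}\right) \\
\mathrm{CMRR}_{\mathrm{dB}} = 140~\mathrm{dB} &\;\Rightarrow\; \frac{A_d}{A_{cm}} = 10^{140/20} = 10^{7}
\end{align}
```

With unity differential gain (A_d = 1), this corresponds to a common-mode gain of about 10^-7, so 1 V of common-mode interference would appear as roughly 0.1 µV at the amplifier output.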
The EMG sensor placement configurations were based on the gestures to be classified. The sensors were placed only along those forearm muscles primarily responsible for flexion of the index finger, ring finger, and thumb. The five gestures used were (1) a coarse pinch using all five digits, (2) a fine pinch using index finger and thumb, (3) a hook grip (four-digit grasp), (4) a point gesture (flexing of digits 3, 4, and 5), and (5) a power grasp (clenching of the wrist), labelled as 0 to 4, respectively, for the classifier. These five gestures (Figure 1) are considered to have high utility for amputees in their daily lives and are also seen in commercial prosthetics [3,4].
The EMG sensors were placed at three locations on the ventral side of the forearm (Figure 2) for five different configurations to acquire sEMG signals from muscles primarily responsible for the flexion of the index finger, ring finger, and thumb. The sensor configurations were based on [17,18,19] and are described as follows:
C1: One sensor placed proximal to the wrist along the flexor pollicis longus to acquire the thumb flexion data.
C2: One sensor placed proximal to the wrist along flexor digitorum superficialis to acquire the index finger flexion data.
C3: One sensor placed along flexor digitorum superficialis and flexor digitorum profundus to acquire ring finger flexion data.
C4: Two sensors placed proximal to the wrist along flexor digitorum superficialis to acquire index finger flexion data and proximal to the elbow along flexor digitorum superficialis and flexor digitorum profundus to acquire ring finger flexion data.
C5: Two sensors placed proximal to the wrist along the flexor digitorum superficialis to acquire index finger flexion data and along the flexor pollicis longus to acquire the thumb flexion data.
The participants were asked to perform each gesture repeatedly 20 times, with a time interval of two seconds between each repetition. After performing one round of data collection for a single gesture, the participant took a 4 min rest to relax their muscles to prevent fatigue before beginning the next round for a new gesture. The signals measured from the two EMG sensors were recorded simultaneously.
Figure 2.
The five different sensor placement configurations used for acquiring EMG data: (a) thumb flexion data, (b) index finger flexion data, (c) ring finger flexion data, (d) index and ring flexion data, and (e) index and thumb flexion data.
2.2. Data Processing
Figure 3 shows a flowchart of the data processing steps involved in the entire experiment and modeling pipeline. First, the EMG data were recorded by sensors and filtered to eliminate noise. Second, the signals were segmented into shorter sequences, and time and frequency domain features were extracted to capture distinct characteristics of the EMG data. Third, the processed data were used for hand gesture classification using machine learning, where an 80–20 train-test split was performed to train and evaluate the models. Fourth, the hyperparameters of the machine learning models for the given dataset were tuned using a 10-fold grid search cross-validation method to enhance performance. Fifth, the classifiers were trained using the full training dataset and these optimized hyperparameters. Sixth, the model’s performance was assessed using unseen testing data and the evaluation metric—the accuracy score—to reflect the model’s effectiveness in classifying the EMG data.
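The split and cross-validation bookkeeping in this pipeline can be sketched in a few lines. The snippet below is a minimal illustration, not the authors' code: the function names, the dataset size of 200 segments, the random seed, and the unstratified shuffling are all assumptions made for the example.

```python
import numpy as np

def train_test_split_indices(n_samples, test_fraction=0.2, seed=0):
    """Shuffle sample indices and split them into train and test sets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_test = int(round(test_fraction * n_samples))
    return order[n_test:], order[:n_test]

def k_fold_indices(train_idx, k=10):
    """Yield (fit, validate) index pairs for k-fold cross-validation."""
    folds = np.array_split(train_idx, k)
    for i in range(k):
        validate = folds[i]
        fit = np.concatenate([folds[j] for j in range(k) if j != i])
        yield fit, validate

# Hypothetical dataset size, used only to show the bookkeeping.
train_idx, test_idx = train_test_split_indices(n_samples=200)
cv_pairs = list(k_fold_indices(train_idx, k=10))
```

Each hyperparameter candidate would be scored on the ten `validate` folds, and the held-out `test_idx` samples would be touched only once, after the final model is trained on all of `train_idx`.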
Noise in EMG signals degrades classification accuracy, so an efficient filtering technique is an important part of refining EMG signal classification. To refine the EMG signals obtained for specific gestures from each participant, a digital filtering process was applied. This process aimed to eliminate erratic peaks and local extrema clutter from the linear-envelope EMG signals, as seen in the close-up in Figure 4. Among various filtering methods, the Gaussian smoothing filter (GSF), recommended in [20], emerged as a promising approach, leading to improved EMG signal modeling and accuracy (Figure 4). To facilitate feature extraction-based analysis, the long EMG data sequence per participant for a single gesture was segmented into smaller windows, each containing four repetitions of the gesture. The individual sequences were about 8000 samples long and were stored separately for subsequent time and frequency domain feature extraction.
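The smoothing and segmentation steps can be illustrated with a short NumPy sketch. It is a hedged example rather than the filter used in the study: the kernel width `sigma`, the reflect padding, the non-overlapping windows, and the synthetic test signal are assumptions; only the window length of about 8000 samples comes from the text.

```python
import numpy as np

def gaussian_smooth(signal, sigma=5.0):
    """Smooth a 1-D signal by convolving it with a normalized Gaussian kernel."""
    radius = int(np.ceil(4 * sigma))
    t = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (t / sigma) ** 2)
    kernel /= kernel.sum()
    # Reflect-pad so the output keeps the input length and the edges are not pulled to zero.
    padded = np.pad(signal, radius, mode="reflect")
    return np.convolve(padded, kernel, mode="valid")

def segment(signal, window=8000):
    """Cut a long recording into consecutive, non-overlapping windows."""
    n_windows = len(signal) // window
    return signal[: n_windows * window].reshape(n_windows, window)

# Synthetic stand-in for a noisy envelope recording (not study data).
rng = np.random.default_rng(0)
raw = np.abs(np.sin(np.linspace(0, 40 * np.pi, 40000))) + 0.2 * rng.standard_normal(40000)
smooth = gaussian_smooth(raw, sigma=5.0)
windows = segment(smooth, window=8000)  # shape (5, 8000)
```

A wider `sigma` removes more local extrema but also flattens the onset of each contraction, so in practice it would be chosen by inspecting plots such as those in Figure 4.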
2.3. Feature Extraction and Selection
Feature extraction is crucial for reducing the dimensionality of the data representing a gesture, thereby enabling the classification of gestures. A total of 22 time and frequency domain features were extracted from the 8000-sample-long segmented EMG signals for every gesture. The time domain features included statistical measures such as root mean square (RMS), variance (VAR), mean absolute value (MAV), slope sign change (SSC), zero crossings (ZC), waveform length (WL), median absolute deviation (MAD), skewness, kurtosis, and energy. These features are well-established in the literature for their utility in gesture recognition [21,22]. The time-frequency domain features included autoregressive model coefficients (5th order), entropy (based on approximate coefficients of 1D discrete wavelet transform of level 4), variance estimates (based on maximum overlap discrete wavelet transform of level 3), spectral entropy, mean frequency, and band power. The wavelet transforms were based on the Daubechies-2 wavelet. The above features have been used multiple times in the literature [10,23,24]. To facilitate effective classification, the extracted features were standardized: the mean of each feature was subtracted from every sample, and the result was divided by that feature's standard deviation, giving each feature zero mean and unit variance.
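A few of the time domain features and the standardization step can be sketched as follows. This is an illustrative subset under stated assumptions, not the study's 22-feature implementation: the function names are hypothetical, zero crossings are counted about the segment mean (an assumption made here because a rectified envelope never crosses zero), and no amplitude threshold is applied to ZC or SSC.

```python
import numpy as np

def time_domain_features(x):
    """Return [RMS, MAV, VAR, WL, ZC, SSC] for one EMG segment."""
    x = np.asarray(x, dtype=float)
    dx = np.diff(x)
    centered = x - x.mean()  # assumption: count crossings about the mean
    return np.array([
        np.sqrt(np.mean(x ** 2)),                   # root mean square
        np.mean(np.abs(x)),                         # mean absolute value
        np.var(x),                                  # variance
        np.sum(np.abs(dx)),                         # waveform length
        np.sum(centered[:-1] * centered[1:] < 0),   # zero crossings
        np.sum(dx[:-1] * dx[1:] < 0),               # slope sign changes
    ])

def standardize(features):
    """Z-score each feature column: zero mean, unit standard deviation."""
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    std[std == 0] = 1.0  # constant columns stay at zero instead of dividing by zero
    return (features - mean) / std

# Hypothetical segments (random numbers, not study data): 10 segments of 200 samples.
rng = np.random.default_rng(0)
segments = rng.standard_normal((10, 200))
F = np.vstack([time_domain_features(s) for s in segments])
Z = standardize(F)
```

When a train-test split is used, the mean and standard deviation would normally be computed on the training rows only and then reused for the test rows.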
Principal component analysis (PCA) was utilized in the feature selection phase to further reduce the dimensionality of the feature space, which was derived from the segmented EMG data obtained by the two sensors. With an initial feature set comprising 44 distinct features (22 × 2 sensors), PCA transformed these features into a new set of principal components. The primary objective was to retain the most informative components while reducing dimensionality. The leading principal components that together captured over 95% of the variability across the dataset were selected, which compressed the feature space while preserving its essential information. Only these selected components were fed into the classifier, with the aim of enhancing classification performance by focusing on the most informative aspects of the EMG data.
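The variance-threshold selection described above can be sketched with a singular value decomposition. This is a minimal example assuming standardized input features; the function name `pca_reduce`, the synthetic 100 × 44 matrix, and its three-dimensional latent structure are hypothetical and chosen only so that the reduction is visible.

```python
import numpy as np

def pca_reduce(X, variance_to_keep=0.95):
    """Project rows of X onto the fewest principal components whose
    cumulative explained variance reaches variance_to_keep."""
    Xc = X - X.mean(axis=0)
    # SVD of the centered data: rows of Vt are the principal directions.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    n_components = int(np.searchsorted(np.cumsum(explained), variance_to_keep) + 1)
    n_components = min(n_components, len(s))
    return Xc @ Vt[:n_components].T, explained[:n_components]

# Synthetic 44-feature matrix driven by 3 latent factors (not study data).
rng = np.random.default_rng(0)
latent = rng.standard_normal((100, 3))
mixing = rng.standard_normal((3, 44))
X = latent @ mixing + 0.01 * rng.standard_normal((100, 44))
Z, explained = pca_reduce(X, variance_to_keep=0.95)
```

Here `Z` has one row per segment and only as many columns as are needed to reach the threshold, and it is this reduced matrix that would be passed to the classifier.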
2.4. Machine Learning Classifier
In this paper, SVM was employed to classify five hand gestures. As SVM is a kernel-based method, selecting proper kernel functions and associated hyperparameters is an important task. This problem is usually solved by a trial-and-error approach. Moreover, a typical single-kernel SVM application frequently adopts the same hyperparameters for each class, which may not be suitable when feature pattern distributions differ significantly among classes. Although there are different kernels, such as the Gaussian kernel, polynomial kernel, and sigmoid kernel, it is often unclear which is the most suitable for a given dataset. It is therefore desirable to use an optimized kernel function that adapts well to the dataset at hand and to the type of boundaries between classes. An efficient way to design a kernel that is optimal for a given dataset is to consider the kernel as a convex combination of basis kernels, as illustrated in Equation (1). Such an MKL-based SVM is inspired by [25].
$$K = a\,K_1 + b\,K_2 + c\,K_3 + d\,K_{RBF} \qquad (1)$$
Here, $K_1$ is the linear kernel, $K_2$ is the quadratic kernel, $K_3$ is the cubic kernel, and $K_{RBF}$ is the Gaussian or radial basis function kernel with unity standard deviation. The coefficients {a, b, c, d} are hyperparameters tuned using a 10-fold grid search cross-validation. The elasticity parameter or box constraint C is also a hyperparameter; it sets the penalty on margin violations and therefore how far the margin constraints are relaxed. A large C classifies the training samples more accurately but can also overfit and reduce the testing accuracy; hence, the box constraint was also tuned using a 10-fold grid search cross-validation. Training and testing used a typical 80–20 train-test split. PCA was used for feature selection to further reduce the dimensionality of the input vector given to the classifier and to retain only the most informative components.
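The kernel combination described above can be sketched as a Gram-matrix computation. This is an illustrative sketch, not the authors' implementation: the inhomogeneous polynomial form (x·y + 1)^p, the grid step of 0.25 for the weights, and the function names are assumptions, and the random matrix `X` merely stands in for PCA-reduced features.

```python
import itertools
import numpy as np

def combined_kernel(X, Y, a, b, c, d):
    """Weighted sum of linear, quadratic, cubic and RBF kernels between rows of X and Y."""
    lin = X @ Y.T
    k1 = lin                      # linear kernel
    k2 = (lin + 1.0) ** 2         # quadratic kernel (assumed inhomogeneous form)
    k3 = (lin + 1.0) ** 3         # cubic kernel (assumed inhomogeneous form)
    sq_dist = (X ** 2).sum(1)[:, None] + (Y ** 2).sum(1)[None, :] - 2.0 * lin
    k_rbf = np.exp(-np.maximum(sq_dist, 0.0) / 2.0)  # Gaussian kernel, unit standard deviation
    return a * k1 + b * k2 + c * k3 + d * k_rbf

def weight_grid(step=0.25):
    """Enumerate (a, b, c, d) with non-negative entries that sum to 1."""
    n = int(round(1.0 / step))
    return [tuple(i * step for i in combo)
            for combo in itertools.product(range(n + 1), repeat=4)
            if sum(combo) == n]

# Hypothetical inputs: 6 samples with 4 features each.
rng = np.random.default_rng(0)
X = rng.standard_normal((6, 4))
K = combined_kernel(X, X, a=0.25, b=0.25, c=0.25, d=0.25)
grid = weight_grid(0.25)  # candidate weight tuples for a grid search
```

Because every basis kernel is positive semidefinite and the weights are non-negative, the combined matrix `K` is itself a valid kernel. One option for training is to pass such a matrix to an SVM that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed", C=...)`, scoring each weight tuple and each C value by cross-validation.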
Further analysis encompassed utilizing three machine learning classifiers: naïve Bayes, decision tree, and KNN. This was undertaken to evaluate the performance of SVM within the specific dataset and experimental conditions, focusing on robustness with an increasing number of gestures and varying sensor configurations. Although SVMs are well-established as effective for EMG-based gesture recognition [26,27], the research aims to investigate these findings within our unique setup that uses commercially available low-cost sEMG sensors. By confirming the robustness and effectiveness of SVMs in this application, further evidence will be added to the established theory, considering any nuances from our specific dataset.
4. Discussion
The study extends beyond assessing classification accuracy, delving into the complexities of classifying an increasing array of gestures, as corroborated by prior research [28]. This investigation further reveals the limitations of using a single sensor for EMG data acquisition, which restricts the number of distinctly identifiable classes. We evaluated the effectiveness of four machine learning classifiers (SVM, naïve Bayes, KNN, and decision tree) over five distinct sensor configurations for collecting detailed hand gesture EMG data from the forearms of eight participants. An extensive set of 22 features per sensor, encompassing both time and frequency domains, was extracted from the EMG signals. Prior studies suggest that a mixed-domain feature set can bolster classifier performance [29].
The application of PCA to select key features significantly reduced feature space dimensionality, enhancing classification accuracy. The results, as summarized in Tables 1–5, identified sensor setup C5 (two sensors placed proximal to the wrist along flexor digitorum superficialis and flexor pollicis longus) as the most effective, yielding the highest accuracy with SVM, naïve Bayes, and decision tree classifiers. The SVM classifier, empowered by MKL, consistently achieved over 90% accuracy in gesture classification with C5, as shown in Table 2 and Figure 5. The dual-sensor configuration of C5, which captures data from the muscles controlling thumb and index finger movements, suggests an advantageous strategy for sensor placement in prosthetic hand design.
The fine pinch gesture, depicted in Figure 1b, emerged as particularly challenging due to its lower muscle contraction levels and reliance on just two fingers, making it more prone to noise. C5’s precise sensor positioning over the active muscles during this gesture enabled the capture of more nuanced data. A comparison of C1 and C5 (Figure 2) illustrates the significant influence of an additional EMG sensor on classification accuracy, confirming the superiority of dual-channel over single-channel EMG in gesture recognition.
When space constraints limit sensor placement near the wrist on a prosthetic arm, C4 (two sensors, one placed proximal to the wrist along flexor digitorum superficialis and one placed proximal to the elbow along flexor digitorum superficialis and flexor digitorum profundus) emerges as a feasible alternative, achieving 87.2% accuracy with SVM (Table 5). These results highlight the importance of both sensor location on the muscle and proximity to the muscle belly for optimal data collection.
MKL’s enhancement of SVM performance is evident in sensor configuration C3 (one sensor placed along flexor digitorum superficialis and flexor digitorum profundus), where a single sensor competes closely with the dual-sensor C4 in accuracy (SVM testing accuracy, Table 5). MKL’s adaptive selection of kernel functions for C3’s data is a significant advance over single-kernel methods. The SVM’s box constraint hyperparameter C plays a pivotal role in balancing model complexity against overfitting, with a range of values from 0.01 to 10 evaluated through a 10-fold grid search cross-validation (Table 1). This fine-tuning was crucial for developing an optimized SVM model with strong generalization capabilities for precise gesture classification. Thus, for prosthetic arms that can only incorporate one EMG sensor, C3 is the recommended setup. The use of MKL with SVM significantly improves performance over base SVMs, especially for classifying multiple gestures, and it yielded non-zero MKL coefficients for classifying all five gestures, as detailed in Table 1. While base SVMs excel in binary classification, MKL handles the complexity of EMG signals better by combining multiple kernels. This results in robust classification of five distinct gestures, justifying the added complexity of MKL.
The confusion matrices demonstrate SVM’s high performance over different gestures at around 80% to 100% with the C5 configuration. Additionally, SVM excelled as a binary classifier for the fine pinch versus power grasp gestures (Figure 1), achieving 100% accuracy. A noted trend is the reduction in classification accuracy with an increasing number of gestures, a phenomenon that resonates with previous findings [26]. This decline is attributed to the broader dispersion of EMG signals in the forearm and the resultant signal overlap from simultaneous muscle contractions when performing complex gestures.
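For readers who want to reproduce per-gesture figures of this kind, the calculation from a confusion matrix is sketched below. The labels follow the 0 to 4 gesture coding used in this paper, but the `y_true` and `y_pred` lists are made-up values for illustration and are not results from the study; the function names are likewise hypothetical.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes=5):
    """Count how often each true gesture (row) was predicted as each gesture (column)."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

def per_class_accuracy(cm):
    """Fraction of each true gesture classified correctly (diagonal over row sum).
    Assumes every gesture appears at least once in the true labels."""
    return np.diag(cm) / cm.sum(axis=1)

# Made-up labels for illustration only.
y_true = [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]
y_pred = [0, 0, 1, 0, 2, 2, 3, 3, 4, 4]
cm = confusion_matrix(y_true, y_pred)
per_class = per_class_accuracy(cm)
overall = np.trace(cm) / cm.sum()
```

Off-diagonal entries show which gestures are confused with one another, which is the information needed to examine the signal-overlap explanation given above.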
A potential limitation of the present study, when extended to real-time online EMG classification, is the slight variation in the lengths of EMG segments in the time domain used for feature extraction across different gesture configurations. This limitation can be readily addressed by selecting precisely equal segment lengths during online application. Another limitation of the current work is the sample size of eight participants. This sample size is similar to other sample sizes in the literature [25,27]. However, the limited sample size may reduce the robustness of the reported accuracies to some extent. The current study serves to validate the proposed methodology on a small scale, and future experiments are planned with a larger sample size that includes individuals with limb differences to directly evaluate the application of our findings in prosthetic hand control. This will enhance the generalizability and relevance of the current research.
Future research will apply the findings from this paper to replicate hand gestures in a myoelectric hand prosthetic in real time. It will also explore different sensor configurations and machine learning techniques to further enhance the performance of myoelectric prostheses. Additionally, the development of more sophisticated feature extraction and classifier optimization methods could be beneficial in handling multifaceted EMG signal analysis. This study serves as a stepping stone towards the realization of more efficient and effective myoelectric prostheses, and it is hoped that the insights gained will inspire further exploration in this promising field.