*4.3. Intention Recognition*

Based on the confusion matrix results, the CNN achieves 3.5% higher overall prediction accuracy than the SVM. The CNN also outperforms the SVM markedly on gestures/actions 6, 7, and 8, suggesting better performance when dealing with gestures/actions of high similarity. Because dividing all data sets into a single training set and a single testing set using the leave-one-out method may lead to biased prediction results, k-fold cross-validation is adopted for validation. With 10 distinct randomly selected training data sets, the network is retrained 10 times, and the corresponding results are recorded in Table 3. The average accuracy of the CNN shows an even larger margin over the SVM, together with a much smaller variance.

Three possible reasons are proposed for this phenomenon. First, the CNN possesses advantages in dealing with nonlinear problems [59]. In this study, volunteers with different digit lengths, initial hand positions, and joint motion trajectories may introduce strong nonlinear correlations between the data sets and the target posture, which decreases the prediction accuracy of the SVM. Second, the convolutional layers in the CNN model extract deeper-level features [10], whereas the SVM model only extracts specific features in the data pre-processing stage (using PCA). Ideally, each of the first five actions/gestures produces substantial data variation in only a single column; the non-periodic fluctuations observed in the other columns arise mainly from the instability of the human hand joints, random motion of the arm, and environmental noise. Once these unwanted amplitude fluctuations exceed a certain threshold, the misclassification rate of the SVM rises because it cannot effectively extract the data-set features in such a scenario. Last, the CNN model has more adjustable parameters than the SVM model (Supplementary Materials, Tables S1 and S2), which helps it better adapt to the eight actions/gestures in this study.

Examining the individual results, the worst prediction accuracy of the SVM is only 41.1%, compared with 92.2% for the CNN. This low accuracy may result either from overfitting or from the presence of substantial outliers. However, the low accuracy is observed in only a single run, so outliers caused by hand shaking and random arm movement are likely the dominant issue. Although outliers due to arm rotation, hand shaking, etc., may degrade the performance of the hand exoskeleton system, the average intention-recognition accuracy of 97.1% under k-fold cross-validation indicates a reasonable model setup and training process. The CNN model, with its good balance between high intention-recognition accuracy and a lightweight network structure (the prediction times of the CNN and SVM models are given in Supplementary Materials, Table S6), is therefore recommended for real-time intention recognition.


**Table 3.** K-fold cross-validation of CNN model and SVM model.
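As a rough illustration of the evaluation protocol described above, the sketch below runs a 10-fold cross-validation comparing a PCA + SVM pipeline against a small 1-D CNN. The data, layer sizes, and hyperparameters are placeholder assumptions (the actual optimized values are listed in Supplementary Materials, Tables S1–S5), so the sketch reproduces the procedure rather than the reported accuracies.

```python
# Minimal sketch of the 10-fold cross-validation comparison, assuming
# windowed sensor-matrix data of shape (samples, timesteps, channels).
# All data and hyperparameters below are illustrative placeholders.
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.svm import SVC
from tensorflow import keras

rng = np.random.default_rng(0)
n_samples, n_timesteps, n_channels, n_classes = 800, 64, 10, 8
X = rng.normal(size=(n_samples, n_timesteps, n_channels)).astype("float32")
y = rng.integers(0, n_classes, size=n_samples)          # gesture/action labels 0..7

def build_cnn():
    # Small 1-D CNN over the time axis of each sensor window (assumed architecture).
    return keras.Sequential([
        keras.layers.Input(shape=(n_timesteps, n_channels)),
        keras.layers.Conv1D(16, 5, activation="relu"),
        keras.layers.MaxPooling1D(2),
        keras.layers.Conv1D(32, 3, activation="relu"),
        keras.layers.GlobalAveragePooling1D(),
        keras.layers.Dense(n_classes, activation="softmax"),
    ])

cnn_acc, svm_acc = [], []
for train_idx, test_idx in StratifiedKFold(n_splits=10, shuffle=True, random_state=0).split(X, y):
    X_tr, X_te, y_tr, y_te = X[train_idx], X[test_idx], y[train_idx], y[test_idx]

    # SVM branch: flatten each window, then PCA feature extraction before classification.
    svm = make_pipeline(StandardScaler(), PCA(n_components=20), SVC(kernel="rbf", C=1.0))
    svm.fit(X_tr.reshape(len(X_tr), -1), y_tr)
    svm_acc.append(svm.score(X_te.reshape(len(X_te), -1), y_te))

    # CNN branch: convolutional layers learn features directly from the raw windows.
    cnn = build_cnn()
    cnn.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
    cnn.fit(X_tr, y_tr, epochs=10, batch_size=32, verbose=0)
    cnn_acc.append(cnn.evaluate(X_te, y_te, verbose=0)[1])

# Mean accuracy and variance across the 10 folds, as reported in Table 3.
print(f"SVM: {np.mean(svm_acc):.3f} +/- {np.std(svm_acc):.3f}")
print(f"CNN: {np.mean(cnn_acc):.3f} +/- {np.std(cnn_acc):.3f}")
```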

*5. Conclusions*

In this study, a complete hand exoskeleton rehabilitation system is proposed for post-stroke rehabilitation and assistance in complex daily life activities. Three rehabilitation/daily-life-assistance modes are developed for different personal needs, namely, the robot-in-charge, therapist-in-charge, and patient-in-charge modes. With the aid of a sensor matrix, the patient-in-charge mode allows the detection of small rotation angles of the digits and achieves high intention-recognition accuracy when dealing with similar gestures/actions. Thus, stroke patients with limited exercise ability (e.g., 5° in each joint) can conduct self-rehabilitation and complex daily activities with the proposed device. Regarding the 'stiff hand' phenomenon observed in stroke patients, the synergy of the actuator (with a push force of up to 43 N) and the linkage can provide sufficient torque and an accurate trajectory for the digit joints.

Note that all experiments are conducted on healthy volunteers; in future studies, the effectiveness of the hand exoskeleton system on stroke patients will be evaluated. Constrained by the size of the current electric actuator, the motion of the DIP joint is not considered. To achieve higher flexibility in the hand exoskeleton, a smaller force-transmission mechanism, such as one based on voltage-sensitive composite materials, will be considered for active control of the finger DIP joints. The thumb CMC joint plays an essential role in grasping in terms of flexibility and force transmission. Though the current design allows the grasping of large objects (Figure S10), a mechanism with more active DoFs for the thumb CMC joint will be designed to better serve assistive purposes. To achieve higher intention-recognition accuracy, three aspects can be considered in further studies. First, richer user motion information can be captured by adding more sensors to the system. Second, the CNN model architecture can be improved so that the model possesses stronger feature-extraction capability. Third, increasing the diversity and number of training data sets may further improve the intention-recognition accuracy.

**Supplementary Materials:** The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/bioengineering9110682/s1, Figure S1. Stress evaluation for thumb exoskeleton and index finger exoskeleton. The index finger exoskeleton base serves as the fixed frame, with forces (5 N each) applied in the directions labelled in red. The strength of the whole structure is also tested in real action. (a) For the exoskeleton made of Aluminum 6061, the maximum stress is ~33.9 MPa compared with the material's yield stress; (b) for the exoskeleton made of PLA, the maximum stress is ~31.5 MPa compared with the material's yield stress. Figure S2. Schematic view of the three passive DoFs. Figure S3. Components in the index finger exoskeleton. There are two highlighted areas in the finger exoskeleton, which illustrate the sensor locations and the sliding chute for length adjustment. Figure S4. Hand exoskeleton worn by fingers of different phalanx lengths. (a,b) Lengths of the proximal and intermediate phalanxes are 40 mm and 25 mm, respectively; (c,d) lengths of the proximal and intermediate phalanxes are 46 mm and 27 mm, respectively; (e,f) lengths of the proximal and intermediate phalanxes are 50 mm and 30 mm, respectively. Table S1. The parameters for constructing the Convolutional Neural Network (CNN). Figure S5. Index finger length of the five volunteers. (a) Proximal phalanx length, middle phalanx length, and height of the volunteer are ~48 mm, ~30 mm, and ~183 cm, respectively; (b) proximal phalanx length, middle phalanx length, and height of the volunteer are ~46 mm, ~27 mm, and ~168 cm, respectively; (c) proximal phalanx length, middle phalanx length, and height of the volunteer are ~44 mm, ~24 mm, and ~170 cm, respectively; (d) proximal phalanx length, middle phalanx length, and height of the volunteer are ~43 mm, ~23 mm, and ~175 cm, respectively; (e) proximal phalanx length, middle phalanx length, and height of the volunteer are ~40 mm, ~21 mm, and ~156 cm, respectively. Figure S6. Flow chart of the differential evolution algorithm. Table S2. Genetic Algorithm setup for CNN model optimization. Figure S7. Results of the Genetic Algorithm used to optimize the hyperparameters of the CNN model. Table S3. The optimal values of the hyperparameters in the CNN model. Table S4. Genetic Algorithm setup for SVM model optimization. Figure S8. Results of the Genetic Algorithm used to optimize the hyperparameters of the SVM model. Table S5. The optimal values of the hyperparameters in the SVM model. Figure S9. A demonstration of grasping objects with the passive joint setup illustrated in Figure S2. (a) A small toolbox with diameter of ~3.5 cm; (b) a water bottle with diameter of ~6 cm; (c) an orange with diameter of ~6 cm; (d) a 1:35 M1A1 tank model. Figure S10. Curve of learning rate with epoch. Table S6. Prediction time using the CNN and SVM models. Figure S11. Comparison of identifiable signals with different levels of noise.

**Author Contributions:** Conceptualization, K.X., X.X. and X.C. (Xianglei Chen); methodology, K.X.; software, K.X., X.C. (Xianglei Chen), X.C. (Xuedong Chang) and C.L.; validation, K.X., X.C. (Xianglei Chen) and X.C. (Xuedong Chang); formal analysis, K.X.; data curation, K.X., X.C. (Xianglei Chen) and X.C. (Xuedong Chang); writing—original draft preparation, K.X., X.C. (Xianglei Chen) and X.C. (Xuedong Chang); writing—review and editing, K.X., L.G., F.L., X.X., H.S., Y.W. and J.Z.; visualization, K.X., X.C. (Xianglei Chen) and X.C. (Xuedong Chang); supervision, K.X. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research is funded by the National Natural Science Foundation of China (Grant number: 12102127), Natural Science Foundation of Jiangsu Province (Grant number: BK20190164), Fundamental Research Funds for the Central Universities (Grant number: B210202125), China Postdoctoral Science Foundation (Grant number: 2021M690872), and Changzhou Health Commission Technology Projects (Grant number: ZD202103).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Informed consent was obtained from all subjects involved in the study.

**Data Availability Statement:** The data presented in this study are available in this article.

**Conflicts of Interest:** The authors declare no conflict of interest.
