#### 3.2.1. Robot-in-Charge Rehabilitation Mode

Figure 12 illustrates the control flow diagram for the three rehabilitation modes, where the robot-in-charge mode is represented by the green blocks. Based on the rehabilitation therapy suggested by the doctor, the trajectory of each exoskeleton joint can be planned with the aid of Equation (2), and the corresponding elongation of the linear actuators is calculated via Equation (4). During the rehabilitation process, the real-time data collected by the thin-film pressure sensors installed in the hand exoskeleton (Figure 5b) can be monitored by doctors and used as a recovery evaluation index.

**Figure 12.** Hand rehabilitation exoskeleton control flow diagram for the three rehabilitation modes.

#### 3.2.2. Therapist-in-Charge Rehabilitation Mode

The therapist-in-charge training mode is represented by the orange blocks in Figure 12. In this mode, a wearable controller is worn by the therapist. As shown in Figure 5c, three IMUs (IMUs 1–3) record the rotation of the index finger, while two IMUs (IMUs 4 and 5) are installed to detect the motion of the thumb. The three angle readings (pitch, roll, and yaw, specified in Figure 5c) from IMU 3 mainly serve as a motion benchmark, while the remaining angles are used for hand exoskeleton motion tracking.

In the rehabilitation process, rotation in the index finger PIP or MCP joint leads to an angle variation in IMU 1 or 2, respectively, while motion in the thumb MCP or DIP joint can be detected by IMU 4 or 5, respectively. The IMUs on one digit are all aligned in the same plane, so for any two adjacent IMUs, the differences in pitch and yaw angles are expected to be 0. Taking the index finger PIP joint as an example and using the reading of IMU 2 as a benchmark, the PIP rotation angle can be expressed as follows:

$$
\begin{bmatrix} Roll \\ Pitch \\ Yaw \end{bmatrix}_{PIP} = \begin{bmatrix} Roll \\ Pitch \\ Yaw \end{bmatrix}_{IMU1} - \begin{bmatrix} Roll \\ Pitch \\ Yaw \end{bmatrix}_{IMU2} \tag{6}
$$

With the aid of the slave computer, the real-time PIP joint angle variation of the therapist's index finger is obtained. The IMUs installed on the hand exoskeleton occupy positions similar to those in the wearable controller (Figure 5b), so the angle variation in each joint of the hand exoskeleton can also be calculated with the aid of Equation (5). Utilizing Equation (4), the demanded elongation of the linear actuator installed in the exoskeleton is calculated. Figure 13 indicates the decent real-time performance of the therapist-in-charge training mode.
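As a minimal numerical illustration of Equation (6), the joint angle follows from the element-wise difference of two adjacent IMU readings. The `joint_angle` helper and the sample readings below are hypothetical, not part of the original system:

```python
import numpy as np

def joint_angle(imu_distal, imu_proximal):
    """Eq. (6): joint rotation as the (roll, pitch, yaw) difference of two
    adjacent IMU readings, in degrees. For IMUs aligned in one plane the
    pitch and yaw differences stay near zero, so one component carries the
    joint rotation; which one depends on the mounting axes."""
    return np.asarray(imu_distal, float) - np.asarray(imu_proximal, float)

# Hypothetical readings: IMU 1 on the middle phalanx, IMU 2 (benchmark)
# on the proximal phalanx of the therapist's index finger.
imu1 = (42.0, 1.2, -0.5)   # roll, pitch, yaw
imu2 = (10.0, 1.0, -0.4)
pip = joint_angle(imu1, imu2)   # first component ~32 degrees of rotation
```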

**Figure 13.** Real-time PIP joint angle variation for the index finger of the therapist and the index finger exoskeleton. The blue line shows the PIP joint rotation performed by the therapist wearing the wearable controller, while the red line shows the PIP joint angle change in the index finger exoskeleton.

#### 3.2.3. Patient-in-Charge Rehabilitation Mode

The patient-in-charge training strategy designed in this research targets patients with limited exercise ability who are only able to rotate digit joints at a small angle (e.g., 5°). For these patients, who require self-rehabilitation and complex daily activities, intention recognition is of vital importance. A 'stiff hand' is usually observed in stroke patients, and the stiffness is unpredictable considering the vast population of stroke patients; thus, recognizing one hand's posture/action to guide the other hand's motion is the best strategy. In this study, both the exoskeleton and its corresponding controller are adopted.

Compared with statistical intention-recognition methods, a deep learning approach based on a CNN (Figure 14a) is adopted for its renowned training efficiency and prediction accuracy [19]. Results of the CNN model are validated against the widely adopted machine learning method SVM. Gestures of the hand recognized by the wearable controller can be correlated with planned trajectories of the hand exoskeleton, and these trajectories can be planned and adjusted based on the needs of patients, utilizing Equations (2)–(4).

**Figure 14.** Deep Learning and machine-learning-based intention recognition. (**a**) 1D CNN structure diagram; (**b**) sensor output data pattern of the six actions/gestures; (**c**) confusion matrix diagrams of CNN (**left** panel) and SVM (**right** panel) models.

Data Acquisition and Processing

In this research, the wearable controller is worn on the right hand of volunteers to record data from both the IMUs and the thin-film pressure sensors labelled in Figure 5c. Eight unique gestures/actions are selected for the identification experiment (inset of Figure 14b). Five healthy volunteers are involved in the data acquisition (the hand size of each volunteer is presented in Supplementary Materials Figure S5), and each gesture/action is repeated 250 times by each individual volunteer, giving a total of 10,000 sets of data. Among all the data sets, a random 84% are utilized for training and the remaining 16% are used for testing. To mimic a real application scenario and improve intention-recognition accuracy, diversity of the data sets for each gesture/action is necessary. In other words, even for the same gesture/action, the rotation angle (0 to ~60°) of each joint and the force (0 to ~3 N) exerted on the pressure sensor vary significantly between individual repeats. In addition, during the data-acquisition process, random movement of the arm is inevitable, so the IMUs also record information related to arm rotation, hand shaking, etc. Prior to data acquisition, the IMUs and pressure sensors are calibrated. The data acquisition frequency is fixed at a low value of 40 Hz, and each individual gesture/action is performed at a slow pace, which guarantees the diversity of the data sets. For the actual rehabilitation process, the data collection frequency can be adjusted based on the preferences of the user.
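The random 84%/16% train-test split described above can be sketched as follows; the array here is a random stand-in for the 10,000 recorded data sets, not the experimental data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the corpus: 8 gestures x 5 volunteers x 250 repeats
# = 10,000 sets, each flattened to 5 columns x 200 points.
n_sets, set_len = 10_000, 5 * 200
X = rng.random((n_sets, set_len))
y = rng.integers(0, 8, size=n_sets)

# Random 84% / 16% split, as in the text.
idx = rng.permutation(n_sets)
n_train = int(0.84 * n_sets)
train_idx, test_idx = idx[:n_train], idx[n_train:]
X_train, y_train = X[train_idx], y[train_idx]
X_test, y_test = X[test_idx], y[test_idx]

print(len(X_train), len(X_test))  # 8400 1600
```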

In the data acquisition phase, the first two gestures are performed by rotating about the z1 axis (Figure 2) in the counterclockwise and clockwise directions, respectively. The third and fourth gestures/actions are achieved by rotating about the y1 axis (Figure 2) in the clockwise and counterclockwise directions, respectively. The fifth gesture refers to the bending of both the MCP and PIP joints in the index finger. The last three gestures/actions are holding cylinders of different radii (38 mm, 48 mm, and 60 mm, respectively), aiming to test the effectiveness of the whole HMI strategy. Each collected data set contains five columns, with 200 data points fitted in each column. The first two columns record measurements from pressure sensors 5 and 7 labelled in Figure 5c. The third column refers to the pitch angle change in IMU 2, which describes the up- and down-motion of the index finger dominated by the MCP joint. The roll angle variation of IMU 2 is recorded in column 4, aiming to distinguish between left and right MCP rotation. For the fifth column, the pitch angle difference between IMU 1 and IMU 2 is taken as the description of PIP joint rotation.
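Under the column layout described above, one 200 × 5 data set could be assembled as in this sketch; the `build_sample` helper and its argument names are illustrative, not the authors' code:

```python
import numpy as np

def build_sample(p5, p7, imu2_pitch, imu2_roll, imu1_pitch):
    """Assemble one 200 x 5 data set from the sensor streams:
      cols 1-2: pressure sensors 5 and 7,
      col 3:    IMU 2 pitch (MCP up/down motion),
      col 4:    IMU 2 roll (left/right MCP rotation),
      col 5:    pitch difference IMU 1 - IMU 2 (PIP rotation)."""
    cols = [p5, p7, imu2_pitch, imu2_roll, imu1_pitch - imu2_pitch]
    return np.column_stack(cols)

# Random stand-ins for the five 200-point streams.
rng = np.random.default_rng(0)
streams = [rng.random(200) for _ in range(5)]
sample = build_sample(*streams)
print(sample.shape)  # (200, 5)
```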

Data processing techniques such as normalization and feature extraction are essential for deep learning and machine learning models. Considering the differing ranges of measurements from distinct sensor types and the distribution patterns of each data set [56], extra effort may be required for the CNN model to balance the multiple distribution centers if normalization is not applied; this slows down training and makes the model more difficult to converge. Normalization is achieved in two steps. Firstly, all numbers in each column are scaled to fit in the range $[0, 1]$, utilizing $(x - x_{Min})/(x_{Max} - x_{Min})$. Then, the data set mean value is adjusted to 0 based on $(x - \mu)/\sigma$. In addition, effective feature extraction reduces the correlation of irrelevant dimensions in the data sets, thereby speeding up the training process [57]. For the CNN model, feature extraction is achieved in the convolutional layer; for the SVM model, Principal Component Analysis (PCA) is required to reduce the dimensions of the original data set.
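The two-step normalization can be written compactly; this is a sketch of the formulas above, not the authors' implementation:

```python
import numpy as np

def normalize(dataset):
    """Two-step normalization: per-column min-max scaling to [0, 1],
    then shifting/scaling the whole data set to zero mean, unit std."""
    x = np.asarray(dataset, float)
    x = (x - x.min(axis=0)) / (x.max(axis=0) - x.min(axis=0))  # step 1
    return (x - x.mean()) / x.std()                            # step 2

# Random stand-in for one 200 x 5 data set with mixed sensor ranges.
sample = np.random.default_rng(1).random((200, 5)) * 100.0
out = normalize(sample)   # mean ~0, std ~1 after the second step
```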

Intention-Recognition Model and Results

The structure of the one-dimensional CNN deep neural network model adopted in this research is shown in Figure 14a; it is mainly composed of convolutional, batch normalization, pooling, SoftMax, and fully connected layers. The convolutional layer extracts the features of the specified data segment. The batch normalization layer ensures a well-behaved backpropagation gradient, which alleviates the problem of vanishing gradients [58]. The pooling layer reduces the input matrix dimensions. The SoftMax layer stabilizes the values in the backpropagation process and leads to easier convergence for the classification task. The fully connected layer links all the previous features to obtain the classification result. The key parameters, Filters (F), Kernel size (K), Strides (S), and Padding (P), are presented in Supplementary Materials Table S1.

In addition to the parameters mentioned above, the training result of the CNN model is also sensitive to the variation of hyperparameters. To maximize intention-recognition accuracy, a genetic algorithm (GA) is adopted to find the optimal hyperparameters. A GA is a family of mathematical models abstracted from the process of reproduction in nature; it realizes a heuristic search of a complex space by simplifying the genetic process. The flow chart of the GA (more specifically, the differential evolution algorithm) is shown in Figure S6 (Supplementary Materials). The average recognition accuracy over 10 runs of K-fold cross-validation is taken as the fitness function of individuals in the population, and the three hyperparameters (Learning Rate, Batch Size, and Epoch) are taken as the decision variables in Table S2 (Supplementary Materials). After 10 generations of population iterations, the optimal parameters of the model were obtained and are shown in Figure S7 and Table S3 (Supplementary Materials).
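For orientation, a NumPy forward-pass sketch of the layer stack (convolution with ReLU, batch normalization, max pooling, fully connected, SoftMax) is given below; the filter count F = 16, kernel size K = 7, and random weights are illustrative, not the trained values in Table S1:

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d(x, w, b, stride=1):
    """Valid 1D convolution + ReLU: x is (length, in_ch),
    w is (kernel, in_ch, out_ch)."""
    k = w.shape[0]
    n_out = (x.shape[0] - k) // stride + 1
    out = np.empty((n_out, w.shape[2]))
    for i in range(n_out):
        seg = x[i * stride : i * stride + k]                   # (k, in_ch)
        out[i] = np.tensordot(seg, w, axes=([0, 1], [0, 1])) + b
    return np.maximum(out, 0.0)

def batch_norm(x, eps=1e-5):
    # Normalize each channel over the sequence dimension.
    return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)

def max_pool(x, size=2):
    n = (x.shape[0] // size) * size
    return x[:n].reshape(-1, size, x.shape[1]).max(axis=1)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# One 200 x 5 data set in, 8 gesture-class probabilities out.
x = rng.random((200, 5))
w1, b1 = rng.standard_normal((7, 5, 16)) * 0.1, np.zeros(16)   # K=7, F=16
h = max_pool(batch_norm(conv1d(x, w1, b1)))                    # (97, 16)
w_fc = rng.standard_normal((h.size, 8)) * 0.1
p = softmax(h.ravel() @ w_fc)                                  # sums to 1
```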

SVM is a widely adopted machine learning method for classification and intention recognition. The performance of the SVM model is highly related to three hyperparameters: the kernel function, the penalty parameter C, and Gamma. In this study, the linear dimension-reduction algorithm PCA retains 98% of the key information in the original data sets, which minimizes information loss while compressing the data set dimensions significantly and accelerating training/testing. PCA processing reduces each sample data set's dimensions from 1 × 1000 to 1 × 27. The genetic algorithm is also utilized to optimize the hyperparameters of the SVM model. The average recognition accuracy over 10 runs of K-fold cross-validation is again taken as the fitness function of population individuals, and the three hyperparameters mentioned above (kernel function, parameter C, and Gamma) are used as the decision variables in Table S4 (Supplementary Materials). After 10 generations of population iteration, the optimal parameters of the model are obtained and shown in Figure S8 and Table S5 (Supplementary Materials).
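The 98%-variance PCA reduction can be sketched with a plain SVD; the synthetic low-rank data below merely mimics 1 × 1000 samples compressing to roughly 27 dimensions and is not the experimental data:

```python
import numpy as np

def pca_reduce(X, retain=0.98):
    """PCA via SVD, keeping the smallest number of components whose
    cumulative explained-variance ratio reaches `retain`."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    var_ratio = (s ** 2) / (s ** 2).sum()
    k = int(np.searchsorted(np.cumsum(var_ratio), retain)) + 1
    return Xc @ Vt[:k].T, k

# Synthetic 500 samples of dimension 1000 with ~27-dimensional structure.
rng = np.random.default_rng(2)
latent = rng.standard_normal((500, 27))
X = latent @ rng.standard_normal((27, 1000)) \
    + 0.01 * rng.standard_normal((500, 1000))
Z, k = pca_reduce(X)   # Z has shape (500, k), with k near 27 here
```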

Upon adopting the optimal hyperparameters, a confusion matrix is obtained via prediction on the testing data set. The confusion matrices in Figure 14c indicate that both methods reach at least 95.6% overall recognition accuracy. In the confusion matrix of the CNN model, each individual posture reaches at least ~98.5% prediction accuracy, and only 15 misclassifications are observed among the total of 1600 testing data sets. The SVM model presents high classification accuracy for the first five postures/actions, while significant misclassifications occur when dealing with the last three cylinder-holding tasks.
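A confusion matrix of the kind shown in Figure 14c is computed as follows; the toy labels below are illustrative, not the experimental results:

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows index the true class, columns the predicted class."""
    m = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        m[t, p] += 1
    return m

# Toy check with 3 classes.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, 3)
accuracy = np.trace(cm) / cm.sum()   # 4 of 6 correct
```

Overall accuracy is the trace (correct predictions) over the total count; off-diagonal cells localize misclassifications, which is how the cylinder-holding confusions of the SVM model show up in Figure 14c.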

#### **4. Discussion**
