### 2.4.2. Principal Component Analysis

Principal component analysis (PCA) is a statistical technique that applies a linear transformation to map an original set of variables onto a smaller set of uncorrelated variables [15]. The idea was conceived by K. Pearson [27] and later developed by Hotelling [28].

PCA needs to find an encoding function that produces the code for an input, and a decoding function that produces the reconstructed input given its code [29]. In general, PCA uses the covariance matrix of the data to reduce its dimension. The eigenvalues and eigenvectors of the covariance matrix are computed, and the eigenvectors corresponding to the *k* largest eigenvalues (i.e., the directions of largest variance) are assembled into a matrix. Projecting the data matrix with this matrix maps the data into a new space with fewer features. However, in our study, PCA is used to improve the estimation performance rather than to reduce the dimension, which is the same application of PCA described in [15].
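As an illustration of the covariance-based procedure, the following Python sketch (assuming NumPy) computes the eigenvectors of the covariance matrix, sorts them by decreasing eigenvalue, and projects the data. The function names `pca_fit` and `pca_transform`, the mean-centering step, and the synthetic five-channel data are assumptions made for illustration and are not taken from the study. Setting `k` to the number of features keeps every component, which corresponds to using PCA without dimension reduction.

```python
import numpy as np

def pca_fit(X, k=None):
    """Fit PCA on X of shape (n_samples, n_features).

    Returns the feature means, the projection matrix P of shape
    (k, n_features), and the k largest eigenvalues in descending order.
    """
    mean = X.mean(axis=0)
    cov = np.cov(X - mean, rowvar=False)        # covariance matrix of the features
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]           # reorder to descending variance
    if k is None:
        k = X.shape[1]                          # keep all components (no reduction)
    order = order[:k]
    P = eigvecs[:, order].T                     # rows of P are the principal directions
    return mean, P, eigvals[order]

def pca_transform(X, mean, P):
    """Project samples stored as rows; equivalent to P @ x for each centered sample x."""
    return (X - mean) @ P.T

# Hypothetical stand-in for amplitudes from 5 correlated sEMG channels.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2))
mixing = rng.normal(size=(2, 5))
X_emg = latent @ mixing + 0.1 * rng.normal(size=(500, 5))

mean, P, eigvals = pca_fit(X_emg)               # k = 5: all components retained
X = pca_transform(X_emg, mean, P)
cov_new = np.cov(X, rowvar=False)               # close to diagonal after PCA
```

In this sketch the covariance matrix of the transformed data is diagonal up to rounding error, with the eigenvalues on its diagonal, which is what "uncorrelated variables" means in practice.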

### 2.4.3. Random Forest with Principal Component Analysis

The structural diagram of the RFPCA method used for knee angle estimation from sEMG is shown in Figure 5. In this figure, the blue lines represent the training process and the red lines describe the studying course. The time-domain amplitudes were extracted from 5 sEMG channels. After processing and PCA, one part of the data is used for training to build the estimation model and the rest is used for testing.

**Figure 5.** The structural diagram of the random forest with principal component analysis (RFPCA) used for knee estimation from sEMG.

In our work, the processed sEMG *Xt* forms the original data set *Xemg*, which is then transformed through a matrix *P* by PCA. The input *X* of the subsequent RF is thus obtained as follows:

$$X = PX\_{\text{emg}}.\tag{3}$$

Combining *X* with the knee angle *yknee* gives the data set after PCA, (*X*, *yknee*). These data are divided into a training set and a testing set.
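The pairing and division of the data can be sketched as follows in Python (assuming NumPy). The helper name `split_dataset`, the 70/30 ratio, the order-preserving split, and the placeholder arrays are assumptions for illustration only; the text does not state how the data were partitioned.

```python
import numpy as np

def split_dataset(X, y, train_fraction=0.7):
    """Split paired samples (X, y) into training and testing parts, keeping their order."""
    if len(X) != len(y):
        raise ValueError("X and y must contain the same number of samples")
    n_train = int(round(train_fraction * len(X)))
    return (X[:n_train], y[:n_train]), (X[n_train:], y[n_train:])

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))                 # placeholder for PCA-transformed features
y_knee = rng.uniform(0.0, 90.0, size=200)     # placeholder knee angles (degrees)

(X_tr, y_tr), (X_test, y_test) = split_dataset(X, y_knee)
```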

When the training sample *Tr* = (*Xtr*, *Ytr*) is given, the goal is to use *Tr* to establish an estimation model *E*(*X*) and apply it to estimate new knee angles from sEMG. The RF consists of a collection of *N* randomized regression trees *r*(*X*, *vi*), where *vi* (*i* = 1, 2, ... , *N*) are independent random variables. They are used to resample the training set and select the successive directions for splitting [19]. The estimation model *E*(*X*) is the average of the regression trees in the RF, expressed as:

$$E(\mathbf{X}) = \frac{\sum\_{i=1}^{N} r(\mathbf{X}, \mathbf{v}\_i)}{N}.\tag{4}$$

When the testing data set *Xtest* is used to validate the estimation performance, the estimated knee angle can be calculated as follows:

$$\hat{y}\_{\text{knee}} = E(\mathbf{X}\_{\text{test}}; \mathbf{v}\_1, \mathbf{v}\_2, \ldots, \mathbf{v}\_N, T\_r) = \frac{1}{N} \sum\_{i=1}^{N} r(\mathbf{X}\_{\text{test}}; \mathbf{v}\_i, T\_r). \tag{5}$$
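The averaging in Equations (4) and (5) can be illustrated with a deliberately simplified ensemble in Python (assuming NumPy). This is not the RF implementation used in the study: each tree is reduced to a depth-one regression tree, the random variable for each tree is realized by a random generator that draws a bootstrap resample and a single random split feature, and the split threshold is simply the median. All function names and the synthetic data are hypothetical.

```python
import numpy as np

def fit_stump(X, y, rng):
    """Fit one randomized depth-one regression tree.

    `rng` plays the role of the random variable v_i: it draws the bootstrap
    resample of the training set and the feature used for the split.
    """
    n, d = X.shape
    idx = rng.integers(0, n, size=n)            # bootstrap resample of the training set
    Xb, yb = X[idx], y[idx]
    feature = int(rng.integers(0, d))           # randomly chosen split direction
    threshold = float(np.median(Xb[:, feature]))
    left = Xb[:, feature] <= threshold          # never empty: the median is attained or exceeded
    left_value = yb[left].mean()
    right_value = yb[~left].mean() if (~left).any() else left_value
    return feature, threshold, left_value, right_value

def predict_stump(stump, X):
    """Evaluate one tree on the rows of X."""
    feature, threshold, left_value, right_value = stump
    return np.where(X[:, feature] <= threshold, left_value, right_value)

def forest_predict(stumps, X):
    """Average the N tree outputs, as in Equations (4) and (5)."""
    return np.mean([predict_stump(s, X) for s in stumps], axis=0)

# Synthetic stand-in data: 5 features, knee angle driven mainly by the first one.
rng = np.random.default_rng(0)
X_tr = rng.normal(size=(300, 5))
y_tr = 45.0 + 30.0 * X_tr[:, 0] + rng.normal(scale=2.0, size=300)
X_test = rng.normal(size=(50, 5))

N = 100
stumps = [fit_stump(X_tr, y_tr, rng) for _ in range(N)]
y_hat = forest_predict(stumps, X_test)          # estimated knee angles for the test set
```

Because every tree outputs a mean of training targets, the averaged estimate always lies within the range of the training knee angles. A full random forest grows deeper trees and searches for the best split among a random subset of features, but the final averaging step is the same.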
