## 2.1.1. 3D Pose Comparison

Human motion is the coordinated movement of different body parts. Motion data from the Kinect sensor can be viewed as a sequence of frames, each containing the 3D coordinates of the joints of the human skeleton; each bone, defined by two connected joints, can be treated as a 3D vector. To assess the similarity between the trainer's (coach's) motion and the trainee's (end user's) motion, the first step is to quantify the difference between their two 3D poses at a given frame. In this study, we chose the sum of the angle differences between all corresponding bone vectors of the two 3D poses as the distance measure. Eight major bones of the human skeleton (the upper and lower arms and the upper and lower legs on both sides of the body) were chosen for motion comparison, since most human motions during rehabilitation exercises involve the coordination of upper and lower limb movements. The angle difference (θ) between two corresponding bone vectors of the trainer and trainee is illustrated in Figure 1 and can be calculated from the dot product of the two vectors, i.e., the vector form of the law of cosines (see Equation (1)).

$$\theta = \arccos\left(\frac{\overrightarrow{AB} \cdot \overrightarrow{A'B'}}{\left|\overrightarrow{AB}\right| \left|\overrightarrow{A'B'}\right|}\right) \tag{1}$$

**Figure 1.** Angle difference (θ) between corresponding bone vectors for trainer (green) and trainee (red), using the left lower leg as an example.
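The pose distance described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the study: the function names (`bone_vector`, `angle_deg`, `pose_distance`) and the joint coordinates are hypothetical, and angles are reported in degrees.

```python
import math

def bone_vector(joint_a, joint_b):
    """Bone as a 3D vector pointing from joint A to joint B."""
    return tuple(b - a for a, b in zip(joint_a, joint_b))

def angle_deg(u, v):
    """Angle in degrees between two 3D vectors, following Equation (1)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] so floating-point round-off cannot break acos.
    cos_theta = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cos_theta))

def pose_distance(trainer_bones, trainee_bones):
    """Sum of angle differences over corresponding bone vectors."""
    return sum(angle_deg(u, v) for u, v in zip(trainer_bones, trainee_bones))

# Hypothetical knee and ankle positions (metres) for the left lower leg.
trainer_leg = bone_vector((0.0, 0.5, 2.0), (0.0, 0.1, 2.0))
trainee_leg = bone_vector((0.0, 0.5, 2.0), (0.4, 0.1, 2.0))
theta = angle_deg(trainer_leg, trainee_leg)  # about 45 degrees
```

For a full pose, `pose_distance` would receive the eight bone vectors of each skeleton in the same order.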

## 2.1.2. Motion Comparison

Given two sequences of motion data X and Y, each frame of motion data is a 3D pose. We applied DTW to find the optimal matching between the trainer's motion and the trainee's motion while minimizing the effects of shifting and distortion in time [32]. Since we chose eight bone vectors for motion comparison and the motion of each bone constitutes a one-dimensional time series, both the trainer's and the trainee's data have eight dimensions. The matrix of each dataset has dimension 8-by-*n*, where *n* is the total number of frames of motion data. To explain how DTW works, let us start with the one-dimensional case. For a single body part, the trainer's motion and the trainee's motion can be denoted as *S* = (*s*1, *s*2, ... , *sn*) and *T* = (*t*1, *t*2, ... , *tm*), respectively. Each element of *S* and *T* is a normalized bone vector of that body part in a given 3D coordinate system. To compare sequences *S* and *T* by DTW, an *n*-by-*m* cost matrix is constructed whose (*i*, *j*)th element, denoted *C*(*si*, *tj*), is the angle difference between *si* and *tj* (see Equation (1)). A warping path *P* defines an alignment between *S* and *T* in the cost matrix and must satisfy three conditions: the boundary condition, the monotonicity condition, and the step size condition [33]. There can be multiple feasible warping paths in the cost matrix, and the total matching cost of a warping path *P* between *S* and *T* is defined by the equation below.

$$L_P(S, T) = \sum_{k=1}^{s} C(s_{ik}, t_{jk}), \tag{2}$$

where *s* is the length of the warping path *P* and (*ik*, *jk*) is the *k*th pair of matched frame indices on the path.
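As an illustration of the three path conditions and of Equation (2), the sketch below checks a candidate path and sums its cost. It assumes 0-based indices and the common step set {(1, 0), (0, 1), (1, 1)}, which matches the three predecessors in Equation (3); the exact formulation in [33] may differ. The function names and the cost values are hypothetical.

```python
ALLOWED_STEPS = {(1, 0), (0, 1), (1, 1)}  # assumed step size condition

def is_valid_warping_path(path, n, m):
    """Check the boundary, monotonicity, and step size conditions."""
    # Boundary condition: start at the first cell, end at the last cell.
    if not path or path[0] != (0, 0) or path[-1] != (n - 1, m - 1):
        return False
    for (i0, j0), (i1, j1) in zip(path, path[1:]):
        # Every allowed step is non-decreasing in both indices,
        # so the step size check also enforces monotonicity.
        if (i1 - i0, j1 - j0) not in ALLOWED_STEPS:
            return False
    return True

def path_cost(cost, path):
    """Total matching cost of a warping path, following Equation (2)."""
    return sum(cost[i][j] for i, j in path)

# Hypothetical 3-by-3 cost matrix of angle differences in degrees.
cost = [
    [0, 10, 20],
    [10, 5, 10],
    [20, 10, 0],
]
diagonal = [(0, 0), (1, 1), (2, 2)]
detour = [(0, 0), (0, 1), (1, 2), (2, 2)]
costs = (path_cost(cost, diagonal), path_cost(cost, detour))  # (5, 20)
```

Both paths are feasible, but the diagonal one has the lower total cost, which is the quantity DTW minimizes.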

The goal of DTW is to find the optimal warping path, which has the minimal cumulative distance among all possible warping paths. The DTW distance *DTW*(*S*, *T*) is defined as the total matching cost of the optimal warping path. The optimal warping path is found by dynamic programming, using the recursive equation below.

$$D(i,j) = \min\{D(i-1, j-1),\ D(i-1, j),\ D(i, j-1)\} + C(s_i, t_j), \tag{3}$$

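where 1 < *i* ≤ *n* and 1 < *j* ≤ *m*. *D*(*i*, *j*) represents the minimal cumulative matching cost between the trainer's (standard) data *S* and the trainee's (testing) data *T* from (1, 1) to (*i*, *j*), so that *DTW*(*S*, *T*) = *D*(*n*, *m*). The recursion starts from *D*(1, 1) = *C*(*s*1, *t*1), and the first row and first column are obtained by accumulating the costs along that row or column.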
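The recursion in Equation (3) can be sketched as follows. This is a minimal, unoptimized illustration that takes a precomputed cost matrix; the function name `dtw`, the 0-based path indices, and the padding of the table with a row and column of infinities are implementation choices made for this example, not details taken from the study.

```python
def dtw(cost):
    """Return (DTW distance, optimal warping path) for a cost matrix."""
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    # D is padded with an extra row and column of infinities so that the
    # recursion of Equation (3) applies to every cell, including borders.
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_prev = min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
            D[i][j] = cost[i - 1][j - 1] + best_prev

    # Backtrack from the last cell to the first to recover the path.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        candidates = [
            (D[i - 1][j - 1], i - 1, j - 1),
            (D[i - 1][j], i - 1, j),
            (D[i][j - 1], i, j - 1),
        ]
        _, i, j = min(candidates)
        path.append((i - 1, j - 1))
    path.reverse()
    return D[n][m], path

# Hypothetical costs: the trainee holds the second pose one frame longer.
cost = [
    [0, 30, 30, 60],
    [30, 0, 0, 30],
    [60, 30, 30, 0],
]
distance, path = dtw(cost)
# distance is 0.0 and path is [(0, 0), (1, 1), (1, 2), (2, 3)]
```

The example shows the time-warping effect: the second trainer frame is matched to two consecutive trainee frames, so the delay adds no cost.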

DTW can be generalized from the one-dimensional case to the multi-dimensional case [34]. In the multi-dimensional case, *si* and *tj* are not single bone vectors but sets of bone vectors that represent whole-body motion. Multi-dimensional *DTW*(*S*, *T*) is calculated in the same way as in the one-dimensional case, except that *C*(*si*, *tj*) is redefined as the sum of the angle differences over all dimensions. The output of DTW is the matching cost, i.e., the cumulative distance along the optimal warping path. Therefore, the lower the matching cost, the closer the two motion sequences and the better the motion performance. To quantitatively assess the motion performance of a trainee, we further convert the DTW distance (matching cost) to a performance score expressed as a percentage (0–100%) using the following equation.

$$\text{Performance score} = \frac{\sum_{k=1}^{s} \left[1 - \frac{C(s_{ik}, t_{jk})}{90 \times 8}\right]}{s} = 1 - \frac{DTW(S, T)}{90 \times 8 \times s}, \tag{4}$$

where *s* is the length of the optimal warping path, 8 is the number of bone vectors selected for motion evaluation, *C*(*sik*, *tjk*) is the *k*th element of the optimal warping path in the cost matrix (the sum of the angle differences over the eight bone vectors), and the DTW distance *DTW*(*S*, *T*) is the sum of the elements *C*(*sik*, *tjk*) along the optimal path. Based on an earlier study [18], we assume that the angle difference between two corresponding bone vectors does not exceed 90 degrees, so the maximum value of *DTW*(*S*, *T*) along the optimal path is 90 × 8 × *s*. Because *DTW*(*S*, *T*) measures the dissimilarity between the two motion time series (the larger the distance, the greater the deviation), the last expression in Equation (4) yields a score between 0 and 100% that measures the similarity between the trainee's motion and the trainer's motion.
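A short numerical check of Equation (4) is sketched below. It assumes the per-step costs along the optimal warping path are already available as a list of summed angle differences in degrees; the function names and the cost values are hypothetical. The two functions compute the two sides of Equation (4) and should agree.

```python
def performance_score(path_costs, n_bones=8, max_angle_deg=90.0):
    """Right-hand form of Equation (4): 1 - DTW / (90 * 8 * s)."""
    s = len(path_costs)            # length of the optimal warping path
    dtw_distance = sum(path_costs)  # DTW(S, T)
    return 1.0 - dtw_distance / (max_angle_deg * n_bones * s)

def performance_score_stepwise(path_costs, n_bones=8, max_angle_deg=90.0):
    """Left-hand form of Equation (4): mean of the per-step scores."""
    per_step = [1.0 - c / (max_angle_deg * n_bones) for c in path_costs]
    return sum(per_step) / len(per_step)

# Hypothetical summed angle differences (degrees) along a path of length 4.
path_costs = [72.0, 36.0, 108.0, 72.0]
score = performance_score(path_costs)
print(f"{score:.1%}")  # 90.0%
```

Here the average deviation is 72 degrees per matched frame pair, or 9 degrees per bone, which is 10% of the assumed 90-degree maximum and therefore gives a score of 90%. The score stays within 0 to 100% only while the 90-degree assumption holds.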
