#### 2.4. Data Analysis

#### 2.4.1. Extracting and Converting Raw Data

The raw data were converted into a BVH file by the Axis Neuron software. BVH is a file format developed by the Biovision company to store skeleton hierarchy information and three-dimensional motion data [33]. The BVH file comprises two parts: one stores the skeleton hierarchy information and the other stores the motion information. The skeleton hierarchy information includes the connection relationships between joint points and the offsets of the child joint points from their parent skeleton points. In the skeleton hierarchy, the first skeleton point is defined as Root, the parent of all other skeleton points in the hierarchy. The motion information stores the global translation amount and the rotation amount of Root in each frame of the movement. The global translation amount is the position coordinate (X position, Y position, and Z position) in the world coordinate system, and the rotation amount is the rotation component (X rotation, Y rotation, and Z rotation) of the Euler angle [33]. The motion information of the other skeleton points records only the rotation amount relative to their parent points. The IMU used 17 sensors to measure motion data at 17 points of the body, and the recorded order of the rotation amount of each point is Z rotation, Y rotation, and X rotation. The skeleton hierarchy information of BVH on the IMU and the skeleton model are shown in Figure 3.

In the BVH file, the rotation data are recorded as Euler angles of the 17 skeleton points. Known issues with rotation data expressed as Euler angles (the gimbal lock and singularity problems) were overcome using quaternions [34]. A quaternion is a four-dimensional hyper-complex number over the real numbers that can represent rotations in three-dimensional vector space [35]. We used four-tuple notation to represent a quaternion as follows:

$$q = [w, x, y, z] \tag{1}$$

In this quaternion, *w* is the scalar component, and *x*, *y*, *z* are the components of the vector part.

Therefore, the rotation data from the BVH files were converted from Euler angles to quaternions. With the Euler rotation order *z*, *y*, *x*, we used α, β, γ to represent the rotation angles of the object around the *x*, *y*, and *z* axes, respectively. The corresponding quaternion can be obtained as follows:

$$q = \begin{bmatrix} w \\ x \\ y \\ z \end{bmatrix} = \begin{bmatrix} \cos(\gamma/2)\cos(\beta/2)\cos(\alpha/2) + \sin(\gamma/2)\sin(\beta/2)\sin(\alpha/2) \\ \cos(\gamma/2)\cos(\beta/2)\sin(\alpha/2) - \sin(\gamma/2)\sin(\beta/2)\cos(\alpha/2) \\ \cos(\gamma/2)\sin(\beta/2)\cos(\alpha/2) + \sin(\gamma/2)\cos(\beta/2)\sin(\alpha/2) \\ \sin(\gamma/2)\cos(\beta/2)\cos(\alpha/2) - \cos(\gamma/2)\sin(\beta/2)\sin(\alpha/2) \end{bmatrix} \tag{2}$$

**Figure 3.** The skeleton hierarchy information of BVH on the IMU (Perception Neuron 2.0): (**a**) A participant wearing the IMU (Perception Neuron 2.0) to measure the motion data; (**b**) The interface of Perception Neuron 2.0; and (**c**) The skeleton model of BVH file for Perception Neuron 2.0.
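To make the conversion concrete, Equation (2) can be sketched in a few lines of Python (a hypothetical helper, not part of the original pipeline; it assumes the angles are already in radians, whereas BVH files store degrees):

```python
import math

def euler_zyx_to_quaternion(alpha, beta, gamma):
    """Equation (2): convert Euler angles with rotation order z, y, x
    (alpha, beta, gamma about the x, y, z axes, in radians) to a unit
    quaternion [w, x, y, z]."""
    ca, sa = math.cos(alpha / 2), math.sin(alpha / 2)
    cb, sb = math.cos(beta / 2), math.sin(beta / 2)
    cg, sg = math.cos(gamma / 2), math.sin(gamma / 2)
    return [
        cg * cb * ca + sg * sb * sa,  # w
        cg * cb * sa - sg * sb * ca,  # x
        cg * sb * ca + sg * cb * sa,  # y
        sg * cb * ca - cg * sb * sa,  # z
    ]
```

For example, a pure rotation of π/2 about the *x* axis yields [cos(π/4), sin(π/4), 0, 0], as expected for a unit quaternion.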

#### 2.4.2. Extracting Key-Frames

After extracting the motion data, we used key-frames extraction to reduce the motion data. The large amount of motion data collected by Mocap may restrict its application because of the limited storage and bandwidth capacity available to users [36]. Key-frames extraction, which extracts a small number of representative key-frames from a long motion sequence, is widely used in motion analysis. This technology reduces the data amount, which facilitates data storage and subsequent data analysis [36,37].

#### Extraction of Key-Frames on Inter-Frame Pitch

We used the distance between quaternions to evaluate the inter-frame pitch between frames and set a threshold of inter-frame pitch to extract key-frames [38]. The method is based on the rotation data of each skeleton point, which is represented as a quaternion, and uses a simple form to evaluate the distance between two quaternions. The inter-frame pitch between two frames is assessed by the sum of the distances between the quaternions of every point. The process comprises three sections: calculating the distance between quaternions, calculating the inter-frame pitch between frames, and extracting key-frames on the set threshold of inter-frame pitch.

#### 1. The distance between quaternions

To evaluate the distance between two quaternions, the conjugate quaternion *q*\* of a quaternion is defined as follows:

$$q^\* = [w, -x, -y, -z] \tag{3}$$

and the quaternion norm ||*q*|| is defined as follows:

$$\|q\| = \sqrt{w^2 + x^2 + y^2 + z^2} \tag{4}$$

then:

$$\left\|q\right\|^2 = qq^\* = w^2 + x^2 + y^2 + z^2 \tag{5}$$

when a quaternion norm ||*q*|| is 1, which means:

$$w^2 + x^2 + y^2 + z^2 = 1\tag{6}$$

the quaternion is a unit quaternion. A quaternion is converted to a unit quaternion by dividing it by its norm.

From the definitions of conjugate quaternion, quaternion norm, and unit quaternion, we can define the inverse of a quaternion (*q*<sup>−1</sup>) as follows [39]:

$$q^{-1} = \frac{1}{\|q\|^2} q^\*, \quad \|q\| \neq 0 \tag{7}$$

According to Shunyi et al. [38], if *q*1 and *q*2 are two unit quaternions and:

$$q\_1 q\_2^{-1} = [w, \; x, \; y, \; z] \tag{8}$$

the distance between the quaternions *q*1 and *q*2 is:

$$d(q\_1, q\_2) = \arccos w \tag{9}$$

Therefore, we converted the rotation of each skeleton point from Euler angles into a quaternion, normalized it into a unit quaternion, and finally calculated the difference between any two quaternions of the point according to Equation (9).
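For unit quaternions, Equations (3)–(9) reduce to a short computation, since the inverse of a unit quaternion equals its conjugate and the scalar part of *q*1*q*2<sup>−1</sup> is then a dot product. A minimal sketch (hypothetical function name):

```python
import math

def quaternion_distance(q1, q2):
    """Equation (9): d(q1, q2) = arccos(w), where w is the scalar part
    of q1 * q2^{-1}. For unit quaternions, q2^{-1} is the conjugate of
    q2, so w reduces to the 4D dot product of q1 and q2."""
    w1, x1, y1, z1 = q1
    w2, x2, y2, z2 = q2
    # Scalar part of q1 * conjugate(q2)
    w = w1 * w2 + x1 * x2 + y1 * y2 + z1 * z2
    # Clamp to [-1, 1] for numerical safety before arccos
    return math.acos(max(-1.0, min(1.0, w)))
```

Identical quaternions yield a distance of zero; a rotation of angle θ about a single axis yields θ/2.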

#### 2. Calculation of Inter-Frame Pitch between Two Frames

We used the sum of the differences between the quaternions at the 17 skeleton points to evaluate the inter-frame pitch between two frames. The human motions represented by the BVH file are discrete-time vectors, and they remain so after conversion to quaternions [38]. Because of the tree structure (parent-child) of the BVH format, the weightage of different points needs to be taken into account when calculating the inter-frame pitch. Referring to the methods used in previous research [38,40] and the relationship structure between the skeleton points on the IMU in this study (see Figure 3), we assigned the weightage values of the 17 skeleton points as shown in Table 1.

If *t*1 and *t*2 are the two frames in a sequence of frames, we defined the inter-frame pitch between two frames: *t*1 and *t*2 as the following equation:

$$D(t\_1, t\_2) = \sum\_{i=1}^{n} W\_i d(q\_i(t\_1), q\_i(t\_2)) \tag{10}$$

In Equation (10), *n* represents the total number of skeleton points (*n* = 17), *Wi* represents the weightage of each skeleton point (shown in Table 1), and *qi* represents the quaternions of each skeleton point.


**Table 1.** The weightage of the 17 skeleton points.
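Equation (10) can be sketched as follows (hypothetical names; the weightage values themselves come from Table 1):

```python
import math

def inter_frame_pitch(frame1, frame2, weights):
    """Equation (10): inter-frame pitch D(t1, t2) as the weighted sum of
    quaternion distances d = arccos(w) over all 17 skeleton points.
    frame1, frame2: lists of unit quaternions [w, x, y, z] per point;
    weights: the per-point weightage values W_i."""
    total = 0.0
    for wgt, (w1, x1, y1, z1), (w2, x2, y2, z2) in zip(weights, frame1, frame2):
        # Scalar part of q1 * q2^{-1} (conjugate, for unit quaternions)
        dot = w1 * w2 + x1 * x2 + y1 * y2 + z1 * z2
        total += wgt * math.acos(max(-1.0, min(1.0, dot)))
    return total
```

Two identical frames give a pitch of zero, and rotating a single point by angle θ about one axis adds its weightage times θ/2.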

#### 3. Key-frames extraction on the set threshold of inter-frame pitch

Based on the inter-frame pitch between two frames, we defined: key\_frame as an array storing the quaternions corresponding to the key-frames of the motion; key\_num as a vector storing the serial numbers corresponding to the key-frames; key\_num1 as the time-series number corresponding to the first key-frame; and current\_key as the last frame in key\_num. λ is a preset threshold of inter-frame pitch, determined mainly by the demanded compression rate of frames. The algorithm steps are shown in Figure 4.

**Figure 4.** The algorithm steps of key-frames extraction.
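The steps of Figure 4 can be sketched as follows (a minimal reading of the algorithm, with hypothetical names: a frame becomes a key-frame when its inter-frame pitch from the last key-frame exceeds λ):

```python
import math

def inter_frame_pitch(f1, f2, weights):
    # Equation (10): weighted sum of quaternion distances arccos(w)
    total = 0.0
    for wgt, (w1, x1, y1, z1), (w2, x2, y2, z2) in zip(weights, f1, f2):
        dot = w1 * w2 + x1 * x2 + y1 * y2 + z1 * z2
        total += wgt * math.acos(max(-1.0, min(1.0, dot)))
    return total

def extract_key_frames(frames, weights, lam):
    """Threshold-based key-frame extraction: keep the first frame, then
    keep every frame whose inter-frame pitch from the last kept frame
    exceeds the preset threshold lam."""
    key_frame = [frames[0]]   # quaternions of the key-frames
    key_num = [0]             # serial numbers of the key-frames
    current_key = 0           # index of the last key-frame found
    for i in range(1, len(frames)):
        if inter_frame_pitch(frames[current_key], frames[i], weights) > lam:
            key_frame.append(frames[i])
            key_num.append(i)
            current_key = i
    return key_frame, key_num
```

A larger λ discards more frames and therefore gives a higher compression rate.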

#### 4. Motion reconstruction error

*Sensors* **2020**, *20*, 6258

The purpose of motion reconstruction is to rebuild the same number of frames as the original frames, based on interpolation of the non-key-frames between adjacent key-frames [38,41]. First, the position coordinates (in the world coordinate system) of each point were calculated from the point hierarchy and the relative rotation angles between the points in the BVH file. Second, given that *pt*1 and *pt*2 are the positions of a point in adjacent key-frames at times *t*1 and *t*2, the position *pt* of the point in a non-key-frame at time *t* is calculated by linear interpolation between *pt*1 and *pt*2 as follows [41]:

$$p\_t = \mu(t)p\_{t\_1} + (1 - \mu(t))p\_{t\_2}, \quad \mu(t) = \frac{t\_2 - t}{t\_2 - t\_1}, \quad t\_1 < t < t\_2 \tag{11}$$

The algorithm steps of motion reconstruction are shown in Figure 5.

**Figure 5.** The algorithm steps of motion reconstruction.
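Equation (11) in code form (hypothetical name; positions are [x, y, z] world coordinates):

```python
def interpolate_position(p_t1, p_t2, t1, t2, t):
    """Equation (11): linearly interpolate the position of a point in a
    non-key-frame at time t between its positions in the adjacent
    key-frames at times t1 and t2."""
    mu = (t2 - t) / (t2 - t1)  # weight of the earlier key-frame
    return [mu * a + (1.0 - mu) * b for a, b in zip(p_t1, p_t2)]
```

At *t* = *t*1 this returns *pt*1 exactly, and at the midpoint of [*t*1, *t*2] it returns the average of the two key-frame positions.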

In this study, we used the position error of the human posture to calculate the reconstruction error between the reconstructed frames and the original frames [38]. Assuming *m*1 is the original motion sequence and *m*2 is the motion sequence reconstructed from the key-frames, the reconstruction error *E*(*m*1, *m*2) is evaluated as [42]:

$$E(m\_1, m\_2) = \frac{1}{n} \sum\_{i=1}^{n} D(p\_1^i - p\_2^i) \tag{12}$$

The position error of the human posture is measured by the distance between postures:

$$D(p\_1^i - p\_2^i) = \sum\_{k=1}^m \left\| p\_{1,k}^i - p\_{2,k}^i \right\|^2 \tag{13}$$

In this equation, *m* represents the total number of skeleton points, *pi*1,*k* is the position of point *k* in frame *i* of the original motion sequence, and *pi*2,*k* is the position of point *k* in frame *i* of the reconstructed sequence.
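Equations (12) and (13) combine into a few lines (hypothetical names; each sequence is a list of frames, and each frame a list of [x, y, z] skeleton-point positions):

```python
def posture_distance(frame1, frame2):
    # Equation (13): sum of squared point-wise Euclidean distances
    return sum(sum((a - b) ** 2 for a, b in zip(p1, p2))
               for p1, p2 in zip(frame1, frame2))

def reconstruction_error(m1, m2):
    # Equation (12): mean posture distance over all n frames
    return sum(posture_distance(f1, f2) for f1, f2 in zip(m1, m2)) / len(m1)
```

An exact reconstruction gives an error of zero; displacing one point by one unit in one of two frames gives an error of 0.5.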

#### Extraction of Key-Frames on Clustering

A problem with key-frames extraction on inter-frame pitch is that the compression rate obtained with the same inter-frame threshold may vary considerably across different actions [40]. As the eight motions of Baduanjin are quite different, key-frames extraction on the inter-frame pitch may compress some motions too much and others not enough. Therefore, we also chose another way to extract key-frames, based on clustering. This method extracts key-frames at a pre-set compression rate [43].

#### 1. K-means clustering algorithm

The K-means clustering algorithm is an iterative partition clustering algorithm. In this key-frame extraction method, we used the K-means clustering algorithm to cluster the 3D coordinates ([*x*, *y*, *z*]) of the skeleton points in the original frames. Assuming that the total length of the original frames is *N*, *i* represents the *i*-th frame in *N*, and *pi* is the vector of the 3D coordinate positions of all relevant skeleton points of the *i*-th frame. Therefore, the collection of vectors of the 3D coordinate data of every point of the original frames is (*p*1, *p*2, ... , *p<sup>i</sup>*), *pi* ∈ *RN*. According to the K-means clustering algorithm, the data of the skeleton points (*R<sup>N</sup>*) in the frames are clustered into *K* (*K* ≤ *N*) clusters as follows [44]:

Step 1: Randomly select *K* cluster centroids *u*1, *u*2 ... *uK* from *R<sup>N</sup>*;

Step 2: Repeat the following process until convergence.

For the *pi* corresponding to one frame, we calculated the distances from each cluster centroid (*uj*, *j* ∈ *K*) and classified it into the class corresponding to the minimum distance [45]:

$$D = \operatorname\*{argmin} \sum\_{i=1}^{N} \sum\_{j=1}^{K} \left\| p^i - u\_j \right\|^2 \tag{14}$$

In this equation, *D* represents the minimum distance between the cluster centroids and *pi*; when the distance to centroid *j* is the smallest, *pi* is classified into class *j*.

For each class *j*, the cluster centroid (*uj*) of that class was recalculated:

$$u\_j = \frac{\sum\_{i=1}^{N} r\_{ij} p^i}{\sum\_{i=1}^{N} r\_{ij}} \tag{15}$$

In this equation, *rij* indicates that when *pi* is classified as *j*, it is 1; otherwise, it is 0.
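Steps 1–2 correspond to the standard K-means loop. A minimal pure-Python sketch (hypothetical names; convergence is detected simply by the centroids repeating exactly):

```python
import random

def kmeans(points, k, iters=100, seed=0):
    """Minimal K-means (Equations (14)-(15)): cluster flattened frame
    vectors and return the k cluster centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        # Assignment step (Eq. 14): nearest centroid by squared distance
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2
                                      for a, b in zip(p, centroids[c])))
            clusters[j].append(p)
        # Update step (Eq. 15): centroid = mean of its assigned points
        new_centroids = []
        for j in range(k):
            if clusters[j]:
                new_centroids.append([sum(col) / len(clusters[j])
                                      for col in zip(*clusters[j])])
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids
```

In practice a library implementation (e.g., scikit-learn's KMeans) would be used; the sketch only illustrates the assignment and update steps.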

#### 2. Key-frames extraction

Using the above K-means clustering algorithm, we extracted *K* cluster centroids from the original frames. Each cluster is clustered from the 3D coordinates of the 17 points in the original frames; therefore, one cluster centroid is constructed with 51 (17 × 3) components. Based on these cluster centroids, we extracted the key-frames by calculating the Euclidean distance between each point of a cluster centroid and the corresponding point coordinates in the original frames. The steps to extract key-frames are as follows:

Start

> Input the 3D coordinate data of every point of the original frames:

$$\begin{aligned} &\{p^1, p^2 \dots p^i\}, \quad p^i \in \mathbb{R}^N; \\ &p^i = (p^i\_1, p^i\_2 \dots p^i\_j), \; j = 17; \\ &p^i\_j = [x^i\_j, y^i\_j, z^i\_j] \end{aligned} \tag{16}$$

and the number of key-frames to be extracted is *K*;

Step 1: Use the K-means clustering algorithm to calculate the cluster centroids of the *K* clusters, expressed as:

$$\begin{aligned} u\_m &= (u\_{m1}, u\_{m2} \dots u\_{mj}), \; m \in (1, 2, 3 \dots K), \; j = 17; \\ u\_{mj} &= [x\_{mj}, y\_{mj}, z\_{mj}] \end{aligned} \tag{17}$$

Step 2: Calculate the Euclidean distance of 3D coordinates between each point of the cluster and the corresponding point of the original frames:

$$\begin{aligned} C\_m &= \min(u\_m, p^i) = \sum\_{j=1}^{17} \min(\mathrm{dis}(u\_{mj}, p^i\_j)); \\ \mathrm{dis}(u\_{mj}, p^i\_j) &= \left\| u\_{mj} - p^i\_j \right\|^2 \end{aligned} \tag{18}$$

min(*dis*(*umj*, *pi j*)) means that, after calculating the distances between cluster *m* and all original frames, the point *j* of the frame *pi* whose *dis*(*umj*, *pi j*) value is minimum is recorded as 1; otherwise, it is recorded as 0. The *i* of the *pi* corresponding to the maximum value of *Cm* is the sequence number of a key-frame.

Step 3: After extraction, the sequence numbers of the key-frames are arranged from small to large. If the first frame and the last frame of the original frames are not included in the key-frames, they must be added.

End
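As a simplified reading of Steps 2–3 (hypothetical names; for each centroid we keep the original frame nearest to it overall, rather than reproducing the per-point vote of Equation (18)):

```python
def select_key_frames(frames, centroids):
    """For each cluster centroid, pick the original frame whose 17
    skeleton-point positions are closest to it, then sort the sequence
    numbers and ensure the first and last frames are included (Step 3).
    frames[i] and centroids[m] are lists of [x, y, z] points."""
    def dis(a, b):
        # Squared Euclidean distance between two 3D points
        return sum((x - y) ** 2 for x, y in zip(a, b))

    key_nums = set()
    for centroid in centroids:
        best = min(range(len(frames)),
                   key=lambda i: sum(dis(cj, pj)
                                     for cj, pj in zip(centroid, frames[i])))
        key_nums.add(best)
    key_nums.update({0, len(frames) - 1})  # Step 3: first and last frames
    return sorted(key_nums)
```

The number of key-frames is bounded by the number of centroids *K* (plus, at most, the first and last frames).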

In this key-frames extraction, the number of key-frames can be preset. The key-frames at the corresponding compression rate are obtained by presetting the compression rate as follows [42]:

$$K = c\_{rate} \times N \tag{19}$$

where *K* is the number of key-frames to be extracted, *c\_rate* is the compression rate of the key-frame to be obtained, and *N* is the total number of original frames.

After extracting the key-frames, we performed motion reconstruction and evaluated the reconstruction error in the same way as described above.

#### 2.4.3. Evaluating the Accuracy of Motion Data

In this study, we referred to previous studies [13,46] to evaluate the motion accuracy of the students by assessing the differences between the students' motions and the teacher's motions. Because individual movements differ in speed, different time series had to be accounted for when assessing the difference between two motions. We chose DTW, a well-established method that accounts for different time series, to evaluate the difference in the motions between the teacher and the students [47]. Compared with other methods, i.e., HMM and SAX, DTW requires no training stage and therefore takes less time. First, the derived quaternions were normalized to unit length: a quaternion *q* = [*w*, *x*, *y*, *z*] then satisfies ||*q*|| = 1 and *w*<sup>2</sup> + *x*<sup>2</sup> + *y*<sup>2</sup> + *z*<sup>2</sup> = 1. Therefore, three components (*x*, *y*, *z*) of the four components (*w*, *x*, *y*, *z*) of a quaternion suffice to represent the rotations of the skeleton points over the temporal domain. Then, we used DTW to evaluate the difference between two sequences of motions on the skeleton points, starting with the difference between two motions at a single skeleton point. For example, consider two quaternion sequences for a skeleton point, one from the teacher, *qtea*(*t*), and one from a student, *qstu*(*t*), with lengths *n* and *m*:

$$\begin{array}{l} q\_{tea}(t) = q\_{tea}(1), q\_{tea}(2), \dots, q\_{tea}(i), \dots, q\_{tea}(n) \\ q\_{stu}(t) = q\_{stu}(1), q\_{stu}(2), \dots, q\_{stu}(j), \dots, q\_{stu}(m) \end{array} \tag{20}$$

Each vector in the quaternion arrays consists of the three components (*x*, *y*, *z*) of a quaternion. A distance matrix (*n* × *m*) is constructed to align the quaternions of the two sequences. The element (*i*, *j*) of the matrix is the squared Euclidean distance *dis*(*qtea*(*i*), *qstu*(*j*)) between the two points *qtea*(*i*) and *qstu*(*j*):

$$\mathrm{dis}(q\_{tea}(i), q\_{stu}(j)) = \left\| q\_{tea}(i) - q\_{stu}(j) \right\|^2 \tag{21}$$


In the distance matrix, there are many paths from the upper-left corner to the lower-right corner. We used Φ*k* to represent any point on such a path: Φ*k* = (Φ*tea*(*k*), Φ*stu*(*k*)), where:

Φ*tea*(*k*): the value of *k* is 1, 2, ... , *n*,

Φ*stu*(*k*): the value of *k* is 1, 2, ... , *m*,

Φ*k*, the value of *k* is 1, 2, ... , *T*, where *T* is the length of the path (max(*n*, *m*) ≤ *T* ≤ *n* + *m* − 1)

We found a suitable path as the warping path, where the cumulative distance of path is the smallest of all paths [39]:

$$DTW(q\_{tea}(t), q\_{stu}(t)) = \min \sum\_{k=1}^{T} \mathrm{dis}(\Phi\_{tea}(k), \Phi\_{stu}(k)) \tag{22}$$

Then, the distance *DTW*(*qtea*(*t*), *qstu*(*t*)) is obtained through dynamic programming as follows [47]:

$$\begin{aligned} DTW(q\_{tea}(t), q\_{stu}(t)) &= f(n, m); \\ f(0, 0) &= 0; \\ f(i, 0) &= f(0, j) = \infty; \\ f(i, j) &= \mathrm{dis}(q\_{tea}(i), q\_{stu}(j)) + \min\{f(i-1, j), f(i, j-1), f(i-1, j-1)\}, \\ &\quad (i = 1, 2, \dots, n; \; j = 1, 2, \dots, m) \end{aligned} \tag{23}$$

To prevent incorrect matching caused by excessive time warping, the warping path was constrained near the diagonal of the matrix by setting a global warping window for DTW [48,49]. In this study, the global warping window was set to 10 percent of the entire window span: 0.1 × max(*n*, *m*). The cumulative distance of the warping path, shown in Equation (22), represents the rotation difference between teacher and student at a skeleton point. Then, the macro difference between the students' motions and the teacher's motions was evaluated by taking the average of the cumulative distances over all the skeleton points as follows:
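Equation (23), together with the 10% warping window, can be sketched as follows (hypothetical name; the sequences are lists of the (*x*, *y*, *z*) component vectors):

```python
import math

def dtw_distance(seq_tea, seq_stu, window_frac=0.1):
    """DTW per Equation (23) with a Sakoe-Chiba style warping window of
    window_frac * max(n, m), as described in the text. Cell distances
    are the squared Euclidean distances of Equation (21)."""
    n, m = len(seq_tea), len(seq_stu)
    # The window must be at least |n - m| so the corner stays reachable
    w = max(int(window_frac * max(n, m)), abs(n - m))
    f = [[math.inf] * (m + 1) for _ in range(n + 1)]
    f[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            d = sum((a - b) ** 2 for a, b in zip(seq_tea[i - 1], seq_stu[j - 1]))
            f[i][j] = d + min(f[i - 1][j], f[i][j - 1], f[i - 1][j - 1])
    return f[n][m]
```

Identical sequences give a cumulative distance of zero, and the window keeps the path near the matrix diagonal.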

$$D(m\_{tea}, m\_{stu}) = \frac{\sum\_{i=1}^{n} DTW(q\_{tea}^{i}, q\_{stu}^{i})}{n} \tag{24}$$

In this equation, *m*tea represents the teacher's motion sequence, *m*stu represents a student's motion sequence, *qi* is the vector of quaternions of skeleton point *i* in the two motion sequences, and the total number of skeleton points is *n*.

Finally, the difference data were analysed using IBM SPSS Statistics 25.0 to assess whether there were significant differences in the motion accuracy of the two groups of students (novice and senior students), both overall and at each point. We used the independent-samples t-test for normally distributed data and the Mann–Whitney U test for non-normally distributed data.
