#### 2.2.3. Motion Reconstruction from Motion Capture Data

Optical motion capture records the motion of entities (markers) that are external to the body, and the objective is to use the marker data to determine the positions and orientations of the body segments. The traditional approach is the method described by Vaughan [37]: (i) select three non-collinear entities, which can be either markers or already located joints, within each body segment; (ii) define an orthogonal reference frame for the corresponding segment, based on the three selected entities; (iii) use correlation equations, based on archived anthropometric data and body measurements, to estimate the position and orientation of the body segment. When applying this method, the marker trajectories are first smoothed with a low-pass filter (a forward-backward 2nd-order Butterworth filter), whose cutoff frequency must be selected by the analyst.

Another commonly used approach is to solve a weighted optimization problem, in order to fit the skeletal model to the measured markers, as done in the OpenSim software [40]. The fitting is carried out in two steps. First, a reference skeletal model, with virtual markers fixed to the anatomical points, is scaled in order to match the markers from a static capture, taken in a reference pose. Then, a second optimization problem finds the positions and orientations of the scaled body segments that best track the motion capture data. This last optimization uses an independent set of positions and orientations as design variables, which can be filtered and differentiated afterwards to find velocities and accelerations.
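The second step is essentially a weighted least-squares fit of the model pose to each frame of marker data. The following sketch poses that fit generically with SciPy; it is not OpenSim's implementation or API, and the `model_markers` callable and the weights are hypothetical placeholders.

```python
import numpy as np
from scipy.optimize import least_squares

def track_frame(q0, measured, weights, model_markers):
    """Fit the pose q of a scaled model to one frame of measured markers.

    q0            : initial guess for the generalized coordinates
    measured      : (n_markers, 3) measured marker positions
    weights       : (n_markers,) per-marker tracking weights
    model_markers : callable q -> (n_markers, 3) virtual marker positions of the
                    scaled model (hypothetical stand-in for the skeletal model)
    """
    def residuals(q):
        err = model_markers(q) - measured                  # per-marker error vectors
        return (np.sqrt(weights)[:, None] * err).ravel()   # weighted, stacked residuals

    return least_squares(residuals, q0).x                  # pose minimizing the weighted error
```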

These methods present important drawbacks. In the first method, the local frames are obtained directly from the markers, which are not rigidly attached to the bones, so the obtained skeletal motion is not consistent with the rigid body constraints, i.e., the distances between joints do not remain constant. This can be addressed by enforcing kinematic consistency in a post-processing stage [41]. Another problem, common to both methods, is that they require clean capture data: marker trajectories must not contain any gaps, and the markers need to be properly labeled at every time step, something that is not always guaranteed by the motion capture system. Fixing this problem usually involves some manual gap filling and marker labeling [20] and, in cases of severe marker loss, it may not even be possible to salvage a take. The main consequence of these drawbacks is that it is impossible to know whether a motion capture take has been successful until all post-processing has been carried out.

#### 2.2.4. Extended Kalman Filter for Motion Reconstruction

In order to overcome these drawbacks, a motion capture algorithm based on the extended Kalman filter has been developed. The filter uses a purely kinematic model for the plant, while the markers act as position sensors. The kinematic model mostly coincides with that described in Figure 4 but, in order to avoid the need for additional markers, the spherical joint at the base of the neck has been replaced with a universal joint, and the metatarsophalangeal (toe) joints are modeled here as revolute pairs. Therefore, the kinematic model used in the Kalman filter has 52 degrees of freedom instead of 57. Since the Kalman filter requires independent state variables, the position of the model must be defined by a set of independent coordinates, including 3 base body translations (pelvis), 2 relative angles at the toes, 2 relative angles at the base of the neck, and 45 Euler angles representing the absolute orientation of the remaining bodies.

The Kalman filter is based on a discrete white noise acceleration model (DWNA) [42], in which the plant is considered as a discrete-time state-space system,

$$\mathbf{x}\_{k+1} = \mathbf{F}\mathbf{x}\_k + \mathbf{\Gamma}\mathbf{a}\_k \tag{4}$$

where $\mathbf{x}\_{k+1}$ and $\mathbf{x}\_k$ are the state vectors at time instants *k* + 1 and *k* respectively, **F** is the state propagation matrix, $\mathbf{a}\_k$ is the process noise vector, and **Γ** is the noise gain matrix. The DWNA is a second-order kinematic model, so the state vector contains the 52 degrees of freedom, **q**, along with their first time derivatives, $\dot{\mathbf{q}}$. Accelerations are introduced in the system through the process noise vector **a**. This vector contains the 52 independent accelerations, each of them a discrete-time zero-mean white sequence. Therefore, they are assumed to be constant along every time step, and their values are random variables with a zero-mean normal distribution of variance $\sigma\_a^2$. This variance has dimensions of squared acceleration for the translational DOFs, and squared angular acceleration for the angular ones. In order to reduce the number of parameters, in this work the same numerical value is used for all of them.

Taking into account that accelerations are assumed to remain constant along each time step, the state transition in Equation (4) particularized to any given DOF *i* is,

$$
\begin{bmatrix} q\_{k+1}^i \\ \dot{q}\_{k+1}^i \end{bmatrix} = \begin{bmatrix} 1 & \Delta t \\ 0 & 1 \end{bmatrix} \begin{bmatrix} q\_k^i \\ \dot{q}\_k^i \end{bmatrix} + \begin{bmatrix} \frac{1}{2}\Delta t^2 \\ \Delta t \end{bmatrix} a\_k^i \qquad i = 1, \dots, 52 \tag{5}
$$

where Δ*t* is the sampling period, which is fixed to 10 ms (the motion capture cameras used in this work have a maximum frame rate of 100 Hz). As can be seen in Equation (5), the state propagation and noise gain matrices defined for each DOF only depend on the time step Δ*t*, so they are constant and equal for all of them. Therefore, matrices **F** and **Γ** for the whole system are the result of assembling these individual matrices, following the structure of the state vector **x**.

The process noise covariance matrix $\mathbf{Q}^i$ also has the same form for all DOFs [42],

$$\mathbf{Q}^{i} = \begin{bmatrix} \frac{1}{4}\Delta t^{4} & \frac{1}{2}\Delta t^{3} \\ \frac{1}{2}\Delta t^{3} & \Delta t^{2} \end{bmatrix} \sigma\_{a}^{2} \tag{6}$$

The **Q** matrix for the whole system is the result of assembling these individual matrices, as done for the state transition and noise gain matrices.
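As a minimal sketch, the per-DOF matrices of Equations (5) and (6) could be assembled into the system matrices as follows; the interleaved state ordering (position and velocity of each DOF kept together) is an assumption made here for simplicity, not necessarily the ordering used in the actual implementation.

```python
import numpy as np
from scipy.linalg import block_diag

def dwna_matrices(dt=0.01, n_dof=52, sigma_a2=1.0):
    """Assemble F, Gamma and Q for the whole system from the per-DOF DWNA blocks.

    Assumes the state is ordered as [q^1, qdot^1, q^2, qdot^2, ...], so the global
    matrices are simply block diagonal; sigma_a2 is the acceleration variance.
    """
    F_i = np.array([[1.0, dt],
                    [0.0, 1.0]])                        # per-DOF state propagation, Eq. (5)
    G_i = np.array([[0.5 * dt**2],
                    [dt]])                              # per-DOF noise gain, Eq. (5)
    Q_i = np.array([[0.25 * dt**4, 0.5 * dt**3],
                    [0.5 * dt**3,  dt**2]]) * sigma_a2  # per-DOF covariance, Eq. (6)

    F = block_diag(*[F_i] * n_dof)
    G = block_diag(*[G_i] * n_dof)
    Q = block_diag(*[Q_i] * n_dof)
    return F, G, Q
```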

The observation function **h**(**x**) provides the observation vector **z**, which in this case contains the absolute *x*, *y* and *z* coordinates of the 36 optical markers, as a function of the state vector **x**,

$$\mathbf{z}\_k = \mathbf{h}(\mathbf{x}\_k) + \mathbf{w}\_k \tag{7}$$

The additive term $\mathbf{w}\_k$ represents the noise introduced by the motion capture system, along with the skin motion artifact. Since the latter is correlated with the skeletal motion, modeling the sensor noise as a random variable following a Gaussian distribution is not strictly correct, so the Kalman filter will not be optimal. All sensors are considered independent and equally affected by noise, so the observation noise covariance matrix **R** is a diagonal matrix whose diagonal elements are all equal to the sensor noise variance $\sigma\_s^2$, which has dimensions of squared length.

In order to compute the absolute marker positions from the system states, a recursive kinematic relationship can be established, as shown in Figure 5. The absolute position $\mathbf{z}\_i$ of a marker *i*, which is attached to body *b* (right hand in Figure 5), can be obtained from the following recursive relationships,

$$\begin{aligned} \mathbf{z}\_{i} &= \mathbf{r}\_{b} + \mathbf{A}\_{b}\overline{\mathbf{m}}\_{i} \\ \mathbf{r}\_{b} &= \mathbf{r}\_{b-1} + \mathbf{A}\_{b-1}\overline{\mathbf{r}}\_{b} \end{aligned} \tag{8}$$

where $\mathbf{r}\_b$ is the absolute position of the proximal joint of body *b*, $\mathbf{A}\_b$ is the rotation matrix of the same body, which depends on its three orientation angles, and $\overline{\mathbf{m}}\_i$ is the local position vector of marker *i* within the local frame of body *b*. In the observation model, the markers are considered as rigidly attached to the skeleton, so $\overline{\mathbf{m}}\_i$ is a constant vector. The vector $\mathbf{r}\_b$ itself can be obtained in a recursive way from the position vector $\mathbf{r}\_{b-1}$ and orientation matrix $\mathbf{A}\_{b-1}$ of the preceding body in the kinematic chain, knowing that $\overline{\mathbf{r}}\_b$, the position of the proximal joint of body *b* in the local frame of body *b* − 1, is considered constant due to the rigid body assumption. This recursive process starts at the pelvis, whose position vector is contained directly in **q** and, consequently, in **x**.
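A minimal sketch of this recursion is given below; the data layout (parent indices, local joint and marker vectors, per-body rotation matrices already evaluated from the state) is a hypothetical arrangement chosen for illustration, not the authors' implementation.

```python
import numpy as np

def observe_markers(r_pelvis, A, parent, r_local, marker_body, m_local):
    """Evaluate h(x): absolute marker positions via the recursion of Eq. (8).

    r_pelvis    : (3,) absolute position of the base body (pelvis)
    A           : list of (3, 3) rotation matrices A_b, one per body, already
                  evaluated from the orientation angles contained in the state
    parent      : parent[b] is the index of the preceding body in the chain
                  (bodies are assumed sorted so that parents come first)
    r_local     : r_local[b] is the constant local position of the proximal joint
                  of body b expressed in the frame of its parent
    marker_body : marker_body[i] is the body index each marker is attached to
    m_local     : m_local[i] is the constant local marker position in that body
    """
    r = np.zeros((len(A), 3))
    r[0] = r_pelvis
    for b in range(1, len(A)):
        p = parent[b]
        r[b] = r[p] + A[p] @ r_local[b]               # r_b = r_(b-1) + A_(b-1) rbar_b
    z = np.array([r[b] + A[b] @ m_local[i]            # z_i = r_b + A_b mbar_i
                  for i, b in enumerate(marker_body)])
    return z.ravel()                                  # stacked x, y, z of all markers
```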

**Figure 5.** Kinematic description of the observation function h(x).

The local position vectors $\overline{\mathbf{r}}\_b$ and $\overline{\mathbf{m}}\_i$ must be scaled prior to running the Kalman filter, so that the model adjusts to the experimental data. This is performed by solving, at a reference pose, a nonlinear least squares optimization problem, in which the design variables are a set of scale factors **k** and the skeletal degrees of freedom **q**, and the objective function is the quadratic error between measured and estimated marker positions,

$$\min\_{\mathbf{q}, \mathbf{k}} f(\mathbf{q}, \mathbf{k}) = \left[ \mathbf{h}\_a(\mathbf{x}, \mathbf{k}) - \mathbf{z} \right]^{T} \left[ \mathbf{h}\_a(\mathbf{x}, \mathbf{k}) - \mathbf{z} \right] \tag{9}$$

where $\mathbf{h}\_a(\mathbf{x}, \mathbf{k})$ is an augmented version of the observation function that also takes the scale factors as input variables. The resulting scale factors are then used to scale the $\overline{\mathbf{r}}\_b$ and $\overline{\mathbf{m}}\_i$ vectors that will be used in **h**(**x**). It has been found that the Levenberg-Marquardt algorithm works very well for this problem, converging robustly even from rough initial estimates.
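A minimal sketch of this scaling problem, using SciPy's Levenberg-Marquardt solver; `h_augmented` is a hypothetical callable standing in for the augmented observation function of Equation (9).

```python
import numpy as np
from scipy.optimize import least_squares

def scale_model(q0, k0, z_static, h_augmented):
    """Solve Eq. (9): fit the DOFs q and the scale factors k to a static capture.

    q0, k0      : initial guesses for the DOFs and the segment scale factors
    z_static    : stacked measured marker coordinates at the reference pose
    h_augmented : callable (q, k) -> stacked estimated marker coordinates
                  (hypothetical augmented observation function h_a)
    """
    n_q = len(q0)

    def residuals(v):
        q, k = v[:n_q], v[n_q:]
        return h_augmented(q, k) - z_static           # marker position error

    sol = least_squares(residuals, np.concatenate([q0, k0]), method="lm")
    return sol.x[:n_q], sol.x[n_q:]                   # fitted pose and scale factors
```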

The Kalman filter algorithm follows a recursive predictor-corrector scheme. It uses the current estimate of the state vector, $\hat{\mathbf{x}}\_k$, along with the sensor measurements, $\mathbf{z}\_{k+1}$, to obtain an optimal estimate $\hat{\mathbf{x}}\_{k+1}$ at the next time step. In the predictor stage, the state estimate is updated by means of the state transition matrix **F**, leading to the so-called *a priori* estimate $\hat{\mathbf{x}}\_{k+1}^-$. The estimate covariance matrix **P** is updated accordingly, by using matrices **F** and **Q**,

$$\begin{aligned} \hat{\mathbf{x}}\_{k+1}^{-} &= \mathbf{F} \hat{\mathbf{x}}\_{k} \\ \mathbf{P}\_{k+1}^{-} &= \mathbf{F} \mathbf{P}\_{k} \mathbf{F}^{T} + \mathbf{Q} \end{aligned} \tag{10}$$

The state estimate at the first time step, $\hat{\mathbf{x}}\_0$, contains the initial independent positions, $\mathbf{q}\_0$, along with the corresponding velocities, $\dot{\mathbf{q}}\_0$. The positions are obtained after solving the initial marker labeling problem, which will be described later. The initial velocities are unknown, as is the value of $\mathbf{P}\_0$, so they are both set to zero, but they converge quickly to their correct values after a short transient.

The corrector stage uses the sensor measurements $\mathbf{z}\_{k+1}$ to find the optimal *a posteriori* estimate $\hat{\mathbf{x}}\_{k+1}$, as well as its corresponding covariance matrix, $\mathbf{P}\_{k+1}$,

$$\begin{aligned} \mathbf{K}\_{k+1} &= \mathbf{P}\_{k+1}^{-} \mathbf{H}\_{k+1}^{T} \left( \mathbf{H}\_{k+1} \mathbf{P}\_{k+1}^{-} \mathbf{H}\_{k+1}^{T} + \mathbf{R} \right)^{-1} \\ \hat{\mathbf{x}}\_{k+1} &= \hat{\mathbf{x}}\_{k+1}^{-} + \mathbf{K}\_{k+1} \left[ \mathbf{z}\_{k+1} - \mathbf{h} (\hat{\mathbf{x}}\_{k+1}^{-}) \right] \\ \mathbf{P}\_{k+1} &= (\mathbf{I} - \mathbf{K}\_{k+1} \mathbf{H}\_{k+1}) \mathbf{P}\_{k+1}^{-} \end{aligned} \tag{11}$$

In these equations, $\mathbf{H}\_{k+1}$ is the Jacobian matrix of the observation function, evaluated at $\hat{\mathbf{x}}\_{k+1}^-$. This matrix can be computed very efficiently due to the recursive nature of **h**(**x**). Moreover, it is quite sparse, due to the use of absolute angles as state variables: most rotation matrices depend on only three angles, which greatly simplifies their derivatives. In addition, the gradient used in the scale optimization problem shown in Equation (9) mostly coincides with this matrix, which is very convenient from the implementation point of view.
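A minimal sketch of one full predictor-corrector step, combining Equations (10) and (11); `h` and `H_jacobian` are hypothetical callables for the observation function and its Jacobian.

```python
import numpy as np

def ekf_step(x_hat, P, z, F, Q, R, h, H_jacobian):
    """One predictor-corrector step of the EKF, Eqs. (10) and (11)."""
    # Predictor: a priori state and covariance, Eq. (10)
    x_pred = F @ x_hat
    P_pred = F @ P @ F.T + Q

    # Corrector: Kalman gain and a posteriori estimates, Eq. (11)
    H = H_jacobian(x_pred)                     # Jacobian of h at the a priori estimate
    S = H @ P_pred @ H.T + R                   # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)        # Kalman gain (explicit inverse for clarity)
    x_new = x_pred + K @ (z - h(x_pred))
    P_new = (np.eye(len(x_hat)) - K @ H) @ P_pred
    return x_new, P_new
```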

Since this algorithm is recursive and each step can be evaluated very efficiently, it can be used for real-time motion reconstruction and visualization, as opposed to the previously mentioned methods, which provide skeletal motion after post-processing the captured data. In order to achieve on-the-fly motion reconstruction, the marker labeling and occlusion issues must be addressed.

The problem of initial marker labeling is addressed by using a simple heuristic method to identify the markers. The procedure consists of checking their relative positions at the initial time step, according to a reference pose. Then, the same Levenberg-Marquardt optimization algorithm previously used for scaling the model is used here to fit the DOFs to the measured markers. This time, the objective function uses the regular observation function **h**(**x**) with the scale factors already applied to the local position vectors, so only the positions $\mathbf{q}\_0$ are considered as design variables. If the objective function value after the optimization (i.e., the fitting error) is below a certain threshold, the marker order is considered valid, and the iterative process of the Kalman filter can begin.
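A minimal sketch of this pose-only fit and the acceptance test on the fitting error; the threshold value is illustrative and `h` is a hypothetical callable for the scaled observation function.

```python
import numpy as np
from scipy.optimize import least_squares

def initial_pose_fit(q0, z0, h, error_threshold=1e-3):
    """Fit only the DOFs to the initially labeled markers (scales already applied).

    q0              : initial guess for the DOFs (e.g., the reference pose)
    z0              : stacked measured marker coordinates at the first frame
    h               : callable q -> stacked estimated marker coordinates
    error_threshold : maximum admissible fitting error (illustrative value)
    """
    sol = least_squares(lambda q: h(q) - z0, q0, method="lm")
    fitting_error = np.sum(sol.fun ** 2)              # quadratic marker error after the fit
    labeling_ok = fitting_error < error_threshold     # accept the marker order if small enough
    return sol.x, labeling_ok
```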

During the execution of the Kalman filter, marker labeling must be carried out on the fly, between the predictor in Equation (10) and the corrector in Equation (11). This is because, for several reasons, the raw measurement vector $\mathbf{z}\_{k+1}^r$ obtained from the cameras cannot be used directly within the corrector. First, the markers are provided as an unsorted list by the cameras. Second, some markers may be missing due to occlusions. Third, other bright objects present during the motion capture can be incorrectly identified as markers. Therefore, the raw measurement $\mathbf{z}\_{k+1}^r$ must be correctly labeled and sorted, the missing markers need to be identified, and all spurious markers have to be discarded, in order to get the "clean" measurement vector $\mathbf{z}\_{k+1}$. After the Kalman filter predictor has computed the a priori state estimate $\hat{\mathbf{x}}\_{k+1}^-$, the observation function **h**(**x**) is evaluated at that point to obtain the corresponding set of estimated marker positions $\hat{\mathbf{z}}\_{k+1}$. Ideally, these estimated markers would coincide with the measured ones $\mathbf{z}\_{k+1}^r$, and this fact can be used to identify the measured markers by using a simple nearest-neighbor approach. First, a matrix of squared cross-distances **D** is built, such that

$$D\_{ij} = \left(\hat{\mathbf{z}}\_i - \mathbf{z}\_j^r\right)^T \left(\hat{\mathbf{z}}\_i - \mathbf{z}\_j^r\right) \tag{12}$$

where $\hat{\mathbf{z}}\_i$ contains the estimated *x*, *y* and *z* coordinates of marker *i*, and $\mathbf{z}\_j^r$ is the position vector of measured marker *j*. By setting a maximum search distance, estimated markers that do not have a measured one close enough are considered as missing, and the remaining ones are assigned to their closest measured counterparts. Any markers from $\mathbf{z}\_{k+1}^r$ that remain unassigned, after all estimated markers have been either paired to their measured counterparts or marked as missing, are regarded as spurious and discarded. In order to avoid resizing vectors and matrices at runtime, missing markers are set to zero in **z**, and the same is done to their corresponding rows in **H**, so they do not affect the correction.
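A minimal sketch of this nearest-neighbor labeling, built on the cross-distance matrix of Equation (12); the maximum search distance is a tunable threshold and its value here is illustrative.

```python
import numpy as np

def label_markers(z_est, z_raw, max_dist=0.05):
    """Match estimated markers to raw measurements using Eq. (12).

    z_est    : (n, 3) marker positions predicted by h(x) at the a priori estimate
    z_raw    : (m, 3) unsorted raw marker positions from the cameras
    max_dist : maximum search distance (illustrative value, in meters)

    Returns the sorted measurement vector z (missing markers left at zero) and a
    boolean mask indicating which markers were actually found.
    """
    diff = z_est[:, None, :] - z_raw[None, :, :]
    D = np.einsum("ijk,ijk->ij", diff, diff)          # squared cross-distances D_ij
    z = np.zeros_like(z_est)
    found = np.zeros(len(z_est), dtype=bool)
    for i in range(len(z_est)):
        j = np.argmin(D[i])
        if D[i, j] <= max_dist**2:                    # closest raw marker is near enough
            z[i] = z_raw[j]
            found[i] = True
    # Raw markers never assigned to any estimate are treated as spurious and ignored.
    return z.ravel(), found
```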

The EKF can provide a smoothing effect depending on the tuning of its parameters, so in this case there is no need to filter the marker trajectories. If the sensor noise variance $\sigma\_s^2$ is fixed to a constant value, the smoothing can be controlled by the process noise variance, i.e., the acceleration variance $\sigma\_a^2$. Low values of this variance limit the accelerations the system can reach at every time step, thus having a smoothing effect on the resulting position histories, while high values allow for larger accelerations, so that the system can follow the sensors (i.e., the markers) more closely, at the expense of introducing sensor noise into the reconstructed motion. In this work, the independent positions are further filtered and then differentiated twice to obtain velocities and accelerations. Higher-order state-space models that include accelerations in the state vector exist, but they present two major issues when used with position sensors only: the resulting accelerations are noisy and delayed, and unwanted oscillations may appear in the resulting motion for certain values of the filter parameters.
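A minimal sketch of this post-processing step, filtering the independent position histories with a forward-backward 2nd-order Butterworth filter and differentiating them numerically; the cutoff frequency shown is only illustrative, since in practice it is tuned against the accelerometers as discussed next.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def accelerations_from_positions(q, dt=0.01, cutoff_hz=6.0):
    """Filter the independent positions and differentiate twice.

    q         : (n_frames, n_dof) independent positions from the EKF
    dt        : sampling period (10 ms in this work)
    cutoff_hz : cutoff of the forward-backward 2nd-order Butterworth filter
                (illustrative value; tuned against the accelerometers in practice)
    """
    b, a = butter(2, cutoff_hz / (0.5 / dt), btype="low")  # cutoff normalized by Nyquist
    q_filt = filtfilt(b, a, q, axis=0)                     # zero-phase (forward-backward) filtering
    q_dot = np.gradient(q_filt, dt, axis=0)                # velocities
    q_ddot = np.gradient(q_dot, dt, axis=0)                # accelerations
    return q_filt, q_dot, q_ddot
```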

In this case, the analyst has two parameters for tuning the obtained accelerations: the process noise variance and the cutoff frequency of the Butterworth filter. The use of accelerometers can be of great help in finding their optimum values.

#### 2.2.5. Calculation of the Accelerations

In order to make the accelerations obtained from the optical system directly comparable to those obtained by the inertial sensors, it was necessary to transform the former into the local axes of the corresponding IMUs, and to add the gravity effect to them. Such an acceleration obtained from the optical system will be called hereafter the virtual acceleration, as it comes from a virtual accelerometer. It is possible to do the opposite, i.e., to rotate the IMU measurements to the global frame of reference instead, subtracting the acceleration of gravity afterwards. However, this procedure involves mixing data from both systems (to rotate the IMU accelerations to the global frame), so the first alternative seems more appropriate.

Figure 6 shows a segment or body of the multibody model of the human skeleton, where the black dots are joints connecting the segment with its neighbors, and the white dots are the markers attached to the segment. The small rectangle represents the IMU attached to the segment, which in turn has a marker attached to it, as also shown in Figure 3. The local reference frame of the body is denoted by B, a moving frame rigidly attached to the body; its origin is defined in frame O by the position vector $\mathbf{r}\_{\mathrm{B}}$, while the local position of the IMU in frame B is given by the constant vector $\overline{\mathbf{r}}\_i$ (*i* being the number of the IMU attached to that particular body). The following equation can be written,

$$\mathbf{r}\_{i} = \mathbf{r}\_{\mathrm{B}} + \mathbf{R}\_{\mathrm{OB}}\overline{\mathbf{r}}\_{i} \tag{13}$$

where $\mathbf{r}\_i$ is the position vector of the IMU in frame O and $\mathbf{R}\_{\mathrm{OB}}$ is the rotation matrix between frames B and O. Then, $\overline{\mathbf{r}}\_i$ can be worked out as,

$$
\overline{\mathbf{r}}\_i = \mathbf{R}\_{\mathrm{OB}}^{\mathrm{T}} (\mathbf{r}\_i - \mathbf{r}\_{\mathrm{B}}) \tag{14}
$$

**Figure 6.** Kinematics at segment level to get the virtual acceleration (acceleration of the point of the body where the IMU is attached) from the optical motion capture recordings. The black dots represent joints connecting the segment with its neighbors. The white dots represent the markers attached to the segment.

Regarding the orientations, the following relation holds,

$$\mathbf{R}\_{\mathrm{OB}}\overline{\mathbf{R}}\_{\mathrm{BI}} = \mathbf{R}\_{\mathrm{OE}}^{i}\mathbf{R}\_{\mathrm{EI}}^{i} \tag{15}$$

where $\overline{\mathbf{R}}\_{\mathrm{BI}}$ is the unknown constant rotation matrix between frames B and I. Superscript *i* in the rotation matrices on the right-hand side refers to the number of the IMU attached to that particular body. From Equation (15), the constant rotation matrix $\overline{\mathbf{R}}\_{\mathrm{BI}}$ can be worked out as,

$$
\overline{\mathbf{R}}\_{\mathrm{BI}} = \mathbf{R}\_{\mathrm{OB}}^{\mathrm{T}} \mathbf{R}\_{\mathrm{OE}}^{i} \mathbf{R}\_{\mathrm{EI}}^{i} \tag{16}
$$

The two constant terms worked out in Equations (14) and (16), respectively, must be determined in order to obtain, later on, the virtual acceleration. To this end, a capture of a static pose of the subject is recorded by both the optical and inertial systems. From the positions of the markers, $\mathbf{r}\_i$, $\mathbf{r}\_{\mathrm{B}}$ and $\mathbf{R}\_{\mathrm{OB}}$ can be derived, so that $\overline{\mathbf{r}}\_i$ is calculated from Equation (14). On the other hand, the constant matrix $\mathbf{R}\_{\mathrm{OE}}^i$ had been obtained in the calibration process carried out during the preliminary test, while $\mathbf{R}\_{\mathrm{EI}}^i$ can be derived from the orientation provided by the IMU, so that $\overline{\mathbf{R}}\_{\mathrm{BI}}$ is calculated using Equation (16). It must be noted that this is the only point, in the process of obtaining the virtual acceleration, at which the orientation provided by the IMU is used. However, this does not introduce a significant error, since the estimated orientations are much more accurate in static conditions.
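A minimal sketch of this static-pose preprocessing, evaluating Equations (14) and (16) once the quantities listed in the comments are available from the static capture and the earlier calibration.

```python
import numpy as np

def calibrate_imu_placement(r_i, r_B, R_OB, R_OE_i, R_EI_i):
    """Constant IMU placement terms from a static pose, Eqs. (14) and (16).

    r_i    : (3,) absolute position of the IMU marker (optical system)
    r_B    : (3,) absolute position of the proximal joint of the body
    R_OB   : (3, 3) body orientation from the optical system
    R_OE_i : (3, 3) constant Earth-to-global rotation of IMU i (from calibration)
    R_EI_i : (3, 3) IMU orientation in the Earth frame (from the IMU, static pose)
    """
    r_bar_i = R_OB.T @ (r_i - r_B)        # local IMU position in frame B, Eq. (14)
    R_BI = R_OB.T @ R_OE_i @ R_EI_i       # constant body-to-IMU rotation, Eq. (16)
    return r_bar_i, R_BI
```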

Once the constant terms $\overline{\mathbf{r}}\_i$ and $\overline{\mathbf{R}}\_{\mathrm{BI}}$ have been determined in the described preprocess, the history of the virtual acceleration can be derived from the information recorded by the optical system. At each time point, the global acceleration of the point where the IMU is attached, expressed in frame O, can be obtained by differentiating Equation (13) twice with respect to time,

$$
\ddot{\mathbf{r}}\_i = \ddot{\mathbf{r}}\_{\mathrm{B}} + \ddot{\mathbf{R}}\_{\mathrm{OB}} \overline{\mathbf{r}}\_i \tag{17}
$$

where $\ddot{\mathbf{r}}\_{\mathrm{B}}$ and $\ddot{\mathbf{R}}\_{\mathrm{OB}}$ are calculated as the second time derivatives of the position data obtained from the optical motion capture. The virtual acceleration, still expressed in frame O, is obtained by including the gravity effect in the acceleration given by Equation (17),

$$\mathbf{a}\_{i} = \ddot{\mathbf{r}}\_{i} + \mathbf{g} \tag{18}$$

where **g** is the gravity vector (9.81 m/s<sup>2</sup> in the positive vertical direction, as it would be perceived by the IMU). To get the virtual acceleration, vector $\mathbf{a}\_i$ must be expressed in the local frame of the IMU, I,

$$\overline{\mathbf{a}}\_{i} = \mathbf{R}\_{\text{OI}}^{\text{T}} \mathbf{a}\_{i} \tag{19}$$

with,

$$\mathbf{R}\_{\rm OI} = \mathbf{R}\_{\rm OB} \overline{\mathbf{R}}\_{\rm BI} \tag{20}$$

where $\mathbf{R}\_{\mathrm{OB}}$ is calculated from the optical motion capture. Compacting Equations (17)–(20) into a single expression, the virtual acceleration can be written as,

$$\overline{\mathbf{a}}\_{i} = \overline{\mathbf{R}}\_{\mathrm{BI}}^{\mathrm{T}} \mathbf{R}\_{\mathrm{OB}}^{\mathrm{T}} \left( \ddot{\mathbf{r}}\_{\mathrm{B}} + \ddot{\mathbf{R}}\_{\mathrm{OB}} \overline{\mathbf{r}}\_{i} + \mathbf{g} \right) \tag{21}$$

Therefore, the acceleration directly measured by the IMU can now be compared with the virtual acceleration provided by Equation (21) from the measurements of the optical system, and the filtering parameters applied to the latter can be adjusted so as to yield the optimal correlation. The error was measured as the root-mean-square error (RMSE) between the histories of the two accelerations compared, and the results are shown in Section 3.
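A minimal sketch of this comparison, evaluating Equation (21) over a capture and computing the per-axis RMSE; the second derivatives of the joint position and of the rotation matrix are assumed to be obtained beforehand by numerical differentiation of the filtered optical data, and the global *z* axis is assumed to be vertical.

```python
import numpy as np

def virtual_acceleration(rdd_B, Rdd_OB, R_OB, R_BI, r_bar_i,
                         g=np.array([0.0, 0.0, 9.81])):
    """Virtual accelerometer reading in the IMU frame, Eq. (21).

    rdd_B   : (n, 3) second time derivative of the proximal joint position
    Rdd_OB  : (n, 3, 3) second time derivative of the body rotation matrix
    R_OB    : (n, 3, 3) body rotation matrix history (optical system)
    R_BI    : (3, 3) constant body-to-IMU rotation from the static calibration
    r_bar_i : (3,) constant local IMU position in the body frame
    g       : gravity as perceived by the IMU (vertical global axis assumed to be z)
    """
    a_global = rdd_B + Rdd_OB @ r_bar_i + g                  # Eqs. (17)-(18), frame O
    return np.einsum("ij,njk,nk->ni",                        # R_BI^T R_OB^T a, per frame
                     R_BI.T, np.transpose(R_OB, (0, 2, 1)), a_global)

def rmse(a_virtual, a_imu):
    """Per-axis root-mean-square error between the two acceleration histories."""
    return np.sqrt(np.mean((a_virtual - a_imu) ** 2, axis=0))
```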

#### **3. Results**

#### *3.1. Preliminary Test and Calibration*

As explained in Section 2, the orientation error committed by the *i*th IMU, *i* = 1, ... , 9, at each time point was obtained by calculating the roll, pitch and yaw angles of $\mathbf{R}\_{\mathrm{OI}}^i$, and comparing them with the roll, pitch and yaw angles of $\mathbf{R}\_{\mathrm{OI}}$, taken as reference. Figure 7 shows the error incurred by each IMU in roll, pitch and yaw angles over the duration of the calibration experiment. Maximum errors of 19° in yaw (around the vertical axis) with respect to the reference (optical system) were found, while mean error differences of up to 4° were detected among IMUs. Similar results were obtained for $\mathbf{R}\_{\mathrm{OE}}^i$, the rotation matrix between frames E and O for inertial sensor *i*: differences of up to 8° in yaw were detected among IMUs.

**Figure 7.** Orientation errors in roll, pitch and yaw incurred by the nine IMUs with respect to the optical system (reference).

As explained in the last paragraph of Section 2.1.2, the acceleration of point 4 of the wooden plate was obtained in three different ways: (i) from the optical system, using the marker attached to point 4; (ii) from the inertial system, using the orientation provided by the inertial system; (iii) from the inertial system, using the orientation provided by the optical system. A forward-backward 2nd-order Butterworth filter with a cutoff frequency of 8 Hz was applied to the optically captured trajectories of the markers, while no filtering was applied to the inertial measurements. Figure 8 gathers the global components, expressed in frame O, of the three accelerations. While the *x*- and *z*-components are similar, significant discrepancies are observed between 15 and 20 s for the *y*-component, with a maximum error of 1.9 m/s<sup>2</sup> when using the orientation provided by the IMU. Moreover, the accelerometer shows some peaks that are not captured by the optical system, for instance when the plate touches the ground after the 30 s mark. Due to the low sampling rate of the optical system, it cannot capture high-frequency events such as impacts, regardless of the filter cutoff frequency.

**Figure 8.** Comparison of the global acceleration of point 4 of the wooden plate, obtained by three methods: from the optical system (red); from the inertial system with the orientation provided by the inertial system (blue); from the inertial system with the orientation provided by the optical system (black).
