**4. Discussion**

We developed a DTW-based algorithm for assessing motion similarity between an individual user and a virtual coach. DTW was designed to handle local changes in timing (due to speed variations) and is therefore well suited to evaluating rehabilitation exercises performed by older adults in self-care at home. The effectiveness of the algorithm was validated in a follow-up experiment, in which Tai Chi was chosen as the representative physical exercise for two major reasons. First, the effectiveness of Tai Chi for improving physical function has been demonstrated by many previous studies [43,44]. Second, Tai Chi is a complex, whole-body motion. If the algorithm performs well in evaluating Tai Chi motion, it should generalize to simpler rehabilitation exercises.
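To illustrate why DTW tolerates speed variations, the following minimal sketch implements the textbook dynamic-programming form of DTW in Python. It is not necessarily the exact variant used in this study; the function name `dtw_distance`, the scalar toy sequences, and the absolute-difference cost are assumptions made for illustration only.

```python
import math

def dtw_distance(seq_a, seq_b, cost):
    """Textbook DTW: minimal accumulated cost of aligning seq_a with seq_b."""
    n, m = len(seq_a), len(seq_b)
    # acc[i][j] = minimal accumulated cost of aligning seq_a[:i] with seq_b[:j]
    acc = [[math.inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = cost(seq_a[i - 1], seq_b[j - 1])
            # A frame may match one-to-one, or be stretched on either side
            acc[i][j] = step + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m]

def abs_diff(a, b):
    return abs(a - b)

# Hypothetical one-dimensional "motions": the user repeats some frames (moves slower)
coach = [0, 10, 20, 30, 20, 10, 0]
user = [0, 0, 10, 10, 20, 30, 30, 20, 10, 0]
print(dtw_distance(coach, user, abs_diff))  # 0.0: the warping absorbs the slower pace
```

In a skeleton-based setting, each sequence element would be a whole frame of body vectors and `cost` would be a frame-level dissimilarity rather than a scalar difference.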

Inter-rater reliability analysis revealed that the reliability of the experts' ratings was "good" (0.75 < ICC = 0.861 < 0.90). However, the 95% confidence interval of the ICC was wide (0.688–0.942), which indicates that, in the worst case, the reliability was only "acceptable" (0.5 < ICC = 0.688 < 0.75). The wide confidence interval suggested that, even though the overall agreement among the three experts was high, there was non-negligible disagreement in their ratings [45]. Paired t-tests also confirmed a significant difference in performance ratings between the third expert and the other two experts (Figure 5). The inconsistency in subjective ratings among the three experts highlighted the potential benefit of applying our algorithm to assess exercise performance automatically and objectively.

The strong linear relationship (r = 0.86) between the algorithm score and the experts' evaluation (gold standard) implied that the developed algorithm was as sensitive as the domain experts in distinguishing the performance levels of different subjects. Unexpectedly, a detailed analysis revealed that the algorithm score was significantly higher than the experts' rating. This could be mainly due to the different baselines of the two evaluation methods. The algorithm evaluation was based purely on the sum of angle differences among nine corresponding body vectors, and a subject with all angle differences at 90 degrees was considered the worst (performance score = 0). Since even the subjects rated worst by the experts had most of their angle differences within 45 degrees, the algorithm would overestimate the subject's performance score due to the ceiling effect [46]. To reduce this overestimation and enable our algorithm to provide scores similar to those of the domain experts, the linear regression equation (Figure 6) was applied to calibrate the algorithm score. The experimental results showed that the calibrated algorithm scores were comparable to the experts' ratings. Taken together, these findings demonstrated that, even though the developed DTW-based algorithm could be a good tool for objectively ranking exercise performance among different subjects, the algorithm score should be calibrated against experts' ratings on a small number of representative subjects. In this way, good consistency between the algorithm's evaluation and the experts' evaluation can be achieved in practical applications.
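The calibration step can be sketched as an ordinary least-squares fit of expert ratings on algorithm scores, followed by applying the fitted line to new scores. The sketch below is illustrative only: the function names `fit_line` and `calibrate`, the clamping to the 0–100 range, and the three data points are assumptions, and the coefficients it produces are not those of the regression equation in Figure 6.

```python
def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def calibrate(score, slope, intercept):
    """Map a raw algorithm score onto the expert scale (clamping is an assumption)."""
    return min(100.0, max(0.0, slope * score + intercept))

# Hypothetical calibration subset: raw algorithm scores sit above the expert ratings
algorithm_scores = [70.0, 80.0, 90.0]
expert_ratings = [40.0, 60.0, 80.0]

slope, intercept = fit_line(algorithm_scores, expert_ratings)
print(slope, intercept)                    # 2.0 -100.0
print(calibrate(85.0, slope, intercept))   # 70.0
```

Because the fit needs only paired scores from a small number of representative subjects, the same procedure could in principle be repeated whenever the exercise or the rating panel changes.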

Earlier studies used binary classification as well as three-point and four-point Likert scales to obtain experts' ratings for validating their algorithms [24–28]. This kind of validation is coarse and likely results in inflated validation accuracy because of the wide performance range between two consecutive points, especially for binary classification and a three-point Likert scale. To the best of our knowledge, only one reported study also used a 0–100 score as the experts' rating to validate its algorithm, as we did [21]. However, the highest correlation coefficient between their DTW-based algorithm score and the expert's rating was 0.64, which was much lower than ours (r = 0.86). The improved performance in our study could be related to the selection of different motion features. Instead of using simple joint angles as Capecci et al. [21] did, we chose 3D bone vectors of the human skeleton to better preserve the spatial information of the motion, because a joint angle alone cannot define the spatial orientation of the two bones connected by that joint. In addition, since theoretical upper and lower bounds (180 and 0 degrees) always exist for the angle difference between two corresponding bone vectors, converting the DTW matching cost to a final percentage score is straightforward and reasonable in our study. It requires neither training data nor expert experience. In this study, we assumed the upper bound was 90 degrees instead of 180 degrees based on an earlier study [18] and our practical exercise scenario.
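As a rough illustration of how bounded angle differences translate into a percentage score, the sketch below computes the angle between corresponding bone vectors and maps the mean angle linearly onto 0–100 with the 90-degree upper bound. The exact scoring formula is not restated in this section, so the linear mapping, the clamping of angles above 90 degrees, and the names `angle_between_deg` and `frame_score` are assumptions.

```python
import math

MAX_ANGLE_DEG = 90.0  # upper bound assumed in the text (instead of 180 degrees)

def angle_between_deg(u, v):
    """Angle in degrees between two 3D vectors, always within [0, 180]."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cos_theta = max(-1.0, min(1.0, dot / (norm_u * norm_v)))  # guard rounding error
    return math.degrees(math.acos(cos_theta))

def frame_score(user_bones, coach_bones):
    """Percentage score for one aligned frame (assumed linear mapping)."""
    angles = [min(angle_between_deg(u, c), MAX_ANGLE_DEG)
              for u, c in zip(user_bones, coach_bones)]
    mean_angle = sum(angles) / len(angles)
    return 100.0 * (1.0 - mean_angle / MAX_ANGLE_DEG)

# Hypothetical frame: all nine bone vectors deviate by 45 degrees from the coach
coach_bones = [(1.0, 0.0, 0.0)] * 9
user_bones = [(1.0, 1.0, 0.0)] * 9
print(round(frame_score(user_bones, coach_bones), 6))  # 50.0
```

Under this assumed mapping, a subject whose angle differences all stay within 45 degrees cannot score below 50, which is consistent with the overestimation relative to the experts' ratings discussed above.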

It is worth mentioning that eliminating the confounding effect caused by body orientation offset is a major challenge in algorithm development. In fact, both bone vector-based and joint position-based algorithms are very sensitive to body orientation, especially when evaluating complex whole-body exercises with rotational motions. Chua et al. [47] also pointed out this issue when they evaluated Tai Chi motion. We calculated the joint positions and bone vectors in real time in the local coordinate system of the human model instead of the world coordinate system, which removes the error induced by the body orientation offset during the entire exercise. The compensation for body orientation offset has practical value for older adults because they might not be able to orient themselves precisely like the standard virtual coach during the rehabilitation exercise.
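The effect of working in a body-local coordinate system can be sketched as follows. How the local frame of the human model is actually constructed is not specified here, so building it from the two hip joints and the top of the spine is an assumption, as are all function names and joint coordinates. The sketch shows that a bone vector expressed in such a frame is unchanged when the whole body turns about the vertical axis.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def normalize(a):
    n = math.sqrt(dot(a, a))
    return tuple(x / n for x in a)

def body_frame(left_hip, right_hip, spine_top):
    """Orthonormal body axes built from three joints (assumed construction)."""
    hip_center = tuple((l + r) / 2 for l, r in zip(left_hip, right_hip))
    x_axis = normalize(sub(right_hip, left_hip))   # left to right
    up = sub(spine_top, hip_center)                # roughly along the trunk
    z_axis = normalize(cross(x_axis, up))          # perpendicular to the trunk plane
    y_axis = cross(z_axis, x_axis)                 # re-orthogonalized trunk axis
    return x_axis, y_axis, z_axis

def to_local(vec, frame):
    """Express a world-space vector in the body frame."""
    return tuple(dot(vec, axis) for axis in frame)

def rotate_about_y(p, deg):
    """Rotate a point about the vertical (y) axis of the world frame."""
    t = math.radians(deg)
    c, s = math.cos(t), math.sin(t)
    x, y, z = p
    return (c * x + s * z, y, -s * x + c * z)

def local_forearm(j):
    frame = body_frame(j["left_hip"], j["right_hip"], j["spine_top"])
    return to_local(sub(j["wrist"], j["elbow"]), frame)

# Hypothetical joint positions in world coordinates (metres)
joints = {"left_hip": (-0.1, 1.0, 0.0), "right_hip": (0.1, 1.0, 0.0),
          "spine_top": (0.0, 1.5, 0.0), "elbow": (0.3, 1.3, 0.1),
          "wrist": (0.4, 1.2, 0.3)}
# Same pose, but the user faces 40 degrees away from the sensor
turned = {name: rotate_about_y(p, 40.0) for name, p in joints.items()}

print(local_forearm(joints))
print(local_forearm(turned))  # same vector up to floating-point rounding
```

The world-space forearm vector differs between the two poses, whereas the local one does not, so an angle-based comparison against the coach is unaffected by how the user happens to face the sensor.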

To further examine the use and acceptance of exergaming technology for home-based physical rehabilitation by the primary target users (older people), we applied the technology acceptance model [48–51] and designed a questionnaire with 11 constructs (Appendix A) to evaluate user acceptance of our Tai Chi exergaming prototype system (Figure 3). Forty-one older adults (age 77.3 ± 5.4 years, height 159.3 ± 8.5 cm, weight 59.0 ± 9.5 kg) from a local senior welfare center participated in this survey. They were asked to try the prototype system and play the Tai Chi exergame before giving their questionnaire responses on a five-point Likert scale (1 corresponding to "strongly disagree" and 5 corresponding to "strongly agree"). Figure 8 presents a summary of their responses. The results showed that the older people perceived relatively high vulnerability (3.21 out of 5) and severity (3.63) in terms of difficulties in self-care and independent living, and they had high intentions (Behavior intention = 4.08) to use our system in the future. They considered our system very useful (Perceived usefulness = 4.43) and entertaining (Hedonic motivation = 3.82), held a positive attitude toward it (Attitude = 4.29), and perceived a low privacy risk (Perceived privacy risk = 1.18). Interestingly, even though the older people were somewhat confident in their ability to use this system for improving their health conditions (Self-efficacy = 3.75), the expected effort (3.07) and response cost (2.72) were considerably high. Taken together, these findings implied that our Tai Chi exergaming prototype system is useful for older people performing home-based physical rehabilitation exercises. However, the prototype system needs to be improved to make it easy to use and cost-effective.
We collected valuable feedback from the participants for improving our prototype system, which mainly includes the following: (1) Audio effects should be added to make the exergame more entertaining and enjoyable. (2) The standard pace of the Tai Chi exergame should be slower and adjustable by each individual. (3) The avatar should be enlarged so that it can be seen clearly, and timely feedback on problematic motions should be provided. (4) Social networking functions (such as sharing the exergaming performance score with friends) should be further developed.

**Figure 8.** Results of the user acceptance questionnaire for the Tai Chi exergaming prototype system. Remarks: (1) Scores from all older participants were averaged for each construct. (2) Except for perceived privacy risk and response cost, all constructs are positively associated with the user's intention to adopt the system.

There were several limitations in the current study. First, the conversion from DTW distance to a percentage score (0–100%) is based on the assumption that the maximum angle difference between two corresponding bone vectors is 90 degrees. Even though this assumption works well for most body parts, the exact value of 90 degrees is not always appropriate. Second, we focused on motion correctness for the performance evaluation in this study; rhythm mismatch was not yet considered in the overall performance evaluation [28]. In addition, detailed feedback on problematic motions of specific body parts should be provided in future work so that older individuals are informed in time to improve their rehabilitation exercises. Third, even though the primary target users of our Tai Chi exergaming system are older adults, the 21 participants in the validation of the DTW-based algorithm included both middle-aged and older adults in order to cover a wide range of Tai Chi proficiency levels under practical constraints. Our next step will be to refine the prototype system and test it with a large number of older adults in the home environment to verify the practicality of the system. Last but not least, a single Kinect sensor often tracks the skeleton poorly for some rotational motions during rehabilitation exercises due to self-occlusion and limited sensing range [37]. Further research on combining data from multiple Kinect sensors to achieve more accurate and robust skeleton tracking is needed.
