**4. Discussion**

The purpose of this investigation was to establish the validity and reliability of two versions of the Microsoft Kinect for measuring UE and trunk kinematics during various reaching conditions. Specifically, participants performed both a non-extended and an extended reach in each of three directions (forward, scaption, lateral) while their movements were recorded simultaneously by the K1, the K2, and the gold-standard VMC. The K2's trunk measurements agreed more closely with the VMC than the K1's did, as shown by smaller average magnitude differences in trunk flexion and lateral flexion. During the extended reaching conditions, which were intended to simulate movements that persons with chronic stroke might use, validity for trunk measurement was excellent for the K2 and modest–excellent for the K1. Reliability for trunk measurement during extended reaching was modest–excellent for the K1, except in the forward direction, but ranged from poor to excellent for the K2. For both sensors, results were generally excellent for arm and hand displacement, excellent for elbow flexion, and mixed for the shoulder, with reaches in the scaption and lateral directions yielding more valid and reliable results than forward reaches.
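The agreement measures summarized above can be illustrated with a minimal sketch. It assumes that each sensor's joint-angle series has already been time-aligned and resampled to common frames (the alignment step is not described in this section), and it pairs the average magnitude difference with Pearson's r as one common way of expressing concurrent validity. The function names and data are hypothetical, and the sketch is not necessarily the exact computation behind the reported results.

```python
import math
from statistics import fmean


def mean_abs_difference(sensor, reference):
    """Average magnitude of sensor-minus-reference differences (input units, e.g. degrees)."""
    if len(sensor) != len(reference) or not sensor:
        raise ValueError("series must be non-empty and equal in length")
    return fmean(abs(s - r) for s, r in zip(sensor, reference))


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical trunk flexion angles (degrees) for the same five frames.
vmc = [0.0, 5.0, 10.0, 15.0, 20.0]
k2 = [1.0, 6.0, 9.0, 16.0, 21.0]
print(mean_abs_difference(k2, vmc))  # 1.0
print(round(pearson_r(k2, vmc), 3))  # 0.994
```

The two measures are complementary: a sensor can correlate highly with the VMC while still showing a consistent offset, which only the magnitude difference reveals.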

The results of this study are supported by previous research that examined the validity of the K1 and K2 for other functional movements. Bonnechere and colleagues [9] found similar results when comparing the K1 to VMC during four functional movements, including shoulder abduction (similar to lateral reaching) and elbow flexion (similar to forward reaching). Clark and colleagues [11] found the K2 to have excellent concurrent validity for measuring trunk movements during dynamic balance tasks and anterior–posterior movements, but poor–moderate validity for static tasks and medial–lateral movements. In the current investigation, the K2 likewise showed the greatest validity for measuring trunk flexion during an extended movement in the anterior–posterior direction. Reither et al. [12] found similar results while recording the K1, K2, and VMC simultaneously with a single participant reaching forward, reaching to the side, and performing shoulder movements in various planes, but did not investigate trunk kinematics during these movements. Specifically, Reither et al. [12] found a wider range of single-day correlations between the K1 and VMC (r = 0.31–0.96) than between the K2 and VMC (r = 0.45–0.96), with correlation magnitude depending on movement plane. The authors also reported varied day-to-day reliability for both the K1 and K2 and, in general, a greater direction-dependent underestimation of kinematics by the K1 [12]. The current study extends the methods of Reither et al. [12] by using a larger sample of participants and movements, including extended reaches to elicit trunk compensation, analyzing the trunk along with the UE, and examining movements in the scaption plane in addition to the sagittal, frontal, and transverse planes.

We found several low and negative reliability (ICC) values (Table 3), particularly for shoulder flexion, shoulder abduction, trunk flexion, and trunk lateral flexion during non-extended reaching in the forward and scaption directions, for all sensors including the VMC. Negative ICC values are not ideal and can often be attributed to low between-subjects variance in the phenomenon being measured [32]. Accordingly, these results might reflect small between-subjects variance in the kinematic variables tested relative to the between-day differences within each participant. For example, the K2 yielded a negative between-day ICC (ICC = −0.53) for trunk flexion during the extended forward reach, yet Bland-Altman analysis showed a small mean bias (bias = −3.0°) and LOA (−13.2° to 6.8°). This suggests a relatively small mean difference between testing days, and thus satisfactory repeatability, despite a negative ICC that may stem from small, non-systematic variance. A more heterogeneous clinical population may improve correlation results by increasing variance in the sample. Pearson's correlations (Table A3) and Bland-Altman LOA (Table A4) were included to give a broader picture of relative and absolute reliability for all three sensors. Additional, more advanced analyses may also provide further insight into these discrepancies; for example, dynamic time warping (DTW), an advanced signal processing technique, could quantify how closely the time series collected by the K1 and K2 match [33].
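The reliability statistics discussed above, and the DTW comparison suggested for future work, can be sketched in plain Python. This is a hedged illustration rather than the analysis used here: the section does not state which ICC form was computed, so the sketch assumes ICC(2,1) (two-way random effects, absolute agreement, single measurement), uses the conventional bias ± 1.96 SD limits of agreement, and implements the classic unconstrained DTW distance with an absolute-difference local cost. All data and function names are hypothetical.

```python
import math
from statistics import fmean, stdev


def icc_2_1(data):
    """ICC(2,1); data[i][j] is subject i's value on session (day) j."""
    n, k = len(data), len(data[0])
    grand = fmean(v for row in data for v in row)
    row_means = [fmean(row) for row in data]
    col_means = [fmean(row[j] for row in data) for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between sessions
    ss_total = sum((v - grand) ** 2 for row in data for v in row)
    ss_err = ss_total - ss_rows - ss_cols                   # residual
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    # The numerator, and hence the ICC, is negative whenever the
    # between-subjects mean square falls below the residual mean square.
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


def bland_altman(a, b):
    """Mean bias and 95% limits of agreement for paired differences a - b."""
    diffs = [x - y for x, y in zip(a, b)]
    bias = fmean(diffs)
    half_width = 1.96 * stdev(diffs)
    return bias, bias - half_width, bias + half_width


def dtw_distance(s, t):
    """Classic DTW distance between two 1-D series with |s_i - t_j| local cost."""
    n, m = len(s), len(t)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(s[i - 1] - t[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # advance s only
                                    cost[i][j - 1],      # advance t only
                                    cost[i - 1][j - 1])  # advance both
    return cost[n][m]


# Homogeneous subjects with day-to-day scatter: the ICC falls below zero.
print(icc_2_1([[10, 12], [11, 10], [12, 10], [10, 12]]) < 0)  # True
print(bland_altman([3.0, 5.0, 4.0, 6.0], [1.0, 4.0, 2.0, 5.0]))
# A copy of a signal with a repeated sample still aligns perfectly under DTW.
print(dtw_distance([0, 1, 2], [0, 0, 1, 2]))  # 0.0
```

The first example shows the point made in the text: when subjects barely differ from one another, modest day-to-day scatter is enough to drive the ICC negative, even though the Bland-Altman bias remains small.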

The most notable limitation of this work is the use of healthy participants rather than a sample of participants with hemiparesis. As mentioned previously, persons with hemiparesis reach significantly differently than unimpaired persons, namely with slower movement, less accuracy, impaired interjoint coordination, and increased compensatory movement at the trunk [22,23]. Targets placed beyond the reach of healthy participants can elicit a similar compensatory response at the trunk, but persons with hemiparesis exhibit less symmetry and earlier trunk recruitment by comparison [23]. Healthy reaching is simply not the same as hemiparetic reaching. However, the purpose of the current study was to validate the measurement capabilities of the K1 and K2 relative to each other and to a gold-standard VMC system. Numerous referenced studies have used healthy participants for sensor validation with a view to future clinical application [9–17]. Healthy participants are more accessible, can perform the large number of required movements without fatigue or pain, and can more readily reproduce movements across trials and testing days for validity and reliability analyses. Because the ultimate goal of this work is clinical measurement in neurologically impaired populations, future work would gain considerable ecological validity from testing with a more heterogeneous sample of persons with hemiparetic stroke.

The current study provides some insights for the design of such future work. For example, it may be necessary to recruit more individuals and reduce the number of repetitions performed to better capture variability, mitigate fatigue, and enhance the generalizability of results to real-world clinical populations. In addition, the experimental protocol could be adjusted to provide detailed instruction and training for impaired populations to reduce trial variability and make data reduction and cleaning more efficient. Because persons with hemiparetic stroke recruit the trunk earlier and more often than healthy populations [23], it may be necessary to eliminate the extended reach or shorten its distance to maximize reaching performance and reduce frustration. Finally, given the results of the current study, it may be prudent to focus on the planes of movement that the K1 and K2 measure best given their hardware constraints (i.e., lateral > scaption > forward).

Other variations in results might be attributed to several study limitations. First, unlike the VMC, the Kinect SDK uses a tracking algorithm that does not rely on markers placed on palpable bony landmarks. While this is convenient for users, it has previously been noted as a limitation of the Kinect's ability to accurately measure movement kinematics due to variable body segment lengths; however, previous studies have developed regression-based algorithms that may be able to correct for this during real-time tracking [9]. Second, both observation and the relatively high standard deviations for each movement (Table 1) made it clear that individual participants used different reaching strategies. No neutral starting point was defined a priori: some participants returned their arm to their lap between repetitions, while others remained in a flexed position. This resulted in large variations in range of motion, particularly in elbow flexion. Finally, reliability results varied inconsistently for all three sensors, and, beyond statistical limitations, there are intra-individual differences across trials and across days in each participant's reaching kinematics. Participants were given similar instructions for each trial and testing day, but differences in the repeatability of human movement still exist and may account for the slight variance in between-day correlation and significance testing. Participants received verbal instruction but no formal training in the simple reaching movements, so movement may have differed between movement sets and even testing days due to subtle learning effects. It is also possible that the placement of motion capture markers varied slightly between days, producing differences in reliability. Increasing the overall sample size in the future could mitigate these intra- and inter-individual differences in repeatable movement.
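The regression-based correction mentioned above [9] is not specified in this section. As a hedged illustration only, a generic ordinary least-squares calibration could map Kinect joint angles onto paired VMC angles collected during a calibration session, and the fitted line could then be applied to each new Kinect reading in real time. The single-predictor linear model, the function names, and the data are assumptions, not the method of [9].

```python
from statistics import fmean


def fit_linear_correction(kinect, reference):
    """Least-squares fit of reference ≈ slope * kinect + intercept (hypothetical model)."""
    if len(kinect) != len(reference) or len(kinect) < 2:
        raise ValueError("need paired calibration data with at least 2 points")
    mx, my = fmean(kinect), fmean(reference)
    sxx = sum((x - mx) ** 2 for x in kinect)
    if sxx == 0:
        raise ValueError("Kinect calibration angles must vary")
    sxy = sum((x - mx) * (y - my) for x, y in zip(kinect, reference))
    slope = sxy / sxx
    return slope, my - slope * mx


def apply_correction(kinect_angle, slope, intercept):
    """Correct a single new Kinect reading; cheap enough to run on every frame."""
    return slope * kinect_angle + intercept


# Hypothetical calibration pairs: Kinect vs VMC elbow flexion (degrees).
slope, intercept = fit_linear_correction([10, 20, 30, 40], [9, 20, 31, 42])
print(round(slope, 3), round(intercept, 3))              # 1.1 -2.0
print(round(apply_correction(50, slope, intercept), 3))  # 53.0
```

A per-joint linear fit is the simplest option; a correction for segment-length differences might instead need participant anthropometrics as additional predictors.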

This study shows that the K1 and K2 may serve as useful tools for objectively measuring UE and trunk kinematics, but their suitability may depend on the body segment, joint, and movement plane of interest. Few studies have investigated their relative measurement properties, yet both sensors are widely employed as the basis for VR-based interventions for persons with motor impairments, including stroke and cerebral palsy [19,21]. Use of such interventions continues to grow along with client interest, professional knowledge, and technological accessibility [34]. The current investigation may inform future VR development, particularly the inclusion of real-time measurement of trunk compensation using the K2.
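Real-time measurement of trunk compensation, as proposed above, could start from the tracked spine joints. The sketch below is an assumption-laden illustration, not the processing used in this study. It takes two 3-D joint positions in meters, such as the K2's SpineBase and SpineShoulder joints, and assumes Kinect camera-space axes (y up, z pointing from the sensor toward a participant who faces it, x toward the sensor's left), so that leaning toward the sensor reduces the shoulder joint's z value. The function name, sign conventions, and sample coordinates are hypothetical.

```python
import math


def trunk_angles(spine_base, spine_shoulder):
    """Return (flexion, lateral_flexion) in degrees from two (x, y, z) joint positions.

    Angles are measured from vertical after projecting the base-to-shoulder
    vector onto the sagittal (y-z) and frontal (x-y) planes.
    flexion > 0: upper trunk displaced toward the sensor (smaller z).
    lateral_flexion > 0: upper trunk displaced toward +x (the sensor's left).
    """
    dx = spine_shoulder[0] - spine_base[0]
    dy = spine_shoulder[1] - spine_base[1]
    dz = spine_shoulder[2] - spine_base[2]
    flexion = math.degrees(math.atan2(-dz, dy))  # sagittal-plane projection
    lateral = math.degrees(math.atan2(dx, dy))   # frontal-plane projection
    return flexion, lateral


# Hypothetical frame: base 2 m from the sensor, upper trunk leaning toward it.
flexion, lateral = trunk_angles((0.0, 0.0, 2.0), (0.0, 0.5, 1.5))
print(round(flexion, 1), round(lateral, 1))  # 45.0 0.0
```

Because each frame needs only two joint positions and two `atan2` calls, this kind of computation is light enough to run inside a VR game loop; any clinical threshold for flagging compensation would have to be established separately.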
