Gait and Balance Assessments with Augmented Reality Glasses in People with Parkinson’s Disease: Concurrent Validity and Test–Retest Reliability

van Bergem, Jara S.; van Doorn, Pieter F.; Hoogendoorn, Eva M.; Geerse, Daphne J.; Roerdink, Melvyn

doi:10.3390/s24175485


by Jara S. van Bergem *, Pieter F. van Doorn, Eva M. Hoogendoorn, Daphne J. Geerse and Melvyn Roerdink

Department of Human Movement Sciences, Faculty of Behavioural and Movement Sciences, Vrije Universiteit Amsterdam, Amsterdam Movement Sciences, 1081 BT Amsterdam, The Netherlands

* Author to whom correspondence should be addressed.

Sensors 2024, 24(17), 5485; https://doi.org/10.3390/s24175485

Submission received: 19 June 2024 / Revised: 1 August 2024 / Accepted: 21 August 2024 / Published: 24 August 2024

(This article belongs to the Collection Sensors for Gait, Human Movement Analysis, and Health Monitoring)


Abstract

State-of-the-art augmented reality (AR) glasses record their 3D pose in space, enabling measurements and analyses of clinical gait and balance tests. This study’s objective was to evaluate concurrent validity and test–retest reliability for two common clinical gait and balance tests in people with Parkinson’s disease: the Five Times Sit To Stand (FTSTS) and Timed Up and Go (TUG) tests. Position and orientation data were collected in 22 participants with Parkinson’s disease using HoloLens 2 and Magic Leap 2 AR glasses, from which test completion durations and durations of distinct sub-parts (e.g., sit to stand, turning) were derived and compared to reference systems and over test repetitions. Regarding concurrent validity, for both tests, an excellent between-systems agreement was found for position and orientation time series (ICC_(C,1) > 0.933) and test completion durations (ICC_(A,1) > 0.984). Between-systems agreement for FTSTS (sub-)durations was excellent throughout (ICC_(A,1) > 0.921). For TUG turning sub-durations, between-systems agreement was excellent for turn 1 (ICC_(A,1) = 0.913) and moderate for turn 2 (ICC_(A,1) = 0.589). Regarding test–retest reliability, the within-system test–retest variation in test completion times and sub-durations was always much greater than the between-systems variation, implying that (sub-)durations may be derived interchangeably from AR and reference system data. In conclusion, AR data are of sufficient quality to evaluate gait and balance aspects in people with Parkinson’s disease, with valid quantification of test completion durations and sub-durations of distinct FTSTS and TUG sub-parts.

Keywords: augmented reality; Parkinson’s disease; gait and balance tests; concurrent validity; test–retest reliability

1. Introduction

Impaired posture, gait, and balance control are motor symptoms commonly observed in people with Parkinson’s disease [1]. Assessing such motor symptoms may offer healthcare professionals valuable insight into disease progression and patients’ daily life challenges, which may inform tailored treatment strategies. To this end, the Five Times Sit To Stand (FTSTS) and Timed Up and Go (TUG) tests are commonly performed [2,3]. In FTSTS, the test person is asked to stand up from and sit down on a chair five times as quickly as possible with their arms crossed. Longer FTSTS completion times are associated with reduced leg muscle strength, impaired balance, and increased fall risk [4]. TUG is probably the most used clinical gait and balance test, combining distinct gait and balance aspects, like transfers, gait initiation, walking, and turning, in a single test. In TUG, the test person is asked to stand up from an armchair, walk 3 m, turn around, walk 3 m back, and sit back in the chair. Longer TUG completion times are associated with poorer muscle strength, poorer balance, slower gait speed, and increased fall risk [5].

The standard outcome of both FTSTS and TUG is the duration of test completion, as measured by the clinician with a stopwatch. The drawback of this method is that handling the stopwatch may hinder the clinician from fully concentrating on observing the patient for safety and visual assessment. Moreover, the stopwatch score only indicates the overall test completion duration and remains blind to specific limitations in distinct sub-parts of the test, like turning or sit to stand parts. To alleviate these inherent drawbacks, (automated) instrumented tests have been introduced using sensor technology, such as body-worn inertial measurement unit (IMU) sensors and external marker-based or markerless 3D motion registration systems (e.g., [6,7]). These provide information about the execution of (sub-parts of) the tests, which may allow for a more specific assessment and treatment of motor impairments.

Augmented reality (AR) glasses, like Microsoft’s HoloLens 2 and Magic Leap 2, represent a promising emerging technology for two reasons. First, they act as a movement registration system, uniquely providing both position and orientation data in 3D using visual Simultaneous Localization and Mapping (vSLAM) algorithms [8]. Second, they can serve as an instrument to potentially (self-)administer tests in a standardized manner, using 3D holographic AR content to set test constraints (e.g., presenting a holographic pylon at 3 m from the chair to indicate where the test person should turn in the TUG test) and to provide standardized instructions (e.g., ‘stand up from the chair and complete the test in 3-2-1-go’). These capabilities can enhance the evaluation of gait and balance by providing more precise and detailed measurements of the movements during (sub-parts of) the tests, thereby potentially identifying specific changes in motor function that could be critical for specific intervention strategies. Early research with healthy adults by Sun and colleagues [9] has already explored the potential of 3D position and orientation data of HoloLens 1 for deriving TUG test completion durations in comparison to stopwatch-based durations (with excellent between-systems agreement) and durations derived from IMU data (with good between-systems agreement). More recent work with healthy adults by Koop and colleagues [10] demonstrated statistical equivalence between TUG turning parameters derived from HoloLens 2 AR data and 3D motion registration data as a reference. Despite their apparent potential, AR glasses have not been validated for assessing TUG and FTSTS in clinical populations, like people with Parkinson’s disease.

The objective of this study was to evaluate concurrent validity and test–retest reliability of FTSTS and TUG tests in people with Parkinson’s disease using HoloLens 2 and Magic Leap 2 AR glasses. Specifically, we first examined the agreement between AR position and orientation time series and counterparts from reference motion registration systems, for which a good-to-excellent agreement is expected. Subsequently, we derived test completion durations as well as sub-durations for distinct sub-parts of the tests from these time series and evaluated concurrent validity in terms of between-systems absolute-agreement statistics (i.e., intraclass correlation coefficient (ICC), bias, limits of agreement) of test completion durations derived from AR data and the stopwatch (as clinical gold standard), as well as the between-systems absolute agreement of AR (sub-)durations with counterparts from reference systems. Finally, to help interpret the so-obtained between-systems absolute agreement statistics, we determined within-system test–retest reliability in terms of ICC, bias, and limits of agreement. We expected that (sub-)durations can be validly and reliably derived from AR data in people with Parkinson’s disease, with better between-systems than within-system absolute agreement statistics.

2. Materials and Methods

2.1. Subjects

A convenience sample of 22 subjects diagnosed with Parkinson’s disease, who were capable of walking independently for over 30 min, participated in this study. See Table 1 for a detailed overview of participant characteristics. Participants did not have any other neurological or orthopedic conditions that would significantly affect their walking ability. Their cognitive function was sufficient to understand the instructions provided by the researchers. Participants did not report experiencing hallucinations and had no visual or hearing impairments.

2.2. Experimental Set-Up and Procedures

Participants maintained their regular daily medication schedule and were invited for one visit to the gait laboratory at the Vrije Universiteit Amsterdam (VU). Participants completed the FTSTS test first, followed by the TUG test during one measurement session, while wearing either HoloLens 2 (N = 12) or Magic Leap 2 (N = 10) AR glasses, block randomized over participants. To evaluate test–retest reliability of AR data-derived (sub-)durations, both the FTSTS and TUG tests were performed for a second time in the same measurement session, maintaining the order of first the FTSTS test and then the TUG test. Participants initiated the FTSTS and TUG tests from a seated position with their backs touching the chair and concluded the test in the same position. A stopwatch was used to register the durations of FTSTS and TUG test completion. For FTSTS, participants were instructed to stand up and sit down five times in a sequential manner as quickly as possible, while keeping their arms crossed on their chest, without touching the backrest of the chair for all but the last sit down movement (Figure 1a). For TUG, participants were asked to perform the following sequence of actions: stand up from a standard armchair, walk 3 m, turn around, walk back to the chair, and finally, transfer from a standing to a sitting position with a turn (Figure 1b). Additionally, two reference motion registration systems were used to evaluate the concurrent validity of the AR time series and AR data-derived (sub-)duration outcomes: (1) a Microsoft Kinect v2 sensor (Kinect) as part of the Interactive Walkway (Tec4Science, VU Amsterdam, [12,13]) to record 3D position data of various body points, of which we used the head, sternum, and spine base and (2) an Inertial Measurement Unit (IMU; McRoberts B.V., The Hague, The Netherlands) worn on the lower back to record trunk orientation. None of the participants experienced freezing of gait during the tests, as observed by the researchers.

2.3. Data Acquisition

HoloLens 2 and Magic Leap 2 are state-of-the-art AR glasses registering their 3D position and orientation with regard to their surroundings at a sampling rate of 30 and 60 Hz, respectively. Specific 3D position and orientation time series contain features that are informative for distinguishing various sub-parts of the tests, like standing up, sitting down, and turning (as detailed in Section 2.4). In the Supplementary Material, videos are provided of the TUG and FTSTS tests’ performance, including a synchronized visualization of pertinent AR data. As a reference, we acquired data from IMU and Kinect sensor systems. Specifically, the IMU captured at 100 Hz the 3D accelerations and trunk rotation velocity time series, from which reference turning sub-durations of TUG were derived (as detailed in Section 2.4). The Microsoft Kinect computer vision sensor captured at 30 Hz in a markerless manner the 3D positions of various bodily points, including the head, sternum, and spine base, from which reference (sub-)durations of the FTSTS test were derived (as detailed in Section 2.4).

2.4. Data (Pre)Processing

The time series from each of the three motion registration systems was resampled to a constant rate of 60 Hz using linear interpolation and low-pass filtered using a fourth-order Butterworth filter with a cut-off frequency of 2 Hz. Temporal alignment in between-systems time series was obtained by incorporating the time lag of their maximal cross-correlation. The initial starting positions and orientations were subtracted from the time series.
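The resampling and temporal alignment steps can be illustrated with a minimal standard-library sketch. It is not the authors' implementation: the function names `resample_linear` and `best_lag` are hypothetical, the fourth-order Butterworth low-pass filter is deliberately left out (in practice it would come from a signal-processing library), and the lag search is a plain brute-force cross-correlation over a limited window.

```python
def resample_linear(times, values, rate_hz=60.0):
    """Linearly interpolate samples (times in s) onto a uniform grid at rate_hz."""
    n_out = int((times[-1] - times[0]) * rate_hz + 1e-9) + 1
    out, j = [], 0
    for i in range(n_out):
        t = times[0] + i / rate_hz
        # Advance to the input interval [times[j], times[j + 1]] that contains t.
        while j < len(times) - 2 and times[j + 1] < t:
            j += 1
        frac = (t - times[j]) / (times[j + 1] - times[j])
        out.append(values[j] + frac * (values[j + 1] - values[j]))
    return out


def best_lag(reference, other, max_lag):
    """Shift (in samples) of `other` relative to `reference` that maximizes
    their cross-correlation, searched over [-max_lag, max_lag]."""
    def centered(x):
        m = sum(x) / len(x)
        return [v - m for v in x]

    ref, oth = centered(reference), centered(other)
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = sum(
            ref[i] * oth[i + lag]
            for i in range(len(ref))
            if 0 <= i + lag < len(oth)
        )
        if score > best_score:
            best, best_score = lag, score
    return best
```

A positive return value of `best_lag` means that `other` trails `reference` by that many samples, so dropping that many leading samples from `other` would align the two series.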

2.4.1. Deriving (Sub-)Durations of the FTSTS Test

The AR and Kinect vertical position time series were used to determine the start and end of the FTSTS test (standing up and sitting down in a chair; Figure 2, vertical dashed lines). Specifically, we calculated zero crossings in the vertical velocity time series to obtain an initial indication of the start and end of the test. It is important to realize that standing up and sitting down are not strict upward and downward movements, as the initial standing up movement typically involves a forward bow [1,14] while the final sitting down movement involves a backward bow associated with placing the back against the backrest of the chair. Hence, to obtain a more representative start and end of the FTSTS test, we determined characteristic transition points in the vertical position data (instead of simply their minima) using a mathematical model consisting of the following piecewise linear function to find the definite start (Equation (1)):

y(x; b) = b_{1} + b_{2} x + b_{3} (x - b_{4}) H(x - b_{4})    (1)

where H(x − b₄) is the step function defined as follows:

H(x - b_{4}) = \begin{cases} 0 & \text{if } x \leq b_{4} \\ 1 & \text{if } x > b_{4} \end{cases}    (2)

In this model, b₁ represents the initial offset of the vertical position, b₂ represents the first slope (set to 0 for identifying the start frame), b₃ represents the second positive slope, and b₄ represents the breakpoint. To obtain the start of the test (the frame number corresponding to b₄), we fitted the function y to the vertical position time series, with x denoting the frame number, from the first frame of the recording to the frame corresponding to the initial indication of the start of the test, using a subspace trust region-based, nonlinear least squares optimization method. A similar method was used to find the end of the test, where the breakpoint was found by fitting a piecewise linear function, starting with a positive slope and ending with a constant, to the vertical position time series from the initial indication of the end of the test to the end of the recording. With these frame numbers, the time series were trimmed from start to end, from which we determined (i) the between-systems agreement in the time series and (ii) the test completion durations for further statistical analyses.
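To make the breakpoint model concrete, the following sketch fits Equation (1) with b₂ = 0 to a short vertical position trace. It swaps in a simpler technique than the one described above: instead of trust-region nonlinear least squares, it tries every integer frame as breakpoint b₄ and solves for b₁ and b₃ in closed form, keeping the candidate with the smallest squared error. The function name `fit_start_breakpoint` is hypothetical, and the sketch does not enforce a positive b₃.

```python
def fit_start_breakpoint(y):
    """Fit y[i] ~ b1 + b3 * max(i - b4, 0), i.e., Equation (1) with b2 = 0.

    Swapped-in method: exhaustive search over integer breakpoints b4, with
    ordinary least squares for (b1, b3) at each candidate.
    Returns (b1, b3, b4).
    """
    n = len(y)
    y_mean = sum(y) / n
    best = None
    for b4 in range(n - 2):  # keep at least two frames on the rising ramp
        ramp = [max(i - b4, 0) for i in range(n)]
        r_mean = sum(ramp) / n
        s_rr = sum((r - r_mean) ** 2 for r in ramp)
        s_ry = sum((r - r_mean) * (v - y_mean) for r, v in zip(ramp, y))
        b3 = s_ry / s_rr
        b1 = y_mean - b3 * r_mean
        sse = sum((b1 + b3 * r - v) ** 2 for r, v in zip(ramp, y))
        if best is None or sse < best[0]:
            best = (sse, b1, b3, b4)
    _, b1, b3, b4 = best
    return b1, b3, b4
```

The returned b₄ would play the role of the start frame. For the end of the test, the same function could be applied to the time-reversed segment, which turns "positive slope then constant" into "constant then slope".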

From the trimmed data, each sit to stand cycle of FTSTS was divided into four phases: sitting, sit to stand, standing, and stand to sit. Sitting and standing phases were first identified from the vertical position data from each system by finding zero crossings in the vertical velocity. Since a zero crossing gives a single time point and not the duration of the sitting and standing phases, we applied an empirically found threshold of 2 cm to obtain the start and end frame numbers of standing and sitting phases [7], from which sitting, sit to stand, standing, and stand to sit sub-durations were derived (Figure 2). To reduce the influence of potential outliers in within-test repetitions of the so-obtained sub-durations (e.g., due to a participant taking a brief sitting rest during the test), the median of the 4 (for sitting sub-durations) or 5 (for standing, sit to stand, stand to sit sub-durations) within-test sub-durations was determined prior to further statistical analyses.
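The phase identification can be sketched as follows. The text does not spell out how the 2 cm threshold is applied, so this sketch assumes one plausible reading: each sitting or standing phase is grown around a velocity zero crossing for as long as the height stays within 2 cm of the height at that crossing. All function names are hypothetical, heights are assumed to be in meters, and the data are assumed to be sampled at 60 Hz.

```python
from statistics import median


def extrema_frames(z):
    """Frames where the finite-difference vertical velocity changes sign."""
    v = [z[i + 1] - z[i] for i in range(len(z) - 1)]
    return [
        i for i in range(1, len(v))
        if (v[i - 1] > 0 >= v[i]) or (v[i - 1] < 0 <= v[i])
    ]


def phase_around(z, frame, threshold_m=0.02):
    """Grow a phase around an extremum while the height stays within threshold_m.

    ASSUMPTION: this is only one possible interpretation of the 2 cm threshold.
    Returns the (start, end) frame numbers of the phase.
    """
    start = end = frame
    while start > 0 and abs(z[start - 1] - z[frame]) <= threshold_m:
        start -= 1
    while end < len(z) - 1 and abs(z[end + 1] - z[frame]) <= threshold_m:
        end += 1
    return start, end


def median_phase_duration(z, frames, rate_hz=60.0):
    """Median duration (s) of the phases grown around the given extrema."""
    spans = [phase_around(z, f) for f in frames]
    return median((end - start) / rate_hz for start, end in spans)
```

Under this reading, the transition sub-durations (sit to stand, stand to sit) would be the gaps between the end of one phase and the start of the next, and taking the median over the within-test repetitions limits the effect of a single atypical repetition.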

2.4.2. Deriving (Sub-)Durations of the TUG Test

The TUG test completion duration was determined from the AR vertical position time series, as described for the FTSTS test. Sub-durations for the two turns (turn 1 and turn 2) were derived from the AR orientation time series, specifically the yaw angle (Figure 3). For the IMU, yaw angle time series were also used, obtained after integrating the associated rotational velocity time series. Determining the start and end of a turn during TUG is not straightforward, as it is characterized by a rather slow transition from one state (~0° yaw) to another (~180° yaw). We modeled this transition with the following sigmoid function y (Equation (3)):

y(t; p) = p_{1} \tanh\left(\frac{t - p_{2}}{p_{3}}\right) + p_{1} + p_{4}    (3)

where tanh is the hyperbolic tangent function defined as follows:

\tanh(z) = \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}}    (4)

In this model, p₁ represents the amplitude of the transition (usually ~180°), p₂ represents the timing (or center) of the transition, p₃ represents a scaling factor that adjusts the duration of the transition, and p₄ represents the (potential) starting offset (in °). The sigmoid function y was fitted to the yaw angle time series as a function of time t using a subspace trust region-based, nonlinear least squares optimization method. The fitted model y was then used to determine the start and end of turning (Figure 3) by applying an empirically found threshold of 10° relative to the start and end levels of the transition for each turn, as determined by p₄ and p₄ + p₁, respectively. Turning sub-durations were derived from these indicators of the start and end of turns and used for further statistical analyses.
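Once the sigmoid has been fitted, the threshold crossings can be computed in closed form by inverting the tanh. The sketch below assumes the parameters are already fitted (the fitting itself is not shown) and reads the plateaus off the curve in Equation (3) as written, which, taken literally, runs from p₄ to p₄ + 2p₁. The names `yaw_model` and `turn_interval` are hypothetical.

```python
import math


def yaw_model(t, p1, p2, p3, p4):
    """Equation (3) as printed: p1 * tanh((t - p2) / p3) + p1 + p4."""
    return p1 * math.tanh((t - p2) / p3) + p1 + p4


def turn_interval(p1, p2, p3, threshold_deg=10.0):
    """Times at which the fitted curve is threshold_deg away from each plateau.

    ASSUMPTION: the turn starts when the curve has moved threshold_deg away
    from its initial plateau and ends when it is threshold_deg short of its
    final plateau. Requires p3 > 0 and 0 < threshold_deg < 2 * |p1|.
    """
    ratio = 1.0 - threshold_deg / abs(p1)
    if not -1.0 < ratio < 1.0:
        raise ValueError("threshold must lie strictly inside the transition")
    half_width = p3 * math.atanh(ratio)
    return p2 - half_width, p2 + half_width
```

The turning sub-duration is then the difference between the two returned times. Because the sigmoid is symmetric around p₂, it depends only on p₁, p₃, and the threshold, not on the offset p₄.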

2.5. Statistical Analysis

Concurrent validity (i.e., consistency agreement between AR and reference system time series and absolute between-systems agreement for all (sub-)durations of the FTSTS and TUG tests) and test–retest reliability (i.e., within-system absolute agreement between repetitions for all (sub-)durations of the FTSTS and TUG tests) were evaluated with the ICC_(C,1) and ICC_(A,1) [15]. ICC values greater than 0.50, 0.75, and 0.90 represent, respectively, moderate, good, and excellent agreement between systems or over repetitions [16]. As ICC values alone may give misleading impressions of agreement when there is a large between-subject variation, we complemented them with two Bland–Altman analysis statistics: (1) bias, indicating a systematic difference between systems or over repetitions, and (2) limits of agreement, indicating the precision of differences between systems or over repetitions [17]. Statistical differences between systems (i.e., between AR glasses and reference systems) and trials were evaluated using paired samples t-tests for normally distributed data, while differences between HoloLens 2 and Magic Leap 2 AR glasses were assessed using independent t-tests under the assumption of normality. Normality was checked using the Kolmogorov–Smirnov test. In cases where the data were not normally distributed, the non-parametric Wilcoxon signed-rank test was used for paired comparisons, and the Mann–Whitney U test was used for independent comparisons.

All processed data used for the statistical analyses are provided in the Supplementary Material, from which missing values become apparent. Missing values mostly resulted from inaccurate Kinect data (i.e., storing 3D kinematics of the guarding researcher instead of the participant). Other reasons for missing values were as follows: (1) one participant did not perform the FTSTS test, (2) one participant was excluded from the between-systems comparisons because of an error in IMU recording, and (3) two participants were removed from test–retest analyses because of incomplete data for one of the two repetitions. For concurrent validity analyses, the second trial was generally used; when this was not possible (e.g., due to the abovementioned issues with Kinect recordings), the first trial was occasionally used instead to keep as many participants as possible in the analyses.
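The agreement statistics can be sketched with the standard library alone. The example below assumes that ICC_(C,1) and ICC_(A,1) follow the usual two-way, single-measurement definitions for consistency and absolute agreement, computed from ANOVA mean squares, and that the limits of agreement use the conventional bias ± 1.96 SD of the differences. Neither detail is stated explicitly in the text, and the function names are hypothetical.

```python
from statistics import mean, stdev


def icc_single(data):
    """Return (ICC(C,1), ICC(A,1)) for rows = subjects, columns = measurements."""
    n, k = len(data), len(data[0])
    grand = mean(v for row in data for v in row)
    row_means = [mean(row) for row in data]
    col_means = [mean(row[j] for row in data) for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in data for v in row)
    ms_rows = ss_rows / (n - 1)                      # between-subject variance
    ms_cols = ss_cols / (k - 1)                      # between-measurement variance
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    icc_c = (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err)
    icc_a = (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k / n * (ms_cols - ms_err)
    )
    return icc_c, icc_a


def bland_altman(a, b):
    """Return (bias, (lower, upper)) with limits at bias +/- 1.96 SD."""
    diffs = [x - y for x, y in zip(a, b)]
    bias = mean(diffs)
    half_width = 1.96 * stdev(diffs)
    return bias, (bias - half_width, bias + half_width)
```

A constant offset between two systems illustrates why both coefficients are reported: it leaves the consistency ICC at 1 but lowers the absolute-agreement ICC, and it shows up directly as a non-zero Bland–Altman bias.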

3. Results

3.1. Concurrent Validity

3.1.1. Agreement in Time Series between AR and Reference Systems

Figure 4a shows representative examples of the AR vertical position data versus the three vertical Kinect body points of FTSTS. At a group level, the consistency agreement between AR and Kinect time series for FTSTS was excellent (ICC_(C,1) [95% CI] for AR vs. Kinect head: 0.992 [0.987–0.996], sternum: 0.976 [0.962–0.989], and spine base: 0.933 [0.916–0.951]). There were no significant differences between the consistency agreement scores of HoloLens 2 (0.988) and Magic Leap 2 (0.995) against Kinect head time series (t(15) = −1.81, p = 0.091), nor between HoloLens 2 (0.969) and Magic Leap 2 (0.980) against the Kinect sternum time series (t(15) = −0.78, p = 0.449) or between HoloLens 2 (0.921) and Magic Leap 2 (0.945) against the Kinect spine base reference time series (t(15) = −1.39, p = 0.185). Figure 4b shows a representative example of AR and IMU orientation data (yaw angle) of TUG. At a group level, the consistency agreement between AR and IMU time series for TUG was again excellent (ICC_(C,1) [95% CI], AR vs. IMU trunk: 0.986 [0.981–0.990]). Against the IMU time series, consistency agreement scores did not differ between HoloLens 2 (0.983) and Magic Leap 2 (0.989) (t(19) = −1.46, p = 0.161).

3.1.2. Agreement in (Sub-)Durations Derived from AR and Reference System Data

The absolute agreement between AR-derived test completion durations and stopwatch counterparts was excellent for both the FTSTS and TUG tests (ICC_(A,1) > 0.984), with a small but significant bias (~2% and ~4% of the mean, respectively) and narrow limits of agreement (Table 2).

The absolute agreement statistics between derived (sub-)durations from AR and reference system time series are presented in Table 2. For FTSTS, the between-systems agreement (AR vs. Kinect head data) for derived total and sub-durations was excellent (ICC_(A,1) > 0.921, with three related small but significant biases for sitting, sit to stand, and stand to sit sub-durations and narrow limits of agreement). A similar pattern of between-systems agreement statistics was observed for the comparisons of AR (sub-)durations with those derived from Kinect sternum and spine base data, with slightly worsening statistics for body points farther away from the head (see Table S1 in the Supplementary Materials).

For TUG, the between-systems agreement for sub-durations was excellent for turn 1 (ICC_(A,1) = 0.913, without bias and narrow limits of agreement) and moderate for turn 2 (ICC_(A,1) = 0.589, with a substantial bias of ~22% and wide limits of agreement; Table 2).

3.2. Test–Retest Reliability for (Sub-)Durations of FTSTS and TUG

The absolute agreement between trial 1 and trial 2 for the AR data of the FTSTS test was excellent for total durations (ICC_(A,1) = 0.914) and good to excellent for sub-durations (ICC_(A,1) > 0.830). Only the standing sub-durations showed moderate absolute agreement between repetitions (ICC_(A,1) = 0.695; Table 3). All biases were small and non-significant, while limits of agreement between test and retest trials were all greater than those seen between systems (i.e., compared to Table 2). The test–retest statistics for the reference systems data were similar or slightly worse (see Table S2 in Supplementary Materials).

The absolute agreement for TUG test completion durations between trial 1 and trial 2 for the AR data was good (ICC_(A,1) > 0.75, no bias, wider limits of agreement than those seen between systems) and moderate for turn 1 and turn 2 sub-durations (ICC_(A,1) < 0.75, no bias and again wider limits of agreement than those seen between systems; cf. Table 3 vs. Table 2). The test–retest statistics for the reference system data were again similar (see Table S2 in Supplementary Materials). Note that we used the standard TUG test without dictating turning directions (i.e., clockwise or counterclockwise). Nevertheless, the turning directions of the test and retest were highly consistent among our participants. For turn 1 at or around the 3-m marker, all our participants turned in the same direction for the test and retest. For turn 2 prior to sitting down, 18 of the 21 participants turned in the same direction for test and retest; excluding the three participants with an inconsistent turn direction slightly improved test–retest reliability statistics for turn 2 sub-durations (bias (limits of agreement): from −0.14 (−1.28, 1.00) to −0.02 (−1.06, 1.02); ICC_(A,1): from 0.684 to 0.742).

4. Discussion

The aim of this study was to evaluate concurrent validity and test–retest reliability of AR-instrumented FTSTS and TUG tests in people with Parkinson’s disease. Here, we discuss the findings associated with the three specific objectives outlined in the Introduction. The first objective was to examine the agreement between AR position and orientation time series and counterparts from reference motion registration systems. An excellent concurrent validity was observed, better than expected, as evidenced by excellent ICC values between AR and Kinect vertical position data and between AR and IMU yaw orientation data. This was true for both HoloLens 2 and Magic Leap 2 AR glasses, as our findings revealed no significant difference in consistency agreement scores between HoloLens 2 and Magic Leap 2 AR glasses, suggesting comparable accuracy in 3D position and orientation tracking. As can be appreciated from the data depicted in Figure 2, Figure 3 and Figure 4, as well as from the data visualization alongside videos of TUG and FTSTS performance in the Supplementary Material, AR 3D position and orientation data contain rich information from which indicators of various distinct sub-parts of the tests can be validly derived, as discussed further below. Although we focused on temporal aspects of TUG and FTSTS test performance, there are ample opportunities for an even finer-grained parameterization of identified sub-parts given that state-of-the-art AR glasses (i.e., HoloLens 2, Magic Leap 2) are, in principle, 3D position sensors, a unique asset compared to other wearable sensor systems [9]. Specifically, features in the vertical position time series, in the horizontal displacement time series, and in the orientation around the vertical axis (i.e., yaw angle time series) seem informative for demarcating the various sub-parts of the TUG test (Figure 5). 
As can be appreciated in Figure 5, the yaw angle time series (dashed line) clearly shows two ~180° turns, while the change in slope in the horizontal displacement time series (dotted line) may be indicative of deceleration and acceleration phases demarcating turning and walking phases. Deriving spatiotemporal gait parameters, like cadence and step lengths, from TUG walking parts seems well feasible using the characteristic oscillations in the vertical (and non-depicted mediolateral) position time series associated with midstance (peaks) and foot strikes (valleys) [13]. Gait parameter quantification was already successfully explored previously for healthy adults [10,18] and people with Parkinson’s disease [13] and is deemed worth studying further for standard clinical tests, like, for instance, the 10-m walk test and the 6-min walk test.

The second objective was to derive, from these valid time series, TUG and FTSTS test completion durations and sub-durations for distinct sub-parts of the tests, like turning and sit to stand durations, and to evaluate their concurrent validity against reference systems. For TUG and FTSTS test completion durations, we found excellent agreement between AR data and the stopwatch, albeit with a small but significant bias (<4%) and narrow limits of agreement, indicating that the systems can be used interchangeably. Likewise, between-systems agreement statistics were excellent for all (sub-)durations of the FTSTS test, with high ICCs and narrow limits of agreement, yet with three small but significant biases (~6%) for sitting, sit to stand, and stand to sit sub-durations. These biases cancelled each other out, as they were similar in magnitude but opposite in direction. This points to an interconnected between-systems bias in identifying the frame numbers for the start and end of the sitting phase, which demarcate the stand to sit, sitting, and sit to stand sub-durations. For the TUG test, the sub-durations of the two turns showed different agreement statistics. While the between-systems agreement for sub-durations for the first turn was excellent (see also Koop and colleagues [10]), the agreement scores for the second turn were moderate. This difference between the two turns may be explained by our observation that the second turn is not a distinct sub-part but overlaps with the sitting down part of the TUG test in a combined sitting down whilst turning movement. Also, while turning, participants sometimes looked for the location of the chair and, therefore, turned their heads (as captured with AR yaw data) before they actually turned around with their trunk (as captured with IMU yaw data). Finally, there is an ordered sequence during turning, which generally starts with the head, is followed by the trunk, and ends with the head again [19]. All of these factors may have affected between-systems differences in sub-durations to some extent.

The third objective was to determine within-system test–retest reliability to help interpret the abovementioned between-systems absolute agreement statistics. An interesting observation from a statistical point of view was that the within-system test–retest variation in completion times and sub-durations (cf. Table 3) was always much greater than the between-systems variation (cf. Table 2). This was reflected in the limits of agreement, which are much wider for the test–retest evaluation (for AR and reference systems alike, see also Table S2) than for the between-systems evaluation (i.e., compare limits of agreement between Table 2 and Table 3). On the one hand, this is positive, as test completion durations and sub-durations may then be derived interchangeably from AR and reference system data. On the other hand, the fairly large variation seen over repeated measurements, for both AR and reference systems (see also [20,21,22]), may limit their sensitivity for detecting longitudinal changes with disease progression, medication, or rehabilitation intervention [23].

Overall, it seems fair to conclude that AR 3D position and orientation data are valid and contain rich features from which (sub-)durations of TUG and FTSTS performance can be validly derived in people with Parkinson’s disease, with a better between-systems than within-system absolute agreement for both AR and reference systems. What are the future prospects of these findings? We envision a scenario where AR data are used to not only quantify test completion durations (similar to the stopwatch) but, given the rich information in the data and the excellent time series concurrent validity results presented here, also to provide a more comprehensive quantitative assessment of clinically relevant sub-durations, like the turn in TUG [6]. With AR, such quantitative tests may also be automated, as test instructions (e.g., ‘3-2-1-go’) and test constraints (AR visual indicator for TUG turn at 3 m from the chair) can readily be provided. Automated test administration resulting in valid and reliable test scores is much needed and provides opportunities to improve care. That is, physical therapy is increasingly given remotely at home with AR [23,24], for which (self-)monitoring of treatment progress becomes key. Because such home-based AR gait and balance exergaming intervention programs will typically be remotely prescribed by the therapist [23,24], AR TUG and FTSTS assessments may, in principle, also be prescribed as part of the intervention program. Insight into progress can thus be obtained more frequently than with the standard pre–post-intervention assessment in clinical practice or clinical research. This is relevant as multiple longitudinal assessments may help mitigate the effect of confounding factors, like daily fluctuations, thus ensuring a more reliable assessment of change over time and thereby enhancing the quality of clinical and research findings. 
It may also provide unique insight into dose–response relationships of prescribed AR interventions and/or concomitant medication in a time-effective and patient-friendly manner. Ultimately, remote parameterization of progress could help reduce the number of contact moments between patient and healthcare provider and also change those consults. That is, instead of administering tests, the session may be used to discuss the patient’s test results and any further needs for intervention moving forward, which is expected to increase care efficiency and patient satisfaction. Our ongoing research will proceed along those lines.

5. Conclusions

From this study, it can be concluded that AR data are valid and informative for quantifying TUG and FTSTS test performance outcomes. TUG and FTSTS test completion durations, as well as various sub-durations of distinct sub-parts of the test, can be determined interchangeably from AR data and reference system data in a cross-sectional assessment in persons with Parkinson’s disease. However, the relatively poorer within-system test–retest reliability (for AR and reference systems alike) compared to between-systems agreement should be kept in mind when performing longitudinal assessments with either system.

Supplementary Materials

The following supporting information can be downloaded at https://www.mdpi.com/article/10.3390/s24175485/s1, Table S1: Concurrent validity statistics for FTSTS (sub-)durations (in s): between-systems absolute agreement statistics for (sub-)durations derived from AR and Kinect sternum and spine base data. Table S2: Test–retest reliability: absolute agreement statistics for (sub-)durations (in s) derived from reference systems data. Video S1: FTSTS test example, with visualization of vertical AR position data. Video S2: TUG test example, with visualization of AR 3D position and orientation data. Datafile S1: FTSTS between repetitions comparison. Datafile S2: FTSTS between systems comparison. Datafile S3: FTSTS ICC for consistency. Datafile S4: TUG between repetitions comparison. Datafile S5: TUG between systems comparison. Datafile S6: TUG ICC for consistency.

Author Contributions

Conceptualization, D.J.G. and M.R.; methodology, J.S.v.B., P.F.v.D., E.M.H., D.J.G. and M.R.; software, J.S.v.B. and P.F.v.D.; validation, M.R.; formal analysis, J.S.v.B. and P.F.v.D.; investigation, J.S.v.B. and E.M.H.; resources, M.R.; data curation, J.S.v.B. and P.F.v.D.; writing—original draft preparation, J.S.v.B. and M.R.; writing—review and editing, J.S.v.B., P.F.v.D., E.M.H., D.J.G. and M.R.; visualization, J.S.v.B., P.F.v.D. and E.M.H.; supervision, D.J.G. and M.R.; project administration, D.J.G. and M.R.; funding acquisition, D.J.G. and M.R. All authors have read and agreed to the published version of the manuscript.

Funding

This project is funded by the EMIL project financial support to third parties, which is funded by the European Union (Grant ID E115506). Views and opinions expressed are, however, those of the author(s) only and do not necessarily reflect those of the European Union. Neither the European Union nor the granting authority can be held responsible for them.

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki and was approved by the Medical Research Ethics Committees United, the Netherlands (R22.076, NL82441.100.22, 21 November 2022).

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

Data supporting reported results can be found in the Supplementary Materials.

Acknowledgments

We would like to thank Lotte Hardeman, Annelotte Geene, and Annejet van Dam for their help with performing the measurements.

Conflicts of Interest

M.R. is a scientific advisor with share options for Strolll Ltd., a digital therapeutics company building AR software for physical rehabilitation. The other authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

1. Baer, G.D.; Ashburn, A.M. Trunk movements in older subjects during sit-to-stand. Arch. Phys. Med. Rehabilit. 1995, 76, 844–849.
2. Whitney, S.L.; Wrisley, D.M.; Marchetti, G.F.; Gee, M.A.; Redfern, M.S.; Furman, J.M. Clinical measurement of sit-to-stand performance in people with balance disorders: Validity of data for the Five-Times-Sit-to-Stand Test. Phys. Ther. 2005, 85, 1034–1045.
3. Podsiadlo, D.; Richardson, S. The timed “Up & Go”: A test of basic functional mobility for frail elderly persons. J. Am. Geriatr. Soc. 1991, 39, 142–148.
4. Lord, S.R.; Murray, S.M.; Chapman, K.; Munro, B.; Tiedemann, A. Sit-to-stand performance depends on sensation, speed, balance, and psychological status in addition to strength in older people. J. Gerontol. Ser. A 2002, 57, M539–M543.
5. Schoene, D.; Wu, S.S.-M.; Mikolaizak, A.S.; Menant, J.C.; Smith, S.T.; Delbaere, K.; Lord, S.R. Discriminative ability and predictive validity of the timed up and go test in identifying older people who fall: Systematic review and meta-analysis. J. Am. Geriatr. Soc. 2013, 61, 202–208.
6. Böttinger, M.J.; Labudek, S.; Schoene, D.; Jansen, C.-P.; Stefanakis, M.-E.; Litz, E.; Bauer, J.M.; Becker, C.; Gordt-Oesterwind, K. “TiC-TUG”: Technology in clinical practice using the instrumented timed up and go test—A scoping review. Aging Clin. Exp. Res. 2024, 36, 100.
7. Ejupi, A.; Brodie, M.; Gschwind, Y.J.; Lord, S.R.; Zagler, W.L.; Delbaere, K. Kinect-Based Five-Times-Sit-to-Stand Test for Clinical and In-Home Assessment of Fall Risk in Older People. Gerontology 2015, 62, 118–124.
8. Theodorou, C.; Velisavljevic, V.; Dyo, V.; Nonyelu, F. Visual SLAM algorithms and their application for AR, mapping, localization and wayfinding. Array 2022, 15, 100222.
9. Sun, R.; Aldunate, R.G.; Sosnoff, J.J. The Validity of a Mixed Reality-Based Automated Functional Mobility Assessment. Sensors 2019, 19, 2183.
10. Miller Koop, M.; Rosenfeldt, A.B.; Owen, K.; Penko, A.L.; Streicher, M.C.; Albright, A.; Alberts, J.L. The Microsoft HoloLens 2 Provides Accurate Measures of Gait, Turning, and Functional Mobility in Healthy Adults. Sensors 2022, 22, 2009.
11. Nieuwboer, A.; Rochester, L.; Herman, T.; Vandenberghe, W.; Emil, G.E.; Thomaes, T.; Giladi, N. Reliability of the new freezing of gait questionnaire: Agreement between patients with Parkinson’s disease and their carers. Gait Posture 2009, 30, 459–463.
12. Geerse, D.J.; Coolen, B.H.; Roerdink, M. Kinematic Validation of a Multi-Kinect v2 Instrumented 10-Meter Walkway for Quantitative Gait Assessments. PLoS ONE 2015, 10, e0139913.
13. Geerse, D.J.; Coolen, B.; Roerdink, M. Quantifying Spatiotemporal Gait Parameters with HoloLens in Healthy Adults and People with Parkinson’s Disease: Test-Retest Reliability, Concurrent Validity, and Face Validity. Sensors 2020, 20, 3216.
14. Yoshida, K.; An, Q.; Yozu, A.; Chiba, R.; Takakusaki, K.; Yamakawa, H.; Tamura, Y.; Yamashita, A.; Asama, H. Visual and Vestibular Inputs Affect Muscle Synergies Responsible for Body Extension and Stabilization in Sit-to-Stand Motion. Front. Neurosci. 2018, 12, 1042.
15. McGraw, K.O.; Wong, S.P. Forming inferences about some intraclass correlation coefficients. Psychol. Methods 1996, 1, 30–46.
16. Koo, T.K.; Li, M.Y. A Guideline of Selecting and Reporting Intraclass Correlation Coefficients for Reliability Research. J. Chiropr. Med. 2016, 15, 155–163.
17. Bland, J.M.; Altman, D.G. Measuring agreement in method comparison studies. Stat. Methods Med. Res. 1999, 8, 135–160.
18. Miller Koop, M.; Rosenfeldt, A.B.; Owen, K.; Zimmerman, E.; Johnston, J.; Streicher, M.C.; Albright, A.; Penko, A.L.; Alberts, J.L. The Microsoft HoloLens 2 Provides Accurate Biomechanical Measures of Performance During Military-Relevant Activities in Healthy Adults. Mil. Med. 2023, 188, 92–101.
19. Kim, K.; Fricke, M.; Bock, O. Eye-Head-Trunk Coordination While Walking and Turning in a Simulated Grocery Shopping Task. J. Mot. Behav. 2021, 53, 575–582.
20. Lim, L.I.; van Wegen, E.E.; de Goede, C.J.; Jones, D.; Rochester, L.; Hetherington, V.; Nieuwboer, A.; Willems, A.M.; Kwakkel, G. Measuring gait and gait-related activities in Parkinson’s patients own home environment: A reliability, responsiveness and feasibility study. Park. Relat. Disord. 2005, 11, 19–24.
21. Paul, S.S.; Canning, C.G.; Sherrington, C.; Fung, V.S. Reproducibility of measures of leg muscle power, leg muscle strength, postural sway and mobility in people with Parkinson’s disease. Gait Posture 2012, 36, 639–642.
22. Spagnuolo, G.; Faria, C.D.C.M.; da Silva, B.A.; Ovando, A.C.; Gomes-Osman, J.; Swarowsky, A. Are functional mobility tests responsive to group physical therapy intervention in individuals with Parkinson’s disease? NeuroRehabilitation 2018, 42, 465–472.
23. Hardeman, L.E.S.; Geerse, D.J.; Hoogendoorn, E.M.; Nonnekes, J.; Roerdink, M. Remotely prescribed and monitored home-based gait-and-balance therapeutic exergaming using augmented reality (AR) glasses: Protocol for a clinical feasibility study in people with Parkinson’s disease. Pilot Feasibility Stud. 2024, 10, 54.
24. Hsu, P.-Y.; Singer, J.; Keysor, J.J. The evolution of augmented reality to augment physical therapy: A scoping review. J. Rehabil. Assist. Technol. Eng. 2024, 11, 20556683241252092.

Figure 1. (a) Participant performing the Five Times Sit To Stand (FTSTS) test; data are recorded with a stopwatch, AR glasses, and a Microsoft Kinect v2 sensor. (b) Participant performing the Timed Up and Go (TUG) test; note that the same chair in the same position was used for standing up and sitting down; data are recorded with a stopwatch, AR glasses, and an Inertial Measurement Unit (IMU) worn on the lower back.

Figure 2. Representative AR vertical position time series of a participant performing FTSTS, with vertical dashed lines indicating the start and end of the test (from which test completion duration was derived) and with specific markers (see legend) indicating the start and end of the sit to stand, standing, stand to sit, and sitting sub-parts of the test (from which sub-durations were derived).
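The segmentation that Figure 2 illustrates can be sketched in a few lines. This is a minimal illustration only and not the peak detection algorithm used in the study: it assumes that samples in the lowest and highest 10% of the vertical range count as sitting and standing, and that everything in between is a transition. The function name `segment_ftsts` and the `low_frac` and `high_frac` thresholds are hypothetical.

```python
def segment_ftsts(times, height, low_frac=0.1, high_frac=0.9):
    """Label each sample and return (label, t_start, t_end) segments.

    Assumption (not from the paper): sitting and standing are defined by
    fixed fractions of the vertical range of the head-worn AR glasses.
    """
    lo, hi = min(height), max(height)
    low_thr = lo + low_frac * (hi - lo)
    high_thr = lo + high_frac * (hi - lo)

    labels = []
    last_stable = "sitting"  # the test starts from a seated position
    for h in height:
        if h <= low_thr:
            last_stable = "sitting"
            labels.append("sitting")
        elif h >= high_thr:
            last_stable = "standing"
            labels.append("standing")
        else:
            # Direction of a transition follows from the last stable posture.
            labels.append("sit_to_stand" if last_stable == "sitting" else "stand_to_sit")

    # Collapse runs of identical labels; a segment ends where the next begins.
    segments = []
    start = 0
    for i in range(1, len(labels) + 1):
        if i == len(labels) or labels[i] != labels[start]:
            t_end = times[i] if i < len(labels) else times[-1]
            segments.append((labels[start], times[start], t_end))
            start = i
    return segments


# Synthetic single repetition sampled at 10 Hz (made-up values, arbitrary units).
demo_height = [0.0] * 10 + [0.2, 0.4, 0.6, 0.8] + [1.0] * 3 + [0.8, 0.6, 0.4, 0.2] + [0.0] * 5
demo_times = [0.1 * i for i in range(len(demo_height))]
for label, t0, t1 in segment_ftsts(demo_times, demo_height):
    print(f"{label:13s} {t1 - t0:.2f} s")
```

Sub-durations then follow as the difference between the end and start time of each segment. On real recordings, the fixed thresholds would likely need tuning or smoothing to cope with noise and between-participant differences in movement amplitude.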

Figure 3. Representative AR yaw angle time series (dashed line) of a participant performing TUG, with the gray solid line representing the fitted sigmoid model y that was used to derive turning sub-durations.
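To make the idea of deriving a turning sub-duration from a fitted sigmoid concrete, the sketch below fits a logistic function to a yaw angle time series. The parameterization and the duration definition are assumptions, since neither is specified in this part of the text: the model is taken to be y(t) = a + b / (1 + exp(−(t − c)/d)), and the turn duration is taken as the time the fitted curve needs to go from 5% to 95% of the total rotation b, which equals 2·d·ln(19). The fit uses a simple grid search over c and d with a closed-form least-squares solution for a and b; a nonlinear optimizer could be used instead. All function names are hypothetical.

```python
from math import exp, log


def logistic(t, c, d):
    """Unit logistic centred at c with time scale d (argument clamped to avoid overflow)."""
    z = max(-700.0, min(700.0, (t - c) / d))
    return 1.0 / (1.0 + exp(-z))


def fit_sigmoid(times, yaw, c_grid, d_grid):
    """Return (a, b, c, d) minimising the squared error of yaw ~ a + b * logistic(t, c, d)."""
    n = len(times)
    y_mean = sum(yaw) / n
    best = None
    for c in c_grid:
        for d in d_grid:
            s = [logistic(t, c, d) for t in times]
            s_mean = sum(s) / n
            sxx = sum((si - s_mean) ** 2 for si in s)
            if sxx == 0.0:
                continue  # flat regressor, a and b are not identifiable
            sxy = sum((si - s_mean) * (yi - y_mean) for si, yi in zip(s, yaw))
            b = sxy / sxx            # ordinary least squares slope
            a = y_mean - b * s_mean  # ordinary least squares intercept
            sse = sum((a + b * si - yi) ** 2 for si, yi in zip(s, yaw))
            if best is None or sse < best[0]:
                best = (sse, a, b, c, d)
    if best is None:
        raise ValueError("no valid (c, d) combination in the grids")
    return best[1:]


def turn_duration(d, lower=0.05, upper=0.95):
    """Time for the fitted logistic to rise from `lower` to `upper` of its amplitude."""
    return d * (log(upper / (1.0 - upper)) - log(lower / (1.0 - lower)))


# Synthetic 180-degree turn centred at t = 5 s (made-up values, not study data).
demo_times = [0.1 * i for i in range(101)]
demo_yaw = [180.0 * logistic(t, 5.0, 0.4) for t in demo_times]
a, b, c, d = fit_sigmoid(
    demo_times,
    demo_yaw,
    c_grid=[4.0 + 0.1 * i for i in range(21)],
    d_grid=[0.1 * i for i in range(1, 11)],
)
print(f"turn midpoint {c:.1f} s, rotation {b:.0f} deg, duration {turn_duration(d):.2f} s")
```

The 5% and 95% bounds are an arbitrary choice; a different fraction changes the reported duration by a constant factor but not the ranking of turns.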

Figure 4. (a) Representative examples of AR (HoloLens 2) vertical position data vs. Kinect vertical position data during FTSTS. (b) Representative example of AR (Magic Leap 2) yaw orientation data vs. IMU trunk yaw data during TUG.

Figure 5. Illustration of relevant AR data time series of a participant performing the TUG test, with apparent features to identify distinct sub-parts of the test, like standing up, walking, and turning, from which various spatiotemporal parameters may be derived.

Table 1. Participant characteristics.

Characteristics	Data (Mean ± SD [Range] Unless Indicated Otherwise)
Age (years)	66.3 ± 8.8 [51–82]
Weight (kg)	79.1 ± 9.9 [59.0–92.9]
Height (cm)	176.1 ± 10.3 [154–191]
Sex, male/female	16/6
Diagnostic time (years)	7.5 ± 4.8 [1–20]
Modified Hoehn and Yahr stage, 2/2.5	14/8
Freezing of gait, yes/no *	11/11
MDS-UPDRS III score	31.7 ± 11.3 [13–61]
Fall history (number of falls over the previous year)	2.5 ± 3.3 [0–10]

MDS-UPDRS III = MDS-unified Parkinson’s disease rating scale part III. * The presence of freezing of gait is defined by a non-zero score on the New Freezing of Gait Questionnaire [11].

Table 2. Concurrent validity statistics for FTSTS and TUG (sub-)durations (in s): between-systems absolute agreement statistics for (sub-)durations derived from AR and reference systems data.

		Mean ± SD	Mean ± SD	Bias (95% Limits of Agreement)	t-Statistics	ICC_(A,1)
		AR	Stopwatch
FTSTS	Completion duration	12.3 ± 3.7	12.1 ± 3.7	−0.25 (−0.66 0.17)	t(20) = −5.28, p < 0.001	0.996
TUG	Completion duration	10.3 ± 3.1	9.9 ± 3.2	−0.44 (−1.19 0.31)	t(21) = −5.37, p < 0.001	0.984
		AR	Kinect Head
FTSTS *	Completion duration	11.9 ± 3.91	11.91 ± 3.81	−0.04 (−0.38 0.3)	t(17) = −1.09, p = 0.293	0.999
	Sitting sub-duration	0.69 ± 0.49	0.62 ± 0.46	−0.07 (−0.25 0.11)	t(16) = −3.13, p = 0.006	0.971
	Sit to stand sub-duration	0.58 ± 0.16	0.62 ± 0.16	0.04 (−0.03 0.12)	t(16) = −4.49, p < 0.001	0.943
	Standing sub-duration	0.34 ± 0.15	0.32 ± 0.16	−0.02 (−0.09 0.06)	t(16) = 1.82, p = 0.087	0.965
	Stand to sit sub-duration	0.61 ± 0.16	0.65 ± 0.16	0.04 (−0.07 0.14)	t(16) = −3.05, p = 0.008	0.921
		AR	IMU Trunk
TUG	Turn 1 sub-duration	1.75 ± 0.64	1.8 ± 0.70	−0.09 (−0.4 0.6)	t(20) = −1.51, p = 0.146	0.913
TUG	Turn 2 sub-duration	2.01 ± 0.71	1.61 ± 0.62	−0.40 (−1.44 0.64)	t(20) = 3.44, p = 0.003	0.589

Significant biases are presented with the p-values in bold. * One outlier participant (p4 in the Supplementary Data Files) was excluded from the sub-duration results due to insufficient peaks in the Kinect time series only, preventing our peak detection algorithm from accurately identifying the frame number start and end indices.
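For readers who wish to recompute agreement statistics of the kind reported in Table 2 from the supplementary data files, the following sketch shows one way to do so for paired durations. It rests on several assumptions: the bias is the mean paired difference taken as reference minus AR (a sign convention inferred from the table values, not stated in this excerpt), the 95% limits of agreement are bias ± 1.96 SD of the differences in the sense of Bland and Altman [17], and ICC_(A,1) is the two-way, absolute-agreement, single-measure coefficient of McGraw and Wong [15]. Function names and the example numbers are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def bland_altman(ar, ref):
    """Return (bias, lower limit, upper limit) for paired measurements."""
    diffs = [r - a for a, r in zip(ar, ref)]  # assumed convention: reference minus AR
    bias = mean(diffs)
    sd = stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def paired_t(ar, ref):
    """Paired-samples t statistic on the same differences (df = n - 1)."""
    diffs = [r - a for a, r in zip(ar, ref)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


def icc_a1(ar, ref):
    """ICC(A,1): two-way model, absolute agreement, single measurement."""
    rows = list(zip(ar, ref))
    n, k = len(rows), 2
    grand = mean(v for row in rows for v in row)
    row_means = [mean(row) for row in rows]
    col_means = [mean(col) for col in zip(*rows)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in rows for v in row)
    ss_err = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)               # between-participants mean square
    msc = ss_cols / (k - 1)               # between-systems mean square
    mse = ss_err / ((n - 1) * (k - 1))    # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


# Made-up completion durations in seconds for five participants (not study data).
ar_s = [10.2, 12.8, 9.5, 15.1, 11.0]
ref_s = [10.0, 12.5, 9.4, 14.8, 10.9]
bias, lower, upper = bland_altman(ar_s, ref_s)
print(f"bias {bias:.2f} s, limits of agreement {lower:.2f} to {upper:.2f} s")
print(f"t = {paired_t(ar_s, ref_s):.2f}, ICC(A,1) = {icc_a1(ar_s, ref_s):.3f}")
```

Because ICC_(A,1) penalizes systematic offsets between systems, a constant bias lowers it even when the two systems rank participants identically, which is why it is reported alongside the bias and limits of agreement.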

Table 3. Test–retest reliability for FTSTS and TUG: absolute agreement statistics for (sub-)durations (in s) derived from AR data.

		Mean ± SD	Mean ± SD	Bias (95% Limits of Agreement)	t-Statistics	ICC_(A,1)
		AR Trial 1	AR Trial 2
FTSTS	Completion duration	12.82 ± 4.33	12.28 ± 3.83	−0.54 (−3.78 2.70)	t(19) = 1.45, p = 0.163	0.914
	Sitting sub-duration	0.78 ± 0.45	0.75 ± 0.48	−0.03 (−0.28 0.22)	t(19) = 1.04, p = 0.314	0.964
	Sit to stand sub-duration	0.61 ± 0.19	0.59 ± 0.15	−0.02 (−0.21 0.17)	t(19) = 1.02, p = 0.322	0.830
	Standing sub-duration	0.37 ± 0.15	0.38 ± 0.18	0.01 (−0.25 0.26)	t(19) = −0.32, p = 0.756	0.695
	Stand to sit sub-duration	0.61 ± 0.16	0.62 ± 0.15	0.01 (−0.13 0.14)	t(19) = −0.54, p = 0.598	0.903
TUG	Completion duration	10.83 ± 2.71	10.32 ± 3.21	−0.51 (−3.33 2.32)	t(20) = 1.62, p = 0.122	0.874
	Turn 1 sub-duration	1.82 ± 0.73	1.78 ± 0.63	−0.04 (−1.05 0.98)	t(20) = 0.34, p = 0.740	0.721
	Turn 2 sub-duration	2.18 ± 0.76	2.04 ± 0.71	−0.14 (−1.28 1.00)	t(20) = 1.07, p = 0.295	0.684

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
