Open Access Article

Peer-Review Record

Limited Interchangeability of Smartwatches and Lace-Mounted IMUs for Running Gait Analysis

Sensors 2025, 25(17), 5553; https://doi.org/10.3390/s25175553

by Theodor Meingast¹, Bryson Carrier¹,², Amanda Melvin¹, Kenneth M. Kozloff¹, Alexandra F. DeJong Lempke³ and Adam S. Lepley¹,*

Reviewer 1: Anonymous

Reviewer 2: Anonymous

Reviewer 3: Gordon Alderink

Reviewer 4: Anonymous

Submission received: 4 August 2025 / Revised: 29 August 2025 / Accepted: 4 September 2025 / Published: 5 September 2025

(This article belongs to the Special Issue Human Activity Recognition Based on Sensors: Challenges and Perspectives)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

Could you please clarify the specific activity levels of the "physically active adults"? Clarify the information on population tested
Could you provide additional rationale for the choice of the 10% equivalence window, explaining why this specific threshold is clinically or biomechanically meaningful?
Could you ensure that all mentions of "stride length" in the results and discussion clearly specify that the smartwatch data was corrected from step length to avoid any potential confusion?
Could you explain the reason for the one-trial discrepancy in the data, where the smartwatch captured 192 trials and the lace-mounted IMU sensors captured 191?
Could you expand the discussion on the sex-based differences in average cadence agreement, possibly exploring additional hypotheses like the influence of sex-specific running forms or arm swing on the wrist-based sensor data?

Author Response

We would first and foremost like to thank the reviewer for taking the time to review this manuscript. Your thoughtful review has led us to make necessary improvements to the manuscript. We acknowledge the time and commitment you have invested in reviewing this article and appreciate your input. We believe that we have addressed your concerns, as well as those from the other reviewers, with the revised draft and are confident that the comments and subsequent changes made to the manuscript will improve the quality and readability of the paper. Please find our responses to your comments below, where we have left your comments in regular font and provided a brief explanation of our edits in red font. In the paper, we have highlighted any changes in yellow for your convenience.

Could you please clarify the specific activity levels of the "physically active adults"? Clarify the information on population tested

Thank you for this comment. We have clarified that participants were included if they “self-reported participating in moderate to vigorous physical activity at least three days per week.” This has been added to Page 3, Lines 113-114.

Could you provide additional rationale for the choice of the 10% equivalence window, explaining why this specific threshold is clinically or biomechanically meaningful?

We selected a 10% equivalence window a priori based on two considerations. First, although there is no consensus on an acceptable equivalence boundary for wearable devices and biomechanical assessments, the 10% window is commonly used in wearable device-validation and measurement-agreement studies of movement behaviors, which helps facilitate comparability across studies. Second, a 10% difference can be viewed as biomechanically meaningful for key spatiotemporal running variables, particularly cadence, as previous authors have demonstrated that differences in step rate of 5-10% produce clinically relevant changes in lower extremity loading and kinematics. Thus, we believe that a 10% equivalence window was appropriate for the current analysis. We have provided further clarification in the methods section (Page 5, Lines 176-179) to disclose the additional rationale, which now reads: “A 10% equivalence window was selected to align with prior wearable validation research where it is widely applied for cross-study comparisons, and because 5–10% changes in spatiotemporal variables, specifically step rate, are known to produce clinically meaningful alterations in lower-extremity loading and running kinematics.[27-30]”
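For readers unfamiliar with equivalence windows, the idea can be sketched as follows. This is an illustrative example only, not the analysis used in the manuscript: it assumes paired trials, expresses the 10% window relative to the mean of one device (an arbitrary choice here, since neither device is a criterion), and uses a large-sample normal approximation for the confidence interval. The function name and the example cadence values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def equivalence_within_window(device_a, device_b, window=0.10, alpha=0.05):
    """Check whether the (1 - 2*alpha) confidence interval of the paired mean
    difference lies entirely inside +/- window * mean(device_a).

    Assumptions (not taken from the manuscript): paired observations,
    a normal approximation for the interval, and device_a as the scale
    reference for the percentage window.
    """
    if len(device_a) != len(device_b):
        raise ValueError("paired data must have equal length")
    if len(device_a) < 2:
        raise ValueError("need at least two paired observations")

    diffs = [b - a for a, b in zip(device_a, device_b)]
    margin = window * mean(device_a)           # half-width of the window
    se = stdev(diffs) / sqrt(len(diffs))       # standard error of the mean difference
    z = NormalDist().inv_cdf(1 - alpha)        # one-sided critical value
    lower = mean(diffs) - z * se
    upper = mean(diffs) + z * se
    equivalent = (-margin < lower) and (upper < margin)
    return equivalent, (lower, upper), margin


# Hypothetical cadence values (steps/min) for eight paired trials.
cadence_a = [170, 172, 168, 175, 171, 169, 173, 174]
cadence_b = [171, 171, 169, 176, 170, 170, 174, 173]
is_equivalent, interval, margin = equivalence_within_window(cadence_a, cadence_b)
```

With these made-up values the interval of the mean difference sits well inside the window, so the two series would be judged equivalent under this sketch; a series inflated by 20% would not.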

Could you ensure that all mentions of "stride length" in the results and discussion clearly specify that the smartwatch data was corrected from step length to avoid any potential confusion?

Thank you for bringing this to our attention. To avoid any potential confusion, we have added explicit language in the results and discussion to highlight that the smartwatch stride length value was corrected from step length.
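As a point of reference, a stride comprises two consecutive steps, so a common convention is to estimate stride length as twice the step length when gait is assumed symmetric. The snippet below illustrates that convention only; the response does not state the exact correction the authors applied, and the function name is hypothetical.

```python
def step_to_stride_length(step_length_m: float) -> float:
    """Convert a step length to a stride length in the same unit.

    Assumes symmetric gait (left and right steps of equal length), so
    one stride equals two steps. This is a general convention, not a
    confirmed description of the manuscript's correction.
    """
    if step_length_m < 0:
        raise ValueError("step length must be non-negative")
    return 2.0 * step_length_m
```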

Could you explain the reason for the one-trial discrepancy in the data, where the smartwatch captured 192 trials and the lace-mounted IMU sensors captured 191?

We apologize for the confusion. The discrepancy in the number between devices was due to instances in which one device did not report a value for a given variable, resulting in missing data. If a device failed to report a spatiotemporal parameter for a trial, that trial was not included in the corresponding agreement analysis. We have clarified this point to indicate that differences in sample size reflect missing data (page 5, lines 193-195): “Discrepancies in sample size across analyses reflect instances where one device did not report a given variable, and these trials were therefore excluded from that pairwise comparison.”
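The pairwise exclusion described above can be sketched in a few lines. This is a minimal illustration under assumed data structures: the dictionary layout, the keys `smartwatch` and `imu`, and the variable names are hypothetical and do not reflect the authors' actual pipeline.

```python
def pairwise_complete(trials, variable):
    """Return (smartwatch, imu) value pairs for one variable, keeping only
    trials where both devices reported a value.

    A trial missing the variable on either device is dropped from this
    comparison only; it can still contribute to other variables.
    """
    pairs = []
    for trial in trials:
        watch_value = trial["smartwatch"].get(variable)
        imu_value = trial["imu"].get(variable)
        if watch_value is not None and imu_value is not None:
            pairs.append((watch_value, imu_value))
    return pairs


# Hypothetical trials: the second is missing ground contact time on the IMU.
trials = [
    {"smartwatch": {"cadence": 172, "gct": 250}, "imu": {"cadence": 171, "gct": 262}},
    {"smartwatch": {"cadence": 168, "gct": 255}, "imu": {"cadence": 169, "gct": None}},
    {"smartwatch": {"cadence": 175, "gct": 245}, "imu": {"cadence": 174, "gct": 258}},
]
cadence_pairs = pairwise_complete(trials, "cadence")
gct_pairs = pairwise_complete(trials, "gct")
```

Here all three trials enter the cadence comparison but only two enter the ground contact time comparison, which is how sample sizes can differ from one analysis to the next.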

Could you expand the discussion on the sex-based differences in average cadence agreement, possibly exploring additional hypotheses like the influence of sex-specific running forms or arm swing on the wrist-based sensor data?

We appreciate the reviewer’s comment and agree that sex-based differences in cadence could warrant further discussion. In response, we have expanded the Discussion section to explore potential biomechanical explanations, including the influence of sex-specific running forms and arm swing on wrist-based cadence detection. The new section (Page 10, Lines 252-258) now reads: “For sex, differences in average cadence agreement may also highlight underlying biomechanical differences between males and females, potentially influencing how wrist-based sensors detect arm swing and estimate step frequency [34,35]. Particularly, females often display greater upper extremity motion, including greater trunk rotation and arm swing, which may influence wrist accelerometer signals and contribute to discrepancies between wrist- and shoe-mounted cadence estimates [36].”

Reviewer 2 Report

Comments and Suggestions for Authors

The authors of the article present a strategy for comparison and evaluation of two wearables: a smartwatch and an inertial sensor located on the foot. They monitor spatiotemporal parameters of the running cycle under both controlled and uncontrolled conditions. The topic addressed is relevant and timely, as it deals with an area of growing impact in sports and digital health. The introduction and methodology are well presented, allowing replication of the experiment. However, there are aspects that should be explained in greater detail and presented more clearly, preferably with additional graphical support:

In the methodology, the authors should clarify that this is strictly a comparative study between two different systems and that the parameters obtained are not being validated, as there is no gold standard system available for validation.

They should also include a diagram or figure showing the positioning of the systems being compared. This is particularly important for the foot-mounted IMU, for which neither the orientation of the axes nor the sampling frequency are specified.

While it is appreciated that the sample was divided by sex, height and weight should also be presented in a stratified manner (e.g., 33 female / 32 male; height: 171.0 ± 8.9 cm; weight: 70.9 ± 15.2 kg). Also, the authors must incorporate in the manuscript the sample size calculated for statistically significant results and how it was calculated.

It is not entirely clear how the spatiotemporal parameters were derived from the foot-mounted IMU. Did the authors use accelerations, angular velocities, or angles? This methodological section should be expanded, ideally including an algorithmic flowchart.

The text between lines 135 and 137 appears to be poorly written:
(surface: 135 concrete sidewalks; 2.5 km and 5 km elevation gain: 7.0 m; 10 km and 20 km elevation gain: 136 of 18 105.7 m).

Regarding the results, the authors should better explain why greater agreement was observed in men than in women. Could this be related to an issue in their foot-mounted IMU estimation algorithm?

Concerning the results presented in the tables, the Bland–Altman analysis is not well presented. These plots should be shown graphically to illustrate how many measurements fall within the limits of agreement. However, given the absence of a gold-standard reference, the usefulness of Bland–Altman in this context is questionable. Since the goal is not to demonstrate which device is more accurate, but rather how concordant their measurements are, I would recommend removing this analysis. If retained, it should at least be shown graphically.

Without validation against a laboratory standard, the results must be interpreted as a relative comparison between devices, not as definitive evidence of accuracy—even if the authors conclude that the systems are not interchangeable for most metrics except cadence.

Author Response

Thank you for this important clarification. We agree that this study is solely a comparative analysis between the smartwatch and lace-mounted IMU devices and have added language in the methods section to avoid any confusion. The new text (Page 3-4, Lines 134-138) now reads: “This study was designed as a direct comparison between the two devices to evaluate their agreement in measuring spatiotemporal running variables. Importantly, we did not aim to validate these devices against a criterion measure (e.g., 3D motion capture or force plates); therefore, the results should be interpreted solely as a comparative analysis of cross-device agreement.”

Thank you for this suggestion. To address this comment, we have added a new figure (Figure 1) showing an image of one of our participants wearing the lace-mounted IMU sensors (Page 4).

Thank you for this suggestion. We have completed stratifications for height (<166 cm, 166-175 cm, >175 cm) and weight (<60 kg, 60-70 kg, 70-80 kg, 80-90 kg, >90 kg). Notably, there are no major differences in the findings: average cadence continues to show acceptable agreement, while peak cadence, SL, and GCT continue to show poor agreement across these stratifications. We have included appropriate language in the methods to introduce these stratifications and have included the equivalency plots for the height and weight stratifications in the supplemental/appendix files (Figures A5 and A6).

Thank you for this comment. All spatiotemporal metrics from the shoe-mounted IMUs were pre-processed onboard the sensors and exported in the form of the variables reported in this study. The RunScribe algorithms used to derive these parameters are proprietary and not publicly available; therefore, we are unable to provide specific details on what raw sensor data are used in the calculations. Given that the purpose of this study was to evaluate the agreement between commercially available consumer devices, our analysis was based on the processed outputs provided directly by the devices, consistent with how they would be used in applied consumer and field settings.

The text between lines 135 and 137 appears to be poorly written:
(surface: 135 concrete sidewalks; 2.5 km and 5 km elevation gain: 7.0 m; 10 km and 20 km elevation gain: 136 of 18 105.7 m).

We thank the reviewer for pointing this out. We agree that the original phrasing was unclear and have revised the text for clarity. We have updated the text to more clearly describe the surface conditions and elevation gains across the outdoor courses. (Page 4, Lines 147-149)

Regarding the results, the authors should better explain why greater agreement was observed in men than in women. Could this be related to an issue in their foot-mounted IMU estimation algorithm?

We appreciate the reviewer’s comment and agree that sex-based differences in cadence could warrant further discussion. In response to this comment, and one from another reviewer, we have expanded the Discussion section to explore potential biomechanical explanations, including the influence of sex-specific running forms and arm swing on wrist-based cadence detection. We also have text in the Discussion that states that sex may be accounted for differently in device algorithms, or that algorithm development and validation may not include equal distribution of sexes. The full new section (Page 10, Lines 244-249; 252-258) now reads: “While the underlying reasons for these differences remain unclear, due in part to the proprietary nature of device algorithms, possible explanations include that algorithm development may have been based primarily on data from male participants in these conditions, or that device-specific methods account for confounding factors (such as sex, distance, or environment) in inconsistent ways.”… “For sex, differences in average cadence agreement may also highlight underlying biomechanical differences between males and females, potentially influencing how wrist-based sensors detect arm swing and estimate step frequency [34,35]. Particularly, females often display greater upper extremity motion, including greater trunk rotation and arm swing, which may influence wrist accelerometer signals and contribute to discrepancies between wrist- and shoe-mounted cadence estimates [36].”

Thank you for bringing this to our attention. While our initial intent was to provide a comprehensive statistical assessment, we agree that the Bland-Altman analysis is not the most appropriate method in the absence of a gold-standard criterion and given the comparative aim of this study. In line with the reviewer’s recommendation, we have removed the Bland-Altman analyses and associated graphical plots from the supplemental materials.

Thank you for this important point. We agree it is important for the reader to understand that we did not use a criterion comparison for analysis and thus cannot draw any conclusions about the validity or accuracy of each sensor, only about the agreement between sensors. We believe we have made this clear at several points throughout the manuscript, including the Methods, where text was added during this review (Page 3-4, Lines 134-138), and the Discussion (Page 10, Lines 252-258; Page 11-12, Lines 315-320). We have also modified language throughout to avoid any discussion of validity or accuracy of each sensor based on our results.

Reviewer 3 Report

Comments and Suggestions for Authors

This was a well-run study, and the manuscript is well organized and well written. The background was satisfactory, showing gaps in the literature and a clear purpose statement. The methods section was complete, i.e., the study could be replicated. Results, discussion and conclusion were consistent with the study's purpose.

There were a few instances of word choice and tense error. I made a suggestion for the first part of this discussion.

See specific comments in the pdf.

Comments for author File: Comments.pdf

Author Response

There were a few instances of word choice and tense error. I made a suggestion for the first part of this discussion.

See specific comments in the pdf.

Thank you for the specific comments in the PDF. We have corrected the language regarding high/low terminology and have instead used the suggested terms, which have been highlighted in the updated version. We have also added the requested additions to the beginning of the discussion section (Page 10, Lines 220-225; 235-237), discussing the overall problem and highlighting the importance of the research.

Reviewer 4 Report

Comments and Suggestions for Authors

The authors present a comparison between a commercial smartwatch and a system based on lace-mounted IMUs for monitoring the main parameters related to running and walking in healthy subjects. The study was carried out systematically, and the results are clear from a technical point of view; however, in my opinion, they are unclear in terms of scientific impact. The key finding is that IMU-based sensors placed in or near the foot work better than a smartwatch placed on the user's wrist. This result is clear for specific parameters such as stride length and ground contact time, but is not significant for average cadence. Leaving aside the fact that a sensor near the foot, i.e., the actuated part, has different outcomes than a sensor placed near the hand, i.e., a region distal to the actuated part, my concerns about the study can be summarised in the following points:

1. Smartwatches are tools used to monitor one's physical condition and provide information on the quality of training. Heart rate and walking speed are usually important information for those who train. Why is the comparison of the parameters used by the authors important? What impact does the mismatch between the two technologies have on a person who bases their training on them? This narrative requires attention from the authors.
2. The positioning of IMU-based sensors would seem to be better than that of smartwatches for monitoring technical parameters such as those identified by the authors. Can the comparison be considered fair?
3. The authors have no control over how the parameters are calculated, making it difficult to understand whether the differences are hardware- or computation-related.
4. The smartwatch used is now outdated compared to the latest technology. Do the authors believe that this fact may be relevant in the current technological context?
5. Why does the number of tracks not match for the two technologies in the statistical analysis? Were any tracks excluded? If so, why and according to what criteria?

Author Response

The authors present a comparison between a commercial smartwatch and a system based on lace-mounted IMUs for monitoring the main parameters related to running and walking in healthy subjects. The study was carried out systematically, and the results are clear from a technical point of view; however, in my opinion, they are unclear in terms of scientific impact. The key finding is that IMU-based sensors placed in or near the foot work better than a smartwatch placed on the user's wrist. This result is clear for specific parameters such as stride length and ground contact time, but is not significant for average cadence. Leaving aside the fact that a sensor near the foot, i.e., the actuated part, has different outcomes than a sensor placed near the hand, i.e., a region distal to the actuated part, my concerns about the study can be summarised in the following points:

Smartwatches are tools used to monitor one's physical condition and provide information on the quality of training. Heart rate and walking speed are usually important information for those who train. Why is the comparison of the parameters used by the authors important? What impact does the mismatch between the two technologies have on a person who bases their training on them? This narrative requires attention from the authors.

Thank you for this comment. We agree that it is essential to clarify why the comparison of smartwatch and IMU parameters is relevant and what impact device disagreement may have in applied settings. We have revised the Introduction and Discussion sections to emphasize the practical implications of our findings. Specifically, we note that runners, coaches, and clinicians increasingly use spatiotemporal variables to monitor performance and injury risk. Discrepancies between smartwatch and IMU-derived values could therefore lead to different interpretations and decisions depending on which device is used. The revised text can be found in the Introduction (Page 3, Lines 96-102) and Discussion (Page 10, Lines 220-225; 235-237).

The positioning of IMU-based sensors would seem to be better than that of smartwatches for monitoring technical parameters such as those identified by the authors. Can the comparison be considered fair?

We thank the reviewer for raising this point. We agree that lace-mounted IMUs are positioned closer to the site of measurement and are therefore likely to provide more direct estimates of spatiotemporal parameters compared to wrist-worn smartwatches. However, both device types are widely marketed and adopted by consumers to monitor spatiotemporal variables and are used in applied settings, with large adoption of smartwatches by runners to assess these variables. Thus, from the applied perspective, these devices represent two common options for tracking the same variables during running. Our goal was not to establish which device is superior, but rather to evaluate the degree of agreement between two technologies that runners, coaches, and clinicians may use interchangeably. Wrist-based smartwatches estimate spatiotemporal variables using signals from the upper extremity, often in combination with anthropometric parameters, and in outdoor conditions, integrated GPS. In contrast, foot-mounted IMUs capture lower-limb accelerations directly, allowing for more precise detection of gait events. While future iterations of smartwatch hardware and software may refine sampling rates and algorithms, the intrinsic differences in sensor placement and signal sources are expected to continue to influence device agreement. Providing this information is important for end-users, as differences in device placement and algorithm design may lead to divergent interpretations of performance or injury risk if devices are assumed to be equivalent. Along with the above comment (comment 1), we believe the additions in the Introduction (Page 3, Lines 96-102) and Discussion (Page 10, Lines 220-225; 235-237) help to clarify this rationale.

The authors have no control over how the parameters are calculated, making it difficult to understand whether the differences are hardware- or computation-related.

Thank you for this comment. We agree that a key limitation of comparing commercial wearables is the inability to know whether differences in reported values arise from hardware specifications, proprietary algorithms, or other device-specific processing steps. We have expanded our existing language to fully disclose this limitation and to emphasize that discrepancies in agreement may stem from factors such as sampling frequency, filtering methods, and proprietary definitions of gait events, none of which are publicly disclosed by manufacturers (page 10, lines 244-249): “While the underlying reasons for these differences remain unclear, due in part to the proprietary nature of device algorithms, possible explanations include that algorithm development may have been based primarily on data from male participants in these conditions, or that device-specific methods account for confounding factors (such as sex, distance, or environment) in inconsistent ways.” We have also added standalone text at the end of the discussion in the limitations section (page 12, lines 320-324): “Lastly, as mentioned previously, consumer wearable devices do not disclose proprietary algorithms used to determine the spatiotemporal variables reported in this study. Thus, an additional limitation is that we are unable to discern whether disagreement between devices arises from hardware/sensor or software/algorithm differences between devices.”

The smartwatch used is now outdated compared to the latest technology. Do the authors believe that this fact may be relevant in the current technological context?

Thank you for this comment. We believe this is an inherent limitation when reporting on wearable technologies, as it is challenging to keep pace with regular updates in technology. We believe that this highlights a need for continued, regular independent evaluation to keep researchers and consumers informed. Thus, we have added text into our discussion to highlight this point (page 12, lines 324-329): “Similarly, an inherent limitation in working with wearable technology is that new models and software updates are regularly released. Our investigation kept the devices and software version constant to address this concern; however, regular testing and independent evaluation is important for the ever-evolving hardware and sensor changes, and software and algorithm updates in these devices.”

Why does the number of tracks not match for the two technologies in the statistical analysis? Were any tracks excluded? If so, why and according to what criteria?

We apologize for any confusion. The discrepancy in the number between devices was due to instances in which one device did not report a value for a given variable, resulting in missing data. If a device failed to report a spatiotemporal parameter for a trial, that trial was not included in the corresponding agreement analysis. We have clarified this point to indicate that differences in sample size reflect missing data (page 5, lines 193-195): “Discrepancies in sample size across analyses reflect instances where one device did not report a given variable, and these trials were therefore excluded from that pairwise comparison.”
