Article
Peer-Review Record

Machine Learning Regressors to Estimate Continuous Oxygen Uptakes (V˙O2)

Appl. Sci. 2024, 14(17), 7888; https://doi.org/10.3390/app14177888
by Daeeon Hong 1 and Sukkyu Sun 2,*
Reviewer 1:
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 25 June 2024 / Revised: 2 September 2024 / Accepted: 3 September 2024 / Published: 5 September 2024

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

The authors developed a real-time VO2 estimation model using a regression model. The paper used 'Age', 'Weight', 'Height', 'HR', 'Sex', and 'time' as parameters for the VO2 estimation model. However, some questions still need to be addressed for better understanding by the reader.

The literature review needs to discuss the limitations of prior studies.

The proposed method needs to be explained in more detail.

How will the model behave for a person with a disease?

Were the results validated by experts?

Comments on the Quality of English Language

Typos and grammatical errors appear in many places and need thorough correction.

Author Response

Comment 1: The literature review needs to discuss the limitations of prior studies.

Response 1: Thank you very much for your valuable feedback. I fully agree that clearly outlining the limitations in the literature review will significantly enhance the completeness of the paper. In response, I have added the following section to address the limitations in prior studies on page 10: "The first significant limitation of this study is that the model relies primarily on data from athletes, which may not represent the general population or those with sedentary lifestyles. Additionally, the lack of data from individuals with chronic health conditions, such as cardiovascular diseases or COPD, restricts its applicability. Furthermore, the model's accuracy depends on the precision and consistency of data from wearable devices, which can be affected by sensor performance and user adherence, potentially impacting real-world reliability. By employing adaptive learning systems, wearables can be generalized across diverse populations, including non-athletic individuals and those with chronic conditions, addressing the generalizability and accuracy issues noted in traditional VO2 estimation methods.

Second, while our study provides valuable insights into VO2 estimation, it is important to acknowledge the gender imbalance in our sample. Specifically, 87.76% of the participants were male, and only 12.24% were female. This imbalance may limit the generalizability of our findings, particularly concerning female populations, as differences in physiological responses between genders could affect VO2 estimation."

This addition provides a more critical evaluation of the existing research, highlighting specific areas where prior models may fall short and how our study aims to address these gaps.

 

Comment 2: The literature review needs to discuss the limitations of prior studies.

Response 2: We fully accept this point and appreciate your constructive feedback. In response, we have revised the literature review to include a discussion of the limitations of the comparative studies. This addition highlights the challenges faced by previous models and methods, providing a clearer context for the novelty and improvements of our approach.
Thank you again for your valuable input.

 

Comment 3: The proposed method needs to be explained in more detail.

Response 3: Thank you for your valuable suggestion. As per your feedback, we have provided a more detailed explanation of the proposed method to enhance clarity. For instance, we have elaborated on the feature selection process, explaining why variables such as age, weight, height, HR, sex, and time were chosen for the VO2 estimation model. Additionally, we have clarified the model training and validation procedures, including the rationale behind the use of 5-fold cross-validation and hyperparameter tuning. We also included a more comprehensive description of how missing data and anomalies were handled to ensure robust model performance on page 4: "This normalization enables fair comparisons across individuals of varying weights and allows the model to capture the relative oxygen consumption per unit of body mass. The features, including age, weight, height, HR, sex, and time, were selected based on their known physiological influence on oxygen consumption and performance in previous VO2 estimation studies (Table 1). These variables are critical in capturing the variability in metabolic and cardiorespiratory responses during exercise.", "Additionally, 5-fold cross-validation was incorporated to assess the stability and generalization capability of the models (Figure 3). This method ensures that the model is robust and performs well on unseen data, helping to avoid overfitting and ensure that predictions generalize well to the broader population.", and "Before model training, missing data were addressed using XGBoost's built-in handling techniques, which automatically manage incomplete datasets. Additionally, anomalies detected in the data, such as extreme outliers, were filtered out using Z-score thresholds to ensure model reliability and accuracy."
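
The Z-score filtering mentioned in the quoted passage can be sketched as follows. This is a minimal illustration, not the authors' code: the threshold of 3.0, the function name `zscore_filter`, and the heart-rate readings are assumptions, since the response does not state which threshold or columns were used.

```python
import statistics


def zscore_filter(values, threshold=3.0):
    """Keep values whose absolute Z-score is at most `threshold`.

    threshold=3.0 is an assumed conventional cutoff, not the authors' setting.
    """
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population standard deviation
    if sd == 0:
        return list(values)  # all values identical: nothing to flag
    return [v for v in values if abs((v - mean) / sd) <= threshold]


# Hypothetical heart-rate readings (bpm) with one implausible spike.
hr = [150] * 10 + [160] * 10 + [400]
clean = zscore_filter(hr)
print(len(hr), len(clean))
```

With very small samples a single extreme value inflates the standard deviation enough to hide itself, so a filter of this kind is only meaningful on reasonably large columns.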

 

Comment 4: How will the model behave for a person with a disease?

Response 4: Thank you for your valuable feedback. As you rightly pointed out, the current model is based on data collected from athletes through a retrospective study. We acknowledge that the model’s applicability to individuals with chronic conditions, such as COPD, is currently limited. However, we have addressed this concern in the manuscript on page 10, where we state the following:
"The first significant limitation of this study is that the model relies primarily on data from athletes, which may not represent the general population or those with sedentary lifestyles. Additionally, the lack of data from individuals with chronic health conditions, such as cardiovascular diseases or COPD, restricts its applicability. Furthermore, the model's accuracy depends on the precision and consistency of data from wearable devices, which can be affected by sensor performance and user adherence, potentially impacting real-world reliability. By employing adaptive learning systems, wearables can be generalized across diverse populations, including non-athletic individuals and those with chronic conditions, addressing the generalizability and accuracy issues noted in traditional VO2 estimation methods.

Second, while our study provides valuable insights into VO2 estimation, it is important to acknowledge the gender imbalance in our sample. Specifically, 87.76% of the participants were male, and only 12.24% were female. This imbalance may limit the generalizability of our findings, particularly concerning female populations, as differences in physiological responses between genders could affect VO2 estimation."
We believe that this addition helps clarify the limitations of the current model and outlines potential future directions for addressing these issues in broader populations.

 

Comment 5: Were the results validated by experts?

Response 5: Thank you for your insightful question regarding the validation of our results. At this stage, the results have not undergone formal validation by external experts, such as exercise physiologists or clinical professionals. However, the model's performance was rigorously evaluated using statistical validation techniques, including 5-fold cross-validation and various performance metrics (e.g., MAE, MSE, R-squared), to ensure robustness and reliability.
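
For readers unfamiliar with the metrics named in this response, the sketch below computes MAE, MSE, and R-squared from paired measured and estimated values. The function name `regression_metrics` and the four sample values are hypothetical and are not data from the study; the sketch also assumes the measured values are not all identical, so that R-squared is defined.

```python
def regression_metrics(y_true, y_pred):
    """Return MAE, MSE and R-squared for paired measurements and estimates."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return {"MAE": mae, "MSE": mse, "R2": r2}


# Hypothetical measured vs. estimated VO2 values (mL/min/kg).
measured = [30.0, 35.0, 40.0, 45.0]
estimated = [31.0, 34.0, 41.0, 44.0]
metrics = regression_metrics(measured, estimated)
print(metrics)
```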

We recognize the importance of expert validation to further confirm the real-world applicability of our findings. As a next step, we plan to collaborate with domain experts in sports science and clinical fields to validate the model's predictions in practical settings. This will help ensure the model’s accuracy and relevance for broader applications, including its use in real-time health monitoring and fitness assessment.

Reviewer 2 Report

Comments and Suggestions for Authors

In this study, the authors explore various regression models to develop a real-time VO2 and VO2max estimation model, utilizing a dataset from PhysioNet. Extreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine, Gradient Boosting Regressor, Linear Regression, and AdaBoost Regressor models were included.

 

Although the topic is interesting, I found several aspects that should be revised.

 

Major observations.

My main concern is regarding the inputs of the model. I understand that the target was to estimate the continuous oxygen consumption (VO2). However, Figure 5 indicates that one of the inputs was the VO2 measure. I mean, the target variable was also an input variable. That is not correct. And of course, this is an explanation of the excellent result in the correlation coefficient between the estimation and the measured value.

Also, the input variable “VO2max_category_mapped”, in its different categories, was one of the most important features in the model. If the aim is to develop a model that generates estimations in real-time, this implies knowing these categories in each individual previously, which cannot be practical.

There is no comparison to results from other authors.

Section 2 (Materials and Methods) should be improved. There is a mix of results and texts that could be in the Introduction section. Also, some texts are irrelevant because they repeat a previous text.

In section 2.4, the authors state that “the positive class weight scale is configured to 33.8 to address the class imbalance in the dataset.” However, this is not a classification problem. Why is it necessary to configure such a value? The authors are using parameter values from other studies, but there is no clear support for this decision.

Figure 2 must be improved. The text in the box seems to indicate that the excluded participants were n=703 and n=666.

 

Section 3 (Results) should be improved. Some values of SD are missing in the text. The authors talk about significant differences, but no statistical test was applied to prove that.

Although a Pearson correlation test was applied, there is no information concerning the distribution of the variables that were included in the test. Maybe the Spearman test could be more adequate.

One of the inputs was the variable “time”. However, there is no information about the description and interpretation of this variable.

What kind of codification was applied to the variable VO2max_category_mapped? One-hot encoding?

What was the aim of the correlation matrix in this work?

In Table 2, it is not clear if the results are for training or test datasets. Also, I suppose that the metrics for quantifying the error and the correlation were computed for each participant. In that case, Table 2 should indicate the mean and SD values.

The authors should show the course of the VO2 (measured and estimated) for several participants (the best, the worst, and others).

In Figure 4, give more information to understand that the bottom figure is a zoom of the top figure.

Section 4 (Discussion) should be improved. There is no discussion in this section. The authors are repeating the results, but there is no analysis or comparison with other studies. Also, there are some figures that should be in the Results section.

 

 

Minor observation.

The abstract must be improved, especially the parts concerning the methodology and results.

Define acronyms before their use (for example: ACSM, COPD, MAE, MSE, etc.).

 

Authors must check when referring to VO2 and VO2max. The names are interchanged occasionally (for example in sections 2.3 and 3.2).

In section 2.3, what do the authors want to say with "Z-score anomalies"?

There is Figure 2, but not Figure 1.

There are three Figure 3. Some of them have two titles.

 

Some references have the authors' names written in the wrong style.

Comments on the Quality of English Language

N/A

Author Response

Comment 1: My main concern is regarding the inputs of the model. I understand that the target was to estimate the continuous oxygen consumption (VO2). However, Figure 5 indicates that one of the inputs was the VO2 measure. I mean, the target variable was also an input variable. That is not correct. And of course, this is an explanation of the excellent result in the correlation coefficient between the estimation and the measured value.

Response 1: Thank you for your thoughtful comment. I acknowledge that the visualization may have incorrectly implied the use of the VO2 variable as both an input and a target. This was a limitation of the way the figure was generated. However, I can confirm that the actual model training only used the variables "['Age', 'Weight', 'Height', 'HR', 'Sex', 'time']," as reflected in the submitted code. Based on your feedback, I have also revised the correlation figure to ensure it accurately represents the input variables used in the model. I sincerely appreciate your input, which has helped improve the clarity of the visualization. 

Code link (line 829): https://github.com/SeanPresent/VO2max_Estimation/blob/main/xgboost_ml.py

Comment 2: Also, the input variable “VO2max_category_mapped”, in its different categories, was one of the most important features in the model. If the aim is to develop a model that generates estimations in real-time, this implies knowing these categories in each individual previously, which cannot be practical.

Response 2: We appreciate this feedback. We have revised the model to exclude VO2max_category_mapped as a feature to ensure that the model remains feasible for real-time estimation without prior knowledge of these categories. The model has been updated with more practical real-time inputs to improve its applicability.
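
The concern raised in Comments 1 and 2 is target leakage: an input that is the target itself or is derived from it. A minimal sketch of a name-based sanity check follows. The feature list is the one quoted in Response 1; the target column name `"VO2"` and the helper `leaky_features` are assumptions for illustration, and a name check is only a heuristic that cannot detect leakage hidden behind an unrelated column name.

```python
# Feature list as quoted in Response 1.
FEATURES = ["Age", "Weight", "Height", "HR", "Sex", "time"]
TARGET = "VO2"  # assumed column name for the measured target


def leaky_features(features, target):
    """Return features whose names equal or contain the target name."""
    needle = target.lower()
    return [f for f in features if needle in f.lower()]


ok = leaky_features(FEATURES, TARGET)
flagged = leaky_features(FEATURES + ["VO2max_category_mapped"], TARGET)
print(ok)       # no suspicious names in the six stated inputs
print(flagged)  # the category feature questioned in Comment 2 is flagged
```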

Comment 3: There is no comparison to results from other authors.

Response 3: Thank you for this observation. We have now included a section in the discussion that compares our results with those of other relevant studies. This comparison highlights the strengths and limitations of our model and situates our work within the existing literature on VO2 estimation on page 10: "Compared to the study from R. M, et al [15], our study differs in model selection, input variables, and target population. We use machine learning models like XGBoost and LightGBM for real-time continuous VO2 estimation, relying on variables such as age, weight, height, heart rate, sex, and time, which are readily available from wearables. In contrast, the study focuses on submaximal exercise stages, which may limit real-time application. Our dataset, primarily composed of athletes from PhysioNet, offers a more homogeneous sample but limits generalizability to broader populations. The previous study likely includes a more diverse population, potentially increasing variability but broadening the applicability of their findings. However, we use 5-fold cross-validation to ensure robustness, while the R. M, et al study may follow a different validation approach. Both studies use metrics like MAE and RMSE, allowing for direct comparison. However, our focus on minimizing real-time prediction errors better suits continuous monitoring applications. Hyperparameter tuning played a critical role in our model's optimization, particularly with XGBoost, using random search to refine key parameters. The R. M, et al study may adopt a different strategy for model optimization, highlighting varying approaches to improving performance. In terms of practical application, our study is designed for real-time VO2 estimation using wearable devices, whereas the study focuses on retrospective analysis. Both studies acknowledge limitations; our study highlights dataset imbalance and suggests more diverse samples for future research, while the R. M, et al study likely addresses the limitations of submaximal stages and population diversity."

 

Comment 4: Section 2 (Materials and Methods) should be improved. There is a mix of results and texts that could be in the Introduction section. Also, some texts are irrelevant because they repeat a previous text.

Response 4: We agree with your feedback. We have revised Section 2 to remove any redundant or repetitive information, moving any background material to the introduction. The methods section has been streamlined to present only the relevant methodological details on page 10 : "This normalization enables fair comparisons across individuals of varying weights and allows the model to capture the relative oxygen consumption per unit of body mass. The features, including age, weight, height, HR, sex, and time, were selected based on their known physiological influence on oxygen consumption and performance in previous VO2 estimation studies (Table 1). These variables are critical in capturing the variability in metabolic and cardiorespiratory responses during exercise.", "Additionally, 5-fold cross-validation was incorporated to assess the stability and generalization capability of the models (Figure 3). This method ensures that the model is robust and performs well on unseen data, helping to avoid overfitting and ensure that predictions generalize well to the broader population.", and "Before model training, missing data were addressed using XGBoost's built-in handling techniques, which automatically manage incomplete datasets. Additionally, anomalies detected in the data, such as extreme outliers, were filtered out using Z-score thresholds to ensure model reliability and accuracy."
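
The normalization described in the quoted passage, oxygen uptake per unit of body mass, amounts to a single division. The sketch below assumes absolute VO2 in mL/min and body mass in kg; the function name `relative_vo2` and the example values are hypothetical.

```python
def relative_vo2(vo2_ml_per_min, weight_kg):
    """Convert absolute VO2 (mL/min) to relative VO2 (mL/min/kg)."""
    if weight_kg <= 0:
        raise ValueError("weight_kg must be positive")
    return vo2_ml_per_min / weight_kg


# A hypothetical 70 kg participant consuming 3500 mL/min.
value = relative_vo2(3500.0, 70.0)
print(value)  # 50.0
```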

Comment 5: In section 2.4, the authors state that “the positive class weight scale is configured to 33.8 to address the class imbalance in the dataset.” However, this is not a classification problem. Why is it necessary to configure such a value? The authors are using parameter values from other studies, but there is no clear support for this decision.

Response 5: We appreciate this clarification. Upon review, we realize that the mention of class weight was incorrect for this regression task. We have removed any references to class imbalance configuration, as it is not relevant to our regression model.

Comment 6: Figure 2 must be improved. The text in the box seems to indicate that the excluded participants were n=703 and n=666.

Response 6: Thank you for pointing this out. We have revised the caption of Figure 2 to accurately reflect the exclusion of participants, clarifying that n=703 and n=666 are the numbers of participants remaining after each exclusion step. The figure has been updated to ensure clarity on page 4: "Participants demographic with exclusion criteria. A total of 857 participants were initially considered for the study. After excluding 154 participants due to age-related criteria and removing 37 participants based on Z-score anomaly detection, the final dataset comprised 666 participants. This dataset was then stratified into 80% (n=532) for model training and 20% (n=134) for testing."
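
The participant counts in the quoted caption can be checked with a few lines of arithmetic. The sketch assumes the training size is rounded down to a whole participant, which reproduces the stated 532/134 split; the variable names are illustrative.

```python
initial = 857            # participants initially considered
excluded_by_age = 154    # removed by age-related criteria
excluded_by_zscore = 37  # removed by Z-score anomaly detection

after_age = initial - excluded_by_age
final = after_age - excluded_by_zscore

# 80/20 split, rounding the training size down (assumed rounding rule).
n_train = final * 80 // 100
n_test = final - n_train

print(after_age, final, n_train, n_test)  # 703 666 532 134
```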

Comment 7: Section 3 (Results) should be improved. Some values of SD are missing in the text. The authors talk about significant differences, but no statistical test was applied to prove that.

Response 7: We have now included the missing standard deviation (SD) values in the results section. Additionally, statistical tests have been applied to verify significant differences, and we have added explanations for these tests on page 9: " Fine-tuning of the XGBoost model further improved its performance, achieving a MAE of 0.1217 (± 0.0020) and a mean MAPE of 0.0056 (±0.0001) on validation."
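
A figure of the form "mean (± SD)" for a cross-validated metric is typically obtained by scoring each fold separately and summarizing the fold scores. The sketch below shows one way to do this under stated assumptions: the helper names `kfold_indices` and `mean_sd` are hypothetical, the indices stand for participants, and the five fold scores are made-up placeholders rather than results from the study. Whether the reported ± values are standard deviations across folds is itself an assumption.

```python
import random
import statistics


def kfold_indices(n_samples, k=5, seed=0):
    """Yield (train, validation) index lists for k shuffled folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k nearly equal-sized folds
    for i in range(k):
        val = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, val


def mean_sd(scores):
    """Return the mean and sample standard deviation of per-fold scores."""
    return statistics.fmean(scores), statistics.stdev(scores)


splits = list(kfold_indices(532, k=5))

# Placeholder per-fold MAE values, for illustration only.
fold_mae = [0.10, 0.12, 0.11, 0.13, 0.14]
mean, sd = mean_sd(fold_mae)
print(f"MAE = {mean:.4f} (± {sd:.4f})")
```

When each participant contributes many time points, splitting by participant rather than by row keeps one person's data from appearing in both training and validation folds.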

Comment 8: Although a Pearson correlation test was applied, there is no information concerning the distribution of the variables that were included in the test. Maybe the Spearman test could be more adequate.

Response 8: Thank you for the suggestion. We have reviewed the distribution of the variables and found that Spearman correlation is indeed more appropriate for some cases. We have updated the analysis to include Spearman correlation where necessary and provided explanations for these choices in the revised manuscript.
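
The difference between the two coefficients can be illustrated on a small monotonic but nonlinear example: Spearman correlation is Pearson correlation computed on ranks, so it reaches 1 whenever one variable increases strictly with the other, whereas Pearson measures only linear association. The functions and data below are a self-contained sketch, not the authors' analysis.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # monotonic but nonlinear relationship
r_pearson = pearson(x, y)
r_spearman = spearman(x, y)
print(round(r_pearson, 4), round(r_spearman, 4))
```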

Comment 9: One of the inputs was the variable “time”. However, there is no information about the description and interpretation of this variable.

Response 9: We have now included a detailed explanation of the "time" variable in the methods section. The "time" variable was referenced based on several established formulas, as it accounts for the progression of recording intervals. As time progresses, additional variables are captured and incorporated into the model, which is why its inclusion is essential for VO2 estimation.

Comment 10: What kind of codification was applied to the variable VO2max_category_mapped? One-hot encoding?

Response 10: Thank you for your question. The VO2max_category_mapped variable represents fitness categories such as "Good," "Excellent," "Fair," and "Superior," which were numerically coded for the model. Instead of using one-hot encoding, these categories were assigned specific numerical values to reflect their ranking, allowing the model to interpret and utilize this information effectively.
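
The two encodings contrasted in this response can be sketched as follows. The category order (Fair < Good < Excellent < Superior) and the integer codes 0 to 3 are assumptions for illustration; the response does not state the numerical values the authors assigned.

```python
# Assumed ranking from lowest to highest fitness category.
ORDER = ["Fair", "Good", "Excellent", "Superior"]
ORDINAL = {name: rank for rank, name in enumerate(ORDER)}


def ordinal_encode(labels):
    """Map each label to a single integer that preserves the ranking."""
    return [ORDINAL[label] for label in labels]


def one_hot_encode(labels):
    """Map each label to a 0/1 indicator vector with no implied order."""
    return [[1 if label == name else 0 for name in ORDER] for label in labels]


labels = ["Good", "Superior", "Fair"]
as_ordinal = ordinal_encode(labels)
as_one_hot = one_hot_encode(labels)
print(as_ordinal)
print(as_one_hot)
```

Ordinal codes let a tree-based model split on "at least Good" with one threshold, at the cost of imposing equal spacing between categories; one-hot vectors make no ordering assumption but use one column per category.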

Comment 11: What was the aim of the correlation matrix in this work?

Response 11: The correlation matrix was included to assess the relationships between the input features and the target variable, which helped guide the feature selection process. We have now clarified the purpose of the correlation matrix in the methods section.

Comment 12: In Table 2, it is not clear if the results are for training or test datasets. Also, I suppose that the metrics for quantifying the error and the correlation were computed for each participant. In that case, Table 2 should indicate the mean and SD values.

Response 12: We have updated Table 2 to clearly indicate whether the results are from the training or test datasets. Additionally, we have included the mean and SD values for each metric to provide a more complete picture of the model’s performance.

Comment 13: The authors should show the course of the VO2 (measured and estimated) for several participants (the best, the worst, and others).

Response 13: Thank you for your insightful feedback. We have categorized the VO2max levels by referencing Garmin's VO2max standard rating to divide participants into different fitness levels. Based on your suggestion, we have now included visualizations of the measured and estimated VO2 courses for several participants, highlighting the best, worst, and average cases. This addition helps illustrate the model's performance across different scenarios. We greatly appreciate your constructive feedback.

Comment 14: The reviewer mentioned that Figure 4 was unclear and that the bottom figure should be clearly described as a zoomed-in version of the top figure.

Response 14: We have updated the caption for Figure 4 (current figure ) to clearly state that the bottom figure is a zoomed-in view of the top figure, providing additional information for clarity.

Comment 15: Section 4 (Discussion) should be improved. There is no discussion in this section. The authors are repeating the results, but there is no analysis or comparison with other studies. Also, there are some figures that should be in the Results section.

Response 15: Thank you for your valuable feedback. We have revised the discussion section to include a more in-depth analysis and a comparison with relevant studies. Additionally, we have moved the figures that were more appropriate for the results section. We have also added a discussion on the limitations of the current study and suggested areas for further research, which we believe enhances the completeness of the paper. Your feedback has been greatly appreciated and has helped strengthen the overall quality of the manuscript.

Minor Observations

  • Abstract Improvement: We have revised the abstract to better reflect the methodology and results in a clear and concise manner.

  • Acronym Definitions: All acronyms (e.g., ACSM, COPD, MAE, MSE) have been defined at their first mention in the manuscript.

  • Consistency of VO2 and VO2max: We have reviewed the manuscript to ensure consistent usage of VO2 and VO2max throughout the text, particularly in sections 2.3 and 3.2.

  • Z-score Anomalies: The term “Z-score anomalies” has been clarified in Section 2.3 to explain its role in detecting outliers in the data.

  • Missing Figure 1 and Duplicate Figure 3: The figure numbering has been corrected, ensuring that all figures have unique titles and that Figure 1 has been added where appropriate.

Reviewer 3 Report

Comments and Suggestions for Authors

Dear Corresponding Author,

Thank you for submitting your article to the journal. While congratulating your team for completing such complex work, I would like to offer my analysis to improve the final result:

*Brief summary*

Your study uses XGBoost, one of the best Machine Learning techniques, to develop a model for estimating actual oxygen consumption and maximum oxygen consumption. Based on the reported data, it appears to be excellent in terms of predictive accuracy and fit compared to other models, which is promising. Models for estimating physiological parameters, such as actual oxygen consumption and maximum oxygen consumption, have important practical implications for health monitoring and performance evaluation. In the literature, authoritative authors have developed oxygen consumption estimation models on very large samples of the populations of interest, divided by sex, age, and weight. Some heart rate monitors currently used for exercise prescription and training implement these functions quite satisfactorily.

*Specific comments*

  • Despite these premises, the methodological approach you used does not seem to be very correct, because the experimental design lacks a description of the instrument used to measure oxygen consumption, its calibration, and so on. This information, instead, was reported by the authors of the work from which you took the experimental data (copy and paste): "Treadmill Maximal Exercise Tests from the Exercise Physiology and Human Performance Lab of the University of Malaga - Denis Mongin, Jeronimo García Romero, Jose Ramon Alvero Cruz. Published: Dec. 10, 2021. Version: 1.0.1 https://physionet.org/content/treadmill-exercise-cardioresp/1.0.1/#files-panel"

  • In the abstract, like the article by Denis Mongin et al. mentioned above, you reported 992 tests done on males and females between 10 and 63 years old. These numbers, however, were misleading for the reader, because the oxygen consumption estimate was actually performed on only 532 tests, which is about half of what was reported in the abstract.

  • Moreover, the analyzed sample is completely unbalanced: 12.24% females and 87.76% males, with a mean age between 29.94 years (±7.44 SD) and 32.72 years (±8.86 SD) respectively, and not between 10 and 63 years as reported in the abstract.

  • Therefore, it would probably be more appropriate to use only male subjects and the age range of 32.72 years (±8.86 SD) for the consumption estimate.

*Conclusion*

The work, in my humble opinion, should be revised using the reported indications. It is recommended to always use the same notation for oxygen consumption and maximum aerobic power by inserting a dot above the V: oxygen consumption V̇O2 and maximum oxygen consumption V̇O2max. It is also recommended to always use mL/min and not ml/min as reported in various parts of the text, along with mL/min/kg or mL·min-1·kg-1 instead of mL/kg/min. I look forward to reading an improved version of the work.

Best regards

Author Response

Comment 1 : Description of the Instrumentation and Calibration: The reviewer points out that the paper lacks a detailed description of the instrument used to measure oxygen consumption and its calibration, which is crucial for replicability and clarity. This could be considered a request for more information about the methodological setup.

Response 1 : Thank you for your valuable suggestion. We have now included a more detailed description of the instrument used to measure oxygen consumption, along with its calibration process. This information has been added to the Methods section to ensure better clarity and replicability. The details regarding the equipment were referenced from the original dataset source and have been appropriately cited on page 2 : "The measurements were performed between 2008 and 2018, with athletes undergoing maximal GETs on a treadmill connected to a gas analyzer system. The respiratory parameters, including oxygen consumption and pulmonary ventilation, were measured breath-by-breath using a CPX MedGraphics gas analyzer system (Medical Graphics, MN, USA) connected to a PowerJog J series treadmill, while heart rate was monitored with a Mortara 12-lead ECG device."

Comment 2 : Clarification of the Data Sample in the Abstract: The reviewer highlights a discrepancy between the data mentioned in the abstract (992 tests) and the actual tests used for the analysis (532 tests), suggesting that the abstract should more accurately reflect the dataset used.

Response 2: Thank you for pointing out this discrepancy. We have revised the abstract to accurately reflect the number of tests used for the analysis, indicating that 532 tests were included in the study. The changes have been made to ensure consistency with the data used in the analysis on page 1: "Validation using a dataset of 132 participants with the Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), R-squared (R²), Root Mean Squared Logarithmic Error (RMSLE), and Mean Absolute Percentage Error (MAPE) metrics revealed the following results: MAE of 0.1338, MSE of 0.0550, RMSE of 0.2345, R² of 0.9997, RMSLE of 0.0128, and MAPE of 0.0056. This study demonstrates the effectiveness of various regressor models in developing a continuous VO2 estimation model that has promising performance metrics."
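
For reference, the less common metrics in the quoted abstract are usually defined as below, where $y_i$ is the measured value, $\hat{y}_i$ the estimate, and $n$ the number of samples. These are the standard textbook definitions and are assumed rather than confirmed to be the ones the authors used; in particular, the response does not say whether MAPE is reported as a fraction or as a percentage. The last line checks that the quoted RMSE and MSE are mutually consistent.

```latex
\begin{align}
\mathrm{RMSE}  &= \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^2} = \sqrt{\mathrm{MSE}} \\
\mathrm{RMSLE} &= \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\ln(1+\hat{y}_i)-\ln(1+y_i)\right)^2} \\
\mathrm{MAPE}  &= \frac{1}{n}\sum_{i=1}^{n}\left|\frac{y_i-\hat{y}_i}{y_i}\right| \\
\sqrt{0.0550}  &\approx 0.2345
\end{align}
```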

Comment 3 : Sample Imbalance: The reviewer notes that the sample is highly unbalanced with respect to gender (87.76% males and 12.24% females), suggesting this imbalance needs to be addressed or acknowledged more clearly. 

Response 3: We appreciate your feedback regarding the sample imbalance. We have now included a section in the Discussion that explicitly addresses the gender imbalance in the dataset. This imbalance is acknowledged as a limitation of the study, and we propose that future work could benefit from a more balanced sample to improve generalizability on page 10: "Second, while our study provides valuable insights into VO2 estimation, it is important to acknowledge the gender imbalance in our sample. Specifically, 87.76% of the participants were male, and only 12.24% were female. This imbalance may limit the generalizability of our findings, particularly concerning female populations, as differences in physiological responses between genders could affect VO2 estimation."

 

Comment 4 : Appropriateness of the Sample for Analysis: The reviewer questions whether it would be more appropriate to only use male subjects for the oxygen consumption estimation due to the gender imbalance and suggests focusing on the specific age range (mean age of 32.72 years ±8.86 SD).

Response 4: Thank you for your insightful comment. We agree that focusing exclusively on male subjects or a more specific age range could lead to a more precise estimation model. However, at this stage of our research, adjusting the sample to include only male participants or narrowing the age range presents some practical challenges, especially as we are in the final stages of the study. Nonetheless, we recognize that this approach would be an excellent direction for future research. We sincerely appreciate your suggestion, as it highlights a valuable potential research topic for follow-up studies, which we plan to explore in subsequent work.

Comment 5 : Notation and Units Consistency: The reviewer advises the authors to use consistent notation for oxygen consumption (e.g., V̇O2 and V̇O2max with a dot above the "V") and to standardize the units (mL/min instead of ml/min, mL/min/kg instead of mL/kg/min) for greater precision and consistency throughout the paper.

Response 5 : Thank you for your valuable feedback. As per your suggestion, we have revised the manuscript to ensure consistent notation for oxygen consumption, using  V̇O2 and V̇O2max with a dot above the "V" throughout the text. Additionally, we have standardized the units to mL/min and mL/min/kg for better precision and consistency, as recommended.

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

Authors well addressed the queries raised by the reviewers. The paper can be considered for publication. 

References need to be represented uniformly. 

Author Response

Comment 1: References need to be represented uniformly. 

Response 1: Thank you for pointing this out. We have reviewed all the references and ensured they are now uniformly formatted according to the journal’s guidelines. We appreciate your feedback and attention to detail.

Reviewer 2 Report

Comments and Suggestions for Authors

Although the authors answer all my observations, not all the answers are included in the manuscript and also some answers should be improved

Previous Comment 1: My main concern is regarding the inputs of the model. I understand that the target was to estimate the continuous oxygen consumption (VO2). However, Figure 5 indicates that one of the inputs was the VO2 measure. I mean, the target variable was also an input variable. That is not correct. And of course, this is an explanation of the excellent result in the correlation coefficient between the estimation and the measured value. It was partially solved. It is unclear whether the figures related to the "feature importance" will be included or not in the manuscript. If yes, the figure should be modified in order to include only the selected inputs. Also, a description and discussion of this figure should be included in the manuscript.

Previous Comment 2: It was solved

Previous Comment 3: There is no comparison to results from other authors. It was partially solved. Although the authors improve the discussion section including a comparison with a R. M, et al study, they should include values to understand how much the new approach improve the estimation of continuous oxygen uptakes. Also, other studies should be included.

Previous Comment 4: Section 2 (Materials and Methods) should be improved. There is a mix of results and texts that could be in the Introduction section. Also, some texts are irrelevant because they repeat a previous text. It was partially solved. How is the following useful for the methodology: “XGBoost has been widely used across different research domains, showcasing its versatility and effectiveness. Notable applications include predicting stock prices in financial markets [27], proactive damage estimation in infrastructure management [28], early earthquake magnitude prediction in earthquake research [29], forecasting air quality in environmental science [30], improving intrusion detection in cybersecurity [31], predicting treatment responsiveness in healthcare [32], and providing accurate forecasts of virus spread during the COVID-19 pandemic for public health management [33].” Also, this should be in the Results section: “Yet, predictions on the test data to estimate V̇O2 ml/kg/min values, demonstrate the suitability of XGBoost as the most appropriate model for predicting V̇O2 ml/kg/min values”

 

Previous Comment 5: It was solved.

 

Previous Comment 6: Figure 2 must be improved. The text in the box seems to indicate that the excluded participants were n=703 and n=666. It was not solved. I mean, for example, the textbox contains the following: n=703 Participants: excluded due to the age group based on standard ratings. It looks like 703 participants were excluded due to the age group based on standard ratings and it is not true. The text “excluded due to the age group based on standard ratings” should be in the textbox related to n=154 excluded

Previous Comment 7: Section 3 (Results) should be improved. Some values of SD are missing in the text. The authors talk about significant differences but no statistical test was applied to prove that. It was not solved. The following is in the Result section (also in Discussion section): “at 2421.38 ml/min (±SD) and VO2 min at 31.94 ml/kg/min (±SD) for males, and 1670.74 ml/min (±SD) and 27.59 ml/kg/min (±SD) for females, respectively.”

Previous Comment 8: It was solved.

Previous Comment 9: It was solved.

Previous Comment 10: It was solved.

Previous Comment 11: What was the aim of the correlation matrix in this work? It was partially solved. I understand that authors used the correlation matrix to eliminate features with high correlation. However, the features Height and Weight have a correlation of r=0.73 and both were used as inputs. Why?

Previous Comment 12: In Table 2, it is not clear if the results are for training or test datasets. Also, I suppose that the metrics for quantifying the error and the correlation were computed for each participant. In that case, Table 2 should indicate the mean and SD values. It was partially solved. Please, improve the caption in the table and indicate the SD values

Previous Comment 13: The authors should show the course of the VO2 (measured and estimated) for several participants (the best, the worst, and others). It was not solved.

Previous Comment 14: It was solved

Previous Comment 15: Section 4 (Discussion) should be improved. There is no discussion in this section. The authors are repeating the results but there is not any analysis or comparison with other studies. Also, there are some figures that should be in the Result section. It was partially solved. The figures that were moved to the Result section must be modified, and also described and discussed.

 

Other comments:

Review the writing: “Validation using a test dataset of 132 participants yielded the following results Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), R-squared (𝑅2), Root Mean Squared Logarithmic Error (RMSLE), and Mean Absolute Percentage Error (MAPE) metrics revealed the following results”.

 

Improve the writing: “Figure 2. This flowchart is indicating from input parameters with modeling approach. It was finalized to approach continuous V̇O2max estimation.”

 

The caption must be revised (Pearson or Spearman): “Figure 3. The image represents a correlation matrix …”

 

Table 2: Please, indicate what model was used with Cross Validation

 

Some references have the author's name written in the wrong style. For example:

10. B. Carrier, M. M. Helm, K. Cruz, B. Barrios, and J. W. Navalta,

 

7. M. A, K. C, G. AJ, E. NM, G. Z, and S. RM,

Comments on the Quality of English Language

Some phrases must be rewritten.

 

 

Author Response

Thank you for your valuable feedback and insightful comments. Your suggestions have allowed us to further refine and improve the clarity of our paper. We are grateful for the opportunity to address the issues raised, and we believe that the revisions have strengthened the overall quality of the manuscript. Your input has been instrumental in helping us present the work more clearly and effectively.

Comment 1: The main concern was regarding the use of VO2 as both an input and target variable. This issue was partially solved. Additionally, it is unclear whether the "feature importance" figures will be included in the manuscript. If they are included, they should only display selected inputs and be accompanied by a description and discussion.

Response 1: Thank you for your valuable feedback. We have addressed the issue and made the necessary corrections. Figure 5 has been revised to only display the selected input variables, and a detailed description and discussion of the figure have been added to the manuscript. While we hope this resolves your concern, please know that we have made every effort to carefully revise and improve the manuscript based on all feedback received. Even if this specific revision may not be perfect, we have done our utmost to respond to every comment thoroughly. The added text on page 8 reads: "To provide a comprehensive evaluation of the model's performance, we selected three participants based on the accuracy of their V̇O2 predictions: the participant with the smallest prediction error ("best"), the participant with the largest prediction error ("worst"), and the participant whose prediction error was closest to the median ("other"). The absolute prediction error for each participant was calculated by taking the mean of the absolute differences between the measured V̇O2 and the predicted V̇O2 values. The "best" participant was identified as the one with the lowest mean absolute error, while the "worst" participant had the highest. The "other" participant was chosen by finding the individual whose error was closest to the median error across all participants."
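The selection rule quoted in this response can be sketched in a few lines of Python. This is only an illustration of the described procedure under stated assumptions: the function names, the dictionary layout, and the participant identifiers are hypothetical and do not come from the manuscript.

```python
from statistics import median


def participant_mae(measured, predicted):
    """Mean absolute difference between measured and predicted values for one participant."""
    return sum(abs(m - p) for m, p in zip(measured, predicted)) / len(measured)


def pick_best_worst_other(records):
    """Pick the 'best', 'worst' and 'other' (closest to median) participants.

    `records` maps a participant id to a (measured, predicted) pair of
    equal-length sequences. The layout is an assumption made for this sketch.
    """
    maes = {pid: participant_mae(m, p) for pid, (m, p) in records.items()}
    best = min(maes, key=maes.get)    # lowest per-participant MAE
    worst = max(maes, key=maes.get)   # highest per-participant MAE
    med = median(maes.values())
    # Participant whose MAE lies closest to the median MAE across all participants.
    other = min(maes, key=lambda pid: abs(maes[pid] - med))
    return best, worst, other, maes
```

With three made-up participants whose per-participant MAEs are 0.1, 1.0 and 0.5, the function returns the first as "best", the second as "worst" and the third as "other".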

Comment 3: There is no comparison to results from other authors. It was partially solved. Although the authors improve the discussion section including a comparison with a R. M, et al study, they should include values to understand how much the new approach improve the estimation of continuous oxygen uptakes. Also, other studies should be included.

Response 3: Thank you for your feedback. We encountered difficulty in finding many prior studies conducted under the same settings, which made direct comparison challenging. As a result, we focused on methodological comparisons with the available studies. Given these constraints, there are limits to how much additional content we can provide in this area. We appreciate your understanding and hope that the revised manuscript still provides valuable insights.

 

Comment 6: Figure 2 still incorrectly shows that 703 participants were excluded due to age group, which is inaccurate. The exclusion due to age group should be listed for 154 participants.

Response 6: We apologize for this oversight. We have corrected Figure 2 to accurately reflect that 154 participants were excluded based on age group, while 703 participants were excluded for other reasons. The figure has been updated accordingly.

Comment 7: Some values of standard deviation (SD) are missing, and there was no statistical test applied to support claims of significant differences.

Response 7: Thank you for pointing this out. We have added the missing SD values and conducted statistical tests to support the claims of significant differences. The results section has been updated to reflect these changes, Page 9: "Moreover, V̇O2max and V̇O2 min values showcased significant gender differences, with males displaying higher values: V̇O2max at 2421.38 ml/min (±985.61 SD) and VO2 min at 31.94 ml/kg/min (±13.05 SD), compared to females at 1670.74 ml/min (±693.45 SD) and 27.59 ml/kg/min (±10.85 SD) respectively."

Comment 11: It was partially solved. The authors mentioned using the correlation matrix to eliminate features with high correlation, but Height and Weight, which have a correlation of r=0.73, were both used as inputs. Why?

Response 11: We acknowledge your concern regarding the correlation between Height and Weight. Although these variables are moderately correlated, they provide distinct and valuable information for VO2 estimation. Height and Weight impact VO2 differently, and after further analysis, we determined that keeping both variables enhances model accuracy without significant multicollinearity issues. We have now clarified this in the manuscript.
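One way to put the r = 0.73 figure in context is the variance inflation factor (VIF). The sketch below covers only the two-predictor case, where the VIF reduces to 1 / (1 - r²). It is an illustrative calculation, not the analysis the authors report; in a model with more inputs, each predictor would have to be regressed on all the others. The function name `pairwise_vif` is an assumption.

```python
def pairwise_vif(r):
    """VIF for a predictor when it is correlated with exactly one other predictor.

    For two predictors, regressing one on the other gives R^2 = r^2,
    so VIF = 1 / (1 - r^2). Undefined for |r| = 1.
    """
    if abs(r) >= 1.0:
        raise ValueError("VIF is undefined for perfectly correlated predictors")
    return 1.0 / (1.0 - r * r)


# Height and Weight correlation quoted by the reviewer.
r_height_weight = 0.73
shared_variance = r_height_weight ** 2          # about 0.53
vif = pairwise_vif(r_height_weight)             # about 2.14
```

A VIF of roughly 2.1 is below the values of 5 or 10 that are commonly used as rules of thumb for problematic multicollinearity, although such thresholds are conventions rather than strict tests.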

Comment 12: In Table 2, it is unclear whether the results pertain to training or test datasets. Also, mean and SD values should be indicated.

Response 12: Thank you for your feedback. We have revised the caption for Table 2 to clearly indicate whether the results represent training or test datasets. Regarding the standard deviation (SD), we have calculated and reported the SD for the metrics based on the mean values obtained from 5-fold cross-validation. While it is not feasible to compute SD for each fold individually, the SD provided represents the variation across the mean values from the cross-validation process.
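To make the "mean and SD across folds" wording concrete, here is a small sketch of how a fold-level summary can be computed. It assumes one metric value per fold (for example, one MAE per fold) and uses the sample standard deviation. The helper names and the unshuffled contiguous split are assumptions for illustration, not the authors' pipeline.

```python
from statistics import mean, stdev


def kfold_indices(n_samples, k):
    """Split range(n_samples) into k contiguous folds; return (train, test) index lists.

    No shuffling is done here; a real pipeline would normally shuffle or group first.
    """
    base, extra = divmod(n_samples, k)
    folds = []
    start = 0
    for i in range(k):
        size = base + (1 if i < extra else 0)   # first `extra` folds get one more item
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, test))
        start += size
    return folds


def summarize_folds(fold_scores):
    """Return (mean, sample SD) of one metric evaluated once per fold."""
    return mean(fold_scores), stdev(fold_scores)
```

For instance, `summarize_folds([1, 2, 3, 4, 5])` gives a mean of 3 and an SD of about 1.58. With 5 folds the SD is estimated from only five numbers, so it describes fold-to-fold variation rather than per-participant variation.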

Comment 14: The authors should show the course of the VO2 (measured and estimated) for several participants (the best, the worst, and others). This was not solved.

Response 14: Thank you for your suggestion. We have now included visualizations showing the VO2 course (both measured and estimated) for several participants, representing the best, worst, and intermediate cases. These additions help demonstrate the model’s performance across different scenarios, as per your request, and replace Figure 5 on page 11.

Comment 15: The discussion section still lacks sufficient analysis and comparison with other studies. The figures moved to the result section need further modification and discussion.

Response 15: We have now revised the figures in the result section, providing more detailed descriptions and discussions to enhance clarity. Additionally, we have expanded the discussion section to include further analysis and a broader comparison with other studies, as recommended.

 

Other Comments:

  • Writing Review:
    Thank you for your feedback. We have revised the text for clarity and improved the overall writing to ensure it meets the journal’s standards.
  • Figure 3 Caption:
    We have corrected the caption to clearly indicate whether Pearson or Spearman correlation was used.
  • Table 2:
    We have now specified which model was used with cross-validation in Table 2.
  • Reference Formatting:
    We have reviewed all references and corrected any inconsistencies in author name formatting to ensure uniformity.

Reviewer 3 Report

Comments and Suggestions for Authors

Dear Corresponding Author, Thank you for addressing almost all of the suggested revisions to your manuscript. The updated version of your article seems to be much clearer to the reader, and I believe it is now suitable for publication in this form. Kind regards.

Author Response

Comment 1: Dear Corresponding Author, Thank you for addressing almost all of the suggested revisions to your manuscript. The updated version of your article seems to be much clearer to the reader, and I believe it is now suitable for publication in this form. Kind regards.

Response 1: Thank you very much for your kind feedback and for giving us the opportunity to revise and improve our manuscript. We are grateful for your guidance throughout the review process, as it has greatly contributed to enhancing the clarity and quality of our work.

We sincerely appreciate your consideration, and we are delighted to hear that the manuscript is now deemed suitable for publication.

Thank you once again for your support.

 

Round 3

Reviewer 2 Report

Comments and Suggestions for Authors

The authors have presented a revised version with some modifications.

Regarding my previous comment related to Figure 1, I still consider that Figure 1 can create confusion, so I would prefer not to include and eliminate Figure 1 in the manuscript. Really, this figure is not necessary. It is enough with the text that actually is included in the section “2.3. Data Portion”: “A total of 857 participants, excluding 154 based on age criteria and 37 participants' data were excluded upon the Z-1 score anomalies. Subsequently, the remaining 666 participants were divided into 80% (n=532) for training and 20% (n=134) for testing (Figure 1)”. Although, I suggest changing the previous text with this other (taken from the caption of Figure 1): “A total of 857 participants were initially considered for the study. After excluding 154 participants due to age-related criteria and removing 37 participants based on Z-score anomaly detection, the final dataset comprised 666 participants. This dataset was then stratified into 80% (n=532) for model training and 20% (n=134) for testing.”

 

About my previous comment related to “There is no comparison to results from other authors”, the authors indicated that “We encountered difficulty in finding many prior studies conducted under the same settings, which made direct comparison challenging”. I understand that and it is a normal situation. However, the authors also indicated in the manuscript the following, suggesting that there are similar studies, of course not in the same setting or conditions, but can be used for comparison:

“The field has primarily focused on incorporating 𝑉̇𝑂2𝑚𝑎𝑥 into machine learning models using various approaches such as Graph Neural Networks (GNN) [13] and regression models [14]. These studies typically utilize demographic factors like age, weight, and height to predict 𝑉̇𝑂2𝑚𝑎𝑥 [12, 15].”

“The studies utilizing PhysioNet [20] for VO2max estimation encounter limitations impacting the accuracy and reliability of their findings. It includes challenges in measuring accurate 𝑉̇𝑂2𝑚𝑎𝑥 values differences between predicted and actual values. Specific equations and methods as the American College of Sports Medicine (ACSM) running equation [21], are concerns about generalizability and methodological issues [22, 23]”

Besides that, the authors included Table 1 with different equations from other studies to compute V02max, and they wrote in the manuscript "In the study, we explore several established equations presented by Edvardsen et al., Jones et al., Hansen et al., and Drinkwater (Table 1)". It would be interesting to know how much improvement the AI algorithms can generate in relation to such equations.

 

There are several mistakes related to the numbers that were reported in the “Tables”, “Result section” and “Discussion section”. Please, check carefully again the whole manuscript. Examples:

In the Result section: with 𝑉̇𝑂2𝑚𝑎𝑥 at 2428.76 ml/min (±991.13 SD)

In the Discussion section: V𝑂2𝑚𝑎𝑥 at 2421.38 ml/min (±985.61 SD)

 

In the Result section (to check the value and the corresponding figure): Specifically, the 𝑅2 value, which represents the correlation between the predicted 𝑉̇𝑂2 ml/kg/min and the actual values, was found to be 0.9691 (Figure 3).

In the Discussion section (to check the value and the corresponding figure): The 𝑅2 value of the model stood at 0.9665, indicating a high level of explanatory power in predicting 𝑉̇𝑂2𝑚𝑎𝑥 (Figure 4).

 

In the Result section: demonstrating a mean MAE of 0.1217

In the Discussion section: achieving a MAE of 0.1217 (± 0.0020)

 

In the Result section: The gender distribution revealed that the majority of participants were male, accounting for approximately 88.15%, while females comprised the remaining 11.85%.

In the Discussion section: Specifically, 87.76% of the participants were male, and only 12.24% were female.

 

In the Result section (it should be male / females): The participants' height and weight further characterized gender disparities, with males presenting an average height of approximately 176.72 cm (±6.73 SD) and weight of 76.24 kg (±9.95 SD), and males indicated averages of 166.11 cm (±8.17 SD) and 61.67 kg (±10. 98 SD), respectively.

In the Result section: The Pearson correlation matrix is a statistical tool operated to quantify the strength and direction of relationships between variables in a dataset

 

In the Discussion section: The Pearson correlation coefficient matrix provides insights into 𝑉̇𝑂2 and its relationship with other parameters.

Comments on the Quality of English Language

Some phrases must be rewritten.

Author Response

Thank you very much for your excellent feedback. It allowed me to carefully review and refine much of the manuscript. I greatly appreciate receiving such high-quality feedback, and it has been invaluable in improving the work.

 

Comment 1: Regarding my previous comment related to Figure 1, I still consider that Figure 1 can create confusion, so I would prefer not to include and eliminate Figure 1 in the manuscript. Really, this figure is not necessary. It is enough with the text that actually is included in the section “2.3. Data Portion”: “A total of 857 participants, excluding 154 based on age criteria and 37 participants' data were excluded upon the Z-1 score anomalies. Subsequently, the remaining 666 participants were divided into 80% (n=532) for training and 20% (n=134) for testing (Figure 1)”. Although, I suggest changing the previous text with this other (taken from the caption of Figure 1): “A total of 857 participants were initially considered for the study. After excluding 154 participants due to age-related criteria and removing 37 participants based on Z-score anomaly detection, the final dataset comprised 666 participants. This dataset was then stratified into 80% (n=532) for model training and 20% (n=134) for testing.”

Response 1: Thank you for your suggestion. We have removed Figure 1 from the manuscript as recommended and have updated the text in the "2.3. Data Portion" section to reflect the wording you suggested. We believe this change improves clarity and aligns with your feedback.
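The participant counts in the suggested wording can be checked with simple arithmetic, and the Z-score exclusion step can be sketched alongside it. The code below is a hedged illustration: the function names, the threshold of 3 standard deviations, and the use of the population standard deviation are assumptions, since the record does not state which variable or cutoff was used.

```python
from statistics import mean, pstdev


def split_counts(total, excluded, train_fraction):
    """Return (remaining, n_train, n_test) after exclusions, flooring the training share."""
    remaining = total - sum(excluded)
    n_train = int(remaining * train_fraction)
    return remaining, n_train, remaining - n_train


def zscore_outlier_indices(values, threshold=3.0):
    """Indices of values whose absolute Z-score exceeds `threshold` (assumed cutoff)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []
    return [i for i, v in enumerate(values) if abs((v - mu) / sigma) > threshold]


# 857 initial participants, 154 excluded by age, 37 by Z-score anomalies.
remaining, n_train, n_test = split_counts(857, [154, 37], 0.8)
# remaining = 666, n_train = 532, n_test = 134
```

Note that 80% of 666 is 532.8, so the reported 532/134 split corresponds to rounding the training share down.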

 

Comment 2: About my previous comment related to “There is no comparison to results from other authors”, the authors indicated that “We encountered difficulty in finding many prior studies conducted under the same settings, which made direct comparison challenging”. I understand that and it is a normal situation. However, the authors also indicated in the manuscript the following, suggesting that there are similar studies, of course not in the same setting or conditions, but can be used for comparison:

“The field has primarily focused on incorporating V̇O2max into machine learning models using various approaches such as Graph Neural Networks (GNN) [13] and regression models [14]. These studies typically utilize demographic factors like age, weight, and height to predict V̇O2max [12, 15].”

“The studies utilizing PhysioNet [20] for VO2max estimation encounter limitations impacting the accuracy and reliability of their findings. It includes challenges in measuring accurate V̇O2max values differences between predicted and actual values. Specific equations and methods as the American College of Sports Medicine (ACSM) running equation [21], are concerns about generalizability and methodological issues [22, 23]”

Besides that, the authors included Table 1 with different equations from other studies to compute V02max, and they wrote in the manuscript "In the study, we explore several established equations presented by Edvardsen et al., Jones et al., Hansen et al., and Drinkwater (Table 1)". It would be interesting to know how much improvement the AI algorithms can generate in relation to such equations.

Response 2: Thank you for highlighting this. We acknowledge that similar studies, although not in the same settings, can indeed offer valuable comparisons. We have now expanded the discussion to include a comparison of the AI algorithm's performance relative to the established equations presented in Table 1. This should provide a clearer understanding of the improvements our approach offers over traditional methods on pg 10: "Our study also provides an insightful comparison between the machine learning-based approach and traditional VO2max estimation equations, such as those presented by Edvardsen et al., Jones et al., Hansen et al., and Drinkwater (Table 1). Unlike the conventional equations, which rely heavily on demographic factors like age, weight, and height, our machine learning models incorporate real-time data, allowing for a more accurate and individualized prediction of VO2max. The improvements in MAE, RMSE, and other performance metrics highlight the enhanced predictive power of AI algorithms, particularly when fine-tuned through methods like hyperparameter optimization in XGBoost. These advancements are critical for applications requiring continuous monitoring and real-time feedback, which are incomprehensible with static equations."

 

Comment 3: There are several mistakes related to the numbers that were reported in the “Tables”, “Result section” and “Discussion section”. Please, check carefully again the whole manuscript. Examples:

In the Result section: with V̇O2max at 2428.76 ml/min (±991.13 SD)

In the Discussion section: VO2max at 2421.38 ml/min (±985.61 SD)

 

In the Result section (to check the value and the corresponding figure): Specifically, the R² value, which represents the correlation between the predicted V̇O2 ml/kg/min and the actual values, was found to be 0.9691 (Figure 3).

In the Discussion section (to check the value and the corresponding figure): The R² value of the model stood at 0.9665, indicating a high level of explanatory power in predicting V̇O2max (Figure 4).

 

In the Result section: demonstrating a mean MAE of 0.1217

In the Discussion section: achieving a MAE of 0.1217 (± 0.0020)

Response 3: We sincerely apologize for the inconsistencies in the reported numbers. We have thoroughly reviewed the entire manuscript and corrected these discrepancies to ensure consistency across all sections. The corrected values now align accurately in the tables, results, and discussion sections.

 

Comment 4: In the Result section: The gender distribution revealed that the majority of participants were male, accounting for approximately 88.15%, while females comprised the remaining 11.85%.

In the Discussion section: Specifically, 87.76% of the participants were male, and only 12.24% were female.

Response 4: Thank you for bringing this to our attention. We have corrected the gender distribution data to ensure that the percentages are consistent throughout the manuscript. The final percentages accurately reflect the gender distribution of the participants on pg 5: "Based on the analysis of participant characteristics, our study comprised a total of 532 unique individuals for the training dataset. The gender distribution revealed that the majority of participants were male, accounting for approximately 87.76%, while females comprised the remaining 12.24%."

 

Comment 5: In the Result section (it should be male / females): The participants' height and weight further characterized gender disparities, with males presenting an average height of approximately 176.72 cm (±6.73 SD) and weight of 76.24 kg (±9.95 SD), and males indicated averages of 166.11 cm (±8.17 SD) and 61.67 kg (±10. 98 SD), respectively.

Response 5: We appreciate your observation. The results section has been corrected to clearly distinguish between the height and weight of male and female participants. The revised text now accurately describes the gender-specific data.

 

Comment 6: In the Discussion section: The Pearson correlation coefficient matrix provides insights into V̇O2 and its relationship with other parameters.

Response 6: Thank you for this feedback. We have revised the descriptions of the Pearson correlation matrix in both the results and discussion sections to ensure they are consistent and clearly convey the role of the matrix in analyzing the relationships between variables.

Page 10: "The Spearman correlation coefficient matrix provides insights into V̇O2 and its relationship with other parameters. Although we followed the equations to select parameters, height and weight showed a relatively high correlation, along with a smaller positive correlation with age. This suggests that taller and heavier individuals tend to have higher V̇O2 values. The scatter plot displaying the relationship between actual V̇O2 (ml/kg/min) and its estimation demonstrates strong predictive accuracy, as indicated by a high R-squared value of 0.9691. This strong linear relationship confirms the reliability of the continuous estimation methods used in this study."
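Because the review exchange turns on whether the matrix is Pearson or Spearman, a short sketch of the difference may help. Spearman's coefficient is Pearson's coefficient computed on ranks, so the two agree for linear relationships and diverge for monotonic but non-linear ones. The helper names and the toy data are assumptions for illustration and are unrelated to the study's dataset.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))
```

For the toy data `x = [1, 2, 3, 4, 5]` and `y = [1, 4, 9, 16, 25]`, Spearman's coefficient is 1.0 because the relationship is perfectly monotonic, while Pearson's is about 0.98 because it is not perfectly linear. This is why a figure caption needs to state which coefficient was used.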
