#### *3.2. Prediction Models for VAT, VRCP, and Vmax*

Complete MLR prediction models for males and females are presented in Table 3 (left columns), while Figure 2 shows their performance in the derivation cohort (observed vs. predicted values). The importance of all CPET variables included in the modeling, based on XGBoost selection, is presented in Figure 3. The following variables had the strongest impact in building the models: VO2, [La−], VE, age, and BMI. Model performance is reported as R2, along with RMSE and MAE. Briefly, R2 for the male equations ranged from 0.57 for VAT and Vmax to 0.62 for VRCP. For the female equations, R2 ranged from 0.62 for VAT to 0.67 for VRCP. RMSE was lowest for the female VAT equation (0.828) and highest for the male Vmax equation (1.202), while MAE was lowest for the female VAT equation (0.657) and highest for the male Vmax equation (0.944).
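
As an illustration of how the three reported metrics relate to one another, the following minimal sketch computes RMSE, MAE, R2, and adjusted R2 from paired observed and predicted velocities. The function name, the example velocities, and the number of predictors are hypothetical and are not taken from the study data.

```python
import math


def regression_metrics(observed, predicted, n_predictors):
    """Return RMSE, MAE, R2 and adjusted R2 for paired observed/predicted values."""
    n = len(observed)
    if n != len(predicted) or n <= n_predictors + 1:
        raise ValueError("need matching lengths and n > n_predictors + 1")
    residuals = [o - p for o, p in zip(observed, predicted)]
    ss_res = sum(r * r for r in residuals)            # residual sum of squares
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)  # total sum of squares
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(r) for r in residuals) / n
    r2 = 1.0 - ss_res / ss_tot
    # Adjusted R2 penalizes R2 for the number of predictors in the equation.
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)
    return {"rmse": rmse, "mae": mae, "r2": r2, "adj_r2": adj_r2}


# Hypothetical velocities in km/h (illustration only, not study data).
observed = [10.0, 11.5, 12.0, 13.5, 14.0, 15.5]
predicted = [10.5, 11.0, 12.5, 13.0, 14.5, 15.0]
metrics = regression_metrics(observed, predicted, n_predictors=2)
print(metrics)
```

Because RMSE squares the residuals before averaging, it is never smaller than MAE, which matches the pattern of values reported for every model above.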



Abbreviations: RMSE, root mean square error; MAE, mean absolute error; R2, adjusted R2; VAT, velocity at anaerobic threshold; Age, age in years; BMI, body mass index (kg·m−2); VO2max, relative maximum oxygen uptake (mL·min−1·kg−1); VO2AT, relative oxygen uptake at anaerobic threshold (mL·min−1·kg−1); [La−]bAT, blood lactate concentration at anaerobic threshold (mmol·L−1); VERCP, pulmonary ventilation at respiratory compensation point (L·min−1); VRCP, velocity at respiratory compensation point; VO2RCP, relative oxygen uptake at respiratory compensation point (mL·min−1·kg−1); [La−]bRCP, blood lactate concentration at respiratory compensation point; BM, body mass; Vmax, maximal velocity; VEmax, maximal pulmonary ventilation (L·min−1). RMSE and MAE are expressed in km·h−1. Model performance at the derivation stage is shown in the left columns. Briefly, our equations showed high accuracy and explained approximately 60–70% of the variance between participants. The results of internal validation via 10-fold cross-validation are presented in the right columns and indicate good transferability, despite the limited sample size for internal validation. Only one R2 is presented because the 10-fold cross-validation was performed on the same group of participants as the derivation.

#### *3.3. Internal Validation*

The evaluation of each model is also presented in Table 3 (right columns). In summary, the performance of our prediction equations was similar to that observed in the derivation cohort, with slightly higher RMSE and MAE. Overall, RMSE ranged from 0.838 to 1.205 km·h−1 and MAE from 0.665 to 0.944 km·h−1. The best-performing model (defined as having the highest replicability and the lowest risk of inaccuracies in the test set) was the female VAT model (RMSE = 0.838, MAE = 0.665), and the worst was the male Vmax model (RMSE = 1.205, MAE = 0.944). The most and least accurate models were the same in derivation and validation. Figure 4 shows Bland–Altman plots comparing observed and predicted velocity for the newly derived prediction models at the validation stage.
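
The 10-fold cross-validation procedure can be sketched as follows. This is a simplified illustration only: it uses a single hypothetical predictor fitted by ordinary least squares rather than the multivariable equations in Table 3, and the synthetic data, function names, and random seeds are assumptions made for the example.

```python
import math
import random


def fit_simple_ols(xs, ys):
    """Least-squares slope and intercept for a single predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def k_fold_cv(xs, ys, k=10, seed=0):
    """Return (RMSE, MAE) pooled over all held-out predictions."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k folds of nearly equal size
    errors = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in indices if i not in held_out]
        # Fit on the k-1 training folds, then predict the held-out fold.
        slope, intercept = fit_simple_ols(
            [xs[i] for i in train], [ys[i] for i in train]
        )
        errors.extend(ys[i] - (slope * xs[i] + intercept) for i in fold)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    mae = sum(abs(e) for e in errors) / len(errors)
    return rmse, mae


# Synthetic data (illustration only): velocity loosely driven by oxygen uptake.
rng = random.Random(42)
vo2 = [35 + 0.5 * i for i in range(60)]
velocity = [0.2 * x + 3.0 + rng.gauss(0, 0.8) for x in vo2]
cv_rmse, cv_mae = k_fold_cv(vo2, velocity, k=10)
print(f"10-fold CV: RMSE = {cv_rmse:.3f} km/h, MAE = {cv_mae:.3f} km/h")
```

Each participant is predicted exactly once by a model that did not see that participant during fitting, which is why cross-validated errors are typically slightly higher than the errors obtained at derivation.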

**Figure 2.** Performance of novel prediction equations for treadmill velocity. Abbreviations: VAT, velocity at anaerobic threshold; VRCP, velocity at respiratory compensation point; Vmax, maximal velocity. Colored dotted lines illustrate a 1:1 correspondence between measured and predicted velocities: green for males (left; panels (**A**–**C**)) and red for females (right; panels (**D**–**F**)).

**Figure 3.** Heat map showing the importance of variables for predicted velocity based on XGBoost selection. Abbreviations: VO2max, maximal oxygen uptake; VO2RCP, relative oxygen uptake at respiratory compensation point; VO2AT, relative oxygen uptake at anaerobic threshold; [La−]bRCP, blood lactate concentration at respiratory compensation point; [La−]bAT, blood lactate concentration at anaerobic threshold; [La−]bmax, maximal blood lactate concentration; VEmax, maximal pulmonary ventilation; VERCP, pulmonary ventilation at respiratory compensation point; VEAT, pulmonary ventilation at anaerobic threshold; RERmax, maximal respiratory exchange ratio; RERRCP, respiratory exchange ratio at respiratory compensation point; RERAT, respiratory exchange ratio at anaerobic threshold; HRmax, maximal heart rate; HRRCP, heart rate at respiratory compensation point; HRAT, heart rate at anaerobic threshold; BF, body fat; FFM, fat-free mass; BMI, body mass index; VAT, velocity at anaerobic threshold; VRCP, velocity at respiratory compensation point; Vmax, maximal velocity. Panel (**A**) presents data for males, while panel (**B**) shows the data for females. A cross indicates that the variable did not fulfill the preliminary selection-stage requirements (HR variables only). The maps present each variable's importance for the predicted velocity during the model-building stage. Only the variables with a significant impact (*p* < 0.05) were included in the final prediction models.

**Figure 4.** Bland–Altman plots comparing observed with predicted velocity during internal validation. Abbreviations: VAT, velocity at anaerobic threshold; VRCP, velocity at respiratory compensation point; Vmax, maximal velocity. Colored dotted lines present a 95% confidence interval of agreement: green for males (left; panels (**A**–**C**)) and red for females (right; panels (**D**–**F**)).
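
For readers unfamiliar with Bland–Altman analysis, the quantities behind such a plot can be sketched as below. The example assumes the conventional definition of the agreement limits (mean difference ± 1.96 standard deviations of the differences); whether the dotted lines in Figure 4 were computed in exactly this way is an assumption, and the velocities shown are hypothetical.

```python
import statistics


def bland_altman(observed, predicted):
    """Return the points and summary lines of a Bland-Altman plot."""
    diffs = [o - p for o, p in zip(observed, predicted)]        # y-axis
    means = [(o + p) / 2 for o, p in zip(observed, predicted)]  # x-axis
    bias = statistics.mean(diffs)   # systematic over- or under-prediction
    sd = statistics.stdev(diffs)    # sample SD of the differences
    return {
        "means": means,
        "diffs": diffs,
        "bias": bias,
        "lower": bias - 1.96 * sd,  # lower 95% limit of agreement
        "upper": bias + 1.96 * sd,  # upper 95% limit of agreement
    }


# Hypothetical velocities in km/h (illustration only, not study data).
observed = [10.0, 11.5, 12.0, 13.5, 14.0, 15.5]
predicted = [10.5, 11.0, 12.5, 13.0, 14.5, 15.0]
ba = bland_altman(observed, predicted)
print(f"bias = {ba['bias']:.3f}, limits = [{ba['lower']:.3f}, {ba['upper']:.3f}] km/h")
```

A bias close to zero with narrow limits indicates that an equation neither systematically overestimates nor underestimates velocity across the measured range.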

#### **4. Discussion**

In the current study, we applied advanced machine-learning techniques in a comprehensive evaluation of running physiology. The obtained equations include several purely physiological measures (both anthropometric and directly measured during CPET) and offer a feasible means of predicting VAT, VRCP, and Vmax with substantial accuracy. The availability of this type of machine-learning tool in exercise diagnostics enables better training recommendations for EA and facilitates rehabilitation prescriptions for patients suffering from cardiovascular or respiratory diseases [7,31]. The novelty and main advantage of our work is that no comparable studies first select the variables with the strongest predictive abilities and then directly evaluate their accuracy in the derived prediction models. An additional strength is the relatively large group of healthy adult EA (*n* = 4001) who underwent CPET under an identical protocol, which maximized the precision and comparability of the collected data. This enabled us to better examine whether parameters such as age [32], BC and BF [33], or VO2max [16] exerted a significant impact on the predictive performance of the model (as they were previously classified as relevant variables in the literature). Moreover, the inclusion criteria enabled us to avoid the confounding influence of factors such as smoking [34] or medications [35].
