#### *2.3. Somatic, [La−]b Measurements, and CPET Protocol*

First, body mass (BM), broken down into body fat (BF) and fat-free mass (FFM), was measured by bioelectrical impedance analysis (BIA) using a multifrequency (5/50/250 kHz) body composition (BC) analyzer (Tanita, MC 718, Tokyo, Japan). Conditions during BC and CPET were the same: a 40 m² indoor, air-conditioned area; 40–60% humidity; a temperature of 20–22 °C; and an altitude of 100 m ASL. The subjects' skin was cleaned before testing. In line with our standardized laboratory practice, each EA received recovery and dietary instructions by email a few days before testing so that they could prepare appropriately for the CPET and BC tests. These recommendations included eating a high-carbohydrate meal 2–3 h before the CPET and staying hydrated with sports drinks; female EAs were advised to be tested well after their menstrual phase [22]. They were also informed that the CPET would be performed on a mechanical TE and that they should be familiar with the characteristics of this type of effort and with the running technique it requires.
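The two-compartment relationship behind the BIA outputs, together with BMI, can be illustrated with a short Python sketch. It assumes that the analyzer reports BF as a percentage of BM, so that FFM is BM minus fat mass; the function name and example values are hypothetical and are not part of the study's software.

```python
def body_composition(mass_kg: float, height_m: float, bf_percent: float) -> dict:
    """Derive BMI, fat mass and fat-free mass from BM, height and BF%.

    Assumes BF is expressed as a percentage of body mass, so that
    FFM = BM - fat mass (two-compartment model).
    """
    if mass_kg <= 0 or height_m <= 0 or not 0 <= bf_percent < 100:
        raise ValueError("implausible input")
    fat_mass = mass_kg * bf_percent / 100.0
    return {
        "bmi": mass_kg / height_m ** 2,    # kg·m−2
        "fat_mass_kg": fat_mass,
        "ffm_kg": mass_kg - fat_mass,
    }


# Hypothetical participant: 75 kg, 1.78 m, 15% body fat
print(body_composition(75.0, 1.78, 15.0))
```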

Running tests were performed on a mechanical TE (h/p/Cosmos quasar, Nussdorf-Traunstein, Germany). CPET indices were measured breath by breath and averaged over 15 s intervals [23] using a Hans Rudolph V2 Mask (Hans Rudolph, Inc., Shawnee, KS, USA), a Cosmed Quark CPET gas exchange analyzer (Rome, Italy), and dedicated software (Quark PFT Suite powered by Omnia 10.0E). The gas analyzer was calibrated before the testing protocol (16% O2; 5% CO2; ventilation accuracy ±2% or 100 mL·min−1). The analyzer used the manufacturer's standard settings, i.e., 3-step smoothing and removal of erroneous breaths from the analysis. HR was measured with an ANT+ chest strap included in the Cosmed Quark set (accuracy similar to ECG; ±1 beat·min−1). [La−]b was measured with a Super GL2 analyzer (Müller Gerätebau GmbH, Freital, Germany) using an enzymatic-amperometric electrochemical technique. The lactate analyzer was also calibrated before each round of analysis for each participant.
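As an illustration of reducing breath-by-breath data to 15 s intervals, the following Python sketch averages the breaths that fall into consecutive, non-overlapping 15 s windows. It is a simplification under stated assumptions: it does not reproduce the Cosmed Quark 3-step smoothing or erroneous-breath removal, and the function name and sample values are hypothetical.

```python
from collections import defaultdict


def bin_breaths(times_s, values, window_s=15.0):
    """Average breath-by-breath samples into consecutive fixed time windows.

    times_s: breath timestamps in seconds from the start of the test
    values:  the variable of interest for each breath (e.g., VO2 in mL·min−1)
    Returns a list of (window_start_s, mean_value) for non-empty windows.
    """
    if len(times_s) != len(values):
        raise ValueError("times and values must have equal length")
    bins = defaultdict(list)
    for t, v in zip(times_s, values):
        bins[int(t // window_s)].append(v)  # window index for this breath
    return [(k * window_s, sum(vs) / len(vs)) for k, vs in sorted(bins.items())]


# Hypothetical VO2 values (mL·min−1) for seven breaths
print(bin_breaths([1, 5, 9, 14, 16, 22, 29],
                  [1000, 1100, 1050, 1150, 1200, 1300, 1250]))
# [(0.0, 1075.0), (15.0, 1250.0)]
```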

CPET began with a 5 min warm-up (walking or slow running at a self-declared "conversational" pace). The starting speed was 7–12 km·h−1 at a 1% incline; the starting pace differed according to the participants' training level and was selected on the basis of an interview about their previous sports results. The speed was increased by 1 km·h−1 every 2 min. The test was terminated when a VO2 or HR plateau occurred (no increase in VO2 or HR despite an increase in CPET intensity) or when the participant was volitionally unable to maintain the intensity [23,24]. Subjects were verbally encouraged to make a maximal effort. Maximal HR was taken as the highest value recorded during CPET (not averaged). Maximal VO2 was recorded as the average of stable VO2 values in the 10 s intervals immediately preceding termination of the CPET [23,24].

AT and RCP were determined indirectly using the ventilatory method. AT was identified when (1) the VE/VO2 curve began to rise while the VE/VCO2 curve remained constant and (2) the end-tidal partial pressure of O2 began to rise while the end-tidal partial pressure of CO2 remained constant [25]. RCP was identified when (1) the end-tidal partial pressure of CO2 (PetCO2) decreased after reaching its maximum; (2) VE rose rapidly and nonlinearly (second deflection); (3) the VE/VCO2 ratio reached its lowest value and began to rise; and (4) VCO2 rose nonlinearly relative to VO2 (departure from linearity) [25].

[La−]b was assessed from a 20 μL fingertip blood sample taken before the test, after each speed increase, and 3 min after termination. Samples were taken while the participant kept running, without interruption or a decrease in pace. All samples were drawn from the same initial puncture: the first few drops were wiped onto a swab before the proper blood sample was collected. In further analysis, the [La−]b values corresponding to AT, RCP, and maximal VO2 were determined.
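The incremental protocol and the threshold-related lactate values can be sketched in Python as follows. The speed schedule follows the text (1 km·h−1 every 2 min from the individual starting speed). The linear interpolation of [La−]b at a threshold speed and the VE/VCO2 nadir search are assumptions for illustration only, because the exact procedure used to derive [La−]b at AT, RCP, and maximal VO2 is not specified here; the nadir finder covers RCP criterion (3) alone and would not replace the four combined criteria. All function names and values are hypothetical.

```python
from bisect import bisect_left


def speed_at(t_min, start_kmh, step_kmh=1.0, stage_min=2.0):
    """Treadmill speed (km/h) t_min minutes after the incremental phase starts,
    assuming a step of step_kmh at the end of every stage_min-minute stage."""
    if t_min < 0:
        raise ValueError("t_min must be non-negative")
    return start_kmh + step_kmh * int(t_min // stage_min)


def lactate_at_speed(stage_speeds, lactates, target_speed):
    """Linearly interpolate [La-]b (mmol/L) at target_speed from samples taken
    at strictly increasing stage speeds (hypothetical interpolation rule)."""
    if not stage_speeds[0] <= target_speed <= stage_speeds[-1]:
        raise ValueError("target speed outside the sampled range")
    i = bisect_left(stage_speeds, target_speed)  # first speed >= target
    if stage_speeds[i] == target_speed:
        return lactates[i]
    x0, x1 = stage_speeds[i - 1], stage_speeds[i]
    y0, y1 = lactates[i - 1], lactates[i]
    return y0 + (y1 - y0) * (target_speed - x0) / (x1 - x0)


def ve_vco2_nadir_index(ve, vco2):
    """Index of the lowest VE/VCO2 ratio (RCP criterion 3 only)."""
    ratios = [v / c for v, c in zip(ve, vco2)]
    return min(range(len(ratios)), key=ratios.__getitem__)


# Hypothetical runner starting at 9 km/h: speed 5 min into the incremental phase
print(speed_at(5.0, 9.0))  # 11.0
# Hypothetical stage samples; [La-]b interpolated at a threshold speed of 11.5 km/h
print(lactate_at_speed([9, 10, 11, 12, 13], [1.2, 1.5, 2.1, 3.0, 4.4], 11.5))
```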

#### *2.4. Data Analysis and Predictors Selection*

Data were stored in an Excel file (Microsoft Corporation, Redmond, WA, USA) and processed with a Python script. Descriptive statistics are reported as frequencies (percentages), as means (± standard deviation, SD, or 95% confidence interval, CI) for continuous variables, and as medians for categorical variables. Intergroup differences (all in continuous variables) were assessed with Student's t-test for independent samples. Missing data (only for [La−]b; 1190 cases in males and 266 cases in females in total) were imputed with the random forest (RF) method [26,27].
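For reference, Student's t-test for independent samples pools the two group variances under an equal-variance assumption. The Python sketch below computes the t statistic and its degrees of freedom with the standard library only; the two-sided p-value would then be read from the t distribution, e.g., with `scipy.stats.ttest_ind(a, b)`, whose default `equal_var=True` corresponds to this pooled test. The function name and group values are hypothetical.

```python
import math
from statistics import mean, variance


def student_t(a, b):
    """Student's two-sample t statistic with pooled variance
    (equal-variance assumption) and its degrees of freedom."""
    n1, n2 = len(a), len(b)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two observations")
    # Pooled variance: sample variances weighted by their degrees of freedom
    sp2 = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    t = (mean(a) - mean(b)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


# Hypothetical values for two small groups
print(student_t([1, 2, 3, 4, 5], [2, 3, 4, 5, 6]))
```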

The XGBoost machine learning approach was used to select the variables with the highest predictive value [28]. For variable selection, the population was divided into 3 groups: 60% for derivation (building group), 20% for testing (testing group), and the remaining 20% for validation (validation group). After selection, 11 variables were included in further analysis: VO2max, VO2RCP, VO2AT, [La−]bRCP, [La−]bAT, VEmax, VERCP, age, BM, BMI, and BF. The selected parameters were then entered into multiple linear regression (MLR) modeling, and only significant predictors (*p* < 0.05) were retained in the final models. The derived equations are characterized by the coefficient of determination (R2), root mean square error (RMSE), and mean absolute error (MAE).

A 10-fold cross-validation technique [29] and Bland–Altman analysis [30] were used to establish the models' precision and accuracy during internal validation. In 10-fold cross-validation, the population is divided into 10 random parts. The candidate model is built on 9 (10 − 1) parts (the training set) and evaluated on the remaining part (the test set). This procedure was repeated 10 times, with each part serving once as the test set, and the final formula was chosen as the one with the lowest validation error (defined in this paper as the lowest RMSE and MAE) [29]. To verify that all MLR requirements were met, we additionally applied Ramsey's RESET test (correct specification of the MLR equations), the Chow test (stability of the coefficients across subsamples), and the Durbin–Watson test (autocorrelation of residuals). Each model was checked against these requirements, and no irregularities were found.
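The validation workflow described above can be sketched in Python with NumPy: the 60/20/20 partition, ordinary least squares fitting, RMSE/MAE/R2, 10-fold cross-validation, Bland–Altman bias with 95% limits of agreement, and the Durbin–Watson statistic. This is a sketch under stated assumptions, not the authors' pipeline: it fits plain OLS on already-selected predictors; XGBoost selection, *p*-value-based predictor elimination, the final-formula choice, and the RESET and Chow tests are not reproduced; all function names are hypothetical; and the data are synthetic. For fuller regression diagnostics, statsmodels provides, for example, `statsmodels.stats.stattools.durbin_watson`.

```python
import numpy as np


def split_60_20_20(n, seed=0):
    """Shuffle indices into derivation (60%), testing (20%) and validation (20%) sets."""
    idx = np.random.default_rng(seed).permutation(n)
    a, b = int(0.6 * n), int(0.8 * n)
    return idx[:a], idx[a:b], idx[b:]


def fit_ols(X, y):
    """Ordinary least squares with an intercept; X has shape (n_samples, n_features)."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef  # [intercept, b1, b2, ...]


def predict(coef, X):
    return coef[0] + X @ coef[1:]


def metrics(y, y_hat):
    """Return RMSE, MAE and R2 for observed y and predicted y_hat."""
    resid = y - y_hat
    rmse = float(np.sqrt(np.mean(resid ** 2)))
    mae = float(np.mean(np.abs(resid)))
    r2 = float(1 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2))
    return rmse, mae, r2


def kfold_cv(X, y, k=10, seed=0):
    """Fit on k-1 folds, score on the held-out fold, repeat k times;
    return the mean RMSE, MAE and R2 across folds."""
    folds = np.array_split(np.random.default_rng(seed).permutation(len(y)), k)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        coef = fit_ols(X[train_idx], y[train_idx])
        scores.append(metrics(y[test_idx], predict(coef, X[test_idx])))
    return np.mean(scores, axis=0)


def bland_altman(measured, predicted):
    """Mean bias and 95% limits of agreement (bias +/- 1.96 SD of the differences)."""
    diff = measured - predicted
    bias, sd = diff.mean(), diff.std(ddof=1)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def durbin_watson(resid):
    """Durbin-Watson statistic; values near 2 suggest no first-order autocorrelation."""
    return float(np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2))


# Synthetic example (not study data): two predictors plus noise
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 2))
y = 3.0 + 1.5 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=200)
print("mean CV RMSE, MAE, R2:", kfold_cv(X, y))
coef = fit_ols(X, y)
print("Durbin-Watson:", durbin_watson(y - predict(coef, X)))
```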

This machine learning approach allows each formula to be evaluated in terms of preliminary variable precision (at the selection stage), accuracy (during model building), and recall (during internal validation).

The Ggplot 2 package (version 6.0-90; available from: https://cran.r-project.org/web/packages/caret/index.html, accessed on 21 June 2022) in RStudio (R Core Team, Vienna, Austria; version 3.6.4), GraphPad Prism (GraphPad Software, San Diego, CA, USA; version 9.0.0 for Mac OS), and STATA software (StataCorp, College Station, TX, USA; version 15.1) were used for statistical analysis. A two-sided *p*-value < 0.05 was considered statistically significant.

#### **3. Results**

#### *3.1. Somatic Measurements and CPET Results*

The participants' anthropometric data are presented in Table 1. The full population consisted of 4001 people, of whom 3330 (83.23%) were male and 671 (16.77%) were female. All data showed a normal distribution. The mean age was 35.90 ± 8.15 years for males and 33.86 ± 7.74 years for females, and the overall age range was 18–74 years. BMI was 24.07 ± 2.44 kg·m−2 in men and 21.64 ± 2.38 kg·m−2 in women. BF percentage was relatively low, at 15.49 ± 4.53% in males and 22.04 ± 5.46% in females. Significant differences between genders were observed for height, BM, BMI, BF, and FFM (all *p* < 0.0001).

CPET results are presented in Table 2. Among the measured variables, VAT was 10.97 ± 1.40 km·h−1 and 9.64 ± 1.36 km·h−1 for males and females, respectively. VRCP was 14.02 ± 1.74 km·h−1 and 12.29 ± 1.68 km·h−1 for males and females, respectively. The Vmax obtained during CPET was 16.07 ± 1.93 km·h−1 and 14.12 ± 1.85 km·h−1 for males and females, respectively. The starting protocol velocity was 8.61 ± 1.28 km·h−1 for males and 7.60 ± 1.08 km·h−1 for females. Comparing genders, significant differences (all *p* < 0.0001) were found for all measured variables except [La−]b at AT (*p* = 0.99), maximal respiratory exchange ratio (*p* = 0.77), and maximal HR (*p* = 0.15).

**Table 2.** CPET characteristics.


Abbreviations: CI, 95% confidence interval; SD, standard deviation; VO2AT, oxygen uptake at anaerobic threshold; RERAT, respiratory exchange ratio at anaerobic threshold; HRAT, heart rate at anaerobic threshold; VEAT, pulmonary ventilation at anaerobic threshold; fRAT, respiratory frequency at anaerobic threshold; [La−]bAT, lactate concentration at anaerobic threshold; VO2RCP, oxygen uptake at respiratory compensation point; RERRCP, respiratory exchange ratio at respiratory compensation point; HRRCP, heart rate at respiratory compensation point; VERCP, pulmonary ventilation at respiratory compensation point; fRRCP, respiratory frequency at respiratory compensation point; [La−]bRCP, lactate concentration at respiratory compensation point; VO2max, maximal oxygen uptake; RERmax, maximal respiratory exchange ratio; HRmax, maximal heart rate; VEmax, maximal pulmonary ventilation; fRmax, maximal respiratory frequency; [La−]bmax, maximal lactate concentration; VAT, velocity at anaerobic threshold; VRCP, velocity at respiratory compensation point; Vmax, maximal velocity; VS, protocol starting velocity. Comparisons between subgroups (*p*-values) were obtained with Student's *t*-test for independent samples.
