• Primary outcome measures

Five studies (Kiviniemi\_2007, Kiviniemi\_2010, Nuuttila\_2017, Schmitt\_2017, and Vesterinen\_2016) reported significant intragroup VO2max improvements in the HRV-guided training group (*n* = 95), while no significant changes were found in Javaloyes\_2019 (*n* = 9). In three of these studies (Kiviniemi\_2010, Nuuttila\_2017, and Vesterinen\_2016), there were also significant intragroup VO2max improvements in the control group (*n* = 47). The overall risk of bias was considered high in every study except Javaloyes\_2019, for which it was considered unclear. A random-effects meta-analysis of the six studies revealed a statistically significant (*p* < 0.0001) treatment effect on VO2max for the HRV-guided training intervention (ES = 0.402; 95% CI = 0.273, 0.531). The training performed by the control group also produced a statistically significant (*p* < 0.0001) improvement in VO2max (ES = 0.215; 95% CI = 0.101, 0.329). However, the ES for VO2max was significantly higher (*p* < 0.0001) in the HRV-guided training group. The heterogeneity observed in the meta-analysis was significant and high in the overall analysis (*p* < 0.0001; I<sup>2</sup> = 94.24%) and for the experimental (*p* < 0.0001; I<sup>2</sup> = 9.36%) and the control group (*p* < 0.0001; I<sup>2</sup> = 92.26%) (Figure 3).

**Figure 3.** Standard differences in means (SDM) between post- and premeasures for VO2max in included studies, segmented by the control group (CG) and heart-rate-variability-guided training group (HRV-G). Squares represent the SDM for each trial; the diamond represents the pooled SDM across trials; weight determines how much each individual study contributes to the pooled estimate; 95% CI, 95% confidence interval.
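To make the pooled ES and I<sup>2</sup> statistics above easier to interpret, the following is a minimal sketch of random-effects pooling. It assumes the DerSimonian-Laird estimator for the between-study variance, which is a common choice but is not stated in this section; the function name `random_effects_pool` and the input values in the usage note are hypothetical and are not the data of the included studies.

```python
import math


def random_effects_pool(effects, variances):
    """Pool study effect sizes with a DerSimonian-Laird random-effects model.

    effects   -- per-study effect sizes (e.g. standardized mean differences)
    variances -- per-study sampling variances of those effect sizes
    Assumption: the DerSimonian-Laird estimator is used for tau^2.
    """
    k = len(effects)
    # Inverse-variance (fixed-effect) weights and pooled estimate.
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w

    # Cochran's Q: weighted squared deviations from the fixed-effect estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1

    # Between-study variance tau^2, truncated at zero.
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects weights add tau^2 to each study's variance.
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))

    # I^2: percentage of total variability attributed to heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    return {
        "pooled": pooled,
        "se": se,
        "ci95": (pooled - 1.96 * se, pooled + 1.96 * se),
        "q": q,
        "tau2": tau2,
        "i2": i2,
    }
```

For example, calling `random_effects_pool([0.2, 0.5, 0.8], [0.01, 0.01, 0.01])` on these made-up inputs gives a pooled ES of 0.5 with an I<sup>2</sup> of roughly 89%, illustrating how widely scattered study effects with small sampling variances produce a high I<sup>2</sup>.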

• Moderator analyses

Owing to the high heterogeneity observed in the meta-analysis, two potential moderators were examined: (a) the athletes' level (elite vs. amateur) and (b) the sex of the participants (men, women, or men and women). We had originally planned to take the intervention duration into account; however, it was not included as a subgroup because only one study used a 15-day intervention period (Schmitt\_2017), while the others conducted an eight-week intervention. The sample size was used for the metaregression. In the moderator analyses (Table 3), VO2max improved significantly (*p* < 0.0001) in both athlete-level subgroups, and the difference between the subgroups was statistically significant (*p* < 0.0001) in favor of the amateur subgroup (elite, ES = 0.17; amateur, ES = 0.36). For sex, there were statistically significant improvements (*p* < 0.0001) in all three subgroups and statistically significant differences (*p* < 0.0001) between them in favor of women (men, ES = 0.33; women, ES = 0.40; men and women, ES = 0.19). The metaregression findings (Figure 4) revealed that the sample size of the studies was inversely related to the ES magnitude (regression coefficient = −0.016; standard error = 0.003; lower limit = −0.023; upper limit = −0.011; Z-value = −5.42; *p* ≤ 0.0001).



Note: SMD, standard mean difference; CI, confidence interval; VO2max, maximal oxygen uptake; I2 = I-squared.

**Figure 4.** Metaregression of the number of participants (sample size) on standard differences in means (Std diff in means).
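As an illustration of how a metaregression slope, its standard error, and the Z-value relate to one another, the sketch below fits an inverse-variance weighted regression of effect size on a single moderator. It assumes a fixed-effect metaregression, which may differ from the model actually fitted for Figure 4; the function name `meta_regression` and the inputs in the usage note are hypothetical and are not the data of the included studies.

```python
import math


def meta_regression(effects, variances, moderator):
    """Inverse-variance weighted regression of effect size on one moderator.

    Assumption: a fixed-effect metaregression (weights = 1 / variance);
    a random-effects version would add tau^2 to each variance.
    """
    w = [1.0 / v for v in variances]
    sum_w = sum(w)

    # Weighted means of the moderator and of the effect sizes.
    x_bar = sum(wi * xi for wi, xi in zip(w, moderator)) / sum_w
    y_bar = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w

    # Weighted sums of squares and cross-products.
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, moderator))
    sxy = sum(
        wi * (xi - x_bar) * (yi - y_bar)
        for wi, xi, yi in zip(w, moderator, effects)
    )

    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    se_slope = math.sqrt(1.0 / sxx)

    # Wald Z-test for the slope, with a two-sided normal p-value.
    z = slope / se_slope
    p = math.erfc(abs(z) / math.sqrt(2.0))

    return {
        "slope": slope,
        "intercept": intercept,
        "se_slope": se_slope,
        "ci95": (slope - 1.96 * se_slope, slope + 1.96 * se_slope),
        "z": z,
        "p": p,
    }
```

With the made-up inputs `meta_regression([0.6, 0.4, 0.2], [0.01, 0.01, 0.01], [10, 20, 30])`, the slope is negative, meaning that the ES shrinks as the sample size grows. The same relationship holds for the reported values: a coefficient of −0.016 divided by a standard error of 0.003 gives approximately −5.3, which is close to the reported Z-value of −5.42 once rounding of the published figures is allowed for.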

#### **5. Discussion**

#### *5.1. Summary of Main Results*

Six RCTs evaluating the effects of an HRV-guided training intervention on endurance athletes were included in this review. The results of the meta-analyses provide some evidence that both HRV-guided training and traditional training may improve the athletes' performance in terms of VO2max (HRV-G: ES = 0.402, *p* < 0.0001; CG: ES = 0.215, *p* < 0.0001). However, more favorable outcomes (*p* < 0.0001) were recorded across the studies for the experimental groups than for the control groups. Moderator analyses indicated larger effect sizes for interventions involving amateur endurance athletes (ES = 0.36, *p* < 0.0001) and women (ES = 0.40, *p* < 0.0001). In addition, the sample size of the studies was inversely related to the ES magnitude (*p* < 0.0001).

#### *5.2. Overall Completeness and Applicability of the Evidence*

The total sample size of the studies meeting our original inclusion criteria was sufficiently large to warrant restricting the results to a meta-analysis of the RCTs. Data on the primary outcome (VO2max) were measured directly using a gas exchange analysis system and a maximal test in each study. This is the most accurate way to obtain cardiorespiratory data. However, some studies implemented this test using a treadmill (Kiviniemi\_2007, Nuuttila\_2017, Schmitt\_2017, and Vesterinen\_2016) and others using a cycle ergometer (Javaloyes\_2019 and Kiviniemi\_2010). In the first case, training was based on running (Kiviniemi\_2007, Nuuttila\_2017, and Vesterinen\_2016) and skiing (Schmitt\_2017), which implies similar technical execution in the test. In the second case, the Javaloyes\_2019 study was carried out on cyclists, whereas the Kiviniemi\_2010 study sample was composed of runners. Statistically significant improvements in VO2max were found in both the Kiviniemi\_2007 and Kiviniemi\_2010 studies. However, the mismatch between the test modality and the training modality may be a source of variability and potential imprecision in the results of the second study. According to the training specificity principle [37], the body's physiological and metabolic responses and training adaptations are specific to the type of exercise and the muscle groups involved. Thus, the evaluation method should be as similar as possible to the training in order to obtain the most reliable results. This needs to be taken into account when interpreting the results.

Despite the intervention durations being quite homogeneous in the included studies (eight weeks for each study apart from Kiviniemi\_2007 and Schmitt\_2017), the total duration of the training process, preparation weeks included, endurance sport modality, and training intensities used for the control group (standard training) were different. There was also a marked heterogeneity in the sample of the included studies: elite (Javaloyes\_2019 and Schmitt\_2017) and amateur (Kiviniemi\_2007, Kiviniemi\_2010, Nuuttila\_2017, and Vesterinen\_2016) participants, or samples comprising only men (Javaloyes\_2019, Kiviniemi\_2007, Kiviniemi\_2010, and Nuuttila\_2017), women (Kiviniemi\_2010), or men and women (Schmitt\_2017 and Vesterinen\_2016). A standardized training protocol should be recommended to ensure the optimal benefits regarding VO2max.

#### *5.3. Quality of the Evidence*

The quality of the evidence from the included studies can be considered unclear. Although every study was a randomized controlled trial, the sequence generation or the allocation concealment was judged to be at risk of bias in half of them. The performance bias was high only in Javaloyes\_2019, while the detection bias was unclear in all the studies because blinding was considered incomplete. Attrition bias was high in Kiviniemi\_2010, Nuuttila\_2017, and Vesterinen\_2016 because of the high rates of loss to follow-up. In addition, the reporting bias was generally unclear owing to the lack of a registered protocol.

#### *5.4. Potential Biases in the Review Process*

Although the systematic nature of the review process followed here decreases the potential for bias, the risk of bias in the review process remains. The greatest risk of bias present in this review was the study selection; specifically, the decision to limit the inclusion criteria to individual endurance sports, thus reducing the number of studies included and causing a potential limitation in the results.
