#### Distance and Time-Based Run/Walk Tests

Seventeen high-quality studies examined the criterion-related validity of the distance run/walk or walk tests (see Supplementary Table S4). Four studies showed that the 2 km walk test [36–39], and two studies that the 1.5-mile run/walk test [40,41], were valid for assessing cardiorespiratory fitness (r = 0.80–0.93, all *p* < 0.05). Four studies [42–45] observed that the 1-mile walk test accurately estimated VO2max (r = 0.81–0.88, all *p* < 0.05), while another two studies [46,47] showed that it was not a valid test (r = 0.69, 13.3% E, *p* < 0.05; mean differences ranged from 2.360 to 9.131 mL/kg/min, all *p* < 0.001, respectively). Results for the treadmill jogging test were contradictory: one study [48] found it to have high validity for assessing cardiorespiratory fitness (r = 0.84, both *p* < 0.001), whereas another study [41] found that it was not a valid test (r = 0.50, *p* < 0.05).

Five high-quality studies investigated the criterion-related validity of the time-based run/walk or walk tests (see Supplementary Table S4). These studies showed that the 3 min walk [49], 6 min walk [50–52], and 12 min run/walk [41] tests were valid for assessing cardiorespiratory fitness (r = 0.70–0.95, all *p* < 0.05). Additionally, one original high-quality study reported that the University Montreal test [53] was valid for estimating cardiorespiratory fitness (r = 0.71, *p* < 0.001; mean difference = 0.025 ± 7.445 mL/kg/min, *p* > 0.05).
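The two agreement statistics reported throughout this section, the correlation coefficient (r) and the mean difference ± SD between test-estimated and laboratory-measured VO2max, can be illustrated with a minimal sketch. The function name `validity_summary` and the data are hypothetical and are not taken from any of the reviewed studies, and the sign convention (estimated minus measured) is an assumption.

```python
from math import sqrt
from statistics import mean, stdev


def validity_summary(measured, estimated):
    """Return Pearson r and the mean difference +/- SD (estimated - measured).

    Both inputs are VO2max values in mL/kg/min, one pair per participant.
    """
    if len(measured) != len(estimated) or len(measured) < 3:
        raise ValueError("need at least three paired observations")

    m_bar, e_bar = mean(measured), mean(estimated)
    # Sums of cross-products and squares for the Pearson correlation
    sxy = sum((m - m_bar) * (e - e_bar) for m, e in zip(measured, estimated))
    sxx = sum((m - m_bar) ** 2 for m in measured)
    syy = sum((e - e_bar) ** 2 for e in estimated)
    r = sxy / sqrt(sxx * syy)

    # Systematic bias: average of the paired differences and their sample SD
    diffs = [e - m for m, e in zip(measured, estimated)]
    return {"r": r, "mean_diff": mean(diffs), "sd_diff": stdev(diffs)}


# Illustrative (made-up) values for five participants
measured = [30.0, 35.0, 40.0, 45.0, 50.0]   # criterion, e.g. gas analysis
estimated = [32.0, 34.0, 41.0, 44.0, 52.0]  # field-test prediction
summary = validity_summary(measured, estimated)
print(summary)
```

A high r alone does not rule out systematic over- or underestimation, which is why some studies in this section report a mean difference in addition to, or instead of, a correlation.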

A meta-analysis [30] consisting of 102 studies on adults determined that the criterion-related validity of the distance run/walk field tests for estimating cardiorespiratory fitness ranged from low to high, with the 1.5-mile (rp = 0.80; 95% CI: 0.72–0.80) and 12 min run/walk tests (rp = 0.79; 95% CI: 0.71–0.87) being the best predictors (see Supplementary Table S5).
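As a rough illustration of how a pooled correlation with a 95% confidence interval can be obtained from several studies, the sketch below uses fixed-effect pooling on Fisher z-transformed correlations. This is an assumption made for illustration only: the pooling method used in the meta-analysis [30] is not described here and may differ, and the function name `pooled_correlation` and the input values are hypothetical.

```python
from math import atanh, sqrt, tanh


def pooled_correlation(studies):
    """Pool correlations with a fixed-effect Fisher z approach.

    `studies` is a list of (r, n) pairs: the correlation and the sample
    size of each study. Returns (pooled r, CI lower bound, CI upper bound).
    """
    # The variance of Fisher's z is 1 / (n - 3), so each weight is n - 3
    weights = [n - 3 for _, n in studies]
    z_values = [atanh(r) for r, _ in studies]

    z_bar = sum(w * z for w, z in zip(weights, z_values)) / sum(weights)
    se = 1 / sqrt(sum(weights))

    # Back-transform the pooled z and its 95% limits to the r scale
    return tanh(z_bar), tanh(z_bar - 1.96 * se), tanh(z_bar + 1.96 * se)


# Illustrative (made-up) study results: (r, sample size)
example_studies = [(0.82, 40), (0.75, 60), (0.88, 25)]
r_pooled, ci_low, ci_high = pooled_correlation(example_studies)
print(f"pooled r = {r_pooled:.2f}; 95% CI: {ci_low:.2f} to {ci_high:.2f}")
```

Weighting by sample size means that larger studies pull the pooled estimate towards their own result and narrow the interval.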

#### Twenty-Metre Shuttle Run Test

Nine high-quality studies analysed the criterion-related validity of the 20 m shuttle run test [41,54–58] or modifications of it [55,57,59–61] (see Supplementary Table S4). Four studies [41,55–57] reported that the 20 m shuttle run was a valid test for assessing cardiorespiratory fitness (r = 0.82–0.94, all *p* < 0.05). However, one study [58] concluded that this test was not valid for assessing cardiorespiratory fitness (mean differences ranged from −0.54 ± 6.23 to −2.94 ± 6.55 mL/kg/min, all *p* < 0.01). Two studies [59,60] found that the incremental shuttle walk test was not valid (r = 0.72, 19% E, both *p* < 0.001), while one study [61] found that this test was valid for assessing cardiorespiratory fitness (mean difference = 0.14 ± 9.27 mL/kg/min, *p* > 0.05). Moreover, two studies [55,57] reported that the 20 m square shuttle run test was valid (r = 0.95, both *p* < 0.001).

A meta-analysis [28] that included 24 studies in adults found that the 20 m shuttle run test had moderate-to-high criterion-related validity for estimating VO2max (rp = 0.79–0.94; 95% CI: 0.56–1.00) (see Supplementary Table S5).

#### Step Tests

Eleven high-quality studies analysed the criterion-related validity of the step tests (see Supplementary Table S4). Four studies observed that the Danish step [62], the Queen's College step [63], and the 2 min step [64] tests were not valid for estimating VO2max (r = 0.034–0.72, all *p* < 0.05). However, another eight studies supported the validity of the modified Canadian aerobic fitness [65], 6 min single 15 cm step [66], YMCA step [67–71], Tecumseh step [70], and modified Harvard step [72] tests (r = 0.80–0.91, all *p* < 0.05).

A systematic review [32] comprising 11 studies on adults investigated the criterion-related validity of the step tests (see Supplementary Table S5). Validity measures varied, and a broad range of correlation coefficients was reported across the 11 studies (r = 0.469–0.95; all *p* < 0.005), with conflicting results for most of the step test protocols. The review concluded that the Chester step test was the best predictor for assessing cardiorespiratory fitness.

#### 3.2.2. Muscular Strength

Table 2 summarises the levels of evidence found for the criterion-related validity of muscular strength, flexibility and motor fitness tests.


**Table 2.** Levels of evidence of muscular strength, flexibility and motor fitness tests.

Indicates high validity; moderate validity; low/null validity; inconclusive validity.

#### Maximal Isometric Strength

Four high-quality studies assessed the criterion-related validity of hand maximal isometric strength using handgrip strength tests (see Supplementary Table S4). Three high-quality studies reported that the TKK dynamometer [73–75] was valid (mean differences ranged from −0.20 (*p* > 0.05) to 2.02 kg (*p* < 0.001); r = 0.98, *p* < 0.001). However, three studies showed inconclusive results about the validity of the DynEx dynamometer [73,75,76], and two studies observed that the Jamar dynamometer [73,76] was less accurate than the TKK and DynEx dynamometers for estimating hand maximal isometric strength.

#### Endurance Strength

Four high-quality studies assessed the criterion-related validity of trunk endurance strength (see Supplementary Table S4). Two studies [77,78] suggested that the Biering–Sørensen test was valid (r = 0.84–0.98, *p* < 0.01), whereas another study [79] reported acceptable validity (r = 0.60–0.71, *p* < 0.05). One study showed that the prone bridging test [80] was valid for assessing trunk endurance strength (no mean difference, *p* > 0.05).
