*2.5. Levels of Evidence*

Three evidence levels [27] were constructed: (1) strong evidence: consistent findings in three or more high-quality studies; (2) moderate evidence: consistent findings in two high-quality studies; and (3) limited evidence: consistent findings in multiple low-quality studies, inconsistent results in multiple high-quality studies, or results based on a single study. The degree of criterion-related validity of the field-based fitness tests will be discussed for those tests for which we found strong or moderate evidence that the test is (or is not) valid. The results of low- or very low-quality studies are reported in Supplementary Material 2.
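The decision rule above can be written as a short function. This is only an illustrative sketch, not part of the review's methodology: the function name, its parameters, and the handling of mixed cases (for example, one high-quality study combined with several low-quality ones) are assumptions made for the example.

```python
def evidence_level(n_high_quality: int, n_low_quality: int, consistent: bool) -> str:
    """Classify the evidence for one field-based test (hypothetical helper).

    n_high_quality: number of high-quality studies on the test
    n_low_quality:  number of low-quality studies on the test
    consistent:     whether the findings of those studies agree
    """
    if n_high_quality < 0 or n_low_quality < 0:
        raise ValueError("study counts must be non-negative")
    if n_high_quality + n_low_quality == 0:
        raise ValueError("at least one study is required")

    # (1) Strong: consistent findings in three or more high-quality studies.
    if consistent and n_high_quality >= 3:
        return "strong"
    # (2) Moderate: consistent findings in two high-quality studies.
    if consistent and n_high_quality == 2:
        return "moderate"
    # (3) Limited: consistent low-quality studies, inconsistent high-quality
    # studies, or a single study. Assumption: every remaining combination
    # is also treated as limited.
    return "limited"


if __name__ == "__main__":
    print(evidence_level(4, 1, True))   # strong
    print(evidence_level(2, 0, True))   # moderate
    print(evidence_level(3, 0, False))  # limited
```

Under this reading, only tests classified as "strong" or "moderate" would be carried forward to the discussion of criterion-related validity.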

#### **3. Results**

The literature search yielded 9202 records, and 27 additional records were identified through other sources (see the PRISMA flowchart in Figure 1). After removing duplicate references (1805 studies) and screening titles and abstracts (7233 studies), we excluded 9038 studies. A total of 191 full-text studies were assessed for eligibility, and 85 studies (including six systematic reviews) were excluded for the reasons indicated in Figure 1.

**Figure 1.** Flow chart of retrieved and selected articles.

Finally, a total of 101 original studies (see Supplementary Table S3) addressed the criterion-related validity of field-based fitness tests in adults aged 19–64 years. The sample size involved 10,632 participants (see Supplementary Table S4). Eighty-six and seventy-eight original studies reported female (*n* = 5539) and male (*n* = 4722) sample proportions, respectively; however, in seven studies, sex was not specified.

A total of four meta-analyses [28–31] and one systematic review [32] were included in the present systematic review (see Supplementary Table S2). The sample size involved 9985 participants with ages ranging from 19 to 64 years (see Supplementary Table S5).
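The study-flow counts reported in this section can be cross-checked with simple arithmetic. The sketch below uses only the figures stated in the text; the variable names are illustrative and do not come from the review.

```python
# Counts taken from the text of the Results section.
database_records = 9202
other_source_records = 27
duplicates_removed = 1805
excluded_on_title_abstract = 7233
full_text_excluded = 85

original_studies = 101
meta_analyses = 4
systematic_reviews = 1

# Records identified in total.
identified = database_records + other_source_records

# Records excluded before full-text assessment.
excluded_before_full_text = duplicates_removed + excluded_on_title_abstract

# Full-text studies assessed for eligibility.
full_text_assessed = identified - excluded_before_full_text

# Studies remaining after full-text exclusions.
included = full_text_assessed - full_text_excluded

print(excluded_before_full_text)  # 9038
print(full_text_assessed)         # 191
print(included)                   # 106

# The included total matches the studies described in the text.
assert included == original_studies + meta_analyses + systematic_reviews
```

The totals agree: 191 full-text studies minus 85 exclusions leaves 106, which equals the 101 original studies plus the four meta-analyses and one systematic review.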
