1. Introduction
Functional Performance Tests (FPTs) analyze performance aspects, functional abilities, and dysfunctional movement patterns [
1,
2]. They enable the investigation of diverse physiological functions, including flexibility, endurance, strength, balance, coordination, and motor control, in different body regions [
3,
4]. Evidence shows that the functionality and integrity of the lower extremity posterior chain (PC) are essential for sports performance [
5,
6]. The PC comprises the structures of the so-called superficial backline of the lower extremity, namely the myofascial structures of the plantar foot, the calf, the dorsal thigh, and the gluteal area. The Bunkie Test (BT) and the isokinetic measurement of the knee muscles are applied to test PC structures [
7,
8], mainly the hamstrings.
The BT was initially designed to detect imbalances of musculoskeletal chains linked via connective tissue [
7] and aims to assess the structures of the PC, mainly the biceps femoris muscle of the tested leg, the gluteal muscles of both sides, and the contralateral back muscles [
9]. Therefore, in contrast to other FPTs, which mostly assess isolated muscles or muscle groups, the BT accounts for the fact that muscles generally function in chains with their surrounding connective tissues [
10,
11]. The test is easy to understand and conduct; it is neither time- nor cost-intensive, and no special equipment is needed [
9,
12]. The BT is, therefore, frequently applied in daily practice [
1,
13,
14,
15]. Nevertheless, the initial study description lacked precise clarification regarding the standardized procedure for conducting the test [
7]. Hence, existing studies differ in testing procedure, test conduct, and evaluation, which has led to performance results that cannot be compared and to a lack of normative values. Additionally, few studies have further investigated the quality criteria of the test, such as reliability [
1,
3,
9,
13,
14,
15].
In contrast to the BT, isokinetic testing of the lower extremities is an established and effective method, widely considered a valid ‘gold standard’ and well described in prior studies [
16]. Whereas the BT addresses the PC as a whole, the isokinetic measurement typically evaluates the strength balance between the knee extensors and flexors during concentric movement. This standard procedure yields the hamstrings/quadriceps ratio (H/Q ratio), defined as the ratio of the concentric peak muscle forces [
17,
18,
19]. Restoring an appropriate H/Q ratio is often a goal when re-establishing proper muscle balance and stability of the knee joint [
19]. Isokinetic dynamometer measurements are safe, but, in contrast to the BT, they are relatively expensive and require space, time, and expertise [
20,
21,
22], which often makes them unsuitable for daily clinical or sports practice [
23].
Because of their different advantages, both tests are justified in screening and assessment. Although the two tests target different structures (isolated muscles versus the total PC) and modalities (e.g., peak force versus endurance), both are applied, e.g., to detect potential musculoskeletal dysfunctions in patients with self-reported knee pain. Yet, it is unclear whether the tests show similar results.
Therefore, the main goal of the study is to evaluate the diagnostic accuracy of the two tests in patients with self-reported pain in the lower extremity (i.e., knee area) and healthy controls by comparing the sensitivity and specificity of the index test (i.e., BT) with the standard assessment (i.e., isokinetic measurement). Further, to contribute to the existing body of literature, we additionally report reliability measures of the BT from a preceding test trial. We defined the primary hypothesis as H0: The probability of obtaining a positive test result (for the defining criteria, see Methods) is the same for patients with dysfunctions in the lower extremities and healthy participants for both tests. The secondary hypothesis is H0: The investigated index test (i.e., BT) shows similar results in terms of sensitivity and specificity compared with the standard test (i.e., isokinetic measurement, H/Q ratio).
2. Materials and Methods
This cross-sectional study on diagnostic accuracy in an unpaired, between-subjects design [
24,
25] was conducted in accordance with the Declaration of Helsinki and followed all governmental and hygienic guidelines concerning the COVID-19 pandemic. The university’s ethical committee approved the study protocol (209/21 S-KH). The study was registered in the German Clinical Trials Register (DRKS S00024076). All participants provided written, informed consent before testing. To improve the quality of reporting, the study follows the Statement for Reporting Studies of Diagnostic Accuracy (STARD) [
26]. Further, we refer to the studies by Hess et al. [
27] and Sitch, Dekkers, Scholefield, and Takwoingi [
25] for analyzing and reporting the results. As the study did not receive any additional funding, no funders played a role in the design, conduct, or reporting of this study.
2.1. Participants
Patients with self-reported pain in the lower extremities (knee area; for details, see Table A1) and healthy participants (male, female, or diverse; 18–40 years old) were included in this study. Patients and healthy participants were all recreationally active (Table 1) [28, 29]. Participants and patients were excluded if they had current musculoskeletal pain in the shoulder girdle, the neck, or the elbows, or other nonspecific musculoskeletal disorders, e.g., rheumatic disorders. In addition, participants were excluded if they were pregnant, in the nursing period, diagnosed with a neurological disorder, or taking medication that affects perception or proprioception. Participants’ characteristics are shown in Table 1.
For estimating the sample size, we assumed, based on our primary hypothesis, that with α = 0.05 and a power (1 − β) of 0.8, 80% of the patients with pain can be detected correctly with the index test (sensitivity), and 30% of the healthy participants are considered as such (specificity). This would result in a sample size of 19 healthy participants and 19 patients. After adding 10% to account for potential dropouts, our total sample consisted of 42 participants.
2.2. Study Procedure
Patients who met the inclusion criteria and gave informed consent took part in the measurements, which consisted of one 60-min session. The data from the healthy participants were collected in another study (DRKS00027923), in which both evaluated tests were performed, amongst others, and were included in this study only if participants explicitly agreed to this in the informed consent. Study participants were instructed to avoid heavy physical exercise and alcohol consumption for 24 h before the examination. Further, they were asked not to drink caffeine or smoke and to refrain from eating heavy meals two hours before the examination [
30]. Participants and patients warmed up with five minutes of cycling at 80 W at a self-selected cadence. Participants performed the two tests in a random order (coin tossing) with a ten-minute break in between [
31]. For the participants, the respective first leg assessed was randomly allocated as the reported dominant or non-dominant leg; for the patients, it was always the dominant leg first. The dominant leg was determined as the ‘preferred leg to kick a ball with’. All measurements were conducted and supervised by a trained physiotherapist with more than seven years of practical experience who had already performed the test numerous times, for example, during previous studies on the topic [
9,
13]. The examiner was blinded concerning the affected leg/side. All participants were blinded concerning the study’s outcome and did not receive further information concerning the respective testing methods a priori.
2.3. Measurement
2.3.1. Bunkie Test
We instructed and conducted the standardized version of the BT for the so-called posterior power line, which comprises the structures of the PC, based on our prior study [
9] (
Figure 1). For the BT, participants placed their forearms on a mat with the shoulders directly above the elbows and their heels on a box with a height of 20 cm, and both legs straightened. Participants were instructed to continue constant breathing during the test, to avoid breath holding, and to immediately report any feeling of fatigue, burning, cramping, pain, or strain in the muscle. To assess the dominant leg, participants lifted their pelvis to a neutral position, marked with a rubber band, stretched horizontally between two fixed stators (
Figure 1). Then, they raised the non-dominant foot off the box, where the height was visually referenced with a 10-cm box. Performance was timed in seconds with a stopwatch, starting when the contralateral leg was lifted. The test stopped when the participant either reported any sensation of pain or cramping or ended the test due to fatigue. If participants started to deviate from the neutral standardized body position, they were verbally corrected by the examiner and were allowed to adjust the position once for each body part. The test was halted if there were additional deviations from the neutral position or if participants were unable to either adjust to or sustain the neutral position any longer. After a one-minute pause, the testing procedure was repeated for the contralateral leg [
1,
3,
7,
9,
15].
2.3.2. Isokinetic Testing
The isokinetic strength testing (Isomed2000, D & R Ferstl GmbH, Hemau, Germany) for the knee flexor and extensor muscles was performed over a range of motion of 90° (0° to 90° knee flexion; 0° = entirely straight leg) at angular speeds of 60°/s and 120°/s. Participants were positioned on the seat (backrest inclination of 10°) and fixed with straps over the shoulders, across the waist, and over the middle of the thigh to avoid unwanted movement. The axis of rotation of the dynamometer was carefully aligned visually with the knee’s axis of rotation. Before the test session, participants received detailed instructions concerning the individual test and did a familiarization trial for each condition, consisting of five submaximal dynamic contractions. For each velocity, one set of five consecutive maximal voluntary concentric flexion and extension contractions was performed. Between each warm-up and test trial, there was a three-minute break. The angle and torque values (Nm) were captured with proEMG at 1000 Hz (prophysics AG, Kloten, Switzerland) and further processed in Matlab (R2020b, The MathWorks Inc., Natick, MA, USA). The mean peak torque value of the second to fourth repetitions (flexion/extension) was used for further analysis. These values were used to calculate the concentric hamstrings/quadriceps ratio (H/Q ratio) [
17,
32,
33,
34].
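The H/Q ratio calculation described above can be illustrated with a minimal Python sketch. It assumes that "second to fourth repetitions" means repetitions 2, 3, and 4 of the five contractions, and that one peak torque per repetition is already available; the function names and torque values are hypothetical and are not taken from the study's Matlab processing.

```python
from statistics import mean


def mean_peak_torque(peaks_per_repetition):
    """Mean peak torque (Nm) over repetitions 2-4 of a five-repetition set."""
    if len(peaks_per_repetition) != 5:
        raise ValueError("expected one peak torque value per repetition (five in total)")
    # Index 1..3 corresponds to repetitions 2, 3, and 4 (assumed interpretation).
    return mean(peaks_per_repetition[1:4])


def hq_ratio(flexion_peaks, extension_peaks):
    """Concentric H/Q ratio: mean flexor peak torque / mean extensor peak torque."""
    return mean_peak_torque(flexion_peaks) / mean_peak_torque(extension_peaks)


# Hypothetical peak torques (Nm) for one leg at one angular velocity.
flexion = [95.0, 100.0, 102.0, 98.0, 90.0]
extension = [160.0, 170.0, 165.0, 175.0, 150.0]
ratio = hq_ratio(flexion, extension)
print(round(ratio, 3))
```

In this made-up example, the ratio falls just below the cut-off of 0.6 used for the binary classification.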
2.3.3. Statistical Analyses
For statistical analysis, the software R (R version 3.5.1) was used [
35]. All participant characteristics and test data were normally distributed with equal variances (Shapiro–Wilk test and Levene test, respectively). We calculated the mean and standard deviation (SD) and compared the participants’ characteristics between the groups with Student’s t-test. Further, the effect sizes (Cohen’s d) were calculated and interpreted as d (0.01) = very small, d (0.20) = small, d (0.50) = moderate, d (0.80) = large, d (1.20) = very large, and d (2.00) = huge [
36].
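As an illustration of the effect size interpretation, the following Python sketch computes Cohen's d and maps it to the labels listed above. The pooled-standard-deviation form of d is an assumption, since the text does not state which variant was used; the label "negligible" for values below 0.01 and the example data are also hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d(group_a, group_b):
    """Cohen's d for two independent groups, using the pooled SD (assumed variant)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * stdev(group_a) ** 2 +
                  (n_b - 1) * stdev(group_b) ** 2) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


def interpret_d(d):
    """Map |d| to the labels used in the text; thresholds are lower bounds."""
    thresholds = [(2.00, "huge"), (1.20, "very large"), (0.80, "large"),
                  (0.50, "moderate"), (0.20, "small"), (0.01, "very small")]
    for lower_bound, label in thresholds:
        if abs(d) >= lower_bound:
            return label
    return "negligible"  # label for |d| < 0.01 is an assumption, not from the text


# Hypothetical data for two groups.
d = cohens_d([2.0, 4.0, 6.0], [1.0, 3.0, 5.0])
print(d, interpret_d(d))
```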
For diagnostic accuracy, the statistical models proposed by Knottnerus and Muris [
37] were used. Therefore, the main dependent variables are considered binary outcomes. For the BT, one of the primary outcomes is the myofascial performance of the PC, which is measured via performance duration in seconds. For comparability of the tests, we performed a binary classification of the BT, where every outcome with a correct identification of the dysfunctional leg (lower performance value) indicates a positive test result. De Witt and Venter [
7] propose that side differences of ≥4 seconds would indicate malfunctioning of the tested myofascial line and therefore have to be considered as clinically relevant outcomes, which we analyzed separately for patients. For participants, such performance differences between legs (≥4 s) were considered a positive test result [
7]. Similarly, for the isokinetic measurement, an H/Q ratio of ≥0.6 was taken to indicate proper musculoskeletal functioning, as expected in healthy participants, and therefore a negative test result [
17,
33,
34]. If the H/Q ratio of one leg (healthy versus injured for patients and left versus right leg for participants) was ≥0.6 and the other leg < 0.6, this was a positive test result. We further describe and discuss the rationale for the binary classification in the discussion section.
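The classification rules described above can be written down as a short Python sketch. The function names are hypothetical, and the handling of the case in which both legs fall below 0.6 (treated here as not positive) is an assumption, because the text only defines a positive result for one leg at or above and the other below the cut-off.

```python
BT_SIDE_DIFFERENCE_S = 4.0  # side difference in seconds proposed by De Witt and Venter
HQ_CUTOFF = 0.6             # H/Q ratio cut-off used in this study


def bt_positive_by_side_difference(time_leg_a_s, time_leg_b_s):
    """BT rule for healthy participants: positive if legs differ by >= 4 s."""
    return abs(time_leg_a_s - time_leg_b_s) >= BT_SIDE_DIFFERENCE_S


def bt_identifies_affected_leg(affected_leg_s, unaffected_leg_s):
    """BT rule for patients: positive if the affected leg shows the lower holding time."""
    return affected_leg_s < unaffected_leg_s


def hq_positive(ratio_leg_a, ratio_leg_b):
    """H/Q rule: positive if exactly one leg reaches the cut-off of 0.6.

    Assumption: two legs below 0.6 are not counted as positive here.
    """
    return (ratio_leg_a >= HQ_CUTOFF) != (ratio_leg_b >= HQ_CUTOFF)


# Hypothetical measurements.
print(bt_positive_by_side_difference(32.0, 27.0))  # 5 s difference
print(bt_identifies_affected_leg(18.0, 25.0))
print(hq_positive(0.62, 0.55))
```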
The binary classification results of the BT and the isokinetic measurement are shown in a 2 × 2 table. The sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV), as well as their respective 95% confidence intervals (CI), are calculated for both tests with the epiR package of the statistical software R (R version 3.5.1) [35]. The sensitivity is calculated as A/(A+C), the specificity as D/(B+D), the PPV as A/(A+B), and the NPV as D/(C+D), where A indicates a true positive, B a false positive, C a false negative, and D a true negative test result. The sensitivity and specificity are compared with the Chi-Square test with Yates’ continuity correction. A p-value of ≤0.05 is considered statistically significant.
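The four point estimates follow directly from the cell counts of the 2 × 2 table. The Python sketch below applies the formulas given above; it is an illustration only, not the epiR routine used in the study, and it omits the confidence intervals. The cell counts are hypothetical and do not represent the study's results.

```python
def diagnostic_accuracy(a, b, c, d):
    """Point estimates from a 2 x 2 table.

    a: true positives, b: false positives,
    c: false negatives, d: true negatives.
    """
    return {
        "sensitivity": a / (a + c),
        "specificity": d / (b + d),
        "ppv": a / (a + b),
        "npv": d / (c + d),
    }


# Hypothetical counts for 19 patients (a + c) and 19 healthy participants (b + d).
result = diagnostic_accuracy(a=15, b=5, c=4, d=14)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```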
4. Discussion
For the BT, the patients’ results differed significantly between the healthy and injured legs, whereas no differences were detected for the healthy participants. In contrast, neither the participants’ nor the patients’ isokinetic measurement results differed between legs. The results of the Chi-Square tests were only significant for the BT, which means that the probability of obtaining a positive test result is not the same for patients with dysfunctions in the lower extremities and healthy participants. Further, unexpectedly, the results for sensitivity and specificity were better for the index test (BT) compared with the gold standard test (isokinetic measurement).
4.1. Diagnostic Accuracy of the Investigated Tests
As proposed by Villafane, Gobbo, Peranzoni, Naik, Imperio, Cleland, and Negrini [
12], the validity of our study was defined as the tests’ ability to discriminate between patients with knee pain and healthy participants. Diagnostic accuracy allows a classification of the current health status (e.g., impaired vs. healthy) and is defined as ‘the amount of agreement between the results from the index test and those from the reference standard’ [
24,
38]. It is, therefore, highly relevant for the practical applicability of the test as an assessment tool [
24].
4.1.1. Isokinetic Measurement
In our study, we did not find a difference in H/Q ratios between patients and participants or between the healthy and injured legs in patients using a cut-off value of 0.6. For the H/Q ratio, values between 0.52 and 0.67 are considered optimal. Mandroukas et al. [
39] summarize that existing study results for the H/Q ratio vary from 0.5 to 0.83 [
8,
40]. Grygorowicz, Kubacki, Pilis, Gieremek, and Rzepka [
33] provide 0.6 as the normative value at the H/Q ratio for 60°/s, and in their review, Kellis, Sahinis, and Baltzopoulos [
8] stated that the conventional ratio values at 60°/s are around 0.6. They found the cut-off values for the traditional ratio at 60°/s varying from 0.47 to 0.66 [
8]. According to Dauty et al. [41], a cut-off value lower than 0.6 reduced the predictive ability. Although the chosen cut-off point of 0.6 is debatable in light of prior literature [
8,
18,
33,
39,
41,
42,
43,
44], in general, the optimum value for the H/Q ratio depends on angular velocities, meaning the greater the angular velocity, the higher the H/Q ratio value [
33,
40]. Further, for more functional ratios at 60°/s, the reported H/Q ratio is around 0.8 [
8]. Nevertheless, as descriptive test data did not differ greatly between the group of patients and healthy participants, except for the average maximum knee flexion torque at 120°/s, it is unlikely that changes in the chosen cut-off point would have altered the results.
4.1.2. Bunkie Test
De Witt and Venter [
7] report that side differences of ≥4 s are noticeable, meaning an increased injury risk, and should be followed by specific rehabilitation and temporary exclusion from professional sports. Nevertheless, the proposed normative values of prior studies vary. De Witt and Venter [
7] suggest a typical holding time of 20–40 s, with only endurance athletes likely reaching the 40-s mark. In contrast, Brumitt [
1] reports an average value of 40 s for a typical healthy population. These inconclusive reports are the reason why a standardized test version was recommended in a prior study [
9].
4.2. Reliability of the Applied Tests
4.2.1. Isokinetic Measurement
The test-retest reliability of the isokinetic measurement for the lower extremity was reported to be good to excellent in previous studies [
22,
45]. Additionally, Habets et al. [
46] reported good intra-rater reliability. For the H/Q ratio specifically, Mau-Moeller et al. [
47] reported a moderate-to-high intra-session reliability for conventional ratios.
4.2.2. Bunkie Test
In contrast to the isokinetic measurement, there is a lack of studies investigating the reliability of the BT [
9], which is why we investigated the inter-rater reliability in a pre-project (DRKS S00023801). We assessed whether the examiner’s level of experience influences the BT results. Therefore, a physical therapist with ten years of clinical experience and a bachelor’s-level sports science student with no further training (both blinded) rated the BT performance of 20 participants (9 healthy and 11 with current pain in the lower limb or back). The inter-rater reliability (ICC3) was calculated for each tested leg separately and was based on a single-rating, consistency, two-way mixed effects model using the software R (version 3.5.1) [
48,
49,
50]. ICC results were classified as <0.50 poor, 0.50–0.75 moderate, 0.75–0.90 good, and >0.90 excellent [
49]. To further assess whether the examiners identified participants with dysfunctions correctly, we compared the performance results between the legs with the Wilcoxon signed-rank test with continuity correction, as the data were not normally distributed (Shapiro–Wilk test). We included all 20 participants in the final analysis (9 m/11 f; age: 25.8 ± 3.5 years; height: 175.7 ± 8.0 cm; weight: 71.7 ± 12.1 kg). The test results for the left leg (ICC 0.28) and the right leg (ICC 0.14) showed poor inter-rater reliability. The experienced examiner rated the left and right leg significantly differently in patients (
p = 0.022) but not in healthy participants (
p = 0.051). In contrast, the student’s ratings did not differ between the legs for patients (
p = 0.674) and healthy participants (
p = 0.560), from which we conclude that the experienced examiner correctly identified the injured/dysfunctional side. Therefore, we ensured that participants and patients were rated by the same experienced physical therapist in this study during all measurements.
Although the experienced examiner correctly identified the leg with dysfunction in a population of healthy participants and patients, the inexperienced examiner did not. In contrast to the isokinetic measurement, the inter-rater reliability of the BT was poor. Nevertheless, when an experienced examiner conducted the test in this study, performance ratings differed significantly between the healthy and injured legs in patients. In addition, patients with dysfunctions in the lower extremities were more likely than healthy participants to obtain a positive test result, which supports the application of the BT as an additional assessment tool to identify potential dysfunctions in the lower extremities. Further studies should investigate the inter-rater reliability between two examiners with a similar level of experience as well as the intra-rater reliability.
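For readers who want to retrace the reliability estimate, the single-rating, consistency, two-way mixed effects ICC can be obtained from the mean squares of a two-way ANOVA as (MSR − MSE) / (MSR + (k − 1) · MSE), with k raters. The following Python sketch is an illustrative reimplementation under that standard formula, not the R code used in the pre-project; the ratings are hypothetical, and assigning the boundary values 0.75 and 0.90 to "good" is an assumption, as the classification above leaves them ambiguous.

```python
def icc3_1(ratings):
    """ICC(3,1): single rating, consistency, two-way mixed effects.

    ratings: one row per participant, one column per rater.
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between participants
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between raters
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)


def classify_icc(icc):
    """Classification used in the text; boundary handling is an assumption."""
    if icc < 0.50:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc <= 0.90:
        return "good"
    return "excellent"


# Hypothetical holding times (s): the second rater is always 2 s higher,
# which leaves the consistency ICC at 1 despite the systematic offset.
offset_ratings = [[10.0, 12.0], [20.0, 22.0], [30.0, 32.0]]
print(icc3_1(offset_ratings), classify_icc(icc3_1(offset_ratings)))
print(classify_icc(0.28), classify_icc(0.14))  # ICC values reported above
```

The consistency definition ignores a constant offset between raters, which is why the example yields a perfect coefficient; an absolute-agreement ICC would penalize that offset.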
4.3. Comparison of the Two Applied Tests: Testing the Posterior Chain
In this study, we compared two FPTs, which are said to be able to detect injuries and potentially deteriorating motor functions in the area of the PC. Yet, the tests differ in their aim, set-up efforts, and conduct. Therefore, it is interesting to compare the two tests regarding their application as FPTs for the PC. It must be noted that the BT is an isometric holding test, whereas the isokinetic testing in our study was a concentric muscle movement. Further, the BT investigates the total PC, whereas the isokinetic measurement claims to test the knee flexors and extensors’ maximum torque value precisely.
An advantage of the BT over the isokinetic measurement is that it tests various connected PC muscles, which are activated during the test [
9]. Like the isokinetic measurement of the knee flexors, commonly applied isometric trunk holding tests were designed and validated to test only specific muscles (e.g., the Sorensen test for evaluating the back extensor muscles and predicting low back pain [
12,
51,
52,
53], or the prone bridging test for the abdominal muscles [
54]). Nevertheless, current findings that muscles work together in chains linked via the surrounding connective tissue [
10] and co-activation of the PC muscles must be taken into account [
10,
11] (e.g., during the Sorensen test, the gluteal and hamstring muscles are also co-activated [
51]). This leads to the assumption that the BT might capture aspects that the isokinetic measurement does not consider. This could explain the differences between the tests found in our study.
Another promising, recently proposed test is the standing 90:20 isometric posterior chain test, which evaluates the applied force of the total chain on a pressure plate [
55,
56]. Still, both the BT and the standing 90:20 isometric posterior chain test lack standardized assessment concerning the interpretation of the results (i.e., norm values) and comparison studies with well-established performance tests. In addition, we would recommend that further studies investigate other injuries and orthopedic disorders of the PC, e.g., ankle sprain or hamstring strain.
4.4. Limitations
We are aware that this study has several limitations. First, there might be some form of spectrum and selection bias, which occurs when there are, e.g., more advanced cases in the population than in the study sample. Healthy participants might also have had minor prior injuries or myofascial dysfunctions that they did not notice and that would argue against classifying them as ‘healthy’ [37]. Yet, this is closely linked to screening and assessment in daily practice, where there is never a completely clear distinction between healthy persons and persons with, e.g., myofascial imbalances that progress slowly over time.
In our study, we mostly referred to ‘soft’ measures when categorizing patients and healthy participants. Although Knottnerus and Muris [
37] recommend additionally including ‘harder’ investigations (e.g., X-rays) in diagnostic studies, it is common in daily clinical and sports practice that therapists and practitioners cannot refer to such additional diagnostic material. Further, for most non-traumatic orthopedic diseases (e.g., low back pain), such ‘hard investigations’ (e.g., imaging techniques) are not recommended in the first line by clinical guidelines [
57], as symptoms and objective findings often do not match [
58]. By referring to self-reported measures instead (i.e., pain), we additionally addressed observer bias, which depends on physicians’ ability to accurately detect, e.g., a history of an ankle sprain [
37]. For inter- and intra-individual test comparison, when assessing the force of the lower extremities’ PC, it is not only recommended to differentiate between the dominant and the non-dominant leg but also to standardize the test results with regard to the person’s body weight [
59,
60]. As there is a lack of proposed procedures for standardization, at least for the BT, we did not apply this procedure.
We are aware that there is insufficient data concerning the reliability of the BT. We addressed concerns regarding inter-rater reliability by choosing one experienced examiner to evaluate the test. Missing data concerning the intra-rater and inter-measurement reliability of the BT must be considered a limitation when interpreting the results of this study.
In addition, it could have been advantageous to use a range instead of the single value of 0.6 as the cut-off for the H/Q ratio. Yet, we chose a single value of 4 s (no range) as the cut-off for the BT. Therefore, to increase comparability, we also used a single cut-off value (i.e., 0.6) instead of a cut-off range for the isokinetic measurement.
4.5. Clinical Implications
The BT is an FPT that is easily applicable in both routine clinical and sports practice due to its minimal resource requirements and time efficiency. Despite its frequent use, the BT lacks comprehensive, high-quality evaluation studies. In contrast, isokinetic measurement is currently considered the gold standard, providing precise data that aligns with a vast body of prior research. However, its applicability in professional sports settings is hindered by its reliance on expensive equipment and specialized expertise.
This study demonstrates that the BT may present a straightforward, cost-effective, and time-efficient alternative for assessing potential dysfunctions in the lower extremity. Consequently, athletic trainers, physiotherapists, sports scientists, or medical doctors may incorporate this test in screening, evaluating rehabilitation progress, and making decisions regarding the return to sports. It is crucial to emphasize that accurate BT assessment demands expertise. Furthermore, none of the examined tests should be used as stand-alone diagnostic criteria; instead, they should be regarded as additional tools in the overall assessment process.