*2.6. Statistical Analysis*

Using the above variables and samples, we built a multivariable linear regression model. The general form of this model is:

$$Y_{\text{diagnosis}} = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_q X_q \tag{3}$$

The outcome variable *Y*<sub>diagnosis</sub> encoded whether the participant was a patient with ACLD: positive values correspond to patients with ACLD and negative values correspond to the Control group. In the data used for training, the ACLD group had *Y*<sub>diagnosis</sub> = 1 and the Control group had *Y*<sub>diagnosis</sub> = −1. *X<sub>i</sub>* are the predictive features obtained above and *β<sub>i</sub>* are the linear weighting coefficients of those features (*β*<sub>0</sub> is the intercept). The model was formulated and its predictions were computed in MATLAB.
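As a minimal sketch of Equation (3), the following Python code fits the coefficients by ordinary least squares and classifies a sample by the sign of the predicted outcome, following the sign convention described above. The study itself used MATLAB; the function names and the toy two-feature data here are assumptions made purely for illustration and are not the study's data or code.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left unchanged.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_linear_regression(features, labels):
    """Return [beta_0, beta_1, ..., beta_q] from the normal equations."""
    rows = [[1.0] + list(f) for f in features]  # prepend the intercept column
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, labels)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def predict(beta, x):
    """Evaluate Y = beta_0 + sum_i beta_i * X_i for one sample."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], x))


def classify(beta, x):
    """Positive outcome -> ACLD (+1); otherwise Control (-1)."""
    return 1 if predict(beta, x) > 0 else -1


# Hypothetical toy data: two features per sample, labels coded as +1 / -1.
features = [(0.9, 0.1), (0.8, 0.3), (0.7, 0.2), (0.2, 0.8), (0.1, 0.9), (0.3, 0.7)]
labels = [1, 1, 1, -1, -1, -1]

beta = fit_linear_regression(features, labels)
predicted = [classify(beta, x) for x in features]
```

Coding the two groups as +1 and −1 lets a plain regression output double as a classifier, since only the sign of the prediction is needed to assign a group.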

#### **3. Results**

For these 43 samples (the affected legs of the 25 patients in the ACLD group and both legs of the 9 participants in the Control group), the number of selected features affected the final accuracy. Table 2 shows how the prediction accuracy changed with feature selection; 5-fold cross-validation was used to estimate the predictive ability of the regression model. The second column of Table 2 indicates whether only the composite index was used as features. If not, the first column gives the first three features (knee flexion and the mean values of muscle force during the stance and swing phases) + the number of features retained in the composite index. If so, the first column gives only the number of features retained in the composite index. The composite index produced eight features when 90% of the PCA information content was preserved. When only the composite index was used and all eight of its features were included, the maximum accuracy achieved was 81.4%. The last column gives the *p* values of the *t*-tests for the coefficients of the composite index features in the regression. The smaller the *p* value, the more significant the corresponding feature; *p* < 0.001 indicated very significant findings and is listed as 0.001 in Table 2.

When the first three features were used together with the composite index, the accuracy gradually increased with the number of composite index features. Beyond three composite index features, the accuracy remained the same, and the *p* values of the newly introduced features increased and were not significant. With five composite index features, the *p* value of the last feature was 0.999, indicating that the newly introduced feature added no new information. Overall, the optimal condition was to select six features (the first three features + three composite index features), which gave an accuracy of 81.4% under 5-fold cross-validation. For comparison and validation, using only three composite index features gave an accuracy of 79.1%.
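The 5-fold cross-validation used to estimate accuracy can be sketched as follows. This is an illustrative Python outline, not the authors' MATLAB procedure: the shuffling seed, the fold assignment, the single-feature regression used for brevity, and the synthetic, well-separated data are all assumptions, so the accuracy it produces is not comparable to the values in Table 2.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the sample indices and deal them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_simple_regression(xs, ys):
    """Least-squares intercept and slope for a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def cross_validated_accuracy(xs, ys, k=5, seed=0):
    """Train on k-1 folds, test on the held-out fold, pool the results."""
    correct = 0
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        b0, b1 = fit_simple_regression([xs[i] for i in train],
                                       [ys[i] for i in train])
        for i in held_out:
            predicted = 1 if b0 + b1 * xs[i] > 0 else -1
            if predicted == ys[i]:
                correct += 1
    return correct / len(xs)


# Synthetic stand-in data: 25 samples labelled +1 and 18 labelled -1.
xs = [2.0 + 0.01 * i for i in range(25)] + [-2.0 - 0.01 * i for i in range(18)]
ys = [1] * 25 + [-1] * 18

accuracy = cross_validated_accuracy(xs, ys, k=5)
```

Because every sample is held out exactly once, the pooled accuracy is an estimate of performance on data not seen during fitting, which is why it is preferred over the training accuracy when comparing feature sets.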

Table 3 evaluates the classification ability of the regression model under the optimal condition, using the actual classification results together with the accuracy, precision, recall, specificity, and F1-score. Most of the samples were correctly classified. All evaluation criteria were above 80%, indicating good performance of the regression model.
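The evaluation criteria can be computed from the four confusion-matrix counts as in the sketch below. The counts passed to the function are illustrative assumptions, not values copied from Table 3: they were chosen only so that the totals match the 25 ACLD and 18 Control samples and so that the accuracy, precision, and F1-score round to the 81.4%, 87.0%, and 83.3% reported in this paper.

```python
def classification_metrics(tp, fn, fp, tn):
    """Standard binary classification criteria from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }


# Assumed counts for illustration only (positive class = ACLD group).
metrics = classification_metrics(tp=20, fn=5, fp=3, tn=15)
```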

Finally, multivariable linear regression was performed on all samples, and the resulting model is shown in Table 4. In Table 4, the coefficients of the average muscle force during the swing phase, Composite Index 1, and Composite Index 2 were negative and their absolute values were the largest among all coefficients. The *p* value of Composite Index 1 was less than 0.001, the *p* value of Composite Index 2 was 0.006, and the *p* value of Composite Index 3 was 0.208. The overall *p* value of the regression model was less than 0.001.


**Table 2.** Accuracy changes as the number of features changes.

If the composite index is used as features only, the first column shows 0 + the number of features retained in the composite index. If not, the first column shows three features (knee flexion and mean values of muscle force during the stance/swing phase) + the number of features retained in the composite index. *p* < 0.001 indicates very significant results and is replaced by 0.001 in the last column.

**Table 3.** Evaluation criteria for the classification ability.


TP = true positive, samples are classified as positive (ACLD group) and the judgment is correct. FN = false negative, samples are classified as negative (Control group) and the judgment is wrong. FP = false positive, samples are classified as positive and the judgment is wrong. TN = true negative, samples are classified as negative and the judgment is correct. Accuracy = (TP + TN)/(TP + TN + FP + FN). Precision = TP/(TP + FP). Recall = TP/(TP + FN). Specificity = TN/(TN + FP). F1-score is the harmonic mean of precision and recall. ACLD, anterior cruciate ligament deficiency.

**Table 4.** Multivariable linear regression model of all samples.


<sup>a</sup> *p* value for the *t*-test of each regression coefficient. <sup>b</sup> *p* value for the F-test on the model. Composite index 1–3 represent the first three features of the composite index, respectively. RMSE, root mean squared error.

#### **4. Discussion**

The multivariable linear regression model using the composite index predicted whether participants had ACLD with 81.4% accuracy. Under the optimal condition (Table 3), the data were well classified and the evaluation criteria were greater than 80%. Among them, precision was high (87.0%), meaning that a high proportion of the samples classified into the ACLD group truly belonged to it; this suggests that the model is capable of identifying patients with ACLD. The F1-score was also high (83.3%), indicating a good balance between precision and recall.

As shown in Table 2, when only the composite index was used as features, the best accuracy of 81.4% was achieved by retaining all eight variables. With only one variable of the composite index, the accuracy was still 72.1%. After adding the first three features and retaining three features of the composite index, the best accuracy of 81.4% was again achieved, while the three composite index features alone achieved an accuracy of 79.1%. This suggests that the composite index captured both kinematic and dynamic information. Starting from the optimal condition of six features, adding more composite index features left the accuracy unchanged while the *p* values of the new coefficients approached 1, indicating that the additional features were no longer significant. The composite index therefore carried most of the information in the model, including most of the information in knee flexion and muscle force.

The model must be interpreted with caution (Table 4). The R-squared value of the regression was 0.542, indicating that the model explained 54.2% of the variance in the diagnosis of patients with ACLD [30]. For all samples, the coefficients of the optimal features used for regression had *t*-test *p* values of at most 0.21. For the F-test on the model, *p* < 0.001 indicated that the overall fit was highly significant. The root mean squared error, which estimates the standard deviation of the error distribution, was 0.73, indicating that the model fit well. As shown in Table 4, there was a significant (*p* < 0.001) negative relationship between Composite Index 1 and the diagnostic outcome, with an expected decline of 2.3055 in the final outcome for each one-point increase in the first variable of the composite index. There was also a significant (*p* < 0.05) negative relationship between Composite Index 2 and the outcome, indicating that the composite index, especially its first two variables, played the most important role in the regression model.
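The reported R-squared and root mean squared error can be related to each other with a short calculation. The sketch below assumes that the reported RMSE is the residual standard error, sqrt(SSE/(n − p)), with p = 7 coefficients (six features plus the intercept), and that R-squared is the ordinary, unadjusted value; neither convention is stated explicitly in the text, so this is a consistency check under those assumptions rather than a reproduction of the authors' computation. Under them, the result is about 0.73.

```python
import math


def total_sum_of_squares(labels):
    """Sum of squared deviations of the outcome from its mean (SST)."""
    mean = sum(labels) / len(labels)
    return sum((y - mean) ** 2 for y in labels)


def rmse_from_r_squared(r_squared, labels, n_coefficients):
    """Residual standard error implied by R^2 = 1 - SSE/SST."""
    sst = total_sum_of_squares(labels)
    sse = (1.0 - r_squared) * sst
    dof = len(labels) - n_coefficients  # residual degrees of freedom (assumed)
    return math.sqrt(sse / dof)


# Outcome coding from the paper: 25 ACLD samples (+1), 18 Control samples (-1).
labels = [1] * 25 + [-1] * 18

# Assumption: 6 features + 1 intercept = 7 fitted coefficients.
rmse = rmse_from_r_squared(0.542, labels, n_coefficients=7)
```

Because the outcome takes only the values +1 and −1, a residual standard deviation of this size is large relative to the gap between the two groups, which is one reason the model should be interpreted with caution.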

The composite index was determined by the data at the characteristic points in muscle force and knee flexion (Figure 2). Each muscle selected was associated with the knee joint, which aids in understanding gait pathology and planning treatment using gait analysis and biomechanical models [20,30]. As shown in Figure 2, most of the characteristic points in the thigh muscles were concentrated at the end of the stance phase, which is consistent with previous studies, especially those reporting decreased quadriceps [6,7] and increased hamstring [7,42,43]. In contrast, although tibialis triceps were active during the midstance phase, they had more of an impact on ankle dorsiflexion during this period and thus were not significantly different in patients with ACLD [44]. In this study, the muscle force and knee flexion were normalized. Therefore, in further research or in clinical diagnosis and treatment, the characteristic points in Figure 2 can still be selected directly, even if the muscle force or knee flexion is obtained by a different method.

With further validation, the regression model can be used to aid clinical practice [30]. Table 5 describes the characteristics of two hypothetical subjects from the ACLD and Control groups. These two subjects have feature values close to the means of their respective groups. Their expected final outcomes are 0.6164 and −0.4672, respectively, which clearly classify them into the ACLD group and the Control group. Of all the features, the expected improvements in Composite Index 1 and Composite Index 2 have the most impact on the final outcomes. Notably, the subject values of the ACLD group are all less than 0, while the subject values of the Control group are all greater than 0. With *t*-tests between the two groups on the composite index, we obtained *p* < 0.001 for the first variable of the composite index and *p* < 0.05 for the second variable, supporting the validity of the composite index in the regression model and suggesting that the composite index alone can also serve as an evaluation index, similar to the normalcy index [28] and strength score [29,30].

Some limitations of this study should be noted. First, some patients with ACLD also had meniscus injuries. One study [45] has shown that about 40% to 80% of patients with ACLD have a concurrent meniscal injury. Grouping the data by concurrent injury may help improve accuracy. Second, data quality can be further improved. The compensatory patterns in the knee joint change depending on the time after ACLD. Limited by the available clinical data, the time since injury varied among the patients in this study, and this variation may affect the accuracy of the final results. Third, electromyography (EMG) data can be introduced. EMG can assist in validating the muscle forces obtained from the calculations [46]. In addition, EMG data can be used directly in the calculation of a composite index. Fourth, the results of the composite index need further validation for the assessment of walking.


**Table 5.** Characteristics of two hypothetical subjects from the ACLD and Control groups.

Composite Index 1–3 are respectively the first three features of the composite index. ACLD, anterior cruciate ligament deficiency.

#### **5. Conclusions**

We built a multivariable linear regression model to diagnose patients with ACLD using a composite index that combined knee flexion and muscle forces. This statistical model and composite index can aid clinical diagnosis. The composite index and characteristic points can help avoid complex subjective diagnosis in clinical practice and can be used to extract effective information more quickly and conveniently for diagnosis.

**Author Contributions:** Conceptualization, Q.R.; methodology, H.L.; software, H.L.; validation, Q.R., H.H. and S.R.; formal analysis, H.L.; investigation, H.L.; resources, H.H. and S.R.; data curation, H.H. and S.R.; writing—original draft preparation, H.L.; writing—review and editing, Q.R. and H.L.; visualization, S.R.; supervision, Q.R.; project administration, Q.R.; funding acquisition, Q.R. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the Natural Science Foundation of China (Grant No: 11872074, 82202821, 31900943), Beijing Municipal Natural Science Foundation (Grant No: L222138), Peking University Third Hospital (Grant No: BYSYZHKC2022119, BYSY2022058, BYSYZD2021012).

**Institutional Review Board Statement:** The study was conducted in accordance with the Declaration of Helsinki and was approved by the Ethics Committee of Peking University Third Hospital (IRB00006761-2012010).

**Informed Consent Statement:** Informed consent was obtained from all subjects involved in the study.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


**Disclaimer/Publisher's Note:** The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
