#### *2.4. Extracting Features*

In this study, principal component analysis (PCA) was adopted as a statistical method for dimensionality reduction. PCA maps the original n-dimensional data onto k new orthogonal dimensions (principal components) chosen to retain as much of the variance as possible. The remaining directions, whose variance is close to zero, can therefore be discarded with only a small loss of information. The flow of PCA is as follows:


Therefore, the data are reduced to k dimensions. To transform a new sample (1 × n) for prediction, the same de-averaging operation is applied first: the centered sample equals the new sample minus the variable mean of *G* from Step 2. The centered sample is then multiplied by PCcoeff, which gives the values of the new sample after the same processing as the original data, and the first k columns are selected as the features of the new sample.
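The fitting and projection steps described above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes *G* is an m × n matrix with participants as rows, obtains the principal directions from a singular value decomposition of the centered data, and uses the hypothetical names `fit_pca`, `project_new_sample`, and `pc_coeff` (the latter standing in for PCcoeff). The signs of individual components are arbitrary and may differ from other PCA implementations.

```python
import numpy as np

def fit_pca(G: np.ndarray):
    """Fit PCA to G (m samples x n variables) via SVD of the centered data.

    Returns the variable means, pc_coeff (columns are principal directions,
    ordered by decreasing variance), the variance of each component, and the
    scores (projections) of G.
    """
    mu = G.mean(axis=0)                    # variable mean of G
    Gc = G - mu                            # de-averaged (centered) data
    _, S, Vt = np.linalg.svd(Gc, full_matrices=False)
    pc_coeff = Vt.T                        # hypothetical stand-in for PCcoeff
    variances = S**2 / (G.shape[0] - 1)    # variance along each component
    scores = Gc @ pc_coeff
    return mu, pc_coeff, variances, scores

def project_new_sample(x: np.ndarray, mu: np.ndarray, pc_coeff: np.ndarray, k: int):
    """Center a new 1 x n sample with the mean of G, project it, keep the first k columns."""
    return ((x - mu) @ pc_coeff)[:k]

# Synthetic example (random data, illustrative only).
rng = np.random.default_rng(0)
G = rng.normal(size=(20, 5))
mu, pc_coeff, var, scores = fit_pca(G)
explained = var / var.sum()                # share of total variance per component
x_new = rng.normal(size=5)
z = project_new_sample(x_new, mu, pc_coeff, k=2)
```

The explained-variance ratios in `explained` indicate how much information the first k components retain; the mean of *G*, not the mean of the new sample, is subtracted so that the new sample is processed exactly like the original data.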

Inspired by the NI index [28] and the strength score [29,30], the first three features used in the calculation were, for each participant, the average muscle force during the stance phase, the average muscle force during the swing phase, and the knee flexion during the swing phase. Specifically, the average force of each of the 13 muscles was first computed over the stance phase and over the swing phase, and PCA was then applied to the averaged muscle forces of each phase to obtain one column variable. PCA was likewise applied to the swing-phase knee flexion data to obtain one column variable. Only one column after PCA (the first principal component) was used for each feature because it carried a sufficiently large share of the information. The stance-phase knee flexion data and the remaining columns after PCA were not used, because their regression coefficients in the subsequent multivariable regression were not significant and did not affect the final accuracy.
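A minimal sketch of this three-feature construction is shown below. It assumes hypothetical arrays `muscle_force` (participants × 13 muscles × gait-cycle points) and `knee_flexion` (participants × gait-cycle points), and a single hypothetical toe-off index `toe_off` that separates stance from swing for every participant; in real gait data this split is usually participant-specific. The helper `first_pc_scores` (also hypothetical) keeps only the first principal component, mirroring the one-column choice described in the text.

```python
import numpy as np

def first_pc_scores(X: np.ndarray) -> np.ndarray:
    """Scores of each row of X (rows = participants) on the first principal component."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[0]                      # one column variable per participant

def gait_features(muscle_force: np.ndarray, knee_flexion: np.ndarray, toe_off: int) -> np.ndarray:
    """Return a (participants, 3) matrix: stance force, swing force, swing knee flexion.

    toe_off is an assumed index: stance = [0, toe_off), swing = [toe_off, end).
    """
    stance_mean = muscle_force[:, :, :toe_off].mean(axis=2)   # (participants, 13)
    swing_mean = muscle_force[:, :, toe_off:].mean(axis=2)    # (participants, 13)
    f1 = first_pc_scores(stance_mean)
    f2 = first_pc_scores(swing_mean)
    f3 = first_pc_scores(knee_flexion[:, toe_off:])           # swing-phase knee curve
    return np.column_stack([f1, f2, f3])

# Synthetic example: 30 participants, 13 muscles, 101 points (0-100% gait cycle).
rng = np.random.default_rng(1)
muscle_force = rng.normal(size=(30, 13, 101))
knee_flexion = rng.normal(size=(30, 101))
features = gait_features(muscle_force, knee_flexion, toe_off=60)  # toe_off is illustrative
```

Each returned column is centered and, by construction, is the projection with the largest variance among all unit-length combinations of its inputs.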

#### *2.5. Composite Index*

In addition to the above features, this study used a composite index built from the knee flexion and muscle force data at characteristic points, defined as the time points in the gait cycle at which the difference between the two groups was most significant. To compare the ACLD and Control groups, two-sample *t*-tests were performed on the knee flexion and muscle force data of all participants at each point of the 0–100% gait cycle. Points with *p* < 0.05 at which the *p* value reached its minimum were selected as characteristic points. The selection method is illustrated in Figure 1 (using the rectus femoris as an example), and the selected characteristic points are shown in Figure 2. The muscle force and knee flexion values at these characteristic points were extracted into a matrix whose rows correspond to participants and whose columns correspond to characteristic points. Finally, PCA was applied to this matrix, and the representative columns were selected as features.
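The characteristic-point selection and the construction of the composite matrix can be sketched as follows. This is an illustration under assumptions, not the study's code: each variable is stored as a hypothetical participants × gait-points array per group, `scipy.stats.ttest_ind` (Student's two-sample *t*-test, equal variances by default) is run at every point, and "minimal *p* value" is interpreted as the minimum within each contiguous run of *p* < 0.05, as suggested by Figure 1. The function names, variable names, and the number of retained components `k` are all hypothetical.

```python
import numpy as np
from scipy import stats

def characteristic_points(acld, control, alpha=0.05):
    """Return (indices, p) for one variable.

    acld, control: arrays of shape (participants_in_group, gait_points).
    One index is kept per contiguous run of p < alpha: the point with minimal p.
    """
    p = stats.ttest_ind(acld, control, axis=0).pvalue   # t-test at every gait point
    sig = p < alpha
    points, i, n = [], 0, len(p)
    while i < n:
        if not sig[i]:
            i += 1
            continue
        j = i
        while j < n and sig[j]:
            j += 1                                      # significant run is [i, j)
        points.append(i + int(np.argmin(p[i:j])))
        i = j
    return points, p

def composite_matrix(curves_acld, curves_control, alpha=0.05):
    """Rows = all participants (ACLD then Control); columns = characteristic points."""
    cols = []
    for name in curves_acld:
        idx, _ = characteristic_points(curves_acld[name], curves_control[name], alpha)
        both = np.vstack([curves_acld[name], curves_control[name]])
        cols.append(both[:, idx])
    return np.hstack(cols)

def pca_scores(M, k=1):
    """Project the centered matrix onto its first k principal components."""
    Mc = M - M.mean(axis=0)
    _, _, Vt = np.linalg.svd(Mc, full_matrices=False)
    return Mc @ Vt[:k].T

# Synthetic example: 15 participants per group, 101 gait-cycle points (0-100%).
rng = np.random.default_rng(0)
acld = {"rectus_femoris": rng.normal(size=(15, 101)),
        "knee_flexion": rng.normal(size=(15, 101))}
control = {name: rng.normal(size=(15, 101)) for name in acld}
acld["rectus_femoris"][:, 20:31] += 2.0   # injected group difference at 20-30%
acld["knee_flexion"][:, 70:81] -= 2.0     # injected group difference at 70-80%

pts_rf, p_rf = characteristic_points(acld["rectus_femoris"], control["rectus_femoris"])
M = composite_matrix(acld, control)
composite = pca_scores(M, k=1)
```

With purely random data some isolated points will also fall below 0.05 by chance, which is why the injected differences are needed for the example to show a clear characteristic point.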

**Figure 1.** Example of *p* values from the comparison between the ACLD and Control groups (muscle force of the rectus femoris). The red dotted line marks *p* = 0.05. The black circle marks the selected characteristic point used to calculate the composite index, i.e., the point with the minimum *p* value within the range of significant difference (*p* < 0.05). ACLD, anterior cruciate ligament deficiency.

**Figure 2.** Filtered characteristic points, i.e., the time points in the gait cycle at which the difference between the ACLD and Control groups was most significant (*p* < 0.05 and minimal *p* value) according to two-sample *t*-tests. Each characteristic point is plotted at its position in the 0–100% gait cycle. ACLD, anterior cruciate ligament deficiency.
