*4.2. Multi-Objective Feature Selection*

We perform feature selection in two steps. In the first step, we exclude non-informative time-domain (TD) and frequency-domain (FD) features, and, in the second step, we use multi-objective optimization (MOO) to further refine the selected feature set. We implement five different MOO methods, including our newly proposed method, to find the optimal set of features. The complete set of features includes the following TD and FD features: *F*1: slope sign change (SSC); *F*2: zero crossing (ZC); *F*3: waveform length (WL); *F*4: variance (VAR); *F*5: mean absolute value (MAV); *F*6 and *F*7: modified MAV (MAV1 and MAV2); *F*8: root mean square (RMS); *F*9: Willison amplitude (WAMP); *F*10: skewness (SK); *F*11: kurtosis (KU); *F*12: median frequency (MDF); *F*13: mean frequency (MNF); *F*14: maximum frequency (MAXF); *F*15 and *F*16: correlation coefficient (COR) and angle (ANG) between two frames of data, respectively; and *F*17: fourth-order auto-regressive coefficients (AR4). We perform preliminary feature selection using able-bodied subjects. We then use MOO for final feature selection with only one able-bodied subject to reduce computational effort. In Section 4.4, we will investigate the performance of the selected features with the other able-bodied subjects and with the amputee subjects. We note that the optimal feature subset may vary from subject to subject. However, in this paper, we are particularly interested in obtaining an optimal feature subset from able-bodied subjects and assessing its performance on amputee subjects, because able-bodied subjects' data are more accessible for UIR training in real-world applications. Our results in Section 4.4 will show no significant difference in UIR performance for amputee subjects between training with the optimal feature subset and training with the full feature set. Future work could compare optimal feature subsets obtained from different individuals.
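Several of the TD features listed above have simple closed forms. The sketch below computes a handful of them from one synthetic signal frame; the threshold values are arbitrary placeholders, not the thresholds used in the paper:

```python
import numpy as np

def td_features(x, wamp_thresh=0.02, zc_thresh=0.0):
    """Compute a few of the time-domain features for one frame x.
    Threshold values here are illustrative only."""
    dx = np.diff(x)
    mav = np.mean(np.abs(x))                     # F5: mean absolute value
    rms = np.sqrt(np.mean(x ** 2))               # F8: root mean square
    var = np.var(x)                              # F4: variance
    wl = np.sum(np.abs(dx))                      # F3: waveform length
    # F2: zero crossings -- sign changes whose step exceeds a threshold
    zc = np.sum((x[:-1] * x[1:] < 0) & (np.abs(dx) > zc_thresh))
    # F1: slope sign changes -- consecutive slope products that are negative
    ssc = np.sum((dx[:-1] * dx[1:]) < 0)
    # F9: Willison amplitude -- consecutive differences above a threshold
    wamp = np.sum(np.abs(dx) > wamp_thresh)
    return {"MAV": mav, "RMS": rms, "VAR": var, "WL": wl,
            "ZC": int(zc), "SSC": int(ssc), "WAMP": int(wamp)}

frame = np.sin(np.linspace(0, 4 * np.pi, 200))   # synthetic frame, two periods
feats = td_features(frame)
```

For a two-period sinusoid, the frame crosses zero three times (at π, 2π, and 3π) and its slope changes sign at the four extrema, which the feature values reflect.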

In the first step, we train LDA, QDA, SVM-Linear, SVM-RBF, and MLP for each of three able-bodied subjects, and separately for each individual feature type listed in the previous paragraph, using 10-fold cross-validation (CV) for each training procedure. The mean classification accuracy over the three subjects and the ten folds is used to assess the importance of each feature type.
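The per-feature screening loop can be sketched as follows. For brevity, a nearest-centroid classifier stands in for the LDA/QDA/SVM/MLP classifiers, and the data are synthetic; the structure (score each single feature type by mean k-fold CV accuracy) mirrors the procedure described above:

```python
import numpy as np

def kfold_accuracy(X, y, k=10, seed=0):
    """Mean accuracy of a nearest-centroid classifier under k-fold CV.
    (A lightweight stand-in for the classifiers used in the paper.)"""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    accs = []
    for fold in np.array_split(idx, k):
        test = np.zeros(len(y), dtype=bool)
        test[fold] = True
        Xtr, ytr, Xte, yte = X[~test], y[~test], X[test], y[test]
        centroids = np.stack([Xtr[ytr == c].mean(axis=0)
                              for c in np.unique(ytr)])
        # Assign each test point to the nearest class centroid
        pred = np.argmin(((Xte[:, None, :] - centroids[None]) ** 2).sum(-1),
                         axis=1)
        accs.append(np.mean(pred == yte))
    return float(np.mean(accs))

# Synthetic two-class data: feature "A" is informative, feature "B" is noise.
rng = np.random.default_rng(1)
y = np.repeat([0, 1], 100)
feat_a = rng.normal(loc=y * 3.0, scale=1.0)[:, None]   # well separated
feat_b = rng.normal(size=(200, 1))                     # uninformative
rank = {"A": kfold_accuracy(feat_a, y), "B": kfold_accuracy(feat_b, y)}
```

Ranking feature types by this mean CV accuracy is what separates informative features like "A" from noise-like features like "B".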

Figure 7 shows the mean classification accuracy and processing ratio for each feature type. *Processing ratio* indicates the relative computational load of each feature type; for instance, the computational load required to compute *F*17, expressed as a percentage of the load required to compute all feature types, is 4.22%. To reduce clutter in the figure, we show LDA results as a representative of QDA and SVM-Linear, since they had similar performance; similarly, we show SVM-RBF as a representative of MLP. Figure 7 shows that the different classifiers are consistent in how they prioritize the various feature types. Figure 7 also indicates that TD features require less computational effort than FD features; for instance, *F*12, *F*13, and *F*14 require high computational effort compared to the other features. We exclude *F*12, *F*13, and *F*14 from the candidate feature set due to their relatively high computational expense and poor classification accuracy. In addition, *F*6 and *F*7, two variants of MAV, are excluded due to their poor classification accuracy and because they provide information that is similar to MAV. Therefore, we exclude a total of five weak feature types and pre-select the remaining 12 feature types. This results in a training vector with 11 × 3 + 4 × 3 = 45 elements; note that the AR4 feature type includes four components and thus contributes a total of 12 elements from the three measurement signals. Finally, since vertical hip position does not cross zero (see Figure 3), the zero crossing (ZC) feature of this signal is excluded. To verify that the eliminated features carry little information, we confirmed that combining the excluded features with the pre-selected features did not significantly enhance classification performance. In summary, we have a training data set with 44 features.
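The element count follows directly from the pre-selection arithmetic: 11 scalar feature types plus the four-component AR4, computed on three measurement signals, minus the ZC feature of vertical hip position. A sketch of the tally:

```python
# Components contributed by each of the 12 pre-selected feature types
# (F12, F13, F14, F6, and F7 have already been excluded).
components = {"SSC": 1, "ZC": 1, "WL": 1, "VAR": 1, "MAV": 1, "RMS": 1,
              "WAMP": 1, "SK": 1, "KU": 1, "COR": 1, "ANG": 1, "AR4": 4}
n_signals = 3          # vertical hip position, thigh angle, thigh moment

n_elements = sum(components.values()) * n_signals   # 15 x 3 = 45
n_elements -= 1        # vertical hip position never crosses zero: drop its ZC
```

This reproduces the 45-element vector and the final 44-feature training set stated above.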

**Figure 7.** Mean classification accuracy of three able-bodied subjects, and processing ratio of 17 feature types trained by LDA using 10-fold cross validation.

Now we are ready to proceed to the feature selection step. In this step, we use vector evaluated BBO (VEBBO), non-dominated sorting BBO (NSBBO), niched Pareto BBO (NPBBO), strength Pareto BBO (SPBBO), and gradient-based multi-objective feature selection (GMOFS) to select an optimal subset of the 44 pre-selected features. To reduce the computational expense, we use only the AB01 training data set in this step. We then verify that the selected subset results in a satisfactory UIR system when trained for other subjects. Table 3 shows the tuning parameters used in this paper. To tune the parameters, we performed a sensitivity analysis of MOO performance with respect to each parameter, one at a time, to find a local optimum of MOO performance for each parameter. For instance, we implemented GMOFS with elastic net parameter values *α* = {0, 0.5, 1}, and found that the Pareto front with *α* = 0 dominates the Pareto fronts found with the other values of *α*. To train the neural network in GMOFS, we used the MATLAB function fmincon from the Optimization Toolbox (R2014a, MathWorks, Natick, MA, USA), which implements a trust-region-reflective algorithm. We mostly used the default values of the fmincon parameters, as we found that the performance of GMOFS is not very sensitive to these parameters.
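The elastic net parameter *α* blends L1 and L2 regularization. A minimal sketch of the standard elastic-net penalty (the exact weighting used inside GMOFS may differ) shows why *α* = 1 corresponds to a pure L1 (lasso) penalty and *α* = 0 to a pure L2 (ridge) penalty:

```python
import numpy as np

def elastic_net(w, lam, alpha):
    """Standard elastic-net regularizer on a weight vector w:
    alpha = 1 gives the pure L1 (sparsity-inducing) term,
    alpha = 0 gives the pure L2 (ridge) term."""
    return lam * (alpha * np.abs(w).sum()
                  + (1 - alpha) * 0.5 * (w ** 2).sum())

w = np.array([1.0, -2.0])
p_lasso = elastic_net(w, lam=1.0, alpha=1.0)   # |1| + |-2| = 3.0
p_ridge = elastic_net(w, lam=1.0, alpha=0.0)   # 0.5 * (1 + 4) = 2.5
```

With *α* = 0 the penalty is smooth everywhere, which is convenient for gradient-based training; the sensitivity analysis above found that this setting produced the dominating Pareto front.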

We run each multi-objective method for 10 independent trials, and the best Pareto front of each method is selected for the MOO comparison. The results show that the GMOFS Pareto front statistically significantly dominates all four multi-objective biogeography-based optimization (MOBBO) Pareto fronts. However, we note that GMOFS and the MOBBO variants use different classifiers for feature selection (MLP and LDA, respectively). To obtain a fair comparison of the new components of GMOFS with the MOBBO variants, we decouple the search strategy from classification performance. Note that LDA, which is used in MOBBO, is one of the most popular classification algorithms and has been widely used with evolutionary algorithms for feature selection due to its good performance and simplicity [44].


**Table 3.** Tuning parameters for multi-objective feature selection.

To conduct a fair comparison, we apply SVM with a linear kernel to all of the optimal feature subsets found by the MOO methods. Figure 8a illustrates the Pareto fronts obtained by the five MOO methods with linear-kernel SVM. Figure 8a shows that the Pareto fronts of VEBBO, SPBBO, NSBBO, and GMOFS are close to one another, and clearly dominate the NPBBO Pareto points. Figure 8b shows the combined Pareto front obtained from all of the non-dominated points in Figure 8a. GMOFS provides the maximum contribution to the combined Pareto front, while NPBBO does not contribute any Pareto points. All of the points in Figure 8b are labeled for easy reference.
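The combined front in Figure 8b is simply the non-dominated subset of the union of all methods' Pareto points. A minimal sketch with both objectives minimized (classification error and number of features; the point values below are illustrative, not the paper's results):

```python
def dominates(a, b):
    """a dominates b if a is no worse in both objectives and a != b
    (weak dominance; both objectives are minimized)."""
    return a[0] <= b[0] and a[1] <= b[1] and a != b

def combined_front(points):
    """Non-dominated subset of the union of all methods' Pareto points."""
    return sorted(p for p in points
                  if not any(dominates(q, p) for q in points))

# Hypothetical (error, n_features) points pooled from several methods.
pts = [(0.10, 5), (0.08, 7), (0.12, 4), (0.08, 6), (0.15, 3), (0.09, 7)]
front = combined_front(pts)
```

Points such as (0.08, 7) drop out because another pooled point, (0.08, 6), achieves the same error with fewer features.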

To systematically compare the Pareto fronts in Figure 8a, we use relative coverage and normalized hypervolume, as discussed in Section 3.3. Tables 4 and 5 provide the comparison results using these two approaches. In Table 4, the entry in column *i* and row *j* (*i* ≠ *j*) indicates the percentage of Pareto points of the method of column *i* that are dominated by at least one Pareto point of the method of row *j*. We see that, on average, only 7.2% of the Pareto points of VEBBO are weakly dominated by at least one Pareto point from the other four MOO methods. Therefore, VEBBO ranks first in terms of relative coverage. GMOFS ranks second and performs better than SPBBO, NSBBO, and NPBBO. In addition, Table 5 shows that VEBBO and GMOFS rank first and second, respectively, in terms of normalized hypervolume, while GMOFS ranks first in terms of the number of Pareto points. These results verify the competitive performance of GMOFS compared to the other four MOO methods.
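Relative coverage and two-dimensional hypervolume can be sketched as follows, assuming both objectives are minimized; the fronts shown are illustrative, not the ones compared in Tables 4 and 5. Normalized hypervolume would further divide the returned area by the total area spanned by the reference point:

```python
def dominated(p, front):
    """True if some point of `front` weakly dominates p (minimization)."""
    return any(q[0] <= p[0] and q[1] <= p[1] and q != p for q in front)

def relative_coverage(a, b):
    """Fraction of front b's points weakly dominated by some point of a."""
    return sum(dominated(p, a) for p in b) / len(b)

def hypervolume_2d(front, ref):
    """Area dominated by `front` and bounded by reference point `ref`
    (both objectives minimized). Sweeps points in ascending objective 1."""
    hv, prev_f2 = 0.0, ref[1]
    for f1, f2 in sorted(front):
        hv += (ref[0] - f1) * (prev_f2 - f2)
        prev_f2 = f2
    return hv

# Illustrative fronts of (error, n_features) pairs.
front_a = [(1.0, 3), (2.0, 2), (3.0, 1)]
front_b = [(1.5, 3), (2.0, 2.5), (4.0, 1)]
rc_ab = relative_coverage(front_a, front_b)     # A covers all of B
hv_a = hypervolume_2d(front_a, ref=(4.0, 4.0))
```

A higher coverage of one front over another, and a larger hypervolume, both indicate a better front; the two metrics can disagree, which is why the paper reports both.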

Most importantly, GMOFS requires the execution of only 43 classifier training procedures (one per *λ* increment), while each of the other four EA-based MOO methods requires 100,000 training procedures (the product of population size and generation limit).
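This efficiency can be illustrated with a regularization-path sweep: one training run per *λ* value traces out the error-versus-sparsity trade-off, rather than evaluating a population over many generations. The sketch below uses a simple coordinate-descent lasso on synthetic linear-regression data as a stand-in for the regularized MLP training in GMOFS; the data, *λ* grid, and dimensions are all illustrative:

```python
import numpy as np

def lasso_cd(X, y, lam, n_sweeps=200):
    """Coordinate-descent lasso: min 0.5*||y - Xw||^2 + lam*||w||_1."""
    n, d = X.shape
    w = np.zeros(d)
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_sweeps):
        for j in range(d):
            r = y - X @ w + X[:, j] * w[j]     # residual with feature j removed
            rho = X[:, j] @ r
            # Soft-thresholding update for coordinate j
            w[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
w_true = np.array([3.0, -2.0, 0.0, 0.0, 0.0])  # only 2 informative features
y = X @ w_true + 0.5 * rng.normal(size=100)

# One training run per lambda; each run yields one error/sparsity candidate.
lams = [0.01, 1.0, 10.0, 100.0, 1000.0]
counts = [int(np.count_nonzero(lasso_cd(X, y, lam))) for lam in lams]
```

Sweeping five *λ* values here costs five training runs; sweeping 43 values costs 43 runs, which is the source of GMOFS's advantage over population-based search.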

**Figure 8.** (**a**) Pareto fronts obtained from MOO methods with an SVM classifier with linear kernels using AB01 training data; (**b**) combined Pareto front obtained from non-dominated Pareto points in (**a**).

**Table 4.** Comparison of Pareto fronts using relative coverage (RC). Only 7.2% and 30% of the VEBBO and GMOFS points, respectively, are dominated by other Pareto points; so VEBBO and GMOFS rank first and second, respectively, in terms of RC.


**Table 5.** Comparison of Pareto fronts using normalized hypervolume. *Np* is the number of Pareto points obtained by each MOO method. VEBBO and GMOFS rank first and second, respectively, in terms of normalized hypervolume, and GMOFS ranks first in terms of the number of points.


The benefit of presenting the data of Figure 8b is that it allows us to find the best subset of features for an accurate and parsimonious classifier. Among the 12 Pareto points, we choose *p*9 as a candidate solution. We could pick any other solution from the Pareto front, depending on the priority of the problem objectives, but *p*9 provides a good trade-off between classification error and number of features. Therefore, in Section 4.4, we will investigate classification performance with candidate solution *p*9 for all human subject data sets AB01, AB02, AB03, AM01, AM02, and AM03. First, however, we will find the best classifier in the following section.

Figure 9 shows that the feature selection frequencies of GMOFS and VEBBO, taken across all Pareto points, are different. However, both methods select the significant features at a high frequency. Five features appear in all of the GMOFS, VEBBO, NSBBO, NPBBO, and SPBBO Pareto points: WL from vertical hip position and thigh angle (features 6 and 7), VAR from thigh moment (feature 11), WAMP from vertical hip position (feature 18), and ANG from thigh moment (feature 32). Therefore, all five feature selection methods favor the most informative features, regardless of their selection criterion and machine learning method. For example, Pareto point *p*8 (obtained by VEBBO combined with the LDA classifier) and candidate solution *p*9 (obtained by GMOFS combined with the MLP classifier) have nine features in common, out of totals of 13 and 14 features, respectively.
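The selection frequencies in Figure 9 are a simple tally of how often each feature index appears across a method's Pareto points. The subsets below are hypothetical, chosen so that their intersection matches the five common feature indices named above:

```python
from collections import Counter

# Hypothetical Pareto-point feature subsets (indices into the 44 features);
# the real subsets come from the GMOFS and MOBBO runs in the paper.
pareto_subsets = [
    {6, 7, 11, 18, 32, 2},
    {6, 7, 11, 18, 32, 40},
    {6, 7, 11, 18, 32, 2, 40},
]
# Selection frequency of each feature index across all Pareto points
freq = Counter(i for s in pareto_subsets for i in s)
# Features present in every Pareto point of the method
always_selected = set.intersection(*pareto_subsets)
```

A feature whose frequency equals the number of Pareto points (like features 6, 7, 11, 18, and 32 in the text) is selected by every trade-off solution, a strong indication that it is informative.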

**Figure 9.** Selection frequency of 44 features by VEBBO and GMOFS. The plots show how many times each feature appears in the Pareto points of the given method. For instance, feature 6 is present in all 10 GMOFS Pareto points.
