*3.1. Biogeography-Based Multi-Objective Optimization*

We propose the application of multi-objective biogeography-based optimization (MOBBO) for feature selection of the UIR system. The dimension of the optimization problem is equal to the number of available features (independent variables). Each feature is represented by a binary value where 1 indicates that the feature is used for classification, and 0 indicates otherwise. Therefore, each individual in the MOO algorithm is a binary sequence with length equal to the problem dimension. We evaluate the following two objective functions for all individuals in the population:

$$\begin{aligned} f_1^i &= \text{number of selected features in the } i\text{-th individual}, \\ f_2^i &= \text{average prediction error using } c\text{-fold cross validation}. \end{aligned} \tag{3}$$
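To make the encoding concrete, evaluating the two objectives of Eq. (3) for a single binary individual can be sketched as below. The function name `evaluate` and the stub error function are illustrative only; in the actual system the second objective is the *c*-fold cross-validation error described later.

```python
def evaluate(individual, cv_error):
    """Return the two objective values (f1, f2) for one binary individual.

    individual : list of 0/1 bits, one per available feature
    cv_error   : callable mapping a list of selected feature indices to a
                 prediction error (stands in for c-fold cross validation)
    """
    selected = [j for j, bit in enumerate(individual) if bit == 1]
    f1 = len(selected)       # first objective: number of selected features
    f2 = cv_error(selected)  # second objective: average prediction error
    return f1, f2

# Toy example: 6 available features; a dummy error function that merely
# decreases with the number of selected features.
f1, f2 = evaluate([1, 0, 1, 1, 0, 0], lambda sel: 1.0 / (1 + len(sel)))
# f1 = 3, f2 = 0.25
```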

where *i* = 1, ··· , *N*, and *N* is the population size. We combine BBO with four MOO algorithms [21] to obtain vector evaluated BBO (VEBBO), non-dominated sorting BBO (NSBBO), niched Pareto BBO (NPBBO), and strength Pareto BBO (SPBBO). We apply these MOBBO variants to find the Pareto-optimal set of the feature selection problem. We investigate the performance of each method in a later section.
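All four variants compare individuals by Pareto dominance in the $(f_1, f_2)$ objective space (both objectives minimized). A minimal sketch of the dominance test and the extraction of the non-dominated set follows; the function names are our own, not part of the MOBBO algorithms:

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (minimization):
    a is no worse in every objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(objectives):
    """Indices of the non-dominated vectors in a list of (f1, f2) tuples."""
    return [i for i, a in enumerate(objectives)
            if not any(dominates(b, a) for j, b in enumerate(objectives) if j != i)]

# Toy population of (f1, f2) pairs: (#features, CV error).
front = pareto_front([(2, 0.10), (3, 0.05), (4, 0.05), (2, 0.20)])
# front == [0, 1]: (4, 0.05) is dominated by (3, 0.05),
# and (2, 0.20) is dominated by (2, 0.10).
```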

We use linear discriminant analysis (LDA) to compute $f_2^i$. LDA is widely used with evolutionary algorithms to evaluate the quality of a candidate subset for feature selection [44]. LDA does not require time-consuming iterations to build a model, which is important because EAs require many objective function evaluations to find the solution. In Section 4.2, we will demonstrate that all feature selection approaches presented in this paper are able to find the most significant features, even though they use different selection criteria and machine learning methods. Either classification accuracy or classification error can serve as the quality measure for the second objective; we use the average classification error of *c*-fold cross validation (CV). In *c*-fold CV, we randomly divide the training set into *c* distinct folds and repeat training *c* times; each time, the model is trained on *c* − 1 folds and tested on the remaining fold. The average of the *c* classification errors is used as the quality measure.
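The *c*-fold CV procedure above can be sketched as follows. For self-containment, a nearest-class-mean classifier stands in for LDA here; this substitution is our assumption for illustration, not the classifier used in the paper.

```python
import random
from statistics import mean

def cv_error(X, y, c=5, seed=0):
    """Average misclassification error over c folds (the f2 objective).

    X : list of feature tuples, y : list of class labels.
    A nearest-class-mean classifier stands in for LDA.
    """
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)          # random partition into c folds
    folds = [idx[k::c] for k in range(c)]
    errors = []
    for k in range(c):
        held_out = set(folds[k])
        train = [i for i in idx if i not in held_out]  # the other c-1 folds
        # Fit: compute one centroid per class from the training folds.
        centroids = {}
        for label in set(y[i] for i in train):
            rows = [X[i] for i in train if y[i] == label]
            centroids[label] = [mean(col) for col in zip(*rows)]
        # Test on the held-out fold: predict the nearest centroid's label.
        wrong = sum(
            1 for i in folds[k]
            if min(centroids,
                   key=lambda lb: sum((a - b) ** 2
                                      for a, b in zip(X[i], centroids[lb]))) != y[i]
        )
        errors.append(wrong / len(folds[k]))
    return sum(errors) / c

# Toy example: two well-separated 2-D classes give zero CV error.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
y = [0, 0, 0, 0, 1, 1, 1, 1]
err = cv_error(X, y, c=4)  # err == 0.0
```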
