#### *3.1. Decomposition Methods*

The strategy of decomposing the original problem into many sub-problems is widely used when binary classifiers are applied to multi-class classification problems. The OAO algorithm was used for decomposition herein. The OAO scheme divides the original problem into one binary problem for each pair of classes. Each binary problem is handled by a binary classifier that is responsible for distinguishing between the two classes of its pair, and the outputs of these base classifiers are then combined to predict the final output.

Specifically, the OAO method constructs k(k–1)/2 classifiers [16], where k is the number of classes. Classifier *ij*, denoted *fij*, is trained using all of the patterns from class *i* as positive instances and all of the patterns from class *j* as negative instances; the remaining data points are ignored. The code-matrix in this case has dimensions k × k(k–1)/2, and each column corresponds to the binary classifier of one pair of classes. All classifiers are combined to yield the final output.
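As an illustration, the following is a minimal sketch of OAO training, assuming a scikit-learn-style binary base learner (here `SVC`); the helper name `train_oao` is hypothetical, not the paper's implementation.

```python
# Minimal OAO training sketch: one binary classifier per pair of classes.
from itertools import combinations
import numpy as np
from sklearn.svm import SVC  # assumed base learner; any binary classifier works

def train_oao(X, y):
    """Train k(k-1)/2 pairwise classifiers f_ij for classes i < j."""
    classifiers = {}
    for i, j in combinations(np.unique(y), 2):
        mask = (y == i) | (y == j)        # keep only patterns of classes i and j
        clf = SVC(kernel="rbf")
        clf.fit(X[mask], y[mask] == i)    # class i -> positive, class j -> negative
        classifiers[(i, j)] = clf
    return classifiers
```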

Different methods can be used to combine the obtained classifiers in the OAO scheme. The most common is simple (majority) voting [66]: each pairwise classifier *fij* casts a vote for the class it predicts, and the class that collects the most votes over all k(k–1)/2 classifiers is returned as the final output.
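Under the same assumptions as the training sketch above, a matching sketch of the voting step (the helper name `predict_oao` is again hypothetical):

```python
# Minimal voting sketch: each pairwise classifier votes for one of its two classes.
import numpy as np

def predict_oao(classifiers, X):
    """Return, for each row of X, the class that collects the most votes."""
    classes = sorted({c for pair in classifiers for c in pair})
    column = {c: k for k, c in enumerate(classes)}
    votes = np.zeros((X.shape[0], len(classes)), dtype=int)
    for (i, j), clf in classifiers.items():
        pred = clf.predict(X).astype(bool)   # True -> vote for i, False -> for j
        votes[pred, column[i]] += 1
        votes[~pred, column[j]] += 1
    return np.asarray(classes)[votes.argmax(axis=1)]
```

With `argmax`, ties are broken in favor of the lowest-indexed class; other tie-breaking rules are possible.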

#### *3.2. Optimization in Machine Learning*

#### 3.2.1. Least Squares Support Vector Machine for Classification

The least squares SVM (LSSVM), proposed by Suykens et al. [53], is an enhanced ML technique that replaces the inequality constraints of the standard SVM with equality constraints and a squared-error loss, so that training reduces to solving a linear system. Consequently, the LSSVM has high generalizability and a low computational burden. For function estimation with the LSSVM, given a training dataset $\{x\_k, y\_k\}\_{k=1}^{N}$, the optimization problem is formulated as follows:

$$\min\_{\omega, b, e} J(\omega, e) = \frac{1}{2} \left\| \omega \right\|^2 + \frac{1}{2} C \sum\_{k=1}^{N} e\_k^2 \tag{1}$$

subject to $y\_k = \langle \omega, \phi(x\_k) \rangle + b + e\_k, \quad k = 1, \ldots, N$

where $J(\omega, e)$ is the objective function; $\omega$ is the parameter vector of the linear approximation in the feature space; $\phi(\cdot)$ is the feature map; $e\_k \in \mathbb{R}$ are error variables; $C \geq 0$ is a regularization constant that represents the trade-off between the empirical error and the flatness of the function; $x\_k$ are the input patterns; $y\_k$ are the prediction labels; and $N$ is the sample size.

Equation (2) is the resulting LSSVM model for function estimation.

$$f(\mathbf{x}) = \sum\_{k=1}^{N} \alpha\_k K(\mathbf{x}, \mathbf{x}\_k) + b \tag{2}$$

where $\alpha\_k$ are the Lagrange multipliers; $b$ is the bias term; and $K(\mathbf{x}, \mathbf{x}\_k)$ is the kernel function.

The Gaussian radial basis function (RBF) and the polynomial function are commonly used kernels. The RBF is used more frequently because, unlike the linear kernel, it can classify multi-dimensional, nonlinearly separable data efficiently. Therefore, an RBF kernel is used in this study. Equation (3) defines the RBF kernel.

$$K(\mathbf{x}, \mathbf{x}\_k) = \exp\left( -\left\| \mathbf{x} - \mathbf{x}\_k \right\|^2 / 2\sigma^2 \right) \tag{3}$$
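As a worked example of Equations (1)–(3), the sketch below follows the dual formulation of Suykens et al. [53], in which solving Equation (1) reduces to one linear system in $(b, \alpha)$; the function names are illustrative.

```python
# Minimal NumPy sketch of LSSVM training and prediction with the RBF kernel.
# The dual of Eq. (1) is the linear system
#   [[0, 1^T], [1, Omega + I/C]] [b; alpha] = [0; y],  Omega_kl = K(x_k, x_l),
# and prediction follows Eq. (2).
import numpy as np

def rbf_kernel(A, B, sigma):
    """Eq. (3): K(x, x_k) = exp(-||x - x_k||^2 / (2 sigma^2))."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

def lssvm_train(X, y, C, sigma):
    """Solve the dual linear system for the bias b and multipliers alpha."""
    N = len(y)
    A = np.zeros((N + 1, N + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = rbf_kernel(X, X, sigma) + np.eye(N) / C
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]                    # b, alpha

def lssvm_predict(X_new, X, b, alpha, sigma):
    """Eq. (2): f(x) = sum_k alpha_k K(x, x_k) + b."""
    return rbf_kernel(X_new, X, sigma) @ alpha + b
```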

Although the LSSVM can effectively learn patterns from data, its main shortcoming is that the predictive accuracy of an LSSVM model depends on the settings of its hyperparameters: the regularization parameter $C$ in Equation (1) and the RBF kernel width $\sigma$ in Equation (3). The generalizability of the LSSVM can be increased by determining optimal values of $C$ and $\sigma$. In this investigation, the enhanced FA, an improved stochastic, nature-inspired metaheuristic algorithm, was developed to fine-tune these two hyperparameters.

#### 3.2.2. Enhanced Firefly Algorithm

In this study, the enhanced firefly algorithm is proposed to optimize the LSSVM's hyperparameters. The FA is improved by integrating stochastic agents to enrich both global exploration and local exploitation.
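For orientation, the following is a minimal sketch of the canonical FA update rule (attractiveness decaying with distance plus a random step) applied to the two-dimensional search space $(C, \sigma)$; it does not reproduce the enhanced variant. The `fitness` callback, parameter defaults, and bounds format are illustrative assumptions, e.g., the cross-validated accuracy of an LSSVM trained with the candidate hyperparameters.

```python
# Canonical firefly algorithm sketch for tuning (C, sigma); the paper's
# enhanced variant (stochastic agents) is intentionally not reproduced here.
import numpy as np

def firefly_tune(fitness, bounds, n_fireflies=20, n_iter=50,
                 beta0=1.0, gamma=1.0, alpha=0.2, seed=0):
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(bounds, dtype=float).T   # bounds = [(C_lo, C_hi), (s_lo, s_hi)]
    pos = rng.uniform(lo, hi, size=(n_fireflies, len(lo)))
    light = np.array([fitness(p) for p in pos])  # brightness = fitness to maximize
    for _ in range(n_iter):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if light[j] > light[i]:          # firefly i moves toward brighter j
                    beta = beta0 * np.exp(-gamma * np.sum((pos[i] - pos[j]) ** 2))
                    pos[i] += (beta * (pos[j] - pos[i])
                               + alpha * (rng.random(len(lo)) - 0.5))
                    pos[i] = np.clip(pos[i], lo, hi)
                    light[i] = fitness(pos[i])
    best = int(np.argmax(light))
    return pos[best], light[best]                # best (C, sigma) and its fitness
```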
