Metaheuristic Firefly Algorithm

Yang (2008) developed the firefly algorithm (FA), which is inspired by the swarm behavior of fireflies [67]. The algorithm is designed to solve global optimization problems; each firefly in the population interacts with the others through its light intensity. The attractiveness of a firefly is proportional to its light intensity and decreases with distance: the farther a firefly is from another, the less attractive it appears to that firefly.

Despite the effectiveness of the conventional FA in solving optimization problems, it often gets stuck in local optima [39]. Randomization is considered an important part of searching for optimal solutions. Therefore, fine-tuning the degree of randomness and balancing the local and global search are critical for the favorable performance of a metaheuristic algorithm.

The performance of the FA is governed by three parameters: *β*, the attractiveness of a firefly; *γ*, the absorption coefficient; and *α*, a trade-off constant that determines the random movements. Hence, this study supplements the basic FA with three metaheuristic components: chaotic maps, adaptive inertia weight (AIW), and Lévy flight. These components not only restore the balance between exploration and exploitation but also increase the probability of escaping from the attraction of local optima.

#### Chaotic Maps: Generating a Variety of Initial Population and Refining Attractive Values

The simplest chaotic mapping operator is the logistic mapping, which creates more diversity than randomly selected baseline populations, and reduces the probability of early convergence [68]. The logistic map is formulated as Equation (4).

$$X_{n+1} = \eta X_n (1 - X_n) \tag{4}$$

where *n* is the number label of a firefly and *Xn* is the logistic chaotic value of the *n*th firefly. In this work, initial populations are generated using the logistic map equation, and parameter *η* is set to 4.0 in all experiments.
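As a minimal sketch of how Equation (4) could seed an initial population, the snippet below iterates the logistic map with *η* = 4.0 and rescales the chaotic values onto the search bounds. The function names, the seed value `x0 = 0.7`, and the choice to fill each firefly's coordinates with consecutive chaotic values are illustrative assumptions, not details given in this work.

```python
def logistic_sequence(x0, length, eta=4.0):
    """Iterate X_{n+1} = eta * X_n * (1 - X_n) and return `length` values."""
    values = []
    x = x0
    for _ in range(length):
        x = eta * x * (1.0 - x)
        values.append(x)
    return values


def chaotic_population(n_fireflies, lower, upper, x0=0.7):
    """Build one position per firefly by rescaling chaotic values in [0, 1].

    Assumption: consecutive chaotic values fill the coordinates of each
    firefly; `x0` is a hypothetical seed that avoids the fixed points of
    the map (0, 0.25, 0.5, 0.75, 1).
    """
    dim = len(lower)
    chaos = logistic_sequence(x0, n_fireflies * dim)
    population = []
    for i in range(n_fireflies):
        row = chaos[i * dim:(i + 1) * dim]
        population.append(
            [lo + c * (hi - lo) for c, lo, hi in zip(row, lower, upper)]
        )
    return population
```

For example, `chaotic_population(5, [-1.0, 0.0], [1.0, 10.0])` would return five two-dimensional positions that lie inside the given bounds.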

Additionally, chaotic maps are used as efficient alternatives to pseudorandom sequences in chaotic systems [69]. A Gauss/mouse map is the best chaotic map for tuning the attractiveness parameter *(β)* of the original FA. Equation (5) describes the Gauss/mouse map that was used in this study.

$$\text{Gauss/mouse map}: \quad \beta_{\text{chaos}}^{t} = \begin{cases} 0 & \beta_{\text{chaos}}^{t-1} = 0 \\ \dfrac{1}{\beta_{\text{chaos}}^{t-1}} \bmod 1 & \text{otherwise} \end{cases} \tag{5}$$

The *β* of a firefly is updated using Equation (6).

$$\beta = \left(\beta_{\text{chaos}}^{t} - \beta_0\right) e^{-\gamma r_{ij}^{2}} + \beta_0 \tag{6}$$

where *β* is the firefly attractiveness; *β*<sup>*t*</sup><sub>chaos</sub> is the *t*th Gauss/mouse chaotic number and *t* is the iteration number; *β*<sub>0</sub> is the attractiveness of the firefly at distance *r* = 0; *r*<sub>*ij*</sub> is the distance between the *i*th firefly and the *j*th firefly; *e* is the base of the natural logarithm; and *γ* is the absorption coefficient.
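The two updates in Equations (5) and (6) can be sketched as follows. This is an illustration under stated assumptions rather than the implementation used in this study: the function names are hypothetical, and the exponent in Equation (6) is taken as −*γr*<sub>*ij*</sub><sup>2</sup>, the form used in the standard FA.

```python
import math


def gauss_mouse(beta_chaos_prev):
    """One step of the Gauss/mouse map, Equation (5)."""
    if beta_chaos_prev == 0.0:
        return 0.0
    return (1.0 / beta_chaos_prev) % 1.0


def attractiveness(beta_chaos, beta0, gamma, r_ij):
    """Attractiveness from Equation (6).

    Assumption: the exponent is -gamma * r_ij**2, as in the standard FA.
    """
    return (beta_chaos - beta0) * math.exp(-gamma * r_ij ** 2) + beta0
```

In a loop over iterations, `gauss_mouse` would be called once per iteration to advance the chaotic number, and `attractiveness` once per pair of fireflies.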

#### Adaptive Inertia Weight: Controlling Global and Local Search Capabilities

In this investigation, the AIW was integrated into the original FA because AIW has critical effects on not only the optimal solution convergence, but also the computation time. A monotonically decreasing function of the inertia weight was used to change the randomization parameter *α* in the conventional FA. The AIW was utilized to adjust the parameter *α* by which the distances between fireflies were reduced to a reasonable range (Equation (7)).

$$
\alpha^{t} = \alpha_0 \theta^{t} \tag{7}
$$

where *α*0 is the initial randomization parameter; *αt* is the randomization parameter in the *t*th generation; *θ* is the randomness reduction constant (0 < *θ* < 1), and *t* is the number of the iteration. The selected value of *θ* in this implementation is 0.9 based on the literature, and *t* ∈ [0, *tmax*], where *tmax* is the maximum number of generations.
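A short sketch of this decay schedule is given below, assuming the randomization parameter in generation *t* equals *α*0 multiplied by *θ* raised to the power *t*. The function name and the value `alpha0 = 0.2` in the usage line are placeholders, since the default settings are not reproduced in this section.

```python
def randomization_schedule(alpha0, theta, t_max):
    """Return [alpha_0 * theta**t for t = 0..t_max], a monotonically
    decreasing sequence when 0 < theta < 1."""
    return [alpha0 * theta ** t for t in range(t_max + 1)]


# Hypothetical usage: alpha0 = 0.2 is a placeholder, theta = 0.9 as in the text.
schedule = randomization_schedule(alpha0=0.2, theta=0.9, t_max=3)
```

With *θ* = 0.9, each generation shrinks the random step scale by 10%, so early generations explore widely and later ones search locally.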

#### Lévy Flight: Increasing Movement and Mimicking Insects

A Lévy flight is a random walk in which the step length follows a Lévy distribution [70]. Equation (8) provides the step length *s* in Mantegna's algorithm.

$$\text{Lévy} \sim s = \frac{u}{\left|v\right|^{1/\tau}} \tag{8}$$

where Lévy denotes a Lévy distribution with index *τ*; *s* is the step length, which follows a power-law distribution; and *u* and *v* are drawn from normal distributions, as follows. New solutions are obtained around the optimal solution using a Lévy walk, which expedites the local search.

$$
u \sim N(0, \sigma_u^2), \quad v \sim N(0, \sigma_v^2) \tag{9}
$$

$$\text{where } \sigma_u = \left\{ \frac{\Gamma(1+\tau)\sin(\pi\tau/2)}{\Gamma[(1+\tau)/2]\,\tau\,2^{(\tau-1)/2}} \right\}^{1/\tau}, \quad \sigma_v = 1 \tag{10}$$

Here, Γ(*t*) is the Gamma function.

$$
\Gamma(t) = \int_0^\infty z^{t-1} e^{-z}\, dz \tag{11}
$$
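The step-length computation in Equations (8) to (11) can be sketched with the Python standard library, as shown below. The sketch assumes the standard form of Mantegna's algorithm, in which the denominator of the scale factor contains the index *τ* multiplied by 2 raised to (*τ* − 1)/2. The function names and the `rng` argument are illustrative.

```python
import math
import random


def mantegna_sigma_u(tau):
    """Scale of u in Mantegna's algorithm (standard form, assumed here)."""
    num = math.gamma(1.0 + tau) * math.sin(math.pi * tau / 2.0)
    den = math.gamma((1.0 + tau) / 2.0) * tau * 2.0 ** ((tau - 1.0) / 2.0)
    return (num / den) ** (1.0 / tau)


def levy_step(tau, rng=random):
    """Draw one step s = u / |v|**(1/tau) with u ~ N(0, sigma_u^2), v ~ N(0, 1)."""
    u = rng.gauss(0.0, mantegna_sigma_u(tau))
    v = rng.gauss(0.0, 1.0)
    while v == 0.0:  # guard against division by zero (practically never hit)
        v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1.0 / tau)
```

Passing a seeded generator, for example `levy_step(1.5, random.Random(0))`, makes the draws reproducible. Most steps are small, but the heavy tail occasionally produces a long jump.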

The aforementioned metaheuristic components supplement the basic FA to improve the effectiveness and efficiency of the optimization process. The movement of the *i*th firefly that is attracted to a brighter *j*th firefly is thus modified as follows:

$$\mathbf{x}_{i}^{t+1} = \mathbf{x}_{i}^{t} + \beta\left(\mathbf{x}_{j}^{t} - \mathbf{x}_{i}^{t}\right) + \alpha^{t}\,\text{sign}[\text{rand} - 0.5] \otimes \text{Lévy} \tag{12}$$
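A minimal sketch of the position update in Equation (12) follows. It assumes that positions are plain lists of coordinates, that a fresh uniform random number is drawn per dimension, and that the Lévy steps are supplied by the caller. The function and argument names are hypothetical.

```python
import random


def move_firefly(x_i, x_j, beta, alpha_t, levy_steps, rng=random):
    """Move firefly i toward a brighter firefly j, per Equation (12).

    x_i, x_j   : current positions (lists of floats, same length)
    beta       : attractiveness between the two fireflies
    alpha_t    : randomization parameter in the current generation
    levy_steps : one Levy-distributed step per dimension (assumed precomputed)
    """
    new_position = []
    for xi, xj, step in zip(x_i, x_j, levy_steps):
        # sign[rand - 0.5] picks a random direction for the Levy step
        sign = 1.0 if rng.random() - 0.5 >= 0.0 else -1.0
        new_position.append(xi + beta * (xj - xi) + alpha_t * sign * step)
    return new_position
```

Setting `alpha_t` to zero removes the random term, which leaves only the attraction toward the brighter firefly.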

Table 1 presents the default settings of the parameters used in the enhanced FA.

**Table 1.** Default settings of parameters of enhanced FA.


#### 3.2.3. Optimized LSSVM Model with Decomposition Scheme

The hybrid model in this work combines the LSSVM with the OAO decomposition scheme to solve multi-level classification problems. The RBF kernel is used in the LSSVM to handle highly nonlinear spaces. To improve accuracy in the solution of multi-class problems, the enhanced FA, that is, the FA supplemented with the three components described above, is used to fine-tune the regularization parameter (*C*) and the kernel parameter (*σ*) of the LSSVM model. Equation (13) is the fitness function of the model, in which the objective function represents the classification accuracy.

$$f(m) = \text{objective\_function}_{\text{validation-data}} \tag{13}$$
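One way to read Equation (13) is as a function that maps a candidate pair of hyperparameters to the classification accuracy on validation data. The sketch below illustrates that reading. It assumes that *m* is the pair (*C*, *σ*) and that `train` is a hypothetical placeholder for an LSSVM training routine that returns a prediction function; neither detail is specified in this section.

```python
def validation_accuracy(predict, x_val, y_val):
    """Fraction of validation samples whose predicted label is correct."""
    correct = sum(1 for x, y in zip(x_val, y_val) if predict(x) == y)
    return correct / len(y_val)


def make_fitness(train, x_train, y_train, x_val, y_val):
    """Build f(m) for the optimizer.

    `train(x_train, y_train, c, sigma)` is a hypothetical stand-in for the
    LSSVM training routine; it must return a callable that maps one sample
    to a predicted class label.
    """
    def fitness(m):
        c, sigma = m  # assumption: m is the pair (C, sigma)
        predict = train(x_train, y_train, c, sigma)
        return validation_accuracy(predict, x_val, y_val)
    return fitness
```

The optimizer would then evaluate `fitness((C, sigma))` for each firefly and treat a higher accuracy as a brighter firefly.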
