**Applied (Meta)-Heuristic in Intelligent Systems**

Editor

**Peng-Yeng Yin**

MDPI • Basel • Beijing • Wuhan • Barcelona • Belgrade • Manchester • Tokyo • Cluj • Tianjin

*Editor* Peng-Yeng Yin Ming Chuan University Taiwan

*Editorial Office* MDPI St. Alban-Anlage 66 4052 Basel, Switzerland

This is a reprint of articles from the Special Issue published online in the open access journal *Applied Sciences* (ISSN 2076-3417) (available at: https://www.mdpi.com/journal/applsci/special issues/applied heuristic Intelligent systems).

For citation purposes, cite each article independently as indicated on the article page online and as indicated below:

LastName, A.A.; LastName, B.B.; LastName, C.C. Article Title. *Journal Name* **Year**, *Volume Number*, Page Range.

**ISBN 978-3-0365-5875-2 (Hbk) ISBN 978-3-0365-5876-9 (PDF)**

© 2022 by the authors. Articles in this book are Open Access and distributed under the Creative Commons Attribution (CC BY) license, which allows users to download, copy and build upon published articles, as long as the author and publisher are properly credited, which ensures maximum dissemination and a wider impact of our publications.

The book as a whole is distributed by MDPI under the terms and conditions of the Creative Commons license CC BY-NC-ND.

## **Contents**


## **About the Editor**

#### **Peng-Yeng Yin**

Peng-Yeng Yin is a Professor at the Information Technology and Management Program, Ming Chuan University, Taiwan. He has been a visiting Professor at the University of Maryland, Georgetown University, the University of California, Riverside (UCR), the University of Colorado, and Osaka University. He was a Distinguished Professor of the Department of Information Management, National Chi Nan University, Taiwan, in 2003 and 2021, and the Dean of the Information College at China University of Technology in 2022. Dr. Yin is the Editor-in-Chief of the *International Journal of Applied Metaheuristic Computing* and has been on the Editorial Board of the *Journal of Computer Information Systems*, *Applied Mathematics & Information Sciences*, *Mathematical Problems in Engineering*, *Mathematical Biosciences and Engineering*, *Applied Sciences*, *Symmetry*, *Sustainability*, and the *International Journal of Advanced Robotic Systems*. His research interests include artificial intelligence, metaheuristics, and operations research.

### *Editorial* **Applying Modern Meta-Heuristics in Intelligent Systems**

**Peng-Yeng Yin**

Information Technology and Management Program, Ming Chuan University, No. 5 De Ming Rd., Gui Shan District, Taoyuan City 333, Taiwan; pyyin@mail.mcu.edu.tw

#### **1. Introduction**

Engineering and business problems are increasingly impenetrable due to the new economics triggered by big data, artificial intelligence, and the Internet of things. Exact algorithms and heuristics are not sufficient to conquer such extremely large and unstructured problems. Metaheuristic algorithms emerge as prevailing methods in this context. A generic metaheuristic framework guides the course of search trajectories beyond the local optimality, thus overcoming the impairment of traditional computation methods. The application of modern metaheuristics has a large coverage, ranging from unmanned aerial and ground surface vehicles, unmanned factories, resource-constrained production, and humanoids to green logistics, renewable energy, the circular economy, technology agriculture, environmental protection, finance technology, and the entertainment industry. The aim of this Special Issue is to collect new proposals for marrying modern metaheuristics and intelligent systems. The manuscript submission to this Special Issue was closed on 31 August 2022; we received 22 submissions, of which 8 papers were published, an acceptance rate of 36%. I believe the publication of this Special Issue will position itself at the research frontier across many aspects of applied sciences.

#### **2. Modern Metaheuristics and Its Applications**

In previous decades, we saw many novel metaheuristic algorithms including the genetic algorithm (GA), particle swarm optimization (PSO), ant colony optimization (ACO), differential evolution (DE), simulated annealing (SA), adaptive memory programming (AMP), and tabu search (TS). Recently, two interesting branches of metaheuristics have absorbed researchers' attention. Both of the two modern metaheuristics come from a marriage between two different disciplines, which can be easily recognized by their compound names: matheuristics and hyperheuristics. Matheuristics embeds mathematics into a metaheuristic framework, or vice versa. The purpose is to reach an elaborate mechanism melding together the effectiveness of mathematics and the efficiency of metaheuristics. The hyperheuristics aims to construct an automatic heuristic-selection machine, taking advantage of many lower-level heuristics which have been proposed in academia and industry.

This Special Issue presents two papers contemplating new matheuristics. The first paper, authored by P. Yin, P. Chen, Y. Wei, and R. Day, proposes a matheuristic approach embedding several memory programming strategies from AMP into the metaheuristic framework of the firefly algorithm [1]. The useful strategies include multiple guiding solutions, pattern search, multi-start, swarm rebuilding, and the objective landscape analysis. The second paper, authored by E. Cuevas, H. Becerra, H. Escobar, A. Luque-Chang, M. Pérez, H. Eid, and M. Jiménez, proposes matheuristic search schemes with the trajectory courses assisted by the second-order systems [2]. The second-order systems have different temporal responses depending on the set values of the parameters. Such temporally varying responses can be embedded into a metaheuristic to facilitate the search patterns adapted to complex landscapes. One hyperheuristics paper was chosen for this Special Issue. The paper, authored by X. Sánchez-Díaz, J. Ortiz-Bayliss, I. Amaya, J. Cruz-Duarte, S. Conant-Pablos, and H. Terashima-Marín, develops a feature-independent hyperheuristic

**Citation:** Yin, P.-Y. Applying Modern Meta-Heuristics in Intelligent Systems. *Appl. Sci.* **2022**, *12*, 9746. https://doi.org/10.3390/ app12199746

Received: 23 September 2022 Accepted: 26 September 2022 Published: 28 September 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

approach for solving the knapsack problem. The proposed hyperheuristics does not rely on problem features to map the problem states into suitable existing heuristics [3]. Instead, a fixed sequence of existing heuristics is defined to improve the problem-solving performance within the hyperheuristic framework.

We also collected papers for classic applications of modern metaheuristics, namely vehicle routing and wireless networking. A paper authored by S. Nucamendi-Guillén, D. Flores-Díaz, E. Olivares-Benitez, and A. Mendoza presents a memetic algorithm for the cumulative capacitated vehicle routing problem [4]. The proposed method is a biobjective optimization scheme that minimizes the total latency and total tardiness of the vehicle routing simultaneously. A mixed-integer program formulation is proposed for the cumulative capacitated vehicle routing problem. As compared to commercial software which optimally solves the problem with a small size, the proposed memetic algorithm can solve a larger-sized problem and produce the efficient bi-objective Pareto fronts. The paper by H. Rico-Garcia, J. Sanchez-Romero, A. Jimeno-Morenilla, and H. Migallon-Gomis proposes a parallel metaheuristic approach to reduce the vehicle traveling time in smart cities [5]. A Compute Unified Device Architecture (CUDA)-based implementation of the Teacher–Learner-Based Optimization (TLBO) metaheuristic is presented to target the shortest routing path for visiting a large number of points in a city. For applications in wireless networking, the paper by M. Chiang and W. Su presents a load balancing scheme for multithreaded applications on NUMA systems. When an imbalance occurs in the load on NUMA multiple cores, the load balancing mechanism of the kernel scheduler should migrate threads between NUMA cores. Threads to be migrated are selected considering the distribution of threads on nodes for inter-node load balancing [6]. The paper, authored by H. Zhang, J. Yang, T. Qin, Y. Fan, Z. Li, and W. Wei, develops a multi-strategy improved sparrow search algorithm (ISSA) for solving the node localization problem in heterogeneous wireless sensor networks (HWSNs). ISSA is an advanced version of the traditional SSA for improving the convergence speed and accuracy. This is accomplished by adopting the PSO's individual best for solution guiding and a Gaussian disturbance for preventing falling at local optima. The experimental results show that the ISSA yields a smaller average localization error than that of other metaheuristics [7].

Future challenges in metaheuristic applications are addressed in the final paper. The paper, authored by C. Huang, W. Liang, P. Chen, and Y. Chan, uses a dual matrix to identify the opinion leaders and followers in order to reach a converged consensus on the web [8]. The effectiveness of the proposed method has been validated by a case study of extracting leading opinions on green energy and low carbon for helping make effective public policies on environmental protection.

**Funding:** This research received no external funding.

**Acknowledgments:** I would like to thank all the contributing authors for their innovative and well managed works. This Special Issue would not be possible without their enthusiasm for charting advanced applications of metaheuristics in intelligent systems. My thanks are also given to the anonymous reviewers for their unselfish and professional reviews, which have helped shape the standard of this Special Issue.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **Cyber Firefly Algorithm Based on Adaptive Memory Programming for Global Optimization**

**Peng-Yeng Yin 1,2,\*, Po-Yen Chen 1, Ying-Chieh Wei <sup>2</sup> and Rong-Fuh Day <sup>1</sup>**


Received: 12 October 2020; Accepted: 14 December 2020; Published: 15 December 2020

**Abstract:** Recently, two evolutionary algorithms (EAs), the glowworm swarm optimization (GSO) and the firefly algorithm (FA), have been proposed. The two algorithms were inspired by the bioluminescence process that enables the light-mediated swarming behavior for mating or foraging. From our literature survey, we are convinced with much evidence that the EAs can be more effective if appropriate responsive strategies contained in the adaptive memory programming (AMP) domain are considered in the execution. This paper contemplates this line and proposes the Cyber Firefly Algorithm (CFA), which integrates key elements of the GSO and the FA and further proliferates the advantages by featuring the AMP-responsive strategies including multiple guiding solutions, pattern search, multi-start search, swarm rebuilding, and the objective landscape analysis. The robustness of the CFA has been compared against the GSO, FA, and several state-of-the-art metaheuristic methods. The experimental result based on intensive statistical analyses showed that the CFA performs better than the other algorithms for global optimization of benchmark functions.

**Keywords:** adaptive memory programming; firefly algorithm; global optimization; glowworm swarm optimization; metaheuristics

#### **1. Introduction**

Many challenging problems in modern engineering and business domains challenge the design of satisfactory algorithms. Traditionally, researchers resort to either mathematical programming approaches or heuristic algorithms. However, mathematical programming approaches are plagued by the curse of problem size and the heuristic algorithms have no guarantees to near-optimal solutions. Recently, metaheuristic approaches have come as an alternative between the two extreme approaches. The metaheuristic approaches can be classified into two classes, evolution-based and memory-based algorithms. The evolutionary algorithms (EAs) iteratively improve solution quality by decent operations inspired by nature metaphors, creating several novel algorithms, such as genetic algorithms, artificial immune systems, ant colony optimization, and particle swarm optimization. The memory-based metaheuristic approaches guide the search course to go beyond the local optimality by taking full advantage of adaptive memory manipulations. Typical renowned methods in this class include tabu search, path relinking, scatter search, variable neighborhood search, and greedy randomized adaptive search procedures (GRASP).

The metaheuristic approaches contained in the two classes have developed rather independently, and only a few works investigate the possible synergy between them [1]. Talbi and Bachelet [2] proposes the COSEARCH approach which combines the tabu search and a genetic algorithm for solving the quadratic assignment problem. Shen et al. [3] proposes an approach called HPSOTS which enables the particle swarm optimization to circumvent local optima by restraining the particle

movement based on the use of tabu memory. A hybrid of the ant colony optimization and the GRASP is proposed by Marinakis et al. [4] for cluster analysis. However, the two metaheuristics are separately used to tackle the feature selection and clustering problems, respectively. Fuksz and Pop [5] proposes a hybrid genetic algorithm (GA) with variable neighborhood search (VNS) to the number partitioning problem. Their hybrid GA-VNS runs the GA as the main algorithm and the VNS procedure for improving individuals within the population. Yin et al. [6] introduces the cyber swarm algorithm which gives more substance to the particle swarm optimization (PSO) by incorporating the adaptive memory programming (AMP) strategies introduced in the scatter search and path-relinking (SS/PR) template. The adjective "cyber" emphasizes the connection between the evolutionary swarm metaheuristics and the AMP metaheuristics. The cyber swarm algorithm outperforms several state-of-the-art metaheuristics on complex benchmark functions. The experimental results of the above-noted works disclose a promising research area that the marrying of the approaches from the two classes of metaheuristics can create significant benefit that is hardly obtained by the approaches from each class alone.

More recently, two evolution-based algorithms [7,8], namely the glowworm swarm optimization (GSO) and the firefly algorithm (FA), were proposed. The two algorithms were inspired by the bioluminescence process that enables the light-mediated swarming behavior for mating or foraging. The intensity of the light and the distance between the light source and the observer determine the attractiveness degree that causes the moving maneuver. This form of metaphor can be used to develop a swarm-based optimization algorithm where a swarm of glowworms/fireflies represent a set of candidate solutions. The light intensity is evaluated by the objective value of the light source and the distance between the glowworms/fireflies implicitly defines the eligible neighbors since the light observed by an agent decays with the distance. The glowworms/fireflies are attracted by visible light sources and fly towards them. Therefore, the solutions represented by the glowworms/fireflies improve their objective value through the swarming behavior. Several improvements of FA have been proposed. Yang proposed the LFA [9] by combining his original FA with Levy flight. Yu et al. employed the variable step size strategy to create the VESSFA [10] variant. The dynamic adaptive inertia weight was adopted in WFA [11] to use the short-term memory of previous moving velocity. Kaveh et al. developed CLFA [12] which applies chaos theory and logical mapping to determine the optimal FA parameters. The Tidal Force formula was used in FAtidal [13] to improve the balance between exploitation and exploration search behaviors. The most recent improvement was GDAFA [14] which uses global-oriented positional update and dynamically adjusts the step size and attractiveness to avoid being trapped in local optima. The DsFA [15] employed dynamic step change strategy to balance the global and local search capabilities, such that the search course is not likely trapped in local optimum.

From a long-term perspective of metaheuristic development, we anticipate that the GSO and the FA can be made more effective by incorporating the notions from the memory-based approaches, as validated in many previous attempts. Based on the prevailing framework of the cyber swarm algorithm [6] that integrates key elements of the two types of metaheuristic methods, we propose the Cyber Firefly Algorithm (CFA) to proliferate the advantages of the original form of the GSO and the FA. The CFA conceptions include the employment of multiple guiding fireflies, the embedding of the pattern search, firefly swarm rebuilding by the multi-start search, and the responsive strategies based on objective landscape analysis. The robustness of the CFA has been compared against the GSO, FA, and several state-of-the-art metaheuristic methods. The result as demonstrated in our statistical analyses and comprehensive experiments showed the CFA manifests a more effective form of GSO and FA. Most FA variants intend to improve the position update mechanism such as by using Levy flight [9], variable step size strategy [10], adaptive inertia weight [11], and dynamically adjusting strategy [14]. Our CFA differs with these variants by facilitating a more intelligent step size control mechanism which performs landscape analysis in the objective space to adaptively tune the step size according to the profiles of the incumbent fitness landscape.

The novelty of this paper stems at creating a more effective form of GSO and FA approaches by bridging the advantages of evolution-based and memory-based metaheuristics. In particular, this paper investigates whether the CSA template is viable for improving GSO and FA as CSA has been shown in [6] for improving PSO. Our experimental results show that the proposed CFA prevails GSO, FA, and several state-of-the-art metaheuristics on benchmark datasets. This justifies the generalization capability of the CSA template and one can follow the template to create an effective version of interested metaheuristics.

The remainder of this paper is organized as follows. Section 2 presents a literature review and Section 3 proposes the CFA and the employed features. Section 4 presents the results of intensive experiments. Finally, conclusions and future research possibilities are given in Section 5.

#### **2. Related Work Materials and Methods**

#### *2.1. Glowworm Swarm Optimization (GSO)*

Krishnanand and Ghose [7] developed the GSO algorithm. It is assumed that each glowworm has a luciferin level and a local visibility range. The luciferin level of a glowworm determines its light intensity, and the local visibility range identifies the neighboring glowworms which are visible to it. The glowworm probabilistically chooses a neighbor which has a higher luciferin level than itself and is flying towards this neighbor due to light attraction. The local visibility range is dynamic for maintaining an ideal number of neighbors. The GSO simulates the glowworm behaviors and consists of three phases as depicted as follows.

The luciferin update phase evaluates the luciferin level of every glowworm according to the decay of its luminescence and the merit of its new position after performing the movement within the evolution cycle *t*. The luciferin level of glowworm *i* is updated by the following equation.

$$l\_{i}^{t+1} \leftarrow (1 - \rho)l\_{i}^{t} + \tau f\_{i}^{t+1} \tag{1}$$

where ρ is the decay ratio of the glowworm's luminescence and τ is an enhancement constant. The first term is the persistence substance of luminescence due to decay with time, and the second term is the additive luminescence as a function of *f t*+1 *<sup>i</sup>* which indicates the objective value measured at the glowworm's new position (here, without loss of generality, we assume the objective function is to be maximized).

In the movement phase of each evolution cycle, every glowworm in the swarm must perform a movement by flying towards a neighbor which has a higher luciferin level than the incumbent glowworm and is located within the local visibility neighborhood defined by a radius *r<sup>t</sup> i* . The probability with which glowworm *i* is attracted to a brighter glowworm *j* at evolution cycle *t* is given by:

$$p\_{ij} = \frac{l\_j^t - l\_i^t}{\sum\_{k \in N\_i^t} l\_k^t - l\_i^t} \tag{2}$$

where *N<sup>t</sup> <sup>i</sup>* is the set of glowworms within the visibility neighborhood of glowworm *i* at evolution cycle *t*. Once selecting a neighbor, say glowworm *j*, the current glowworm *i* performs a movement to update its position as follows.

$$\mathbf{x}\_{i}^{t+1} \leftarrow \mathbf{x}\_{i}^{t} + s \left( \frac{\mathbf{x}\_{j}^{t} - \mathbf{x}\_{i}^{t}}{\|\mathbf{x}\_{j}^{t} - \mathbf{x}\_{i}^{t}\|} \right) \tag{3}$$

where *s* is the movement distance and • indicates the length of the referred vector. Precisely speaking, glowworm *i* moves in *s* units of distance towards glowworm *j*.

The visibility range update phase dynamically tunes the visibility radius of each glowworm to maintain an ideal number of neighbors, *N*∗ . So, the current number of neighbors, *Nt i* , is compared to *N*<sup>∗</sup> and the visibility radius *rt <sup>i</sup>* is tuned by the following equation.

$$r\_{i}^{t+1} \leftarrow \min \left\{ r\_{\max \prime} \; \max \left\{ 0, r\_{i}^{t} + \eta \left( N^{\*} - N\_{i}^{t} \right) \right\} \right\} \tag{4}$$

where *r*max is the maximum visibility radius and η is a scaling parameter for tuning *N<sup>t</sup> i* . Therefore, the value of *r<sup>t</sup> <sup>i</sup>* is increased if *<sup>N</sup><sup>t</sup> <sup>i</sup>* < *N*<sup>∗</sup> , and it is decreased if *N<sup>t</sup> <sup>i</sup>* > *N*<sup>∗</sup> . The feasible range of *rt <sup>i</sup>* is bounded between [0, *r*max]. The phenomenon of *r<sup>t</sup> <sup>i</sup>* = 0 indicates many glowworms are resorting to the position of the current glowworm, while *rt <sup>i</sup>* = *rmax* discloses the situation that the current glowworm is in a large distance to most of the other glowworms.

#### *2.2. Firefly Algorithm (FA)*

Yang [8] introduced the FA. The FA explores the solution space with a population of fireflies. Each firefly has luminescence of flashing light which attracts its mates in an inverse multiplication of the squared distance and the light absorption. By using the metaphor, a firefly (representing a candidate solution) can improve its light intensity (a merit function of the objective value) by flying toward a more attractive firefly. In particular, the attractiveness of firefly *j* observed by firefly *i* is defined as follows.

$$
\beta\_r = \beta\_0 \mathfrak{e}^{-\gamma r^2} \tag{5}
$$

where γ is the light absorption coefficient, *r* is the Euclidean distance between the two fireflies, and β<sup>0</sup> is the light intensity of firefly *j*.

The movement of firefly *i* is attracted to a more attractive firefly *j* by the following equation,

$$\mathbf{x}\_{i}^{t+1} \leftarrow \mathbf{x}\_{i}^{t} + \beta\_{r} (\mathbf{x}\_{j}^{t} - \mathbf{x}\_{i}^{t}) + \alpha \varepsilon\_{i} \tag{6}$$

where the second term is due to the light attraction and the third term is a random perturbation with α being the randomization parameter and ε*<sup>i</sup>* is a vector of small random numbers drawn from a Gaussian distribution or uniform distribution. After the movement, the light intensity of firefly *i* should be re-evaluated and the relative attractiveness of any other flies to firefly *i* is also re-calculated. The FA conducts a maximum number of evolution cycles and within each cycle every pair of fireflies should be examined for possible movement due to attraction and randomization.

#### *2.3. Adaptive Memory Programming (AMP)*

The AMP comprises a broader spectrum than its more popularly accepted branch, the tabu search [16]. In what follows, we focus our discussion on the use of the SS/PR template [17] which we found very effective in creating benefits for marrying with evolutionary-based metaheuristics.

#### 2.3.1. Scatter Search (SS)

The scatter search (SS) [18] operates on a common reference set consisting of diverse and elite solutions observed throughout the evolution history. The SS method dynamically updates the reference set and systematically selects subsets of the reference set to generate new solutions. These new solutions are improved until local optima are obtained and the reference set is updated by comparing its current members to these local optima. Some important features of the SS are as follows. (1) A diversification-generation method is designed to identify the under-explored region in the solution space such that a set of diverse trial solutions can be produced. (2) An improvement method is used to enhance the quality of the solutions under keeping. This process usually involves a local search operation which brings the trial solution to a local optimum. (3) A reference set update method is adopted to make sure that the reference set is maintaining a set of high quality and mutually diverse solutions observed overall in search history. (4) The multi-start search strategy iteratively restarts a new search session when the current search loses it efficacy. To lead the search course towards under-explored solution space, the multi-start strategy works with a rebuilt reference set produced by the diversification-generation method. (5) Interactions between multiple reference solutions are systematically contemplated. The simplest implementation is to generate all 2-element subsets consisting of exactly two reference solutions. Various search courses are conducted between and beyond the selected reference solutions.

#### 2.3.2. Path Relinking (PR)

There is a common hypothesis accepted by most of the metaheuristics that elite solutions often lie on trajectories from good solutions to other good solutions. In a broader sense, the crossover of chromosomes, the sociocognition learning of particles, and the pheromone trail searching are all effective operators following the noted hypothesis. PR therefore creates a search path between elite solutions. An initiating solution and a guiding solution are selected from the repository of elite solutions. PR then transforms the initiating solution into the guiding solution by generating a succession of moves that introduce attributes from the guiding solution into the initiating solution. The relinked path can go beyond the guiding solution to extend the search trajectory. PR works in the neighborhood space instead of the Euclidean space and variable neighborhoods are usually considered in performing successive moves. Therefore, PR is well fitted for use as a diversification strategy.

#### **3. Proposed Methods**

Our proposed CFA synergizes the strength from three domains, namely the GSO, the FA, and the AMP, to create a more effective global optimization metaheuristic algorithm. We articulate the new features of the CFA as follows.

#### *3.1. Multiple Guiding Points*

Both GSO and FA algorithms conduct the firefly movement by referring to a guiding point, which is a nearby elite firefly with a higher fitness than that of the moving firefly. This mechanism of using a single guiding point raises the risk of misleading due to the selection of a false peak. Our proposed CFA algorithm instead employs a two-guiding-point mechanism to enhance the exploration capability of the algorithm. More precisely, the neighboring fireflies of the incumbent firefly *i* are identified by reference to the current value of the visibility radius, *rt i* , where *t* indicates the index of the evolution iteration. Among the neighbors, the elite fireflies with a higher fitness than that of the incumbent firefly are eligible for guiding-point selection. We employ the rank selection strategy to select two guiding points from the eligible neighbors. According to our preliminary experiments, the rank selection strategy is more robust against prickly fitness landscape than the roulette-wheel selection strategy because the former works in the ranking value space instead of the fitness space.

#### *3.2. Luciferin-Proportional Movement*

In GSO and FA, the luciferin of firefly is used only for the determination of the single guiding point. As we deploy the two-guiding-point mechanism, the relative significance of contribution from each of the two guiding points can be further elaborated. We propose to convert the luciferin of each guiding point to the weighting value for individual contribution. Given two guiding points *xt <sup>j</sup>* and *<sup>x</sup><sup>t</sup> k* observed at iteration *t*, their weighting values ω<sup>1</sup> and ω<sup>2</sup> are set as follows.

$$\omega\_1 = \frac{l\_j^t}{l\_j^t + l\_k^t} \tag{7}$$

$$
\omega\_2 = \frac{l\_k^t}{l\_j^t + l\_k^t} = 1 - \omega\_1 \tag{8}
$$

We then propose the firefly movement formula as follows.

$$\mathbf{x}\_{i}^{t+1} \leftarrow \mathbf{x}\_{i}^{t} + \varphi\_{1} \omega\_{1} \beta\_{\tau\_{j}} \left(\mathbf{x}\_{j}^{t} - \mathbf{x}\_{i}^{t}\right) + \varphi\_{2} \omega\_{2} \beta\_{\tau\_{k}} \left(\mathbf{x}\_{k}^{t} - \mathbf{x}\_{i}^{t}\right) \tag{9}$$

where ϕ<sup>1</sup> and ϕ<sup>2</sup> are random values drawn from a uniform distribution U(*lb*, *ub*). Hence, firefly *i* is drawn by firefly *j* and firefly *k* with driving force relative to respective luciferin level. Parameters *lb* and *ub* control the feasible range of the step size for every firefly movement and their values are dynamically tuned in accordance with the landscape analysis as will be noted.

#### *3.3. Adaptive Local Search*

Local search is a rudimentary component contained in most of the successful metaheuristics such as memetic algorithms [19], scatter search [18], and GRASP [20], to name a few. Our CFA employs the pattern search method [21,22] which iteratively performs a two-step trajectory search. The explorative search step tentatively looks for an improving neighbor while the pattern search step aggressively expands the improving move by doubling the execution of the move. The proposed CFA embodies the pattern search method as a local search but performs it in an adaptive way. The local search is performed on all fireflies only after consuming every *t*<sup>1</sup> fitness function evaluations. To leverage the balance between search efficiency and effectiveness, we adaptively vary the frequency parameter (*t*1) for the performed local search. The adaptive local search strategy is based on the analysis of the landscape observed in the objective space as will be noted.

#### *3.4. Multi-Start with PR for Firefly Swarm Rebuilding*

Multi-start is a mechanism for reinitiating the search with an under-explored region when the trajectory search loses its efficacy. The multi-start search strategy thus works with two functions. The function for *critical event detection* emits a signal upon the moment when the search stagnation behavior is observed. The *diversification-generation* function identifies an under-explored region in the solution space to generate a starting solution for the new trajectory search. Our CFA approach facilitates the critical event detection by monitoring the number of enduring iterations since the last improvement of the best-so-far firefly. If the number of no-improvement iterations for the best-so-far firefly has exceeded a parameter *t*2, a signal for detection of a critical event is returned. The parameter *t*<sup>2</sup> is also made adaptive by the landscape analysis approach.

The diversification-generation function of the CFA approach is implemented by using the path-relinking technique, which has been found very useful in identifying a promising solution within an under-explored region and has been adopted as a diversification strategy such as in Yin et al. [6]. When a critical event signal is activated, a ratio δ of the swarm fireflies are rebuilt and each new firefly is placed on the best solution along a relinked path connecting a random solution and the best-so-far firefly. The path-relinking technique creates the path by dividing the subspace between the two end points of the path into *n* equal-size hyper-grids where *n* is the number of decision variables for the addressed optimization problem. A solution is sampled within each hyper-grid, so in total *n* solutions will be obtained along the relinked path. The best of the *n* sampled solutions is designated as the rebuilt firefly.

#### *3.5. Responsive Strategies Based on Landscape Analysis*

The previous features of CFA can be more effective by incorporating the notion of responsive strategy which adaptively tunes the search strategy when observing the status transitions. The CFA adapts the strategy parameters when the search encounters the transition between the smooth and prickly landscapes manifested by the objective function. The measure of fitness distance correlation (FDC) proposed by Jones and Forrest [23] has been proved useful for judging the suitability of a fitness landscape for various search algorithms. The FDC measures the correlation between the solution cost

and the distance to the closest global optimum. Let the set of observed cost-distance pairs be {(*c*1, *d*1), ... , (*cm*, *dm*)}, the correlation coefficient is defined as

$$\text{FDC} = \frac{\frac{1}{m} \sum\_{i=1}^{m} \left(c\_i - \bar{c}\right) \left(d\_i - \bar{d}\right)}{\sigma\_c \sigma\_d} \tag{10}$$

where <sup>−</sup> *<sup>c</sup>* and <sup>−</sup> *d* are the mean cost and the mean distance, and σ*<sup>c</sup>* and σ*d*. are the standard deviations of the costs and distances. Hence, a significant FDC value has the implication that on average, the better the solution quality the closer this solution is to an optimal solution. It can be contemplated that a single-modal objective function would impose a high FDC value in contrast to a multi-modal objective function whose FDC value is closer to zero because of the irrelevance between the fitness and the distance.

Most existing research adopted the FDC technique as an off-line analysis for realizing the characteristics of the fitness landscape, while our CFA exploits the fitness landscape analysis as an online responsive strategy. We periodically conduct the fitness landscape analysis for every 1000*n* fitness function evaluations, where *n* is the number of problem variables. All the visited solutions ever identified as local optima in the trajectory of a firefly within this time period are eligible for the FDC value computation because these local optima are representative solutions produced in the evolution. As the global optimum is unknown, the best solution obtained at the end of this period is considered to be the best-known solution. The FDC value is computed by using these local optima and the best solution identified in the current time period of previous 1000*n* fitness function evaluations. Consequently, our CFA performs the adaptive landscape analysis which can predict the objective landscape within the current region explored by the firefly swarm and activate appropriate responsive strategies to make the search more effective. If the derived FDC value for the current time period is greater than a threshold (FDC > *h*1), the local objective landscape is considered to be asymptotical single-modal, and we perform two responsive strategies as follows: (1) increasing the distance, on average, of the firefly movement by *lb* = *lb*/λ and *ub* = *ub*/λ (0 < λ < 1); (2) increasing the elapsed iterations for checking the feasibility of executing the local search and multi-start search by *t*<sup>1</sup> = *t*1/λ and *t*<sup>2</sup> = *t*2/λ. The implication of the two responsive strategies is that when the local objective landscape is single-modal, the search may be more effective if the firefly movement distance is greater and the frequency for conducting the local search and the multi-start search is lower. On contrary, If the absolute FDC value for the current time period is less than a threshold (|FDC| < *h*2), the local objective landscape is considered to be asymptotical multi-modal, and we perform two responsive strategies as follows: (1) decreasing the distance, on average, of the firefly movement by *lb* = *lb* × λ and *ub* = *ub* × λ; (2) decreasing the elapsed iterations for checking the feasibility of executing the local search and multi-start search by *t*<sup>1</sup> = *t*<sup>1</sup> × λ and *t*<sup>2</sup> = *t*<sup>2</sup> × λ.

#### *3.6. Pseudo-Code of CFA*

The pseudo-code of our CFA is elaborated in Figure 1. The new features of the CFA are emphasized in boldface for comprehensive descriptions.


**Figure 1.** Pseudo-code of the CFA algorithm.

#### **4. Experimental Results and Discussions**

We have conducted intensive experiments and statistical tests to compare the performance of the proposed CFA algorithm and its counterparts. The experimental results disclose several interesting outcomes in addition to establishing the effectiveness of the proposed method. The platform for conducting the experiments is a PC with an Intel Core i5CPU and 8.0 GB RAM. All programs are coded in C# language.

#### *4.1. Benchmark Test Functions and Algorithm Parameter Settings*

We have chosen two benchmark datasets. (1) The standard benchmark dataset contains 23 test functions that are widely used in the nonlinear global optimization literature [6,18,24,25]. The detailed function formulas can be found in the relevant literature and they have a wide variety of different landscapes and present a significant challenge to optimization methods. The number of variables, domain, and optimal value of these benchmark test functions are listed in Table 1. Performance evaluation on this dataset is reported in terms of the mean best function value obtained from 30 repetitive runs. For each run, the tested algorithm can perform 160,000 function evaluations (FE) for ensuring that the tested algorithm has very likely converged. (2) The IEEE CEC 2005 benchmark dataset which is designed for unconstrained real-parameter optimization. We selected the most challenging functions and compared our CFA with the best methods reported in [26] under the same evaluation criteria described in the original paper. The implementation of the benchmark functions from both datasets is available in public library [27] in some programming languages such as MATLAB and Java. As we used C# for all the experiments, we modified the public codes to C# and embedded them into our main program.


**Table 1.** The number of variables, domain, and optimal value of the benchmark functions.

To obtain the parameter settings of the CFA, GSO, and FA, various values for their parameters were tested with the standard benchmark dataset and the values that resulted in the best mean function value were used as the parameter settings as tabulated in Table 2. The parameter settings used by other compared metaheuristics will be presented in Sections 4.3.2 and 4.3.3 where the comparative experiments are presented.


**Table 2.** Parameter settings of the CFA, GSO, and FA.

#### *4.2. Analysis of CFA Strategies*

To understand the influence on performance of using various CFA strategies, we conduct experiments on the benchmark functions with several CFA variants as in the following subsections.

#### 4.2.1. Selection Strategy for Multiple Guiding Solutions

In contrast to GSO and FA, the CFA selects multiple guiding solutions for performing the firefly movement. The firefly can circumvent the false peaks by relaxing the limitation that constrains the firefly to move toward the single best solution within neighborhood. To investigate the influence on performance of using various selection strategies for guiding solutions, we compare the variants of the CFA that employs the roulette-wheel selection, the tournament selection, and the rank selection, respectively. The comparative performance of the CFA variants is listed in Table 3. It is seen that for the test functions with fewer than 10 variables and the relatively simple functions (Sphere and Zakharov), all of the CFA variants work well and the mean best value obtained is very near the optimal value. For the harder and larger functions (Rosenbrock, Rastrigin, and Griewank with 10 or more variables), the CFA with the rank selection strategy finds a more effective solution for most of the functions than the CFA with the other two selection strategies. The rank selection outperforms the other selection strategies in tackling harder problems because it eliminates the fitness-scaling problem by working in the rank-value space instead of in the function-value space such that the firefly will not be severely misled by false but higher peaks.


**Table 3.** The mean best function value and the standard deviation obtained by using various selection strategies.

#### 4.2.2. Adaptive Strategy for Local Search

Local search is a rudimentary component contained in most of the modern metaheuristic algorithms. The local search procedure exploits the regional function profiles and makes the master metaheuristic algorithm more effective than the original form which does not embed this procedure. However, the local search procedure can be computationally expensive if it is performed within each iteration of

the master metaheuristic algorithm. As previously noted, the CFA applies the adaptive local search strategy which dynamically varies the frequency of the local search execution according to the result of the landscape analysis. Table 4 tabulates the mean best function value and the standard deviation obtained by CFA with or without adaptive local search. We observe that for all the benchmark test functions the CFA with adaptive local search obtains better or equivalent mean function value than its counterpart without adaptive local search.

**Table 4.** The mean best function value and the standard deviation obtained by CFA with or without adaptive local search.


#### 4.2.3. Multi-Start Strategy for Swarm Rebuilding

Multi-start is a diversification strategy which terminates an ineffective trajectory search and reinitiates a new one from an under-explored region. Our CFA applies the adaptive multi-start strategy which monitors the performance of the incumbent firefly swarm. If the program stops improving the performance for a threshold number of consecutive iterations, part of the swarm is rebuilt by positioning some fireflies on diversified locations using the path-relinking technique. The performance stagnation threshold is made adaptive according to the result of the landscape analysis such that the frequency of the multi-start activation depends on the regional function profiles under exploration. It can be seen from Table 5 that the multi-start strategy effectively improves the mean best function value and the standard deviation obtained by CFA.

#### 4.2.4. Responsive Strategies Based on Landscape Analysis

As noted in Section 3.5, the responsive strategies employed by the CFA are made adaptive based on the analysis of function landscape. The landscape analysis makes distinctions of two classic function forms: single-modal and multi-modal. The responsive strategies then vary the movement step size and the frequency of the local search and the multi-start activations to make the CFA search more effective. Table 6 lists the comparative performance of the CFA with or without performing the landscape analysis. It is noted that the version of the CFA without adaptive landscape analysis still embeds the local

search and the multi-start procedures as its rudimentary components, which, however, are executed with fixed parameter values. From the tabulated result, we conclude that the landscape analysis can proliferate the performance gains possibly obtained by the local search and the multi-start strategies.


**Table 5.** The mean best function value and the standard deviation obtained by CFA with or without multi-start strategy.

#### 4.2.5. Worst-Case Analysis

As the proposed CFA is a stochastic optimization algorithm, every single execution of the program may produce a different solution. It thus becomes very important to measure the solution variation of the worst case obtained by the CFA. We conduct the worst-case analysis on the three most challenging functions, namely Rosenbrock (30), Rastrigin (30), and Griewank (30) as follows. One thousand independent runs of the CFA program are performed. We record the worst function value (fitness) that could be obtained by allowing different number of repetitive runs in the experiment. It is seen from Figure 2 that the Rosenbrock (30) fitness value of less than 5 can be obtained with 99.7% confidence because three out of the one thousand runs report a fitness value greater than 5. It also indicates the worst fitness value we could obtain is no more than 5 if we can run the CFA program at least three times. Actually, there are two major components with the fitness distribution. One is close to 4 (a local optimum) with about 20% of the distribution, the other is near zero (the global optimum) with about 79% confidence. The worst-case analysis with Rastrigin (30) is illustrated in Figure 3. We observe a Gaussian-like distribution and the major component falls in the fitness value between 0 and 12. The Gaussian-like distribution provides a reliable guarantee that the mean performance value can be obtained with high confidence. Figure 4 shows the worst-case analysis for Griewank (30). The distribution has a long tail, and the major component concentrates at the high quality part. This situation manifests good properties that there is only one major outcome which is near the global optimum, and that a near-optimal solution can be obtained with a few runs in the worst case.


**Table 6.** The mean best function value and the standard deviation obtained by CFA with or without landscape analysis.

**Figure 2.** Worst-case analysis with the Rosenbrock (30) function. (**a**) The worst fitness obtained by the CFA program as the number of repetitive runs increases. (**b**) The number of program runs with which each performance level is reached.

**Figure 3.** Worst-case analysis with the Rastrigin (30) function. (**a**) The worst fitness obtained by the CFA program as the number of repetitive runs increases. (**b**) The number of program runs with which each performance level is reached.

**Figure 4.** Worst-case analysis with the Griewank (30) function. (**a**) The worst fitness obtained by the CFA program as the number of repetitive runs increases. (**b**) The number of program runs with which each performance level is reached.

#### *4.3. Performance Evaluation*

In this section, the performance of the CFA is evaluated in three-fold. First, the CFA is compared against its counterparts, the GSO and the FA. Secondly, the CFA is compared to other kinds of metaheuristics. Thirdly, the performance of CFA is further justified on the CEC 2005 dataset. For the first two experiments, all compared algorithms were executed 30 times for each test function. For each run, the program can perform 160,000 FEs. However, for the third experiment, the evaluation criteria in the original paper are respected where the competing method is executed in 25 independent runs and in each run the method is evaluated at different numbers of consumed FEs. To compare the performance between two competing algorithms, we employ the performance index defined by Yin et al. [6] as follows. Given two competing algorithms, p and q, the performance merit of p against q on a test function is defined by the formula,

$$\text{Merrit(p, q)} = (\mathbf{f}\_{\text{p}} - \mathbf{f}^{\star} + \varepsilon) / (\mathbf{f}\_{\text{q}} - \mathbf{f}^{\star} + \varepsilon) \tag{11}$$

where <sup>ε</sup> is a small constant equal to 5 <sup>×</sup> 10<sup>−</sup>7, fp and fq are the mean best function values obtained by the competing algorithms p and q, and f\* is the global minimum of the test function. As all the test functions involve minimization, we realize p outperforms q if Merit(p, q) < 1.0, p is inferior to q if Merit(p, q) > 1.0, and p and q perform equally well if Merit(p, q) = 1.0.

#### 4.3.1. Comparison with CFA Counterparts

Our CFA enhances the GSO and the FA by incorporating the responsive strategies from the adaptive memory programming domain. Thus, it is important to validate our idea by comparing the performance of the CFA against its counterparts, the GSO and the FA. The results shown in the second to the fifth columns of Table 7 are the mean best function values obtained by the competing algorithms, and those in the last three columns correspond to the relative merit values. We observe that for the test functions with fewer than ten variables, the CFA has a unit merit or a merit value less than one with one order of magnitude, indicating that all the competing algorithms perform nearly equally well. For the test functions with ten or more variables, the merit value of the CFA in relation to the GSO and the FA becomes significantly less, implying a superior performance in favor of the CFA. The product of the merit values gives an overview of the comparative performance on all test function. It is seen at the bottom of Table 7 that the CFA has a merit product of 7.32 <sup>×</sup> 10−<sup>43</sup> and 9.51 <sup>×</sup> 10−<sup>32</sup> in relation to the GSO and the FA, respectively. We further compare the CFA to the better one of the best function values obtained by the GSO and the FA, giving a merit product of 8.99 <sup>×</sup> 10−<sup>29</sup> as shown in the last column of Table 7. The result suggests that the CFA significantly outperforms its counterparts over the benchmark dataset, and that the CFA also beats a hybrid which takes the better one of the best function values obtained by the two counterparts.

**Table 7.** The performance comparison between the CFA and its counterparts.


We further verify the online performance advantage of our CFA against its counterparts by a statistical test suggested in Taillard [28]. As the best objective value produced by a metaheuristic approach is non-deterministic, we model the result obtained from multiple runs of method A (and method B) as a random variable *Xa* (*Xb*) and we want to testify the confidence regarding that *Xa* is less than *Xb*. A classic statistical test based on the central limit theorem for comparing two proportions is to approximate the mean *Xa* − *Xb* as a normal distribution if the collected number of samples

is sufficiently large. Therefore, 30 independent runs of each competing algorithm were conducted. For each run, the online best function value obtained at a particular FE, say *e*, is a sample for *Xa*(*e*), the random variable for the result obtained by method A at FE *e*. We tally the samples at every instance of FEs during the whole duration of executing the algorithm. By examining the mean curve of *Xa*(*e*) and *Xb*(*e*) as *e* increases during the evolution, we can testify if method A well outperforms method B. For clear illustration, the boundary and mean curves over the 30 runs are plotted. Figure 5a shows the online performance analysis with 95% confidence interval for the Rosenbrock (30) function. It is seen that the 95% confidence interval of the best function value obtained by the three competing algorithms (GSO, FA, and CFA) converges with various speeds. Our CFA converges at a much faster speed and reach towards a better function value than its two counterparts. The FA is the second-best performer followed by the GSO. To investigate the detailed performance during the second half duration of the execution, we enlarge the plot for this period as shown in Figure 5b. It can be seen that the CFA significantly outperforms the other two algorithms with 95% confidence level during the second half execution duration. We also found that the GSO performs better than the FA after 80,000 FEs, although the GSO may not beat the FA at the early stage of the execution as previously noted. The online performance comparison with 95% confidence level for the Rastrigin (30) function is shown in Figure 6. We observe that during the whole execution period the CFA significantly outperforms the GSO and the FA. The FA performs better than the GSO when the allowed number of consumed FEs is less than 20,000, but the FA is far surpassed by the GSO if more FEs are allowed. Figure 7 shows the online performance variation with 95% confidence level for the Griewank (30) function. Again, the CFA is the best performer among the three algorithms throughout the whole execution duration. However, we see a phenomenon differing from those for the two previous test functions with the GSO and the FA. The GSO and the FA performs about equally well before consuming 50,000 FEs, although the former is a more stable performer because it has a shorter confidence interval. However, after this critical execution period, the FA becomes very effective both in the convergence speed and the function value. As shown in Figure 7b, the FA significantly surpasses the GSO and reaches a comparative performance with the CFA.

#### 4.3.2. Comparison against Other Metaheuristics

We now compare the CFA against other metaheuristics inspired by different nature metaphors, the PSO, the GA, and the cyber swarm algorithm (CSA). The compared PSO is the constriction factor version proposed by Clerc and Kennedy [29] which has been shown to be one of the best PSO implementations. The implemented GA employs real-value chromosome coding, tournament selection (with *k* = 2, i.e., two competitors in each instance of selection), arithmetic crossover, and Gaussian mutation. The GA is generational without population gap, i.e., the whole parent population is replaced by the offspring population. The implementation and parameter setting of CSA follow the original paper [6]. All the compared algorithms have the same population size of 60 individuals, and are executed until consuming 160,000 FEs. Table 8 tabulates the mean best function value obtained by the compared algorithms over 30 runs and the merit value among the competitors. For the comparison of the CFA against the PSO and the GA, we observe that the CFA well surpasses the other two algorithms on most of the benchmark functions. The merit product in relation to the PSO and the GA is 8.80 <sup>×</sup> <sup>10</sup>−<sup>21</sup> and 2.11 <sup>×</sup> <sup>10</sup><sup>−</sup>36, respectively. When we compare the CFA to the CSA, the merit product is 1.94 <sup>×</sup> 1018. The result seems to suggest that the CSA performs better on the dataset. However, if we take a closer look, the CSA is very effective in solving small functions with less than ten variables, thus CSA gives significantly greater merits for these functions. For the test functions with ten or more variables, the merit value turns to be in favor to the CFA, disclosing that the CFA is more effective than CSA in tackling larger-sized functions. It is worth noting that both CFA and the CSA take advantage of the features contained in the AMP domain, and the two algorithms extremely outperform the other compared algorithms in our experiments. This phenomenon discloses the potential of future research in the direction of marrying the AMP with other types of metaheuristics.

**Figure 5.** Online performance analysis with 95% confidence interval for the Rosenbrock (30) function. (**a**) The convergence of the best function value during the whole duration of the execution. (**b**) The convergence of the best function value during the second half duration of the execution.

**Figure 6.** Online performance analysis with 95% confidence interval for the Rastrigin (30) function. (**a**) The convergence of the best function value during the whole duration of the execution. (**b**) The convergence of the best function value during the second half duration of the execution.

**Figure 7.** Online performance analysis with 95% confidence interval for the Griewank (30) function. (**a**) The convergence of the best function value during the whole duration of the execution. (**b**) The convergence of the best function value during the second half duration of the execution.


**Table 8.** The performance comparison between the CFA and other metaheuristics.

To compare the CFA with the state-of-the-art variants of FA, we quote the results (mean objective value over 30 runs) from the original paper LFA [9], VESSFA [10], WFA [11], CLFA [12], FAtidal [13], and GDAFA [14]. The best mean objective value for each function obtained by all compared methods is printed in bold. As can be seen in Table 9, our CFA wins the most times as obtaining the best mean objective value among all. GDAFA seems to possess better performance as the dimensionality increases. Both CFA and GDAFA can gain an objective value very close to the optimum, while the other competing methods may produce an objective value far away from the optimum in some challenging functions.


**Table 9.** The performance comparison between the CFA and the state-of-the-art variants of FA.

#### 4.3.3. Comparison on the CEC 2005 Dataset

To further justify the performance of CFA, we compare CFA with the investigated methods reported in [26] on 12 CEC 2005 benchmark functions [30]. The IEEE CEC Repository [27] provides fruitful benchmark datasets for optimization problems with various purposes such as unconstrained, constrained, and multi-objective optimization. The CEC 2005 dataset is designed for unconstrained real-parameter optimization which is addressed in this paper. We selected 12 CEC 2005 functions which are very challenging and have never been solved to optimal by any known methods [26]. We executed

all compared algorithms with the same evaluation criteria and parameter settings as described in the original paper [26]. Each algorithm is executed for 25 independent runs on each test function with *n* = 10 and 30, respectively. All compared methods are executed by being allowed to consume 1000, 10,000, and 100,000 FEs.

We adopt the GAP performance measure proposed in the original paper [26] and it is defined as GAP = |f − f\*| where f is the function value obtained by an evaluated method and f\* is the global optimum value of the test function. Table 10 shows the mean minimum (Min.) and average (Avg.) of GAP and Merit of all compared methods over the 12 functions for *n* = 10. We observe that our CFA is less exploitative in small size CEC problems than the leading methods such as G-CMA-ES and L-CMA-ES, both of which are based on the covariance matrix adaptation evolution strategy (CMA-ES) [31] which updates the covariance of the multivariate distribution to better handle the dependency between variables. Though the CFA is less competitive in the mean Avg. GAP, it can deliver a quality Min. out of 25 independent runs. It suggests that the CFA can be executed multiple times and output the best value from those runs when tackling small size yet complex problems. This phenomenon is also revealed in the geometric mean of the merits (GMM). The GMM of the Min. function value is 1.099, 0.995, and 0.878 at 1000, 10,000, and 100,000 FEs, while the GMM of the Avg. function value gradually deteriorates from 0.963, 1.167 to 1.211 at the same FEs.


**Table 10.** Min./Avg. GAP and Merit over the CEC dataset with *n* = 10.

As for high-dimensional and complex CEC problems with *n* = 30, the mean Min. and Avg. of GAP and Merit of all compared methods are tabulated in Table 11. It is seen from the GMM that our CFA is compared favorably to the other methods in both Min. and Avg. function value at all FEs check points. The prevailing exploration search conducted by CFA is due to its elements of AMP-responsive strategies, which are more effective when the problem is more complex and is presented in higher-dimensional space. The G-CMA-ES is again the best method since it excels in terms of GAP in most cases as compared to the other competing methods. It is worth further studying the possibility of including the CMA technique into the CFA to resolve the dependency between variables.



#### **5. Concluding Remarks and Future Research**

We have proposed the CFA which is a more effective form of the GSO and the FA in global optimization. The CFA incorporates several AMP strategies including multiple guiding solutions, pattern search as local improvement method, solution set rebuilding in a multi-start search template, and the responsive strategies. The experimental result on benchmark functions for global optimization has shown that the CFA performs significantly better in terms of both solution quality and robustness than the GSO, FA, and several state-of-the-art metaheuristic methods as demonstrated in our statistical analyses and comprehensive experiments. It is worth noting that it is certain a sophisticated method such as CFA incorporating advanced components will pose higher computational complexity in the computation iteration than an algorithm which does not. However, as metaheuristic approaches are computational ones which can stop at any computation iteration and output the best-so-far result. We conduct a fair performance comparison between two metaheuristic approaches at the same number of fitness FE instead of using the evolution iterations. All our experiments follow this fashion.

Our findings strengthen the motivations for marrying the approaches selected from each of the metaheuristic dichotomies, respectively. The CFA template gives general ideas for creating this sort of effective hybrid metaheuristics. Inspired by the promising result of the CFA, it is worthy of investigating the possibility of the application of the CFA template to other metaheuristic approaches with various problem domains for future research.

**Author Contributions:** Conceptualization, P.-Y.Y.; methodology, P.-Y.Y. and P.-Y.C.; software, P.-Y.C.; validation, Y.-C.W. and R.-F.D.; writing—original draft preparation, P.-Y.Y.; writing—review and editing, P.-Y.Y., Y.-C.W. and R.-F.D.; visualization, P.-Y.C.; funding acquisition, P.-Y.Y. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by MOST Taiwan, grant numbers 107-2410-H-260-015-MY3. The APC was funded by 107-2410-H-260-015-MY3.

**Conflicts of Interest:** The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

#### **Nomenclature**

The list of the acronyms referenced in this paper is tabulated as follows.


#### **References**


**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Article* **Search Patterns Based on Trajectories Extracted from the Response of Second-Order Systems**

**Erik Cuevas 1,\*, Héctor Becerra 1, Héctor Escobar 1, Alberto Luque-Chang 1, Marco Pérez 1, Heba F. Eid <sup>2</sup> and Mario Jiménez <sup>1</sup>**


**Abstract:** Recently, several new metaheuristic schemes have been introduced in the literature. Although all these approaches consider very different phenomena as metaphors, the search patterns used to explore the search space are very similar. On the other hand, second-order systems are models that present different temporal behaviors depending on the value of their parameters. Such temporal behaviors can be conceived as search patterns with multiple behaviors and simple configurations. In this paper, a set of new search patterns are introduced to explore the search space efficiently. They emulate the response of a second-order system. The proposed set of search patterns have been integrated as a complete search strategy, called Second-Order Algorithm (SOA), to obtain the global solution of complex optimization problems. To analyze the performance of the proposed scheme, it has been compared in a set of representative optimization problems, including multimodal, unimodal, and hybrid benchmark formulations. Numerical results demonstrate that the proposed SOA method exhibits remarkable performance in terms of accuracy and high convergence rates.

**Keywords:** metaheuristic methods; search patterns; second-order systems; evolutionary methods

#### **1. Introduction**

Metaheuristic algorithms refer to generic optimization schemes that emulate the operation of different natural or social processes. In metaheuristic approaches, the optimization strategy is performed by a set of search agents. Each agent maintains a possible solution to the optimization problem, and is initially produced by considering a random feasible solution. An objective function determines the quality of the solution of each agent. By using the values of the objective function, at each iteration, the position of the search agents is modified, employing a set of search patterns that regulate their movements within the search space. Such search patterns are abstract models inspired by natural or social processes [1]. These steps are repeated until a stop criterion is reached. Metaheuristic schemes have confirmed their supremacy in diverse real-world applications in circumstances where classical methods cannot be adopted.

Essentially, a clear classification of metaheuristic methods does not exist. Despite this, several categories have been proposed that considered different criteria, such as a source of inspiration, type of operators or cooperation among the agents. In relation to inspiration, nature-inspired metaheuristic algorithms are classified into three categories: Evolutionbased, swarm-based, and physics-based. Evolution-based approaches correspond to the most consolidate search strategies that use evolution elements as operators to produce search patterns. Consequently, operations, such as reproduction, mutation, recombination, and selection are used to generate search patterns during their operations. The most representative examples of evolution-based techniques, include Evolutionary Strategies

**Citation:** Cuevas, E.; Becerra, H.; Escobar, H.; Luque-Chang, A.; Pérez, M.; Eid, H.F.; Jiménez, M. Search Patterns Based on Trajectories Extracted from the Response of Second-Order Systems. *Appl. Sci.* **2021**, *11*, 3430. https://doi.org/ 10.3390/app11083430

Academic Editor: Juan A. Gómez-Pulido

Received: 6 March 2021 Accepted: 1 April 2021 Published: 12 April 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

(ES) [2–4], Genetic Algorithms (GA) [5], Differential Evolution (DE) [6] and Self-Adaptative Differential Evolution (JADE) [7]. Swarm-inspired techniques use behavioral schemes extracted from the collaborative interaction of different animals or species of insects to produce a search strategy. Recently, a high number of swarm-based approaches have been published in the literature. Among the most popular swarm-inspired approaches, include the Crow Search Algorithm (CSA) [8], Artificial Bee Colony (ABC) [9], Particle Swarm Optimization (PSO) algorithm [10–12], Firefly Algorithm (FA) [13,14], Cuckoo Search (CS) [15], Bat Algorithm (BA) [16], Gray Wolf Optimizer (GWO) [17], Moth-flame optimization algorithm (MFO) [18] to name a few. Metaheuristic algorithms that consider the physics-based scheme use simplified physical models to produce search patterns for their agents. Some examples of the most representative physics-based techniques involve the States of Matter Search (SMS) [19,20], the Simulated Annealing (SA) algorithm [21–23], the Gravitational Search Algorithm (GSA) [24], the Water Cycle Algorithm (WCA) [25], the Big Bang-Big Crunch (BB-BC) [26] and Electromagnetism-like Mechanism (EM) [27]. Figure 1 visually exhibits the taxonomy of the metaheuristic classification. Although all these approaches consider very different phenomena as metaphors, the search patterns used to explore the search space are exclusively based on spiral elements or attraction models [10–17,28]. Under such conditions, the design of many metaheuristic methods refers to configuring a recycled search pattern that has been demonstrated to be successful in previous approaches for generating new optimization schemes through a marginal modification.

**Figure 1.** Visual taxonomy of the nature-inspired metaheuristic schemes.

On the other hand, the order of a differential equation refers to the highest degree of derivative considered in the model. Therefore, a model whose input-output formulation is a second-order differential equation is known as a second-order system [29]. One of the main elements that make a second-order model important is its ability to present very different behaviors, depending on the configuration of its parameters. Through its different behaviors, such as oscillatory, underdamped, or overdamped, a second-order system can exhibit distinct temporal responses [30]. Such behaviors can be observed as search trajectories under the perspective of metaheuristic schemes. Therefore, with second-order systems, it is possible to produce oscillatory movements within a certain region or build complex search patterns around different points or sections of the search space.

In this paper, a set of new search patterns are introduced to explore the search space efficiently. They emulate the response of a second-order system. The proposed set of search patterns have been integrated as a complete search strategy, called Second-Order Algorithm (SOA), to obtain the global solution of complex optimization problems. To analyze the performance of the proposed scheme, it has been compared in a set of representative optimization problems, including multimodal, unimodal, and hybrid benchmark formulations. The competitive results demonstrate the promising results of the proposed search patterns. The main contributions of this research can be stated as follows:


The remainder of this paper is structured as follows: A brief introduction of the second-order systems is given in Section 2; in Section 3, the most important search patterns in metaheuristic methods are discussed; in Section 4, the proposed search patterns are defined; in Section 5, the measurement of exploration-exploitation is described; in Section 6, the proposed scheme is introduced; Section 7 presents the numerical results; in Section 8, the main characteristics of the proposed approach are discussed; in Section 9, finally, the conclusions are drawn.

#### **2. Second-Order Systems**

A model whose input *R*(*s*)-output *C*(*s*) formulation is a second-order closed-loop transfer function is known as a second-order system. One of the main elements that make a second-order model important is its ability to present very different behaviors depending on the configuration of its parameters. A generic second-order model can be formulated under the following expression [29],

$$\frac{\mathcal{L}(s)}{\mathcal{R}(s)} = \frac{\omega\_n^2}{s^2 + 2\zeta\omega\_n s + \omega\_n^2},\tag{1}$$

where *ζ* and *ω<sup>n</sup>* represent the damping ratio and *ω<sup>n</sup>* the natural frequency, respectively, while *s* symbolizes the Laplace domain.

The dynamic behavior of a system is evaluated in terms of the temporal response obtained through a unitary step signal as input *R*(*s*). The dynamic behavior is defined as the way in which the system reacts, trying to reach the value of one as time evolves. The dynamic behavior of the second-order system is described in terms of *ζ* and *ω<sup>n</sup>* [30]. Assuming such parameters, the second-order system presents three different behaviors: Underdamped (0 < *ζ* < 1), critically damped (*ζ* = 1), and overdamped (*ζ* > 1).

#### *2.1. Underdamped Behavior (*0 < *ζ* < 1)

In this behavior, the poles (roots of the denominator) of Equation (1) are complex conjugated and located in the left-half of the *s* plane. Under such conditions, the system underdamped response *CU*(*s*) in the Laplace domain can be characterized as follows:

$$\mathcal{C}\_{\rm II}(s) = \frac{\omega\_n^2}{s \left(s + \zeta \omega\_n\right)^2 + \omega\_n^2 (1 - \zeta^2)}. \tag{2}$$

Applying partial fraction operations and the inverse Laplace transform, it is obtained the temporal response that describe the underdamped behavior *cU*(*t*) as it is indicated in Equation (3):

$$\alpha\_{ll}(t) = 1 - \frac{e^{-\tilde{\zeta}\omega\_n t}}{\sqrt{1-\tilde{\zeta}^2}} \sin\left(\omega\_n \sqrt{1-\tilde{\zeta}^2} + \tan^{-1}\left(\frac{\sqrt{1-\tilde{\zeta}^2}}{\tilde{\zeta}}\right)\right). \tag{3}$$

If *ζ* = 0, a special case is presented in which the temporal system response is oscillatory. The output of these behaviors is visualized in Figure 2 for the cases of *ζ* = 0, *ζ* = 0.2, *ζ* = 0.5 and *ζ* = 0.707. Under the underdamped behavior, the system response starts with

high acceleration. Therefore, the response produces an overshoot that surpasses the value of one. The size of the overshoot inversely depends on the value of *ζ*.

**Figure 2.** Temporal responses of second-order system considering its different behaviors: Underdamped (0 < *ζ* < 1), critically damped (*ζ* = 1), and overdamped (*ζ* > 1).

#### *2.2. Critically Damped Behavior (ζ* = 1)

Under this behavior, the two poles of the transfer function of Equation (1) present a real number and maintain the same value. Therefore, the response of the critically damped behavior *CC*(*s*) in the Laplace domain can be described as follows:

$$\mathcal{C}\_{\mathbb{C}}(\mathbf{s}) = \frac{\omega\_{\mathbf{u}}^{2}}{\mathbf{s} (\mathbf{s} + \omega\_{\mathbf{u}})^{2}}. \tag{4}$$

Considering the inverse Laplace transform of Equation (4), the temporal response of the critically damped behavior *cC*(*t*) is determined under the following model:

$$
\omega\_{\mathbb{C}}(t) = 1 - \varepsilon^{-\omega\_n t} (1 + \omega\_n t). \tag{5}
$$

Under the critically damped behavior, the system response presents a temporal pattern similar to a first-order system. It reaches the objective value of one without experimenting with an overshoot. The output of the critically damped behavior is visualized in Figure 2.

#### *2.3. Overdamped Behavior (ζ* > 1)

In the overdamped case, the two poles of a transfer function of Equation (1) have real numbers but with different values. Its response *CO*(*s*) in the Laplace domain is modeled under the following formulation:

$$\mathcal{C}\_{O}(s) = \frac{\omega\_{n}^{2}}{s\left(s + \zeta\omega\_{n} + \omega\_{n}\sqrt{\zeta^{2} - 1}\right)\left(s + \zeta\omega\_{n} - \omega\_{n}\sqrt{\zeta^{2} - 1}\right)}. \tag{6}$$

After applying the inverse Laplace transform, it is obtained the temporal response of the overdamped behavior *cO*(*t*) defined as follows:

$$c\_O(t) = 1 + e^{-(\zeta - \sqrt{\zeta^2 - 1})\omega\_n t}.\tag{7}$$

Under the Overdamped behavior, the system slowly reacts until reaching the value of one. The deceleration of the response depends on the value of *ζ*. The greater the value of *ζ*, the slower the response will be. The output of this behavior is visualized in Figure 2 for the case of *ζ* = 1.67.

#### **3. Search Patterns in Metaheuristics**

The generation of efficient search patterns for the correct exploration of a fitness landscape could be complicated, particularly in the presence of ruggedness and multiple local optima. Recently, several new metaheuristic schemes have been introduced in the literature. Although all these approaches consider very different phenomena as metaphors, the search patterns, used to explore the search space, are very similar. A search pattern is a set of movements produced by a rule or model in order to examine promising solutions from the search space.

Exploration and exploitation correspond to the most important characteristics of a search pattern. Exploration refers to the ability of a search pattern to examine a set of solutions spread in distinct areas of the search space. On the other hand, exploitation represents the capacity of a search pattern to improve the accuracy of the existent solutions through a local examination. The combination of both mechanisms in a search pattern is crucial for attaining success when solving a particular optimization problem.

To solve the optimization formulation, from a metaheuristic point of view, a population of **P***<sup>k</sup>* **x***<sup>k</sup>* <sup>1</sup>,..., **<sup>x</sup>***<sup>k</sup> N* of *<sup>N</sup>* candidate solutions (individuals) evolve from an initial point (*k* = 1) to a *Maxgen* number of generations (*k* = *Maxgen*). In the population, each individual **x***<sup>k</sup> <sup>i</sup>* (*<sup>i</sup>* <sup>∈</sup> [1, . . . , *<sup>N</sup>*]) corresponds to a *<sup>d</sup>*-dimensional element *xk <sup>i</sup>*,1,..., *<sup>x</sup><sup>k</sup> i*,*d* , which symbolizes the decision variables involved by the optimization problem. At each generation, search patterns are applied over the individuals of the population **P***<sup>k</sup>* to produce the new population **P***k*<sup>+</sup>1. The quality of each individual **x***<sup>k</sup> <sup>i</sup>* is evaluated in terms of its solution regarding the objective function *J* **x***k i* whose result represents the fitness value of **x***<sup>k</sup> <sup>i</sup>* . As the metaheuristic method evolves, the best current individual **b** {*b*1,..., *bd*} is maintained since **b** represents the best available solution seen so-far.

In general, a search pattern is applied to each individual **x***<sup>k</sup> <sup>i</sup>* using the best element **b** as a reference. Then, following a particular model, a set of movements are produced to modify the position of **x***<sup>k</sup> <sup>i</sup>* until the location of **b** has been reached. The idea behind this mechanism is to examine solutions in the trajectory from **x***<sup>k</sup> <sup>i</sup>* to **b** with the objective to find a better solution than the current **b**. Search patterns differ in the model employed to produce the trajectories **x***<sup>k</sup> <sup>i</sup>* from to **b**.

Two of the most popular search models are attraction and spiral trajectories. The attraction model generates attraction movements from **x***<sup>k</sup> <sup>i</sup>* to **b**. The attraction model is used extensively by several metaheuristic methods such as PSO [10–12], FA [13,14], CS [15], BA [16], GSA [24], EM [27] and DE [6]. On the other hand, the spiral model produces a spiral trajectory that encircles the best element **b**. The spiral model is employed by the recently published WOA and GWO schemes. Trajectories produced by the attraction, and spiral models are visualized in Figure 3a,b, respectively.

**Figure 3.** Trajectories produced by, (**a**) attraction, and (**b**) spiral models.

#### **4. Proposed Search Patterns**

In this paper, a set of new search patterns are introduced to explore the search space efficiently. They emulate the response of a second-order system. The proposed set of search patterns have been integrated as a complete search strategy to obtain the global solution of complex optimization problems. Since the proposed scheme is based on the response of the second-order systems, it can be considered as a physics-based algorithm. In our approach, the temporal response of second-order system is used to generate the trajectory from the position of **x***<sup>k</sup> <sup>i</sup>* = *xk <sup>i</sup>*,1,..., *<sup>x</sup><sup>k</sup> i*,*d* to the location of **b** = {*b*1,..., *bd*}. With the use of such models, it is possible to produce more complex trajectories that allow a better examination of the search space. Under such conditions, we consider the three different responses of a second-order system to produce three distinct search patterns. They are the underdamped, critically damped and overdamped modeled by the expressions Equations (8)–(10), respectively:

$$\mathbf{x}\_{i,j}^{k} = \left(1 - \frac{e^{-\zeta\omega\_{n}k}}{\sqrt{1-\zeta^{2}}} \sin\left(\omega\_{n}\sqrt{1-\zeta^{2}} + \tan^{-1}\left(\frac{\sqrt{1-\zeta^{2}}}{\zeta}\right)\right)\right) \left(b\_{j} - \mathbf{x}\_{i,j}^{k}\right);\tag{8}$$

$$\mathbf{x}\_{i,j}^k = \left(\mathbf{1} - \boldsymbol{\sigma}^{-\omega\_n k} (\mathbf{1} + \omega\_n k)\right) \left(b\_j - \mathbf{x}\_{i,j}^k\right);\tag{9}$$

$$\mathbf{x}\_{i,j}^{k} = \left(\mathbf{1} + e^{-(\zeta - \sqrt{\zeta^2 - 1})\omega\_n \mathbf{z}}\right) \left(b\_j - \mathbf{x}\_{i,j}^{k}\right);\tag{10}$$

where *i* (∈ [1, *N*]) corresponds to the search agent while *j* (∈ [1, *d*]) symbolizes the decision variable or dimension. Since the behavior of each search pattern depends on the value of *ζ*, it is easy to combine elements to produce interesting trajectories. Figure 4 presents some examples of trajectories produced by using different values for *ζ*. In the Figure, it is assumed a two-dimensional case (*d* = 2) where the initial position of the search agent **x***k <sup>i</sup>* is (0.5, 0.5) and the final location or the best location (1, 1). Figure 4a presents the case of *x<sup>k</sup> <sup>i</sup>*,1 ← *<sup>ζ</sup>* = <sup>0</sup> and *<sup>x</sup><sup>k</sup> <sup>i</sup>*,2 ← *<sup>ζ</sup>* = 1. Figure 4b presents the case of *<sup>x</sup><sup>k</sup> <sup>i</sup>*,1 ← *ζ* = 0.1 and *xk <sup>i</sup>*,2 ← *<sup>ζ</sup>* = 0.5. Figure 4c presents the case of *<sup>x</sup><sup>k</sup> <sup>i</sup>*,1 ← *<sup>ζ</sup>* = <sup>1</sup> and *<sup>x</sup><sup>k</sup> <sup>i</sup>*,2 ← *ζ* = 1.67. Finally, Figure 4d presents the case of *x<sup>k</sup> <sup>i</sup>*,1 ← *<sup>ζ</sup>* = 0.5 and *<sup>x</sup><sup>k</sup> <sup>i</sup>*,2 ← *ζ* = 1. From the figures, it is clear that the second-order responses allow producing several complex trajectories, which include most of the other search patterns known in the literature. In all cases (a)–(d), the value of *ω<sup>n</sup>* has been set to 1.

**Figure 4.** Some examples of trajectories produced by using different values for ζ. (**a**) *x<sup>k</sup> <sup>i</sup>*,1 <sup>←</sup> *<sup>ζ</sup>* <sup>=</sup> <sup>0</sup> and *<sup>x</sup><sup>k</sup> <sup>i</sup>*,2 ← *ζ* = 1 , (**b**) *x<sup>k</sup> <sup>i</sup>*,1 <sup>←</sup> *<sup>ζ</sup>* <sup>=</sup> 0.1 and *<sup>x</sup><sup>k</sup> <sup>i</sup>*,2 <sup>←</sup> *<sup>ζ</sup>* <sup>=</sup> 0.5 , (**c**) *<sup>x</sup><sup>k</sup> <sup>i</sup>*,1 <sup>←</sup> *<sup>ζ</sup>* <sup>=</sup> 1 and *<sup>x</sup><sup>k</sup> <sup>i</sup>*,2 <sup>←</sup> *<sup>ζ</sup>* <sup>=</sup> 1.67 and (**d**) *<sup>x</sup><sup>k</sup> <sup>i</sup>*,1 <sup>←</sup> *<sup>ζ</sup>* <sup>=</sup> 0.5 and *<sup>x</sup><sup>k</sup> <sup>i</sup>*,2 ← *ζ* = 1 .

#### **5. Balance of Exploration and Exploitation**

Metaheuristic methods employ a set of search agents to examine the search space with the objective to identify a satisfactory solution for an optimization formulation. In metaheuristic schemes, search agents that present the best fitness values tend to regulate the search process, producing an attraction towards them. Under such conditions, as the optimization process evolves, the distance among individuals diminishes while the effect of exploitation is highlighted. On the other hand, when the distance among individuals increases, the characteristics of the exploration process are more evident.

To compute the relative distance among individuals (increase and decrease), a diversity indicator known as the dimension-wise diversity index [31] is used. Under this approach, the diversity is formulated as follows,

$$Div\_j = \frac{1}{N} \sum\_{i=1}^{N} \left| median\left(\mathbf{x}^j\right) - \mathbf{x}\_{i,j} \right|\\Div = \frac{1}{d} \sum\_{j=1}^{d} Div\_j \tag{11}$$

where *median xj* symbolizes the median of dimension *j* of all search agents. *xi*,*<sup>j</sup>* represents the variable decision *j* of the individual *i*. *N* is the number of individuals in the population **P***<sup>k</sup>* while *d* corresponds to the number of dimensions of the optimization formulation. The diversity *Divj* (of the *j*-th dimension) evaluates the relative distance between the variable *j* of each individual and its median value. The complete diversity *Div* (of the entire population) corresponds to the averaged diversity in each dimension. Both elements *Divj* and *Div* are calculated in every iteration.

Having evaluated the diversity values, the level of exploration and exploitation can be computed as the percentage of the time that a search strategy invests exploring or exploiting in terms of its diversity values. These percentages are calculated in each iteration by means of the following models,

$$XPL\% = \left(\frac{Div}{Div\_{\max}}\right) \times 100\\XPT\% = \left(\frac{|Div - Div\_{\max}|}{Div\_{\max}}\right) \times 100\tag{12}$$

where *Divmax* symbolizes the maximum diversity value obtained during the optimization process. The percentage of exploration *XPL*% corresponds to the size of exploration as the rate between *Div* and *Divmax*. On the other hand, the percentage of exploitation *XPT*% symbolizes the level of exploitation. *XPT*% is computed as the complemental percentage to *XPL*% since the difference between *Divmax* and *Div* is generated because of the concentration of individuals.

#### **6. Proposed Metaheuristic Algorithm**

The set of search patterns based on the second-order systems have been integrated as a complete search strategy to obtain the global solution of complex optimization problems. In this section, the complete metaheuristic method, called Second-Order Algorithm (SOA), is completely described.

The scheme considers four different stages: (A) Initialization, (B) trajectory generation, (C) reset of bad elements, and (D) avoid premature convergence mechanism. The steps (B)–(D) are sequentially executed until a stop criterion has been reached. Figure 5 shows the flowchart of the complete metaheuristic method.

**Figure 5.** Flowchart of the proposed metaheuristic method based on the response of second-order systems.

#### *6.1. Initialization*

In the first iteration *k* = 0, a population **P**<sup>0</sup> of *N* agents **x**0 <sup>1</sup>,..., **<sup>x</sup>**<sup>0</sup> *N* is randomly produced considering to the following equation,

$$\mathbf{x}\_{i,j}^{0} = rand \cdot \left( b\_{j}^{high} - b\_{j}^{low} \right) + b\_{j}^{low} i = 1, \ 2, \ \dots, \ N; \ j = 1, \dots, d \tag{13}$$

where *b high <sup>j</sup>* and *<sup>b</sup>low <sup>j</sup>* are the limits of the *j* decision variable and *rand* is a uniformly distributed random number between [0,1].

To each individual **x***<sup>i</sup>* from the population, it is assigned a vector ζ*<sup>i</sup>* = *ζi*,1, ..., *ζi*,*<sup>d</sup>* whose elements *ζi*,*<sup>j</sup>* determine the trajectory behavior of each *j*-th dimension. Initially, each element *ζi*,*<sup>j</sup>* is set to a random value between [0,2]. Under this interval, all the secondorder behavior are possible: Underdamped (0 < *ζ* < 1), critically damped (*ζ* = 1), and overdamped (*ζ* > 1).

#### *6.2. Trajectory Generation*

Once the population has been initialized, it is obtained the best element of the population **b**. Then, the new position **x***k*+<sup>1</sup> *<sup>i</sup>* of each agent **<sup>x</sup>***<sup>k</sup> <sup>i</sup>* is computed as a trajectory generated by a second-order system. Once all new positions in the population **P***<sup>k</sup>* are determined, it is also defined the best element **b**.

#### *6.3. Reset of Bad Elements*

To each agent **x***<sup>k</sup> <sup>i</sup>* is allowed to move in its own trajectory for ten iterations. After ten iterations, if the search agent **x***<sup>k</sup> <sup>i</sup>* maintains the worst performance in terms of the fitness function, it is reinitialized in both position and in its vector ζ*i*. Under such conditions, the search agent will be in another position and with the ability to perform another kind of trajectory behavior.

#### *6.4. Avoid Premature Convergence Mechanism*

If the percentage of exploration *XPL*% is less than 5%, the best value **b** is replaced by the best virtual value **b***v*. The element **b***<sup>v</sup>* is computed as the averaged value of the best five individuals of the population. The idea behind this mechanism is to identify a new position to generate different trajectories that avoid that the search process gets trapped in a local optimum.

#### **7. Experimental Results**

To evaluate the results of the proposed SOA algorithm, a set of experiments has been conducted. Such results have been compared to those produced by the Artificial Bee Colony (ABC) [9], the Covariance matrix adaptation evolution strategy (CMAES) [4], the Crow Search Algorithm (CSA) [8], the Differential Evolution (DE) [6], the Moth-flame optimization algorithm (MFO) [18] and the Particle Swarm Optimization (PSO) [10], which are considered the most popular metaheuristic schemes in many optimization studies [32].

For the comparison, all methods have been set according to their reported guidelines. Such configurations are described as follows:


In our analysis, the population size *N* has been set to 50 search agents. The maximum iteration number (*Maxgen*) for all functions has been set to 1000. This stop criterion has been decided to keep compatibility with similar works published in the literature [33,34]. To evaluate the results, three different indicators are considered: The Average Best-so-far (**AB**) solution, the Median Best-so-far (**MB**) solution and the Standard Deviation (**SD**) of the best-so-far solutions. In the analysis, each optimization problem is solved using every algorithm 30 times. From this operation, 30 results are produced. From all these values, the mean value of all best-found solutions represents the Average Best-so-far (**AB**) solution. Likewise, the median of all 30 results is computed to generate **MB** and the standard deviation of the 30 data is estimated to obtain **SD** of the best-so-far solutions. Indicators **AB** and **MB** correspond to the accuracy of the solutions, while **SD** their dispersion, and thus, the robustness of the algorithm.

The experimental section is divided into five sub-sections. In the first Section 7.1, the performance of SOA is evaluated with regard to multimodal functions. In the second Section 7.2, the results of the OTSA method in comparison with other similar approaches are analyzed in terms of unimodal functions. In the third Section 7.3, a comparative study among the algorithms examining hybrid functions is accomplished. In the fourth Section 7.4, the ability of all algorithms to converge is analyzed. Finally, in the fifth Section 7.5, the performance of the SOA method to solve the CEC 2017 set of functions is also analyzed.

#### *7.1. Multimodal Functions*

In this sub-section, the SOA approach is evaluated considering 12 unimodal functions (*f*1(**x**)– *f*12(**x**)) reported in Table 1 from Appendix A. Multimodal functions present optimization surfaces that involve multiple local optima. For this reason, these function presents more complications in their solution. In this analysis, the performance of the SOA method is examined in comparison with ABC, CMAES, CSA, DE, MFO and PSO in terms of the multimodal functions. Multimodal objective functions correspond to functions from *f*1(**x**) to *f*12(**x**) in Table 1 from the Appendix A, where the set of local minima augments as the dimension of the function also increases. Therefore, the study exhibits the capacity of each metaheuristic scheme to identify the global optimum when the function contains several local optima. In the experiments, it is assumed objective functions operating in 30 dimensions (*n* = 30). The averaged best (AB) results considering 30 independent executions are exhibit in Table 1. It also reports the median values (MD) and the standard deviations (SD).

**Table 1.** Minimization results of multimodal benchmark functions.



**Table 1.** *Cont.*

According to Table 1, the proposed SOA scheme obtain a better performance than ABC, CMAES, CSA, DE, MFO and PSO in functions *f*1(**x**), *f*4(**x**), *f*5(**x**), *f*6(**x**), *f*8(**x**), *f*9(**x**), *f*10(**x**), *f*11(**x**) and *f*12(**x**). Nevertheless, the results of SOA exhibit similar as the obtained by DE, CMAES and MFO in functions *f*2(**x**), *f*3(**x**) and *f*7(**x**).

To statistically validate the conclusions from Table 1, a non-parametric study is considered. In this test, the Wilcoxon rank-sum analysis [35] is adopted with the objective to validate the performance results. This statistical test evaluates if exists a significant difference when two methods are compared. For this reason, the analysis is performed considering a pairwise comparison such as SOA versus ABC, SOA versus CMAES, SOA versus CSA, SOA versus DE, SOA versus MFO and SOA versus PSO. In the Wilcoxon analysis, a null hypothesis (*H*0) was adopted that showed that there is no significant difference in the results. On the other hand, it is assumed as an alternative hypothesis (*H*1) that the result has a similar structure. For the Wilcoxon analysis, it is assumed a significance value of 0.05 considering 30 independent execution for each test function. Table 2 shows the *p*-values assuming the results of Table 2 (where *n* = 30) produced by the Wilcoxon study. For faster visualization, in the Table, we use the following symbols - , and . The symbol refers that the SOA algorithm produces significantly better solutions than its competitor. symbolizes that SOA obtains worse results than its counterpart. Finally, the symbol denotes that both compared methods produce similar solutions. A close inspection of Table 2 demonstrates that for functions *f*1, *f*4, *f*5, *f*6, *f*8, *f*9, *f*10, *f*<sup>11</sup> and *f*<sup>12</sup> the proposed SOA scheme obtain better solutions than the other methods. On the other hand, for functions *f*2, *f*<sup>2</sup> and *f*7, it is clear that the groups SOA versus ABC, SOA versus CMAES, SOA versus DE and SOA versus MFO and EA-HC versus SCA present similar solutions.

**Table 2.** Wilcoxon analysis for multimodal benchmark functions.


#### *7.2. Unimodal Functions*

In this subsection, the performance of SOA is compared with ABC, DE, DE, CMAES CSA and MFO, considering four unimodal functions with only one optimum. Such functions are represented by functions from *f*13(**x**) to *f*16(**x**) in Table 1. In the test, all functions are considered in 30 dimensions (*d* = 30). The experimental results, obtained from 30 independent executions, are presented in Table 3. They report the results in terms of **AB**, **MB** and **SD** obtained in the executions. According to Table 3, the SOA approach provides better performance than ABC, DE, DE, CMAES CSA and MFO for all functions. In general, this study demonstrates big differences in performance among the metaheuristic scheme, which is directly related to a better trade-off between exploration and exploitation produced by the trajectories of the SOA scheme. Considering the information from Table 3, Table 4 reports the results of the Wilcoxon analysis. An inspection of the *p*-values from Table 4, it is clear that the proposed SOA method presents a superior performance than each metaheuristic algorithm considered in the experimental study.

**Table 3.** Minimization results of unimodal benchmark functions.


**Table 4.** Wilcoxon analysis for unimodal benchmark functions.


#### *7.3. Hybrid Functions*

In this study, hybrid functions are used to evaluate the optimization solutions of the SOA scheme. Hybrid functions refer to multimodal optimization problems produced by the combination of several multimodal functions. These functions correspond to the formulations from *f*17(**x**) to *f*20(**x**), which are shown in Table 1 in Appendix A. In the experiments, the performance of our proposed SOA approach has been compared with other metaheuristic schemes.

The simulation results are reported in Table 5. It exhibits the performance of each algorithm in terms of **AB**, **MB** and **SD**. From Table 5, it can be observed that the SOA method presents a superior performance than the other techniques in all functions. Table 6 reports the results of the Wilcoxon analysis assuming the index of the Average Best-so-far (**AB**) values of Table 5. Since all elements present the symbol -, they validate that the proposed SOA method produces better results than the other methods. The remarkable performance of the proposed SOA scheme for hybrid functions is attributed to a better balance between exploration and exploitation of its operators provoked by the properties of the second system trajectories. This denotes that the SOA approach generates an appropriate number of promising search agents that allow an adequate exploration of the search space. On the other hand, a balanced number of candidate solutions is also produced that make it possible to improve the quality of the already-detected solutions, in terms of the objective function.


**Table 5.** Minimization results of hybrid benchmark functions.

**Table 6.** Wilcoxon analysis for hybrid benchmark functions.


*7.4. Convergence Analysis*

The evaluation of accuracy in the final solution cannot completely assess the abilities of an optimization algorithm. On the other hand, the convergence of a metaheuristic scheme represents an important property to assess its performance. This analysis determined the velocity, which determined metaheuristic scheme reaches the optimum solution. In this subsection, a convergence study has been carried out. In the comparisons, for the sake of space, the performance of the best four metaheuristic schemes is considered adopting a representative set of six functions (two multimodal, two unimodal and two hybrids), operated in 30 dimensions. To generate the convergence graphs, the raw simulation data produced in the different experiments was processed. Since each simulation is executed 30 times for each metaheuristic method, the convergence data of the execution corresponds to the median result. Figures 6–8 show the convergence graphs for the four best-performing metaheuristic methods. A close inspection of Figure 6 demonstrates that the proposed SOA scheme presents a better convergence than the other algorithms for all functions.

**Figure 6.** Convergence graphs in two representative multimodal-functions.

**Figure 7.** Convergence graphs in two representative unimodal functions.

**Figure 8.** Convergence graphs in two representative hybrid functions.

#### *7.5. Performance Evaluation with CEC 2017*

In this sub-section, the performance of the SOA method to solve the CEC 2017 set of functions is also analyzed. The set of functions from the CEC2017 [36] represents one the most elaborated platform for benchmarking and comparing search strategies for numerical optimization. The CEC2017 benchmarks correspond to a test environment of 30 different functions with distinct features. They will be identified from *F*1(**x**) to *F*30(**x**). Most of these functions are similar to those exhibited in Appendix A, but with different translations and/or rotations effects. The average obtained results, corresponding to 30 independent executions, are re-registered in Table 7. The results are reported in terms of the performance indexes: Average Best fitness (AB), Median Best fitness (MB), and the standard deviation of the best finesses (SD).

**Table 7.** Optimization results from benchmark functions of CEC2017.



**Table 7.** *Cont.*

According to Table 7, SOA provides better results than ABC, CMAES, CSA, DE, MFO and PSO for almost all functions. A close inspection of this table reveals that the SOA scheme attained the best performance level, obtaining the best results in 22 functions from the CEC2017 function set. Likewise, the CMAES presents second place, in terms of most of the performance indexes, while DE and CSA techniques reach the third category with performance slightly minor. On the other hand, the MFO and PSO methods produce the worst results. In particular, the results show considerable precision differences, which are directly related to the different search patterns presented by each metaheuristic algorithm. This demonstrates that the search patterns, produced by second-order systems, are able to provide excellent search patterns. The results show good performance of the proposed SOA method in terms of accuracy and high convergence rates.

#### **8. Analysis and Discussion**

The extensive experiments, performed in previous sections, demonstrate the remarkable characteristics of the proposed SOA algorithm. The experiments included not only standard benchmark functions but also the complex set of optimization functions from CEC2017. In both sets of functions, they have been solved in 30 dimensions. Therefore, a total of 50 optimization problems were employed to comparatively evaluate the performance of SOA with other popular metaheuristic approaches, such as ABC, CMAES, CSA, DE, MFO and PSO. From the experiments, important information has been obtained by observing the end-results, in terms of the mean and standard deviations found over a certain number of runs or convergence graphs, but also in-depth search behavioral evidence in the form of exploration and exploitation measurements were also used.

The generation of efficient search patterns for the correct exploration of a fitness landscape could be complicated, particularly in the presence of ruggedness and multiple local optima. A search pattern is a set of movements produced by a rule or model that is used to examine promising solutions from the search space. Exploration and exploitation correspond to the most important characteristics of a search strategy. The combination of both mechanisms in a search pattern is crucial for attaining success when solving a particular optimization problem.

In our approach, the temporal response of second-order system is used to generate the trajectory from the position of **x***<sup>k</sup> <sup>i</sup>* = *xk <sup>i</sup>*,1,..., *<sup>x</sup><sup>k</sup> i*,*d* to the location of **b** = {*b*1,..., *bd*}. Three different search patterns have been considered based on the second-order system responses. The proposed search patterns can explore areas of considerable size by using a high rate of velocity and the same time, refining the solution of the best individual **b** by the exploitation of its location. This behavior represents the most important property of the proposed search patterns. According to the results provided by the experiments, the search patterns produce more complex trajectories that allow a better examination of the search space.

Similar to other metaheuristic methods, SOA tries to improve its solutions based on its interaction with the objective function or on a 'trial and error' scheme through defined stochastic processes. Different from other popular metaheuristic methods such as DE, ABC, GA or CMAES, our proposed approach uses search patterns represented by trajectories to explore and exploit the search space. Since SOA employs search patterns, it presents more similarities with algorithms such as CSA, MFO and GWO. However, the search patterns used in their search strategy are very different. While CSA, MFO and GWO consider only spiral patterns, our proposed method uses complex trajectories produced by the response of second-order systems.

#### **9. Conclusions**

A search pattern is a set of movements produced by a rule or model, in order to examine promising solutions from the search space. In this paper, a set of new search patterns are introduced to explore the search space efficiently. They emulate the response of a second-order system. Under such conditions, it is considered three different responses of a second-order system to produce three distinct search patterns, such as underdamped, critically damped and overdamped. These proposed set of search patterns have been integrated as a complete search strategy, called Second-Order Algorithm (SOA), to obtain the global solution of complex optimization problems.

The form of the search patterns allows for balancing the exploration and exploitation abilities by efficiently traversing the search-space and avoiding suboptimal regions. The efficiency of the proposed SOA has been evaluated through 20 standard benchmark functions and 30 functions of CEC2017 test-suite. The results over multimodal functions show remarkable exploration capabilities, while the result over unimodal test functions denotes adequate exploitation of the search space. On hybrid functions, the results demonstrate the effectivity of the search patterns on more difficult formulations. The search efficacy of the proposed approach is also analyzed in terms of the Wilcoxon test results and convergence

curves. In order to compare the performance of the SOA scheme, many other popular optimization techniques such as the Artificial Bee Colony (ABC), the Covariance matrix adaptation evolution strategy (CMAES), the Crow Search Algorithm (CSA), the Differential Evolution (DE), the Moth-flame optimization algorithm (MFO) and the Particle Swarm Optimization (PSO), have also been tested on the same experimental environment. Future research directions include topics such as multi-objective capabilities, incorporating chaotic maps and include acceleration process to solve other real-scale optimization problems.

**Author Contributions:** Conceptualization, E.C. and M.J.; Data curation, M.P.; Formal analysis, H.E.; Funding acquisition, M.J.; Investigation, H.B. and H.F.E.; Methodology, H.B., A.L.-C. and H.F.E.; Resources, H.E. and M.J.; Supervision, A.L.-C.; Visualization, M.P.; Writing—original draft, E.C. and H.F.E. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Appendix A**

**Table 1.** List of Benchmark Functions.



**Table 1.** *Cont.*

#### **References**


## *Article* **A Feature-Independent Hyper-Heuristic Approach for Solving the Knapsack Problem**

**Xavier Sánchez-Díaz, José Carlos Ortiz-Bayliss \*, Ivan Amaya, Jorge M. Cruz-Duarte, Santiago Enrique Conant-Pablos and Hugo Terashima-Marín**

> Tecnologico de Monterrey, School of Engineering and Sciences, Monterrey 64849, Mexico; sax@tec.mx (X.S.-D.); iamaya2@tec.mx (I.A.); jorge.cruz@tec.mx (J.M.C.-D.); sconant@tec.mx (S.E.C.-P.); terashima@tec.mx (H.T.-M.)

**\*** Correspondence: jcobayliss@tec.mx

**Abstract:** Recent years have witnessed a growing interest in automatic learning mechanisms and applications. The concept of hyper-heuristics, algorithms that either select among existing algorithms or generate new ones, holds high relevance in this matter. Current research suggests that, under certain circumstances, hyper-heuristics outperform single heuristics when evaluated in isolation. When hyper-heuristics are selected among existing algorithms, they map problem states into suitable solvers. Unfortunately, identifying the features that accurately describe the problem state—and thus allow for a proper mapping—requires plenty of domain-specific knowledge, which is not always available. This work proposes a simple yet effective hyper-heuristic model that does not rely on problem features to produce such a mapping. The model defines a fixed sequence of heuristics that improves the solving process of knapsack problems. This research comprises an analysis of feature-independent hyper-heuristic performance under different learning conditions and different problem sets.

**Keywords:** hyper-heuristics; knapsack problem; optimization

#### **1. Introduction**

The intersection between combinatorial optimization and Hyper-Heuristics (HHs) is a relevant and active area in literature, as Sánchez et al. detailed with their thorough systematic review [1]. The former considers optimization problems where a permutation of feasible values gives candidate solutions. The hardness of these problems could be easy or hard to solve, depending on different parameters. Among them resides a category known as NP-hard problems, for which analytical solvers cannot be scaled due to an excessive computational burden. Therefore, approximate solvers are commonly sought for NP-hard problems, including those based on heuristics. It is here where HHs shine bright. This approach has been deemed as high-level heuristics useful to tackle hard-to-solve problems [2], particularly NP-hard ones [3]. A classification of HHs considers two main groups: selection HHs (those that select heuristics from an available set) and generation HHs (those that create new heuristics using components of existing ones) [4]. Although both groups have proved of great interest to the scientific community, in this work we focus on the former.

Selection HHs deal with problems indirectly. They browse a set of available heuristics, which are selectively applied to solve the problem at hand [5]. A selection HH analyzes a set of available heuristics and chooses the most suitable one according to a given performance metric. Most of the current selection HH models include two key phases: heuristic selection and move acceptance [6]. The former represents the strategy for deciding which heuristic should be selected. Conversely, the latter determines whether the new solution is accepted or discarded. The approach proposed in this work simplifies the overall model by only focusing on heuristic selection. Thus, changes resulting from applying a particular heuristic are always accepted.

**Citation:** Sánchez-Díaz, X. Ortiz-Bayliss, J.C.; Amaya, I.; Cruz-Duarte, J.M.; Conant-Pablos, S.E.; Terashima-Marín, H. A Feature-Independent Hyper-Heuristic Approach for Solving the Knapsack Problem. *Appl. Sci.* **2021**, *11*, 10209. https://doi.org/10.3390/app112110209

Academic Editor:Peng-Yeng Yin

Received: 24 September 2021 Accepted: 26 October 2021 Published: 31 October 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

Evolutionary computation is a recurrent approach among the many learning methods employed in the literature to produce HHs. Some examples include, but are not limited to, Genetic Programming (GP) [5,7–9], Grammatical Evolution (GE) [10], Genetic Algorithms (GA) [11,12], and Artificial Immune Systems (AIS) [13,14]. The literature contains other related proposals, such as those that evolve HHs by analyzing the set of problem instances [15]. These findings support the idea of using an evolutionary strategy as a learning mechanism for HHs, one which improves their performance via crossover or mutation. This work focuses on the latter.

HHs have been used to tackle packing problems in the past. Hart and Sim describe a variant of AIS used as a hybrid HH for the Bin Packing Problem [13,14,16]. Falkenauer [11] proposed a GA-based HH model, and Burke et al. [17], Hyde [8], and Drake et al. [9,18,19] have studied GP rules for the Bin Packing and Multidimensional Knapsack problems. More recent studies of HHs have been conducted on the binary knapsack domain using Ant Colony Optimization [20] and Quartile-related Information [21]. Further information about the state-of-the-art of HHs and their applications are provided in [4,22].

As mentioned above, several combinatorial optimization problems have been tackled with HHs [1]. This paper focuses on the Knapsack Problem (KP), which has been studied in great depth due to its simple structure and broad applicability. Example applications include cargo loading, cutting stock, allocation, and cryptography [23]. When a constructive approach is used to solve this problem, the solution is built one step at a time by deciding if one particular item must be packed or ignored. For simplicity, our setting states that once the process chooses an item, there is no way to change such a decision. Using a constructive approach leads to different subproblems throughout the solving process, depending on the heuristic used. This happens because packing an item produces an instance with a reduced knapsack capacity (the previous knapsack capacity minus the weight of the packed item) and a reduced list of items (the item recently packed or ignored is no longer part of the items to pack). This behavior raises the question of which heuristic, among the different options, should be used to maximize the overall profit resulting from the items contained in the knapsack. The literature usually refers to the problem of selecting the best algorithm for a particular situation as the Algorithm Selection Problem [24].

Despite the extensive use of selection HHs [4,25], only a few works explore the insights of their behavior. A few examples include a run-time analysis [26,27], the use and control of crossover operators [19], and heuristic interaction when applied to constraint satisfaction problems [28]. Furthermore, traditional selection HH models, that represent the relation between problem states and heuristics through condition, action rules, exhibit a significant limitation: they require the definition of a set of features to characterize the problem state [29]. Finding such a set of features implies an additional layer of complexity to the model. To the best of our knowledge, there is only one work on feature-independent HHs for the knapsack problem, which obtained only a few preliminary results [30]. Therefore, our work aims at filling this knowledge gap through two particular contributions:


The remainder of this document is organized as follows. Section 2 defines the fundamental ideas associated with our work. Section 3 describes the HH model and the rationale behind it. Section 4 details the experiments conducted and analyzes the results. Finally, Section 5 presents the conclusions and provides an overview of future directions for this investigation.

#### **2. Background**

This section introduces several basic concepts utilized in this work. It starts by describing the fundamental Knapsack Problem (KP). Then, it presents heuristics in general. Later on, it explains the basic foundations of Evolutionary Algorithms (EAs).

#### *2.1. Knapsack Problem*

In layman's terms, a knapsack problem seeks to store a set of items into a knapsack with limited capacity. Therefore, one must choose a subset of the items based on some criteria. For example, the profit or weight to select the items that must go into the knapsack. In formal terms, let *<sup>x</sup>* <sup>∈</sup> <sup>Z</sup>*<sup>n</sup>* <sup>2</sup> be a binary-valued vector of *n* items, where each element *xi* ∈ *x* represents an item and its selection according to its position and value, respectively, for example, if there is a stock of three items and we choose the second one, we have *x* = (0, 1, 0). Likewise, let *<sup>p</sup>* <sup>∈</sup> <sup>R</sup>*<sup>n</sup>* <sup>+</sup> and *w*- <sup>∈</sup> <sup>R</sup>*<sup>n</sup>* <sup>+</sup> be the profit and weight vectors directly related to the items, i.e., the *i*-th item has profit *pi* ∈ *p* and weight *wi* ∈ *w*- . Thus, this problem (widely known as the 0/1 Knapsack Problem) can be defined as

$$\begin{aligned} \vec{x}\_{\*} &= \underset{\vec{\vec{x}}}{\text{argmax}} \{ \vec{p} \cdot \vec{x} \}, \\ \text{s.t.} \quad \vec{w} \cdot \vec{x} &\le c, \\ \text{with} \quad \vec{x} \in \mathbb{Z}\_{2}^{n} \quad \text{and} \quad \vec{p} \; \vec{w} \in \mathbb{R}\_{+}^{n} \end{aligned} \tag{1}$$

where *<sup>c</sup>* <sup>∈</sup> <sup>R</sup><sup>+</sup> is the total capacity of the knapsack.

Although many solving strategies can be found in the literature, the initial solvers for the knapsack problem were initially divided into two categories: exact methods and approximate ones. Exact methods provide optimal solutions but have limited applications due to their consumption of computational resources [31,32]. Conversely, approximated methods make some assumptions to simplify the solving process to give usually suboptimal solutions [33].

However, the advance in computing power and creativity has lead to extensive diversity of techniques that can be found nowadays, most of them hybrids. To provide a glance at such diversity, we mention some representative works. For example, Genetic Algorithms (GA) and Particle Swarm Optimization (PSO) have been used for solving the multidimensional KP [34,35]. Gómez et al. proposed a Binary Particle Swarm strategy with a genetic operator, also for the multidimensional KP [36]. Another example of recent solving methods includes the one by Razavi and Sajedi, where the authors proposed using Gravitational Search (GSA) for solving the 0/1 KP [37]. Inspired by Quantum Computing, Patvardhan et al. proposed a novel method to solve the KP by using a Quantum-Inspired Evolutionary Algorithm [38]. Modifications to apparently simple ideas can also be found nowadays. For example, Lu et al. solve the 0/1 KP by using Greedy Degree and Expectation Efficiency (GDEE) [39]. Some probabilistic models include Cohort Intelligence (CI) [40] and hybrid heuristics. For example, by using Mean Field Theory [41], Banda et al. generated a probabilistic model capable of replacing a complex distribution of items with an easier one.

The literature on the KP includes a wide variety of solution methods. For a thorough treatment, we refer the reader to the textbooks found in [42,43]. Broadly speaking, the solution methods are of three types: exact methods, heuristic methods, and approximation algorithms (which are special heuristic methods that yield solutions of a guaranteed quality). The best exact methods are now capable of solving instances with thousands of items to proven optimality in reasonable computing times [32,44–46]. However, it is possible to devise instances that are very challenging for exact methods [45]. For this reason, heuristics remain of interest.

#### *2.2. Heuristics*

A heuristic is a procedure that creates or modifies a candidate solution for a given problem instance. There are many classifications of heuristics in the literature. Most of them relate to combinatorial optimization domains [47]. In this work, we use the term 'heuristic' to refer to the low-level operations to apply to a problem instance [4]. An illustrative example is a specific way to organize a knapsack based on item size. In mathematical terms, a heuristic *h* : X → X applied to an instance problem X, may use the current candidate solution *x* ∈ X, and delivers a new candidate *x* as shown,

$$
\vec{\mathfrak{x}}' = h\{\vec{\mathfrak{x}}\}.\tag{2}
$$

Consider that if *x* = ∅, the heuristic creates a candidate solution; otherwise, it modifies the current one.


A HH is described as a high-level strategy that controls heuristics in the process of solving an optimization problem [47]. Therefore, HHs move in the heuristic space to find a heuristic configuration that solves a given problem. With that in mind, a HH can be defined as follows [4]. Let *<sup>h</sup>* <sup>∈</sup> <sup>H</sup> be a heuristic sequence from the heuristic space <sup>H</sup>, and let *<sup>F</sup>*(*h*|X) : <sup>H</sup> <sup>×</sup> <sup>X</sup> <sup>→</sup> <sup>R</sup> be its performance measure function. Here, <sup>X</sup> is the problem domain in an optimization problem with an objective function *f*(*<sup>x</sup>*) : <sup>X</sup> <sup>→</sup> <sup>R</sup>. For the particular case of the knapsack problem, X = Z*<sup>n</sup>* <sup>2</sup> . Then, a solution *x*<sup>∗</sup> ∈ X and its corresponding fitness value *f*(*x*∗) are found when a *h* is applied on X, so its performance *F*(*h*|X) can also be determined. Therefore, let a HH be a technique that solves

$$(\mathsf{h}\_{\*}; \vec{x}\_{\*}) = \underset{\substack{\mathsf{m} \in \mathsf{X}^{\otimes}, \vec{x} \in \mathsf{X}}}{\text{argmax}} \, \{ F(\mathsf{h}|\mathfrak{X}) \}. \tag{3}$$

In other words, a HH searches for the optimal heuristic configuration *h*<sup>∗</sup> that produces the optimal solution *x*<sup>∗</sup> with the maximal performance *F*(*h*∗|X).

It is essential to mention that there also exists something we might categorize as mid-level heuristics: Metaheuristics (MHs). These methods are trendy because of their proven performance on different scenarios and applications [48]. MHs are defined as master strategies that control simple heuristics [49]. Therefore, by finding a middle point between definitions for low and high-level heuristics, it is possible to say that an MH corresponds to a heuristic sequence *h* applied recurrently until reaching the desired performance. This idea has been extended by Cruz-Duarte et al. [50,51]. Thus, for a given optimization problem, we can define the process of approximating the optimal solution *x*<sup>∗</sup> ∈ X via an MH such as

$$\vec{\mathfrak{X}}\_{\*} \stackrel{\scriptstyle \Delta}{=} \coprod\_{h\_{k} \in \mathbf{h}} h\_{k} \{ \mathfrak{X} \} \tag{4}$$

where *hf* is also an operator to evaluate stopping criteria, and И is the iteral operator based on that in [52]. This operator indicates a recurrent application of heuristics from a sequence *h* until *hf* marks the halt. For example, for a MH defined with two operations ( = 2), say crossover (*h*1) and mutation (*h*2), thence *h* = {*h*1, *h*2}. Consider *hf* as a common fitness tolerance criterion such as *f*(*x*) ≤ *ε*. Therefore, the metaheuristic applies first *h*1, followed by *h*2, and then it checks the condition given by *hf* . If such a condition is not met, the process is repeated until it complies.

#### *2.3. Evolutionary Algorithms (EAs)*

EAs are a particular subgroup of the population-based metaheuristics inspired by evolution and biological processes observed in nature [53]. The most notable examples of these methods are Differential Evolution (DE) [54] and Genetic Algorithms (GAs) [55]. In a general sense, the individuals of an EA interact with each other to explore the problem domain and exploit the candidate solutions, i.e., they evolve. Such evolution is possible due to operators such as selection, crossover, mutation, and reproduction. It is easy to notice that these operators are just (low-level) heuristics. In this work, we chiefly employ mutation-based operations, which are detailed in the next section.

#### **3. Proposed Hyper-Heuristic Model**

The HH model developed in this work can be classified as an offline selection HH [56]. Internally, the HH model relies on a variant of the (1 + 1) Evolutionary Algorithm (EA) to find the sequence of heuristics to apply. The original EA flipped each bit with probability 1/ (where is the chromosome length). Conversely, our approach chooses one among various available mutation operators based on a uniform random probability distribution. This EA implementation considers some of the features used by Lehre and Özcan [27]. However, the set of available operators is different as we work with constructive heuristics while they have used perturbative ones.

We choose four simple packing heuristics due to their popularity: H = {Def, MaxP, MaxPW, MinW}. Before moving forward, we describe them below.


Should there be a tie, all heuristics will choose the first conflicting item. Among these heuristics, Def is the fastest one to execute as it involves no additional operations. On the contrary, MaxP, MaxPW, and MinW take longer to compute but usually yield better results than using no order at all (Def). We are aware that more complex heuristics are available in the literature. Still, at this point in the investigation, we consider it suitable to test the proposed HH approach on this set of simple heuristics.

In our model, each HH represents a sequence of heuristics to apply to the problem in one specific order. For simplicity, we refer to its length as the cardinality of the HH. Let us consider the HH with cardinality of five (*<sup>h</sup>* ∈ <sup>H</sup>5) given by *<sup>h</sup>* = {Def, MaxP, MaxP, Def, MinW} (Figure 1). Please note that one of the available heuristics is not contained within the HH, MaxPW. In this HH, Def occupies positions 0 and 3, MaxP occupies positions 1 and 2, and MinW occupies position 4. Let us assume that *h* will solve a 12-item knapsack instance. In this case, there are 12 decisions to be made for solving this instance (*d*<sup>1</sup> to *d*12). While some of the decisions will add an item to the knapsack, others will discard it. As the HH represents a sequence of five heuristics, the first five decisions will be made in the same order in which the heuristics are presented in *h*: Def, MaxP, MaxP, Def, and MinW. After that, there are no more heuristics to choose to make further decisions. Then, the HH is used again, but now backwards: MinW, Def, MaxP, MaxP, and Def. The process repeats by inverting the sequence every time we reach the end of the HH. Although this scheme seems disruptive, we consider it feasible to favor the changes in heuristics throughout the solving process.

**Figure 1.** An example of a hyper-heuristic with a cardinality of five, and the way it solves an instance with 12 items. Black arrows indicate the direction in which heuristics are selected to make a decision.

#### *3.1. Hyper-Heuristic Training*

The learning mechanism within the HH model works as follows. The HH, *<sup>h</sup>* <sup>∈</sup> <sup>H</sup>, is randomly initialized (with a sequence of randomly selected heuristics), and it is used to solve the set of training instances. We register the profit of the solution obtained by *h* for each instance. The performance of the HH, *F*(*h*, X), is then calculated as the average of all the profits obtained for the training set. In our EA implementation, a chromosome represents a HH that codes a sequence of heuristics to apply, i.e., *h*. At each iteration, the process randomly chooses one mutation operator *om* from the available operators O. To do so, a uniform probability distribution is employed. Thus, EA applies it on a copy of the current HH, which produces a candidate HH, according to (2). The model evaluates the candidate HH on the set of training instances. If its performance is greater or equal (to favor diversity) than the current HH, this candidate takes its place. The process is repeated until it reaches the maximum number of allowed iterations *I*max. Pseudocode 1 details this training procedure.

The main goal of the learning mechanism is to produce a HH (an ordered sequence of heuristics) that maximizes the profit of an instance by packing appropriate items into the knapsack. In other words, it solves (1). Thus, the EA does not operate on the solution space X, but on the heuristic space H.

#### **Pseudocode 1** Hyper-heuristic trainer **input:** Set of heuristics H, Set of mutation operators O, Set of instances to train the hyper-heuristic X, Initial cardinality , Performance function *F*(*h*, X), and Maximum number of iterations *I*max **output:** Best selection hyper-heuristic *h*<sup>∗</sup> 1: *h* ← INITIALIZERANDOMLY(H, ) 2: *Fh* ← *F*(*h*, X) Evaluate the current HH 3: **for** i=0 **to** *I*max **do** 4: *h* ←GENERATECANDIDATE(*h*) 5: *Fh* ← *F*(*h* , X) Evaluate the candidate HH 6: **if** *Fh* ≥ *Fh* **then** 7: *h* ← *h* and *Fh* ← *Fh* 8: **end if** 9: **end for** 10: **return** *h*<sup>∗</sup> ← *h* 11: **procedure** GENERATECANDIDATE(*h*) 12: *h* ← CLONE(*h*) 13: *om* ← CHOOSERANDOMLY(O) 14: *h* ← *om*{*h* } Mutate *h* by applying *om* 15: **return** *h* 16: **end procedure**

#### *3.2. Mutation Operators*

The evolutionary process dynamically chooses among eight available mutation operators to alter the candidate hyper-heuristic *h* . Operators contained within set O are described below. To clarify the behavior of such operators, we also provide a brief example of their behavior. Bear in mind that, in all the examples, we always consider the HH depicted in Figure 1 as the HH to mutate, i.e., *h* = {Def, MaxP, MaxP, Def, MinW}.


are valid positions. Should *i* correspond to an edge, the only neighbor is copied. Let us suppose that *i* = 1. In this case, the heuristic to insert would be either Def (*j* = 0) or MaxP (*k* = 2), with the same probability (50%). Imagine that MaxP is selected. Then, and as with the Add operator, cardinality grows to six and the resulting HH is *h* = {Def, MaxP, MaxP, MaxP, Def, MinW}.


Note that neighbor-based operators are allowed to replace the corresponding heuristic with the same one since it is determined by the neighbors. Certainly, other operators can be used but, for the sake of simplicity, we limit ourselves to these eight to formulate O. Finally, consider that there is only one operator that reduces cardinality (Remove), while two of them increase it (Add and Neighbor-based Add), and the remaining five preserve it (Single-point Flip, Two-point Flip, Single-point Neighbor-based Flip, Two-point Neighbor-based Flip, and Swap). Therefore, it is rather expected that trained HHs exhibit a higher cardinality than untrained ones. Because of this, a HH may end up choosing a different amount of items with each heuristic. We believe such flexibility in the learning process may favor more complex interactions between the available heuristics.

#### *3.3. The Knapsack Problem Instances*

In this investigation, we consider four datasets: *α*1, *α*2, *β*1, and *β*2. Dataset *α*<sup>1</sup> is used for training purposes and comprises 400 knapsack problem instances. We synthetically generated such instances for this work by using the algorithm proposed by Plata et al. [57]. This dataset contains a balanced mixture of problem instances, where each heuristic has an equal probability of performing best. Therefore, we are sure that no single heuristic outperforms others on all instances. We replicated the process to produce 400 additional problem instances for testing purposes (dataset *α*2). These two datasets consider instances of 50 items and a knapsack capacity of 20 units. Dataset *β*<sup>1</sup> contains 600 hard instances proposed by Pisinger [45] and featuring 20 items but at different capacities. Finally, dataset *β*<sup>2</sup> consists of 600 additional hard instances (also proposed by Pisinger [45]). In this case, instances have 50 items and, again, exhibit different capacities. A summary of the datasets is provided in Table 1. Bear in mind that for the confirmatory experiments, we analyze the effect of changing the training dataset, but we shall discuss it in more detail further on.


**Table 1.** Datasets used in this work.

#### *3.4. Performance Metrics*

We evaluate performance by considering both absolute and relative performances. All approaches are evaluated based on the profits obtained by their solutions (the sum of the profits of the items packed within the knapsack). The overall performance of a method on a particular set is calculated as the sum of the profits of the solutions produced for all the instances in such a set. However, it is also useful to estimate the relative performance, i.e., w.r.t. the other methods. For this purpose, we also include a performance ranking and the success rate (*ρ*).

The performance ranking is calculated as follows. All methods solve every instance in the set. Then, for each instance, the methods are sorted based on the profit of their solutions. The best one receives a ranking of 1, the second one a ranking of 2, and so on. In the case of ties, the ranking is the average of tied positions. This metric is helpful to identify ties that indicate a similar performance of two or more methods. Conversely, the success rate is calculated as the percentage of instances in a set where a particular method reaches or surpasses a threshold. In this investigation, such a threshold is given per instance and corresponds to the best result among those achieved by the heuristics, i.e., the best profit we can get from using each heuristic individually. For the sake of clarity, we present the success rate as a vector of two components: one where the threshold is achieved and one where it is surpassed. Bear in mind that the second component in the vector can only be achieved by mixing heuristics throughout the solving process, and so it only applies to HHs.

#### **4. Experiments and Results**

We organized our experiments in three stages: preliminary, exploratory, and confirmatory experiments. For the sake of clarity, these stages are further divided into categories. Figure 2 provides an overview of our experimental methodology. In the following lines, we carefully describe each stage, the corresponding experiments, and the results we obtained.

#### *4.1. Preliminary Experiments*

The preliminary experiments were designed first to justify the need for an intelligent way to combine the heuristics (the HHs) and, second, to determine a suitable set of parameters for running the EA to produce such HHs.

#### 4.1.1. Preliminary Experiment I

The first experiment conducted in this work aims to prove that we need an intelligent approach to define the sequences of heuristics. In other words, we justify that it is unlikely for a random sequence of heuristics to achieve competent results. For this purpose, we generated 30 random HHs. The length of these HHs was set to 16 heuristics for empirical reasons. As in this experiment, we were only interested in the behavior of randomly generated HHs, the EA was not used to improve the initial HHs. Then, we used the 30 randomly generated HHs to solve set *α*2, and we compared their results against the ones obtained by the available heuristics (see Section 3 for more details). For the sake of clarity, our comparisons include the results of three representative HHs from the perspective of total profit on set *α*2: the best, the median, and the worst hyper-heuristics.

**Figure 2.** Three-stage methodology followed in this work.

One way to analyze the performance of the methods relies on ranking the resulting data. Table 2 shows the ranks obtained by the four heuristics, as well as the three representative HHs, when solving set *α*2. As one may observe, in most of the cases, the heuristics obtain the best result in isolation (ranks close to 1), as it was expected as set *α*<sup>2</sup> is balanced. Although the best HH never occupies the last positions in the rank, the median performance hints that the good behavior of Best-HH is more likely to be due to chance. Furthermore, by randomly generating the sequences of heuristics, we risk producing something as bad as Worst-HH, where it replicated the worst heuristic for this set, i.e., Def.

Table 2 also shows the total profit obtained for each method on set *α*2, which serves as evidence that it is indeed possible to produce one good HH at random, such that it outperforms several heuristics. However, this seems unlikely to happen by chance since at least half the HHs perform worse than MinW.

Regarding the success rate, the results confirm that generating sequences of heuristics at random is not a good idea. The success rate of the best randomly generated HH was (0.0500, 0.0025), which means that only in 5% of the instances in set *α*<sup>2</sup> this HH produced a solution as competent as the best one among the four heuristics, while in 0.25% of the instances for the same set, the HH improved upon this result.

#### 4.1.2. Preliminary Experiment II

Before moving further in this investigation, we needed to fix the number of iterations for the EA. Therefore, we generated 30 HHs by running the EA for 1000 iterations each time. In all the cases, the initial HH contained sequences of 12 heuristics, and we used *α*<sup>1</sup> as the training set. This time, the cardinality of the HHs was reduced by 1/4 concerning the previous experiment since the mutation operators allow shortening and extending the sequences. Then, we no longer need long initial sequences as with the first preliminary experiment. For each HH, we recorded the Stagnation Point (SP). We define SP as the iteration at which the best solution was first encountered and for which the profit showed no improvement in subsequent iterations. Table 3 shows the stagnation points of the 30 runs of the EA.


**Table 2.** Ranks and total profit obtained by the four heuristics and the best, median, and worst randomly generated hyper-heuristics when tested on set *α*2. The best result, measured in terms of total profit, is highlighted in bold.

**Table 3.** Stagnation points for the first 30 runs of the EA-based hyper-heuristic model.


We used these stagnation points to estimate the maximum number of iterations for running the EA. The average stagnation point among the 30 runs was 150.1, so we rounded it down to 150 iterations. Fifty additional iterations were added just as a precaution to minimize the risk of not reaching a good result. Thus, we set the maximum number of iterations to 200 for the rest of the experiments.

#### *4.2. Exploratory Experiments*

The exploratory experiments comprise a series of tests that cover general aspects of the proposed model, particularly those regarding how it copes with single heuristics on the balanced instance sets (sets *α*<sup>1</sup> and *α*2).

#### 4.2.1. Exploratory Experiment I

In this experiment, the rationale was to test the performance of HHs on similar instances to those they were trained on. Therefore, we produced 30 HHs with an initial cardinality of 12 heuristics each, as in preliminary experiment 2 (Section 4.1.2). Each HH was trained using set *α*<sup>1</sup> for 200 iterations. Afterward, we tested the resulting HHs on set *α*<sup>2</sup> and we compared the data against those obtained by heuristics applied in isolation.

Table 4 presents the ranking of the four heuristics as well as the best, median, and worst HHs produced in this experiment and tested on set *α*2. This table also shows the total profit obtained by each of these methods on the same set. Based on the results obtained (both on ranks and total profit), we observe that the process is forcing the HHs to replicate the behavior of the best performing heuristic for these types of instances (MaxPW). This also means that the HHs are, most of the time, ignoring the remaining heuristics. Although this may seem like a good choice as MaxPW is the best individual performer, the HHs are sacrificing the opportunity to improve their overall performance and outperform the best individual heuristic.

**Table 4.** Ranks and total profit obtained by the four heuristics and the best, median, and worst hyper-heuristics trained on set *α*<sup>1</sup> and tested on set *α*2. The best result, measured in terms of total profit, is highlighted in bold.


#### 4.2.2. Exploratory Experiment 2

In the previous experiment, the EA forced the HHs into replicating the behavior of one heuristic, MaxPW. In this experiment, we try to overcome this situation by reducing the number of instances in the training set. Then, for this experiment, 30 new HHs were produced, but this time, only 60% of the instances in set *α*<sup>1</sup> were used for training purposes. These 60% instances were randomly selected once and used for producing the 30 HHs. As in the previous experiment, the maximum number of iterations for the EA was set to 200 and all the HHs were initialized by using a random sequence of 12 heuristics. We used the 30 HHs to solve set *α*<sup>2</sup> and summarized the results through the rankings and total profit (Table 5).

Based on the results shown in Table 5, we observe a small improvement in Best-HH with respect to MaxPW. Although this supports our initial idea that we can improve the results obtained by the heuristics with a hyper-heuristic, the improvement is rather small and insignificant in practice. Furthermore, the success rate of Best-HH is (0.285, 0.0), which is exactly the same as MaxPW.

For a more in-depth analysis about the performance of the best HH in this experiment, we plotted the performance of Best-HH, as well as of the best and worst heuristic for each particular instance in set *α*2. For clarity, the results are sorted (from larger to smaller) on the profit obtained by Best-HH on each instance in the set. Figure 3 depicts the profit of each method across all instances. As we can observe, there are many cases where Best-HH obtains a smaller profit than the best heuristic for each particular instance.


**Table 5.** Ranks and total profit obtained by the four heuristics and the best, median, and worst hyper-heuristics trained on 60% of set *α*<sup>1</sup> and tested on set *α*2. The best result, measured in terms of total profit, is highlighted in bold.

**Figure 3.** Profit per instance obtained by the best hyper-heuristic trained on 60% of set *α*<sup>1</sup> and the best and worst heuristic per instance on set *α*2.

4.2.3. Exploratory Experiment III

So far, we have observed that, even though it is possible to overcome the best individual performer for each instance in some specific cases, oftentimes the HHs tend to replicate the behavior of the overall best heuristic (MaxPW for sets *α*<sup>1</sup> and *α*2). Although this is not a bad result—the model produces very competent HHs, it is difficult to justify the additional

time devoted to producing such HHs given the small gains with respect to MaxPW. For this reason, in this experiment we wanted to explore the case where the HHs can only choose among Def, MaxP, and MinW (all the available heuristics but MaxPW) and evaluate if the HHs produced may still be considered competent. As in the previous experiments, we generated 30 HHs by training on set *α*<sup>1</sup> for 200 iterations each, and testing on set *α*2. In all the cases, the HHs have an initial cardinality of 12 heuristics.

A comparison between the total profits of Best-HH and MaxPW (Table 6) reveals that a significant efficiency was lost due to the removal of MaxPW from the pool of heuristics (Figure 4). However, all three representative HHs (best, median, and worst) obtained the second rank in terms of total profit. The profit of Best-HH shows an improvement of nearly 6% and over 64% with respect to the MinW and Def heuristics, respectively. Therefore, the model can learn in harsh conditions and thus obtains promising results regardless of whether it was given a pool of heuristics with limited quality.

**Figure 4.** Profit per instance obtained by the best hyper-heuristic trained on set *α*<sup>1</sup> (removing MaxPW from the pool of available heuristics) and the best and worst heuristic per instance on set *α*<sup>2</sup> (including MaxPW in the comparison).

#### *4.3. Confirmatory Experiments*

Seeking to test the generalization properties of the proposed HH model, we conducted three additional experiments that now incorporate problem instances taken from the literature. These experiments were conducted in a similar way to the exploratory ones: each one involves the generation of 30 HHs with an initial cardinality of 12, which were trained for 200 iterations. However, this time we tried some combinations of the instance sets for training and testing. The rationale behind these experiments is to observe how well the HHs adapt to changes in the properties of the instances being solved with respect to the instances used for producing such HHs.


**Table 6.** Ranks and total profit obtained by the four heuristics and the best, median, and worst hyperheuristics trained on set *α*<sup>1</sup> and tested on set *α*2, but MaxPW is not available for the hyper-heuristics. The best result, measured in terms of total profit, is highlighted in bold.

#### 4.3.1. Confirmatory Experiment I

So far, we have evaluated the performance of HHs on instances similar to the ones used for training, under various scenarios. In the first confirmatory experiment, we use all the instances from set *α*<sup>1</sup> to train the HHs, but test their performance on set *β*1. Let us recall that set *β*<sup>1</sup> contains 600 hard instances proposed by Pisinger [45] and feature 20 items and different capacities. The relevant issue regarding set *β*<sup>1</sup> is that such instances are considered hard to solve.

Table 7 shows that MaxPW is, once again, the best individual performer in this experiment. Although all the produced HHs perform similarly (cf. Worst-HH), they are unable to improve upon the results obtained by the best heuristic. Nonetheless, all the HHs produced obtained a total profit larger than those obtained by the remaining heuristics.

**Table 7.** Ranks and total profit obtained by the four heuristics and the best, median, and worst hyper-heuristics trained on set *α*<sup>1</sup> and tested on set *β*1. The best result, measured in terms of total profit, is highlighted in bold.


Even if the HHs were incapable of overcoming MaxPW in this experiment, it is interesting to see how close their performance is, especially as these HHs were trained with instances of a different type than those used for testing.

#### 4.3.2. Confirmatory Experiment II

In the previous experiment, we evaluated the performance of HHs trained on balanced sets of instances when tested on hard instances, and we observed a limited capacity to deal with such instances. In this experiment, we show that this behaviour is not due to the hardness of the instances, but because of the discrepancy between the instances used for training and the ones used for testing. For this reason, this experiment is twofold: (1) we test how HHs behave when trained and tested on hard instances (set *β*1) and (2) we try to reduce the effect of MaxPW in the resulting HHs by reducing the number of instances in the training set. Then, we produced 30 new HHs using only 60% of the instances in set *β*<sup>1</sup> for training purposes. As in previous experiments, the 60% instances were randomly selected once and used for producing the 30 HHs. For consistency, the maximum number of iterations for the EA was set to 200 and all the HHs were initialized to a cardinality of 12. The HHs produced in this experiment were evaluated on the remaining 40% of the instances in set *β*1. The results for rankings and total profit derived from this experiment are summarized in Table 8.

**Table 8.** Ranks and total profit obtained by the four heuristics and the best, median, and worst hyper-heuristics trained on 60% of set *β*<sup>1</sup> and tested on the remaining 40% of set *β*1. The best result, measured in terms of total profit, is highlighted in bold.


This experiment confirms the importance of the instances used for training and their similarity with those solved during the test process. This time, the performance of most of the hyper-heuristics produced using 60% of set *β*<sup>1</sup> when tested on the remaining 40% of set *β*<sup>1</sup> improve on MaxPW (the best heuristic for this set). In fact, the performance of Worst-HH is practically the same as MaxPW (Figure 5). Although the results suggest that an improvement in the total profit can be obtained by using the proposed hyper-heuristic approach, the gain in profit derived from using Best-HH instead of MaxPW is rather small (<0.5%). However, this result should not be interpreted as an indication that Best-HH is replicating the behavior of MaxPW. In fact, their overall profit is similar (with a small gain for Best-HH but their specific performance is not. As depicted in Figure 6, when the ranks are analyzed, we observe that Best-HH obtains better results in more cases than the best individual heuristic, MaxPW.

**Figure 5.** Profit per instance obtained by the best hyper-heuristic trained on 60% of set *β*1, and the best and worst heuristic per instance on the remaining 40% of set *β*1.

**Figure 6.** Frequency of the ranks obtained by MaxPW and Best-HH (trained on 60% of set *β*1) the remaining 40% of set *β*1.

#### 4.3.3. Confirmatory Experiment III

In this final experiment, we extend our analysis to hard and larger instances. This time, we produced 30 HHs by using 60% of set *β*<sup>2</sup> and tested them on the remaining 40% of the same set. With this experiment, we validate that the learning method is quite stable as it still produces competent hyper-heuristics. Despite the fact that the set comprised instances with different features, all three cases beat the best operator (MaxPW) in isolation. Additionally, setting a good training set seems to impact the efficiency of the HH model.

Training done on *α*<sup>1</sup> seemed to negatively affect the results, as seen on exploratory experiment I and confirmatory experiment I. It is important to note that sets *α*<sup>1</sup> and *α*<sup>2</sup> are balanced and synthetically produced. Thus, 25% of the problem instances are designed to maximize the performance of a single low-level heuristic. This pattern is repeated for all the heuristics in this set. Conversely, hard problem instances from sets *β*<sup>1</sup> and *β*<sup>2</sup> are randomly sampled (without replacement) using three different seeds. This training scheme is more representative of real-life applications, where often no balanced or ideal conditions are met.

The model's behavior is similar to what we observed in previous cases. The hyperheuristics trained on hard and larger instances are still competitive regarding the single heuristics applied in isolation. This time, the improvement on MaxPW is smaller than in previous cases (<0.01%). Table 9 shows that, as in previous cases, there is a difference in the behavior of the heuristics and HHs in terms of ranks, but it does not necessarily represent a significant difference in terms of total profit. However, when we analyze the success rate, we observe that Best-HH obtains promising results, (0.6542, 0.0958). This indicates that in little more of 65% of the instances in the 40% of set *β*<sup>2</sup> used for testing, Best-HH was equal in performance to the best out of the four heuristics. Besides, in almost 10% of the instances for the same set, Best-HH improved upon this result.

**Table 9.** Ranks and total profit obtained by the four heuristics and the best, median, and worst hyper-heuristics trained on 60% of set *β*<sup>2</sup> and tested on the remaining 40% of set *β*2. The best result, measured in terms of total profit, is highlighted in bold.


#### *4.4. Discussion*

We would like to discuss the rationale behind the proposed model and why it may be useful for other problem domains besides the one studied in this document.

As with other packing problems, the iterative nature of KP makes it a great candidate for a hyper-heuristic approach. The inherent mapping of problem states to decisions (or, in this case, packing heuristics) can lead to an optimal item selection by overfitting if the relationship were to be searched exhaustively. A generalization of this item selection is the basis of the rationale behind this approach. Furthermore, the advantages of mixing heuristics have previously been discussed in detail for various optimization and search problems [27,58].

The evidence obtained from the experiments confirms that it is possible to produce a sequence of heuristics that provides an acceptable performance when tested on a set of instances. Allow us to explain this in more detail. Let us assume that a hyper-heuristic HH1 is to be produced for solving only one KP instance with *n* items, KP1. If the cardinality of the hyper-heuristic is equal to the number of items to pack ( = *n*), then HH1 can decide which heuristic to use for packing each item. Among all the possible HHs that could be produced, there is one where all the decisions are correct, HH∗ <sup>1</sup>, which represents the optimal sequence to pack the items in KP1, given the available heuristics. Now, let us produce a new hyper-heuristic HH∗ <sup>2</sup>, the best sequence of heuristics for solving a second KP instance, KP2, also containing *n* items (the simplest case for the analysis). There is no guarantee that the previous hyper-heuristic, HH∗ <sup>1</sup>, will also represent the optimal sequence of heuristics for solving KP2. If we keep the idea of having one individual decision per item, only a few of these decisions will be the same for both sequences, HH∗ <sup>1</sup> and HH<sup>∗</sup> <sup>2</sup>. In order to merge the two sequences, some errors must be accepted. The task of the evolutionary framework is to decide which errors to accept so that the performance is the best among all the instances in the training set. Overall, the model is not looking for the best sequence of heuristics for each particular instance but the best sequence to solve, acceptably, all the instances in the training set.

#### **5. Conclusions and Future Work**

In this work, we analyzed the efficiency of a selection HH which does not depend on problem characterization. The analysis showed that the learning mechanism deals very well on all instances despite its simple parameters. For small instances like the ones in sets *α*<sup>1</sup> and *α*2, the MaxPW heuristic seemed like the most suitable heuristic in isolation. The instances used for training were generated with one packing heuristic in mind. Instead, instances in the literature are considered harder and represent a challenge for a single heuristic in isolation. It is also important to note that the instance sets may be considered small, so the learning process was not computationally expensive. For larger datasets with more instances to solve, a HH could perform better if one has adequate resources for the learning process.

Our work proves the feasibility of a feature-independent hyper-heuristic approach powered by an evolutionary algorithm. The results confirm our idea that it is possible to generate generic sequences of heuristics that perform better than individual heuristics for specific sets of instances. Our results also demonstrate that the similarity between the training and testing instances influences the model's performance. In other words, the model can generalize to unseen types of instances, but the ideal scenario would be to train the hyper-heuristics on instances with similar features to the ones to be solved when training is over. At this point, we consider it fair to mention that we are aware of the diversity of the optimization problems. We have selected the KP because it was convenient for our study, as we can easily develop and test hyper-heuristics for this particular problem. However, many other exciting problems could have also been considered in this regard. Our interest was to propose a new hyper-heuristic method to deal with instances that are hard to solve by exact methods.

Many considerations could be taken into account to improve the analysis. Fixing the size of HHs and setting a single heuristic per gene in the model may impact the frequency analysis. Adding more mutation operators and tweaking their probability distribution is also something to consider for future work. More importantly, extending the amount of packing heuristics to choose from, though increasing the heuristic search space, may display more explicit differences between heuristic sequences. Once again, we would like to emphasize that the selection HH proposed did not deal with any problem characterization or feature analysis. Adding problem characterization may improve the performance of the learning process even more at the expense of some additional computational effort.

Although we have not tested the proposed model on other NP-hard optimization problems, we are confident that the model can work correctly on similar problems such as packing and scheduling. Based on our experience, those problems show similar properties that allow hyper-heuristics to grasp the structure of the instances and decide which heuristic to apply at certain times of the solving process. Therefore, the proposed model should perform properly. Nonetheless, testing our approach on other challenging problems is, undoubtedly, a path for future work.

**Author Contributions:** Conceptualization, X.S.-D. and J.C.O.-B.; methodology, X.S.-D., I.A. and J.M.C.-D.; software, X.S.-D. and J.C.O.-B.; validation, I.A. and J.M.C.-D.; formal analysis, X.S.-D.; investigation, X.S.-D., H.T.-M. and S.E.C.-P.; resources, J.C.O.-B.; data curation, X.S.-D.; writing original draft preparation, X.S.-D.; writing—review and editing, J.C.O.-B., I.A. and J.M.C.-D.; visualization, X.S.-D., I.A. and J.M.C.-D.; supervision, H.T.-M. and S.E.C.-P.; project administration, J.C.O.-B.; funding acquisition, I.A. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was partially funded by the research group with strategic focus in intelligent systems at ITESM, grant NUA A00834075, and CONACyT Basic Science projects with grant number 287479 and fellowship 2021-000001-01NACF-00604.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **A Memetic Algorithm for the Cumulative Capacitated Vehicle Routing Problem Including Priority Indexes**

**Samuel Nucamendi-Guillén †, Diego Flores-Díaz †, Elias Olivares-Benitez \*,† and Abraham Mendoza †**

Facultad de Ingenieria, Universidad Panamericana, Álvaro del Portillo 49, Zapopan, Jalisco 45010, Mexico; snucamendi@up.edu.mx (S.N.-G.); 0198749@up.edu.mx (D.F.-D.); amendoza@up.edu.mx (A.M.)

**\*** Correspondence: eolivaresb@up.edu.mx

† These authors contributed equally to this work.

Received: 27 April 2020; Accepted: 1 June 2020; Published: 5 June 2020

**Abstract:** This paper studies the Cumulative Capacitated Vehicle Routing Problem, including Priority Indexes, a variant of the classical Capacitated Vehicle Routing Problem, which serves the customers according to a certain level of preference. This problem can be effectively implemented in commercial and public environments where customer service is essential, for instance, in the delivery of humanitarian aid or in waste collection systems. For this problem, we aim to minimize two objectives simultaneously, the total latency and the total tardiness of the system. A Mixed Integer formulation is developed and solved using the AUGMECON2 approach to obtain true efficient Pareto fronts. However, as expected, the use of commercial software was able to solve only small instances, up to 15 customers. Therefore, two versions of a Memetic Algorithm with Random Keys (MA-RK) were developed to solve the problem. The computational results show that both algorithms provided good solutions, although the second version obtained denser and higher quality Pareto fronts. Later, both algorithms were used to solve larger instances (20–100 customers). The results were mixed in terms of quality but, in general, the MA-RK v2 consistently outperforms the first version. The models and algorithms proposed in this research provide useful insights for the decision-making process and can be applied to solve a wide variety of business situations where economic, customer service, environmental, and social concerns are involved.

**Keywords:** open vehicle routing; integer programming; split deliveries; logistic distribution; mixed integer formulations

**MSC:** [2010] 90C11; 90B06

#### **1. Introduction**

In this paper, we study the biobjective variant of the Cumulative Capacitated Vehicle Routing Problem (CCVRP), a "customer-centric" variant of the classical Capacitated Vehicle Routing Problem (CVRP) [1] in which a fleet of *k* vehicles serves a set of customers by respecting their priority, defined as an index assigned to each node to indicate the preference to be served. Unlike the traditional VRP, which focuses on the impact that routing costs have on logistics and, in particular, in the transportation activities within the supply chain, the CCVRP rises as a particularization that covers broad objectives centered around service level issues. This problem is relevant in contexts where both customer satisfaction and company profits are prioritized. Due to the importance of sustainable business practices nowadays, there is a need to develop distribution strategies aimed at reducing the negative impact that transportation activities have on the environment. An important application can also be found in the context of emergency logistics, where the distribution of medical aid becomes crucial, particularly when the distribution of different types of drugs or supplementary medical equipment have different levels of importance.

The CCVRP was introduced by Ngueveu et al. [2] as a generalization of the well known *k*-Traveling Repairmen Problem (*k*-TRP), with the objective of addressing problems in which customer metrics reflect the need for fast, equity and fair services. Since then, a diverse number of scenarios have been addressed: where a single vehicle can travel multiple trips [3,4], considering stochastic demand and split/unsplit deliveries [5], or when multiple depots are available [6]. Specifically, the CCVRP with priorities arises in the context of modeling situations in which customers have a differentiated level of attention and has attracted the interest of researchers over the past years due to its applicability in fields such as emergency logistics (i.e., level of damage), delivery logistics (i.e., delivery of different perishable items), personnel transportation, school bus transportation, machine scheduling considering due dates, maintenance operations including delivery dates, receiver wireless communications, etc.

As in the traditional VRP and its known variants, exact solution methods, heuristics, and metaheuristic algorithms are proposed to solve the CCVRP. Karagul et al. [7] pointed out that optimal solutions are possible to obtain in small-scale problems using exact solution methods. However, large-scale problems are difficult and time-consuming to solve to optimality. Hence, when it comes to real-life optimization, where problems are complex and deal with a significant amount of data, sometimes it is enough to find approximate solutions through the use of heuristic and metaheuristic methods.

Before the formal introduction of the problem in 2010, Kara et al. [8] developed a preliminary research studying a particular version of the CCVRP, named as CumVRP, in which the objective consists of minimizing the sumproduct of the arrival times and the demand of the node. Additionally, the CCVRP has been mainly studied from the monobjective function perspective. For this, several contributions involving mathematical models [9,10], exact algorithms [9,11], and heuristic and metaheuristic approaches [2,4,12–14] have been developed.

Previously, Sbihi and Eglese [15] dealt with the time-dependent routing problem (TDVRP) and applied this approach to green logistics. They established that the use of time-dependent models could obtain optimal solutions that produce less pollution by directing vehicles to roads where they can travel at faster speeds, which means away from congestion zones. However, this could represent traveling longer distances. Another benefit of this model is that it allows time window constraints to be satisfied more reliably. On the other hand, Kwon, Choi, and Lee [16] presented the heterogeneous vehicle routing problem that determines a set of vehicle routes that satisfies customer demands and vehicle capacities and minimize the sum of variable operating costs. An integer programming model was used, and for more complex problems, Tabu Search algorithms were developed to find solutions within a reasonable computational time. The results show that the amount of carbon emissions could be reduced by carbon trading without increasing the total costs.

Regarding the study of biobjective vehicle routing problems considering the customer-centric approach, few contributions can be found in the literature: the traveling repairman problem with profits (TRPP) [17], the *k*-Traveling Repairmen Problem with Profits (*k*-TRPP) under uncertainty [18], and a biobjective study of the minimum latency problem [19]. In the first work [17], the authors considered the *k*-TRP with the objective function that monotonically decreases over time. For this problem, they proposed a mathematical formulation. However, since the model should be solved separately for each path, the solution framework can be time-consuming, motivating the development of a Tabu-search procedure. The second research [18] addresses the *k*-TRP but enables the decision-maker to manage and control risk. For this, the authors proposed a mean-risk formulation and a heuristic approach based on an adaptive neighborhood search for the *k*-TRPP under uncertainty, providing high-quality results for small and medium-size instances. Finally, the most recent work [19] studies the Minimum Latency-Distance Problem (MLDP), analyzing the conflict between the total distance traveled by the vehicles and the total waiting time of the customers. The authors proposed a mathematical formulation and two heuristic procedures inspired on evolutionary algorithms, obtaining optimal

results for instances up to 40 customers and extending the computational experimentation over the metaheuristics for instances up to 256 customers.

Considering the aforementioned discussion, to the best of our knowledge, no tailored approach has been proposed in the literature for the problem considering the risk-averse perspective. For this reason, in this paper ,we study the Biobjective Cumulative Capacitated Vehicle Routing Problem considering Profits (BCCVRP-Pr). To handle this new problem, we propose a mixed-integer formulation and a MA-RK procedure to efficiently deal with instances of reasonable size.

Elshaer and Awad [20] studied the diversity of variants of vehicle routing problems solved using metaheuristics, where eleven papers were identified using memetic algorithms. In a recent example, Li et al. [21] proposed a hybrid metaheuristic that combines a memetic algorithm, sequential variable neighborhood descent, and a revised 2-opt method, for the plug-in hybrid electric vehicle routing problem. In addition, Zhang et al. [22] developed a multiobjective memetic algorithm for the vehicle routing problem with time windows. To our knowledge, only Ngueveu et al. [2] applied memetic algorithms to the cumulative capacitated vehicle routing problem. They created two versions of their procedures to solve a single objective problem to minimize the sum of arrival times at the customers.

On the other side, memetic algorithms have been combined with random keys in some applications. One example is the hybridization of He et al. [23], where memetic algorithms were combined with a biased random key genetic algorithm and adaptive large neighbourhood search to solve a scheduling problem. Other applications to scheduling problems using memetic algorithms and random keys were proposed by Li and Yin [24] and Ghrayeb and Damodaran [25], among others. Although this combination of memetic algorithms and random keys have been used to solve sequencing problems, such as in the traveling salesman problem [26], this combination has not been used to solve complex routing problems such as the one presented in this paper.

The literature described above shows the novelty of the combination of memetic algorithms and random keys to solve complex multiobjective routing problems.

The remainder of this paper is organized as follows. Section 2 describes the proposed mathematical formulation, whereas Section 3 presents the algorithm developed. Section 4 reports the results obtained from the experimentation using a set of benchmark instances. Finally, Section 5 summarizes the major findings and proposes future research directions.

#### **2. Mathematical Formulation**

In this section, we present the formal definition of the BCCVRP-Pr as well as its corresponding mathematical formulation. First, the following parameters and variables are considered:

#### **Parameters**


The BCCVRP-Pr can be formally defined as an undirected graph *G* = (*V*, *A*) where *V* = {0, 1, 2, ..., *N*} denote the node set and *A* is the set containing all arcs. The node 0 corresponds to the depot and the rest of the nodes form the set of customers *V* = {1, 2, ..., *N*}. Each arc (*i*, *j*) ∈ *A* has an associated travel time *cij* and each node *j* ∈ *V* has an associated demand *dj*. A heterogeneous fleet of *<sup>K</sup>* vehicles is available, each with capacity *<sup>Q</sup>k*, *<sup>k</sup>* ∈ {1, 2, ..., *<sup>K</sup>*}. It is assumed that the vehicle's overall capacity is enough to satisfy the total demand of all the clients. In addition, to consider the priority, a precedence matrix *P* is defined in which *pij* = 1 represents that customer *i* should be serviced before customer *j* and *pij* = 0 means that customer *i* can be served after the customer *j*.

The following additional parameters are considered:


The goal is to find *k* tours that have only node 0 in the first position, and to cover all the nodes, while minimizing the sum of the latencies of the trips. The demand of all customers must be satisfied without exceeding the capacity of each vehicle. Customers should be served (preferably) according to their priority level, minimizing the total tardiness of the system.

For each path, the tardiness arises when a customer with lower priority is served before a customer with higher priority (even if both customers belong to different routes). In other words, the arrival time (*ti*) for a customer with a lower priority index is less than the arrival time of a customer with a higher priority index (*tj*). Qualitatively, tardiness is associated with customer dissatisfaction. However, because latency is estimated as a function of distance, then tardiness is obtained as the difference between their arrival times (when *pij* = 1), *Iij* = *tj* − *ti*. Hence, the total tardiness of the system is computed, as shown in Equation (1):

$$Total\text{ }tardines = \sum\_{i \in V'} \sum\_{j \in V'} I\_{ij} \tag{1}$$

Figures 1 and 2 graphically exhibit the conflict among the values of latency and tardiness for an instance that contains 12 nodes. The number above each node denotes the customer priority index (higher values indicate higher priorities). In Figure 1, the minimum total tardiness was obtained subject to the minimum total latency. Correspondingly, Figure 2 indicates the optimal total latency while assuring the minimum total tardiness.

**Figure 1.** Solution routes minimizing the total latency: total latency = 1595.31; total tardiness = 4304.55.

**Figure 2.** Solution routes minimizing the total tardiness: total latency = 2972.84; total tardiness = 0.

The formulation is based on the model presented in [10] for the classical CCVRP and following the single flow formulation proposed for the multiple traveling salesman problem [27].

#### **Variables**

*wk* <sup>0</sup>*<sup>j</sup>* = # 1, if the vehicle *k* is assigned to a path from node 0 to customer *j*, 0, otherwise.

$$\mathbf{x}\_{ij} \quad = \begin{cases} 1, & \text{if the arc } (i, j) \text{ is in the path of a vehicle,} \\ \alpha & \text{...} \end{cases}$$

0, otherwise.


The corresponding mathematical model is stated as follows:

$$\min F\_1 = L = \sum\_{i \in V'} c\_{0i} y\_{0i} + \sum\_{i \in V'} \sum\_{\substack{j \in V' \\ j \neq i}} c\_{ij} y\_{ij} \tag{2}$$

$$\min F\_2 = T = \sum\_{i \in V'} \sum\_{\substack{j \in V' \\ j \neq i}} I\_{ij\prime} \tag{3}$$

subject to:

∑ *j*∈*V* ∑ *k*∈*K wk* <sup>0</sup>*<sup>j</sup>* = *K*, (4) ∑ *i*∈*V xi*<sup>0</sup> = *K*, (5) ∑ *j*∈*V* ∑ *k*∈*K wk* <sup>0</sup>*<sup>j</sup>* = 1, ∀*k* ∈ *K*, (6) ∑ *k*∈*K wk* <sup>0</sup>*<sup>j</sup>* + ∑ *i*∈*V xij* = 1, ∀*j* ∈ *V* , (7) ∑ *j*∈*V xij* = 1, ∀*i* ∈ *V* , (8) *y*0*<sup>j</sup>* + ∑ *i*∈*V i*=*j yij* − ∑ *i*∈*V i*=*j yji* = 1, ∀*j* ∈ *V* , (9) *<sup>y</sup>*0*<sup>j</sup>* ≥ ∑ *k*∈*K wk* 0*j* , ∀*j* ∈ *V* , (10) *yij* ≥ *xij*, ∀*i* ∈ *V* ; ∀*j* ∈ *V* ; *i* = *j*, (11) *<sup>y</sup>*0*<sup>j</sup>* ≤ (*<sup>N</sup>* − *<sup>K</sup>* + <sup>1</sup>) ∑ *k*∈*K wk* 0*j* , ∀*j* ∈ *V* , (12) *yij* ≤ (*<sup>N</sup>* − *<sup>K</sup>*) ∑ *k*∈*K xij*, ∀*i* ∈ *V* ; ∀*j* ∈ *V* , (13) *vk* <sup>0</sup>*<sup>j</sup>* <sup>≥</sup> *djw<sup>k</sup>* 0*j* , ∀*j* ∈ *V* ; ∀*k* ∈ *K*, (14) *vk* <sup>0</sup>*<sup>j</sup>* <sup>≤</sup> *<sup>Q</sup>kwK* 0*j* , ∀*j* ∈ *V* ; *k* ∈ *K*, (15) *vij* ≥ *djxij*, ∀*i* ∈ *V* ; *j* ∈ *V* ; *i* = *j*, (16) *vij* ≤ (*Qmax* − *di*)*xij*, ∀*i* ∈ *V* ; *j* ∈ *V*; *i* = *j*, (17) ∑ *k*∈*K vk* <sup>0</sup>*<sup>j</sup>* + ∑ *i*∈*V i*=*j vij* − ∑ *i*∈*V i*=*j vji* = *dj*, ∀*j* ∈ *V* , (18) *t k* <sup>0</sup>*<sup>j</sup>* = *<sup>c</sup>*0*jw<sup>k</sup>* 0*j* , ∀*j* ∈ *V* ; *j* ∈ *V* ; *k* ∈ *K*, (19) *t k* <sup>0</sup>*<sup>j</sup>* <sup>≤</sup> *Mw<sup>k</sup>* 0*j* , ∀*i* ∈ *V* ; *j* ∈ *V* ; *i* = *j*; *k* ∈ *K*, (20) *tij* ≥ *cijxij*, ∀*i* ∈ *V* ; *j* ∈ *V* ; *i* = *j*, (21) *tij* ≤ (*M* − *cij*)*xij*, ∀*i* ∈ *V* ; *j* ∈ *V*; *i* = *j*, (22) ∑ *h*∈*V h*=*i tih* − ∑ *h*∈*V h*=*i thi* − ∑ *k*∈*K t k* <sup>0</sup>*<sup>i</sup>* = ∑ *j*∈*V j*=*i cijxij*, ∀*i* ∈ *V* , (23) *Iij* <sup>≥</sup> *pij* ∑ *k*∈*K t k* <sup>0</sup>*<sup>i</sup>* + ∑ *h*∈*V h*=*i thi* − ∑ *h*∈*V h*=*j thj* − ∑ *k*∈*K t k* 0*j* , ∀*i* ∈ *V* ; *j* ∈ *V* ; *i* = *j*, (24)

$$w\_{0j}^k \in \{0, 1\}, \tag{25} \tag{25} \\ \qquad \qquad \qquad \qquad \forall j \in V'; \forall k \in K\_\prime \tag{25}$$

$$x\_{i\bar{i}} \in \{0, 1\}, \tag{26}$$

$$\forall i \in V'; \forall \bar{i} \in V,\tag{26}$$

$$\begin{aligned} \forall i j \ge 0, \quad & \forall i \in V; \forall j \in V, \\ \upsilon\_{0j}^k \ge 0, \quad & \forall j \in V'; k \in K, \end{aligned} \tag{27}$$

$$\begin{aligned} v\_{0j}^{\kappa} \ge 0, \quad & \forall j \in V'; k \in K, \\ v\_{ij} \ge 0, \quad & \forall i \in V'; j \in V, \end{aligned} \tag{28}$$

$$\dot{\hat{\imath}}\_{0j} \ge 0,\tag{30} \tag{31} \\ \qquad \qquad \qquad \qquad \forall j \in V'; k \in \mathbb{K}\_{\prime} \tag{30}$$

$$t\_{ij} \ge 0,\tag{31}$$

$$\forall i \in V'; j \in V,\tag{31}$$

$$I\_{\vec{\mathbb{N}}} \ge 0,\tag{32}$$

$$\forall i \in V'; j \in V'.\tag{32}$$

In this formulation, the objective functions in Equations (2) and (3) seek a trade-off between the objective of the total latency and the total tardiness of the system. The constraints in Equations (4) and (5) ensure that exactly *K* arcs leave and return to the depot. The constraints in Equation (6) establish that a customer node can be visited for exactly one vehicle coming from the depot. Additionally, the constraints in Equations (7) and (8) impose that exactly one arc enters and leaves for each node associated with a customer. The constraints in Equations (10)–(13) help to avoid sub-tours and allow to calculate the position of the customer *j* on its respective route. The constraints in Equations (12) and (13) force variables *y*0*<sup>j</sup>* and *yij* to be zero when *w<sup>k</sup>* <sup>0</sup>*<sup>j</sup>* = 0 and *xij* = 0, respectively. Regarding the capacity of the vehicles, the constraints in Equations (14) and (15) allow establishing lower and upper bounds for the cumulative demand of any route. These constraints are derived from a generalization of restrictions in Equations (10) and (11). The constraints in Equations (16) and (17) force variables *v<sup>k</sup>* 0*j* and *vij* to be zero when variables *w<sup>k</sup>* <sup>0</sup>*<sup>j</sup>* = 0 and *xij* = 0, respectively. The constraints in Equation (18) ensure that the demand at each node *i* is fulfilled and in conjunction with Equations (16) and (17) estimate the load per vehicle. The constraints in Equations (19)–(23) control the arrival time to the nodes (customers). The constraints in Equation (24) estimate the tardiness (in case of violating the priority constraints). Finally, the constraints in Equations (25)–(32) establish the nature of the variables.

#### *Reformulation Using Epsilon Constraint*

*t k*

In this subsection, we describe the characteristics of the model and the proposed reformulation. A fundamental task in multiobjective optimization is to find Pareto-optimal solutions. As a biobjective approach, we decided to implement a multiobjective method of resolution to generate an exact front of efficient solutions.

In the mathematical model, we note that the objective functions are separable. In other words, each of them involves different decision variables. On the one hand, *yij* allows estimating the total arrival time to the customers. On the other hand, *Iij* computes the total tardiness for the case in which customers having a minor priority are served earlier than customers with higher priority index (regardless if they are located in the same route or belong to different routes).

To clarify this, the particular structure of the biobjective problem proposed herein is described below:

$$\min F\_1 = L = \sum\_{i \in V'} c\_{0i} y\_{0i} + \sum\_{i \in V'} \sum\_{\substack{j \in V' \\ j \neq i}} c\_{ij} y\_{ij} \tag{33}$$

$$\min F\_2 = T = \sum\_{i \in V'} \sum\_{\substack{j \in V' \\ j \neq i}} I\_{ij} \tag{34}$$

#### subject to : *Equations*(4)–(24)

The aforementioned characteristics of the biobjective model are exploited by using an improved version of the *ε*-constraint method, named as AUGMECON2 [28], as a solution procedure. For every single routing decision in Equations (4)–(18), the minimum tardiness (min *T*) problem bounded by the constraints in Equations (19)–(24) is solved as principal objective function, transforming the latency function (*L*) into constraint. The results of the proposed method are shown in Section 4.

#### **3. Metaheuristic Algorithm**

This section is devoted to describe the metaheuristic approach capable of obtaining high quality solutions for small instances and able to deal with large size instances. The proposed method is based on a Memetic Algorithm (MA), a population procedure that has shown its effectiveness in solving sizeable combinatorial optimization problems by incorporating a local search procedure within a classical genetic algorithm. This procedure has been successfully applied for addressing the CCVRP [2], introducing efficient move evaluation procedures in operations *O*(1) for some particular neighborhood structures. MAs have been also employed in solving other routing problems such as split delivery vehicle routing problems [29], capacitated location routing problems [30,31], vehicle routing problems with time windows [32], school bus routing problems [33], and green and healthcare routing problems [34,35].

#### *3.1. Proposed Memetic Algorithm*

Holland [36] was the first to propose Genetic Algorithms (GAs) inspired on ideas of evolution theory. Due to their simple and yet effective search procedure, several papers (e.g., [37]) describe their successful implementations in vehicle routing problems. In particular, Memetic algorithms (MAs) belong to the class of evolutionary algorithms that intensify the search by including local search within a classical genetic algorithm framework. According to Moscato and Cota [38], MAs are intrinsically concerned with exploiting all available knowledge about the problem under study. Due to this, a random key mechanism is included during the construction procedure in order to enhance the performance of the procedure. In this work, the MA proposed is adapted from the NSGA-II, successfully implemented in [39], and consists of the following procedures: initialization, recombination (crossover and local search), and classification in fronts. In our construction procedure, we include a random key that helps to generate a diverse initial population of feasible solutions of size *N*. Subsequently, during a predetermined number of successive generations (iterations), an offspring (*Pt*) of *N* individuals is generated from *Pt*−<sup>1</sup> through recombination and local search mechanisms, involving members representing tentative solutions (high-quality or non-dominated solutions) and members representing diverse solutions. After this, the individuals who belong to the previous and current generations are evaluated and grouped into fronts, according to the level of non-domination, as explained in [40]. To obtain the resulting offspring population of size *N*, the individuals are inserted into the set, starting with the one that belongs to the front of non-dominated solutions (*F*0). Algorithm 1 shows the pseudocode of the overall MA with Random Keys procedure.

#### 3.1.1. Constructive Procedure Based on Random Keys

The constructive procedure creates an initial population of feasible solutions based on generating a chain *Sa* = {1, 2, ··· , *n*} and, for each customer, an auxiliary random key *Ra* is used to encode the solution. Additionally, an empty set *Sp* is used to save the temporary assignment of the customers to the routes.

**Algorithm 1:** Memetic algorithm with random keys.

#### **begin**

*it* ← 0; Initialize a population (*P*0) of *σ* chromosomes implementing the constructive procedure based on random keys; Sort *P*<sup>0</sup> in fronts following the non-dominated sorting approach; **for** *(it* = 1; *it* <= *Maxiter*; *it* + +*)* **do** Generate an offspring population *Pit*, of size *N*, from *Pit*−1, using selection, crossover and local-search operators; Combine parent and offspring population *Rit* = *Pit* ∩ *Pit*−1; Sort population using the non-dominated sorting approach, identify fronts *Fj* = (1, 2, ···), and calculate the crowding distance for each solution in *Fj*; Make *Tit*+<sup>1</sup> ← ∅, *j* ← 1; **while** *(*|*Tit*+1| + |*Fj*| < *N)* **do** *Tit*+<sup>1</sup> ← *Tit*+<sup>1</sup> + *Fj* ; *j* ← *j* + 1; Sort solutions in *Fj* in decreasing order according to crowding distances, select the first *N* − |*Tit*+1| elements of *Fj* and add it to *Tit*+1; **end end end**

**Encoding mechanism:** To encode the solution, a real number drawn randomly from [0, 1) is assigned to every single position in *Ra*. Figure 3 depicts an example of this mechanism.


**Figure 3.** Example of the encoding mechanism.

**Decoding mechanism:** The decoding mechanism is applied based on the information of the random key *Ra*. The *Ra* chain is sorted in a non-decreasing order and their respective positions in *Sa* are sorted correspondingly. As a result, a random ordered chain *Sa* is obtained. Figure 4 exhibits the decoding mechanism.

**Figure 4.** Example of the decoding mechanism.

**Assignment mechanism**: In every iteration, the algorithm selects the corresponding *j*th customer in *Sa* and systematically tries to insert it into into a temporary set *Sp* in the first available position (procuring to maintain feasibility in the capacity of the vehicles). For instance, if in the first potential route, two customers have been previously assigned, the next open position will be the third one. In the case that the customer cannot be inserted in the route due to the capacity constraints, the insertion will be evaluated in the next available route. It is important to emphasize that, since the number of routes is given in advance, the construction procedure considers a parallel routing mechanism. In other words, it performs the evaluation of feasible insertions over all of the routes. If so, the algorithm continues by selecting the next customer at *Sa*. Otherwise, the construction mechanism stops. If the algorithm reaches a feasible assignment, then *S* ← *Sp* and the solution is inserted into the population

*Qt*. Otherwise, the algorithm destroys the partial constructed solution in *Sp* and generates a new random key *Ra* (to sort *Ra*).

The entire constructive procedure finishes when all of the customers have been assigned into *Sp* or after having a successive number of attempts without producing a feasible solution. When a feasible assignment is reached, the set *S* represents a feasible initial solution of routes. The customers already inserted in *S* are removed from *Sa*. Figure 5 shows an illustrative example of a feasible assignment.


**Figure 5.** Example of a feasible assignment.

After reaching a feasible assignment, the sequencing mechanism is applied to construct each route by respecting the order of insertion and, based on this, the corresponding calculations of the objective functions *L* (representing the total latency of the system) and *T* (total tardiness of the system, based on the priorities of the customers) are performed. Algorithm 2 shows the pseudocode for this algorithm.


```
begin
   Data: S ← ∅, Sp ← ∅, Sa =← ∅, L ← 0, T ← 0
   Fill chain in Sa = {1, 2, ··· , n};
   Create an auxiliary random key chain Ra with values [0, 1);
   Sort customers in Sa in a non-decreasing order according with their corresponding
    random values in Ra;
   j = 0;
   while (Sa = ∅) do
       flag = 0;
       while Sa = ∅ or flag = 1 do
          Select the jth customer from Sa;
          l = 0;
          if The insertion of the jth customer is feasible to insert in route l then
              Insert the jth customer in the lth route Sp;
              Remove the assigned customer from Sa;
              j + +;
              flag = 0;
          end
          else if l < K then
              l + +;
          end
          else
              Destroy the partial solution, Sp ← ∅;
              Establish Sa = {1, 2, ··· , n};
              Create a new random key Ra ;
              flag = 1 ;
          end
       end
       Compute the values of total latency L and total tardiness T for the individual;
   end
end
```
#### 3.1.2. Crossover Procedure with Local Search Strategies

The proposed crossover procedure consists of the combination of two solutions, *A* and *B*, to create a new solution *C*. A tournament selection operator is incorporated to diversify the creation of new solutions. After obtaining the new individual *C*, a selective local search mechanism can be applied to improve it.

The procedure receives the following inputs: the current population (*Q*), the updated population (*R*), and the number of children to create (*N*). Then, the mechanism starts by selecting two individuals from the current generation (*Pt*). The first individual will belong to the front *F*0, whereas the second one will be chosen at random from the entire population (generation). Subsequently, the customers of the routes for each solution are grouped into a single big chain by following the assignment order (starting from the first customer belonging to the first route and ending at the last customer of the latter one). As a result, the chains for the corresponding chromosomes *A* and *B* are obtained.

The creation of the new individual *C* is based on the information of the random keys (*Ra*) of each parent (chromosome). Then, a probability for inheriting is assigned to each parent. These probabilities are complementary. In other words, if the probability of *PA* = *α* is assigned for the first chromosome (*A*), then the second chromosomes will receive a probability of *PB* = 1 − *α*. Then, for every position to fill in the *Ra* belonging to the customer, the roulette wheel is spun (*RN*) to determine if the element belonging to the *Ra* for the first or second parent must be selected to insert in the child. If the value of *RN* ≤ *PA*, the *i*th element *Ra* is included in the child. Otherwise, the element belonging to the second parent is selected. The mechanism stops when all of the positions have been evaluated. Figure 6 illustrates and example of this mechanism.


**Figure 6.** Example of child generation based on the roulette-tournament mechanism.

To enhance the creation of reasonable quality solutions, the probability assigned to the parent belonging to *F*<sup>0</sup> is always greater than 50%. Since the decoding procedure is a simple mechanism, it might occur that different random keys lead to an identical solution. Once the *Ra* for a child solution *C* is obtained, its feasibility is evaluated by calling back the constructive procedure. If the resulting solution is feasible, then the total latency *LC* and tardiness *TC* objectives are computed. On the contrary, the child is discarded, and a new second parent is selected (preserving the first individual) from *Pt* to restart the crossover procedure. Algorithm 3 depicts the pseudocode of the crossover mechanism.

**Algorithm 3:** Crossover procedure.

```
begin
   Data: Q, R
   for (h = 1; h <= N; h + +) do
       Select randomly the first parent (A) from F0 ∈ Q;
       Select randomly the second parent (B) from Q;
       Construct the big chain for each selected solution;
       i = 0;
       repeat
          Spin the wheel to obtain the value of probability;
          if probability ≤ β and customer in A is available then
              Select the customer of the chromosome A;
          end
          else if probability ≤ β and customer in B is available then
              Select the customer of chromosome B;
          end
          else
              i++;
          end
       until All positions in both parents have been evaluated;
   end
   if feasible then
       Decode the corresponding solution for the new individual;
       Compute the Latency (L) and Tardiness (T) values;
       Spin the roulette to obtain a rand number;
       if rand ≤ threshold then
          Apply the local search procedure over the individual C;
       end
       Insert the new solution C in R.;
   end
end
```
3.1.3. Local Search (LS) Procedure

The LS procedure is based on local search strategies, applied to intensify the search in pursuit of finding local minima. This procedure consists of five different neighborhood structures arranged into two classes, namely intra-route and inter-route mechanisms, performing them iteratively. This strategy has proved to be successful for a mono-objective version of the CCVRP [10]. Below, we describe each type of move:


The two major strategies operate as follows: At first, the initial solutions are sent to the intra-local search procedure, where the intra-local search strategies are applied. Then, the local minimum is forwarded to perform inter-routes local search strategies. These procedures are iteratively implemented, while the current solution value *L* keeps improving. In each process, the reallocation movement is performed first, and the execution of the interchange movement next. The first improvement criterion (*FI*) is used. Algorithm 4 exhibits this process.

```
Algorithm 4: Local search (S, L, T).
```
#### **begin**

```
repeat
       S∗ = S and L∗ = L, and T∗ = T;
       S',L' T' ← Intra-route local search (S,L, T);
       if The solution is non-dominated then
           S = S and L = L
                             , T∗ = T;
       end
       S",L" T"← Inter-routes local search (S,L, T);
       if The solution is non dominated then
           S = S" and L = L", T∗ = T;
       end
   until L∗ > L;
   return S∗, L∗, T∗;
end
```
As observed, this mutation procedure seeks to insert improved individuals to the next generation, although the mechanism does not guarantee that the chromosome selected can be deeply improved. Because the size of this subset is relatively small, it is always possible to find a chromosome to be improved.

This procedure differentiates the two versions of the MA. For the first version (MA-RK v1), all of the feasible individuals generated by the crossover mechanism are sent to the LS procedure (*threshold* = 1). For the second version (MA-RK v2), the local search mechanism is applied only to a certain percentage of individuals, expecting to accelerate the performance of the algorithm in terms of CPU time.

#### **4. Computational Results**

This section is devoted to reporting the computational experiments conducted to assess the efficiency of the proposed approach. First, we provide the set of instances used to perform the tests, as well as the characteristics of the computational equipment used. Secondly, we present the parameter setting for our versions of the MA. Finally, the experimental results for both the mathematical formulation and the metaheuristic procedure are displayed, accompanied by the respective discussion.

#### *4.1. Test Instances*

The instances used to conduct the experimentation were adapted from the ones used in the literature to evaluate the multi depot VRP with heterogeneous fleet: Koulaeian et al. [41] (Kou15), Chunyu and Xiaobo [42] (CaX10), Gillett and Jhonson [43] (GaJ76-7–GaJ76-12), and Augerat et al. [44] (Pn16k8 and Pn23k8). Even though there are some instances proposed by Talliard [45] and Li et al. [46] for the classical Heterogeneous Fleet Vehicle Routing Problem, we decided to use these instances since some of them provide a reasonable size in the number of customers for assessing the model.

The size of the instances ranges from 12 to 100 nodes and from 2 to 10 vehicles. The generated problem instances are characterized by the following criteria: (i) number of customers; (ii) number of vehicles; (iii) coordinates (x,y) for all locations (including the depot); (iv) demand of each node; and (v) priority index for each node. Since the original instances consider multiple depots and do not consider any preference index between the customers to be served, we selected the first depot as the single-origin, and we included a priority parameter by assigning a numerical index within {1, <sup>√</sup>*n*}, based on a uniform distribution. The customers having the highest value of the index represent the ones that should be first served (highest level of priority).

Since the modified instances are set to deal with a different problem, and to facilitate the report of the results, we decided to rename them using the nomenclature "FNO-x", where *x* denotes a consecutive number assigned according to the rank assigned to the instance (in terms of the number of nodes, following a non-increasing order). For example, the instance Kou15 is the one with the lowest number of customers (12); therefore, it was renamed as FNO1. The remaining instances based on Pn16k8, CaX10, Pn23k8, GaJ76-7, GaJ76-8, GaJ76-9, GaJ76-11, and GaJ76-12 were renamed as FNO2, FNO3, FNO4, FNO5, FNO6, FNO7, FNO8, and FNO9 respectively. In the case of the instance GaJ76-10, it was named FNO10 because it has the largest size in the number of customers. These new instances are available by request.

All of the experiments were conducted using a PC Intel Core i7 @2.30 GHz with 16 GB of RAM Memory under Windows 10. The formulation was modeled using AMPL and solved using Gurobi 9.0. For each instance, we established a time limit of 7200 s (2 h). In the case of the MA-RK, both versions were coded using the C++ language. In the next subsection, the parameter tuning is presented.

#### *4.2. Parameters Setting*

In the case of the MA-RK v1, the values for the parameters corresponding to the size of the population *N*, the threshold value *β*, and the maximum number of generations *D* were set as 1000, 0.1, and 100, respectively. In the case of MA-RK v2, we set a threshold of 0.4. These values were obtained after performing a preliminary analysis over a subset of instances randomly selected. In addition, several tests with different number of iterations were conducted, finding that, for all the analyzed instances, the MA-RK stops improving when it reaches 80% of the maximum number of iterations, depending on the instance. Lastly, to evaluate consistency, each instance was executed 10 times, and the best front obtained is reported. The next sections show the numerical results computed over the test instances.

#### Experimental Results

The first set of experiments aims at evaluating the performance of the formulation concerning optimality (optimally solved instances), and the effectiveness of the MA-RK comparing the results with those obtained by the resolution of the model. We first present the results for Instances FNO1 and FNO2 (up to 15 nodes).

The following metrics were used to compare the performance of the exact and approximation procedures:


These metrics have shown their successful implementation in biobjective VRPs [19,39].

The number of non-dominated points measures the ability of each method to find efficient fronts. Table 1 summarizes these results for Instances FNO1 and FNO2. Figures 7 and 8 show the Pareto front for each instance and method.

**Figure 7.** Pareto front for Instance FNO1.

**Figure 8.** Pareto front for Instance FNO2.

A point to highlight in Figures 7 and 8 and Table 1 is that, for both instances, the Pareto front obtained by Gurobi is densely crowded. Additionally, notice that both versions of MA-RK performed differently over the solved instances. In particular, for Instance FNO1, the second version of the algorithm produced a front that is closer to the optimal front obtained by CPLEX. On the contrary, for Instance FNO2, both algorithms produced fronts near to the optimal front and, in particular, the MA-RK v1 produced a more dense front than the MA-RK v2.


**Table 1.** Quantity of non-dominated points for Instances FNO1 and FNO2.

Regarding the computational time, the summarized results are presented in Table 2. In this case, the exact method required around 1 h for solving the FNO1 instance (12 nodes), whereas the time required to solve the FNO2 (15 nodes) instance was almost four times as long. In particular, both versions of the metaheuristic required less than 1 s to obtain the solutions. This fact, in conjunction with the metric of the quantity of non-dominated points, supports the evidence that, in general, the MA-RK algorithm performed very well.

**Table 2.** Elapsed CPU time (in seconds) for Instances FNO1 and FNO2.


Regarding the density of the fronts, Table 3 shows the average *k*-distance value of all points on the efficient frontiers for each instance, with *k* = 3. Specifically, the MA-RK produced fronts with more density than AUGMECON2. In particular, for the FNO2 instance, it can be seen that the NSGA-v1 obtained the minimum values for the maximum and average distances, while the MA-RK v2 obtained a denser front for Instance FNO1.

**Table 3.** Maximum and average *k*-distances for the FNO1 and FNO2.


To verify the efficiency of the MA-RK (in both versions), we used the hypervolume metric. Table 4 displays the obtained results. Again, the MA-RK v2 provided a better value of hypervolume for Instance FNO1, while the MA-RK v1 performed better on Instance FNO2.

**Table 4.** Hypervolume for Instances FNO1 and FNO2.


Finally, we used the coverage measure (considering only the strict domination). Tables 5 and 6 exhibit the results. In these tables, a value of C(X', X") equal to 1 means that all points in the estimated efficient frontier X" are strictly dominated by points in the estimated efficient frontier X'. As expected, the exact method dominates both algorithms entirely in terms of the space covered. Regarding the

metaheuristic procedures, for Instance FNO1, the MA-RK v2 was able to generate points that dominate almost 77.78% over the ones generated by the MA-RK v1. For Instance FNO2, the front of the MA-RK v2 dominates 33.33% of the points generated by the MA-RK v1 which, in turn, dominates 25% of the points generated by the MA-RK v2.


**Table 5.** Coverage of two sets value for Instance FNO1.



We observe that, for Instances FNO1 and FNO2, when the minimum value of latency is obtained, the maximum amount of total tardiness rises to 3.2 times the cost of latency, which might translate to a higher level of customer dissatisfaction. On the contrary, when the minimum value of total tardiness is reached, the overall latency of the system rises up to 1.4 times the optimal (minimum) value. In this case, the increment of latency translates into a significant increase in the cost and, therefore, in a reduction of profit. In addition, it is obvious that the prioritization of the customers generates an unbalance in their demand, especially for the case of having routes where relatively few customers can have significantly high amounts of demand compared to the rest.

Another aspect to highlight is that the decision-making process can be seen from two perspectives: (1) savings in tardiness costs can represent up to 72% of the total costs; and (2) savings in latency produce savings for up to 28% of the total costs. In other words, according to the objective function, if a preference must be defined a priori, tardiness must be more important than latency.

#### *4.3. Experimental Results for Larger Instances*

The experimentation considering large size instances was conducted in both versions of the algorithm. The complementary experimentation involves instances of up to 100 nodes. Tables 7–12 display the results of our computational experimentation.

In Table 7, Column 1 displays the name of the instance. Columns 2 and 3 indicate the size in terms of the number of nodes and the number of routes. Columns 4 and 5 report the number of non-dominated solutions obtained by each algorithm. For Tables 9 and 10, Column 1 shows the name of the instance, whereas Columns 2 and 3 report the value of the corresponding algorithms over the evaluated metric. Specifically, for Table 11, two columns are used to indicate the maximum and average *k*-distances for each procedure.

Following the same sequence used in the previous section, the first metric to compare is the quantity of nondominated points. Table 7 reports the number of non-dominated points obtained by each algorithm (Pareto front). According to the information there, MA-RK v2 was able to obtain a higher quantity of non-dominated points. In some instances, the number of points reported almost doubled the amount of the ones obtained by MA-RK v1.This can be explained by the fact that MA-RK v2 generates more diverse solutions, since the process of intensification is selective.


**Table 7.** Quantity of non-dominated points for large-size instances.

In Table 8, the minimum and maximum values for each objective are shown. According to the information obtained, for most of the instances, the MA-RK v2 reports better values for the total latency. In addition, the MA-RK v2 produces better values of tardiness for most of the instances. In summary, the selective version of the MA-RK clearly outperforms the MA-RK v1.

**Table 8.** Minimum and maximum values for both objectives functions for large-size instances.


Regarding the performance of the algorithms, it can be noticed that, for larger instances, the MA-RK v2 clearly outperforms the MA-RK v1. To better illustrate this, the fronts of Instances FNO8 and FNO10 are displayed in Figures 9 and 10.

**Figure 9.** Pareto fronts for Instance FNO8.

**Figure 10.** Pareto fronts for Instance FNO10.

Due to this, the metric of the execution time was evaluated to verify if any of the versions performs more quickly. Table 9 displays the elapsed CPU time for the best execution.


**Table 9.** Elapsed CPU time for the rest of the instances.

As expected, the required time increases as the size of the instances increased. However, both versions of the metaheuristic were working within a similar computational performance range.

As for the third metric, the hypervolume, the results of the algorithm are displayed in Table 10. There, we do not have enough evidence to confirm that any algorithm outperforms the other. What can be confirmed is that MA-RK v2 produced higher values of hypervolume for seven out of eight instances. However, for the instance where the MA-RK v1 obtained better values, the difference against the NSGA was small.

**Table 10.** Hypervolume for the rest of the instances.


Regarding the density of the frontiers, Table 11 shows the results obtained. From these results, it can be noticed that, in most of the cases, the MA-RK v2 produced lower values for the maximum distances (more compactness). However, the MA-RK v1 algorithm produced better values for the average distances.

**Table 11.** Maximum and average *k*-distances for the large-size instances.


Finally, Table 12 reports the values obtained for the set coverage metric. The first column refers to the name of the instance, and the rest show comparisons in coverage between the algorithms. Again, MA-RK v2 performed better than MA-RK v1, by dominating the entire front provided by MA-RK v1. This confirms that the selective version of the MA-RK clearly dominates.


**Table 12.** Coverage of two sets value for the rest of the instances.

In summary, we conclude that, even when both metaheuristics provide good results in a reasonable computational time, the MA-RK v2 consistently outperforms the non-selective MA-RK version.

#### **5. Conclusions and Future Work**

This study addressed the biobjective Cumulative Capacitated Vehicle Routing Problem. This problem mainly arises in commercial contexts such as the delivery of perishable goods, in which there are differentiated based on priority indexes. In the case of a pooled transportation service, it might help to estimate the trade-off between delivering the orders in the same sequence as customers board the vehicle and the minimum arrival time of the system. For this problem, a mixed-integer programming formulation and two metaheuristic algorithms were developed. A commercial optimization software was able to solve the model for small size instances, whereas the algorithms showed their effectiveness by providing feasible results in a reasonable amount of computational time.

The algorithms showed their efficiency by providing good quality fronts for the small size instances. Additionally, for larger instances, both algorithms provided good values for the multiobjective metrics evaluated. Although none of the algorithms outperformed each other, the MA-RK v2 obtained fronts with a higher quantity of points, more density, and more coverage of the sets. However, the MA-RK v1 was slightly faster. One intuition is that MA-RK v1 is more intensive in the local search, stagnating in local optima, while MA-RK v2 maintains the diversity, allowing to escape from local optima and populating the Pareto-fronts. However, more computational experiments are needed to clarify this effect.

In summary, all procedures provided a positive contribution to a more sustainable balance between economic and customer service objectives. Our results provide useful insights for business applications in terms of considering customer satisfaction and gaining a valuable sustainable advantage given that the reduction in the traveled time translates into a reduction of CO2 emissions.

Future research lines include the design of routes using congested environments with travel speed variation. This fact can be addressed either by modifying the objective function to include the variability in the travel time during the day or using a risk aversion approach, by adding a profit to each node associated with the order of visit. In addition, considerations involving time windows as priority metrics, and demand uncertainty can be worth consideration, as well as factors such as labor costs or balance of the total traveled distance among routes, which seem to dominate the overall cost.

**Author Contributions:** Conceptualization, S.N.-G. and E.O.-B.; methodology, S.N.-G. and D.F.-D.; software, S.N.-G. and D.F.-D.; validation, S.N.-G., D.F.-D., and E.O.-B.; formal analysis, S.N.-G., D.F.-D., and E.O.-B.; investigation, S.N.-G. and D.F.-D.; resources, S.N.-G. and E.O.-B.; data curation, S.N.-G. and D.F.-D.; writing—original draft preparation, S.N.-G.; writing—review and editing, E.O.-B. and A.M.; visualization, S.N.-G.; supervision, S.N.-G. and E.O.-B.; project administration, S.N.-G. and E.O.-B.; and funding acquisition, S.N.-G., E.O.-B., and A.M. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was supported by Universidad Panamericana through the grant "Fomento a la Investigación 2019", under project code UP-CI-2019-ING-GDL-08.

**Acknowledgments:** This work was supported by the Universidad Panamericana through the grant "Fondo Fomento a la Investigación UP 2019", under project code UP-CI-2019-ING-GDL-08.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Article* **A Parallel Meta-Heuristic Approach to Reduce Vehicle Travel Time in Smart Cities**

**Hector Rico-Garcia 1, Jose-Luis Sanchez-Romero 1,\*, Antonio Jimeno-Morenilla <sup>1</sup> and Hector Migallon-Gomis <sup>2</sup>**


**Abstract:** The development of the smart city concept and inhabitants' need to reduce travel time, in addition to society's awareness of the importance of reducing fuel consumption and respecting the environment, have led to a new approach to the classic travelling salesman problem (TSP) applied to urban environments. This problem can be formulated as "Given a list of geographic points and the distances between each pair of points, what is the shortest possible route that visits each point and returns to the departure point?". At present, with the development of Internet of Things (IoT) devices and increased capabilities of sensors, a large amount of data and measurements are available, allowing researchers to model accurately the routes to choose. In this work, the aim is to provide a solution to the TSP in smart city environments using a modified version of the metaheuristic optimization algorithm Teacher Learner Based Optimization (TLBO). In addition, to improve performance, the solution is implemented by means of a parallel graphics processing unit (GPU) architecture, specifically a Compute Unified Device Architecture (CUDA) implementation.

**Keywords:** smart cities; meta-heuristics; travelling salesman problem; TLBO; parallelism; GPU

**Citation:** Rico-Garcia, H.; Sanchez-Romero, J.-L.; Jimeno-Morenilla, A.; Migallon-Gomis, H. A Parallel Meta-Heuristic Approach to Reduce Vehicle Travel Time in Smart Cities. *Appl. Sci.* **2021**, *11*, 818. https:// doi.org/10.3390/app11020818

Received: 12 November 2020 Accepted: 13 January 2021 Published: 16 January 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

#### **1. Introduction**

The Smart City concept involves providing the urban environment with smart infrastructure, technology, and procedures to ameliorate the quality of life of inhabitants from an integrated perspective. Among the different aspects that have an impact on improving the quality of life, several action fronts can be noted related to the movement of people and goods, that is, traffic in the urban or metropolitan environment [1]. Indeed, effective traffic management results in a reduction in the travel times of citizens in their vehicles, which in turn has a positive impact on reducing the stress levels of drivers and optimizing the arrival at their workplaces for the effective performance of different professional activities. In addition, effective traffic management also entails better use of existing infrastructure and a reduction in pollution levels, an issue that is becoming increasingly relevant given people's awareness of environmental care and the impact on health.

The problem of traffic in cities becomes more relevant if we take into account the displacements that must be made by carriers and couriers to take different goods from headquarters to different delivery points, whether they are supermarkets, offices, private or homes. It should also be pointed out that the problem will affect the different platforms of autonomous vehicle fleets in the near future, in which the driver takes a passive role and the vehicle makes the decisions regarding the route to be followed to reach its destination in the shortest time.

The large amount of data and measurements that Internet of Things (IoT) devices can provide allows researchers to accurately model the different routes within an urban environment [2]. In this way, as explained in [3,4], smart cities should provide user-centered mobility services, implementing intelligent and/or automated transportation systems that incorporate strategies of artificial intelligence and technologies, such as the IoT and Physical Internet (PI). The integration of different technologies, such as RFID and LoRaWAN (Long-Range WAN), to implement PI oriented to provide Mobility-as-a-Service is demonstrated in [5].

With the information available, routes are not only determined by the distance between different geographical points, but can also incorporate aspects related to the expected time of arrival, which include, in addition to the distance itself, traffic lights and their crossing times, pedestrian crossings, and possible traffic jams.

A correspondence can be established between this problem and the classic combinatorial travelling salesman problem (TSP), in this case applied to car movements within urban environments. This problem can be informally described in the following manner: "Given a list of geographical points and the distances between each pair of points, what is the shortest possible path that passes one and only one point and returns to the starting point?".

The problem has given rise to a wide variety of research, and various algorithms and heuristics have been developed to try to reduce the complexity when obtaining solutions. In this paper, the aim is to provide an optimal or suboptimal solution to TSP instances applied to smart city environments, using a modified version of the metaheuristic optimization algorithm Teacher Learner Based Optimization (TLBO). In addition, performance is improved by implementing the solution on a parallel graphics processing unit (GPU) architecture, specifically a Compute Unified Device Architecture (CUDA) implementation.

This paper is organized as follows: Section 2 provides a review of state-of-the-art traffic congestion management in smart cities. Section 3 introduces a formal description of the TSP and its similarity with the problem of managing traffic in urban environments. Section 4 describes the TLBO method applied to both continuous and discrete domains. Section 5 shows the manner in which TLBO is implemented on a parallel CUDA architecture to enhance its performance. Section 6 depicts the results of the parallel TLBO when applied to a set of real scenarios from a well-known benchmark. Finally, in Section 7 conclusions of the research work are outlined and future lines of research are proposed.

#### **2. The Concern of Traffic Management in Smart Cities and Urban Environments**

One of the main problems to be addressed in an urban environment is vehicle traffic management. Although freight traffic is almost inevitably composed of trucks and vans, passenger movement throughout a city is still mostly operated by private vehicles [6]. As an example, 45% of American people had no access to public transportation in 2018 [7].

Numerous issues regarding the problems associated with car traffic in urban environments are currently being studied. Because the main objective in designing a smart city is to improve "personal satisfaction by utilizing innovation to enhance the proficiency of administrations and address occupants issues" [2], the movements of inhabitants through the city must be managed in such a way that travel time is reduced according to the specific needs of each inhabitant, but concerns about pollution and well-being must also be addressed. The same issues apply to freight transport, because transporters must make their deliveries in an efficient way, trying to reduce delivery times, pollution rates, and stress and tiredness of drivers.

Several research works can be found that deal with the possibility of monitoring traffic in urban environments by means of different technologies and methods. In the work of Rizwan et al. [2], an inexpensive real-time traffic management system is proposed to provide a smart service by activating traffic indicators to instantly update the traffic information. Inexpensive vehicle detection sensors are embedded in the middle of the road every 500 m or 1000 m; IoT is used to quickly acquire traffic information and send it for processing; and real-time flow data is sent for Big Data analysis. In the work of Fernandez-Ares [8], an approach was developed which tracks the movement of people and vehicles by monitoring the radioelectric space, capturing Wi-Fi and Bluetooth signals supplied by smartphones or on-board hands-free devices. In the work of Sendra et al. [9], a collaborative Long-Range (LoRa)-based sensor network to monitor the pollution levels in smart cities

is developed. The system consists of geo-located nodes embedded in vehicles and fixed places to monitor temperature, relative humidity, and concentration of CO2 in urban environments. The system uses the gathered data to generate the necessary orders to traffic signs and panels that control the traffic circulation. In Kazmi et al. [10], a methodology is proposed that uses VITAL-OS, an IoT platform that enables collection, integration, and management of data streams coming from multifunctional IoT devices and data sources. The information gathered is analyzed to detect traffic noise events; if the noise level on a particular road is higher than a certain threshold, the required traffic signal junction is managed for traffic re-routing to alternative paths. In Kök et al. [11], a deep learning model is proposed for analyzing IoT smart city data. The model is based on Long Short Term Memory (LSTM) networks to predict future values of air quality. In Jabbarpour et al. [12], an IoT application called Intelligent Guardrails is shown. It uses vehicular networks to identify traffic situation of the roads, and incorporates electronic and mechanical techniques to increase the number of lanes of the congested side of a highway while decreasing the lanes on the non-congested side. In Singh et al. [13], a visual Big Data analytics framework for automatic detection of bikers who do not wear a helmet in city traffic is proposed. The paper discusses the issues involved in visual Big Data analytics for traffic control in a city on a surveillance data scope. In Pawłowicz et al. [14], an approach of a traffic management system is proposed. It includes 5G communication, RFID-based parking space monitoring, and cloud services for supervision and machine learning. In Rathore et al. [15], a system for a smart digital city that uses IoT devices for collecting city data and Big Data analytics for acquiring knowledge is proposed. A model is proposed to develop a system that can handle a large volume of city data and provide guidelines to the local authorities. The system implementation involves several stages, including data generation and gathering, aggregation, filtering, classification, preprocessing, computing, and decision making. The gathered data from smart components within the city is processed in real-time to achieve a smart service using Hadoop under Apache Spark. Data generated by smart homes, smart parking areas, weather systems, pollution monitoring sensors, and vehicle networks are used for analysis and testing. In the work of Behnke and Kirschstein [16], a study on the effect of path selection in emission-oriented urban vehicle routing is presented, which is composed of a heterogeneous road network with regard to speed and acceleration frequency. An algorithm to determine all means of emission minimization for pairs of origin and destination nodes given a specific vehicle is proposed and tested using the road network from the city of Berlin.

In Ehmke et al. [17], two data-driven approaches for determining time-dependent emission minimizing paths in urban environments are proposed. Their performances are compared with respect to computing efficiency and solution quality. On average, emissions-minimizing methods can reduce emissions by approximately 3.5% compared to distance-minimizing paths, and 5.0% with regard to minimum time-dependent paths. The emissions-optimized path is roughly 4% longer than the paths generated by distancebased methods, and around 6% longer with regard to travel time compared to travel-time optimized paths.

In the work of Suzuki [18], attention is paid on the so-called pollution routing problem (PRP), which tries to minimize the fuel consumption or pollutants emission of trucks. The final goal is to develop a decision support system of pollution vehicle routing for eco-friendly enterprises. The work identifies, by analyzing the state-of-the-art PRP and gathering expert opinions from carrier managers, a practical PRP model that uses a minimal subset of the main factors affecting fuel consumption, and then develops a solution for this model. In Ehmke et al. [19], research is focused on the problem of minimizing CO2 emissions in the routing of a fleet of capacitated vehicles in urban environments. A local search procedure, named the tabu search heuristic, is adapted to solve the problem. The research uses instances from a real road network dataset and 230 million speed observations. In Kramer et al. [20], a metaheuristic approach, called ILS-SOA-SP, is proposed to solve the PRP; this method integrates iterated local search (ILS) with a set partitioning (SP) procedure and a speed optimization algorithm (SOA). The proposed method has the benefit of performing combined route and speed optimization within several local search and integer programming components.

As a summary, a wide variety of research has been carried out with regard to the improvement of quality of life in smart cities in terms of traffic management and, consequently, the reduction of fuel consumption, street noise, and pollution.

#### **3. The Travelling Salesman Problem**

The TSP has given rise to a wide variety of research, due to its simplicity of description but its complexity at the time of obtaining a solution [21]. If the problem is formulated using graph theory terminology, it can be defined as a graph *G* = (*V*, *A*), where *V* = {*v*1, ... ,*vn*} consists of a set of *n* vertices (nodes) and *A* = {(*vi*, *vj*)/*vi*, *vj* ∈ *V*, *i* = *j*} is a set of edges with an associated non-negative cost (distance) matrix *D* = (*dij*). The problem is symmetric if *dij* = *dji* for any pair of vertices (*vi*, *vj*) ∈ *A*, and it is said to be asymmetric (ATSP) otherwise.

If *I*:{1, ... , *n*} → {1, ... ,*n*} is a bijective function that determines a reordering of vertices *vi* in *V*, the TSP can be defined as obtaining the minimal result for the following calculation:

$$F\_{TSP} = \sum\_{j=1}^{n-1} d\_{I(j)I(j+1)}\tag{1}$$

It is easy to establish an analogy or correspondence between the TSP and the problem of managing traffic in smart cities. For example, the set of nodes can be related to the different city locations where a van or truck must deliver its cargo from a depot. As previously explained, the meaning of *distance* in the case of urban environments is different from the mere concept of geographical distance, because it can also incorporate traffic lights and their crossing times, pedestrian crossings, speed limits, and even dynamic factors, such as temporary traffic jams, road maintenance works, and accidents. In addition, data collected on fuel consumption and pollution related to different roads can be added.

For a set of *n* nodes, obtaining the optimal solution by means of exhaustive search is a problem that implies (*n* − 1)! comparisons. As an example, in the case of ten nodes, the required comparisons between possible solutions would be 9! = 362,880, and 15 nodes would imply comparing 14! = 87,178,291,200 possible solutions to obtain the best solution. The extensive list of research papers related to TSP has produced different methods that try to decrease the aforementioned complexity. Heuristic methods are suitable for managing TSP, and much attention has been paid to these types of methods among researchers on optimization. In [22], a review of swarm intelligence applied to graph search problems is contributed, but the paper focuses almost solely on ant colony optimization (ACO) and bee colony optimization (BCO).

In [23], two different modifications of the artificial bee colony (ABC) method are applied to TSP: a combinatorial algorithm, called CABC, and an improved version of CABC, called qCABC. Experiments are performed on a benchmark of 15 TSP instances. The performance of these two algorithms and eight different genetic algorithm (GA) versions is compared. Moreover, the performance of ABC is also compared with the ant colony system (ACS) and bee colony optimization methods. Results show the suitability of CBAC and qCABC algorithms to be applied to TSP problems.

Most of the works related to solving the TSP by means of metaheuristic optimization methods use the ant colony optimization method [24–28], sometimes in a hybrid version with other methods [29]. This is because the natural behavior of ants in the development of a colony can be easily matched to the problem of finding the shortest path between a series of nodes.

#### **4. The TLBO Optimization Method**

*4.1. Original TLBO Formulation*

Rao [30] developed a series of metaheuristic algorithms that do not require the adjustment of initial parameters to operate. Among these algorithms, TLBO imitates the learning process that takes place in the classroom: from the teacher to the students. It is an iterative process that evolves in stages to optimize a function with multiple variables. This process culminates when the conditions that make the solution valid are met.

For each stage, a random population of possible solutions (called individuals) is created. These values are assigned taking into account the domain of the function *f*. Once these values are assigned for the population, two stages are carried out, as detailed below.

#### 4.1.1. Teacher Stage

Individual *i* within the population is represented by vector *X*(*i*). Therefore, *X*(*i,j*) refers to variable *j* of individual *i*. The teacher stage aims to assess the function for each member of the population considering its own parameters, *f*(*X*(*i*)). The individual with a value closest to the optimum is selected as the teacher (*Xbest*). Then, teacher values are used to bring his disciples closer to him by following Equation (2), which also takes into account the mean values of the population. Thus, each individual will have its new (*Xnew*) solution.

$$X\_{new}(i,j) = X(i,j) + rand(0,1)(X\_{best}(j) - TFactor \cdot X\_m(j))\tag{2}$$

As can be observed in Equation (2), the value of *X*(*i,j*) is modified according to the distance that exists between the teacher and the mean of the population (*Xm*). The factor *T* is a random integer value that can be valued at 1 or 2, and whose calculation is shown in Equation (3). After changing the values of each individual, the function is evaluated again (*f(Xnew(i*)). Only in the case that the evaluation offers a more optimal solution than the original one, the new values calculated for the individual *i* are taken.

$$TFactor = round(1 + rand(0, 1))\tag{3}$$

#### 4.1.2. Learner Stage

At the learner stage, again all individuals are evaluated and compared. This time the comparison takes place with another randomly selected student. In this pair-wise comparison, the student with the optimal solution is called *Best Learner* and the other is called *Worst Learner*. Both students are used to set up a possible new individual *Xnew* from the original *X*(*i*) according to Equation (4). In case this proposal is more optimal than the original one, the new values calculated will replace the original ones for that individual.

$$X\_{\text{new}}(i,j) = X(i,j) + rand(0,1) \cdot (BestLexner(j) - WordLeftmer(j)) \tag{4}$$

In this algorithm only the number of iterations and the number of individuals in the population have to be fixed; therefore, its main advantage is that there are no parameters that have to be adjusted for the algorithm to converge towards an optimal solution, as happens in other metaheuristic algorithms. In the recent years, there has been an increase in the number of publications that highlight the advantages of TLBO for the resolution of a large number of engineering problems [31–34].

#### *4.2. Discrete TLBO*

The problem to be solved in this research, the TSP, is of a discrete nature which comes from the appropriate selection of a combination of sites whose sorting satisfies a criterion. In the case of this research, the aim is to find the shortest route.

This problem does not fit with the characteristics of the original TLBO, whose domain of solutions is established in a continuous range in which the selection of values taken from a set is not possible. For this reason, it is necessary to make a change in the original TLBO algorithm, so that discrete values taken from a set can be selected. This discrete version of the algorithm (DTLBO) is based on the research proposed in [35] and substantial changes have been made to adapt it to the TSP case study.

In [35], the whole population consists of 100 individuals. Although the paper does not mention it implicitly, it can be guessed that each subpopulation contains four individuals, so 25 subpopulations are supposed to be formed. The division of a population into different subpopulations is a strategy used in several metaheuristic methods to increase the diversity of the population and, therefore, reduce the probability of being trapped in a local optimum.

#### 4.2.1. Representation of Individuals

The representation of individuals in DTLBO changes substantially from the original algorithm. In this version of the algorithm the individuals represent routes through a set of places. Each variable within an individual corresponds to a node or location. As shown in Figure 1, if a set of eight places has to be visited (*v*0, *v*<sup>1</sup> ... *v*7), an individual could be represented as eight connected nodes that determine the next order of visit: *v*7→*v*0→*v*1→*v*6→*v*4→*v*2→*v*5→*v*3.

**Figure 1.** Representation of an individual solution.

In contrast to the work proposed in [35], which uses four completely random subpopulations, in this version of the DTLBO one of the individuals is not generated randomly but is assigned initial values based on a greedy strategy. This is intended to accelerate the convergence of the algorithm and preserve its ability to avoid local minima by maintaining randomness in the subpopulations.

#### 4.2.2. DTLBO Teacher Stage

At this stage, a Partial Teacher is selected for each subpopulation. As with the original TLBO, this teacher is used to improve the students in each subpopulation taking into account the mean value of the subpopulation. In addition, to avoid losing the global vision of the route, a global Teacher is also selected as the best individual of the whole population. It must be taken into account that the calculation of the mean individual could violate the path conditions of the TSP, so a viability condition must be added to its calculation. This condition is performed in the following four steps


**Figure 2.** An example of the viability operation.

The crossover operation is used in the DTLBO to generate new individuals from existing ones. This operation is indicated by the symbol ⊗. In this version of the algorithm, four different ways of creating new individuals from crossovers were introduced and are shown in Equation (5). In contrast to the work presented in [35], in which a fixed cross is assigned between individuals, this approach makes a random selection for each cross operation. This randomness avoids falling into local minima and increases the variability of populations.

$$\begin{array}{l} \textit{Xnew(i)} = \textit{X(i)} \otimes \textit{Teacher} \\ \textit{Xnew(i)} = \textit{X(i)} \otimes \textit{Partial} \\ \textit{Xnew(i)} = \textit{X(i)} \otimes \textit{Mean(i)} \\ \textit{Xnew(i)} = \textit{Partial} \textit{Teacher(i)} \otimes \textit{Mean(i)} \end{array} \tag{5}$$

The operation of the crossover can be shown in Figure 3. Suppose that two individuals *A* and *B* must generate a new one called *Ac*. The initial and final positions of the new individual are selected randomly and thus the new individual *Ac* replaces the selected items by those of individual *B*. Logically, after this crossover is performed, the viability operation must be applied.

**Figure 3.** An example of the crossover operation.

To increase the variability of the populations, a mutation operator is included, which is denoted by the symbol Θ (shown in the Equation (6)). Figure 4 shows the functionality of this operator. Given an *Ac* individual, initial and final positions are randomly selected; in the current example, these are positions 3 and 6. Once these positions are determined, the elements included in this sequence (items of positions 3, 4, 5, and 6) are inverted to create the mutated element *Acm*.

$$A\_{cm}(i) = \Theta A\_c(i) \tag{6}$$

**Figure 4.** An example of the mutation operation.

#### 4.2.3. DTLBO Learner Stage

Similar to the original TLBO, in the learner stage a random individual *k* is chosen for each individual *i* in the subpopulation to be compared with. The new individual is created from Equation (7), where ⊗ represents the crossover operation, working in the same way as the crossover operation in the teacher stage. Then it will be necessary to apply the condition of viability on the new individual *Xnew*(*i*) and then apply the mutation operator as described in the teacher stage.

$$Xnew(i) = X(i) \circledcirc X(k)\tag{7}$$

Although the accuracy of the DTLBO presented in [35] is best when compared with well-known algorithms such as ACO, Particle Swarm Optimization (PSO), and Genetic Algorithm (GA), and only in some specific cases it is slightly beaten by ACO and ABC. It is also the case that its performance is low when cities have a large number of places to visit. For this reason, this investigation focused on a revision of this algorithm based on the improvement of its convergence and on a parallel implementation that allows taking advantage of the performance of the multiprocessing present in the existing GPUs.

#### **5. Parallel TLBO Implementation on GPU**

As explained above, applying a series of modifications to the TLBO algorithm can achieve significant results in discrete combinatorial problems, in this case, in the TSP problem. Satisfying results are achieved compared to other algorithms [35].

The changes implemented in the original TLBO cause the different stages to be more complex and add a higher computational cost to the algorithm. As a consequence, the iterations of the DTLBO are significantly more expensive than those of the original TLBO, and therefore the computation time is penalized. With the aim of minimizing the impact of these modifications and the extra cost of computing on the performance of the algorithm, a parallel implementation of the algorithm was developed using a CUDA architecture in a GPU environment. Although parallelization of an algorithm using CUDA is not always the best solution when applying parallelization techniques, the specific features of DTLBO can be exploited to provide a remarkable improvement in performance when compared to sequential solutions using this parallel architecture. Previous research has attempted improvement of metaheuristic methods applied to graph search problems by means of parallel implementations [36].

The initial and fundamental step when proceeding with the algorithm parallelization consists in creating an adequate design of the memory structure and the execution flow to minimize the global thread blockages. If GPU memory is mismanaged, the impact in execution time can be detrimental because transference operations between the different memory levels within the GPU imply a high latency. Therefore, these transferences should be minimized.

#### *5.1. Design of the Memory Organization*

The first task that must be carried out to parallelize the algorithm is to conceive an adequate design of the memory structure and the execution flow to minimize the blockage of the different threads executing in parallel within the GPU. Thus, the different GPU memory levels are organized to manage different kinds of data (thread, subpopulation, and global levels) as shown in Figure 5.

**Figure 5.** Organization of the Compute Unified Device Architecture (CUDA) memory.

**Global memory**. This memory stores two main data blocks: an array that contains the required points for generating the individuals of the populations in each block; and an array that contains the whole set of pre-calculated distances between the different points. Global memory also stores the best individual of the whole population and its evaluation (global Teacher), which are globally calculated in each iteration to be used by every individual within the population. Much of the data stored in the global memory will be only read but never modified, and will be used to avoid computing the total distance travelled when evaluating each individual. Only the best evaluation (global Teacher) must be modified if required each iteration.

**Block memory**. This memory stores the data shared by a subpopulation. The memory is organized in a matrix; each row corresponds to an individual, and each column stores the index of an urban node. An extra column is in charge of storing the individual's solution to avoid repeating the evaluation.

**Thread memory**. This consists of a private, local memory to each thread which is not shared with the remainder of individuals. The variables stored in this memory are used for the calculations needed at each stage of the DTLBO, and are updated during each of the iterations.

#### *5.2. Execution Flow*

An execution flow was devised to minimize the blockages of the threads at the time of synchronization when running the algorithm (*syncthreads*). The TLBO consists of a series of phases linked in such a way that they make it necessary to synchronize the threads to obtain common data for the whole population, such as the mean individual and the best individual (Teacher). In addition, to obtain this information, the reduction technique is utilized to achieve a minimal number of iterations needed to obtain the aforementioned values from the population. To build a parallel execution of the algorithm, each thread corresponds to an individual of the population. The first thread (with index 0) of each subpopulation is responsible for carrying out the operations in the local memory, and the first thread of the first population is responsible for carrying out the necessary operations in the global memory. This is represented in Figure 6. In this manner, conflicts and delays in the access to the different memory levels are avoided.

**Figure 6.** Subpopulation organization and memory accesses.

#### **6. Experimentation**

Experimentation was performed to compare the execution time when solving four TSP problems from the TSPLIB library [37]. A sequential implementation and a manycore GPU parallel implementation based on CUDA (9.2 version) were compared. The system used for performing the experiments was equipped with a Pentium i7 3.2 GHz and an NVIDIA GeForce 1060p graphics accelerator, 6 GB GPU RAM, and 32 GB DDR4 RAM. Different scopes were devised for each of the problems by varying the population sizes (64 and 128 individuals) and the number of iterations performed (1000, 5000, and 10,000 iterations). The cities from TSPLIB that were considered for the experiments were Berlin52, Att48, Eil76, and Ch130, with 52, 48, 76, and 130 urban nodes, respectively. They represent real scenarios taken from cities in Europe, USA, and China.

Figures 7–14 depict the results with respect to the CPU and GPU mean time in milliseconds (ms) of 10 blocks of 1000 runs, each with a PC clean reset. Results show that the DTLBO GPU parallel version improves performance up to 6× when dealing with a high number of individuals and iterations. DTLBO reached the optimal path in Att48 (distance = 33,523); in the case of Berlin52, the difference obtained with regard to the optimal was only 2 (7544 versus 7542); in the case of the problems with a higher number of urban nodes, DTBLO obtained 6336 in Ch130 (optimal = 6110), and 552.63 in Eil76 (optimal = 538). The paths obtained for these four real urban scenarios are shown in Figures 15–18.

**Figure 7.** CPU and graphics processing unit (GPU) times (in ms) from Berlin130 with respect to different populations (64 and 128) and iterations (1000, 5000, and 10,000).

**Figure 8.** Speedup when comparing CPU and GPU times from Berlin130 with respect to different populations (64 and 128) and iterations (1000, 5000, and 10,000).

**Figure 9.** CPU and GPU times (in ms) for Att48 with respect to different populations (64 and 128) and iterations (1000, 5000, and 10,000).

**Figure 10.** Speedup when comparing CPU and GPU times for Att48 with respect to different populations (64 and 128) and iterations (1000, 5000, and 10,000).

**Figure 12.** Speedup when comparing CPU and GPU times for Eil76 with respect to different populations (64 and 128) and iterations (1000, 5000, and 10,000).

**Figure 13.** CPU and GPU times (in ms) for Ch130 with respect to different populations (64 and 128) and iterations (1000, 5000, and 10,000).

**Figure 14.** Speedup when comparing CPU and GPU times for Ch130 with respect to different populations (64 and 128) and iterations (1000, 5000, and 10,000).

**Figure 15.** DTLBO solution for Berlin52.

**Figure 16.** DTLBO solution for Att48.

**Figure 17.** DTLBO solution for Eil76.

**Figure 18.** DTLBO solution for Ch130.

The GPU's features allow it to process a high number of threads per block. As a consequence, it was noted that the increase in the population does not have as much time penalty as the CPU, so the GPU times are only affected by the number of iterations. Moreover, the GPU block architecture is highly suitable for subpopulation algorithms, due to the fact that they can be placed in different blocks of the GPU and consequently run in parallel if the number of subpopulations allows it. It is possible for different problems to be executed on the same GPU, achieving a maximal utilization of the GPU resources and a minimal response time. All of the optimal solutions were obtained with populations of 128 individuals. With populations of 64 individuals, it was only possible to generate one of the optimal solutions (the one of the smallest problem).

It was established that population sizes of 64 and 128 are capable of computing all scenarios without memory problems: in the case of 64 individuals, to have a population size close to the number of variables of some problems; and in the case of 128 individuals, to have a population size above almost all of the problems. The maximum of 128 is also related to the limits of memory per block. The data stored at block level to be shared among the whole population for learning, and the comparisons between individuals, limit the population size. Although there are no thread-level memory conflicts, the information needed to make certain calculations, especially for the calculation of valid solutions and the choice of learners, makes it possible for memory conflicts to arise in some of the scenarios.

#### **7. Conclusions**

In this research, a parallel implementation of the discrete Teaching Learning Based Optimization algorithm (DTLBO) using a manycore GPU environment to improve the algorithm performance was proposed and applied to provide optimal or suboptimal solutions to the traveling salesman problem. Previous research has found this algorithm to be an excellent option for solving the TSP, but its biggest drawback is its complexity and computational cost when compared to other metaheuristic optimization algorithms. The parallel implementation using a manycore GPU proposed and developed in this research work substantially improved the algorithm performance, achieving important speedups with respect to a sequential implementation. When a high number of individuals and iterations are considered, the speedup is high as 6×. The results show that the algorithm is adequate to be applied to problems in which the execution time is one of the determining factors, especially when there are numerous urban nodes in the route that must be traveled in an optimal way.

To summarize, the original DTLBO method was modified by including a greedy strategy when searching for a first initial best individual instead of a purely random generation. When creating new individuals from crossovers, four different ways were considered, with one of them randomly selected each time. Indeed, because the method is time-consuming, another significant contribution of the work consists in accelerating the algorithm by means of a parallel implementation supported by CUDA architecture. The parallelization of this kind of algorithm is not a trivial task and, in fact, the results obtained when parallelizing an algorithm are not always better than those obtained with a sequential implementation. The parallel implementation of DTLBO proposed in this research work achieves satisfactory speedup results.

The developed implementation could be further enhanced in several parts by focusing on improvements in thread block management to optimize the use of GPU resources.

Moreover, although in this research work parallel implementations of the algorithm on CUDA were performed on a desktop computer, these algorithms can be executed in embedded systems inside cars, because some cars already exist with NVIDIA chipsets and support for CUDA, that are used for image recognition within smart driving. With these systems, smart driving could be improved by performing route enhancement using checkpoints within the GPU processing without penalizing the CPU. This is an important issue due to the fact that, in embedded systems in cars, the CPU is a highly demanded resource. Thus, it can be troublesome to run intensive processes on the CPU because the CPU must manage other subsystems in the vehicle. Another line of future research is the adoption of a multi-objective strategy aimed at improving travel time, which would include factors other than distance, such as traffic lights, car accidents, and road works.

**Author Contributions:** H.R.-G.: Conceptualization, Methodology, Software, Validation, Investigation, writing—original draft, Writing—review & editing. J.-L.S.-R.: Conceptualization, Methodology, Software, Validation, Investigation, writing—original draft, Writing—review & editing, Supervision, Project administration. A.J.-M.: Validation, Investigation, writing—original draft, Writing—review & editing, Supervision, Project administration, Funding acquisition. H.M.-G.: Validation, Investigation, writing—original draft, Writing—review & editing, Supervision, Funding acquisition. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was supported by the Spanish Ministry of Science, Innovation and Universities and the Research State Agency under Grant RTI2018-098156-B-C54 co-financed by FEDER funds, and by the Spanish Ministry of Economy and Competitiveness under Grant TIN2017-89266-R, co-financed by FEDER funds.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **Thread-Aware Mechanism to Enhance Inter-Node Load Balancing for Multithreaded Applications on NUMA Systems**

**Mei-Ling Chiang \* and Wei-Lun Su**

Department of Information Management, National Chi Nan University, Puli 54516, Taiwan; s94213034@gmail.com

**\*** Correspondence: joanna@mail.ncnu.edu.tw

**Abstract:** NUMA multi-core systems divide system resources into several nodes. When an imbalance in the load between cores occurs, the kernel scheduler's load balancing mechanism then migrates threads between cores or across NUMA nodes. Remote memory access is required for a thread to access memory on the previous node, which degrades performance. Threads to be migrated must be selected effectively and efficiently since the related operations run in the critical path of the kernel scheduler. This study focuses on improving inter-node load balancing for multithreaded applications. We propose a thread-aware selection policy that considers the distribution of threads on nodes for each thread group while migrating one thread for inter-node load balancing. The thread is selected for which its thread group has the least exclusive thread distribution, and thread members are distributed more evenly on nodes. This has less influence on data mapping and thread mapping for the thread group. We further devise several enhancements to eliminate superfluous evaluations for multithreaded processes, so the selection procedure is more efficient. The experimental results for the commonly used PARSEC 3.0 benchmark suite show that the modified Linux kernel with the proposed selection policy increases performance by 10.7% compared with the unmodified Linux kernel.

**Keywords:** NUMA; Linux kernel; multithreaded; load balancing; remote memory access

#### **1. Introduction**

Multi-core systems allow parallel computing and have a higher throughput. To effectively utilize the performance of multi-cores, applications are coded as multithreaded. In Linux, the kernel maintains one runqueue for each core. When a process or thread is ready to run, it is put into the runqueue and waits to be run on the corresponding core. The Linux kernel [1] maintains a data structure, struct task\_struct, which records attributes and runtime information for each schedulable entity. Each schedulable entity in the Linux kernel is called a task. When several tasks with different run times are run simultaneously, the load between cores can be imbalanced, so performance is decreased. The kernel scheduler's load balancing mechanism then migrates tasks from the overloaded core's runqueue to the runqueue of a core that is not so heavily loaded.

Non-Uniform Memory Access (NUMA) [2] systems divide system resources such as processors, caches, and RAM into several nodes. It takes longer for one core to access the memory on different NUMA nodes than on the local node. This costly memory access is called remote memory access. For a task running on a NUMA system, the memory pages allocated to it may be scattered on different nodes. When a task accesses memory pages on nodes other than those on which it runs, remote memory access occurs. For Linux-based NUMA systems, the load balancing mechanism can migrate tasks to another node, so costly remote memory access is necessary after the migration. The benefit of load balancing is reduced by the need for remote memory access after task migration. A prior study [3] showed that reducing remote memory access is critical in designing and implementing an operating system on NUMA systems.

**Citation:** Chiang, M.-L.; Su, W.-L. Thread-Aware Mechanism to Enhance Inter-Node Load Balancing for Multithreaded Applications on NUMA Systems. *Appl. Sci.* **2021**, *11*, 6486. https://doi.org/10.3390/ app11146486

Academic Editor: Juan-Carlos Cano

Received: 8 June 2021 Accepted: 12 July 2021 Published: 14 July 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

In order to maintain a balanced load and reduce remote memory access, the kernellevel Memory-aware Load Balancing (kMLB) [4] mechanism was proposed to allow better inter-node load balancing for the Linux-based NUMA systems. The unmodified Linux kernel migrates the first movable task that can be run on the target core. The task may involve more remote memory access when it is migrated between nodes. The kMLB mechanism modifies the Linux kernel to track each task's number of memory pages on each node. This memory usage information is then used to select the task that is most suited to inter-node migration. Several task selection policies, such as Most Benefit (MB) [4] and Best Cost-Effectiveness [4], have been proposed and used different metrics. The selected tasks require less remote memory access after migration, and system performance successfully improves. Differently, Chen et al. [5] proposed a machine learning-based resource-aware load balancer in the Linux kernel to make migration decisions. The extra runtime overhead deducts the performance gain because scheduling operation is in the critical path of kernel operations.

A multithreaded process can create threads as needed during its execution. In the Linux kernel, these threads form one thread group and share memory space. On NUMA systems, threads of one thread group can be scheduled by the kernel to run on different nodes to balance the load, so the same memory pages can be accessed by threads that run on different nodes. Accessing one memory page involves local access for some threads and remote access for other threads. When a thread is migrated across nodes, remote memory access and cache misses increase as well, so it is difficult to determine the cost of memory access.

This study first analyzes multithreaded applications and their memory access in Linux and then proposes a new task selection policy, which is named Exclusivity (Excl) for multithreaded applications. This policy determines whether a task is suitable for inter-node migration using the exclusivity of the thread distribution on NUMA nodes in its thread group. The task for which the thread group is least exclusive is selected, which has a lesser effect on data mapping and thread mapping for its thread group.

Although selecting a suitable task for inter-node migration can reduce remote memory access, the selection procedure must evaluate all tasks in the runqueue, which involves a processing overhead. Since the selection is in the critical path for the kernel scheduler, the cost of more operations outweighs any benefit. Therefore, we further improve the procedure for selecting tasks by using thread group information to eliminate superfluous evaluations. Only a subset of movable tasks in the runqueue is evaluated, so the selection procedure is more efficient.

The contribution of this study is as follows. First, multithreaded applications and their memory access in Linux are analyzed. The analysis indicates that the existing memoryaware MB policy is still effective for multithreaded applications. However, it requires the kMLB mechanism to track per-task memory usage on per NUMA node. Thus, its implementation needs to modify many kernel operations and data structures that are affected. Second, the thread-aware Exclusivity policy is proposed, which is a relatively lightweight task selection policy since it does not need the kMLB mechanism. Instead, it uses the exclusivity of the thread distribution on nodes in the thread group to determine the target thread for inter-node migration. Third, several methods to enhance selecting tasks for inter-node migration for multithreaded applications are proposed.

Finally, the proposed Exclusivity policy and enhancement methods are practically implemented in the Linux kernel. Extensive experiments using the PARSEC 3.0 [6] benchmark suite run on the modified Linux kernel with various task selection policies. Compared with the unmodified Linux kernel, the results show that when the task selection procedure is enhanced, the Most Benefit Plus (MB+) policy, which requires the kMLB mechanism, increases performance by 11.1%. The proposed Exclusivity policy increases performance by 10.7%. This policy is competitive and does not require the kMLB mechanism. Besides, it is more easily adapted to a newer Linux kernel.

The remainder of this paper is organized as follows. Section 2 introduces the technological background and related work. Section 3 presents the improvements for inter-node task migration for multithreaded applications and the new task selection policy. Section 4 details the experimental results, and Section 5 concludes.

#### **2. Technological Background and Related Work**

Though multi-core systems have high throughput, the performance increase depends on the placement of tasks and their data on nodes during runtime. Task placement affects contention between resources for the cache and cores, and data placement affects the memory access cost for a task. To utilize system resources more efficiently and increase performance, many studies [7–15] design specific placements of tasks and data to decrease resource contention, balance the loads in the cores, and reduce access to remote memory. The solutions focus on improving the performance of Symmetric Multi-processing (SMP) and Non-Uniform Memory Access (NUMA) [2] systems.

However, the rises of multithreaded applications cause finding specific task placement or data placement to increase performance more complicated. Since threads in one multithreaded application commonly share memory address space, and if they run on cores that do not share or share fewer cache resources, tasks are placed less efficiently because of more cache misses. In terms of data placement, there are different memory access latencies for cores access memory on different nodes. It is more challenging to determine where to allocate the required memory for the requesting task on NUMA systems.

This section first describes two types of multi-core systems and then reviews related work in Section 2.2. Section 2.3 introduces the kernel-based Memory-aware Load Balancing (kMLB) [4] mechanism and task selection policies proposed for improving inter-node migration.

#### *2.1. Multi-Core Systems*

A Symmetric Multi-processing (SMP) system is a multi-core system in which cores can share different levels of cache and the cost to access any location in the main memory is the same for all cores, as shown in Figure 1a. For a Non-Uniform Memory Access (NUMA) [1] system, as shown in Figure 1b, system resources such as cores, RAM, and memory controllers are divided into several nodes, and interconnection links connect these nodes. Cores can access the memory located on the same node with it, which is called local memory access. Accessing memory on other nodes via interconnection links is called remote memory access. This requires a longer time. The NUMA factor is the ratio between the remote memory access latency and the local memory access latency.

**Figure 1.** The load balancing mechanism affects SMP and NUMA systems differently. (**a**) SMP system; (**b**) NUMA system.

Though a NUMA system is more scalable than an SMP system and more memory accesses can occur simultaneously, operating system design must reduce costly remote memory access. Nevertheless, for load balancing, modern operating systems migrate tasks from an overloaded core's runqueue to the runqueue of a core with a lesser load. The load-balancing mechanism affects SMP and NUMA systems differently, as shown in Figure 1. Unlike an SMP system that features a uniform cost to access memory, migrating a task across nodes can involve costly remote memory access after migration. The benefit gained from load balancing thus is reduced by the need for remote memory access.

#### *2.2. Related Work*

In the Linux kernel, a memory page is allocated to a task when it first accesses the page. The page is allocated on the node where the requesting task is running. This is called a first-touch strategy [16]. Suppose a task is scheduled to run on another NUMA node during context switches or is migrated to another node for load balancing. In that case, the task requires remote memory access to access the page on the original node. Linux [17] offers NUMA-related system calls for NUMA-aware programs and provides commands and tools that constrain tasks to run on specific nodes. Several studies [11–14] proposed methods that use NUMA-related system calls to bind one task on specific nodes to decrease remote memory access. However, these performance improvements are reduced because a multithreaded application can create threads as needed during runtime, and the thread load is not deterministic. Binding tasks on specific nodes results in a load imbalance between nodes, and CPU utilization decreases accordingly.

Chen et al. [5] implemented a machine learning (ML)-based resource-aware load balancer in the kernel to make migration decisions. An ML model is implemented inside the kernel to monitor real-time resource usage in the system. This identifies potential hardware performance bottlenecks and then makes load balancing decisions. This ML model is trained offline in the user space and is used for online inference in the kernel to generate migration decisions. The results show no significant difference in the performance of the original kernel and the modified kernel when running benchmarks. Performance gains are negated largely because of the extra runtime overhead and the fact that scheduling operations are in the critical path of kernel operations.

Migrating memory pages to the NUMA node on which their requesting task is currently running reduces remote memory access. Mishra and Mehta [15] proposed an on-demand memory migration policy that migrates only the referenced pages to the current node where the requesting task is running. Terboven et al. [11] proposed a user-level implementation of a Next-touch approach in Linux. The *mprotect()* system call is used to change protection on a memory region, so the successive reads and writes incur segmentation faults. A signal handler that handles segmentation fault is implemented and invokes the *move\_pages()* system call to migrate the accessed page to the node on which the task is currently running. Goglin and Furmento [13,14] presented two different implementations of a Next-touch approach in Linux. The user-space implementation also uses the *mprotect()* system call and a segmentation fault signal handler to migrate the accessed page. The kernel-level implementation uses the *madvise()* system call and modifies the kernel page fault handler to migrate the accessed page. The results show that the kernel-based implementation is more efficient than the user-space implementation. However, if the memory page is not accessed again after it is migrated, the cost of accessing the remote memory page may be less than the cost of migrating it.

In the Linux kernel, all threads of one multithreaded process share memory address space and use the same page table. If these threads run on different nodes, it is hard to determine whether pages should be migrated to the node where the requesting thread runs and to track the memory access pattern for an individual thread. Therefore, it is more challenging to perform thread mappings or data mappings to reduce remote memory access. To overcome these difficulties, Diener et al. [7] modified the kernel page fault handling routines to track the memory access patterns for any threads. The present flag

of the page table entry is cleared so that whenever one memory page is accessed, a page fault occurs, even though the faulted memory page has already been in the memory. This identifies which thread on which node accesses this memory page and its access pattern. This mechanism is named kMAF [7] and uses the memory access patterns to determine which threads are more relevant and migrates them to the same node to allow better thread mappings. For data mappings, kMAF migrates one memory page to the node where the frequency of faults for this page is exclusive, so the page is mostly accessed from that node. This reduces remote memory access.

For methods using page fault to trigger migrating the faulted page to the same node as the faulting thread, the induced faults reduce performance. Existing studies also use page faults on the same page table for all threads for one multithreaded process to determine the memory access pattern for data mappings or thread mappings. Since several threads of a multithreaded process may fault one page, Gennaro et al. [8] indicated that this might result in an inaccurate estimation of the working-set of individual threads for one multithreaded process performance is decreased in terms of thread mappings. They then proposed a solution that uses the multi-view address space (MVAS). If MVAS is switched on while one multithreaded process runs, one individual page table is created for each of its threads. The memory access pattern for different threads can be separately tracked in different page tables until MVAS is switched off for the multithreaded process. MVAS does not incur extra page faults, so it can support those studies [11,13,14] that use page faults to perform page migrations to reduce remote memory access.

Lepers et al. [9] studied the placement of threads and data on NUMA nodes and the asymmetry of interconnecting links for nodes connected by links of different bandwidths. A dynamic thread and memory placement algorithm was developed in Linux to minimize contention for asymmetric interconnect links and maximize bandwidth between communicating threads. Li et al. [10] also studied the effect of hardware asymmetry. The AMPS scheduler is implemented in the Linux kernel to support asymmetric multicore architectures for which cores in the same processor have different performances. The Linux kernel is modified to track the memory usage for each thread on each node and predicts the migration overhead for a thread. Threads are migrated to faster cores when they are under-utilized. However, if the predicted migration overhead is too high or the thread is in the memory allocation phase, this thread cannot be migrated across nodes.

#### *2.3. The Kernel-Based Memory-Aware Load Balancing (kMLB) Mechanism*

Inter-node task migration is necessary for an operating system to balance the load between NUMA nodes, and migrating different tasks for inter-node load balancing incurs a different amount of remote memory access. As shown in Figure 2, migrating the first task in the source runqueue, i.e., Task#3, incurs 3-page remote memory access but migrating Task#10 incurs 5-page remote memory access. However, the unmodified Linux kernel always selects the first task that can be migrated to run on the destination core. It allows rapid selection, but the first task may not incur the least remote memory access after the inter-node migration.

A previous study [4] shows that it is better to select the task that involves less remote memory access after migration. The kernel-based Memory-aware Load Balancing (kMLB) mechanism was proposed to select suitable tasks to migrate between nodes to allow better load balancing in the Linux kernel. The memory usage for each task on each node is tracked. Depending on a task's current running node, the physical pages that it occupies on each node are identified as local or remote. The load balancer then uses this information to determine the most suitable task for inter-node migration.

In the Linux kernel, the Resident Set Size (RSS) [1] for one process is the number of page frames occupied by this process. The original RSS only tracks the total number of page frames that are occupied by each process. The kMLB mechanism modifies the kernel operations to track the RSS counters on each node for each process, including dynamic

memory allocation and releases, demand paging, swapping, system calls, and inter-node page migration.

**Figure 2.** Migrating different tasks incurs a different degree of remote memory access after migration.

The following details task selection policies that are used with the kMLB mechanism. For the example in Figure 2, Table 1 shows the selected task for each policy according to the specific metric used.


**Table 1.** Selecting a task using different selection policies.


$$\text{MB}\_p = \text{RSS}\_p(\text{dest\\_node}) - \text{RSS}\_p(src\\_node)$$

• Best Cost-Effectiveness (BCE) Policy selects the task in the source core's runqueue for which inter-node migration is the most cost-effective. Based on the MB policy, the BCE policy also considers the maximum cost of page migration, which is the amount of memory occupied by one task on the source node. The selected task is the one that can reduce the maximum remote memory access relative to the maximum migration cost when it is migrated. The following metric is used to select the target task with the maximum value:

$$\text{BCE}\_{\mathcal{V}} = (\text{RSS}\_{\mathcal{V}}(\text{dest\\_node}) - \text{RSS}\_{\mathcal{V}}(\text{src\\_node})) / \text{RSS}\_{\mathcal{V}}(\text{src\\_node})$$

#### **3. Improved Inter-Node Load Balancing for Multithreaded Applications**

The kMLB mechanism [4] tracks the number of physical pages per node occupied by each task. For each movable task in the overloaded core's runqueue, the modified OS scheduler uses this information to calculate the metric to determine the most suitable task for inter-node migration. However, the kernel scheduler must evaluate each task in the runqueue to identify a target task to be migrated. The source (i.e., the busiest) and the destination (i.e., the idlest) runqueues must be locked, so the target task must be identified efficiently because it runs in the critical path of the kernel scheduler. Besides, the threads of one multithreaded application share memory pages and may be distributed on different nodes. When threads running on different nodes access their shared memory pages on the remote node, cache misses and remote memory access slow access.

This study improves inter-node task migration for multithreaded applications in two respects. This section first introduces multithreaded applications and their memory access in Linux. Section 3.2 describes the improvements in selecting tasks for migration between nodes for multithreaded applications. Section 3.3 presents the proposed threadaware task selection policy for inter-node migration for multithreaded applications.

#### *3.1. Multithreaded Applications and Their Memory Access in Linux*

During the execution of a multithreaded application, threads are created as needed. In Linux, these threads form one thread group. The first thread in a multithreaded process is the thread group leader, and other threads are the members of this thread group. Each thread is regarded as one schedulable entity in the Linux kernel, so it is one task. Threads of the same thread group share the same memory address space and page table.

When a task is created, the OS scheduler dispatches it to the core with the least load to maintain load balance within multi-core systems. On NUMA systems, threads of the same thread group may be distributed on different nodes, and their memory pages can also be allocated on several nodes, as shown in Figure 3. Therefore, memory pages are local memory pages for some threads and remote memory pages for others. Access to Data0 is local access for threads T0, T1, and T7 and remote access for threads T2, T3, T4, T5, and T6. The difference in the total memory access cost for a thread group when one of its threads is migrated across nodes must be determined.

As depicted in Figure 4, threads within one thread group can be scheduled to run on different nodes, and their memory spaces can be on different nodes. A thread group's total memory access cost sums up the local memory access costs and the remote memory access costs for threads in this thread group. For a NUMA system with *n* nodes for which the memory access latency from a remote node to the local node is constant, the number of threads within a specific thread group on the NUMA node *i* is denoted as *Ni*. The number of memory pages allocated to this thread group on the NUMA node *i* is denoted as *Ri*.

Regardless of the locality of memory references, the latency for all threads in the thread group to access the entire memory allocated to the thread group, which is the estimated total memory access cost, is shown in Equation (1). The local memory access cost is shown in Equation (2). Equation (3) shows the remote memory access cost, in which f is the NUMA factor that represents the ratio between the remote memory access latency and the local memory access latency.

**Figure 4.** Migrating one thread of a specific thread group across nodes.

Total memory access cost (TMA) = Local memory access cost (LMA) + Cache miss cost + Remote memory access cost (RMA) (1)

$$\text{Local memory access cost (LMA)} = \sum\_{i=0}^{n-1} N\_i \ast R\_i \tag{2}$$

$$\text{Remote memory access cost (RMA)} = \left(\sum\_{i=0}^{n-1} \sum\_{\forall \ j \neq i} N\_i \ast R\_j\right) \ast f \tag{3}$$

Therefore, for the case in Figure 4, the estimated total memory access cost for this thread group is the value that is shown in Equation (4):

$$\begin{aligned} \text{TMA} &= \sum\_{i=0}^{3} N\_i \ast R\_i + \text{Cache miss cost} + (\sum\_{i=0}^{3} \sum\_{j \neq i} N\_i \ast R\_j) \ast f \\ &= N\_0 R\_0 + N\_1 R\_1 + N\_2 R\_2 + N\_3 R\_3 + \text{Cache miss cost} + \\ (N\_0 R\_1 + N\_0 R\_2 + N\_0 R\_3 + N\_1 R\_0 + N\_1 R\_2 + N\_1 R\_3 + N\_2 R\_0 + N\_2 R\_1 + N\_2 R\_3 + N\_3 R\_0 + N\_3 R\_1 + N\_3 R\_2) \ast f \end{aligned} \tag{4}$$

Because only the values of *N*<sup>0</sup> and *N*<sup>2</sup> change, and the others remain the same, the difference between the total memory access cost after a thread is migrated from node 0 to node 2 is simplified and calculated using Equation (5):

Difference = TMA (after the migration of one thread from node 0 to node 2) − TMA (before the migration) = (*R*<sup>2</sup> <sup>−</sup> *<sup>R</sup>*0) <sup>∗</sup> (1 <sup>−</sup> *<sup>f</sup>*) (5)

> That is, if one thread is migrated, the difference between the total memory access cost after the migration is calculated using Equation (6), where *RD* is the RSS value in the destination node, *RS* is the RSS value in the source node, and *f* is the NUMA factor:

$$\text{Difference} = (R\_D - R\_S) \ast (1 - f) \tag{6}$$

Regarding inter-node task migration, the RSS values for a thread group on the source and the destination nodes have the most significant effect on the total memory access cost. Therefore, Most Benefit (MB) [4], which uses the same metric to select the most beneficial task is also appropriate for multithreaded applications.

#### *3.2. Enhancements for Selecting Tasks for Inter-Node Migration for Multithreaded Applications*

Selecting a suitable task for inter-node migration requires additional overhead because the selection procedure must evaluate all tasks in the runqueue in order. The evaluation cost increases as the number of tasks in the runqueue increases. For multithreaded applications, some evaluations are superfluous and can be eliminated because some tasks are less suitable than the candidate task. In this study, the thread group leaders are not migrated. The thread group leader's current node is determined when its thread group members are evaluated, and only one thread member per thread group is evaluated. Some tasks for the evaluation are eliminated if they match one of these aspects. Therefore, only the subset of movable tasks in the runqueue is evaluated, and the procedure for selecting tasks is more efficient. Figure 5 shows the flow for the improvements, and these methods are explained in the following subsections.

#### 3.2.1. Eliminating the Thread Group Leader for Migration

The thread group leader is not selected because the Linux kernel uses the first-touch method [16] for physical memory allocation. A physical memory page is allocated the first time a thread accesses it and on the same node as the requesting thread. We observe the memory consumption for multithreaded applications, the first thread that touches the memory page is usually the thread group leader. The PARSEC 3.0 [6] benchmark suite contains 30 multithreaded applications, including 13 programs from PARSEC 2.0, 14 programs from the SPLASH benchmark suite, and three network programs. Table 2 shows these benchmark programs [4,6].

Each benchmark program's memory consumption was measured during its execution on the AMD server [18]. In Linux, the *free* command is used to obtain the current status for memory usage. A script that executes the free command every 0.1 s is used to record the memory usage footprint for the entire system. The increased memory usage during the benchmark program's execution is then attributed to the benchmark program. Each benchmark program is run with different threads, ranging from 1, 2, 4, 8, 16, and 32.

The results show that memory consumption is independent of the number of created threads for some benchmark programs. Other benchmark programs consume more memory as the number of created threads increases. Figure 6a shows the former situation for the benchmark program parsec.canneal. For this type of benchmark program, the memory pages of a multithreaded process are first touched by the first thread. For the benchmark program splash2x.fmm in Figure 6b, the individual threads first touch the memory pages of a multithreaded process. The results in Table 2 show that 22 of 30 applications allocate and initialize the allocated memory pages in the initializing thread (i.e., the thread group leader). The thread group leader for each multithreaded process is not migrated to reduce the scattering of memory pages on different nodes.


**Figure 5.** The modified flow for the efficient selection of tasks for inter-node migration.

**Figure 6.** Memory consumption for benchmark programs running with different numbers of threads. (**a**) parsec.canneal; (**b**) splash2x.fmm.

3.2.2. Evaluating Only Those Tasks with Thread Group Leaders on Other Nodes

The observation presented in Section 3.2.1 shows that most of the memory pages allocated to a multithreaded process are on the node where the thread group leader is located. Therefore, ensuring that threads are executed on the same node as the thread group leader involves more local memory access.

For the proposed design, during the task evaluation for inter-node task migration, each task in the runqueue is classified into three types, according to where its thread group leader is currently located. Different decisions are made as follows. If the thread group leader is on the destination node, migrating this task to the destination node may involve more local memory access. Therefore, it is migrated directly instead of evaluating the remaining tasks in the runqueue. Suppose the thread group leader is on the source node. In that case, it is not selected for the migration because migrating it to the destination node may involve more remote memory access after migration. Therefore, only those tasks for which the thread group leaders are currently located on other nodes are evaluated.

Figure 7 illustrates several multithreaded processes running on a 4-node NUMA system. There are four thread groups and 19 tasks. The thread group leaders are denoted as "TGLR." Because Core#3 is idle, the OS scheduler performs the load balancing mechanism to migrate tasks from the overloaded runqueue (Core#1 on Node#0). The thread group leader for Task#3 is on the source node, so Task#3 is not selected for migration. Similarly, Task#11 is the target task and is migrated immediately because its thread group leader is on the destination node. If more tasks must be migrated to achieve load balancing, the remaining tasks (Task#19, Task#6, Task#16, and Task#9) are evaluated to determine which is best suited to inter-node migration.

3.2.3. Evaluating Only the First Thread Member in a Thread Group

In the source runqueue, only the first member in a thread group is evaluated in the selection procedure. This is because threads in the same thread group share the physical memory pages; their RSS counter values are the same. Figure 8 shows that the Linux kernel represents each thread with a *task\_struct* structure, but all threads in the same thread group share the same memory management information (i.e., *mm\_struct* structure) and page table. Their RSS counter values (i.e., *mm\_rss\_stat* structure) are also shared.

For the proposed design, while several tasks in the source runqueue are examined, only the first encountered thread member of a thread group is evaluated; other thread members in the same thread group are not evaluated. Therefore, the identical metric calculations to evaluate the threads of the same thread group in the runqueue are omitted. In the example of Figure 7, for Task#19, Task#6, Task#16, and Task#9, only Task#19 and Task#6 are evaluated.

**Figure 7.** Several multithreaded processes running on a 4-node NUMA system.

**Figure 8.** Tasks in a thread group share memory management information.

#### *3.3. Task Selection Policy with Exclusivity for Multithreaded Applications*

For existing policies, except for the First-Fit policy used in the unmodified Linux kernel, memory-aware policies, such as MB [4] and BCE [4] work with the kMLB mechanism. The Linux kernel must be modified to allow the kMLB mechanism to be used to determine the per-node memory pages per task. When the invoked functions or required data, structures are changed in the newer kernel, these memory-aware policies and the kMLB mechanism also require modification.

This study proposes a new thread-aware policy named Exclusivity (Excl) that does not require the kMLB mechanism. Instead, it considers the exclusivity of thread distribution on nodes for a thread group. The more evenly the threads of a thread group are distributed to nodes, the less beneficial it is for data mapping and thread mapping. Migrating a task across nodes also changes the thread distribution for the thread group. It is better to select a task for which its thread group's threads are distributed more evenly on nodes. The Excl policy selects the task for which the thread group is least exclusive in terms of thread distribution for inter-node migration.

Figure 9 illustrates an example in which 10 tasks belong to two thread groups. Most threads of one thread group are distributed on Node #0, but threads of the other group are evenly distributed on nodes. Migrating Task#10 is more beneficial.

**Figure 9.** Two thread groups with different levels of exclusivity.

For the proposed policy using the thread distribution of its thread group, for each movable task *p* in the runqueue, Equation (7) is used to evaluate the exclusivity of this thread group. *thrd\_nri* means the number of threads on NUMA node *i*, and *n* is the number of nodes. For tasks in the source core's runqueue, the task for which the thread group has the minimum value for exclusivity is selected. For tasks with equal exclusivity, the first one to be evaluated is the target task:

$$\text{Excl}\_{\mathcal{V}} = \max\_{0 \le i \le n-1} \text{thrd\\_nr}\_i / \sum\_{i=0}^{n-1} \text{thrd\\_nr}\_i \tag{7}$$

In Figure 9, threads in hexagon have the value of exclusivity 0.8, and threads in the rectangle have the value of exclusivity 0.4. In Figure 7, Task #11 is selected because it is the least exclusive. The evaluation is listed in Table 3.

**Table 3.** The evaluation of the Exclusivity policy. (**a**) The evaluation for tasks in Figure 9; (**b**) The evaluation for tasks in Figure 7.


However, exclusivity is not the sole criterion for consideration. There is an exceptional case for a multithreaded process for which most of the memory pages are allocated on some nodes, but most threads are on other nodes. In this situation, much remote memory access is necessary. As shown in Figure 10, the thread group in the hexagon and the thread group in the rectangle both have the same value of exclusivity 0.8. However, more remote memory access is necessary for the thread group in the hexagon. Suppose the task for which the thread group is least exclusive is selected. In that case, the highly exclusive thread group may still involve much remote memory access because the thread group leader is on a node different from those where most of the thread group members are located.

**Figure 10.** A highly exclusive thread group may involve much remote memory access.

To ensure that threads and the data for a thread group are located on the same node, the proposed policy incorporates the consideration of a thread group leader in the task selection procedure. This makes members of a thread group remain on the same node as the thread group leader. Besides, as described in Section 3.2, most memory pages for a multithreaded process may be allocated by the thread group leader.

Each task in the runqueue is classified into three types according to the node where the thread group leader is currently located. If the thread group leader for a task is on the destination node, this task is not evaluated and is migrated immediately. A task for which its thread group leader is on the source node is not selected for migration. For tasks for which the thread group leader is on other nodes, Equation (7) is used to select the target task for inter-node migration.

#### **4. Performance Evaluation**

To measure the performance improvement due to the use of an enhanced inter-node load balancing procedure and the proposed policy for multithreaded applications, the benchmark suite PARSEC 3.0 [3] is used to test systems using different task selection policies and running benchmarks with various numbers of threads. The experiments record the elapsed running time for each test case, several performance counter events, and the elapsed running time for each benchmark program. The results are used to determine the reasons for an increase or decrease in performance.

Section 4.1 details the experimental environment, Section 4.2 describes the experimental design, and Section 4.3 presents the experimental results. Summary and discussions for experimental results are also provided.

#### *4.1. Experimental Environment*

The experiments are performed on the NUMA system, Supermicro A+ 4042G-72RF4 [18]. The software and hardware specifications are listed in Table 4. This is a 4-node NUMA system with a constant NUMA factor, and each node is installed with one AMD Opteron 6320 processor [19]. This processor has eight cores, so there is a total of 32 cores in the system. The kMLB [4] mechanism and task selection policies are implemented in the Linux kernel. The Linux *numactl* tool on the system shows that 1.6 times more access is required for remote memory than for local memory.


**Table 4.** Software and hardware specifications.

The PARSEC 3.0 [6] is a multithreaded benchmark suite that provides a convenient interface for building and running each benchmark program, as shown in Table 2 of Section 3.2.1. The multithreaded configuration "gcc-pthreads" is used to construct most benchmark programs, but the configuration "gcc-openmp" is used to construct the benchmark program parsec.freqmine. The input set "native" is used for performance analysis on real machines to run benchmark programs. The interface allows each benchmark program to be run with the specified number of threads.

#### *4.2. Experimental Design*

This study focuses on reducing remote memory access by improving inter-node load balancing. A sufficient number of benchmark programs must be run simultaneously on the experimental system, such that the kernel scheduler migrates tasks between nodes to balance the load when the nodes have an imbalanced load. 29 benchmark programs with a specific number of threads are run simultaneously for each test case. The numbers of threads range from 1, 2, 4, 8, 16, and 32. The elapsed running time for each test case and the elapsed running time for each benchmark program are measured using the performance counter statistics. For each test case, eight to ten runs are performed.

For each run, the system is rebooted to prevent buffer caching. During each run, the performance counter events in Table 5 are also recorded for each benchmark program. These are used to determine the cause of any change in performance: Instructions Per Cycle (IPC), Last Level Cache (LLC), and Miss Per Kilo Instructions (MPKI) to estimate runtime patterns. Other performance data is obtained from *numastat* command-line utility and *vmstat* in the *proc* file system.


**Table 5.** Collected performance counter events.

This study also enhances existing task selection policies as described in Section 2.3 for multithreaded applications, so the experiments use different task selection policies, as shown in Table 6. The First-Fit policy is used in the unmodified Linux kernel. As presented in Section 3.1, the MB [4] policy considers the memory occupied by one task

on the destination and source nodes to select a target task for inter-node migration. This policy is suited to multithreaded applications. However, additional overhead is incurred because the policy relies on the information provided by the kMLB mechanism [4].


**Table 6.** The experimental system running different task selection policies.

To allow faster selection of tasks, we also enhance the MB policy to be MB+. Each task in the runqueue is evaluated only when its thread group leader is currently not located on the destination or the source nodes. Therefore, only the subset of tasks in the runqueue must be evaluated. Besides, a thread group leader is not migrated across nodes, and a task is migrated directly if its thread group leader is on the destination node.

#### *4.3. Experimental Results*

4.3.1. Performance Comparison for Varying Numbers of Threads

The benchmark programs were run with varying numbers of threads, ranging from 1, 2, 4, 8, 16, and 32. The average elapsed time, the standard deviation, and the ratio of these times to an unmodified Linux kernel are calculated for different task selection policies and the specific number of threads. Figure 11 shows the results.

**Figure 11.** Experimental results for different numbers of threads. (**a**) Average elapsed time of each test case; (**b**) Standard deviation of each test case; (**c**) The performance increase over the unmodified Linux kernel.

Figure 11a shows that all test cases behave similarly in terms of average elapsed time for different numbers of threads. As the number of threads increases, multithreaded benchmarks run faster since a multi-core system's parallel computing capability increases performance as well. Besides, since more threads wait in the runqueue, an effective task selection policy can select a more suitable one among them for migration. The proposed task selection policies allow more efficient inter-node task migration for the load balancing mechanism on the experimental NUMA system. These perform better than the First-Fit policy that is used in the unmodified Linux kernel.

However, the parallel computing capability of a multi-core system is limited, the elapsed time measured depends on the number of cores in the target system. Because the experimental system has 32 cores, if there are too many threads for a multithreaded benchmark, contention for cores and memory slows down the performance. On the contrary, if the number of threads in a multithreaded benchmark is much smaller, multicore is not fully utilized. Besides, few or no threads wait in runqueue, then task selection policies are not triggered or used effectively. Therefore, the increase in performance is not so great when the numbers of threads are 1, 2, and 32, as shown in Figure 11c.

4.3.2. The Effect of Enhancing the Task Selection Procedure

The procedure for selecting tasks runs in the critical path of the kernel scheduler, so the target task for the migration must be identified as efficiently as possible. Therefore, this study selects the target task more quickly to allow more efficient inter-node task migration for multithreaded applications. Only the task in the runqueue for which the thread group leader is currently on nodes rather than destination and source nodes is evaluated. No thread group leader is migrated across nodes.

Figure 12 shows the different improvement ratios. The Exclbase policy achieves a more significant improvement than the MB policy. This enhancement has a different effect on each because the MB policy evaluates each task in the runqueue to select the most beneficial task for migration. For the MB+ policy, though this enhancement allows faster task selection by only evaluating the subset of tasks in the runqueue, the selected target task may not result in a greater benefit than the most beneficial task. If there are a few tasks in the runqueue, the saved evaluation costs can be negated by selecting a target task that is not the most suitable.

**Figure 12.** Improvement ratio for enhancing the task selection procedure.

The Exclbase policy selects the task for which the thread group is least exclusive regarding the thread distribution on nodes. The Exclbase policy does not consider the examined task's thread group leader, so the selected task can be migrated to the node different from where its thread group leader is currently located. Moving tasks away from its thread group leader can involve more remote memory access after migration. In contrast, the Excl policy incorporates the proposed enhancements for task selection procedure into the Exclbase policy. The Excl policy evaluates the task for which the thread group leader is on other nodes and migrates the task to the node the thread group leader is on. Most memory pages are first touched by the thread group leaders, as presented in Section 3.2, so the Excl policy successfully enhances the Exclbase policy and improves the performance.

#### 4.3.3. Performance Counter Statistics

We analyze the causes of performance improvements using the measurement data obtained from the performance counter. Since 29 benchmark programs were run with varying numbers of threads, ranging from 1, 2, 4, 8, 16, and 32. They were run on the NUMA server with the Linux kernels using different task selection policies. Lots of figures are obtained from experimental results. From the measurement of the standard deviation of each test case, shown in Figure 11b, the standard deviations for test cases using 4 or 16 threads are relatively small for various task selection policies. Since there are 32 cores in the system, a sufficient number of threads must be run simultaneously on the system. The kernel scheduler then migrates threads between nodes to balance the load when the nodes' loads are imbalanced. As the number of threads increases, more threads wait in the runqueue, and an effective task selection policy can select a more suitable one for migration. Therefore, we observe the runtime patterns for these test cases that use 16 threads. Figure 13 shows the result.

**Figure 13.** Experimental results for systems using different policies. (**a**) Instructions per CPU cycle; (**b**) LLC misses per 1000 instructions; (**c**) The number of page faults; (**d**) The number of tasks that are migrated; (**e**) The number of pages that are migrated.

Figure 13a,b shows the Instructions Per Cycle (IPC) and the Last Level Cache (LLC) misses per 1000 instructions (LLC MPKI) for systems that use different task selection policies. IPC is used as a reference to evaluate the CPU utilization rate. Cores fetch instructions or data from caches for execution. If the LLC misses, the required instructions or data are then accessed from the main memory. Hence, the LLC MPKI is used as a reference to evaluate the frequency of memory access. The results show that systems using the proposed task selection policies outperform the unmodified Linux kernel, which uses the default First-Fit policy.

We also record the number of page faults, task migrations, and page migrations for each test case, and the results are respectively shown in Figure 13c,d. During the execution of tasks, page faults, task migrations, and page migrations increase the runtime overhead. Systems that use the proposed task selection policies all incur fewer of these operations and have a lower overall runtime overhead than the unmodified Linux kernel, so they perform better.

#### 4.3.4. Performance Results for Various Benchmarks

The performance improvement is measured for each benchmark program. Figure 14 shows the results for each benchmark program running on systems that use different task selection policies for the test cases with 16 threads. We observe that the benchmark programs for which the running time is longer (e.g., parsec.facesim and splash2x.raytrace) obtain more performance gains on systems that use the proposed task selection policies. The benchmark programs (e.g., splash2x.cholesky and parsec.vips) show no performance improvement because their running time is too short. Tasks requiring a longer elapsed time have more chances to be migrated and thus gain more performance improvement.

#### 4.3.5. Summary and Discussion

As the analysis presented in Section 3.1, MB is still effective for multithreaded applications. The experimental results demonstrate that MB and MB+ achieve good performance. The experimental results also demonstrate that the Exclusivity policy is competitive. Although its 10.7% performance improvement over the unmodified Linux kernel is still slightly less than the 11.1% performance improvement over the same for MB<sup>+</sup> with the kMLB mechanism.

As presented in the study [4], the MB policy works with the kMLB mechanism, and the implementation includes two major works. First, the kernel's memory management routines and exception handling routines are modified to obtain per-task memory usage on each node. Second, the kernel's inter-node load balancing procedure is modified to incorporate the task selection policy. Therefore, to support the kMLB mechanism in the Linux kernel, all kernel operations that update the values of RSS counters and related data structures are modified to track the RSS counters on each node for each process. In detail, seven types of operations are affected and modified, regarding dynamic memory allocation and releases, demand paging, copy-on-write mechanism, swapping, related system calls, and inter-node page migration. However, the latest version of the Linux kernel [20] still does not support the separate counting of memory usage on each node for each process.

In contrast, the Exclusivity policy does not require the kMLB mechanism. Instead, only the kernel's inter-node load balancing procedure is modified to incorporate the task selection policy. Thus, adapting the Exclusivity policy to a newer Linux kernel is less complicated than implementing memory-aware policies with the kMLB mechanism.

On the other hand, the proposed thread-aware mechanism aims to enhance inter-node load balancing for multithreaded applications on NUMA systems. Therefore, it has some limitations. A sufficient number of threads must be run simultaneously on the system, such that the kernel scheduler then migrates threads between nodes to balance the load when the nodes' loads are imbalanced. As the number of threads increases, more threads are likely to wait in the runqueue, and an effective task selection policy can select a more suitable one among them for migration. In contrast, the default Linux kernel always migrates the

first task in the runqueue. Therefore, as shown in the experimental results, its performance is not stable and not good since the first task may not be a good one for migration.

**Figure 14.** Experimental results for systems running each benchmark program. (**a**) The average elapsed time; (**b**) The standard deviation in the running time; (**c**) The speedup ratio.

However, the parallel computing capability of a multi-core system is limited and depends on the number of cores in the target system. If there are too many threads running, contention for cores and memory resource slows down the performance. On the contrary, if there are too few or no threads wait in runqueue, task selection policies are not triggered

or used effectively. The experimental results show that the increase in performance is not significant when the numbers of threads are few or too many.

Regarding the power conservation issue, most modern CPUs support many frequencies. The higher the clock frequency, the more energy is consumed over a unit of time. The longer a thread is running, the more energy is consumed as well.

The Linux kernel supports CPU performance scaling through the CPUFreq (CPU Frequency scaling) subsystem [21]. In our experiments, the Linux kernel uses the default setting of CPU frequency governor "ondemand" [21], which sets the CPU frequency depending on the current usage for all test cases. Since many benchmark programs with many threads are running simultaneously, the system load is high. Therefore, CPUs are set by the Linux kernel to run at the highest frequency during experiments. The experimental results show that benchmark programs run 10.7% faster on the modified Linux kernel with the proposed task selection policy than on the default Linux kernel. Under the same CPU frequency, the shorter the benchmark programs run, the less energy is consumed. The degree of energy-saving needs further experiments.

#### **5. Conclusions**

Multi-core systems feature a high throughput, but load imbalance can degrade performance. For NUMA multi-core systems, there is non-uniform memory access, so migrating tasks across nodes to achieve load balancing has a different memory access cost. Therefore, the tasks to be migrated must be selected effectively and efficiently, especially the related operations run in the critical path of the kernel scheduler.

On Linux-based NUMA systems, threads of a multithreaded application share the memory address space and can be scheduled to run on different nodes. Memory pages allocated to them can also be on different nodes. Several studies present specific mechanisms to adjust threads and memory pages on nodes to reduce remote memory access and achieve load balancing. However, strategies using page fault handling to migrate threads or memory pages induce certain overhead. Besides, the kernel scheduler's inter-node task migration can mess up the arrangement. Differently, the kernel-level kMLB [4] mechanism enhances inter-node load balancing for NUMA systems, which tracks the number of memory pages on each node occupied by each task. This memory usage information is then used by memory-aware task selection policies [4] to select the most suitable task for inter-node migration. Despite the required overhead, the kMLB mechanism with task selection policies increases the performance of NUMA systems.

This research studies the memory access for multithreaded processes and proposes a thread-aware kernel mechanism to enhance inter-node load balancing for multithreaded applications on NUMA systems. The proposed Exclusivity policy migrates the task for which its thread group is least exclusive in the thread distribution. A thread group for which tasks are distributed more evenly on different nodes has less impact after task migration. The enhanced task selection procedure does not select the thread group leader for migration to prevent memory pages for one multithreaded process from being scattered on multiple nodes. Besides, only those tasks for which their thread group leaders are on other nodes are evaluated. The proposed policy allows threads of the same thread group to remain on the same node, so performance increases.

This study shows that the Most Benefit (MB) policy is still effective for multithreaded applications, and the proposed Exclusivity policy is competitive with the MB policy. Compared with unmodified Linux, the system that uses the MB<sup>+</sup> policy with the kMLB mechanism increases performance by 11.1%. The system that uses the Exclusivity policy, which does not require the kMLB, increases performance by 10.7%. In comparison, it is less complicated to adapt the Exclusivity policy to a newer Linux kernel than to use memory-aware policies with the kMLB mechanism. Moreover, under the same CPU frequency, the shorter the programs run, the less energy is consumed. We plan to adapt our work to a newer Linux kernel and perform experiments on more NUMA systems and energy saving in the future.

**Author Contributions:** Methodology, M.-L.C. and W.-L.S.; software, W.-L.S.; writing—original draft preparation, M.-L.C. and W.-L.S.; writing—review and editing, M.-L.C.; project administration, M.-L.C.; funding acquisition, M.-L.C. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded in part by Ministry of Science and Technology, Taiwan, grant number 106-2221-E-260-001, 107-2221-E-260-005, 108-2221-E-260-005, and 109-2221-E-260-011.

**Acknowledgments:** We would like to thank Y. L. Sam for his assistance in performing extensive experiments. Special thanks to the anonymous reviewers for their valuable comments in improving this research.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **A Multi-Strategy Improved Sparrow Search Algorithm for Solving the Node Localization Problem in Heterogeneous Wireless Sensor Networks**

**Hang Zhang 1, Jing Yang 1,2,\*, Tao Qin 1, Yuancheng Fan 3, Zetao Li <sup>1</sup> and Wei Wei 4,\***


**Abstract:** Aiming at the problems of slow convergence and low accuracy of the traditional sparrow search algorithm (SSA), a multi-strategy improved sparrow search algorithm (ISSA) was proposed. Firstly, the golden sine algorithm was introduced in the location update of producers to improve the global optimization capability of SSA. Secondly, the idea of individual optimality in the particle swarm algorithm was introduced into the position update of investigators to improve the convergence speed. At the same time, a Gaussian disturbance was introduced to the global optimal position to prevent the algorithm from falling into the local optimum. Then, the performance of the ISSA was evaluated on 23 benchmark functions, and the results indicate that the improved algorithm has better global optimization ability and faster convergence. Finally, ISSA was used for the node localization of HWSNs, and the experimental results show that the localization algorithm with ISSA has a smaller average localization error than that of the localization algorithm with other meta-heuristic algorithms.

**Keywords:** sparrow search algorithm; gold sine algorithm; Gaussian disturbance; heterogeneous wireless sensor networks; node localization

#### **1. Introduction**

Wireless sensor networks (WSNs) consist of a large number of miniature sensor nodes with low energy consumption, low price, and reliable performance [1]. They are often used in environmental monitoring, geological disaster warning, military reconnaissance, and other fields [2]. In these fields, location information is crucial, as data without geographic coordinate information are worthless [3]. Usually, a WSNs consists of hundreds or even thousands of sensor nodes, and designers cannot guarantee that all sensor nodes are of the same model. Further, the signal transmission power of the sensors generated by different sensor manufacturers will be different, which leads to the heterogeneity of the communication radius of sensor nodes. There may be various reasons for the formation of heterogeneous wireless sensor networks (HWSNs), but for localization techniques, the most important concern is the heterogeneity of the node communication radius. The localization problem of heterogeneous wireless sensor networks is similar to that of homogeneous wireless sensor networks in that the coordinates of unknown nodes are calculated by a specific localization algorithm using anchor nodes containing geographic coordinates within the network. However, unlike homogeneous wireless sensor networks, the heterogeneity of the node communication radius leads to a further increase in the localization error. Moreover, there are relatively few studies on conducting node localization of HWSNs.

**Citation:** Zhang, H.; Yang, J.; Qin, T.; Fan, Y.; Li, Z.; Wei, W. A Multi-Strategy Improved Sparrow Search Algorithm for Solving the Node Localization Problem in Heterogeneous Wireless Sensor Networks. *Appl. Sci.* **2022**, *12*, 5080. https://doi.org/10.3390/ app12105080

Academic Editor: Peng-Yeng Yin

Received: 12 April 2022 Accepted: 16 May 2022 Published: 18 May 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

Currently, researchers have proposed many localization algorithms. With the exception of the centroid localization algorithm, most localization algorithms can be divided into two stages: distance estimation and coordinate calculation. In the distance estimation phase, researchers can use the signal propagation time, the attenuation value of signal strength from the sending node to the receiving node, or the average hop distance and hop count between sensor nodes to calculate the distance between the unknown node and each anchor node [4]. In the coordinate calculation phase, the most commonly used method is the least-squares method (LS).

In recent years, meta-heuristic algorithms, which are known for their simplicity, flexibility, and spatial search capability, have provided a new idea for node coordinate calculation in WSNs. Liu et al. [5] replaced LS with a modified particle swarm algorithm (M-PSO) for the coordinate calculation of unknown nodes; when the error in the distance estimation stage was less than 10%, the error using M-PSO was 12.613 m less than that of the localization algorithm using LS. Chai et al. [6] proposed a parallel whale optimization algorithm to replace LS in the DV-Hop algorithm, and the obtained localization error was reduced by 8.4% compared with the DV-Hop algorithm. Cui et al. [7] improved the DV-Hop algorithm, made the discrete hop values continuous, and used the differential evolution algorithm (DE) to replace LS in the coordinate calculation stage. The positioning error of the algorithm was reduced by 70% compared with the DV-Hop algorithm. Although higher localization accuracy can be obtained using the meta-heuristic algorithm compared with LS, most of these studies focus on homogeneous wireless sensor networks, and few researchers have focused on HWSNs.

So far, not many results have been achieved on node localization of HWSNs. Assaf et al. [8] proposed a new expected hop progress (EHP) localization algorithm applicable to nodes with different transmission capabilities. The distance estimated by this algorithm in the distance estimation phase is closer to the real distance. However, the requirements for the potential forwarding area of the successor node are relatively high, and the communication radius of the sensor node cannot be too large. Wu et al. [9] optimized EHP and used elliptic distance to correct the distance calculated by the forward hop progress method and then used LS to find the coordinates of the unknown node. However, the node density required by the algorithm is too large. What is more, there is still room for improving the positioning accuracy of the nodes. Bhat et al. [10] proposed a minimum area localization algorithm for HWSNs combined with the Harris Hawk optimization algorithm, but the algorithm lacks a comparison with the use of other metaheuristic algorithms, and it is difficult to highlight the advantages of using HHO. No single meta-heuristic algorithm is suitable for the engineering applications used, so it is necessary to find a suitable metaheuristic algorithm in combination with specific application scenarios.

Meta-heuristic algorithms are mostly developed around physical phenomena, biological evolution, and group intelligence [11,12]. Among the many meta-heuristic algorithms, the most commonly used is the swarm intelligence optimization algorithm, such as the Harris Hawk optimization algorithm (HHO) [13], grey wolf optimizer (GWO) [14], butterfly optimization algorithm (BOA) [15], and so on. The sparrow search algorithm (SSA) [16] is a typical swarm intelligence optimization algorithm, which was proposed by Xue et al. in 2020. It has been widely used in many engineering fields, including UAV path planning [17], price prediction [18], and wind energy prediction [19]. In fact, the sparrow search algorithms suffer from the same problems as other meta-heuristic algorithms, such as low optimization-seeking accuracy and slow convergence. Thus, SSA needs to be improved. Yang et al. [20] used a chaotic mapping strategy to initialize the population and introduced adaptive weighting strategy and t-distribution variation strategy to balance the local exploration ability and global utilization ability of the algorithm. Yuan et al. [21] introduced centroid opposition learning, learning coefficients, and mutation operators in the original SSA to prevent SSA from falling into the local optima, and applied the improved algorithm to the control of distributed maximum power point tracking. Liu et al. [22] proposed a hybrid sparrow search algorithm based on constructing similarity, which overcomes the

problem of the algorithm falling into a local optimum by introducing an improved chaotic mapping circle and t-distribution variation.

In summary, this paper compares the performance of 15 common meta-heuristic algorithms in node localization, and the comparison results are shown in Section 4. Then, SSA with better comprehensive performance is selected as the algorithm for the coordinate calculation stage, and SSA is improved with regard to its problems. Finally, the improved sparrow search algorithm (ISSA) is applied to the node localization of HWSNs. The main contributions of this paper are as follows:


The rest of this paper is organized as follows. In Section 2, the original SSA is introduced, and ISSA is presented to address the problems of slow convergence and low accuracy of SSA. In Section 3, simulation results are described and discussed. In Section 4, the application of ISSA in the node localization of HWSNs is introduced. Finally, Section 5 gives the summary of this paper and the direction of future work.

#### **2. Basic SSA and Its Improvement**

#### *2.1. Basic SSA*

SSA completes the spatial search by iteratively updating the position of each sparrow [16], and the entire population is divided into three categories: producer, scrounger, and investigator. The producers and scroungers each make up a certain proportion of the population and are dynamically updated according to the results of each iteration. However, the investigators are selected randomly from the whole population, usually at a rate of 10 to 20%. The details of the three categories are described as follows.

#### 2.1.1. Updating Producer's Location

The producers are primarily responsible for searching for food and guiding the movement of the entire population. When the warning value *R*<sup>2</sup> is less than the safety value *ST*, it indicates that no predators were discovered during the search, necessitating a broad search by the producers. When the warning value *R*<sup>2</sup> exceeds the safe value *ST*, the sparrow has encountered a predator and is obliged to guide all scroungers to a safe area. The location update equation of the producers is shown in Equation (1).

$$X\_{i,j}^{t+1} = \begin{cases} X\_{i,j}^t \cdot \exp\left(\frac{-i}{a \cdot iter\_{\text{max}}}\right) & \text{if } R\_2 < ST\\ X\_{i,j}^t + Q \cdot L & \text{if } R\_2 \ge ST \end{cases} \tag{1}$$

where *t* represents the current iteration, *i* and *j* are the individual numbers of the population and the dimensions of the solution problem, respectively, *iter*max is the maximum number of iterations. *α* ∈ (0, 1] represents a random, and *Q* denotes a random number following the normal distribution. *L* is a matrix of 1 × *d* in which each element is 1, where d represents the dimension of the solved problem. *X<sup>t</sup> <sup>i</sup>*,*<sup>j</sup>* and *<sup>X</sup>t*+<sup>1</sup> *<sup>i</sup>*,*<sup>j</sup>* are the position of the *i*-th sparrow's *j*-th dimension in the *t*-th and (*t* + 1)-th iterations, respectively. *R*<sup>2</sup> ∈ (0, 1] represents an alarm value, and *ST* ∈ [0.5, 1) represents the safety threshold.

#### 2.1.2. Updating Scrounger's Location

The remainder of the sparrow colony are scroungers whose primary function is to frequently monitor the producers. Once producers discover a source of good food, they will immediately abandon their current location in order to compete for it. Otherwise, the scroungers will go hungry and will be forced to fly to other locations in search of food. The location update equation of the scroungers is shown in Equation (2).

$$X\_{i,j}^{t+1} = \begin{cases} \ Q \cdot \exp\left(\frac{X\_{\text{worst}}^t - X\_{i,j}^t}{i^2}\right) & \text{if } i > n/2\\\ X\_p^{t+1} + \left| X\_{i,j}^t - X\_p^{t+1} \right| \cdot A^+ \cdot \text{L otherwise} \end{cases} \tag{2}$$

where *X<sup>t</sup> worst* represents the worst position in the *<sup>t</sup>*-th iteration and *<sup>X</sup>t*+<sup>1</sup> *<sup>p</sup>* denotes the optimal position of the producer in the (*t* + 1)-th iteration. A is a matrix of 1 × *d*, in which the value of each element is randomly 1 or <sup>−</sup>1, and *<sup>A</sup>*<sup>+</sup> <sup>=</sup> *<sup>A</sup>T*(*AAT*) −1 . When *i* > *n*/2, it indicates that the sparrow is starving.

#### 2.1.3. Updating Investigator's Location

The investigators are primarily in charge of colony security. When they detect danger, sparrows on the colony's periphery flee to a safe area. Those in the colony's center, on the other hand, walk at random. The location update equation of the investigators is shown in Equation (3).

$$X\_{i,j}^{t+1} = \begin{cases} X\_{best}^t + \beta \cdot \left| X\_{i,j}^t - X\_{best}^t \right| \text{ if } f\_i > f\_{\mathcal{S}}\\ X\_{i,j}^t + K \left( \frac{\left| X\_{i,j}^t - X\_{west}^t \right|}{\left( f\_i - f\_w \right) + \varepsilon} \right) \text{ if } f\_i = f\_{\mathcal{S}} \end{cases} \tag{3}$$

where *X<sup>t</sup> best* represents the global optimal position in the *t*-th iteration, *β* denotes a normally distributed random number with mean 0 and variance 1, and *k* is a random number within [−1, 1]. *fi*, *fw* and *fg* represent the fitness values of the *i*-th sparrow, the worst individual, and the best individual, respectively. *fi* > *fg* means the sparrow is at the edge of the colony, and *fi* = *fg* means the sparrow is in the middle of the colony.

According to Equation (1), the further back a producer is ranked in the fitness ranking, the more likely it is to fall into a local optimum. As a result, it is necessary to improve Equation (2) in order to improve the convergence accuracy of SSA. As shown in Equation (3), the vigilantes' location updates are random, which is detrimental to SSA's convergence speed.

#### *2.2. Improved Sparrow Algorithm*

#### 2.2.1. Introduction of Golden Sine Strategy

The horizontal coordinates in Figure 1 represent producers sorted by fitness values, while the vertical coordinates represent the coefficient of variation of each producer's position. As illustrated in Figure 1, the larger the producer's ordinal number, the smaller the coefficient of variation of the position, which increases the probability of the producer falling into the local optimum. In summary, Equation (1) does not carry out a global search.

To improve the problem of the weak global search ability of the producers, the golden sine algorithm (Gold-SA) was introduced [23]. This algorithm traverses the whole search space by the relationship between the unit circle and the sine function on the one hand, and on the other hand, it gradually moves from the boundary of each dimension to the middle by the number of golden divisions until the best position for each dimension is found. Thus, in the paper, the golden sine algorithm is used to improve the global search ability of the producers in SSA. The equation of the discoverer position update after the introduction of the golden sine algorithm strategy is shown in Equation (4).

**Figure 1.** Relationship between location variation coefficient and numbering of producers.

$$X = \begin{cases} \begin{array}{l} X^t\_{b,j} \cdot |\sin(r\_1)| + r\_2 \cdot \sin(r\_1) \cdot |x\_1 \cdot X - x\_2 \cdot X| & \text{if } R\_2 < ST\\\ X^t\_{i,j} + Q \cdot L & \text{if } R\_2 \ge ST \end{array} \tag{4}$$

where *r*<sup>1</sup> and *r*<sup>2</sup> are the random numbers within [0, 2π] and [0, π], respectively. *X<sup>t</sup> b*,*j* represents the position of the *j*-th dimension of the optimal individual in the *t*-th iteration. The initial value of *x*<sup>1</sup> is *a* · (1 − *τ*) + *b* · *τ*, and the initial value of *x*<sup>2</sup> is *a* · *τ* + *b* · (1 − *τ*), where τ is the golden mean and the initial value of *a* and *b* is −π and π, respectively. *x*1, *x*2, *a*, and *b* dynamically are updated as shown in Algorithm 1.

**Algorithm 1:** Pseudo-code for partial parameter update of Gold-SA

```
/* F represents the current fitness value, G represents the global optimal value,
    random1 and random2 represent random numbers between [0, 1] */
Input: x1←a·(1 − τ) + b·τ; x2←a·τ + b·(1 − τ); a←−π; b←π;
Output: x1, x2
1: if (F < G) then
2: b ← x2; x2 ← x1; x1 ← a·τ + b·(1 − τ);
3: else
4: a ← x1; x1 ← x2; x2 ← a·(1 − τ) + b·τ;
5: end if
6: if (x1 = x2) then
7: a ← random1; b ← random2;
8: x1 ← a·τ + b·(1 − τ); x2 ← a·(1 − τ) + b·τ;
9: end if
```
#### 2.2.2. Introduction of Individual Optimal Strategies

By comparing the fitness values of individuals to the global optimal, the sparrow population investigators contribute to the coordination of global exploration and local search in the whole space search. Further, the idea of individual optimality in the particle swarm algorithm can strengthen the convergence ability of the algorithm [24], and the investigators' position update formula after introducing the optimal individual is as in Equation (5). 

$$\begin{cases} X\_{i,j}^{t+1} = \begin{cases} X\_{\text{best}}^{t} + \beta \cdot \left| X\_{pi,j}^{t} - X\_{\text{best}}^{t} \right| \text{ if } f\_{i} > f\_{\text{\ $}}\\ X\_{pi,j}^{t} + K \left( \frac{\left| X\_{pi,j}^{t} - X\_{\text{worst}}^{t} \right|}{(f\_{pi} - f\_{w}) + \varepsilon} \right) \text{ if } f\_{i} = f\_{\text{\$ }} \end{cases} \tag{5}$$

where *X<sup>t</sup> pi*,*<sup>j</sup>* represents the *j*-th dimensional value of the *i*-th sparrow's historical optimal individual.

#### 2.2.3. Gaussian Perturbation

The sparrow algorithm, like the majority of swarm intelligence optimization algorithms, suffer from the problem of falling into a local optimum, which results in poor searching accuracy. To address this issue, it is possible to use the fitness value of the population optimal individual to determine whether the algorithm is trapped in a local optimum [25]. If the global optimal individual's fitness value is less than a threshold in two consecutive iterations, the algorithm is said to be trapped in a local optimum. At this time, a Gaussian random wandering strategy can be introduced to apply perturbation to the optimal position to help the algorithm to jump out of the local optimum. The improved way of updating the position of the investigators is shown in Equations (6)–(8).

$$
\hat{X}\_{\mathbf{f}} = \text{Gaussian}(\mathbf{G}\_{\mathbf{f}}, \sigma) \tag{6}
$$

$$\sigma = \cos\left(\pi/2 \cdot (\mathbf{t}/T)^2\right) \cdot \left(\mathbf{G}\_t - X\_t^\*\right) \tag{7}$$

$$G\_t = \begin{cases} \begin{array}{l} \hat{\mathbf{X}}\_{t\prime} f(\overline{\mathbf{X}}\_t) < f(G\_t) \\ G\_{t\prime} \text{ otherwise} \end{array} \tag{8} \\ \tag{9}$$

where *Xt* is the position of the optimal individual after applying Gaussian perturbation at the *t*-th iteration, and *X*∗ *<sup>t</sup>* is a random individual of the population.

#### **3. Experimental Results and Analysis**

HHO [13], SSA [16], PSO [24], Gold-SA [25], and artificial gorilla troops optimizer (GTO) [26] are selected as the comparison algorithms to verify the search performance of ISSA. The population size of all algorithms is set to 30, and the number of iterations is set to 500. In the ISSA proposed in this paper, the ratio of generators is 0.6, the ratio of discoverers is 0.7, and the ratio of vigilantes is 0.2. Meanwhile, Gaussian perturbation is introduced when the global optimum value is below the threshold 1.00 × <sup>10</sup>−<sup>10</sup> in two consecutive iterations. The parameters of the other algorithms are consistent with their literature. The whole experiment is divided into four parts: convergence accuracy analysis, convergence speed analysis, rank sum test, and complexity analysis. The simulation experiment described in this article was conducted on a Windows 11 64-bit operating system with an AMD Ryzen 7 5800H processor with Radeon Graphics 3.20 GHz, 16 GB of RAM, and MATLAB 2014B. All experimental results are the average values after 30 runs.

In this section, 23 test functions are selected from CEC's benchmark functions, which are commonly used for optimization testing of meta-heuristic algorithms [16,27–29]. The functions *F*1–*F*<sup>7</sup> are unimodal functions, the functions *F*8–*F*<sup>13</sup> are multimodal functions, and the functions *F*14–*F*<sup>23</sup> are fixed-dimension functions. Table 1 lists the parameters of each function, including its expression, dimension, range, and optimal value.


**Table 1.**

Benchmark

 function.


**Table 1.** *Cont.*

#### *3.1. Convergence Accuracy Analysis*

Table 2 presents the average (AVG) and standard deviation (STD) of the average localization errors for the six meta-heuristics. Among them, R in Table 2 is explained in detail in Section 3.2.


**Table 2.** Comparison of benchmark function results.


**Table 2.** *Cont.*

According to Table 2, ISSA can find the optimal value in the unimodal functions *F*1~*F*4. This indicates that the introduction of the golden sine strategy significantly enhances the global optimality finding ability of the original SSSA. For the unimodal functions *F*5~*F*7, although the optimal values are not sought, their convergence accuracy is higher than that of the remaining four compared algorithms. For the multimodal functions *F*8, although the mean value of ISSA is the same as Gold-SA, HHO, and GTO, the standard deviation of ISSA is much lower than that of Gold-SA, HHO, and GTO. Among the multimodal functions *F*9~*F*11, the convergence accuracy of ISSA is the same as that of Gold-SA, HHO, GTO, and SSA, and they can all converge to the optimal value. However, the convergence accuracy of ISSA is the best in all cases in the multimodal functions *F*<sup>12</sup> and *F*13. For the fixed-dimension functions *F*14~*F*23, a total of seven best or tied best convergence accuracies were obtained for ISSA.

The ranking results of the six meta-heuristic algorithms are shown in Figure 2. The ranking is determined primarily by the average value of each algorithm; if the average value is identical, the ranking is determined by the standard deviation.

The radar plot shown in Figure 2 has 23 polar axes, and each polar axis represents 1 test function. Starting from the center point of the radar plot, there are six circles that expand outward in sequence, and the numbers on the circles represent the ranking of the algorithms. A closed polygon can be obtained by connecting the ranking points of each algorithm on the 23 test functions. The smaller the area of the polygon, the better the convergence performance of the algorithm. According to Figure 2, it can be seen that the area enclosed by ISSA is the smallest, which indicates that the convergence accuracy of ISSA is better than that of PSO, Gold-SA, HHO, GTO, and SSA.

**Figure 2.** Radar chart of sort of algorithm.

#### *3.2. Convergence Speed Analysis*

The convergence curves for the six meta-heuristic algorithms on the 23 benchmark functions are depicted in Figure 3.

**Figure 3.** *Cont*.

**Figure 3.** *Cont*.

**Figure 3.** *Cont*.

**Figure 3.** Convergence curve of benchmark function: (**a**) *F*1: Sphere; (**b**) *F*2: Schwefel 2.22; (**c**) *F*3: Schwefel 1.2; (**d**) *F*4: Schwefel 2.21; (**e**) *F*5: Rosenbrock; (**f**) *F*6: Step; (**g**) *F*7: Quartic; (**h**) *F*8: Schwefel 2.26; (**i**) *F*9: Rastrigrin; (**j**) *F*10: Ackley; (**k**) *F*11: Griewank; (**l**) *F*12: Penalized 1; (**m**) *F*13: Penalized 2; (**n**) *F*14: Foxholes; (**o**) *F*15: Kowalik; (**p**) *F*16: Six-Hump Gamel; (**q**) *F*17: Branin; (**r**) *F*18: Goldstein-price; (**s**) *F*19: Hartmann 3-D; (**t**) *F*20: Hartmann 6-D; (**u**) *F*21: Shekel 1; (**v**) *F*22: Shekel 2; (**w**) *F*23: Shekel 3.

As shown in Figure 3, the convergence speed of ISSA in the unimodal functions *F*1~*F*<sup>7</sup> is faster than that of the remaining five compared algorithms. In particular, ISSA can converge to the optimal value within 60 times in *F*1~*F*4. This indicates that the introduction of the individual optimal idea of PSO in the position update of the investigator greatly accelerates the convergence speed of the algorithm. For the multimodal functions *F*8~*F*13, the convergence speed of ISSA is also better than the remaining five comparison algorithms. Among the fixed-dimensional functions *F*14~*F*23, except for *F*14, the convergence rate of ISSA is not weaker than that of PSO, Gold-SA, HHO, GTO, and SSA. In summary, the improved ISSA has a faster convergence speed.

#### *3.3. Wilcoxon Rank Sum Test*

García et al. [30] proposed that it is not sufficient to evaluate the performance of metaheuristic algorithms using only the average and standard deviation. Therefore, it is essential to perform statistical tests. The Wilcoxon rank sum test, a binary hypothesis, determines whether there is a significant difference between two samples. It is assumed that H0: the overall difference between the two samples is not significant, and H1: there is a significant difference between the two samples. If the significance level between two samples is less than 0.05, then the H0 hypothesis can be rejected; otherwise, the H1 hypothesis is rejected. The significance level between such algorithms is denoted by *p*.

In this section, the results of one run of each meta-heuristic algorithm are taken as 1 element in the corresponding sample, and there are 30 elements in each set of samples. Combining the sample data of ISSA and the sample data of either metaheuristic algorithm can help to find the *p* value between them. NaN implies that the two samples are identical and cannot be tested with the Wilcoxon rank sum test, which indicates that the performance of the two sets of algorithms is the same. The *p* values on each benchmark function are shown in Table 2. The + located in parentheses after *p* represents a significant difference between the two samples, and − represents no significant difference. *p* combined with the AVG corresponding to the two algorithms can effectively determine the superiority of the algorithms. If AVG of ISSA is smaller than AVG of the comparison algorithm and H1 holds, it means that ISSA performs better than the comparison algorithm. If AVG of ISSA is larger than AVG of the comparison algorithm and H1 holds, the performance of ISSA is weaker than that of the comparison algorithm. If AVG of ISSA is the same as AVG of the

comparison algorithm and H1 holds, then the two algorithms have the same performance. In the remaining cases, it can be regarded as uncertain in terms of statistical significance.

Table 3 gives the statistical results of the performance advantages and disadvantages between ISSA and the remaining five algorithms on 23 benchmark functions, respectively. As shown in Table 3, the performance of ISSA is better than that of SSA on 13 benchmark test functions, while the performance of ISSA is the same as that of SSA on the remaining 10 benchmark functions. This indicates that the proposed algorithm in this paper has achieved some success in terms of the performance of the optimization search. Compared with the GTO proposed by Abdollahzadeh et al. in 2021 [31], ISSA outperforms each other on all six benchmark functions and has the same performance on 13 benchmark functions. This indicates that ISSA still has some advantages over the recently proposed meta-heuristic algorithm.

**Table 3.** Statistical results of the performance advantages and disadvantages between ISSA and each algorithm.


a/b/c/d a, b, c denote the number of benchmark functions whose ISSA performance is better than, weaker than, or equal to the comparison algorithm, respectively. d denotes the number of benchmark functions that cannot be judged in terms of statistical significance.

#### *3.4. Time Complexity Analysis*

Let the population size of SSA and ISSA be *N*, the number of iterations be *Tmax*, and the problem dimension be *D*. From Section 2, two algorithms differ only in the way they update different species of sparrows, and they have the same algorithmic complexity in the steps of population initialization, boundary check, position update, and sort update, all of which be *o*(*T*max · *N* · *D*). Assuming that the proportions of producers, scroungers, and investigators in the sparrow population are *r*1, *r*2, and *r*3, respectively, then in SSA, the time complexity of the location update of producers is *o*(*T*max · *r*<sup>1</sup> · *N* · *D*), the time complexity of the location update of scroungers is *o*(*T*max · *r*<sup>2</sup> · *N* · *D*), and the location update of investigators is *o*(*T*max · *N* · *D*). The total time complexity of SSA is *o*(*T*max · *N* · *D*).

In ISSA, the time complexity of the position update of the producer after introducing the golden sine strategy is *<sup>o</sup>*(*T*max · *<sup>r</sup>*<sup>1</sup> · *<sup>N</sup>*2), and the position update of the investigators after introducing the particle swarm optimal idea is still *o*(*T*max · *r*<sup>3</sup> · *N*). In addition, the time complexity added by introducing Gaussian perturbation to the optimal sparrow individuals is *o*(*T*max · *D*). The total time complexity of ISSA is *o*(*T*max · *N* · *D* · (1 + *r*<sup>1</sup> · *N*)).

In summary, the time complexity of ISSA is slightly higher than that of SSA for the same number of iterations. However, the convergence speed of ISSA is fast, and the time consumption of ISSA may be reduced if no more changes in the search results are taken as the end condition. ISSA is more suitable for engineering applications that require high search accuracy and have lower requirements for computation time.

#### **4. Application of ISSA in Node Location in HWSNs**

#### *4.1. Node Localization Problem in HWSNs*

The node localization of HWSNs can be divided into two stages: distance estimation and coordinate calculation. Once the distances from each anchor point to the unknown node are obtained in the distance estimation phase, the coordinate calculation of the unknown node can be transformed into an optimization problem, as shown in Equation (9). The dimension of this optimization problem is 2, and the theoretical optimal value is 0. Therefore, the problem can be solved by meta-heuristic algorithms.

$$F(\mathbf{x}) = \sum\_{j=1}^{Na} \left| \sqrt{(\mathbf{x}\_1 - a\_j)^2 + (\mathbf{x}\_2 - b\_j)^2} - d\_j \right| \tag{9}$$

where *Na* indicates the number of anchor nodes, (*aj*, *bj*) represents the coordinates of the *j*-th unknown node, and *dj* is the distance from the unknown node to the *j*-th anchor node.

#### *4.2. Network Model*

Figure 4a shows the network connection of 50 nodes in an area of 100 × 100 m2, in which the number of anchor nodes represented by red is 20, the number of unknown nodes represented by blue is 30, and the blue solid line represents the neighboring nodes can communicate with each other. In addition, the communication radius of each node is different. The node localization of HWSNs is to find out the coordinates of unknown nodes using the information of finite anchor nodes, such as the coordinates of each anchor point and the distance from each anchor point to the unknown node. The distance from each anchor node to the unknown node is shown in Figure 4b, in which blue dashed line indicates the distance from the anchor node to the unknown node.

**Figure 4.** Heterogeneous wireless sensor network model: (**a**) network connections; (**b**) distance from all anchor nodes to unknown node 21.

#### *4.3. Localization Steps*

In this section, we replace the least-squares method of the localization algorithm with the meta-heuristic algorithm. The specific steps of the localization algorithm are as follows.

Step 1. Calculating the intersection area of the incoming neighbor

As shown in Figure 5a, when the intersection region of anchor nodes 1, 2, and 3 is used as the search space for meta-heuristic algorithms, the searching range can be narrowed, thus accelerating the convergence speed. However, because the intersection region of anchor points is an irregular graph, performing the calculation is difficult. To simplify computations, the communication region of each anchor point can be treated as a square, as illustrated in Figure 5b, and the intersection region of the incoming neighbor can then be calculated.

Step 2. Calculating the distance from the unknown node to each anchor node

In this step, the idea of the DV-Hop localization algorithm is adopted, and the product of hop count and hop distance between nodes is used as the distance from an unknown node to an anchor node. The hop distance formula is shown in Equation (10), and the distance from the unknown node to the anchor node is calculated in Equation (11).

**Figure 5.** Impact of heterogeneity of communication radius on network communication: (**a**) original communication radius; (**b**) square communication radius.

$$Hopsize\_i = \frac{\sum\_{i \neq k} \sqrt{\left(x\_i - x\_k\right)^2 + \left(y\_i - y\_k\right)^2}}{\sum\_{i \neq k} h\_{ik}} \tag{10}$$

$$d\_{\rm ij} = \textit{Hos} \textit{size}\_{\rm ij} \cdot \textit{h}\_{\rm ij} \tag{11}$$

where (*xi*, *yi*), (*xk*, *yk*) are the coordinates of anchors *I* and *k*, respectively, and *hik* represents the number of hops between anchors *I* and *k*. *Hopsizei* is the hop distance of anchor *I*, *Hopsizeij* denotes the hop distance of anchor *I* with the least number of hops from unknown node *j*, and *dij* is the distance between unknown node *j* to anchor *i*.

Step 3. Calculating coordinates of unknown node

To begin with, the fitness function is defined as in Equation (12). Then, the swarm intelligence optimization algorithm is used to find the unknown node coordinates, with the coordinates with the lowest fitness values being selected as the unknown node coordinates. If the distance value in Step 2 is guaranteed to be constant, the higher the performance of the chosen swarm intelligence optimization algorithm, the more accurate the positioning.

$$fitness\_{j} = \sum\_{i=1}^{Na} \left| \sqrt{\left(\check{\mathbf{x}}\_{j} - \mathbf{x}\_{i}\right)^{2} + \left(\check{y}\_{j} - y\_{i}\right)^{2}} - d\_{ij} \right| \tag{12}$$

where *j* represents the unknown node numbers, *Na* is the total number of anchors, and (*xj*, *yj*) represents the coordinates of unknown nodes randomly generated.

#### *4.4. Performance Evaluation*

To verify the superiority of ISSA for node localization in HWSNs, we compared it with LS and 15 other meta-heuristic algorithms, including HHO [13], GWO [14], BOA [15], SSA [16], Gold-SA [23], PSO [24], GTO [26], sine cosine algorithm (SCA) [31], DE [32], Archimedes optimization algorithm (AOA) [33], whale optimization algorithm (WOA) [34], elephant herding optimization (EHO) [35], marine predators algorithm (MPA) [36], tunicate swarm algorithm (TSA) [37], and Coot optimization algorithm (COOT) [38]. All meta-heuristic algorithms have a population size of 30, an iteration number of 50, and other parameters that are consistent with their original literature. The area of HWSNs is <sup>100</sup> × 100 m2, the number of anchor nodes is 25, and the node communication radius is a

random value within [15,29], and the normalized root mean square error (NRMSE) is used to evaluate the positioning performance as shown in Equation (13).

$$NRMSE = \frac{1}{Nu} \sum\_{j=1}^{Nu} \frac{1}{r\_j} \sqrt{(x\_j - \tilde{x}\_j)^2 + (y\_j - \tilde{y}\_j)^2} \tag{13}$$

where *Nu* represents the number of unknown nodes and (*xi*, *yi*), and (*x*%*j*, *<sup>y</sup>*%*j*) are the true and estimated coordinates of the unknown node, respectively.

The NRMSE and the time required to compute the coordinates of unknown nodes using LS or different meta-heuristic algorithms when all nodes in HWSNs have the same coordinate positions are shown in Table 4. As far as the NRMSE is concerned, the average localization error of LS is higher than that of all other metaheuristic algorithms, reaching 55.57%. This indicates that using meta-heuristic algorithms to solve the coordinates of unknown nodes can indeed reduce the localization error of the nodes. The localization error obtained using different metaheuristic algorithms also varies. The best-performing ISSA has a 13.19% reduction in the average positioning error compared with the worst performing EHO. The top four average positioning errors are ISSA, GTO, SSA, and DE in order, and their average positioning errors are 41.38%, 41.51%, 41.68%, and 41.78%, respectively. Although the NEMSE of the top four meta-heuristic algorithms is not very different, there is a big difference in the positioning time required by each of them.

**Table 4.** Positioning results of 17 algorithms.


In terms of search time, LS takes the shortest time of 0.04 s, while GTO takes the longest time of 10.3 s. The times required for ISSA, GTO, SSA, and DE localization are 1.66 s, 10.3 s, 0.95 s, and 1.47 s, respectively. It can be seen that, when solving the node coordinates of HWSNs, it is difficult to achieve both accuracy and time optimality. However, the localization accuracy is the most critical index in the node localization of HWSNs. Within a certain range, the search time requirement can be reduced appropriately. In summary, the ISSA proposed in this paper is more suitable for node localization in HWSNs.

#### *4.5. Effect of Parameter Variation on Localization Accuracy in HWSNs*

Figure 6 depicts the effect on NRMSE when the number of nodes and the ratio of anchor points are varied in a deployment area of 100 × 100 m2. The 10–30 in Figure 6a,b indicates that the communication radius of the nodes is a random value within [10, 30], and 30–30 indicates that the communication radius of the nodes is all 30 m. The ratio of

anchor points in Figure 6a is 25, and the total number of nodes in Figure 6b is 100. ISSA is used to calculate the coordinates of unknown nodes with 50 iterations, and the number of individuals in the population is 30.

**Figure 6.** Effect of parameter variation on positioning accuracy: (**a**) change in number of nodes; (**b**) change of anchor node ratio.

From Figure 6, the NRMSE decreases as the heterogeneous range of node communication radius decreases, regardless of the number of nodes and the ratio of anchor points taken. From Figure 6a, the NRMSE decreases continuously when the number of nodes increases sequentially, but the magnitude of its decrease also becomes smaller. When the number of nodes increases to 160, the average variation of NRMSE on different communication intervals is less than 1%. From Figure 6b, the NRMSE also decreases as the proportion of anchor points increases, but the magnitude of its decrease also becomes slower. When the anchor point ratio is 25%, the reduction of NRMSE tends toward 0. Therefore, the optimal number of nodes is 160, and the optimal anchor point ratio is 25% when using the localization algorithm described in this section to solve the localization problem.

#### **5. Conclusions**

In this paper, for the node localization problem in HWSNs, we compared the localization accuracy and localization time consumption of 15 common meta-heuristic algorithms and found that the comprehensive performance of SSA is the best. Thus, a multi-strategy improved sparrow search algorithm is proposed to address the problems of SSA. To improve the global exploration capability of SSA, the golden sine strategy was introduced into the producer's position update method. Meanwhile, to accelerate the convergence speed of SSA, the idea of individual optimality of particle swarm was introduced to the location update method of the investigator. To prevent SSA from falling into local optimality, Gaussian perturbation was applied to the globally optimal sparrow individuals. A total of 23 benchmark functions and 5 meta-heuristics were chosen to evaluate ISSA's optimization performance. In terms of the average value and standard deviation of the search results, ISSA achieved first place or was tied for first place on a total of 20 benchmark functions. Except for F14, the convergence speed of ISSA is not weaker than the rest of the comparison algorithms. In addition, the Wilcoxon rank sum test and the average value were combined in order to evaluate the performance of the algorithms. From a statistical point of view, the number of benchmark functions in which ISSA outperforms PSO, Gold-SA, HHO, GTO, and SSA is 14, 10, 13, 6, and 13, respectively.

Finally, ISSA was used to solve the problem of solving unknown node coordinates for HWSNs coordinate calculation. Simulation experiments showed that the localization accuracy obtained using ISSA to compute node coordinates is the best among the 15 metaheuristic algorithms and LS, and it reduces the localization error by 14.19% compared with LS. Changing the internal parameters of HWSNs, it can be found that increasing the number of nodes and increasing the proportion of anchor nodes can improve the accuracy of the localization algorithm. However, when the number of nodes reaches 160 and the proportion of anchor nodes reaches 25%, the enhancement effect will be significantly weakened.

Unfortunately, applying the proposed ISSA to the node localization of HWSNs improves the localization accuracy but also increases the time required for localization. In our future work, we will further optimize the search mechanism of ISSA to reduce the search time during node localization. In addition, we will also focus on the distance estimation of the first stage of the HWSNs localization problem to further improve the localization accuracy.

**Author Contributions:** Conceptualization, H.Z. and J.Y.; formal analysis, J.Y.; investigation, W.W. and Y.F.; methodology, H.Z.; software, T.Q.; supervision, W.W.; validation, H.Z., W.W. and Y.F.; writing—original draft, H.Z.; writing—review and editing, H.Z., J.Y. and Z.L. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was supported in part by the NNSF of China (No. 61640014, No. 61963009), the Industrial Project of Guizhou province (No. Qiankehe Zhicheng [2022]Yiban017, [2019] 2152), the Innovation Group of the Guizhou Education Department (No. Qianjiaohe KY [2021]012), the Science and Technology Fund of Guizhou Province (No. Qiankehejichu [2020]1Y266), the CASE Library of IOT (KCALK201708), and the IOT platform of the Guiyang National High technology industry development zone (No. 2015).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Data are contained within the article. The data presented in this study can be requested from the authors.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **Identification of Opinion Leaders and Followers—A Case Study of Green Energy and Low Carbons**

#### **Chun-Che Huang 1, Wen-Yau Liang 2,\*, Po-An Chen <sup>1</sup> and Yi-Chin Chan <sup>3</sup>**


Received: 21 October 2020; Accepted: 24 November 2020; Published: 26 November 2020

**Abstract:** In recent years, with the development of Web2.0, enterprises, government agencies, and traditional news media, which have been positively influenced by opinion leaders, have been dedicated to understanding leaders' opinions on the web in order to seek convergence. Specifically, with the increase of environmental awareness, the introduction of green energy and carbon reduction technology has become an important issue. Consequently, studies identifying opinion leaders and followers who are interested in green energy and low carbon have become important. This study aims to find a solution that can identify the characteristics of opinion leaders and followers that can be widely used, which will help certain public policies or issues to be more effectively disseminated in the future. To model the characteristics of opinion leaders and their influence on followers, this study uses a dual matrix. The interaction patterns are recognized among opinion leaders and followers, with the aim of developing public policy to promote green energy and low carbon emissions. A case is studied to validate the superiority of the proposed solution approach. With the proposed approach, a (business) organization can identify and access opinion leaders and their followers. Through communication, these organizations can absorb strain and preserve functions despite the presence of adversity. This study also clearly demonstrates its contribution and novelty through comparisons with the existing alternative method.

**Keywords:** green energy and low carbon; opinion leaders; followers; social media; matrix method; intelligent systems

#### **1. Introduction**

With the rise of communication technology, people are utilizing platforms such as content sharing sites, blogs, social networking, and wikis to create, modify, share, and discuss Internet content. Social media provides flexible platforms that play key roles in energizing collective action in movements [1]. This represents the social media phenomenon, which can significantly impact society and industry, e.g., firms' reputations, sales, and even survival [2]. Within the discussions on social media, certain individuals influence others and thus emerge as opinion leaders. Opinion leaders have great impacts and influence on social media. Organizations can take advantage of these predispositions through marketing research and public relations, nurturing opinion leaders or advocates, placing and creating advertisements, developing new products and lowering the cost-to-serve [3]. On the Internet, the power of these leaders is increasing larger and sequentially influencing entire societies through calls to protest, promotion of policy and decision-making, which was defined as the fifth right in the "Towards a Civil Society" seminar [4].

The world must confront the energy crisis and air pollution. Discussions about energy issues are increasing. These discussions range from nuclear energy, thermal power, hydropower, and other forms of green energy and low carbon technology, including wind, solar, tidal, and biomass geothermal energy issues. In Taiwan, whenever an energy crisis occurs, energy charges increase. Anti-nuclear positions and other energy issues are discussed broadly. Therefore, the Taiwan government tries to understand people's needs and questions.

On the Internet, the roles of opinion leaders and followers in the formation of these issues cannot be neglected. According to the "theory of two-step flow" [5] and Rosen's definition of opinion leaders' characteristics, "social media initially pass the information to opinion leaders, then opinion leaders spread the information to followers and influence their attitudes" [6]. Thus, when followers follow opinion leaders, the formers' judgments and attitudes will be influenced and changed by opinion leaders. This study defines opinion leaders as people or social media with high social status who are able to influence followers. This study defines followers as the users who follow certain issues, publish related discussions and add their own ideas. They spread, repost or blindly follow the behaviors of opinion leaders.

Most previous research of opinion leaders focuses on the commercial domain rather than on nonprofit-related policies such as energy policy [7]. In the promotion of many public policies through online postings, it is difficult to clearly identify opinion leaders and followers, which greatly reduces the effectiveness of communication. Based on the community attributes of opinion leaders and whether they can successfully resonate, this study aims at providing a method to try to identify who are opinion leaders or who are likely to become opinion leaders in social media, and who are followers. Relational matrix analysis is used to represent the relationship between opinion leaders and followers in social media and to identify the collection of opinion leaders and potential opinion leaders.

Furthermore, previous studies [8] have used quantitative methods of analysis. One example is the SuperedgeRank algorithm. However, this algorithm not only has difficulty identifying potential opinion leaders effectively but also neglects how opinion leaders influence followers and how relationships between opinion leaders and followers are characterized. It also ignores the increasingly important role played by intelligent systems such as algorithms.

Although the literature on green energy is rapidly increasing, many studies suggest that this problem needs to be dealt with by considering a broader perspective [9]. This study not only examines the issue from the perspective of intelligent systems such as algorithms but also identifies the roles of opinion leaders and followers on social media in relation to the introduction of green energy and carbon reduction technology, with the aim of developing public policy to promote green energy and low carbon emissions. This study is novel not only because it takes quantitative factors and tradition clustering approaches into account but because it also analyzes posts, poster characteristics and their interactive relationships on social media. This study reviews relevant literature in Section 2. In Section 3, we propose a method to identify opinion leaders and their followers based on their interactions on social networks. The interaction patterns are also identified. An energy case is studied in Section 4 to validate the proposed solution approach and enhance communication effectiveness between government policymakers and people's desires. The discussion is summarized in Section 5, and Section 6 concludes this study.

This research contributes to finding a solution to easily identify the characteristics of opinion leaders and followers in the case of online posts related to green energy and low-carbon policies. Once certain public policies need to be effectively disseminated, they can be widely used. Using the same model and the solution approach, the results of this study can be extended from the green energy low carbon issue to other social issues. Furthermore, this study provides a new perspective to deal with the effective identification of opinion leaders and followers, at the same time, promotes the "theory of two-step flow" to add another research perspective in the academic field.

#### **2. Literature Review**

#### *2.1. Green Energy and Low Carbon*

Global warming, unexpected climate change, dwindling energy resources and unprecedented amounts of air pollution have become critical problems. The United Nations' 2030 Sustainable Development Goals show that a sustainable modern electricity grid [10], reduction of CO2 emissions [11] and the carbon footprint of human mobility to a sustainable level [12] etc., are key parts. In addition, exhaustion of fossil fuel is viewed as a big challenge of human development [13] since energy is an expensive resource that is becoming more scarce with increasing population and demand [14]. Green energy could help governments reduce the dependency on energy importation, improve the variety of production resources and advance sustainable environmental development. Moreover, the usage of rich green energy could benefit economies significantly [15].

In the past, studies of green energy and low carbon have focused on issues of energy itself and energy systems, for example, integration of energy systems [16], reliability of the power distribution system [17], uptake of biomass energy [18], and so on. Few studies have focused on social opinions about green energy and low carbon. In recent years, due to awareness of the environment, the public has started to care more about the environment and quality of life [19]. Social media has strengthened community among people and emerged as a platform to spread messages quickly and powerfully. In the era of Web 2.0, massive public opinion is increasingly generated on the Internet [20]. Therefore, the study of the role of social media in green energy and low carbon social issues is very important.

#### *2.2. Opinion Leaders*

In the era of the Internet, opinion leaders enhance content sharing. In fact, almost all of the content is generated by opinion leaders (the 90–91% law) [21]. What makes opinion leaders so important on social networks is their ability to informally influence others' attitudes and behaviors [22–25]. Opinion leaders usually have access to far more information on a certain topic and have professional experience with the topic. Rosen defined the characteristics of opinion leaders proposed the acronym ACTIVE. ACTIVE stands for the six characters of opinion leaders: ahead in adoption, connected, travelers, information-hungry, vocal, and exposed to the media [6].

In a recent qualitative survey carried out through focus groups, Katz and Lazarsfeld proposed the "theory of two-step flow" in 1995 and pointed out that opinion leaders are situated between social media and the majority of people. The information first reaches the opinion leaders or influencers [26], who then introduce it to the wider population [5]. Followers are those who are affected and change their behaviors and attitudes when receiving the information [27]. The followers are enormously influenced by opinion leaders in terms of changing their attitudes and behaviors [25].

Based on the above study, we elaborate on the attributes of opinion leaders as follows: Their life experience and understanding of knowledge are rich and thorough and a majority of them are highly educated. Moreover, they have strong social skills, strong connections with the broad masses, and good reputations due to their professionalism and knowledge. They have great influence and appealing power. They exhibit sensitivity to information, willingness to accept new things and an innovative spirit.

#### *2.3. Opinion Leaders Identification*

Many theories have been put forward about social networks, but few address the issue of opinion leader identification [28]. According to a previous literature review, opinion leaders are simply determined based on some visible user activities, and other factors that allow a user to become an opinion leader are ignored [28,29]. Studies of Internet opinion leaders have also mainly focused on the role of Internet opinion leaders in spreading the news and in the Internet world of word mouth marketing [23]. Consensus has not yet been reached in the analysis of Internet opinion leaders. Few efforts have been taken to create a computer-based model to identify and analyze the

opinion leaders in an Internet community, and the studies that have been undertaken on this issue have failed to reach an in-depth level [30]. At present, studies related to the TwitterRank algorithm based on PageRank [29] and the contribution of information to InfluenceRank [31] and weighted Page-Rank [32] make use of the network construction of user interaction, but they neglect the users' inherent features [33].

In addition, from the perspective of data mining, the identification of opinion leaders is a cluster problem. However, the aforementioned studies consider the relationships of people to be a social network. Engagement is used as an effective degree to measure user interaction with an organization. Basic interactions include commenting on contents, sharing contents, or "liking" or "favoriting" content. A core KPI for social media is that engagement is high, as this would indicate that organizations are producing content that users find interesting enough to spend additional time on [34]. Unfortunately, when applying the cluster problem to social networks, previous studies have only taken quantitative factors, and traditional clustering approaches into account, e.g., support vector machines [35], k-means [36], partitioning around medoids [37], fuzzy c-means [38], and so on have been used to resolve quantitative clusters. However, qualitative characteristics are not yet considered, and only static data have been analyzed. To study the qualitative characteristics of opinion leaders and the impact of opinion leaders on followers, [39] evaluates whether every speaker in social media satisfies the characteristics of an opinion leader. By observing the relational matrix, the interacting relations between users in social media are analyzed, and opinion leaders and followers are identified. However, there are no theoretical background axioms implied in [39], specifically from the perspective of communication to validate the results.

#### *2.4. Opinion Leaders Identification Algorithms*

Ma and Liu [40] used the SuperedgeRank algorithm to analyze the attributes of three seed networks and identify opinion leaders on the Fukushima nuclear issue. In another study, Jiang et al. [41] designed and implemented a BBS opinion leader mining system based on an improved PageRank algorithm using MapReduce. Ziyi et al. [30] adopted the core algorithm of the Internet searching–PageRank model and, by combining the analysis of the influence of linguistic data and sentimental preference, put forward a method to identify Internet opinion leaders; they also verified the method by carrying out an empirical study. Cheng et al. [42] combined influence with sentimental analysis based on the content of posts and filtered opinion leaders by combining the PR values of the PageRank algorithm and recognition degree, abbreviating the IS Rank algorithm. Deng et al. [43] constructed a SINA Micro Blog APIs based Micro Blog crawling and analysis tool, and a node betweenness approximation computation method was proposed, offering better accuracy and less running time to detect core opinion leaders on Micro Blog graphs.

PageRank is an excellent sorting algorithm, but its running speed decreases significantly with the increase of the number of data nodes. Jing and Lizhen [33] proposed a hybrid data mining approach based on user features and interaction networks, which includes three parts: a way to analyze users' authority, activity and influence, a way to consider the orientation of sentiment in an interaction network and a combined method based on the HITS algorithm for identifying microblog opinion leaders [33]. Chu et al. [44] researched social networks to access the influence of tobacco opinion leaders on followers and found that followers are a vulnerable group. They are young and low educated. Followers are easily influenced by opinion leaders. Therefore, anti-smoking education to stay away from tobacco can educate them on social media. Obviously, opinion leaders on the Internet have considerable influence on followers, and opinion leaders are often used in marketing in the e-commerce industry. The research of Lin et al. [45] found that opinion leaders can use their influence to act as important promoters of products and services. It is recommended that companies or corporate managers choose to cooperate with opinion leaders of a certain type of forum to promote products or services. What is the impact of the levels of followers' trust in opinion leaders on the resulting influence? Zhao et al. [46] used opinion dynamics theory to study the influence of trust in opinion

leaders. His research found that followers' trust in opinion leaders determines opinion leader influence. It is suggested that if the communication effect of e-commerce is pursued, the key premise can increase the trust of opinion leaders.

Not only have many studies investigated opinion leaders in the e-commerce field, but the role and function of opinion leaders have attracted attention in politics and the public domain. Aleahmad et al. [43] examined the political field by proposing the effective OLfinder algorithm. The researchers found that this algorithm can not only find opinion leaders in social networks but also calculate their popularity. Many people are curious about why opinion leaders like to play the role of opinion leaders. Winter et al. [47] examined people who disseminate opinions about politics or public affairs on the Internet and identified these people as opinion leaders who try to influence the psychological motivations and personality characteristics of followers. This study found that opinion leaders have strong psychological motivations to actively express themselves and persuade others, making them like to play the role of opinion leaders. In addition, in social network analysis, centrality methods have been applied to measure the importance of nodes in a network whereby nodes with higher centrality can influence others more significantly [48,49].

All in all, past relevant research either used a certain social measurement method based on interview self-reports or questionnaire surveys or used quantitative clustering techniques to identify opinion leaders. Few studies have actually investigated online posts to identify opinion leaders and the social patterns of interaction between opinion leaders and followers. Thus, it is important to propose a method to analyze posts, posters' characteristics and their interactive relationships in social media.

#### **3. Methodology**

#### *3.1. Modeling Interaction between Opinion Leaders and Followers*

Between users, matrix *M* is a relational matrix.

$$M = \{T|I\}\tag{1}$$

We set the row index as *i* and the column index as *j* in the matrix. The n users in the set are represented by C = {1 ... n}. In the matrix *T*, the elements are composed of counts of responses and being responded to. Additionally, between users and social community support level, the elements in the matrix *I* are the influence factors

Matrix *T* shows us the counts of both responses and being responded to between n users. *T\_ij* refers to counts of responses of *useri\_i* to *userj\_j* where *i j*. When *T\_ij* refers to counts of total posts of *user\_i* where *i* = *j*. *i* refers to the index of the users who responds to other user's opinions, and j refers to the index of the users whose opinions are responded to.

Matrix I indicates the power of users to influence and be influenced. *I\_ij* refers to the influence of *user\_i* on *user\_j* where *i j*, *i* refers to the index of a user who influences other people, and *j* refers to the index of a user who is influenced by others. The influence power can be classified into three different patterns, including job position, professional knowledge and social community support level [39]. Each criterion has three levels, no influence (NI), general influence (GI), and high influence (HI) (Table 1). For a higher job title with more professional knowledge and a high social community support level, we define influence power as high influence (HI). For a general job title with popular professional knowledge and neutral social community support level, the overall influence power is general (GI). If the user has no job and inaccurate professional knowledge or social community support, we define that the influence level as having no influence (NI). However, some users have privacy settings or discuss issues anonymously, so our study could not gather their background information. Fortunately, the proposed approach can serve as a flexible model with the missing part left blank.


**Table 1.** Influence power table (NI: No Influence, GI: General Influence, HI: High Influence.).

In matrix *I*, *I\_ij* refers to a user's social community support level where *i* = *j*. In this study, social community support is divided into three levels. These three levels are relatively well-known, and well-followed social media that receives good attention are which are classified as having high social community support (H). Less-followed, less-known social media with low attention is classified as low (L). However, we were unable to gather users' social community support levels because of some users' anonymous discussions or privacy settings. Our study defines the social community support level of these users as missing (O).

A non–follower, a user with a negative speech count, is defined in this study. Meanwhile, when contents are responded to negatively, the user is listed as a non-opinion leader. The relationship between opinion leaders and followers refers to a mutual relationship between users. However, one user may not respond to or express ideas to others' speech content on social media. One opinion leader may not be an opinion leader of all users. Therefore, if there is no relationship between users, we cannot distinguish whether they are opinion leaders or followers. In this study, the groups of opinion leaders and followers are be judged base on interaction. The users with mutual influence are classified as one group to analyze whether there is an opinion leader and a follower in the group. If there is no mutual influence, no group is formed.

#### *3.2. Opinion Leaders and Followers' Social Patterns*

Six axioms are proposed to classify opinion leaders, influencers, followers and interaction patterns. Notations are presented in Table 2.

**Table 2.** Notations table.


#### **Table 2.** *Cont.*

#### **Notations:**


*SIol in fluence power* : The number of opinion leaders' influences (NI, GI, HI).

*SIf*{*NI*} : The number of opinion leaders' influences (NI, GI, HI).

*UNIi*: The number of *user i* s posts with NI influence power and responses in matrix *T* are zero.

The axioms formulated by the systematic rolling analysis of big data over the years. Experts determine the criteria and select thresholds for them, for example, the positive or negative degree of the counts of responses to and responded to. The following axioms are proposed to recognize opinion leaders, influencers, and followers.

Axiom1—Opinion leader.

If *Nrsp* < *URi* and *URi* > 2 × *Ng* and *UNI* < 1/2 × *Ng*, then the *useri* is an opinion leader.

To be an opinion leader, *useri*'s (*URi*) response total must be higher than the average total response (*Nrsp*), and *useri* must have more than twice the number of responses in the group (*Ng*). Moreover, *user i s NI* speech and the count in matrix *T*(*USi*) less than 1 cannot be more than half of the total number of members' responses in group *g*.

Axiom 2—Influencer.

If *Ng* < *URi* < 2 × *Ng* and *UNI* < *Ng*, then *useri* is an influencer. If the *user i s* responses (*URi)* are more than the average number of total responses in the group and lower than twice the average number of total responses in the group. Moreover, *user i s* speech which is NI and the count in matrix *T*(*USi*) which is less than 1 cannot more than the number of the member which is in the group. Influencers also have the power to influence others and have the potential to become opinion leaders. Hence, this study also offers a way to find influencers.

Axiom 3—Followers.

If *URi* > 0, then the user is a follower.

A follower needs to support or agree with someone, so the *useri* must have a positive count in matrix *T*. In other words, the user in *rowi* is the follower's leader.

Three social community patterns are classified: In the *criterion* pattern, the opinion leaders broadly influencing many followers usually obtain high social community support and posts professionally. The criterion social community, the most common pattern. In the argument pattern, the pattern's emergence is caused by the discussion space provided for users of social community platforms. Users can give specific advice to influence each other or influence other users. The bandwagon pattern arises when followers follow closely due to opinion leaders' personal charisma. In this pattern, followers usually do not care whether the content posted by opinion leaders is correct.

The following axioms are used to define three patterns.

Axiom 4—Criterion pattern.

According to *group\_k* with opinion leaders in *set*\_*ol*, (1) if the number of posts influencing followers who post to *set*\_*ol* in *group*\_*k* is more than half of the number (≥1/2 × *SI*\_*f*), and (2) if the number of posts influencing opinions directed to followers in *group*\_*k* is more than half of the number (*SI*\_*ol* {HI,GI} ≥ 1/2 × *SI*\_*ol*), then *group*\_*k* is a criterion pattern.

According to the criterion pattern, an enterprise can promote its products effectively, and the government can sharp public opinion in favor of a particular policy by utilizing the function of opinion leaders. Furthermore, finding opinion leaders and tracking them over the long term can prevent an explosion of potential issues. In the criterion pattern, the opinion leader is very professional. If a government or enterprise wants to negotiate or cooperate with them, the contractor must also be professional.

Axiom 5—Argument pattern.

After grouping with the CI Algorithm, we find different groups. According to *group\_k* with opinion leaders in *set\_ol*, (1) if the number of posts influencing {HI,GI} followers who post to set\_ol in group\_*k* is more than half of the number (if *SI\_f* {GI,HI} ≥ 1/2 × *SI\_f*) and (2) if the number of posts influencing {HI, GI} opinions directed to followers in group\_*k* is more than half of the number (*SI\_ol* {HI,GI} ≥ 1/2 × *SI\_ol*), then we recognize that group\_*k* can be characterized as an argument pattern.

On the basis of the argument pattern, the character of the interaction between opinion leaders and followers is not significant; consequently, the cost of marketing is high and may even have little impact on promotion. Moreover, in the argument pattern, the viewpoints are diverse. Thus, it is desirable to provide a platform and sufficient information and domain knowledge for uses to engage in dialog with each other.

Axiom 6—Bandwagon pattern.

After grouping with the CI Algorithm, then we have different groups. According to *group*\_*k* with opinion leaders in *set*\_*ol*, (1) if the number of posts influencing {NI} followers who post to *set*\_*ol* in *group*\_*k* is more than half of the number (if *SI*\_*f* {NI} ≥ 1/2 × *SI*\_*f*) and (2) if the number of posts influencing {NI} opinions directed at followers in *group*\_*k* is more than half of the number (*SI*\_*ol* {NI} >1/2 × *SI*\_*ol*), then we recognize that *group*\_*k* can be characterized as a bandwagon pattern.

According to the bandwagon pattern, opinion leaders and followers are not professional in most cases. Enterprises and governments can utilize social media to promote their products and public policy effectively by enhancing the roles of these opinion leaders and followers. If the government or an enterprise wants to negotiate or cooperate with them, the contactors need not be professional but must be a decision-maker who can promise to provide resources.

#### *3.3. Problem with Identification of Opinion Leaders and Followers*

The problem of identifying opinion leaders and followers is formulated as follows:

Decompose a user-user interaction matrix into mutually separable submatrices (modules) with (1) the minimum number of non-empty high-value entries outside the block-diagonal matrix *T*, and (2) the maximum number of strongly desired entries (HI) and the minimum number of strongly undesired entries (NI) included in the submatrices of the block diagonal matrix *I.*

Subject to the following constraints:

Constraint C1: Empty groups of users are allowed, and

Constraint C2: The number of users in a group cannot exceed the upper bound Nu.

Constraint C3: Satisfy the following assumptions:


In matrix *T*, we count input post, responses and being responded to. In the matrix *I*, (1) input the highest influence power HI. Moreover, (2) input the lowest influence power NI. Combined with observation results, identify the opinion leader set and follower set.

#### *3.4. Identification of Opinion Leader and Follower*

In this study, the relationship between opinion leaders and followers is the mutual relationship between users. When a user satisfies the characteristics of opinion leaders, our study defines the user as an opinion leader. Followers will change their own ideas and attitudes according to opinion leaders' characteristics, including social status, accuracy of post contents and social community support level. The algorithm is described as follows:

Step I. Collect data from social media, such as users, posts, and response information.

Step II. Compute the counts of total posts of *useri* in matrix [*Tii*]

Step III. For all users, put the responses which *useri* gives to *userj* in matrix [*Tij*] until there are no responses from *useri* to other users.

Step IV. If *colomni* and *rowi* in matrix *T* are NULL then remove the meaningless *useri* by deleting *rowi* and *colomni* in matrix *T* and *I* until there are no meaningless *useri*

Step V. According to the data of social media, matrix *T* and social community support level, the social community support level marked with *useri* at [*Iii*].

Step VI. For all users *i* and *j*, according to expert judgment, assign *useri* the influence power level [*Iij*] of *userj*.

Step VII. If [*T*(*i*−1)(*i*−1)] > [*Tii*], exchange the columns of *useri* and *useri*−1. until there is no [*Tii*] < [*T*(*i*−1)(*i*−1)] in matrix *T*.

Step VIII. The CI algorithm is applied to group users.

Step IX. If [*Iij*] is not NULL, then check whether the *useri* and *userj* is in the same group or not; if they are not in the same group, then put them in the same group matrix until all users in the matrix *I* have been checked.

Step X. In each group, sum up all positive responses to *Nrsp* and compute the average *Nrsp*.

Step XI. Count the responses of *useri* by *n j*=1 *Tij* and posts of *useri* by *n i*=1 *Tij* . Additionally, count the influence of each *useri*, If *useri* satisfies Axiom 1, then *useri* is identified as an opinion leader. If *useri* satisfies Axiom 2, then *useri* is identified as an influencer. If *useri* satisfies Axiom 3, then *useri* is identified as a follower. Continue until all users in matrix *T* have been checked.

Step XII. Check each group matrix. Count all opinion leaders' [*Iij*] and followers' [*Iij*]. Recognize the pattern based on Axioms 4–6.

Step XIII. When there are positive responses or influence between users, we classify these users in the same group.

#### **4. Case Study**

The ABC network platform is taken as an example to describe the application of our research in practice. The ABC network platform is a discussion platform created by the government to promote community communication. This platform was created as part of a public policy proposal to improve policy communication and make policy public.

This case study is taken from the National Energy Conference organized by the Energy Bureau of Taiwan's Ministry of Economic Affairs. However, there are still many disagreements when it comes to choosing opinions due to value divergence. To discuss and clarify issues with the public, the proposition, "Where does future electric power come from?" is open on the policy consultant forum (People Talk), with three sub-issue forums including, 'environment low carbon sustainable development', 'stable supply and open source' and 'reduce expenditures effectively'. In particular, 'stable supply and open source' is the focus of this case study.

The proposed solution approach is applied in this case.

Step I. Collect materials: Judging by the forum (posts, fan pages) on social media about green energy and low carbon, we collected materials, including text and response information. This study collected materials from users' discussion contents related to the "stable supply" issue on the ABC network platform between May 2019 and the end of 2019. The data collection is implemented with the Python-Jieba crawler program, which is particularly suitable for Chinese text analysis automatically. The collected materials are listed below: Post users: 36; total posts: 205 (total posts have been deducted from the number of administrator responses and consecutive posts); effective responses: 61; effective count of being responded to 47.

Step II. Input matrix elements: Input elements in matrix *T*. The elements include general posts, counts of responses and being responded to as judged by experts.

Step III. Remove meaningless users: Remove users whose posts are never responded to.

Step IV. Tag the user's category: Input social community support

Step V. Influence analysis: We analyze a user's influence by comparing the levels of influence power according to three characteristics and input the influence into the matrix *I*.

Step VI. If the count of users' posts, responses, and influence power, reaches a certain level of relevance, then move forward. If the users have greater counts of responses or respondents, list them in front. (Figure 1).

**Figure 1.** Matrix element conversion. Note1: the criterion of influence power: NI = no influence; GI = general influence; HI = high influence. Note2: The level of social community support: H: high social community support; L: low social community support; O: missing social community support.

Step VII. According to matrix *I* and *T* based on Equation (1), group the users by their relationships. Group A {1, 2, 3, 4, 5, 6, 18, 20, 21, 24}; Group B {2, 16, 24, 26, 27, 34, 36}; Group C {17, 18, 19}; Group D {19, 20, 21, 36}; Group E {2, 27, 28, 29, 30, 36}; Group F {7, 20, 26}; Group G {2, 9, 27, 31, 34}; Group H {7, 27}; Group I {7, 8, 15}; Group J {21, 22, 23}; Group K {25, 26}; Group L {34, 35}.

Step VIII. Identification: Determine the interaction between users, followers, and opinion leaders, according to the definition of each group of opinion leaders.

In Group A, opinion leader 1 is recognized by Axiom 1, and the influencers are Users 3 and 5. This group is identified as an argument group based on Axiom 5.

In Group B, User 16 s post contents are usually meaningless. According to Axiom 1, User 16 is not an opinion leader. Moreover, the group does not belong to any pattern.

Group C is an argument pattern. Users 18 and 19 are influencers. In this pattern, no opinion leaders and followers are identified. The influencers influence each other without focusing on any particular key person.

In group D, the response of User 19 has a great influence on Users 20 and 21. However, the influence of the posts responded to by Users 20 and 21 is not great. There is a discussion relationship between Users 19 and 20, so it is a social pattern.

In Group E, the post contents of User 36 are valuable. However, other users' responses are not good. Thus, User 36 is an opposing opinion leader.

In Group F, according to Axiom 6, User 26 is an influencer, and the group represents an argument pattern. User 20 is the follower of User 26.

In Group G, Users 27 and 34 are the influencers, and it is a bandwagon pattern. Moreover, User 2 is the follower.

In Group H, Users 27 and 7 have a discussion relationship, and both of them are influencers.

In Group I, the social community support level of the group is high. They should be opinion leaders, in theory. However, in this case, study, since they do not play the role of opinion leaders, they cannot be recognized as opinion leaders. User 7 is an opinion leader, and this is an argument pattern.

In Group J, there are no opinion leaders or followers. User 21 is an influencer.

In Group K, User 26 is opposed to the opinions of User 25. There are no opinion leaders or followers in this group.

Group L: In this group, there are no opinion leaders or followers.

According to the summary of group analysis, Users 1 and 36 are obviously opinion leaders. The five groups A, C, D, F, and H can all be characterized as argument patterns, which shows that in the forum, most post contents influence other users through discussion.

The case was also analyzed with a traditional network approach, i.e., the Ward method, named after its creator, focuses on the allocation of profiles to groups equally. Ward [50] pointed out that grouping in this manner makes it easier to consider and understand relations in large collections. The principle of this method is to minimize heterogeneity, and the important goal is to find the greatest similarity. The comparison between the proposed approach and Ward's approach is shown in Table 3. The results show that Users 1, 7, 36 are identified as opinion leaders. However, User 19 has not been identified through the traditional method due to the threshold.



The identification of opinion leaders by Ward's method only identifies opinion leaders who participate in the whole conversation. However, in the proposed approach, this study uses two identification methods: the whole conversation and group conversation. The latter can clarify which user is the group's opinion leader. In addition, the proposed approach can discover different patterns. Although the traditional network approach of Ward's is considered to be the best one among the hierarchical clustering methods [51–53], it cannot identify these patterns.

Through the perception of social community patterns among users, this study successfully distinguished opinion leader and defined social patterns in the complex social communities, which contains highly controversial users and many of them are anonymous, where few persons are involved in the discussion and users' support level could not be obtained because users disagree with each other.

After identifying opinion leaders and followers, in a criterion pattern, opinion leaders have a higher degree of professionalism than followers. In that case, if green energy and low-carbon related policies are to be disseminated through opinion leaders, it is necessary to send personnel with a certain degree of professionalism. After contacting and negotiating with them, you must first obtain the approval of the opinion leaders before you can persuade them to influence followers through their platforms or social media. It is expected that they will achieve rapid dissemination, higher dissemination effect, and avoid costly but ineffective dissemination. In addition, a follower may also become another opinion leader, generating multiple diffusion of innovations.

Second, in the argument pattern, due to the comparably equal status between opinion leaders and followers, issues are quite diverse, and it is not easy to focus on specific issues. If opinion leaders wanted to disseminate relevant policies on energy conservation and low carbon to influence followers, the dissemination effect would be poor. Therefore, in order to make the issue of green energy and low-carbon attract more attention, opinion leaders can package the issue into lifestyles and features, thereby achieving a higher diffusion effect (Diffusion of innovations) on followers.

In addition, in the bandwagon pattern, because opinion leaders and followers are less professional, they are more vulnerable to each other. In order to disseminate green energy and low-carbon policies, policies can be packaged as simple, interesting or lifestyle issues, while social media or platforms are often used by opinion leaders or followers to achieve better diffusion of innovations.

#### **5. Discussion**

In this study, three interactive patterns and their characteristics are identified, which can help how to find opinion leaders more effectively and grasp the characteristics of opinion leaders and followers when want to spread (Diffusion of innovation) new policies or marketing new products. Opinion leaders and followers both have different levels of knowledge, social community support, and influence power. Therefore, this study summarizes the interactions on social media into three patterns, and the characteristics of three patterns have also been explored. Furthermore, based on the characteristics of users in these patterns, it can be used to provide opinion leaders with specific and clear topics/issues to influence their followers, thereby obtaining effective dissemination or commercial marketing purposes in the green energy domain.

In addition, the results of this study can also be applied to the political dissemination of democratic elections or the shaping of the opinion climate, which can more efficiently lead the electoral issues and win elections. In other words, the issues or political opinions that candidates are trying to market can be differentiated based on different communication modes so that the information can be segmented, and the impact of effective agenda-setting goals can be achieved. This study not only has the possibility of expanding and deeper research, but it is also the relative value of this research.

Most previous studies used different algorithms or improved algorithms to identify opinion leaders [28–31,33,37]. In other words, most of the above-mentioned studies only used various algorithms to identify opinion leaders or followers, and consequently, apply them to political communication and commercial marketing related fields. There has recently been an exploration of the psychological motivation of actively acting as opinion leaders to understand which users are active communicators or passive recipients of social issues. However, the related study on the interaction patterns between the opinion leaders and followers and their characteristics have not been explored.

In addition, the opinion dynamics of current popular research are interested. The classic model of opinion dynamics is derived from the research of DeGroot's and Friedkin–Johnsen's models of opinion dynamics, which aims at the integration and consistency of opinions in social networks, carried out very enlightening modeling and exploration [54]. DeGroot's model describes the process of reaching consensus in social networks, while Friedkin–Johnsen's model further introduces the degree of "stubborn individual" to explain the phenomenon of inconsistent opinions in social networks. The models clearly depict the dynamic process of opinion integration and consistency, as well as the obstacles caused by "stubborn individual" factors to the process of opinion integration [55,56]. However, the two models are very instructive to explore how individuals (or Internet users) should be controlled if they are affected by certain characteristics or stubbornness in the process of opinion integration. Our study more specifically explores how opinion leaders and followers can find out the characteristics of users and the patterns of interaction between them in the process of consensus,

which can be applied to precision marketing in actual operation, and even give opinion leaders with different characteristics use differentiated topic content to increase the influence of consensus. Since the research of DeGroot's and Friedkin–Johnsen's models of opinion dynamics are in conceptual level only, some difficulties in practical application are challenged [57].

This study extends the idea of [39] to the domain of green energy and low carbons, where roughly qualitative characteristics of opinion leaders and matrix of interaction between users are considered. To make this study more solid and applicable, the theories of "two-step flow", "bandwagon effect", "agenda-setting" and "innovation diffusion theory" from the theoretical perspective of communication and the axioms are used to validate the results. In addition, this study does not focus on a single discipline only but a cross-disciplinary study of the fields of green energy and low carbons, intelligent systems, and communication to provide numerous management implications discussed in this section. The core novelty and contribution is shown in that the solid theoretical part makes this study applicable to other social media and industry sectors.

#### **6. Conclusions**

Nowadays, networks are the most important media among broad masses, and almost everyone is closely related to networks, which results in many social issues as virtual networks are reflected in the real world. Opinion leaders play a very important role in spreading media for many issues. Previous research used some social measurement methods based on interview self-reports or questionnaire surveys [23,24,32,33] or used quantitative clustering techniques to identify opinion leaders [30,40–43]. In fact, few studies have investigated online posts to determine the social patterns of opinion leaders and interactions between opinion leaders and followers. Therefore, it is valuable that this study proposes a method to analyze the characteristics of social media posts, posters and their interaction relationships.

Furthermore, this study identifies opinion leaders on the issue of green power in social communities based on social community support level and influence power level. As a result of users cannot express positive and negative opinions on the issue; it is not enough to consider only the user's posts. For example, when a user has a large number of posts, if the user cannot get support from others, he/she cannot be classified as an opinion leader.

Using the same model and the solution approach, the results of this study can be extended from the green energy low carbon issue to other social issues, e.g., the domain of marketing to make better CRM. It also provides an efficient and operable application model for online marketing or public issue communication in practical applications, which can more easily identify opinion leaders and followers. Furthermore, this study not only examines from the perspective of intelligent systems such as different algorithms but provides a new perspective to deal with the effective identification of opinion leaders and followers, at the same time, promotes the "theory of two-step flow" [5] to add another research perspective in the academic field.

The level of community support and influence proposed in this study uses a relational matrix to analyze the relationship and the community pattern between users in this study. If it is applied to analyze social media with higher data volume and discussion volume, it should consider the computation speed, but which is usually not an issue in the current IT world. In addition, this study uses static data for analysis. This model can also be added or removed from dynamic data and evolution modeling for analysis in the future, and it is expected that it can be identified as more timely and faster. For example, the advantages of DeGroot and Friedkin–Johnsen models are taken into consideration for further study

However, this research's reproducibility to other industry sectors is interested and requires further investigation. In addition, this study focuses on a single case study of a country's green energy and low-carbon policy in Taiwan. If the results of this study are used to infer whether there will be differences in other countries with different levels of development, knowledge and education, it is

worthwhile to explore further. In any case, despite the above negotiable points, it does not detract from the valuable results obtained in this study.

**Author Contributions:** Conceptualization, W.-Y.L.; methodology development, C.-C.H.; coding, P.-A.C.; data preparation and analysis, Y.-C.C.; writing—review and editing, W.-Y.L. and C.-C.H.; data collection and annotation, P.-A.C. and Y.-C.C. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the National Science Foundation—Ministry of Science and Technology of Taiwan, grant number MOST-104-3113-F-260-001 and MOST 107-2410-H-260 -014 -MY3.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

MDPI St. Alban-Anlage 66 4052 Basel Switzerland Tel. +41 61 683 77 34 Fax +41 61 302 89 18 www.mdpi.com

*Applied Sciences* Editorial Office E-mail: applsci@mdpi.com www.mdpi.com/journal/applsci

MDPI St. Alban-Anlage 66 4052 Basel Switzerland Tel: +41 61 683 77 34

ISBN 978-3-0365-5876-9