Article

Embedded Learning Approaches in the Whale Optimizer to Solve Coverage Combinatorial Problems

1 Escuela de Ingeniería Informática, Pontificia Universidad Católica de Valparaíso, Valparaíso 2362807, Chile
2 Escuela de Ingeniería de Construcción y Transporte, Pontificia Universidad Católica de Valparaíso, Avenida Brasil 2147, Valparaíso 2362807, Chile
3 Escuela de Negocios Internacionales, Universidad de Valparaíso, Viña del Mar 2572048, Chile
* Authors to whom correspondence should be addressed.
Mathematics 2022, 10(23), 4529; https://doi.org/10.3390/math10234529
Submission received: 6 September 2022 / Revised: 17 November 2022 / Accepted: 22 November 2022 / Published: 30 November 2022
(This article belongs to the Section Mathematics and Computer Science)

Abstract: When tackling real-world problems with computational resources, combinatorial problems in binary domains are common. Such problems may admit an enormous number of candidate solutions, which makes them difficult for classical algorithmic techniques to address and costly in computational resources, so it is of utmost importance to solve them efficiently. To cope with this difficulty, we can apply other methods, such as metaheuristics. Some metaheuristics operate natively in discrete search spaces; continuous swarm intelligence metaheuristics, however, must be adapted to operate in discrete domains. This adaptation requires a binarization scheme that preserves the original moves of metaheuristics designed for continuous problems. In this work, we propose hybridizing the whale optimization algorithm metaheuristic with the Q-learning reinforcement learning technique, which we call QBWOA. With this technique, we obtain a smart and fully online binarization scheme selector; the results, supported by the corresponding tables and figures, are statistically promising.

1. Introduction

In recent years, metaheuristics have proven their effectiveness in solving complex problems, especially combinatorial ones. Numerous examples can be found in biology [1], logistics [2], civil engineering [3,4], transit [5] and machine learning [6]. Among these complex problems, discrete-domain or binary problems are receiving more and more attention as the number of problems to be solved grows. In this context, exact methods may not be the right choice because of their long computational times; approximate methods such as metaheuristics have therefore become more popular in the scientific community because, although they do not guarantee optimality, they reach solutions close to the optimum in a reasonable computational time. Although there are metaheuristics specifically designed to address binary-domain problems, it has been shown that metaheuristics designed to work in continuous domains, assisted by a discretization scheme, outperform the classical binarization-based approaches [7]. The classical design transforms domains through two-step techniques; given this higher efficiency, and because many combinatorial problems are very large, it is critical to maintain the robustness of the metaheuristic techniques.
Among the main approaches to integrating metaheuristics, some hybrid learning-based approaches have emerged [8,9,10,11], in which the focus is on improving the transformation process and performance. Matheuristics are another intriguing hybrid approach [12], which mixes mathematical programming techniques with metaheuristic algorithms. Ref. [13] examined the truck routing problem using mixed integer linear programming and metaheuristic approaches. In general, these methods do not make use of the auxiliary data generated by the metaheuristics to produce more accurate conclusions. Metaheuristics provide useful auxiliary data during the solution search process that can be used to inform machine learning algorithms. Artificial intelligence, and in particular machine learning, has gained importance in recent years, with applications in a variety of fields [14,15,16]. The combination of machine learning techniques and metaheuristic algorithms is a relatively new area of research that has gained popularity in recent years [9]. For example, in [17], the authors surveyed several approaches to combining machine learning and metaheuristic techniques. To name a few: in [18], the Q-learning technique is used in the Cooperative Water Wave Optimization algorithm (CWWO) to address the Distributed Assembly No-Idle Flow-shop Scheduling Problem (DANIFSP), minimizing the maximum assembly completion time; in [19], Q-Learning (QL) is used in Hyper Simulated Annealing (HSA) to select the appropriate heuristic for the mixed-model sequencing problem with stochastic processing times (MMSPSP). In both cases, the results stand out compared to traditional deterministic approaches.
Three main areas in which machine learning algorithms use metaheuristic data are presented in [9]: low-level integrations, high-level integrations and optimization problems. In this work, we focus on low-level integrations. This group corresponds to general binarization techniques in which the metaheuristic moves are not modified; the binarization of the solutions is applied after their execution. Putting these concepts into practice, we undertook two specific activities. (1) The authors in [20,21] proposed SARSA as a dynamic binarization scheme selector within continuous metaheuristics. Motivated by this work, and seeking to answer our research question, "Does increasing the number of transfer functions available to the binarization scheme selector positively affect the optimization process?", we propose a reimplementation and instantiation of the algorithm in which new actions are added to the pool of SARSA selection possibilities. (2) We use another dynamic binarization selector, called Q-Learning [22], with the same action pool as SARSA. This proposed approach combines the original metaheuristic moves and their continuous-natured solutions with reinforcement learning to obtain a robust and competitive binary version. The proposed algorithm was applied to the set covering problem. The research contributions of this work are as follows:
  • A new smart binarization scheme selector is proposed.
  • The Q-learning technique, proposed in [21], is used to binarize the whale optimization algorithm.
  • A selector with a much wider repertoire of actions, drawn from the literature, is obtained.
The contents of this study are summarized as follows: Section 2 addresses the set covering problem and some of its applications in different areas. Section 3 outlines the techniques used and re-implemented, Q-learning and SARSA. The basics of the whale optimization algorithm and its binary version are covered in Section 4 and Section 5, respectively. In Section 6, the details of the numerical experiments and comparisons are presented. Finally, conclusions and possible future lines of research are discussed in Section 7.

2. Set Covering Problem

The set covering problem (SCP) is a well-known optimization problem that can represent a variety of applications and domains, such as assignment problems and transport networks. It is an NP-hard problem [23] that entails finding the set of elements with the lowest cost that fulfills a certain number of requirements. The mathematical model of the Set Covering Problem is represented as follows:
$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix} \tag{1}$$
$$\text{Minimize } Z = \sum_{j=1}^{n} c_j x_j \tag{2}$$
Subject to the following restrictions:
$$\sum_{j=1}^{n} a_{ij} x_j \ge 1 \quad \forall i \in I, \qquad x_j \in \{0, 1\} \quad \forall j \in J \tag{3}$$
where A is a binary matrix with m rows and n columns, and $a_{ij} \in \{0,1\}$ is the value of each cell of A: if column j satisfies (covers) row i, then $a_{ij}$ equals 1; otherwise, it is 0. Each column j has an associated cost $c_j \in C$, where $C = \{c_1, c_2, \ldots, c_n\}$, and $I = \{1, 2, \ldots, m\}$ and $J = \{1, 2, \ldots, n\}$ are the sets of rows and columns, respectively. Finally, $x_j$ is the binary decision variable indicating whether column j is selected to cover the corresponding areas.
To illustrate how the SCP operates, we provide the following practical example. We have intentionally selected an SCP instance with m = 11 and n = 11, represented graphically in Figure 1 and Figure 2 and by Equations (4) to (15).
$$\text{Minimize } Z = \sum_{j=1}^{11} c_j x_j \tag{4}$$
Subject to:
AREA 1: $x_1 + x_2 + x_3 + x_4 \ge 1$ (5)
AREA 2: $x_1 + x_2 + x_3 + x_5 \ge 1$ (6)
AREA 3: $x_1 + x_2 + x_3 + x_4 + x_5 + x_6 \ge 1$ (7)
AREA 4: $x_1 + x_3 + x_4 + x_6 + x_7 \ge 1$ (8)
AREA 5: $x_2 + x_3 + x_5 + x_6 + x_8 + x_9 \ge 1$ (9)
AREA 6: $x_3 + x_4 + x_5 + x_6 + x_7 + x_8 \ge 1$ (10)
AREA 7: $x_4 + x_6 + x_7 + x_8 \ge 1$ (11)
AREA 8: $x_5 + x_6 + x_7 + x_8 + x_9 + x_{10} \ge 1$ (12)
AREA 9: $x_5 + x_8 + x_9 + x_{10} + x_{11} \ge 1$ (13)
AREA 10: $x_8 + x_9 + x_{10} + x_{11} \ge 1$ (14)
AREA 11: $x_9 + x_{10} + x_{11} \ge 1$ (15)
The SCP allows the modeling of real-life optimization problems such as the location of gas detectors in industrial plants [24], the location of electric vehicle charging points in California [25], the optimal UAV locations for generating wireless communication networks in disaster areas [26], the optimal location of first aid centers in Japan [27] and the location of stations to cover a traffic counting network [28]. These studies show the importance of solving this problem with optimization techniques that guarantee good results.

3. Reinforcement Learning

Reinforcement learning is a branch of machine learning in which an agent performs various actions in various states with the purpose of determining which action is best for each state, based on the consequences of completing each action. Classic examples include QL [29], Monte Carlo RL [30] and SARSA [31].

3.1. Q-Learning

QL was proposed by Watkins and Dayan [29] in 1992 and is one of the best-known reinforcement learning approaches. Its primary goal is to maximize the cumulative reward of an action in a given state and to discover the optimal action for that state.
The agent passes through several states and, in each one, tries an action, instantly obtaining a reward or a penalty; the point at which an action is performed in a given state is referred to as an episode.
The Q-learning method attempts to determine the amount of cumulative reward the agent will receive for each pair of action states in the long run. The action-state function is denoted by Q ( s , a ) , which returns the reward that the agent will receive while performing an action in state s, assuming that the agent will continue to follow the Q-function’s policy until the conclusion of the episode; this value is referred to as the Q-Value [30]. These Q-Values are recorded in the Q-Table, a matrix with rows representing states and columns representing actions. For the mathematical model, please see the article [29].
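For reference, the standard Q-Value update rule from [29], with learning rate α and discount factor γ, is:

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right]$$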

3.2. SARSA

SARSA has been explained in [20]; however, we provide an explanation here so that all the concepts related to reinforcement learning and to the techniques used or reimplemented are within reach.
Therefore, SARSA is a slight variation of the popular Q-learning algorithm. It is necessary to mention that for a learning agent in any reinforcement learning algorithm, its policy can be of two types: (1) on-policy and (2) off-policy. For (1) the learning agent learns the value function according to the current action derived from the policy currently in use, while for (2) the agent learns the value function according to the action derived from another policy.
The SARSA technique is on-policy and uses the action performed by the current policy to learn the q-value; that is, for its update, it will depend on the current state (s), the current action (a), the reward obtained (r), the next state (s’) and the next action (a’), referring to the tuple (s, a, r, s’, a’). On the other hand, the Q-learning technique is an off-policy technique and uses the greedy approach to learn the q-value.
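Concretely, the SARSA update uses the Q-Value of the action a′ actually chosen in the next state s′, instead of the greedy maximum used by Q-learning:

$$Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \, Q(s', a') - Q(s, a) \right]$$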

4. Whale Optimization Algorithm: Fundamentals

The Whale Optimization Algorithm (WOA) is inspired by the hunting behavior of humpback whales, in particular how they search for (Section 4.1), encircle (Section 4.2) and attack (Section 4.3) their prey with a technique known as "bubble nets". Mirjalili and Lewis devised this algorithm in 2016 [32]. The WOA metaheuristic has also been used in the following works [33,34,35].
The WOA metaheuristic begins with a set of randomly generated solutions. At the end of each iteration, the whales’ locations are updated in relation to a randomly picked whale or the best whale acquired thus far.

4.1. Identifying the Prey

Humpback whales start by searching for and identifying the target prey; algorithmically, this prey is assumed to be the current best search agent, so the whales update their positions toward this agent over the course of numerous iterations. This behavior is reflected mathematically as follows:
$$\vec{X}_i^{t+1} = \vec{X}_{rand}^{t} - \vec{A} \cdot \vec{D}, \qquad \vec{D} = \left| \vec{C} \cdot \vec{X}_{rand}^{t} - \vec{X}_i^{t} \right| \tag{16}$$
where $\vec{A}$ and $\vec{C}$ are coefficient vectors, t denotes the current iteration and $\vec{X}_{rand}^{t}$ is a random whale from the population. The coefficient vectors $\vec{A}$ and $\vec{C}$ are computed according to Equations (17) and (18):
$$\vec{A} = 2 \vec{a} \cdot \vec{r} - \vec{a} \tag{17}$$
$$\vec{C} = 2 \cdot \vec{r} \tag{18}$$
where $\vec{a}$ decreases linearly from 2 to 0 over the course of the iterations (in both the exploration and exploitation phases) and $\vec{r}$ is a random vector with values in $[0, 1]$.
The main goal of Equations (17) and (18) is to balance the exploration and exploitation of the search space. Since $\vec{r}$ is random and shared by both equations, it provides the stochastic behavior when updating the future positions of the population. Through Equation (17), exploration occurs when $|\vec{A}| \ge 1$ and exploitation when $|\vec{A}| < 1$. Finally, through Equation (18), the parameter $\vec{C}$, taking random values in $[0, 2]$, helps reduce the probability of becoming permanently trapped in local optima; in this way, exploration can be favored over exploitation at any stage of the optimization.

4.2. Encircling the Prey

Once the prey is identified, humpback whales close in on it, forcing it to swim upward and making it impossible to escape. Mathematically, this is represented by the following equation:
$$\vec{X}_i^{t+1} = \vec{X}^{*} - \vec{A} \cdot \vec{D}, \qquad \vec{D} = \left| \vec{C} \cdot \vec{X}^{*} - \vec{X}_i^{t} \right| \tag{19}$$
where $\vec{X}^*$ is the best whale obtained so far and $\vec{X}_i^t$ is the i-th whale in the t-th iteration. The vectors $\vec{A}$ and $\vec{C}$ are calculated as in Equations (17) and (18). It is worth mentioning that $\vec{X}^*$ must be updated at each iteration if a better solution exists.

4.3. Bubble Netting Technique

Humpback whales force their prey to rise to the surface, with no chance of escape, by expelling bubbles during the entire course of the encirclement, forming a spiral that becomes narrower and narrower until it finally traps the prey. In WOA, this behavior is achieved by linearly decreasing the value of a. This behavior is imitated as follows:
$$\vec{X}_i^{t+1} = \vec{D}' \cdot e^{bl} \cdot \cos(2 \pi l) + \vec{X}^{*}, \qquad \vec{D}' = \left| \vec{X}^{*} - \vec{X}_i^{t} \right| \tag{20}$$
where $\vec{D}'$ is the distance of the i-th whale from the prey (the best solution obtained so far), b is a constant that defines the shape of the logarithmic spiral and l is a random number in $[-1, 1]$.
Humpback whales swim around the prey within a shrinking circle and along a spiral trajectory simultaneously. To model this simultaneous behavior, there is a 50% probability of choosing between the encircling prey mechanism (Section 4.2) or the spiral model (Section 4.3) to update the position of the whales during optimization. The mathematical model is as follows:
$$\vec{X}_i^{t+1} = \begin{cases} \vec{X}^{*} - \vec{A} \cdot \vec{D} & \text{if } p < 0.5 \\ \vec{D}' \cdot e^{bl} \cdot \cos(2 \pi l) + \vec{X}^{*} & \text{if } p \ge 0.5 \end{cases} \tag{21}$$
In summary, the pseudocode of the WOA is illustrated in Algorithm 1.
Algorithm 1 Whale Optimization Algorithm
Input: The population X = {X_1, X_2, …, X_n}
Output: The updated population X = {X_1, X_2, …, X_n} and X*
1: Initialize the whale population X_i (i = 1, 2, …, n)
2: Calculate the fitness of each whale
3: Define the initial state
4: X* ← the best whale
5: while t ≤ maximum number of iterations do
6:   for each whale do
7:     Update a, A, C, l and p
8:     if p < 0.5 then
9:       if |A| < 1 then
10:        Apply the encircling-the-prey movement to the current whale
11:      else
12:        Select a random whale X_rand
13:        Apply the search-for-prey movement to the current whale
14:      end if
15:    else (p ≥ 0.5)
16:      Apply the bubble-net attack movement to the current whale
17:    end if
18:  end for
19:  Check whether any whale goes beyond the search space and correct it
20:  Calculate the fitness of each whale
21:  Update X* if there is a better solution
22:  t ← t + 1
23: end while
24: Return X*
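To make the movements above concrete, the following minimal NumPy sketch (our illustration, not the authors' released code; treating |A| as the vector norm is an implementation assumption) performs one iteration of the continuous WOA update:

```python
import numpy as np

def woa_step(X, X_best, t, T, b=1.0):
    # X: (n, d) population, X_best: (d,) best whale so far,
    # t, T: current / maximum iteration, b: spiral shape constant.
    n, d = X.shape
    a = 2 - 2 * t / T                          # decreases linearly from 2 to 0
    X_new = np.empty_like(X)
    for i in range(n):
        r = np.random.rand(d)
        A = 2 * a * r - a                      # Equation (17)
        C = 2 * np.random.rand(d)              # Equation (18)
        if np.random.rand() < 0.5:
            if np.linalg.norm(A) < 1:          # exploitation: encircle the prey
                D = np.abs(C * X_best - X[i])
                X_new[i] = X_best - A * D      # Equation (19)
            else:                              # exploration: search for prey
                X_rand = X[np.random.randint(n)]
                D = np.abs(C * X_rand - X[i])
                X_new[i] = X_rand - A * D      # Equation (16)
        else:                                  # bubble-net spiral attack
            l = np.random.uniform(-1, 1)
            D_prime = np.abs(X_best - X[i])
            X_new[i] = D_prime * np.exp(b * l) * np.cos(2 * np.pi * l) + X_best  # Equation (20)
    return X_new
```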

5. Whale Optimization Algorithm: Q-Binary Version

The WOA was designed to solve continuous optimization problems. To solve a combinatorial problem such as the SCP with WOA, we need to translate the continuous solutions generated by WOA to the binary domain [7]. One of the most widely used techniques in the literature is the two-step technique [36]. As the name implies, it transfers the continuous solutions to the binary domain in two steps. First, it transfers the continuous solutions to the [ 0 , 1 ] domain using transfer functions and then discretizes the solutions in the [ 0 , 1 ] domain using binarization functions. The transfer functions and binarization techniques used in this work can be visualized in Table 1 and Table 2.
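As an illustration of the two-step technique, the short sketch below (our own example, using the V4 transfer function from Table 1 and the Standard binarization rule from Table 2) maps one continuous solution to the binary domain:

```python
import numpy as np

def v4_transfer(x):
    # V4 transfer function: maps continuous values into [0, 1]
    return np.abs((2 / np.pi) * np.arctan((np.pi / 2) * x))

def standard_binarization(probs):
    # Standard rule: bit j becomes 1 when rand <= T(x_j)
    return (np.random.rand(probs.size) <= probs).astype(int)

x_continuous = np.array([-1.3, 0.2, 2.7, -0.4])
x_binary = standard_binarization(v4_transfer(x_continuous))
```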
The authors of [37] propose hybridizing Q-learning with WOA to dynamically select binarization schemes. In this case, the actions to be selected are the different combinations of transfer functions and binarization functions. Specifically, 40 combinations or actions were used, obtained from 8 transfer functions (4 S-Shaped and 4 V-Shaped) and 5 binarization functions (Standard, Complement, Static Probability, Elitist and Elitist Roulette). This hybrid can be seen in Algorithm 2.
Table 1. Transfer functions.

| Type | Transfer Function |
| --- | --- |
| S1 [38,39] | $T(d_w^j) = \frac{1}{1 + e^{-2 d_w^j}}$ |
| S2 [39,40] | $T(d_w^j) = \frac{1}{1 + e^{-d_w^j}}$ |
| S3 [38,39] | $T(d_w^j) = \frac{1}{1 + e^{-d_w^j / 2}}$ |
| S4 [38,39] | $T(d_w^j) = \frac{1}{1 + e^{-d_w^j / 3}}$ |
| V1 [39,41] | $T(d_w^j) = \left| \operatorname{erf}\!\left( \frac{\sqrt{\pi}}{2} d_w^j \right) \right|$ |
| V2 [39,41] | $T(d_w^j) = \left| \tanh(d_w^j) \right|$ |
| V3 [38,39] | $T(d_w^j) = \left| \frac{d_w^j}{\sqrt{1 + (d_w^j)^2}} \right|$ |
| V4 [38,39] | $T(d_w^j) = \left| \frac{2}{\pi} \arctan\!\left( \frac{\pi}{2} d_w^j \right) \right|$ |
| X1 [42,43] | $T(d_w^j) = \frac{1}{1 + e^{2 d_w^j}}$ |
| X2 [42,43] | $T(d_w^j) = \frac{1}{1 + e^{d_w^j}}$ |
| X3 [42,43] | $T(d_w^j) = \frac{1}{1 + e^{d_w^j / 2}}$ |
| X4 [42,43] | $T(d_w^j) = \frac{1}{1 + e^{d_w^j / 3}}$ |
| Z1 [44,45] | $T(d_w^j) = \sqrt{1 - 2^{d_w^j}}$ |
| Z2 [44,45] | $T(d_w^j) = \sqrt{1 - 5^{d_w^j}}$ |
| Z3 [44,45] | $T(d_w^j) = \sqrt{1 - 8^{d_w^j}}$ |
| Z4 [44,45] | $T(d_w^j) = \sqrt{1 - 20^{d_w^j}}$ |
Table 2. Binarization functions.

| Type | Binarization |
| --- | --- |
| Standard | $X_{new}^j = \begin{cases} 1 & \text{if } rand \le T(d_w^j) \\ 0 & \text{otherwise} \end{cases}$ |
| Complement | $X_{new}^j = \begin{cases} \bar{X}_w^j & \text{if } rand \le T(d_w^j) \\ 0 & \text{otherwise} \end{cases}$ |
| Static Probability | $X_{new}^j = \begin{cases} 0 & \text{if } T(d_w^j) \le \alpha \\ X_w^j & \text{if } \alpha < T(d_w^j) \le \frac{1}{2}(1+\alpha) \\ 1 & \text{if } T(d_w^j) \ge \frac{1}{2}(1+\alpha) \end{cases}$ |
| Elitist | $X_{new}^j = \begin{cases} X_{Best}^j & \text{if } rand < T(d_w^j) \\ 0 & \text{otherwise} \end{cases}$ |
| Elitist Roulette | $X_{new}^j = \begin{cases} P[X_{new}^j = \zeta^j] = \frac{f(\zeta)}{\sum_{\delta \in Q_g} f(\delta)} & \text{if } rand \le T(d_w^j) \\ P[X_{new}^j = 0] = 1 & \text{otherwise} \end{cases}$ |
Algorithm 2 Q-Binary Whale Optimization Algorithm
Input: The population X = {X_1, X_2, …, X_n}
Output: The updated population X = {X_1, X_2, …, X_n} and X*
1: Initialize the whale population X_i (i = 1, 2, …, n)
2: Initialize the Q-Table
3: Calculate the fitness of each whale
4: Define the initial state
5: X* ← the best whale
6: while t ≤ maximum number of iterations do
7:   for each whale do
8:     a ← selection of a binarization scheme from the Q-Table
9:     Update a, A, C, l and p
10:    if p < 0.5 then
11:      if |A| < 1 then
12:        Apply the encircling-the-prey movement to the current whale
13:      else
14:        Select a random whale X_rand
15:        Apply the search-for-prey movement to the current whale
16:      end if
17:    else
18:      Apply the bubble-net attack movement to the current whale
19:    end if
20:  end for
21:  Binarize the population with binarization scheme a
22:  Check whether any whale goes beyond the search space and correct it
23:  Calculate the fitness of each whale
24:  Define the next state
25:  Update the Q-Table
26:  Update X* if there is a better solution
27:  t ← t + 1
28: end while
29: Return X*

5.1. Exploration and Exploitation of Actions

As explained previously, reinforcement learning algorithms seek to find the best action for a particular state. Therefore, it is necessary to efficiently define the pool of actions to be considered so that the algorithm has different alternatives in a given state. With this in mind, our proposal seeks to demonstrate that by increasing the alternatives available when selecting actions, Q-learning performs better. That is why we use the same approach proposed in [37], where the actions are the combinations of the transfer functions and binarization functions of the two-step technique. The authors of [37] proposed 8 transfer functions (4 S-Shaped and 4 V-Shaped) and 5 binarization functions (Standard, Complement, Static Probability, Elitist and Elitist Roulette), thus obtaining a total of 40 actions. In our proposals, we keep these 40 actions and add 8 further transfer functions (4 X-Shaped and 4 Z-Shaped), thus obtaining 40 new actions and a total of 80 selectable actions. The transfer functions used can be seen in Table 1 and the binarization functions in Table 2.
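A minimal sketch of how such a selector can operate is shown below (our illustration; the ε-greedy policy and the two-state setup are assumptions, not necessarily the paper's exact implementation). The α and γ values follow Table 3:

```python
import numpy as np

N_STATES, N_ACTIONS = 2, 80      # e.g., exploration/exploitation states, 80 schemes
Q = np.zeros((N_STATES, N_ACTIONS))
alpha, gamma, eps = 0.1, 0.4, 0.05

def select_action(state):
    # epsilon-greedy selection of a binarization scheme (an action)
    if np.random.rand() < eps:
        return np.random.randint(N_ACTIONS)
    return int(np.argmax(Q[state]))

def q_update(state, action, reward, next_state):
    # off-policy Q-learning update of the Q-Table
    best_next = np.max(Q[next_state])
    Q[state, action] += alpha * (reward + gamma * best_next - Q[state, action])
```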

5.2. Feasibility and Repair Heuristic

When working with stochastic algorithms such as metaheuristics, there is a possibility that perturbing the solutions generates a solution that does not satisfy all the constraints of the problem. The probability of generating infeasible solutions increases in our proposal, since we transfer continuous solutions to the binary domain when applying binarization schemes. In this section, we discuss the repair heuristic applied in our proposal and the heuristic used to verify solution feasibility.
First, we need to know whether a solution is feasible. To this end, we use the incidence matrix A of the Set Covering Problem instance and the solution to be checked. Algorithm 3 outlines all the steps of the feasibility validation. The objective of this algorithm is to identify constraints that are not covered by any dimension of the solution. As a result, we obtain two things: first, a boolean that tells us whether the solution is feasible; second, the number of columns covering each constraint, where a 0 for some constraint indicates an infeasible solution.
Algorithm 3 Feasibility Heuristic
Input: Incidence matrix A and a solution
Output: coverage (the number of columns covering each row) and feasibility
1: feasibility = True
2: for i = 1 to M do
3:   for j = 1 to N do
4:     if a_{i,j} == 1 and solution_j == 1 then
5:       Increment the number of columns satisfying constraint_i
6:     end if
7:   end for
8:   if there are no columns satisfying constraint_i then
9:     feasibility = False
10:  end if
11: end for
12: Return coverage and feasibility
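Since Algorithm 3 amounts to counting, for each row, the active columns that cover it, it reduces to a matrix-vector product; a minimal NumPy sketch (our own illustration) is:

```python
import numpy as np

def feasibility(A, solution):
    # A: (m, n) binary incidence matrix; solution: (n,) binary vector.
    # Returns per-row coverage counts and whether every row is covered.
    coverage = A @ solution
    return coverage, bool(np.all(coverage >= 1))
```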
After validating feasibility, we must repair the solutions that are not feasible. Repair consists of turning infeasible solutions into feasible ones by applying criteria particular to the problem. The details of the repair used in our proposal are shown in Algorithm 4. The repair is based on a tradeoff computed as the cost of column j divided by the number of rows of the constraint matrix covered by column j:
$$tradeoff_j = \frac{c_j}{\sum_{i=1}^{M} a_{ij}} \tag{22}$$
where c j is the cost of the j-th column and a i , j is the value of the incidence matrix.
Infeasible solutions are repaired by activating the column with the lowest tradeoff. Once a column has been activated, we run the feasibility test again. This process is repeated until a feasible solution is obtained.
Algorithm 4 Repair Heuristic
Input: Incidence matrix A and an infeasible solution
Output: A feasible solution
1: Carry out the feasibility test of Algorithm 3
2: while the solution is infeasible do
3:   if a constraint is not covered by any column then
4:     Calculate the column tradeoffs using Equation (22)
5:     position ← index of the minimum tradeoff
6:     solution_position = 1      ▹ Repaired constraint
7:   end if
8:   Carry out the feasibility test of Algorithm 3
9: end while
10: Return the feasible solution
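A compact sketch of Algorithm 4 (our own illustration; it relies on the feasibility() sketch shown after Algorithm 3) could look as follows:

```python
import numpy as np

def repair(A, solution, costs):
    # Activates, one at a time, the column with the lowest cost/coverage
    # tradeoff (Equation (22)) until every constraint is covered.
    coverage, feasible = feasibility(A, solution)
    while not feasible:
        row = np.where(coverage == 0)[0][0]      # first uncovered constraint
        candidates = np.where(A[row] == 1)[0]    # columns able to cover it
        tradeoff = costs[candidates] / A[:, candidates].sum(axis=0)
        solution[candidates[np.argmin(tradeoff)]] = 1
        coverage, feasible = feasibility(A, solution)
    return solution
```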

5.3. Complexity Analysis

A critical aspect of software development is the computation time of our solutions. Big-O asymptotic notation helps us quantify algorithmic complexity in terms of worst-case execution time. To determine the complexity of our proposal, we divide the implementation into three phases: initialization, perturbation and repair.
For the initialization of solutions we have a complexity of O ( N ) where N denotes the number of individuals in the population. For the initialization of the Q-Table used in Q-Learning and SARSA we have a complexity of O ( A · S ) where A indicates the number of actions and S the number of states. To determine the initial state of the population we need to calculate the diversity. The complexity of calculating the population diversity is O ( N · D ) where D is the number of decision variables in the problem. Thus, the complexity of the initialization phase is O ( ( A · S ) + ( N · D ) ) .
For solution perturbation, solution binarization, diversity calculation and fitness calculation at each iteration, we have a complexity of O(N · D · T), where T indicates the total number of iterations executed. For the selection of actions and the updating of the Q-Values in the Q-Table, we have a complexity of O(T). Thus, for the perturbation phase we obtain an algorithmic complexity of O(T · (N · D + 1)).
For the feasibility check we obtain a complexity of O(N · R · D), where R is the number of constraints of the problem. For the repair heuristic we have a complexity of O((N · R · D)^2). Hence, the complexity of the repair phase is O((N · R · D)^2).
Taking into account the initialization phase, the perturbation phase and the repair phase, we obtain the following algorithm complexity for our proposal:
$$O\big((A \cdot S) + (T \cdot N \cdot D) + (N \cdot R \cdot D)^2\big) \tag{23}$$

6. Experimentation Results

To validate the performance of our proposal, we used four different instances of WOA. The first, called QBWOA, incorporates Q-Learning to dynamically select among the 80 binarization schemes. In the second, called BSWOA, we incorporated SARSA to dynamically select among the 80 binarization schemes. The third, called BCL, is a version with a fixed binarization scheme based on the recommendation of the authors of [46]; the fixed scheme is V4-Elitist. The fourth, called MIR, is a version with a fixed binarization scheme based on the recommendation of the authors of [39]; the fixed scheme is V4-Complement.
The benchmark instances of the Set Covering Problem solved are those proposed in Beasley's OR-Library [47]. These instances are widely known and used to test the performance of different algorithms when solving a combinatorial optimization problem such as the Set Covering Problem. In particular, we solved 45 instances provided in this library.
The code was written in Python 3.8 and executed using the free Google Colaboratory service [48]. The results were stored and processed in databases provided by the Google Cloud Platform. The authors in [46] suggest making 40,000 calls to the objective function; to this end, we use a population of 40 individuals and 1000 iterations for every WOA run. Thirty-one independent runs were performed for each WOA instance. All parameters used for WOA, Q-Learning and SARSA are detailed in Table 3.
Table 4 shows the results obtained in this work. To read the table, it is necessary to understand each acronym, so we explain them here. The first column (Inst.) gives the name of each instance and (Opt) the known optimum for that instance. Then, for each technique, (Best) reports the best result of the 31 independent runs and (Avg) the mean of those results. We also consider the relative percentage deviation (RPD) as a quality metric, computed according to Equation (24), which quantifies how close a solution is to the optimum; the last row reports the average of each column. As mentioned above, the comparison includes a reimplementation and extension of the algorithm named BSWOA in [20]; by extension we mean the enlarged number of actions available to the smart selector, namely the combinations of the 16 transfer functions with the 5 binarization techniques.
$$RPD = 100 \cdot \frac{Best - Opt}{Opt} \tag{24}$$
We have also studied whether there are significant differences between the techniques used in this work. As the samples are independent and the data do not follow a normal distribution, we applied the Wilcoxon-Mann-Whitney test. The resulting p-values were analyzed at a significance level of 0.05. From this analysis, and for each set of cases, Table 5 presents the total average of the cases analyzed.
The table is composed as follows: the first column lists the technique under evaluation, and the remaining columns give its comparisons against the competitors. For example, the row for QBWOA reports its comparison against BSWOA, BCL, MIR, 40BQWOA and 40BSWOA; the latter two are the original 40-action versions proposed in [20]. If the p-value of a comparison is greater than the significance level of 0.05, it is reported as "≥0.05"; the comparison of a technique with itself is marked "-"; the remaining entries are the rounded p-values.
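This test is available in SciPy; a small sketch (our illustration, with placeholder samples standing in for the 31 runs per technique) is:

```python
from scipy.stats import mannwhitneyu

# placeholder fitness samples standing in for 31 independent runs per technique
qbwoa_results = [431, 433, 430, 435, 432]
bswoa_results = [430, 436, 434, 438, 431]

# one-sided test: is the first sample statistically smaller (better)?
stat, p_value = mannwhitneyu(qbwoa_results, bswoa_results, alternative="less")
significant = p_value < 0.05
```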

Analyses of Exploration and Exploitation

It is widely known that metaheuristics perform the search process in two important phases: exploration and exploitation. Therefore, it is of great interest to evaluate the behavior of metaheuristics in terms of exploration and exploitation.
Morales-Castañeda et al. [49] propose equations that yield a percentage of exploration (XPL%) and a percentage of exploitation (XPT%) based on the diversity of the population. These equations are as follows:
$$XPL\% = \frac{Div_t}{Div_{max}} \times 100, \qquad XPT\% = \frac{|Div_t - Div_{max}|}{Div_{max}} \times 100 \tag{25}$$
where D i v t corresponds to the current diversity in the t-th iteration and D i v m a x corresponds to the maximum diversity obtained in the entire search process.
The authors in [37,50,51] use the Dimensional-Hussain Diversity [52] to calculate the diversity of the population at each iteration. The Dimensional-Hussain Diversity is based on central measurements and is defined as:

$$D_{dh}(X) = \frac{1}{l \cdot n} \sum_{d=1}^{l} \sum_{i=1}^{n} \left| mean(x^d) - x_i^d \right| \tag{26}$$

where $D_{dh}(X)$ is the Dimensional-Hussain Diversity of population X, $mean(x^d)$ is the average of the d-th dimension, n is the number of search agents in the population X and l is the number of dimensions of the optimization problem.
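Both measurements are straightforward to compute; a minimal NumPy sketch (our own illustration) is:

```python
import numpy as np

def dimensional_hussain_diversity(X):
    # X: (n, l) population; mean absolute deviation from the per-dimension mean
    return np.mean(np.abs(X - X.mean(axis=0)))

def xpl_xpt(div_t, div_max):
    # exploration / exploitation percentages, Equation (25)
    xpl = 100.0 * div_t / div_max
    xpt = 100.0 * abs(div_t - div_max) / div_max
    return xpl, xpt
```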
Based on the above, and following these forms of measurement, we obtained the graphs presented in Figure 3, referred to as "exploration and exploitation graphs", where the X axis is the number of iterations and the Y axis is the percentage of exploration or exploitation, obtained by applying Equations (25) and (26).
These graphs allow us to better understand the results obtained. The algorithms in which we included a dynamic selection technique for binarization schemes achieved a better exploration-exploitation balance. The algorithms that used fixed binarization schemes behaved in an unbalanced way. On the one hand, the algorithm called MIR showed explorative behavior and was unable to find the promising regions of the search space. On the other hand, the algorithm called BCL showed exploitative behavior, leading it to fall into local optima prematurely.

7. Conclusions

In this work, we implemented QL and reimplemented SARSA as smart binarization scheme selectors in the whale optimization algorithm metaheuristic, a novel way of binarizing metaheuristics designed to work in continuous domains. We solved 45 instances of the set covering problem from the OR-Library. Compared with its closest version, BSWOA, QBWOA obtains better results.
On average, the statistical tests (Table 5) let us conclude that the QL, SARSA and BCL versions do not present significant statistical differences among themselves; however, when we compare these three with the MIR version (V4-Complement), we observe a different behavior: their solutions are statistically significantly better, so we conclude that the QL, SARSA and BCL (V4-Elitist) versions obtain statistically better results.
This also helps us answer the research question. Despite having more transfer functions, or in other words more actions, at first glance there seems to be an improvement; however, although the results and solutions obtained are competitive, the differences are not statistically significant. This implies that increasing the number of actions does not necessarily lead to better performance (at least for this research, i.e., for this number of actions and this problem), so it remains valid to explore other combinations of actions, to try the same actions on another problem, or to decrease the pool of actions.
When observing the exploration and exploitation percentages (Figure 3) for the whale metaheuristic, four distinct behaviors can be clearly observed. In the case of QBWOA, it explores only in the first iterations and gradually shifts to exploitation, which remains high for the rest of the iterations. In the case of BSWOA, the exploration percentage starts high and decreases until a break point, after which it exploits for the rest of the iterations. For the BCL version (V4-Elitist), a clear initial exploration phase is noticeable in the first iterations, while exploitation increases until it maintains high values for the rest of the iterations; this behavior, together with that of QBWOA and BSWOA, is the one recommended by [49]. Finally, behavior similar to random search is exhibited by the MIR version (V4-Complement), where the exploration percentage remains high during all iterations.
As for transfer functions, V-shaped and Z-shaped functions are suitable for problems requiring limited search (exploitation), while S-shaped and X-shaped functions are suitable for problems with large search spaces (exploration). As the problem size increases, discretization techniques with a better exploration-to-exploitation ratio outperform the others. We recommend the Standard, Elitist Roulette and Elitist binarization rules for solving all problems. Finally, selecting a binarization approach suitable for large problems is especially significant; for this reason, we recommend QBWOA as a possible test bench for future work, as it reduces tuning times by not having to evaluate all the combinations of binarization schemes present in the literature, facilitating the implementation of these techniques. As future lines of research, it would be interesting to consider other metaheuristics, including those that currently give the best results when solving the SCP, and to solve other problems.

Author Contributions

M.B.-R.: Project administration, Writing—original draft, Investigation, Formal analysis, Conceptualization, Methodology, Writing—review & editing. F.C.-C.: Writing—original draft, Investigation, Formal analysis, Conceptualization, Methodology, Writing—review & editing. B.C.: Validation, Resources, Writing—review & editing, Supervision, Funding acquisition. R.S.: Validation, Resources, Supervision, Funding acquisition. J.G.: Validation, Resources. G.A.: Investigation, Validation, Resources, Visualization. W.P.: Validation, Resources, Funding acquisition. All authors have read and agreed to the published version of the manuscript.

Funding

Broderick Crawford, Ricardo Soto, Gino Astorga and Wenceslao Palma are supported by Grant ANID/FONDECYT/REGULAR/1210810. Marcelo Becerra-Rozas is supported by the National Agency for Research and Development (ANID)/Scholarship Program/DOCTORADO NACIONAL/2021-21210740. Felipe Cisternas-Caneo is supported by Beca INF-PUCV.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The code used can be found in: https://github.com/imaberro/BSS-QL-80actions (accessed on 4 September 2022).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Guo, H.; Liu, B.; Cai, D.; Lu, T. Predicting protein–protein interaction sites using modified support vector machine. Int. J. Mach. Learn. Cybern. 2018, 9, 393–398. [Google Scholar] [CrossRef]
  2. Korkmaz, S.; Babalik, A.; Kiran, M.S. An artificial algae algorithm for solving binary optimization problems. Int. J. Mach. Learn. Cybern. 2018, 9, 1233–1247. [Google Scholar] [CrossRef]
  3. Penadés-Plà, V.; García-Segura, T.; Yepes, V. Robust design optimization for low-cost concrete box-girder bridge. Mathematics 2020, 8, 398. [Google Scholar] [CrossRef] [Green Version]
  4. Crawford, B.; Soto, R.; Lemus-Romani, J.; Astorga, G.; Matus de la Parra, S.; Peña-Fritz, A.; Valenzuela, M.; García, J.; de la Fuente-Mella, H.; Castro, C.; et al. Investigating the efficiency of swarm algorithms for bridge strengthening by conversion to tied-arch: A numerical case study on San Luis bridge. Iran. J. Sci. Technol. Trans. Civ. Eng. 2021, 45, 2345–2357. [Google Scholar] [CrossRef]
  5. Lemus-Romani, J.; Alonso, B.; Moura, J.L.; Crawford, B.; Soto, R.; González, F. Limited Stop Services Design Considering Variable Dwell Time and Operating Capacity Constraints. IEEE Access 2021, 9, 30359–30373. [Google Scholar] [CrossRef]
  6. Al-Madi, N.; Faris, H.; Mirjalili, S. Binary multi-verse optimization algorithm for global optimization and discrete problems. Int. J. Mach. Learn. Cybern. 2019, 10, 3445–3465. [Google Scholar] [CrossRef]
  7. Crawford, B.; Soto, R.; Astorga, G.; García, J.; Castro, C.; Paredes, F. Putting continuous metaheuristics to work in binary search spaces. Complexity 2017, 2017, 8404231. [Google Scholar] [CrossRef] [Green Version]
  8. Talbi, E.G. Combining metaheuristics with mathematical programming, constraint programming and machine learning. Ann. Oper. Res. 2016, 240, 171–215. [Google Scholar] [CrossRef]
  9. Talbi, E.G. Machine learning into metaheuristics: A survey and taxonomy. ACM Comput. Surv. (CSUR) 2021, 54, 1–32. [Google Scholar] [CrossRef]
  10. García, J.; Crawford, B.; Soto, R.; Castro, C.; Paredes, F. A k-means binarization framework applied to multidimensional knapsack problem. Appl. Intell. 2018, 48, 357–380. [Google Scholar] [CrossRef]
  11. García, J.; Moraga, P.; Valenzuela, M.; Crawford, B.; Soto, R.; Pinto, H.; Peña, A.; Altimiras, F.; Astorga, G. A Db-Scan Binarization Algorithm Applied to Matrix Covering Problems. Comput. Intell. Neurosci. 2019, 2019, 3238574. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  12. Caserta, M.; Voß, S. Metaheuristics: Intelligent problem solving. In Matheuristics; Springer: Berlin/Heidelberg, Germany, 2009; pp. 1–38. [Google Scholar]
  13. Schermer, D.; Moeini, M.; Wendt, O. A matheuristic for the vehicle routing problem with drones and its variants. Transp. Res. Part Emerg. Technol. 2019, 106, 166–204. [Google Scholar] [CrossRef]
  14. Roshani, M.; Phan, G.; Roshani, G.H.; Hanus, R.; Nazemi, B.; Corniani, E.; Nazemi, E. Combination of X-ray tube and GMDH neural network as a nondestructive and potential technique for measuring characteristics of gas-oil–water three phase flows. Measurement 2021, 168, 108427. [Google Scholar] [CrossRef]
  15. Roshani, S.; Jamshidi, M.B.; Mohebi, F.; Roshani, S. Design and modeling of a compact power divider with squared resonators using artificial intelligence. Wirel. Pers. Commun. 2021, 117, 2085–2096. [Google Scholar] [CrossRef]
  16. Nazemi, B.; Rafiean, M. Forecasting house prices in Iran using GMDH. Int. J. Hous. Mark. Anal. 2020. [Google Scholar] [CrossRef]
  17. Karimi-Mamaghan, M.; Mohammadi, M.; Meyer, P.; Karimi-Mamaghan, A.M.; Talbi, E.G. Machine Learning at the service of Meta-heuristics for solving Combinatorial Optimization Problems: A state-of-the-art. Eur. J. Oper. Res. 2022, 296, 393–422. [Google Scholar]
  18. Zhao, F.; Zhang, L.; Cao, J.; Tang, J. A cooperative water wave optimization algorithm with reinforcement learning for the distributed assembly no-idle flowshop scheduling problem. Comput. Ind. Eng. 2021, 153, 107082. [Google Scholar] [CrossRef]
  19. Mosadegh, H.; Ghomi, S.F.; Süer, G.A. Stochastic mixed-model assembly line sequencing problem: Mathematical modeling and Q-learning based simulated annealing hyper-heuristics. Eur. J. Oper. Res. 2020, 282, 530–544. [Google Scholar] [CrossRef]
  20. Becerra-Rozas, M.; Lemus-Romani, J.; Crawford, B.; Soto, R.; Cisternas-Caneo, F.; Embry, A.T.; Molina, M.A.; Tapia, D.; Castillo, M.; Misra, S.; et al. Reinforcement Learning Based Whale Optimizer. In Proceedings of the International Conference on Computational Science and Its Applications, Cagliari, Italy, 13–16 September 2021; pp. 205–219. [Google Scholar]
  21. Becerra-Rozas, M.; Lemus-Romani, J.; Crawford, B.; Soto, R.; Cisternas-Caneo, F.; Embry, A.T.; Molina, M.A.; Tapia, D.; Castillo, M.; Rubio, J.M. A New Learnheuristic: Binary SARSA-Sine Cosine Algorithm (BS-SCA). In Proceedings of the International Conference on Metaheuristics and Nature Inspired Computing, Marrakech, Morocco, 27–30 October 2021; pp. 127–136. [Google Scholar]
  22. Cisternas-Caneo, F.; Crawford, B.; Soto, R.; de la Fuente-Mella, H.; Tapia, D.; Lemus-Romani, J.; Castillo, M.; Becerra-Rozas, M.; Paredes, F.; Misra, S. A Data-Driven Dynamic Discretization Framework to Solve Combinatorial Problems Using Continuous Metaheuristics. In Innovations in Bio-Inspired Computing and Applications; Abraham, A., Sasaki, H., Rios, R., Gandhi, N., Singh, U., Ma, K., Eds.; Springer International Publishing: Cham, Switzerland, 2021; pp. 76–85. [Google Scholar] [CrossRef]
  23. Garey, M.R.; Johnson, D.S. Computers and Intractability; Freeman: San Francisco, CA, USA, 1979; Volume 174. [Google Scholar]
  24. Vianna, S.S. The set covering problem applied to optimisation of gas detectors in chemical process plants. Comput. Chem. Eng. 2019, 121, 388–395. [Google Scholar] [CrossRef]
  25. Zhang, L.; Shaffer, B.; Brown, T.; Samuelsen, G.S. The optimization of DC fast charging deployment in California. Appl. Energy 2015, 157, 111–122. [Google Scholar] [CrossRef]
  26. Park, Y.; Nielsen, P.; Moon, I. Unmanned aerial vehicle set covering problem considering fixed-radius coverage constraint. Comput. Oper. Res. 2020, 119, 104936. [Google Scholar] [CrossRef]
  27. Alizadeh, R.; Nishi, T. Hybrid set covering and dynamic modular covering location problem: Application to an emergency humanitarian logistics problem. Appl. Sci. 2020, 10, 7110. [Google Scholar] [CrossRef]
  28. Vieira, B.S.; Ferrari, T.; Ribeiro, G.M.; Bahiense, L.; Orrico Filho, R.D.; Abramides, C.A.; Júnior, N.F.R.C. A progressive hybrid set covering based algorithm for the traffic counting location problem. Expert Syst. Appl. 2020, 160, 113641. [Google Scholar] [CrossRef]
  29. Watkins, C.J.; Dayan, P. Q-learning. Mach. Learn. 1992, 8, 279–292. [Google Scholar] [CrossRef]
  30. Lazaric, A.; Restelli, M.; Bonarini, A. Reinforcement learning in continuous action spaces through sequential monte carlo methods. In Proceedings of the 20th International Conference on Neural Information Processing Systems 2007, Vancouver, BC, Canada, 3–6 December 2007; Volume 20. [Google Scholar]
  31. Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction; MIT Press: Cambridge, MA, USA, 2018. [Google Scholar]
  32. Mirjalili, S.; Lewis, A. The whale optimization algorithm. Adv. Eng. Softw. 2016, 95, 51–67. [Google Scholar] [CrossRef]
  33. Dewi, S.K.; Utama, D.M. A new hybrid whale optimization algorithm for green vehicle routing problem. Syst. Sci. Control. Eng. 2021, 9, 61–72. [Google Scholar] [CrossRef]
  34. Utama, D.M.; Widodo, D.S.; Ibrahim, M.F.; Hidayat, K.; Baroto, T.; Yurifah, A. The hybrid whale optimization algorithm: A new metaheuristic algorithm for energy-efficient on flow shop with dependent sequence setup. J. Phys. Conf. Ser. IOP Publ. 2020, 1569, 022094. [Google Scholar] [CrossRef]
  35. Utama, D. Minimizing Number of Tardy Jobs in Flow Shop Scheduling Using A Hybrid Whale Optimization Algorithm. J. Phys. Conf. Ser. IOP Publ. 2021, 1845, 012017. [Google Scholar] [CrossRef]
  36. Lanza-Gutierrez, J.M.; Caballe, N.; Crawford, B.; Soto, R.; Gomez-Pulido, J.A.; Paredes, F. Exploring further advantages in an alternative formulation for the set covering problem. Math. Probl. Eng. 2020, 2020, 5473501. [Google Scholar] [CrossRef]
  37. Crawford, B.; Soto, R.; Lemus-Romani, J.; Becerra-Rozas, M.; Lanza-Gutiérrez, J.M.; Caballé, N.; Castillo, M.; Tapia, D.; Cisternas-Caneo, F.; García, J.; et al. Q-learnheuristics: Towards data-driven balanced metaheuristics. Mathematics 2021, 9, 1839. [Google Scholar] [CrossRef]
  38. Crawford, B.; Soto, R.; Olivares-Suarez, M.; Palma, W.; Paredes, F.; Olguin, E.; Norero, E. A binary coded firefly algorithm that solves the set covering problem. Rom. J. Inf. Sci. Technol. 2014, 17, 252–264. [Google Scholar]
  39. Mirjalili, S.; Lewis, A. S-shaped versus V-shaped transfer functions for binary particle swarm optimization. Swarm Evol. Comput. 2013, 9, 1–14. [Google Scholar] [CrossRef]
  40. Geem, Z.W.; Kim, J.H.; Loganathan, G.V. A new heuristic optimization algorithm: Harmony search. Simulation 2001, 76, 60–68. [Google Scholar] [CrossRef]
  41. Rajalakshmi, N.; Padma Subramanian, D.; Thamizhavel, K. Performance enhancement of radial distributed system with distributed generators by reconfiguration using binary firefly algorithm. J. Inst. Eng. (India) Ser. B 2015, 96, 91–99. [Google Scholar] [CrossRef]
  42. Ghosh, K.K.; Singh, P.K.; Hong, J.; Geem, Z.W.; Sarkar, R. Binary social mimic optimization algorithm with x-shaped transfer function for feature selection. IEEE Access 2020, 8, 97890–97906. [Google Scholar] [CrossRef]
  43. Beheshti, Z. A novel x-shaped binary particle swarm optimization. Soft Comput. 2021, 25, 3013–3042. [Google Scholar] [CrossRef]
  44. Guo, S.s.; Wang, J.s.; Guo, M.w. Z-shaped transfer functions for binary particle swarm optimization algorithm. Comput. Intell. Neurosci. 2020, 2020, 6502807. [Google Scholar] [CrossRef]
  45. Sun, W.Z.; Zhang, M.; Wang, J.S.; Guo, S.S.; Wang, M.; Hao, W.K. Binary Particle Swarm Optimization Algorithm Based on Z-shaped Probability Transfer Function to Solve 0-1 Knapsack Problem. IAENG Int. J. Comput. Sci. 2021, 48, 294–303. [Google Scholar]
  46. Lanza-Gutierrez, J.M.; Crawford, B.; Soto, R.; Berrios, N.; Gomez-Pulido, J.A.; Paredes, F. Analyzing the effects of binarization techniques when solving the set covering problem through swarm optimization. Expert Syst. Appl. 2017, 70, 67–82. [Google Scholar] [CrossRef]
  47. Beasley, J.; Jörnsten, K. Enhancing an algorithm for set covering problems. Eur. J. Oper. Res. 1992, 58, 293–300. [Google Scholar] [CrossRef]
  48. Bisong, E. Google Colaboratory. In Building Machine Learning and Deep Learning Models on Google Cloud Platform: A Comprehensive Guide for Beginners; Apress: Berkeley, CA, USA, 2019; pp. 59–64. [Google Scholar] [CrossRef]
  49. Morales-Castañeda, B.; Zaldivar, D.; Cuevas, E.; Fausto, F.; Rodríguez, A. A better balance in metaheuristic algorithms: Does it exist? Swarm Evol. Comput. 2020, 54, 100671. [Google Scholar] [CrossRef]
  50. Crawford, B.; Soto, R.; Cisternas-Caneo, F.; Tapia, D.; de la Fuente-Mella, H.; Palma, W.; Lemus-Romani, J.; Castillo, M.; Becerra-Rozas, M. A Comparison of Learnheuristics Using Different Reward Functions to Solve the Set Covering Problem. In Optimization and Learning; Dorronsoro, B., Amodeo, L., Pavone, M., Ruiz, P., Eds.; Springer International Publishing: Cham, Switzerland, 2021; pp. 74–85. [Google Scholar] [CrossRef]
  51. Lemus-Romani, J.; Becerra-Rozas, M.; Crawford, B.; Soto, R.; Cisternas-Caneo, F.; Vega, E.; Castillo, M.; Tapia, D.; Astorga, G.; Palma, W.; et al. A Novel Learning-Based Binarization Scheme Selector for Swarm Algorithms Solving Combinatorial Problems. Mathematics 2021, 9, 2887. [Google Scholar] [CrossRef]
  52. Hussain, K.; Zhu, W.; Salleh, M.N.M. Long-term memory Harris’ hawk optimization for high dimensional and optimal power flow problems. IEEE Access 2019, 7, 147596–147616. [Google Scholar] [CrossRef]
Figure 1. An example of SCP.
Figure 2. Solution to the practical example of SCP.
Figure 3. Percentage of exploration and exploitation of the algorithms solving instance 41.
Table 3. Parameter settings.

| Parameter | Value |
| --- | --- |
| Independent runs | 31 |
| Population size | 40 |
| Number of iterations | 1000 |
| Parameter a of WOA | decreases linearly from 2 to 0 |
| Parameter b of WOA | 1 |
| Parameter α of Q-Learning and SARSA | 0.1 |
| Parameter γ of Q-Learning and SARSA | 0.4 |
Table 4. Comparison of the WOA metaheuristics.

| Inst. | Opt | QBWOA Best | QBWOA Avg | QBWOA RPD | BSWOA Best | BSWOA Avg | BSWOA RPD | BCL Best | BCL Avg | BCL RPD | MIR Best | MIR Avg | MIR RPD | 40BQWOA Best | 40BQWOA Avg | 40BQWOA RPD | 40BSWOA Best | 40BSWOA Avg | 40BSWOA RPD |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 41 | 429 | 431.0 | 781.87 | 0.47 | 430.0 | 610.0 | 0.23 | 489.0 | 607.55 | 13.99 | 638.0 | 715.87 | 48.72 | 435 | 439.48 | 1.4 | 430 | 434.6 | 0.23 |
| 42 | 512 | 522.0 | 839.19 | 1.95 | 519.0 | 872.87 | 1.37 | 679.0 | 852.94 | 32.62 | 1079.0 | 1181.94 | 110.74 | 538 | 546.44 | 5.08 | 523 | 535.6 | 2.15 |
| 43 | 516 | 519.0 | 912.03 | 0.58 | 520.0 | 979.13 | 0.78 | 739.0 | 884.48 | 43.22 | 1222.0 | 1293.61 | 136.82 | 537 | 543.78 | 4.07 | 525 | 532.7 | 1.74 |
| 44 | 494 | 500.0 | 789.13 | 1.21 | 503.0 | 776.26 | 1.82 | 596.0 | 779.06 | 20.65 | 954.0 | 1052.03 | 93.12 | 519 | 526.33 | 5.06 | 497 | 508.0 | 0.61 |
| 45 | 512 | 517.0 | 913.68 | 0.98 | 520.0 | 1032.16 | 1.56 | 755.0 | 874.68 | 47.46 | 1067.0 | 1204.13 | 108.4 | 537 | 541.89 | 4.88 | 523 | 528.4 | 2.15 |
| 46 | 560 | 565.0 | 1007.32 | 0.89 | 564.0 | 1028.68 | 0.71 | 815.0 | 981.26 | 45.54 | 1389.0 | 1483.13 | 148.04 | 573 | 580.33 | 2.32 | 567 | 570.3 | 1.25 |
| 47 | 430 | 433.0 | 710.55 | 0.7 | 434.0 | 645.0 | 0.93 | 570.0 | 682.42 | 32.56 | 854.0 | 931.84 | 98.6 | 440 | 445.29 | 2.33 | 435 | 439.5 | 1.16 |
| 48 | 492 | 496.0 | 950.55 | 0.81 | 497.0 | 836.39 | 1.02 | 713.0 | 875.42 | 44.92 | 1148.0 | 1236.16 | 133.33 | 505 | 507.83 | 2.64 | 496 | 499.6 | 0.81 |
| 49 | 641 | 659.0 | 1297.0 | 2.81 | 664.0 | 1422.0 | 3.59 | 922.0 | 1141.55 | 43.84 | 1532.0 | 1706.1 | 139.0 | 686 | 690.8 | 7.02 | 671 | 677.6 | 4.68 |
| 410 | 514 | 515.0 | 825.61 | 0.19 | 517.0 | 812.45 | 0.58 | 696.0 | 847.0 | 35.41 | 1046.0 | 1180.97 | 103.5 | 530 | 532.4 | 3.11 | 521 | 524.1 | 1.36 |
| 51 | 253 | 259.0 | 418.9 | 2.37 | 259.0 | 473.1 | 2.37 | 347.0 | 411.1 | 37.15 | 543.0 | 614.74 | 114.62 | 262 | 267.71 | 3.56 | 258 | 262.7 | 1.98 |
| 52 | 302 | 314.0 | 617.1 | 3.97 | 316.0 | 628.74 | 4.64 | 468.0 | 577.35 | 54.97 | 762.0 | 913.61 | 152.32 | 326 | 332.17 | 7.95 | 319 | 325.3 | 5.63 |
| 53 | 226 | 228.0 | 410.39 | 0.88 | 229.0 | 325.48 | 1.33 | 313.0 | 377.84 | 38.5 | 506.0 | 558.06 | 123.89 | 232 | 233.5 | 2.65 | 230 | 230.8 | 1.77 |
| 54 | 242 | 246.0 | 503.58 | 1.65 | 247.0 | 405.26 | 2.07 | 318.0 | 394.97 | 31.4 | 540.0 | 580.19 | 123.14 | 250 | 252.5 | 3.31 | 247 | 249.4 | 2.07 |
| 55 | 211 | 212.0 | 356.87 | 0.47 | 213.0 | 355.42 | 0.95 | 294.0 | 339.65 | 39.34 | 397.0 | 427.84 | 88.15 | 216 | 218.83 | 2.37 | 212 | 214.5 | 0.47 |
| 56 | 213 | 213.0 | 369.68 | 0.0 | 214.0 | 335.81 | 0.47 | 334.0 | 389.71 | 56.81 | 507.0 | 544.58 | 138.03 | 227 | 229.0 | 6.57 | 218 | 221.3 | 2.35 |
| 57 | 293 | 297.0 | 538.74 | 1.37 | 298.0 | 548.81 | 1.71 | 387.0 | 504.06 | 32.08 | 642.0 | 710.52 | 119.11 | 311 | 313.2 | 6.14 | 302 | 304.6 | 3.07 |
| 58 | 288 | 290.0 | 534.9 | 0.69 | 290.0 | 395.81 | 0.69 | 399.0 | 497.35 | 38.54 | 663.0 | 745.32 | 130.21 | 298 | 299.33 | 3.47 | 291 | 293.9 | 1.04 |
| 59 | 279 | 282.0 | 602.74 | 1.08 | 281.0 | 562.42 | 0.72 | 404.0 | 497.03 | 44.8 | 673.0 | 740.35 | 141.22 | 284 | 287.4 | 1.79 | 282 | 284.3 | 1.08 |
| 510 | 265 | 267.0 | 461.35 | 0.75 | 266.0 | 402.58 | 0.38 | 390.0 | 470.13 | 47.17 | 594.0 | 669.58 | 124.15 | 277 | 278.33 | 4.53 | 267 | 273.1 | 0.75 |
| 61 | 138 | 140.0 | 480.94 | 1.45 | 140.0 | 654.55 | 1.45 | 283.0 | 403.26 | 105.07 | 736.0 | 836.52 | 433.33 | 144 | 146.68 | 4.35 | 141 | 144.1 | 2.17 |
| 62 | 146 | 146.0 | 603.32 | 0.0 | 146.0 | 619.52 | 0.0 | 320.0 | 561.77 | 119.18 | 1100.0 | 1211.81 | 653.42 | 154 | 155.83 | 5.48 | 147 | 152.3 | 0.68 |
| 63 | 145 | 145.0 | 479.03 | 0.0 | 147.0 | 696.0 | 1.38 | 337.0 | 537.19 | 132.41 | 912.0 | 1125.97 | 528.97 | 149 | 150.4 | 2.76 | 147 | 148.4 | 1.38 |
| 64 | 131 | 132.0 | 308.0 | 0.76 | 131.0 | 376.81 | 0.0 | 246.0 | 366.19 | 87.79 | 652.0 | 722.42 | 397.71 | 132 | 134.17 | 0.76 | 131 | 133.1 | 0.0 |
| 65 | 161 | 162.0 | 842.87 | 0.62 | 162.0 | 652.77 | 0.62 | 357.0 | 531.74 | 121.74 | 1020.0 | 1155.81 | 533.54 | 180 | 181.5 | 11.8 | 163 | 172.2 | 1.24 |
| a1 | 253 | 260.0 | 785.84 | 2.77 | 260.0 | 800.94 | 2.77 | 455.0 | 662.71 | 79.84 | 1243.0 | 1352.03 | 391.3 | 263 | 266.84 | 3.95 | 260 | 263.2 | 2.77 |
| a2 | 252 | 259.0 | 886.87 | 2.78 | 263.0 | 910.58 | 4.37 | 452.0 | 651.16 | 79.37 | 1150.0 | 1241.0 | 356.35 | 266 | 269.83 | 5.56 | 261 | 264.0 | 3.57 |
| a3 | 232 | 239.0 | 608.71 | 3.02 | 239.0 | 582.29 | 3.02 | 436.0 | 601.58 | 87.93 | 1066.0 | 1185.84 | 359.48 | 244 | 245.6 | 5.17 | 240 | 243.4 | 3.45 |
| a4 | 234 | 237.0 | 794.77 | 1.28 | 237.0 | 672.42 | 1.28 | 467.0 | 595.97 | 99.57 | 1080.0 | 1161.45 | 361.54 | 251 | 251.8 | 7.26 | 238 | 242.5 | 1.71 |
| a5 | 236 | 241.0 | 800.0 | 2.12 | 240.0 | 702.77 | 1.69 | 447.0 | 618.65 | 89.41 | 1139.0 | 1191.23 | 382.63 | 242 | 247.33 | 2.54 | 241 | 244.2 | 2.12 |
| b1 | 69 | 69.0 | 1085.1 | 0.0 | 69.0 | 890.42 | 0.0 | 309.0 | 561.94 | 347.83 | 1344.0 | 1441.68 | 1847.83 | 70 | 71.68 | 1.45 | 69 | 70.5 | 0.0 |
| b2 | 76 | 76.0 | 630.23 | 0.0 | 76.0 | 548.48 | 0.0 | 337.0 | 547.39 | 343.42 | 1265.0 | 1426.32 | 1564.47 | 78 | 79.5 | 2.63 | 76 | 77.2 | 0.0 |
| b3 | 80 | 80.0 | 912.77 | 0.0 | 80.0 | 648.65 | 0.0 | 378.0 | 689.84 | 372.5 | 1737.0 | 1848.74 | 2071.25 | 82 | 82.17 | 2.5 | 81 | 81.7 | 1.25 |
| b4 | 79 | 79.0 | 978.26 | 0.0 | 79.0 | 1268.35 | 0.0 | 334.0 | 631.39 | 322.78 | 1514.0 | 1644.03 | 1816.46 | 83 | 83.83 | 5.06 | 79 | 81.3 | 0.0 |
| b5 | 72 | 72.0 | 785.74 | 0.0 | 72.0 | 959.45 | 0.0 | 299.0 | 541.81 | 315.28 | 1372.0 | 1467.52 | 1805.56 | 73 | 74.33 | 1.39 | 72 | 72.9 | 0.0 |
| c1 | 227 | 234.0 | 812.35 | 3.08 | 233.0 | 794.55 | 2.64 | 523.0 | 717.65 | 130.4 | 1488.0 | 1610.13 | 555.51 | 243 | 247.81 | 7.05 | 234 | 238.4 | 3.08 |
| c2 | 219 | 224.0 | 1061.39 | 2.28 | 226.0 | 1099.06 | 3.2 | 474.0 | 737.55 | 116.44 | 1654.0 | 1779.23 | 655.25 | 234 | 238.83 | 6.85 | 229 | 232.3 | 4.57 |
| c3 | 243 | 252.0 | 1560.45 | 3.7 | 248.0 | 1298.35 | 2.06 | 629.0 | 919.1 | 158.85 | 1970.0 | 2126.94 | 710.7 | 258 | 260.83 | 6.17 | 249 | 254.2 | 2.47 |
| c4 | 219 | 226.0 | 1105.35 | 3.2 | 225.0 | 1148.52 | 2.74 | 570.0 | 769.13 | 160.27 | 1582.0 | 1734.13 | 622.37 | 232 | 233.83 | 5.94 | 227 | 229.0 | 3.65 |
| c5 | 215 | 220.0 | 894.19 | 2.33 | 221.0 | 918.71 | 2.79 | 496.0 | 754.65 | 130.7 | 1541.0 | 1686.84 | 616.74 | 229 | 231.33 | 6.51 | 221 | 225.4 | 2.79 |
| d1 | 60 | 60.0 | 918.32 | 0.0 | 61.0 | 951.23 | 1.67 | 419.0 | 801.39 | 598.33 | 1950.0 | 2080.39 | 3150.0 | 63 | 64.97 | 5.0 | 61 | 62.3 | 1.67 |
| d2 | 66 | 66.0 | 1271.84 | 0.0 | 67.0 | 799.77 | 1.52 | 480.0 | 806.1 | 627.27 | 2261.0 | 2333.68 | 3325.76 | 68 | 69.0 | 3.03 | 67 | 67.4 | 1.52 |
| d3 | 72 | 73.0 | 647.81 | 1.39 | 72.0 | 1528.39 | 0.0 | 506.0 | 880.23 | 602.78 | 2445.0 | 2581.48 | 3295.83 | 76 | 77.33 | 5.56 | 73 | 74.8 | 1.39 |
| d4 | 62 | 62.0 | 1464.52 | 0.0 | 62.0 | 1201.81 | 0.0 | 312.0 | 755.48 | 403.23 | 2025.0 | 2107.97 | 3166.13 | 62 | 63.4 | 0.0 | 62 | 62.2 | 0.0 |
| d5 | 61 | 61.0 | 750.81 | 0.0 | 61.0 | 1044.03 | 0.0 | 344.0 | 792.87 | 463.93 | 1930.0 | 2093.29 | 3063.93 | 63 | 64.33 | 3.28 | 61 | 62.6 | 0.0 |
| Average | - | 257.33 | 784.68 | 1.21 | 257.73 | 782.6 | 1.36 | 463.07 | 653.83 | 152.83 | 1176.27 | 1280.82 | 778.69 | 264.93 | 267.99 | 4.27 | 258.76 | 262.44 | 1.73 |
Table 5. Averages of the statistical test comparisons.

|  | QBWOA | BSWOA | BCL | MIR | 40BQWOA | 40BSWOA |
| --- | --- | --- | --- | --- | --- | --- |
| QBWOA | - | ≥0.05 | ≥0.05 | 0.00 | ≥0.05 | ≥0.05 |
| BSWOA | ≥0.05 | - | ≥0.05 | 0.00 | ≥0.05 | ≥0.05 |
| BCL | ≥0.05 | ≥0.05 | - | 0.00 | ≥0.05 | ≥0.05 |
| MIR | ≥0.05 | ≥0.05 | ≥0.05 | - | ≥0.05 | ≥0.05 |
| 40BQWOA | ≥0.05 | ≥0.05 | ≥0.05 | 0.01 | - | ≥0.05 |
| 40BSWOA | ≥0.05 | ≥0.05 | ≥0.05 | 0.00 | ≥0.05 | - |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
