A Reinforcement Learning-Based Bi-Population Nutcracker Optimizer for Global Optimization

Li, Yu; Zhang, Yan

doi:10.3390/biomimetics9100596

Open AccessArticle

A Reinforcement Learning-Based Bi-Population Nutcracker Optimizer for Global Optimization

by

Yu Li

¹ and

Yan Zhang

^2,*

¹

School of Aeronautics and Astronautics, Shenzhen Campus of Sun Yat-sen University, Shenzhen 518000, China

²

School of Electronics and Communication Engineering, Shenzhen Campus of Sun Yat-sen University, Shenzhen 518000, China

^*

Author to whom correspondence should be addressed.

Biomimetics 2024, 9(10), 596; https://doi.org/10.3390/biomimetics9100596

Submission received: 23 August 2024 / Revised: 28 September 2024 / Accepted: 29 September 2024 / Published: 1 October 2024

Download

Browse Figures

Versions Notes

Abstract

The nutcracker optimizer algorithm (NOA) is a metaheuristic method proposed in recent years. This algorithm simulates the behavior of nutcrackers searching and storing food in nature to solve the optimization problem. However, the traditional NOA struggles to balance global exploration and local exploitation effectively, making it prone to getting trapped in local optima when solving complex problems. To address these shortcomings, this study proposes a reinforcement learning-based bi-population nutcracker optimizer algorithm called RLNOA. In the RLNOA, a bi-population mechanism is introduced to better balance global and local optimization capabilities. At the beginning of each iteration, the raw population is divided into an exploration sub-population and an exploitation sub-population based on the fitness value of each individual. The exploration sub-population is composed of individuals with poor fitness values. An improved foraging strategy based on random opposition-based learning is designed as the update method for the exploration sub-population to enhance diversity. Meanwhile, Q-learning serves as an adaptive selector for exploitation strategies, enabling optimal adjustment of the exploitation sub-population’s behavior across various problems. The performance of the RLNOA is evaluated using the CEC-2014, CEC-2017, and CEC-2020 benchmark function sets, and it is compared against nine state-of-the-art metaheuristic algorithms. Experimental results demonstrate the superior performance of the proposed algorithm.

Keywords:

nutcracker optimizer algorithm; reinforcement learning; bi-population; optimization

Graphical Abstract

1. Introduction

Metaheuristic algorithms, as a class of optimization techniques, are specifically engineered to address complex optimization problems that are challenging or infeasible to solve using traditional methods [1,2]. These algorithms are inspired by natural phenomena, biological processes, physical systems, or social behaviors, offering flexible frameworks for identifying solutions within a reasonable time frame [3,4]. Their advantages, such as simple structure, ease of implementation, and robustness to initial values, have led to their widespread application across various fields, such as power system optimization [5,6], industrial design [7,8], path planning [9,10], and parameter optimization [11,12].

Metaheuristic algorithms rely on two fundamental concepts: exploration and exploitation [13]. These concepts are essential for effectively navigating the search space to identify optimal or near-optimal solutions for complex optimization problems [14]. Exploration involves the algorithm’s capability to explore the broader search space and uncover new regions that may harbor promising solutions [15]. Exploration aims to avoid local optima and ensure a broad investigation of various areas within the search space. In contrast, exploitation involves focusing the search on specific regions that have previously shown promise, with the goal of refining solutions and converging toward the optimal solution by thoroughly searching near high-quality solutions [16]. A significant challenge in the design of metaheuristic algorithms is achieving an appropriate balance between exploration and exploitation [17].

In recent years, numerous metaheuristic algorithms have been proposed, including the grey wolf optimizer (GWO) [18], snake optimizer (SO) [19], white shark optimizer (WSO) [20,21], reptile search algorithm (RSA) [22], crested porcupine optimizer (CPO) [23], and nutcracker optimizer algorithm (NOA) [24]. Among these, the NOA mimics the search, caching, and recovery behaviors of nutcrackers, incorporating two exploration strategies and two exploitation strategies that enhance its fast convergence and robust search capabilities. However, in NOA, the transition between search strategies is governed by random numbers. When applied to complex problems, the NOA encounters limitations, such as an inadequate balance between exploration and exploitation and a propensity to become trapped in local optima.

Several techniques have been adopted to improve the performance of metaheuristic algorithms. The local search method focuses on exploring the neighborhood of a solution to find improvements, enabling the metaheuristic algorithms to escape local optima and continue the search for a global optimum. The authors of ref. [25] proposed a novel local search strategy to improve the particle swarm optimization (PSO) algorithm. After optimizing the population in each iteration, a local search strategy is introduced to enhance the present individuals in the population to accelerate the searching process and prevent becoming trapped in local optima. To improve the population diversity and convergence ability, ref. [26] proposed a variant of GWO with the fusion of a stochastic local search technique, evolutionary operators, and a memory mechanism. The stochastic local search can check the neighborhood of each individual to promote GWO’s exploitation performance. The authors of ref. [27] presented a local search and chaos mapping-based binary group teaching optimization algorithm called BGTOALC. Local search was introduced to increase exploitation. The authors of ref. [28] proposed an oppositional chaotic local search strategy to improve the aquila optimizer. Local search techniques play a critical role in refining solutions within metaheuristic algorithms. However, their embedding may cause the optimizer to perform more exploitation operations during the iterative process. This could exacerbate the imbalance between exploitation and exploration in metaheuristic algorithms.

An elite mechanism is a technique used to preserve the best-performing individuals across iterations for metaheuristic algorithms. The authors of ref. [29] proposed an elite symbiotic organism search algorithm called Elite-SOS. The global convergence ability was enhanced by using the evolutionary information of elite individuals. The authors of ref. [30] built an elite gene pool to guide the reproduction operator and acquire superior offspring. To improve the optimization performance of PSO, ref. [31] built three types of elite archives to save elite individuals with different ranks. Elite individuals could be retained directly during the iteration process, which can make full use of the whole population’s information. The authors of ref. [32] introduced an elite-guided hierarchical mutation strategy to improve the performance of the differential evolution (DE) algorithm. Elite individuals were scheduled for a local search, and the remaining individuals performed a global search guided by the former. The elite mechanism speeds up convergence by ensuring the information of the best solutions persist across generations. However, by focusing on the best solutions, the algorithm might overly emphasize exploitation at the cost of exploration. This imbalance can result in the algorithm getting trapped in local optima.

Incorporating supervised learning into metaheuristic algorithms is an emerging area of research that uses training knowledge to assist in the acquisition of optimal solutions in the iterative process. The authors of ref. [33] proposed a kernelized autoencoder that can learn from past search experiences to speed up the optimization process. The authors of ref. [34] presented autoencoding to predict the moving of the optimal solutions. To solve the problems of parameter setting and strategy selection, ref. [35] proposed an adaptive distributed DE algorithm. The individual and population parameters were updated adaptively based on the best solutions and historically successful experience. The authors of ref. [36] introduced a learning-aided evolutionary optimization framework that learns knowledge from the historical optimization process by using artificial neural networks. The learned knowledge can help metaheuristic algorithms to better approach the global optimum. While supervised learning can guide the search process more effectively, the training phase requires additional computational resources and is not suitable for time-constrained problems. In addition, the generalization of supervised learning also limits the application scenarios of this kind of strategy.

Reinforcement learning (RL) is a subfield of machine learning in which an agent learns to make decisions by taking actions within an environment to maximize cumulative rewards [37,38]. Due to its strong environmental interaction capabilities, RL has been increasingly employed by researchers to guide the selection of search strategies in metaheuristic algorithms. The authors of ref. [39] introduced an inverse reinforcement learning-based moth-flame optimization algorithm, IRLMFO, to solve large-scale optimization problems. RL was utilized to select effective search strategies based on historical data from the strategy pool established by IRLMFO. To overcome the drawbacks of getting trapped in local optima easily, ref. [40] presented a reinforcement learning-based RSA known as RLNSA, where RL managed the switching between exploration and exploitation strategies. Additionally, refs. [41,42] applied RL to address mutation strategy selection within the evolutionary process of differential evolution algorithms. The authors of ref. [43] embedded RL in the teaching–learning-based optimization algorithm (RLTLBO) to solve optimization problems. The authors of ref. [44] proposed a reinforcement learning-based memetic particle swarm optimization algorithm called RLMPSO. The selection of five search operations is controlled by the RL algorithm. The authors of ref. [45] designed a reinforcement learning-based comprehensive learning grey wolf optimizer (RLCGWO) to adaptively adjust strategies. Although the introduction of RL can enable metaheuristic algorithms to adaptively select exploration and exploitation strategies, it does not always effectively enhance algorithm performance. Typically, exploration is enhanced through methods such as large step sizes, random perturbations, or probabilistic jumps, which enable the algorithm to search beyond the current solutions [46]. Consequently, in RL-based metaheuristic algorithms, exploration strategies often receive rewards mainly during the early optimization stages, leading RL to favor exploitation strategies as optimization progresses. This tendency can cause existing RL-based metaheuristic algorithms to struggle in escaping local optima, as exploration strategies are less frequently selected.

To overcome the aforementioned problems, this paper introduces an RL-based bi-population NOA called RLNOA. The RLNOA introduces a bi-population mechanism to better balance exploration and exploitation in the optimization process. At the beginning of each iteration, the population is divided into the exploration sub-populations and the exploitation sub-populations. Individuals with poor fitness in the raw population form the exploration sub-population. A random opposition-based learning (ROBL)-based foraging method is proposed as the update strategy for the exploration sub-population to avoid local optima. The remaining outstanding individuals of the raw population formed the exploitation sub-populations, which use Q-learning within RL to adaptively select between the NOA’s two exploitation strategies (storage and recovery) to accelerate convergence and improve generalization. The division of these sub-populations is based on fitness ranking and optimization progress. Experimental results show that the RLNOA achieves superior optimization performance compared to current state-of-the-art algorithms. The primary contributions of this paper are as follows:

An RL-based bi-population nutcracker optimizer algorithm (RLNOA) is developed to solve complex optimization problems;
The foraging strategy of the NOA is enhanced using ROBL, improving its ability to search for feasible solutions;
Q-learning is utilized to control the selection of the most appropriate exploitation strategy for each iteration, dynamically improving the refinement of the optimal solution.

The remainder of this paper is organized as follows. Section 2 describes the NOA and RL methods. Section 3 explains the detailed implementation of the proposed RLNOA. The comparison experiments are finished in Section 4. Section 5 summarizes the conclusions.

2. Preliminaries

2.1. Nutcracker Optimization Algorithm

The NOA is a metaheuristic algorithm inspired by the natural behavior of nutcrackers [24]. It solves optimization problems by simulating the nutcracker’s behavior in collecting, storing, and searching for food. The optimization process in the NOA is carried out through four strategies: foraging, storage, cache search, and recovery. Table A1 summarizes the nomenclature of this study.

2.1.1. Foraging and Storage Strategies

During the foraging phase, individuals start searching for potential food sources within the search space. This behavior is mathematically modeled as follows:

x_{i, j}^{t + 1 (F S n e w_{1})} = \{\begin{cases} x_{i, j}^{t}, i f r a n d_{1} < r a n d_{2} \\ {x_{i, j}^{t}}^{'}, o t h e r w i s e \end{cases}

(1)

{x_{i, j}^{t}}^{'} = \{\begin{cases} x_{m, j}^{t} + ε \cdot (x_{A, j}^{t} - x_{B, j}^{t}) + μ \cdot (r_{1}^{2} \cdot U_{j} - L_{j}) & , i f t < T_{\max} / 2 \\ x_{C, j}^{t} + μ \cdot (x_{A, j}^{t} - x_{B, j}^{t}) + μ \cdot (r_{2} < δ) \cdot (r_{3}^{2} \cdot U_{j} - L_{j}) & , o t h e r w i s e \end{cases}

(2)

where

x_{i, j}^{t + 1 (F S n e w_{1})}

is the new position of the ith individual generated in the foraging phase;

x_{i, j}^{t}

is the jth dimension of the ith individual in the iteration

t

;

x_{m, j}^{t}

is the mean position of the jth dimensions for the current population in the iteration

t

;

T_{\max}

indicates the maximum generations;

L_{j}

and

U_{j}

are the lower and upper bounds of the optimization problem in the jth dimension; A, B, and C are three different integers randomly selected in the range of [0, NP]; NP is the population size;

ε

is a parameter generated by the levy flight; the values of

r a n d_{1}

,

r a n d_{2}

,

r_{1}

,

r_{2}

, and

r_{3}

are random numbers selected within the range [0, 1];

δ

is a control parameter; and

μ

is a parameter chosen among

τ_{1}

(chosen randomly between zero and one),

τ_{2}

(the normal distribution), and

τ_{3}

(levy flight), as follows:

μ = \{\begin{cases} τ_{1}, i f r a n d_{1} < r a n d_{2} \\ τ_{2}, i f r a n d_{2} < r a n d_{3} \\ τ_{3}, i f r a n d_{1} < r a n d_{3} \end{cases}

(3)

where

r a n d_{1}

,

r a n d_{2}

, and

r a n d_{3}

are random numbers selected within the range [0, 1].

At the storage phase, individuals store foods as follows:

x_{i}^{t + 1 (F S n e w_{2})} = \{\begin{cases} x_{i}^{t} + μ \cdot (x_{b e s t}^{t} - x_{i}^{t}) \cdot |λ| + r_{1} \cdot (x_{A}^{t} - x_{B}^{t}) & , i f r a n d_{1} < r a n d_{2} \\ x_{b e s t}^{t} + μ \cdot (x_{A}^{t} - x_{B}^{t}) & , i f r a n d_{1} < r a n d_{3} \\ x_{b e s t}^{t} \cdot ξ & , o t h e r w i s e \end{cases}

(4)

where

x_{i}^{t + 1 (F S n e w_{2})}

is the new position of the ith individual generated in the storage phase;

λ

is a parameter generated by the levy flight; and

r_{1}

,

r a n d_{1}

,

r a n d_{2}

, and

r a n d_{3}

are random numbers selected within the range [0, 1].

ξ

is a parameter that linearly decreased from 1 to 0 during the optimization process.

The exchange between the foraging and storage strategies is used to balance exploration and exploitation phases as follows:

x_{i}^{t + 1 (F S)} = \{\begin{cases} x_{i}^{t + 1 (F S n e w_{1})}, i f r a n d_{1} < P_{a 1} \\ x_{i}^{t + 1 (F S n e w_{2})}, o t h e r w i s e \end{cases}

(5)

where

r a n d_{1}

is random numbers selected within the range [0, 1] and

P_{a 1}

is a parameter that linearly decreased from 1 to 0 during the optimization process.

2.1.2. Cache Search and Recovery Strategies

At the cache search phase, individuals locate their caches through two reference points:

R P_{i, 1}^{t} = \{\begin{cases} x_{i}^{t} + α \cdot \cos (θ) \cdot (x_{A}^{t} - x_{B}^{t}) + β \cdot R P_{i, r a n d i_{1 \leftrightarrow 2}}^{t}, i f θ = \frac{π}{2} \\ x_{i}^{t} + α \cdot \cos (θ) \cdot (x_{A}^{t} - x_{B}^{t}), o t h e r w i s e \end{cases}

(6)

R P_{i, 2}^{t} = \{\begin{cases} x_{i}^{t} + (α \cdot \cos (θ) ((U - L) \cdot r_{1} + L) + α \cdot R P_{i, r a n d i_{1 \leftrightarrow 2}}^{t}) \cdot ξ, i f θ = \frac{π}{2} \\ x_{i}^{t} + α \cdot \cos (θ) \cdot ((U - L) \cdot r_{1} + L) \cdot ξ, o t h e r w i s e \end{cases}

(7)

and

ξ = \{\begin{cases} 1, i f r a n d_{1} < P_{r p} \\ 0, o t h e r w i s e \end{cases}

(8)

where

θ

is a parameter chosen in the range of [0,

π

];

r a n d i_{1 \leftrightarrow 2}

is an integer chosen randomly between zero and one;

r_{1}

,

r a n d_{1}

are random numbers selected within the range [0, 1];

P_{r p}

is a global exploration threshold; and

β

is a convergence parameter and can be acquired as follows:

β = \{\begin{cases} {(1 - \frac{t}{T_{\max}})}^{2 \frac{t}{T_{\max}}} & , i f r a n d_{1} < r a n d_{2} \\ {(\frac{t}{T_{\max}})}^{\frac{2}{t}} & , o t h e r w i s e \end{cases}

(9)

where

r a n d_{1}

and

r a n d_{2}

are random numbers selected within the range [0, 1]. The new position of the individual during the cache search phase can be acquired as follows:

x_{i, j}^{t + 1 (C R n e w_{1})} = \{\begin{cases} x_{i, j, 1}^{t}, i f r a n d_{1} < r a n d_{2} \\ x_{i, j, 2}^{t}, o t h e r w i s e \end{cases}

(10)

x_{i, j, 1}^{t + 1} = \{\begin{cases} x_{i, j}^{t} & , i f r a n d_{1} < r a n d_{2} \\ x_{i, j}^{t} + r_{1} \cdot (x_{b e s t, j}^{t} - x_{i, j}^{t}) + r_{2} \cdot (R P_{i, j, 1}^{t} - x_{C, j}^{t}) & , o t h e r w i s e \end{cases}

(11)

x_{i, j, 2}^{t + 1} = \{\begin{cases} x_{i, j}^{t} & , i f r a n d_{1} < r a n d_{2} \\ x_{i, j}^{t} + r_{1} \cdot (x_{b e s t, j}^{t} - x_{i, j}^{t}) + r_{2} \cdot (R P_{i, j, 2}^{t} - x_{C, j}^{t}) & , o t h e r w i s e \end{cases}

(12)

where

R P_{i, j, 1}^{t}

and

R P_{i, j, 2}^{t}

are the jth position of

R P_{i, 1}^{t}

and

R P_{i, 2}^{t}

, respectively;

r a n d_{1}

,

r a n d_{2}

,

r_{1}

, and

r_{2}

are random numbers selected within the range [0, 1]; and C is the index of a solution selected randomly from the population.

During the recovery phase, nutcrackers find the hidden caches and retrieve the buried pine seeds. The new position of a nutcracker is obtained using the following equation:

x_{i, j}^{t + 1 (C R n e w_{2})} = \{\begin{cases} R P_{i, 1}^{t}, i f f (R P_{i, 1}^{t}) < f (R P_{i, 2}^{t}) a n d f (R P_{i, 1}^{t}) < f (x_{i}^{t}) \\ R P_{i, 1}^{t}, i f f (R P_{i, 2}^{t}) < f (R P_{i, 1}^{t}) a n d f (R P_{i, 2}^{t}) < f (x_{i}^{t}) \\ x_{i}^{t}, o t h e r w i s e \end{cases}

(13)

Finally, the exchange between the cache search and recovery strategies is applied according to the following formula:

x_{i}^{t + 1 (C R)} = \{\begin{cases} x_{i}^{t + 1 (C R n e w_{1})}, i f r a n d_{1} > P_{a 2} \\ x_{i}^{t + 1 (C R n e w_{2})}, o t h e r w i s e \end{cases}

(14)

where

r a n d_{1}

is random numbers selected within the range [0, 1] and

P_{a 2}

represents a probability value.

2.2. Reinforcement Learning

RL has been applied across various domains due to its effectiveness in problem-solving [47,48]. In RL, the agent interacts with the environment to learn how to perform optimal actions. As the representative method of RL, Q-learning defines the Q-table to control an agent’s actions. The Q-table is an m × n matrix, where m represents the number of states and n represents the number of actions available to the agent. By making a decision in the current state based on the Q-table values, the agent ultimately maximizes its reward. The Q-table is dynamically updated as follows:

Q (s_{t + 1}, a_{t + 1}) = Q (s_{t}, a_{t}) + α [R_{t + 1} + γ \max_{a} Q (s_{t + 1}, a) - Q (s_{t}, a_{t})]

(15)

where

s_{t}

and

s_{t + 1}

represent the current and next states, respectively;

a_{t}

and

a_{t + 1}

represent the current and next actions, respectively;

R_{t + 1}

is the reward acquired after performing action

a_{t}

;

α

is the learning rate;

γ

is the discount factor; and

Q (\cdot)

represents the corresponding value in the Q-table.

3. The Development of the Proposed Algorithm

3.1. Overview

The traditional NOA is limited to specific problems and is prone to getting trapped in local optima. To overcome these limitations, this paper introduces a hybrid strategy called the RLNOA. In each iteration of the RLNOA, the population is segmented into two groups based on fitness ranking: the exploration sub-population and the exploitation sub-population. The exploration sub-population consists of individuals with poor fitness in the current population. A ROBL-based foraging strategy is employed as the update strategy for individuals in the exploration sub-population, with a sine-based perturbation introduced to adjust the size of the exploration sub-population to ensure convergence. The remaining individuals form the exploitation sub-population, which implements two types of exploitation strategies: storage and recovery. The selection of the exploitation strategy is governed by Q-learning.

Furthermore, the RLNOA utilizes a single Q-table to map individual states to actions. States in the RLNOA are encoded by relative changes in fitness value and local diversity, while actions correspond to exploitation strategies. The exploitation sub-population updates based on the strategy selected by each individual, generating offspring. Rewards are assigned based on the selection process outcomes, and the Q-table is subsequently updated for the next generation. Figure 1 illustrates the flowchart of the RLNOA, with the main steps detailed in Algorithm 1.

Algorithm 1: The pseudocode of the RLNOA

Input : Population size NP, Maximum number of generations T_{\max}

Learning rate

λ

Discount factor

γ

,

, P_{r p}

,

δ

Output : The best solution x_{b e s t}

and corresponding fitness value f (x_{b e s t})

.
Set the initial Q-table: Q(s, a) = 0
Set t = 1

Initialize population position : x_{i}^{t}, i = 1, 2, \dots, N P

Calculate the fitness values for each individual : f (x_{i}^{t})

Calculate the local diversity for each individual : D_{i}^{t}

Set D_{i}^{t - 1} = D_{i}^{t}

, f (x_{i}^{t - 1}) = f (x_{i}^{t})

While t < T_max do
Acquire sub-population by Equation (19)

For each individual x_{i}^{t}

If x_{i}^{t}

belong to exploration sub-population

Perform ROBL - based foraging strategy on x_{i}^{t}

by Equation (16)
       Else
          Determine the state of the exploitation sub-population by Equations (20) and (21)
          Choose the best a for the current s from Q-table
          Switch action
              Case 1: Storage

Perform storage strategy on x_{i}^{t}

by Equation (4)
Case 2: Recovery

Perform recovery strategy on x_{i}^{t}

by Equation (13)
          End Switch
          Set the reward by Equation (25)
       End if

Calculate the fitness f (x_{i}^{t})

of x_{i}^{t}

Update the position of x_{i}^{t}

if its fitness is improved
    End for
    Calculate the relative changes of fitness and local diversity for the population
    Update Q-table by the exploitation sub-population
    t = t + 1
End While
Return results
Terminate

3.2. ROBL-Based Foraging Strategy

In the RLNOA, based on the ROBL method, an improved foraging strategy is introduced to construct the exploration behavior for the exploration sub-population [18]. The offspring in the exploration sub-population can be generated as follows:

x_{i, j}^{t + 1 (n e w_{1})} = \{\begin{cases} {x_{i, j}^{t}}^{'}, i f t < T_{\max} / 2 \\ {x_{i, j}^{t}}^{″}, o t h e r w i s e \end{cases}

(16)

{x_{i, j}^{t}}^{'} = \{\begin{cases} x_{m, j}^{t} + ε \cdot (x_{A, j}^{t} - x_{B, j}^{t}) + μ \cdot (r_{1}^{2} \cdot U_{j} - L_{j}) & , i f r a n d_{1} < r a n d_{2} \\ L_{j} + U_{j} - r_{1} \cdot x_{i . j}^{t} & , o t h e r w i s e \end{cases}

(17)

{x_{i, j}^{t}}^{″} = \{\begin{cases} x_{C, j}^{t} + μ \cdot (x_{A, j}^{t} - x_{B, j}^{t}) + μ \cdot (r_{2} < δ) \cdot (r_{3}^{2} \cdot U_{j} - L_{j}) & , i f r a n d_{1} < r a n d_{2} \\ L_{j} + U_{j} - r_{2} \cdot x_{i . j}^{t} & , o t h e r w i s e \end{cases}

(18)

where

x_{i, j}^{t + 1 (n e w_{1})}

is the new position of the ith individual generated in the foraging phase;

x_{i, j}^{t}

is the jth dimension of the ith individual in the iteration

t

;

x_{m, j}^{t}

is the mean position of the jth dimensions for the current population in the iteration

t

;

T_{\max}

indicates the maximum generations;

L_{j}

and

U_{j}

are the lower and upper bounds of the optimization problem in the jth dimension; A, B, and C are three different integers randomly selected in the range of [0, NP]; NP is the population size;

ε

is a parameter generated by the levy flight;

r a n d_{1}

,

r a n d_{2}

,

r_{1}

,

r_{2}

, and

r_{3}

are random numbers selected within the range [0, 1];

δ

is a control parameter; and

μ

is a parameter generated based on Equation (3). The size of the exploration sub-population can be calculated as follows:

N_{e x p l o r a t i o n} = r o u n d [\frac{N p}{2} {(1 - \sin (\frac{π}{2} \sqrt{\frac{t}{T_{\max}}}))}^{ζ}]

(19)

where

ζ

is the control parameter. At the beginning of each iteration,

N_{e x p l o r a t i o n}

individuals with poor fitness values are chosen from the total population to form the exploration sub-population. For example, assuming that

T_{\max} = 10

,

N P = 20

and

ζ = 8

. The variation of

N_{e x p l o r a t i o n}

is illustrated in Figure 2. To ensure population diversity and prevent premature convergence, the value of

N_{e x p l o r a t i o n}

decreases slowly in the early stages of the optimization process. In the later stages,

N_{e x p l o r a t i o n}

decays rapidly, allowing most individuals to focus on local exploitation.

3.3. Q-Learning-Based Exploitation Behavior

To ensure dynamic optimization of benefits at different stages for solving the optimization problem, Q-learning is employed as the selector for the exploitation sub-population to control the switch between storage (Equation (4)) and recovery (Equation (13)) strategies. The settings for Q-learning are specified as follows:

3.3.1. State Encoding

The state of each individual is encoded as the relative changes of local diversity and fitness values, which are defined as follows:

l d_{i}^{t} = D_{i}^{t} / D_{i}^{t - 1}

(20)

f i t_{i}^{t} = f (x_{i}^{t}) / f (x_{i}^{t - 1})

(21)

where

l d_{i}^{t}

is the relative changes in local diversity;

f i t_{i}^{t}

is the relative changes in fitness value; and

D_{i}^{t}

represents the local diversity of individual

x_{i}^{t}

and can be calculated as follows:

D_{i}^{t} = \frac{1}{k} \sum_{l = 1}^{k} \sqrt{\sum_{j = 1}^{d} {(x_{l, j}^{t} - {\bar{x}}_{l, j}^{t})}^{2}}

(22)

{\bar{x}}_{l, j}^{t} = \frac{1}{k} \sum_{l = 1}^{k} x_{l, j}^{t}

(23)

where

{\{x_{l}^{t}\}}_{l = 1}^{k}

is the neighborhood set of

x_{i}^{t}

;

d

is the dimension of the search space; and

k

is the number of near neighbors. As shown in Figure 3, the exploitation population consists of 32 states in total. Among these, the dimension of

l d

is divided into eight states: [0, 0.25), [0.25, 0.5), [0.5, 0.75), [0.75, 1.0), [1, 1.5), [1.5, 2), [2, 3), and [3,

+ \infty

). Moreover, the dimension of

f i t^{t}

is divided into four states: [0, 0.25), [0.25, 0.5), [0.5, 0.75), and [0.75, 1.0].

3.3.2. Action Options

The action of each individual is encoded as the selection of exploitation strategies. The probability of selecting each action is computed using the SoftMax function:

π_{q} (s_{t}, a_{j}) = \frac{\exp (Q_{t} (s_{t}, a_{j}))}{\sum_{j = 1}^{n} \exp (Q_{t} (s_{t}, a_{j}))}

(24)

where

Q_{t} (s_{t}, a_{j})

is the value in the Q-table in the iteration

t

;

a_{j}

is the jth action; and

n

is the total number of actions. Each individual sample has an optimization strategy based on the probability of each action.

3.3.3. Reward Options

The reward is determined based on the selection results of each generation, reflecting the performance of the current optimization strategy. If the fitness value of the new position

x_{i}^{t + 1}

is better than the old position

x_{i}^{t}

, the individual is rewarded with 1. Otherwise, the individual is punished with −1. The reward settings are defined as follows:

R = \{\begin{cases} 1 & , i f f (x_{i}^{t + 1}) < f (x_{i}^{t}) \\ - 1 & , o t h e r w i s e \end{cases}

(25)

Based on the above settings, the Q-table can be updated by Equation (15). The pseudocode of the RLNOA is shown in Algorithm 1.

3.4. Time Complexity

As can be seen from the pseudocode of the RLNOA in Algorithm 1, the proposed algorithm mainly consists of the following parts.

(1) Initializing the population and updating the fitness values and local diversity for each individual, with a time complexity of

O (N P \times d)

.

(2) Acquiring the sub-populations, with a time complexity of

O (T_{\max} \times N P)

.

(3) Updating the position for the exploration sub-population and exploitation sub-population, with a time complexity of

O (T_{\max} \times N P \times d)

.

(4) Calculating the relative changes in fitness and local diversity for the populations, with a time complexity of

O (T_{\max} \times N P \times d)

.

(5) Updating the Q-table using the exploitation sub-population, with a time complexity of

O (T_{\max} \times (N P - N_{e x p l o r a t i o n}) \times d)

.

Therefore, the maximum computing complexity of the RLNOA is

O (T_{\max} \times N P \times d)

, which is the same as that of the NOA.

4. Experimental Results

In this section, we perform a series of experiments on publicly available benchmark problems to assess the effectiveness of the RLNOA. The results are compared and analyzed against other state-of-the-art methods that have shown promising performances in the literature.

4.1. Test Conditions

The performance of the proposed RLNOA was tested on three global optimization test suites, including CEC-2014, CEC-2017, and CEC-2020 [49]. These test suites consist of unimodal, multimodal, hybrid, and composition functions, each with only one global optimum. The proposed RLNOA was compared with the NOA [24], SO [19], RSA [22], crested porcupine optimizer (CPO) [23], GWO [10], PSO [3], RLTLBO [43], RLMPSO [44], and RLCGWO [45]. The PSO and GWO are classical algorithms, while the NOA, SO, RSA, and CPO are well-known and recently proposed algorithms. The RLTLBO, RLMPSO, and RLCGWO are RL-based and recently proposed algorithms.

The common parameters for the experimental algorithms are presented in Table 1. The maximum number of iterations is set to 1000. To assess the experimental results, several performance metrics are used, including the average (Ave) and standard deviation (Std) of fitness values from 30 independent runs, and a ranking metric to assess the order of each method according to its average fitness value. Additionally, to highlight the significant differences between the RLNOA’s results and those of competing algorithms, convergence curves and box plots are utilized. The experiments were implemented in MATLAB R2024a on a device with Intel(R) Core(TM) i7-14700KF CPU @ 3.40 GHz and 64 GB RAM.

4.2. Comparison over CEC-2014

We performed optimization experiments on the CEC-2014 test suite to verify the effectiveness of the proposed RLNOA. The CEC-2014 test suite is a diverse collection of 30 test functions, including three unimodal functions, 13 multimodal functions, six hybrid functions, and eight composite functions. Each category is designed to test different aspects of an optimization algorithm’s capability, such as its ability to handle multiple local optima, locate the global optimum, and efficiently explore and exploit the search space. These functions are characterized by various levels of difficulty, dimensionality, and complexity, making them comprehensive tools for assessing the robustness, efficiency, and accuracy of optimization algorithms.

Table 2 shows the experimental results of various algorithms applied to the CEC-2014 test suite. As shown in the table, except for F3 and F30, the proposed RLNOA ranks first among all comparative algorithms in the remaining functions. Additionally, the second-to-last row in Table 2 confirms that the RLNOA achieves the highest average ranking, with a value of 1.0345. The second highest is the NOA, with a value of 2.1724, while the SO algorithm performs the worst.

Figure 4 illustrates the convergence curves of different algorithms applied to the CEC-2014 test suite. It can be seen from the figure that the convergence speed of the RLNOA is generally not the fastest. This is primarily because RL allows the population to learn the best exploitation strategy in the current state during the early stages of the search. As a result, the convergence speed of the RLNOA in the early stages is not the best, but its overall convergence performance remains competitive compared to the other algorithms. Moreover, due to RL combined with a dynamic exploration mechanism, the RLNOA can avoid local optima. Particularly for functions F5, F6, F8, F9, F10, F11, and F16, the RLNOA achieves better convergence results. Figure 5 shows the box plots of different algorithms applied to the CEC-2014 test suite, where the RLNOA achieves superior results.

4.3. Comparison over CEC-2017

In this experiment, the ability of the RLNOA to solve the optimization problems within the CEC-2017 test suite is evaluated, with the results presented in Table 3. The CEC-2017 benchmark suite consists of 30 test functions, categorized into unimodal functions, basic multimodal functions, expanded multimodal functions, hybrid functions, and composition functions. These categories encompass a wide range of optimization challenges, assessing the ability of algorithms to locate global optima, avoid local optima, and effectively explore the search space. It is also noted that the CEC-2017-F2 function was excluded from the test suite due to its unstable behavior.

As shown in Table 3, the proposed RLNOA obtained the most optimal values, ranking first overall. For the five functions where the RLNOA did not achieve the optimal value, it ranked second. The RSA performed the worst when applied to the CEC-2017 test suite, ranking last. Figure 6 illustrates the convergence curves of the different algorithms applied to the CEC-2017 test suite. It can be seen that the RLNOA converges faster than other methods for functions such as F16, F20, and F24. In most cases, its convergence speed surpasses that of algorithms such as the SO and RSA. Figure 7 presents the box plots of different algorithms on the CEC-2017 test suite, indicating that the RLNOA consistently achieved superior results.

4.4. Comparison over CEC-2020

To verify the effectiveness of the proposed algorithm for problems with enhanced complexity and realism, we performed optimization experiments on the CEC-2020 test suite. This suite comprises 10 benchmark functions and places a greater emphasis on dynamic and noisy functions, reflecting the evolving nature of real-world problems. Such a focus allows for a comprehensive evaluation of an algorithm’s performance under more variable and unpredictable conditions.

Table 4 presents the results of different algorithms applied to the CEC-2020 test suite. It can be seen that the RLNOA ranks first in eight out of the 10 functions. For functions F4 and F9, the RLNOA ranks second, but the gap between its results and the first-ranked results is minimal. The average ranking and final ranking of each algorithm across all test functions are shown in the last two rows of Table 4. The RLNOA performs the best, with an average ranking of 1.2222 and a final ranking of 1. Figure 8 and Figure 9, respectively, show the convergence curves and box plots of different algorithms applied to the CEC-2020 test suite. These figures confirm that the RLNOA consistently demonstrates superior performance across all benchmark functions.

4.5. Analysis of the Q-Table

We take test function F1 in CEC2014 as an example to illustrate the Q-table update process. As shown in Figure 10, the state

[s_{1}, s_{2}, \dots, s_{32}]

represents different stages of the relative changes in local diversity and fitness values. Actions

a_{1}

and

a_{2}

represent the storage strategy and recovery strategy, respectively. The Q-table is initialized as a zero matrix. After five iterations, the Q-values for different states in the Q-table change. The individual will execute the action corresponding to the highest Q-value in the current state. For example, the Q-value of the storage strategy (action

a_{1}

) in state

s_{26}

is −0.93, which is much lower than the Q-value of the recovery strategy (action

a_{2}

), which is 1.32. When the individual is in state

s_{26}

during the next iteration, it will execute the recovery strategy. In the later stages of the optimization process, when the population has converged close to the global optimum, selecting any exploitation strategy is unlikely to yield better results. As a result, almost all Q-values in the Q-table become negative.

4.6. Analysis of the RLNOA’s Parameters

The RLNOA contains some parameters that affect its performance, similar to other metaheuristic algorithms. Here are some suggestions for selecting and tuning these parameters.

(1) Population size NP

A larger population size generally allows for better exploration of the solution space, as more diverse solutions are maintained. However, increasing the population size typically leads to higher computational costs, as more solutions need to be evaluated during each iteration. This can slow down the algorithm, especially for complex or large-scale problems. Therefore, population sizes often range from a few dozen to several hundred individuals, depending on the problem’s complexity and the specific metaheuristic used. Reasonable values are within [20, 200].

(2) Maximum iterations

T_{\max}

More iterations allow the algorithm to refine and improve solutions gradually. However, there may be diminishing returns after a certain number of iterations, where further improvements become minimal. Reasonable values are within [100, 2000].

(3) Learning rate

α

This parameter determines how much new information overrides old information. A high learning rate means the agent learns quickly, but it may also make the learning process unstable. A low learning rate ensures stability but can slow down the learning process. Reasonable values are within [0.1, 0.9].

(4) Discount factor

γ

This factor determines the importance of future rewards. A value close to 0 makes the agent prioritize immediate rewards, while a value close to 1 encourages the agent to consider long-term rewards. The discount factor helps balance immediate versus future gains, influencing the agent’s overall strategy. Reasonable values are within [0.1, 0.9].

We also studied the sensitivity of the RLNOA’s partial parameters. This analysis helps determine the impact of small changes in these parameters on the performance of the proposed algorithm. The functions used in these experiments are F1, F4, F17, and F23 from CEC-2014, which represent unimodal, multimodal, hybrid, and composite functions, respectively. The maximum number of iterations is set to 1000. The population size is set to 100. The average (Ave) of fitness values from 30 independent runs is used to acquire the sensitivity analysis results. Experimental results are as follows:

Global exploration threshold $P_{r p}$ : to verify the effect of $P_{r p}$ on the efficiency of the RLNOA, experiments are performed for several values of $P_{r p}$ , taken as 0.2, 0.4, 0.6, and 0.8, while other parameters are unchanged. As shown in Table 5, the RLNOA is insensitive to this parameter. The results of F17 indicate that the RLNOA performs best when $P_{r p} = 0.2$ is set to a specific value.
Control parameter $δ$ : experiments are performed for several values of $δ$ , taken as 0.05, 0.1, 0.2, and 0.5, while other parameters are unchanged. As shown in Table 6, the RLNOA is not sensitive to small changes in the parameter $δ$ .
Number of near neighbors $k$ : Table 7 shows the results of the RLNOA with different values for the parameter $k$ . It is evident from Table 7 that the RLNOA is not sensitive to small changes in the parameter $k$ .
Control parameter $ζ$ : to explore the sensitivity of the RLNOA to the parameter $ζ$ , experiments are caried out for different values of $ζ$ , as shown in Table 8. It is apparent that the RLNOA is sensitive to $ζ$ . This is primarily because $ζ$ controls the variation trend of the exploration sub-population during the optimization process. Table 8 also shows that the RLNOA acquires the best results when the value of $ζ$ is set to 1.

5. Conclusions

In this paper, we propose an RL-based bi-population nutcracker optimizer algorithm. We developed a bi-population mechanism that uses fitness ranking to separate the raw population into exploration and exploitation sub-populations at the start of each iteration. The exploration sub-population, comprising individuals with lower fitness, employs a foraging strategy based on ROBL to maintain diversity. The exploitation sub-population includes two strategies, storage and recovery, with the selection of strategy controlled by Q-learning in RL. Experiments were conducted on the CEC-2014, CEC-2017, and CEC-2020 benchmark suites. The results, including optimization performance, convergence curves, and box plots, demonstrate that the proposed algorithm outperforms nine other comparative algorithms.

However, the proposed algorithm still has limitations. First, the exploration strategy needs to be further enhanced to improve the algorithm’s ability to escape local optima. Second, there is some redundancy in boundary states within the Q-learning process. In the future, based on the existing RLNOA framework, the development of search strategies and the encoding of states will be further studied to improve optimization performance. Additionally, complex engineering applications will be introduced to test future work.

Author Contributions

Conceptualization, Y.L. and Y.Z.; methodology, Y.L.; software, Y.L.; validation, Y.L. and Y.Z.; formal analysis, Y.L.; investigation, Y.L.; resources, Y.L.; data curation, Y.L.; writing—original draft preparation, Y.L.; writing—review and editing, Y.L. and Y.Z.; visualization, Y.L.; supervision, Y.Z.; project administration, Y.Z.; funding acquisition, Y.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Table A1. The nomenclature of this study.

Indices		A, B, and C	Integers randomly selected in the range of [0, NP]
$i$	Index of individuals in the population	$L_{j}$	The lower bound of the optimization problem in the jth dimension
$j$	Index of dimensions	NP	The population size
$l$	Index of individuals in the neighborhood set	$P_{a 2}$	A probability value
$t$	Index of iteration	$P_{r p}$	A global exploration threshold
Sets		$T_{\max}$	The maximum iterations
${\{x_{i}^{t}\}}_{i = 1}^{N P}$	The population in the iteration $t$	$U_{j}$	The upper bound of the optimization problem in the jth dimension
${\{x_{l}^{t}\}}_{l = 1}^{k}$	The neighborhood set of $x_{i}^{t}$	Variables
Parameters		$a_{t}$ and $a_{t + 1}$	The current and next actions respectively
$α$	The learning rate	$f i t_{i}^{t}$	The relative changes of fitness value
$γ$	The discount factor	$l d_{i}^{t}$	The relative changes of local diversity
$δ$ and $ζ$	Control parameter	$s_{t}$ and $s_{t + 1}$	The current and next states respectively
$ε$ , $λ$ , and $τ_{3}$	A parameter generated by the levy flight	$x_{i, j}^{t + 1 (F S n e w_{1})}$	The new position of the ith individual generated in the foraging phase
$θ$	A parameter chosen in the range of [0, $π$ ]	$x_{i, j}^{t}$	The jth dimension of the ith individual in the iteration $t$
$ξ$ and $P_{a 1}$	A parameter that linearly decreased from 1 to 0	$x_{i}^{t + 1 (F S n e w_{2})}$	The new position of the ith individual generated in the storage phase
$τ_{1}$	A random number selected within the range [0, 1]	$x_{m, j}^{t}$	The mean position of the jth dimensions for current population in the iteration $t$
$τ_{2}$	A random number generated based on a normal distribution	$D_{i}^{t}$	The local diversity of individual $x_{i}^{t}$
$d$	The dimension of the search space	$N_{e x p l o r a t i o n}$	The size of the exploration sub-population
$k$	The number of near neighbors	$Q (\cdot)$	The corresponding value in the Q-table
$r a n d$ and $r$	Random numbers selected within the range [0, 1]	$R_{t + 1}$	The reward acquired after performing action $a_{t}$
$r a n d i_{1 \leftrightarrow 2}$	An integer chosen randomly between zero and one

References

Hubálovsky, S.; Hubálovská, M.; Matousová, I. A New Hybrid Particle Swarm Optimization-Teaching-Learning-Based Optimization for Solving Optimization Problems. Biomimetics 2024, 9, 8. [Google Scholar] [CrossRef] [PubMed]
Wang, R.T.; Zhang, S.S.; Zou, G.Y. An Improved Multi-Strategy Crayfish Optimization Algorithm for Solving Numerical Optimization Problems. Biomimetics 2024, 9, 361. [Google Scholar] [CrossRef] [PubMed]
Pardo, X.C.; González, P.; Banga, J.R.; Doallo, R. Population based metaheuristics in Spark: Towards a general framework using PSO as a case study. Swarm Evol. Comput. 2024, 85, 101483. [Google Scholar] [CrossRef]
Tatsis, V.A.; Parsopoulos, K.E. Reinforcement learning for enhanced online gradient-based parameter adaptation in metaheuristics. Swarm Evol. Comput. 2023, 83, 101371. [Google Scholar] [CrossRef]
Wang, Y.; Xiong, G.J.; Xu, S.P.; Suganthan, P.N. Large-scale power system multi-area economic dispatch considering valve point effects with comprehensive learning differential evolution. Swarm Evol. Comput. 2024, 89, 101620. [Google Scholar] [CrossRef]
Zhang, Z.; Zhang, H.F.; Tian, Y.Z.; Li, C.W.; Yue, D. Cooperative constrained multi-objective dual-population evolutionary algorithm for optimal dispatching of wind-power integrated power system. Swarm Evol. Comput. 2024, 87, 101525. [Google Scholar] [CrossRef]
Feng, X.; Pan, A.Q.; Ren, Z.Y.; Hong, J.C.; Fan, Z.P.; Tong, Y.H. An adaptive dual-population based evolutionary algorithm for industrial cut tobacco drying system. Appl. Soft Comput. 2023, 144, 110446. [Google Scholar] [CrossRef]
Luo, T.; Xie, J.P.; Zhang, B.T.; Zhang, Y.; Li, C.Q.; Zhou, J. An improved levy chaotic particle swarm optimization algorithm for energy-efficient cluster routing scheme in industrial wireless sensor networks. Expert Syst. Appl. 2024, 241, 122780. [Google Scholar] [CrossRef]
Qu, C.Z.; Gai, W.D.; Zhang, J.; Zhong, M.Y. A novel hybrid grey wolf optimizer algorithm for unmanned aerial vehicle (UAV) path planning. Knowl.-Based Syst. 2020, 194, 105530. [Google Scholar] [CrossRef]
Qu, C.Z.; Gai, W.D.; Zhong, M.Y.; Zhang, J. A novel reinforcement learning based grey wolf optimizer algorithm for unmanned aerial vehicles (UAVs) path planning. Appl. Soft Comput. 2020, 89, 106099. [Google Scholar] [CrossRef]
Qu, C.; Zhang, Y.; Ma, F.; Huang, K. Parameter optimization for point clouds denoising based on no-reference quality assessment. Measurement 2023, 211, 112592. [Google Scholar] [CrossRef]
Chauhan, D.; Yadav, A. An archive-based self-adaptive artificial electric field algorithm with orthogonal initialization for real-parameter optimization problems. Appl. Soft Comput. 2024, 150, 111109. [Google Scholar] [CrossRef]
Li, G.Q.; Zhang, W.W.; Yue, C.T.; Wang, Y.R. Balancing exploration and exploitation in dynamic constrained multimodal multi-objective co-evolutionary algorithm. Swarm Evol. Comput. 2024, 89, 101652. [Google Scholar] [CrossRef]
Ahadzadeh, B.; Abdar, M.; Safara, F.; Khosravi, A.; Menhaj, M.B.; Suganthan, P.N. SFE: A Simple, Fast, and Efficient Feature Selection Algorithm for High-Dimensional Data. IEEE Trans. Evol. Comput. 2023, 27, 1896–1911. [Google Scholar] [CrossRef]
Fu, S.; Huang, H.; Ma, C.; Wei, J.; Li, Y.; Fu, Y. Improved dwarf mongoose optimization algorithm using novel nonlinear control and exploration strategies. Expert Syst. Appl. 2023, 233, 120904. [Google Scholar] [CrossRef]
Li, J.; Li, G.; Wang, Z.; Cui, L. Differential evolution with an adaptive penalty coefficient mechanism and a search history exploitation mechanism. Expert Syst. Appl. 2023, 230, 120530. [Google Scholar] [CrossRef]
Hu, C.; Zeng, S.; Li, C. A framework of global exploration and local exploitation using surrogates for expensive optimization. Knowl.-Based Syst. 2023, 280, 111018. [Google Scholar] [CrossRef]
Chang, D.; Rao, C.; Xiao, X.; Hu, F.; Goh, M. Multiple strategies based Grey Wolf Optimizer for feature selection in performance evaluation of open-ended funds. Swarm Evol. Comput. 2024, 86, 101518. [Google Scholar] [CrossRef]
Hashim, F.A.; Hussien, A.G. Snake Optimizer: A novel meta-heuristic optimization algorithm. Knowl.-Based Syst. 2022, 242, 108320. [Google Scholar] [CrossRef]
Braik, M.; Hammouri, A.; Atwan, J.; Al-Betar, M.A.A.; Awadallah, M.A. White Shark Optimizer: A novel bio-inspired meta-heuristic algorithm for global optimization problems. Knowl.-Based Syst. 2022, 243, 108457. [Google Scholar] [CrossRef]
Kumar, S.; Sharma, N.K.; Kumar, N. WSOmark: An adaptive dual-purpose color image watermarking using white shark optimizer and Levenberg-Marquardt BPNN. Expert Syst. Appl. 2023, 226, 120137. [Google Scholar] [CrossRef]
Abualigah, L.; Abd Elaziz, M.; Sumari, P.; Geem, Z.W.; Gandomi, A.H. Reptile Search Algorithm (RSA): A nature-inspired meta-heuristic optimizer. Expert Syst. Appl. 2022, 191, 116158. [Google Scholar] [CrossRef]
Abdel-Basset, M.; Mohamed, R.; Abouhawwash, M. Crested Porcupine Optimizer: A new nature-inspired metaheuristic. Knowl.-Based Syst. 2024, 284, 111257. [Google Scholar] [CrossRef]
Abdel-Basset, M.; Mohamed, R.; Jameel, M.; Abouhawwash, M. Nutcracker optimizer: A novel nature-inspired metaheuristic algorithm for global optimization and engineering design problems. Knowl.-Based Syst. 2023, 262, 110248. [Google Scholar] [CrossRef]
Qaraad, M.; Amjad, S.; Hussein, N.K.; Farag, M.A.; Mirjalili, S.; Elhosseini, M.A. Quadratic interpolation and a new local search approach to improve particle swarm optimization: Solar photovoltaic parameter estimation. Expert Syst. Appl. 2024, 236, 121417. [Google Scholar] [CrossRef]
Ahmed, R.; Rangaiah, G.P.; Mahadzir, S.; Mirjalili, S.; Hassan, M.H.; Kamel, S. Memory, evolutionary operator, and local search based improved Grey Wolf Optimizer with linear population size reduction technique. Knowl.-Based Syst. 2023, 264, 110297. [Google Scholar] [CrossRef]
Khosravi, H.; Amiri, B.; Yazdanjue, N.; Babaiyan, V. An improved group teaching optimization algorithm based on local search and chaotic map for feature selection in high-dimensional data. Expert Syst. Appl. 2022, 204, 117493. [Google Scholar] [CrossRef]
Ekinci, S.; Izci, D.; Abualigah, L.; Abu Zitar, R. A Modified Oppositional Chaotic Local Search Strategy Based Aquila Optimizer to Design an Effective Controller for Vehicle Cruise Control System. J. Bionic Eng. 2023, 20, 1828–1851. [Google Scholar] [CrossRef]
Xiao, J.; Wang, Y.J.; Xu, X.K. Fuzzy Community Detection Based on Elite Symbiotic Organisms Search and Node Neighborhood Information. IEEE Trans. Fuzzy Syst. 2022, 30, 2500–2514. [Google Scholar] [CrossRef]
Zhu, Q.L.; Lin, Q.Z.; Li, J.Q.; Coello, C.A.C.; Ming, Z.; Chen, J.Y.; Zhang, J. An Elite Gene Guided Reproduction Operator for Many-Objective Optimization. IEEE Trans. Cybern. 2021, 51, 765–778. [Google Scholar] [CrossRef]
Zhang, Y.Y. Elite archives-driven particle swarm optimization for large scale numerical optimization and its engineering applications. Swarm Evol. Comput. 2023, 76, 101212. [Google Scholar] [CrossRef]
Zhong, X.X.; Cheng, P. An elite-guided hierarchical differential evolution algorithm. Appl. Intell. 2021, 51, 4962–4983. [Google Scholar] [CrossRef]
Zhou, L.; Feng, L.; Gupta, A.; Ong, Y.S. Learnable Evolutionary Search Across Heterogeneous Problems via Kernelized Autoencoding. IEEE Trans. Evol. Comput. 2021, 25, 567–581. [Google Scholar] [CrossRef]
Feng, L.; Zhou, W.; Liu, W.C.; Ong, Y.S.; Tan, K.C. Solving Dynamic Multiobjective Problem via Autoencoding Evolutionary Search. IEEE Trans. Cybern. 2022, 52, 2649–2662. [Google Scholar] [CrossRef]
Zhan, Z.H.; Wang, Z.J.; Jin, H.; Zhang, J. Adaptive Distributed Differential Evolution. IEEE Trans. Cybern. 2020, 50, 4633–4647. [Google Scholar] [CrossRef]
Zhan, Z.H.; Li, J.Y.; Kwong, S.; Zhang, J. Learning-Aided Evolution for Optimization. IEEE Trans. Evol. Comput. 2023, 27, 1794–1808. [Google Scholar] [CrossRef]
Zabihi, Z.; Moghadam, A.M.E.; Rezvani, M.H. Reinforcement Learning Methods for Computation Offloading: A Systematic Review. Acm Comput. Surv. 2024, 56, 1–41. [Google Scholar] [CrossRef]
Wang, D.; Gao, N.; Liu, D.; Li, J.; Lewis, F.L. Recent Progress in Reinforcement Learning and Adaptive Dynamic Programming for Advanced Control Applications. IEEE-CAA J. Autom. Sin. 2024, 11, 18–36. [Google Scholar] [CrossRef]
Zhao, F.; Wang, Q.; Wang, L. An inverse reinforcement learning framework with the Q-learning mechanism for the metaheuristic algorithm. Knowl.-Based Syst. 2023, 265, 110368. [Google Scholar] [CrossRef]
Ghetas, M.; Issa, M. A novel reinforcement learning-based reptile search algorithm for solving optimization problems. Neural Comput. Appl. 2023, 36, 533–568. [Google Scholar] [CrossRef]
Li, Z.; Shi, L.; Yue, C.; Shang, Z.; Qu, B. Differential evolution based on reinforcement learning with fitness ranking for solving multimodal multiobjective problems. Swarm Evol. Comput. 2019, 49, 234–244. [Google Scholar] [CrossRef]
Tan, Z.; Li, K. Differential evolution with mixed mutation strategy based on deep reinforcement learning. Appl. Soft Comput. 2021, 111, 107678. [Google Scholar] [CrossRef]
Wu, D.; Wang, S.; Liu, Q.; Abualigah, L.; Jia, H. An Improved Teaching-Learning-Based Optimization Algorithm with Reinforcement Learning Strategy for Solving Optimization Problems. Comput. Intell. Neurosci. 2022, 2022, 1535957. [Google Scholar] [CrossRef] [PubMed]
Samma, H.; Lim, C.P.; Saleh, J.M. A new Reinforcement Learning-Based Memetic Particle Swarm Optimizer. Appl. Soft Comput. 2016, 43, 276–297. [Google Scholar] [CrossRef]
Hu, Z.P.; Yu, X.B. Reinforcement learning-based comprehensive learning grey wolf optimizer for feature selection. Appl. Soft Comput. 2023, 149, 110959. [Google Scholar] [CrossRef]
Li, J.; Dong, H.; Wang, P.; Shen, J.; Qin, D. Multi-objective constrained black-box optimization algorithm based on feasible region localization and performance-improvement exploration. Appl. Soft Comput. 2023, 148, 110874. [Google Scholar] [CrossRef]
Wang, Z.; Yao, S.; Li, G.; Zhang, Q. Multiobjective Combinatorial Optimization Using a Single Deep Reinforcement Learning Model. IEEE Trans. Cybern. 2024, 54, 1984–1996. [Google Scholar] [CrossRef]
Huang, L.; Dong, B.; Xie, W.; Zhang, W. Offline Reinforcement Learning with Behavior Value Regularization. IEEE Trans. Cybern. 2024, 54, 3692–3704. [Google Scholar] [CrossRef]
Abdel-Basset, M.; El-Shahat, D.; Jameel, M.; Abouhawwash, M. Exponential distribution optimizer (EDO): A novel math-inspired algorithm for global optimization and engineering problems. Artif. Intell. Rev. 2023, 56, 9329–9400. [Google Scholar] [CrossRef]

Figure 1. Illustration of the RLNOA.

Figure 2. The variation of

N_{e x p l o r a t i o n}

when

T_{\max} = 10

,

N P = 20

, and

ζ = 8

.

Figure 2. The variation of

N_{e x p l o r a t i o n}

when

T_{\max} = 10

,

N P = 20

, and

ζ = 8

.

Figure 3. The illustration of states in the exploitation population.

Figure 4. The convergence curves of different algorithms applied to the CEC-2014 test suite.

Figure 5. The box plots of different algorithms applied to the CEC-2014 test suite.

Figure 6. The convergence curves of different algorithms applied to the CEC-2017 test suite.

Figure 7. The box plots of different algorithms applied to the CEC-2017 test suite.

Figure 8. The convergence curves of different algorithms applied to the CEC-2020 test suite.

Figure 9. The box plots of different algorithms applied to the CEC-2020 test suite.

Figure 10. Q-table update process for the F1 function.

Table 1. The common parameters of the experimental algorithms.

Algorithm	Specifications	Population Size NP
RLNOA	$Learning rate α = 0.5,$ $discount factor γ = 0.5,$ $P_{r p} = 0.2,$ $δ = 0.05,$ $k = 20,$ $ζ = 1$	100
NOA	$P_{a 1}$ $decreases linearly from 2 to 0, P a_{2} = 0.2,$ $P_{r p} = 0.2,$ $δ = 0.05$	100
SO	$c_{1} = 0.5,$ $c_{2} = 0.05,$ $c_{3} = 2$	100
RSA	$α = 0.1,$ $β = 0.005$	100
CPO	$The number of cycles T = 2,$ $the convergence rate α = 0.2,$ $T_{f} = 0.8,$ $N_{\min} = 20$	100
GWO	Convergence constant $a$ decreases linearly from 2 to 0	100
PSO	$ω = 1,$ $c_{1} = 1.5,$ $c_{2} = 2$	100
RLTLBO	$Learning rate α = 0.5,$ $discount factor γ = 0.5$	33 because this algorithm has three main stages
RLMPSO	$Learning rate α = 0.5,$ $discount factor γ = 0.5,$ $ω = 0.9$ $for exploration, ω = 0.4$ $for convergence, c_{1} = 2.5$ $for exploration, c_{2} = 0.5$ $for convergence, c_{1} = 0.5$ $for exploration, c_{2} = 2.5$ $for convergence, [V_{\min}, V_{\max}]$ is set as 0.2 of search range	100
RLCGWO	$Learning rate α = 0.5,$ $discount factor γ = 0.5,$ $scaling factor e = 0.5,$ $the minimum learning probability a = 0.1,$ $the maximum learning probability b = 0.5$	100

Table 2. Results of the CEC-2014 test suite.

Fun	Metrics	RLNOA	NOA	SO	RSA	CPO	GWO	PSO	RLTLBO	RLMPSO	RLCGWO
F1	Ave	1.00 × 10²	1.04 × 10²	3.86 × 10⁷	1.02 × 10⁸	1.64 × 10²	4.57 × 10⁶	1.24 × 10⁴	2.39 × 10⁴	3.28 × 10⁵	1.86 × 10⁷
	Std	4.65 × 10⁻³	2.55 × 10⁰	2.05 × 10⁷	4.09 × 10⁷	4.73 × 10¹	3.34 × 10⁶	1.33 × 10⁴	3.63 × 10⁴	2.44 × 10⁵	1.79 × 10⁷
	Rank	1	2	9	10	3	7	4	5	6	8
F2	Ave	2.00 × 10²	2.00 × 10²	2.16 × 10⁹	5.89 × 10⁹	2.00 × 10²	6.15 × 10³	1.10 × 10³	5.83 × 10²	2.04 × 10³	2.60 × 10⁹
	Std	2.61 × 10⁻⁶	1.15 × 10⁻³	9.40 × 10⁸	1.50 × 10⁹	1.03 × 10⁻⁵	4.49 × 10³	1.17 × 10³	6.06 × 10²	2.90 × 10³	1.26 × 10⁹
	Rank	1	3	8	10	2	7	5	4	6	9
F3	Ave	3.00 × 10²	3.00 × 10²	1.54 × 10⁴	8.05 × 10³	3.00 × 10²	4.01 × 10³	3.39 × 10²	3.89 × 10²	4.27 × 10³	4.97 × 10⁴
	Std	1.52 × 10⁻⁸	1.60 × 10⁻⁶	4.95 × 10³	3.32 × 10³	1.12 × 10⁻⁸	3.55 × 10³	6.59 × 10¹	1.18 × 10²	1.70 × 10³	2.18 × 10⁴
	Rank	2	3	9	8	1	6	4	5	7	10
F4	Ave	4.00 × 10²	4.05 × 10²	8.49 × 10²	1.37 × 10³	4.09 × 10²	4.29 × 10²	4.28 × 10²	4.16 × 10²	4.31 × 10²	5.15 × 10²
	Std	3.68 × 10⁻³	1.14 × 10¹	2.81 × 10²	6.97 × 10²	1.51 × 10¹	1.17 × 10¹	1.34 × 10¹	1.63 × 10¹	8.74 × 10⁰	5.83 × 10¹
	Rank	1	2	9	10	3	6	5	4	7	8
F5	Ave	5.13 × 10²	5.18 × 10²	5.21 × 10²	5.20 × 10²	5.17 × 10²	5.20 × 10²	5.20 × 10²	5.20 × 10²	5.19 × 10²	5.20 × 10²
	Std	8.38 × 10⁰	6.18 × 10⁰	1.16 × 10⁻¹	8.23 × 10⁻²	7.38 × 10⁰	5.93 × 10⁻²	1.06 × 10⁻³	6.58 × 10⁻²	4.55 × 10⁰	1.13 × 10⁻¹
	Rank	1	3	10	9	2	7	5	8	4	6
F6	Ave	6.00 × 10²	6.00 × 10²	6.10 × 10²	6.09 × 10²	6.00 × 10²	6.01 × 10²	6.02 × 10²	6.02 × 10²	6.03 × 10²	6.09 × 10²
	Std	6.70 × 10⁻⁶	1.89 × 10⁻²	6.27 × 10⁻¹	1.09 × 10⁰	4.34 × 10⁻²	6.05 × 10⁻¹	1.47 × 10⁰	9.28 × 10⁻¹	1.34 × 10⁰	1.19 × 10⁰
	Rank	1	3	10	9	2	4	5	6	7	8
F7	Ave	7.00 × 10²	7.00 × 10²	7.48 × 10²	7.79 × 10²	7.00 × 10²	7.01 × 10²	7.00 × 10²	7.00 × 10²	7.00 × 10²	7.23 × 10²
	Std	8.66 × 10⁻³	2.32 × 10⁻²	1.69 × 10¹	2.48 × 10¹	3.57 × 10⁻²	1.03 × 10⁰	7.46 × 10⁻²	8.26 × 10⁻²	8.76 × 10⁻²	7.51 × 10⁰
	Rank	1	2	9	10	3	7	4	5	6	8
F8	Ave	8.00 × 10²	8.00 × 10²	8.58 × 10²	8.69 × 10²	8.00 × 10²	8.07 × 10²	8.15 × 10²	8.11 × 10²	8.16 × 10²	8.60 × 10²
	Std	0.00 × 10⁰	3.58 × 10⁻¹³	1.10 × 10¹	9.38 × 10⁰	2.83 × 10⁻¹⁰	3.19 × 10⁰	7.35 × 10⁰	6.43 × 10⁰	6.83 × 10⁰	9.87 × 10⁰
	Rank	1	2	8	10	3	4	6	5	7	9
F9	Ave	9.02 × 10²	9.04 × 10²	9.61 × 10²	9.58 × 10²	9.12 × 10²	9.11 × 10²	9.15 × 10²	9.22 × 10²	9.21 × 10²	9.59 × 10²
	Std	6.20 × 10⁻¹	1.14 × 10⁰	1.27 × 10¹	5.78 × 10⁰	2.46 × 10⁰	4.78 × 10⁰	7.19 × 10⁰	1.26 × 10¹	7.73 × 10⁰	8.83 × 10⁰
	Rank	1	2	10	8	4	3	5	7	6	9
F10	Ave	1.00 × 10³	1.00 × 10³	2.51 × 10³	1.97 × 10³	1.01 × 10³	1.30 × 10³	1.35 × 10³	1.52 × 10³	1.29 × 10³	2.01 × 10³
	Std	5.16 × 10⁻²	9.51 × 10⁻¹	2.37 × 10²	2.07 × 10²	3.48 × 10⁰	2.15 × 10²	1.25 × 10²	1.93 × 10²	1.88 × 10²	2.48 × 10²
	Rank	1	2	10	8	3	5	6	7	4	9
F11	Ave	1.22 × 10³	1.56 × 10³	3.18 × 10³	2.43 × 10³	2.16 × 10³	1.72 × 10³	1.80 × 10³	1.73 × 10³	2.70 × 10³	4.04 × 10³
	Std	4.64 × 10¹	1.65 × 10²	2.95 × 10²	1.80 × 10²	1.26 × 10²	3.94 × 10²	2.84 × 10²	2.64 × 10²	1.41 × 10²	4.54 × 10²
	Rank	1	2	9	7	6	3	5	4	8	10
F12	Ave	1.20 × 10³	1.20 × 10³	1.20 × 10³	1.20 × 10³	1.20 × 10³	1.20 × 10³	1.20 × 10³	1.20 × 10³	1.20 × 10³	1.20 × 10³
	Std	1.74 × 10⁻²	5.36 × 10⁻²	4.70 × 10⁻¹	2.00 × 10⁻¹	6.69 × 10⁻²	4.50 × 10⁻¹	1.23 × 10⁻¹	3.44 × 10⁻¹	1.55 × 10⁻¹	3.60 × 10⁻¹
	Rank	1	3	10	9	5	4	2	7	8	6
F13	Ave	1.30 × 10³	1.30 × 10³	1.30 × 10³	1.30 × 10³	1.30 × 10³	1.30 × 10³	1.30 × 10³	1.30 × 10³	1.30 × 10³	1.30 × 10³
	Std	8.47 × 10⁻³	1.91 × 10⁻²	6.58 × 10⁻¹	8.19 × 10⁻¹	2.70 × 10⁻²	6.50 × 10⁻²	7.76 × 10⁻²	8.02 × 10⁻²	5.51 × 10⁻²	2.29 × 10⁻¹
	Rank	1	2	9	10	5	4	3	6	7	8
F14	Ave	1.40 × 10³	1.40 × 10³	1.41 × 10³	1.41 × 10³	1.40 × 10³	1.40 × 10³	1.40 × 10³	1.40 × 10³	1.40 × 10³	1.40 × 10³
	Std	1.29 × 10⁻²	3.13 × 10⁻²	5.57 × 10⁰	3.50 × 10⁰	3.96 × 10⁻²	1.66 × 10⁻¹	5.65 × 10⁻²	1.02 × 10⁻¹	4.79 × 10⁻²	1.93 × 10⁰
	Rank	1	2	10	9	4	5	3	7	6	8
F15	Ave	1.50 × 10³	1.50 × 10³	2.77 × 10³	2.68 × 10³	1.50 × 10³	1.50 × 10³	1.50 × 10³	1.50 × 10³	1.50 × 10³	2.14 × 10³
	Std	8.24 × 10⁻²	1.35 × 10⁻¹	1.83 × 10³	1.25 × 10³	2.30 × 10⁻¹	6.79 × 10⁻¹	3.66 × 10⁻¹	4.93 × 10⁻¹	9.82 × 10⁻¹	9.17 × 10²
	Rank	1	2	10	9	5	4	3	6	7	8
F16	Ave	1.60 × 10³	1.60 × 10³	1.60 × 10³	1.60 × 10³	1.60 × 10³	1.60 × 10³	1.60 × 10³	1.60 × 10³	1.60 × 10³	1.60 × 10³
	Std	2.10 × 10⁻¹	2.95 × 10⁻¹	1.86 × 10⁻¹	1.87 × 10⁻¹	2.46 × 10⁻¹	4.42 × 10⁻¹	5.19 × 10⁻¹	3.45 × 10⁻¹	4.04 × 10⁻¹	2.30 × 10⁻¹
	Rank	1	2	10	9	4	3	6	5	7	8
F17	Ave	1.71 × 10³	1.73 × 10³	2.74 × 10⁵	4.34 × 10⁵	1.79 × 10³	4.06 × 10⁴	4.15 × 10³	2.57 × 10³	7.96 × 10³	4.44 × 10⁵
	Std	2.72 × 10⁰	1.17 × 10¹	2.38 × 10⁵	1.39 × 10⁵	2.59 × 10¹	1.04 × 10⁵	2.11 × 10³	6.71 × 10²	3.88 × 10³	8.66 × 10⁵
	Rank	1	2	8	9	3	7	5	4	6	10
F18	Ave	1.80 × 10³	1.80 × 10³	1.95 × 10⁵	8.88 × 10⁴	1.80 × 10³	9.13 × 10³	1.25 × 10⁴	4.15 × 10³	9.55 × 10³	3.19 × 10⁴
	Std	1.66 × 10⁻¹	5.94 × 10⁻¹	3.26 × 10⁵	1.83 × 10⁵	7.55 × 10⁻¹	6.34 × 10³	7.24 × 10³	1.78 × 10³	6.03 × 10³	3.89 × 10⁴
	Rank	1	2	10	9	3	5	7	4	6	8
F19	Ave	1.90 × 10³	1.90 × 10³	1.91 × 10³	1.91 × 10³	1.90 × 10³	1.90 × 10³	1.90 × 10³	1.90 × 10³	1.90 × 10³	1.91 × 10³
	Std	5.49 × 10⁻²	1.78 × 10⁻¹	4.63 × 10⁰	2.64 × 10⁰	1.98 × 10⁻¹	7.61 × 10⁻¹	1.02 × 10⁰	5.08 × 10⁻¹	1.15 × 10⁰	7.90 × 10⁻¹
	Rank	1	2	9	10	3	5	6	4	7	8
F20	Ave	2.00 × 10³	2.00 × 10³	1.22 × 10⁴	1.05 × 10⁴	2.00 × 10³	4.80 × 10³	2.67 × 10³	2.11 × 10³	2.34 × 10³	1.62 × 10⁴
	Std	4.59 × 10⁻²	1.65 × 10⁻¹	7.65 × 10³	3.22 × 10³	3.20 × 10⁻¹	3.80 × 10³	8.27 × 10²	3.41 × 10¹	5.31 × 10²	1.26 × 10⁴
	Rank	1	2	9	8	3	7	6	4	5	10
F21	Ave	2.10 × 10³	2.10 × 10³	1.02 × 10⁵	2.04 × 10⁵	2.11 × 10³	8.61 × 10³	2.27 × 10³	2.26 × 10³	6.38 × 10³	1.21 × 10⁴
	Std	1.49 × 10⁻¹	5.73 × 10⁻¹	1.72 × 10⁵	1.76 × 10⁵	1.96 × 10⁰	4.53 × 10³	1.29 × 10²	9.87 × 10¹	4.71 × 10³	1.13 × 10⁴
	Rank	1	2	9	10	3	7	5	4	6	8
F22	Ave	2.20 × 10³	2.20 × 10³	2.39 × 10³	2.35 × 10³	2.21 × 10³	2.28 × 10³	2.29 × 10³	2.23 × 10³	2.28 × 10³	2.29 × 10³
	Std	6.58 × 10⁻²	6.00 × 10⁻¹	8.30 × 10¹	4.06 × 10¹	1.22 × 10⁰	5.28 × 10¹	6.40 × 10¹	2.86 × 10¹	5.90 × 10¹	3.40 × 10¹
	Rank	1	2	10	9	3	6	7	4	5	8
F23	Ave	2.50 × 10³	2.50 × 10³	2.67 × 10³	2.50 × 10³	2.50 × 10³	2.63 × 10³	2.63 × 10³	2.50 × 10³	2.63 × 10³	2.69 × 10³
	Std	0.00 × 10⁰	0.00 × 10⁰	2.48 × 10¹	0.00 × 10⁰	0.00 × 10⁰	3.56 × 10⁰	2.95 × 10⁻¹³	0.00 × 10⁰	2.77 × 10⁻⁷	1.17 × 10¹
	Rank	1	2	9	3	4	8	6	5	7	10
F24	Ave	2.50 × 10³	2.50 × 10³	2.59 × 10³	2.60 × 10³	2.52 × 10³	2.52 × 10³	2.52 × 10³	2.59 × 10³	2.53 × 10³	2.58 × 10³
	Std	2.76 × 10⁰	3.85 × 10⁰	1.62 × 10¹	8.64 × 10⁰	2.98 × 10⁰	5.84 × 10⁰	8.79 × 10⁰	2.88 × 10¹	9.68 × 10⁰	1.37 × 10¹
	Rank	1	2	8	10	4	3	5	9	6	7
F25	Ave	2.61 × 10³	2.62 × 10³	2.70 × 10³	2.70 × 10³	2.64 × 10³	2.70 × 10³	2.68 × 10³	2.70 × 10³	2.64 × 10³	2.70 × 10³
	Std	3.29 × 10⁰	3.40 × 10⁰	7.49 × 10⁰	1.10 × 10⁻²	2.25 × 10¹	1.46 × 10¹	2.73 × 10¹	0.00 × 10⁰	1.74 × 10¹	7.56 × 10⁰
	Rank	1	2	6	8	3	7	5	9	4	10
F26	Ave	2.70 × 10³	2.70 × 10³	2.70 × 10³	2.70 × 10³	2.70 × 10³	2.70 × 10³	2.70 × 10³	2.70 × 10³	2.70 × 10³	2.70 × 10³
	Std	9.37 × 10⁻³	1.53 × 10⁻²	5.68 × 10⁻¹	5.20 × 10⁻¹	2.72 × 10⁻²	3.39 × 10⁻²	6.53 × 10⁻²	7.06 × 10⁻²	4.19 × 10⁻²	1.82 × 10⁻¹
	Rank	1	2	9	10	6	3	4	5	7	8
F27	Ave	2.70 × 10³	2.70 × 10³	3.04 × 10³	2.96 × 10³	2.72 × 10³	3.05 × 10³	2.99 × 10³	2.81 × 10³	2.97 × 10³	3.14 × 10³
	Std	2.19 × 10⁻¹	3.35 × 10⁻¹	1.44 × 10²	1.33 × 10²	8.89 × 10¹	3.79 × 10¹	1.27 × 10²	1.01 × 10²	1.73 × 10²	1.29 × 10²
	Rank	1	2	8	5	3	9	7	4	6	10
F28	Ave	3.00 × 10³	3.00 × 10³	3.19 × 10³	3.27 × 10³	3.04 × 10³	3.26 × 10³	3.31 × 10³	3.00 × 10³	3.19 × 10³	3.26 × 10³
	Std	0.00 × 10⁰	0.00 × 10⁰	2.19 × 10²	1.40 × 10²	8.85 × 10¹	8.53 × 10¹	7.12 × 10¹	0.00 × 10⁰	1.08 × 10²	5.94 × 10¹
	Rank	1	2	5	9	4	7	10	3	6	8
F29	Ave	3.05 × 10³	3.10 × 10³	1.14 × 10⁵	4.50 × 10⁴	3.12 × 10³	3.68 × 10³	3.24 × 10³	3.15 × 10³	1.98 × 10⁵	4.18 × 10⁵
	Std	6.18 × 10⁰	1.85 × 10¹	1.75 × 10⁵	1.10 × 10⁵	1.22 × 10¹	5.20 × 10²	9.86 × 10¹	1.56 × 10²	5.98 × 10⁵	8.48 × 10⁵
	Rank	1	2	8	7	3	6	5	4	9	10
F30	Ave	3.45 × 10³	3.49 × 10³	8.35 × 10³	7.21 × 10³	3.64 × 10³	4.10 × 10³	4.74 × 10³	3.37 × 10³	4.34 × 10³	3.92 × 10³
	Std	4.79 × 10¹	1.63 × 10¹	3.30 × 10³	2.05 × 10³	3.27 × 10¹	7.90 × 10²	4.97 × 10²	2.72 × 10²	7.15 × 10²	2.90 × 10²
	Rank	2	3	10	9	4	6	8	1	7	5
Mean	Ranking	1.0345	2.1724	8.8966	8.6897	3.4483	5.4828	5.1379	5.3103	6.3103	8.5172
Final	Rank	1	2	10	9	3	6	4	5	7	8

Table 3. Results of the CEC-2017 test suite.

Fun	Metrics	RLNOA	NOA	SO	RSA	CPO	GWO	PSO	RLTLBO	RLMPSO	RLCGWO
F1	Ave	1.00 × 10²	1.00 × 10²	3.65 × 10⁹	9.70 × 10⁹	1.00 × 10²	1.30 × 10⁶	1.25 × 10³	2.12 × 10³	3.10 × 10³	1.95 × 10⁹
	Std	8.32 × 10⁻⁵	1.59 × 10⁻²	1.44 × 10⁹	2.68 × 10⁹	3.70 × 10⁻⁴	4.96 × 10⁶	2.07 × 10³	2.45 × 10³	3.34 × 10³	6.61 × 10⁸
	Rank	1	3	9	10	2	7	4	5	6	8
F3	Ave	3.00 × 10²	3.00 × 10²	1.14 × 10⁴	7.33 × 10³	3.00 × 10²	7.61 × 10²	3.00 × 10²	3.00 × 10²	7.22 × 10²	1.42 × 10⁴
	Std	1.32 × 10⁻¹¹	5.52 × 10⁻⁸	3.10 × 10³	2.01 × 10³	9.79 × 10⁻⁷	8.18 × 10²	4.12 × 10⁻¹⁴	3.04 × 10⁻⁹	1.65 × 10²	6.44 × 10³
	Rank	2	4	9	8	5	7	1	3	6	10
F4	Ave	4.00 × 10²	4.00 × 10²	6.44 × 10²	9.52 × 10²	4.01 × 10²	4.08 × 10²	4.01 × 10²	4.07 × 10²	4.08 × 10²	5.30 × 10²
	Std	8.57 × 10⁻⁶	7.44 × 10⁻³	1.22 × 10²	4.15 × 10²	4.06 × 10⁻¹	2.34 × 10⁰	7.61 × 10⁻¹	1.46 × 10¹	1.36 × 10¹	8.14 × 10¹
	Rank	1	2	9	10	3	6	4	5	7	8
F5	Ave	5.03 × 10²	5.04 × 10²	5.70 × 10²	5.78 × 10²	5.12 × 10²	5.14 × 10²	5.17 × 10²	5.16 × 10²	5.22 × 10²	5.61 × 10²
	Std	7.16 × 10⁻¹	1.42 × 10⁰	1.13 × 10¹	1.38 × 10¹	1.70 × 10⁰	8.36 × 10⁰	6.65 × 10⁰	7.73 × 10⁰	8.66 × 10⁰	1.07 × 10¹
	Rank	1	2	9	10	3	4	6	5	7	8
F6	Ave	6.00 × 10²	6.00 × 10²	6.36 × 10²	6.45 × 10²	6.00 × 10²	6.00 × 10²	6.01 × 10²	6.00 × 10²	6.05 × 10²	6.30 × 10²
	Std	4.16 × 10⁻¹²	4.59 × 10⁻⁸	9.52 × 10⁰	7.07 × 10⁰	1.04 × 10⁻⁶	5.11 × 10⁻¹	7.29 × 10⁻¹	7.53 × 10⁻¹	3.59 × 10⁰	7.44 × 10⁰
	Rank	1	2	9	10	3	5	6	4	7	8
F7	Ave	7.14 × 10²	7.15 × 10²	8.16 × 10²	8.01 × 10²	7.23 × 10²	7.26 × 10²	7.21 × 10²	7.31 × 10²	7.44 × 10²	8.58 × 10²
	Std	6.13 × 10⁻¹	1.58 × 10⁰	1.82 × 10¹	1.26 × 10¹	2.83 × 10⁰	9.37 × 10⁰	6.15 × 10⁰	9.14 × 10⁰	1.08 × 10¹	4.68 × 10¹
	Rank	1	2	9	8	4	5	3	6	7	10
F8	Ave	8.02 × 10²	8.04 × 10²	8.58 × 10²	8.51 × 10²	8.11 × 10²	8.10 × 10²	8.13 × 10²	8.16 × 10²	8.23 × 10²	8.78 × 10²
	Std	7.86 × 10⁻¹	9.54 × 10⁻¹	9.68 × 10⁰	7.96 × 10⁰	2.33 × 10⁰	5.53 × 10⁰	6.70 × 10⁰	5.15 × 10⁰	9.85 × 10⁰	1.21 × 10¹
	Rank	1	2	9	8	4	3	5	6	7	10
F9	Ave	9.00 × 10²	9.00 × 10²	1.46 × 10³	1.46 × 10³	9.00 × 10²	9.06 × 10²	9.00 × 10²	9.02 × 10²	9.08 × 10²	2.40 × 10³
	Std	0.00 × 10⁰	2.61 × 10⁻¹⁴	2.65 × 10²	2.16 × 10²	0.00 × 10⁰	1.43 × 10¹	4.52 × 10⁻¹⁴	1.48 × 10⁰	7.89 × 10⁰	5.80 × 10²
	Rank	1	2	8	9	3	6	4	5	7	10
F10	Ave	1.04 × 10³	1.19 × 10³	2.89 × 10³	2.50 × 10³	1.55 × 10³	1.58 × 10³	1.64 × 10³	1.47 × 10³	1.80 × 10³	2.11 × 10³
	Std	1.95 × 10¹	8.33 × 10¹	2.00 × 10²	1.85 × 10²	1.44 × 10²	3.29 × 10²	2.47 × 10²	3.29 × 10²	2.30 × 10²	3.47 × 10²
	Rank	1	2	10	9	4	5	6	3	7	8
F11	Ave	1.10 × 10³	1.10 × 10³	6.78 × 10³	4.83 × 10³	1.10 × 10³	1.12 × 10³	1.11 × 10³	1.11 × 10³	1.72 × 10³	9.36 × 10⁴
	Std	2.58 × 10⁻¹	1.11 × 10⁰	1.02 × 10⁴	2.47 × 10³	4.60 × 10⁻¹	1.74 × 10¹	6.89 × 10⁰	9.19 × 10⁰	4.22 × 10²	1.38 × 10⁵
	Rank	1	2	9	8	3	6	4	5	7	10
F12	Ave	1.24 × 10³	1.36 × 10³	7.05 × 10⁷	3.70 × 10⁸	1.55 × 10³	6.79 × 10⁵	1.19 × 10⁴	1.16 × 10⁴	2.81 × 10⁵	4.66 × 10⁷
	Std	1.38 × 10¹	4.92 × 10¹	6.17 × 10⁷	4.42 × 10⁸	8.08 × 10¹	9.50 × 10⁵	9.58 × 10³	7.96 × 10³	1.10 × 10⁶	4.11 × 10⁷
	Rank	1	2	9	10	3	7	5	4	6	8
F13	Ave	1.30 × 10³	1.31 × 10³	2.15 × 10⁵	2.03 × 10⁷	1.31 × 10³	9.55 × 10³	7.69 × 10³	3.28 × 10³	1.25 × 10⁴	3.91 × 10⁴
	Std	7.95 × 10⁻¹	1.35 × 10⁰	2.41 × 10⁵	2.30 × 10⁷	2.99 × 10⁰	4.31 × 10³	5.51 × 10³	1.77 × 10³	8.14 × 10³	2.96 × 10⁴
	Rank	1	2	9	10	3	6	5	4	7	8
F14	Ave	1.40 × 10³	1.40 × 10³	8.12 × 10³	4.26 × 10³	1.41 × 10³	2.63 × 10³	1.46 × 10³	1.43 × 10³	1.53 × 10³	1.98 × 10³
	Std	1.03 × 10⁻¹	7.23 × 10⁻¹	1.11 × 10⁴	2.20 × 10³	1.82 × 10⁰	1.65 × 10³	3.58 × 10¹	1.10 × 10¹	2.99 × 10¹	6.48 × 10²
	Rank	1	2	10	9	3	8	5	4	6	7
F15	Ave	1.50 × 10³	1.50 × 10³	1.03 × 10⁴	8.90 × 10³	1.50 × 10³	3.18 × 10³	1.57 × 10³	1.55 × 10³	2.04 × 10³	5.38 × 10³
	Std	4.70 × 10⁻²	2.08 × 10⁻¹	4.77 × 10³	5.49 × 10³	2.71 × 10⁻¹	1.66 × 10³	4.52 × 10¹	3.08 × 10¹	3.62 × 10²	4.70 × 10³
	Rank	1	2	10	9	3	7	5	4	6	8
F16	Ave	1.60 × 10³	1.60 × 10³	2.04 × 10³	2.09 × 10³	1.60 × 10³	1.72 × 10³	1.83 × 10³	1.64 × 10³	1.73 × 10³	1.75 × 10³
	Std	1.11 × 10⁻¹	3.35 × 10⁻¹	1.11 × 10²	1.29 × 10²	3.50 × 10⁻¹	1.19 × 10²	1.20 × 10²	6.58 × 10¹	1.24 × 10²	6.74 × 10¹
	Rank	1	2	9	10	3	5	8	4	6	7
F17	Ave	1.70 × 10³	1.70 × 10³	1.86 × 10³	1.82 × 10³	1.71 × 10³	1.74 × 10³	1.75 × 10³	1.74 × 10³	1.76 × 10³	1.86 × 10³
	Std	3.84 × 10⁻¹	1.65 × 10⁰	4.44 × 10¹	2.78 × 10¹	3.25 × 10⁰	1.90 × 10¹	2.26 × 10¹	1.22 × 10¹	3.39 × 10¹	7.27 × 10¹
	Rank	1	2	10	8	3	5	6	4	7	9
F18	Ave	1.80 × 10³	1.80 × 10³	4.46 × 10⁶	1.53 × 10⁷	1.80 × 10³	2.64 × 10⁴	4.97 × 10³	4.05 × 10³	2.02 × 10⁴	5.60 × 10⁴
	Std	1.09 × 10⁻¹	5.46 × 10⁻¹	6.83 × 10⁶	3.52 × 10⁷	1.12 × 10⁰	1.63 × 10⁴	4.20 × 10³	1.92 × 10³	1.66 × 10⁴	1.81 × 10⁴
	Rank	1	2	9	10	3	7	5	4	6	8
F19	Ave	1.90 × 10³	1.90 × 10³	2.85 × 10⁴	4.99 × 10⁵	1.90 × 10³	6.71 × 10³	2.19 × 10³	1.94 × 10³	2.53 × 10³	2.61 × 10⁴
	Std	3.39 × 10⁻²	9.22 × 10⁻²	3.37 × 10⁴	5.34 × 10⁵	2.28 × 10⁻¹	5.57 × 10³	6.10 × 10²	2.65 × 10¹	1.16 × 10³	1.40 × 10⁴
	Rank	1	2	9	10	3	7	5	4	6	8
F20	Ave	2.00 × 10³	2.00 × 10³	2.18 × 10³	2.22 × 10³	2.00 × 10³	2.05 × 10³	2.07 × 10³	2.03 × 10³	2.10 × 10³	2.19 × 10³
	Std	2.20 × 10⁻¹²	2.30 × 10⁻¹	6.39 × 10¹	3.92 × 10¹	1.30 × 10⁻¹	3.26 × 10¹	6.49 × 10¹	1.34 × 10¹	6.33 × 10¹	6.47 × 10¹
	Rank	1	2	8	10	3	5	6	4	7	9
F21	Ave	2.20 × 10³	2.20 × 10³	2.35 × 10³	2.29 × 10³	2.21 × 10³	2.30 × 10³	2.29 × 10³	2.25 × 10³	2.26 × 10³	2.34 × 10³
	Std	5.94 × 10⁻⁹	1.38 × 10⁻²	3.05 × 10¹	5.86 × 10¹	2.49 × 10¹	4.07 × 10¹	5.33 × 10¹	5.36 × 10¹	6.20 × 10¹	4.49 × 10¹
	Rank	1	2	10	7	3	8	6	4	5	9
F22	Ave	2.25 × 10³	2.21 × 10³	2.61 × 10³	2.87 × 10³	2.30 × 10³	2.35 × 10³	2.30 × 10³	2.30 × 10³	2.30 × 10³	2.50 × 10³
	Std	5.13 × 10¹	2.39 × 10¹	1.68 × 10²	2.02 × 10²	5.58 × 10⁻¹	1.81 × 10²	2.02 × 10¹	1.12 × 10⁰	1.83 × 10¹	9.39 × 10¹
	Rank	2	1	9	10	4	7	3	6	5	8
F23	Ave	2.60 × 10³	2.61 × 10³	2.69 × 10³	2.69 × 10³	2.61 × 10³	2.62 × 10³	2.62 × 10³	2.62 × 10³	2.62 × 10³	2.64 × 10³
	Std	1.08 × 10⁰	1.13 × 10⁰	2.25 × 10¹	9.98 × 10⁰	3.18 × 10⁰	8.09 × 10⁰	1.31 × 10¹	6.32 × 10⁰	6.65 × 10⁰	6.20 × 10⁰
	Rank	1	2	9	10	3	4	6	5	7	8
F24	Ave	2.50 × 10³	2.50 × 10³	2.82 × 10³	2.85 × 10³	2.62 × 10³	2.74 × 10³	2.72 × 10³	2.73 × 10³	2.72 × 10³	2.78 × 10³
	Std	9.21 × 10⁻¹³	3.92 × 10⁻⁸	3.07 × 10¹	5.95 × 10¹	1.22 × 10²	1.11 × 10¹	7.72 × 10¹	5.34 × 10¹	9.35 × 10¹	6.29 × 10⁰
	Rank	1	2	9	10	3	7	5	6	4	8
F25	Ave	2.90 × 10³	2.88 × 10³	3.11 × 10³	3.30 × 10³	2.90 × 10³	2.93 × 10³	2.92 × 10³	2.92 × 10³	2.92 × 10³	3.04 × 10³
	Std	9.33 × 10⁻¹³	6.65 × 10¹	1.04 × 10²	1.13 × 10²	1.66 × 10¹	1.60 × 10¹	3.27 × 10¹	2.39 × 10¹	2.47 × 10¹	3.27 × 10¹
	Rank	2	1	9	10	3	7	4	6	5	8
F26	Ave	2.81 × 10³	2.83 × 10³	3.66 × 10³	4.01 × 10³	2.90 × 10³	3.02 × 10³	2.88 × 10³	2.99 × 10³	2.96 × 10³	3.29 × 10³
	Std	1.30 × 10²	1.08 × 10²	2.85 × 10²	2.96 × 10²	2.24 × 10¹	3.03 × 10²	7.36 × 10¹	7.85 × 10¹	2.28 × 10²	4.75 × 10²
	Rank	1	2	9	10	4	7	3	6	5	8
F27	Ave	3.09 × 10³	3.09 × 10³	3.16 × 10³	3.19 × 10³	3.09 × 10³	3.09 × 10³	3.10 × 10³	3.10 × 10³	3.10 × 10³	3.10 × 10³
	Std	9.42 × 10⁻¹	8.96 × 10⁻¹	2.43 × 10¹	5.12 × 10¹	1.51 × 10⁰	3.54 × 10⁰	1.67 × 10¹	1.20 × 10¹	1.52 × 10¹	1.10 × 10¹
	Rank	1	2	9	10	3	4	8	6	5	7
F28	Ave	3.10 × 10³	3.10 × 10³	3.53 × 10³	3.73 × 10³	3.10 × 10³	3.33 × 10³	3.22 × 10³	3.20 × 10³	3.30 × 10³	3.33 × 10³
	Std	9.46 × 10⁻¹²	5.12 × 10⁻⁵	1.13 × 10²	9.77 × 10¹	7.76 × 10⁻⁹	1.00 × 10²	1.41 × 10²	1.00 × 10²	1.75 × 10²	7.95 × 10¹
	Rank	1	3	9	10	2	8	5	4	6	7
F29	Ave	3.14 × 10³	3.15 × 10³	3.39 × 10³	3.33 × 10³	3.17 × 10³	3.18 × 10³	3.21 × 10³	3.17 × 10³	3.21 × 10³	3.25 × 10³
	Std	1.75 × 10⁰	7.83 × 10⁰	7.79 × 10¹	7.80 × 10¹	6.72 × 10⁰	3.31 × 10¹	5.19 × 10¹	1.92 × 10¹	4.20 × 10¹	6.61 × 10¹
	Rank	1	2	10	9	3	5	6	4	7	8
F30	Ave	3.41 × 10³	3.51 × 10³	1.35 × 10⁷	3.61 × 10⁶	3.69 × 10³	4.29 × 10⁵	2.96 × 10⁵	1.00 × 10⁵	2.73 × 10⁵	4.49 × 10⁵
	Std	3.52 × 10⁰	5.74 × 10¹	1.27 × 10⁷	4.38 × 10⁶	1.57 × 10²	7.89 × 10⁵	4.63 × 10⁵	2.25 × 10⁵	4.75 × 10⁵	4.78 × 10⁵
	Rank	1	2	10	9	3	7	6	4	5	8
Mean	Ranking	1.1071	2.0714	9.1429	9.3571	3.1786	6.0000	4.9643	4.6429	6.2143	8.3214
Final	Rank	1	2	9	10	3	6	5	4	7	8

Table 4. Results of the CEC-2020 test suite.

Fun	Metrics	RLNOA	NOA	SO	RSA	CPO	GWO	PSO	RLTLBO	RLMPSO	RLCGWO
F1	Ave	1.00 × 10²	1.44 × 10²	1.70 × 10¹⁰	2.91 × 10¹⁰	1.04 × 10²	1.40 × 10⁸	3.58 × 10³	3.41 × 10³	6.56 × 10³	1.57 × 10¹⁰
	Std	2.14 × 10⁻¹	3.57 × 10¹	5.57 × 10⁹	4.45 × 10⁹	3.33 × 10⁰	3.75 × 10⁸	3.13 × 10³	3.52 × 10³	4.52 × 10³	2.63 × 10⁹
	Rank	1	3	9	10	2	7	5	4	6	8
F2	Ave	1.36 × 10³	1.96 × 10³	5.71 × 10³	5.53 × 10³	2.70 × 10³	2.54 × 10³	2.45 × 10³	2.61 × 10³	3.49 × 10³	5.53 × 10³
	Std	1.14 × 10²	1.62 × 10²	3.81 × 10²	2.06 × 10²	2.39 × 10²	5.79 × 10²	3.68 × 10²	5.94 × 10²	6.67 × 10²	4.30 × 10²
	Rank	1	2	10	9	6	4	3	5	7	8
F3	Ave	7.31 × 10²	7.42 × 10²	1.04 × 10³	1.01 × 10³	7.63 × 10²	7.67 × 10²	7.53 × 10²	7.99 × 10²	8.26 × 10²	1.74 × 10³
	Std	3.04 × 10⁰	1.01 × 10¹	3.51 × 10¹	2.94 × 10¹	4.60 × 10⁰	2.22 × 10¹	1.03 × 10¹	3.05 × 10¹	3.73 × 10¹	1.90 × 10²
	Rank	1	2	9	8	4	5	3	6	7	10
F4	Ave	1.90 × 10³	1.90 × 10³	1.54 × 10⁵	3.54 × 10⁵	1.91 × 10³	1.91 × 10³	1.90 × 10³	1.92 × 10³	1.91 × 10³	3.63 × 10⁴
	Std	3.02 × 10⁻¹	4.32 × 10⁻¹	1.13 × 10⁵	1.81 × 10⁵	6.18 × 10⁻¹	3.06 × 10⁰	6.75 × 10⁻¹	1.04 × 10¹	3.31 × 10⁰	2.18 × 10⁴
	Rank	2	3	9	10	4	5	1	7	6	8
F5	Ave	2.12 × 10³	2.40 × 10³	3.16 × 10⁶	5.01 × 10⁶	3.02 × 10³	3.95 × 10⁵	8.93 × 10⁴	6.03 × 10⁴	2.74 × 10⁵	2.75 × 10⁶
	Std	5.42 × 10¹	1.28 × 10²	1.41 × 10⁶	2.14 × 10⁶	2.16 × 10²	6.93 × 10⁵	5.25 × 10⁴	3.58 × 10⁴	1.55 × 10⁵	2.52 × 10⁶
	Rank	1	2	9	10	3	7	5	4	6	8
F6	Ave	1.60 × 10³	1.60 × 10³	2.88 × 10³	3.12 × 10³	1.61 × 10³	1.86 × 10³	1.88 × 10³	1.75 × 10³	1.94 × 10³	2.38 × 10³
	Std	1.82 × 10⁻¹	4.85 × 10⁻¹	3.61 × 10²	4.69 × 10²	4.30 × 10⁰	1.49 × 10²	1.58 × 10²	1.10 × 10²	1.58 × 10²	2.12 × 10²
	Rank	1	2	9	10	3	5	6	4	7	8
F7	Ave	2.25 × 10³	2.39 × 10³	1.33 × 10⁶	3.21 × 10⁶	2.69 × 10³	1.36 × 10⁵	5.24 × 10⁴	1.85 × 10⁴	9.30 × 10⁴	1.04 × 10⁶
	Std	3.46 × 10¹	8.91 × 10¹	1.32 × 10⁶	3.63 × 10⁶	1.06 × 10²	8.51 × 10⁴	7.68 × 10⁴	1.91 × 10⁴	7.48 × 10⁴	6.58 × 10⁵
	Rank	1	2	9	10	3	7	5	4	6	8
F8	Ave	2.30 × 10³	2.30 × 10³	4.78 × 10³	5.30 × 10³	2.30 × 10³	2.71 × 10³	2.40 × 10³	2.30 × 10³	2.30 × 10³	5.47 × 10³
	Std	1.43 × 10¹	1.44 × 10⁻⁴	9.68 × 10²	7.07 × 10²	4.04 × 10⁻⁶	7.46 × 10²	4.47 × 10²	5.71 × 10⁰	1.23 × 10⁰	1.08 × 10³
	Rank	1	3	8	9	2	7	6	5	4	10
F9	Ave	2.81 × 10³	2.81 × 10³	3.13 × 10³	3.19 × 10³	2.86 × 10³	2.85 × 10³	2.86 × 10³	2.86 × 10³	2.88 × 10³	2.94 × 10³
	Std	4.14 × 10⁰	6.80 × 10¹	6.79 × 10¹	1.85 × 10²	1.01 × 10¹	3.68 × 10¹	3.03 × 10¹	1.97 × 10¹	3.45 × 10¹	1.33 × 10¹
	Rank	2	1	9	10	6	3	4	5	7	8
F10	Ave	2.91 × 10³	2.91 × 10³	4.26 × 10³	4.82 × 10³	2.93 × 10³	2.95 × 10³	2.93 × 10³	2.98 × 10³	2.95 × 10³	4.15 × 10³
	Std	4.49 × 10⁰	3.87 × 10⁻²	5.13 × 10²	7.40 × 10²	2.54 × 10¹	3.01 × 10¹	3.04 × 10¹	3.81 × 10¹	3.41 × 10¹	4.77 × 10²
	Rank	1	2	9	10	4	5	3	7	6	8
Mean	Ranking	1.2222	2.2222	9.0000	9.5556	3.6667	5.5556	4.2222	4.8889	6.2222	8.4444
Final	Rank	1	2	9	10	3	6	4	5	7	8

Table 5. Results of the RLNOA with different values for the parameter

P_{r p}

.

Table 5. Results of the RLNOA with different values for the parameter

P_{r p}

.

$P_{r p}$	Fun
$P_{r p}$	F1	F4	F17	F23
0.8	1.00 × 10²	4.00 × 10²	1.72 × 10³	2.50 × 10³
0.6	1.00 × 10²	4.00 × 10²	1.72 × 10³	2.50 × 10³
0.4	1.00 × 10²	4.00 × 10²	1.72 × 10³	2.50 × 10³
0.2	1.00 × 10²	4.00 × 10²	1.71 × 10³	2.50 × 10³

Table 6. Results of the RLNOA with different values for the parameter

δ

.

Table 6. Results of the RLNOA with different values for the parameter

δ

.

$δ$	Fun
$δ$	F1	F4	F17	F23
0.5	1.00 × 10²	4.00 × 10²	1.71 × 10³	2.50 × 10³
0.2	1.00 × 10²	4.00 × 10²	1.71 × 10³	2.50 × 10³
0.1	1.00 × 10²	4.00 × 10²	1.71 × 10³	2.50 × 10³
0.05	1.00 × 10²	4.00 × 10²	1.71 × 10³	2.50 × 10³

Table 7. Results of the RLNOA with different values for the parameter

k

.

Table 7. Results of the RLNOA with different values for the parameter

k

.

$k$	Fun
$k$	F1	F4	F17	F23
5	1.00 × 10²	4.00 × 10²	1.71 × 10³	2.50 × 10³
10	1.00 × 10²	4.00 × 10²	1.71 × 10³	2.50 × 10³
20	1.00 × 10²	4.00 × 10²	1.71 × 10³	2.50 × 10³
50	1.00 × 10²	4.00 × 10²	1.71 × 10³	2.50 × 10³

Table 8. Results of the RLNOA with different values for the parameter

ζ

.

Table 8. Results of the RLNOA with different values for the parameter

ζ

.

$ζ$	Fun
$ζ$	F1	F4	F17	F23
1	1.00 × 10²	4.00 × 10²	1.71 × 10³	2.50 × 10³
2	1.01 × 10²	4.00 × 10²	1.72 × 10³	2.50 × 10³
4	1.04 × 10²	4.00 × 10²	1.72 × 10³	2.50 × 10³
8	1.17 × 10²	4.00 × 10²	1.72 × 10³	2.50 × 10³

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Li, Y.; Zhang, Y. A Reinforcement Learning-Based Bi-Population Nutcracker Optimizer for Global Optimization. Biomimetics 2024, 9, 596. https://doi.org/10.3390/biomimetics9100596

AMA Style

Li Y, Zhang Y. A Reinforcement Learning-Based Bi-Population Nutcracker Optimizer for Global Optimization. Biomimetics. 2024; 9(10):596. https://doi.org/10.3390/biomimetics9100596

Chicago/Turabian Style

Li, Yu, and Yan Zhang. 2024. "A Reinforcement Learning-Based Bi-Population Nutcracker Optimizer for Global Optimization" Biomimetics 9, no. 10: 596. https://doi.org/10.3390/biomimetics9100596

APA Style

Li, Y., & Zhang, Y. (2024). A Reinforcement Learning-Based Bi-Population Nutcracker Optimizer for Global Optimization. Biomimetics, 9(10), 596. https://doi.org/10.3390/biomimetics9100596

Article Menu

A Reinforcement Learning-Based Bi-Population Nutcracker Optimizer for Global Optimization

Abstract

1. Introduction

2. Preliminaries

2.1. Nutcracker Optimization Algorithm

2.1.1. Foraging and Storage Strategies

2.1.2. Cache Search and Recovery Strategies

2.2. Reinforcement Learning

3. The Development of the Proposed Algorithm

3.1. Overview

3.2. ROBL-Based Foraging Strategy

3.3. Q-Learning-Based Exploitation Behavior

3.3.1. State Encoding

3.3.2. Action Options

3.3.3. Reward Options

3.4. Time Complexity

4. Experimental Results

4.1. Test Conditions

4.2. Comparison over CEC-2014

4.3. Comparison over CEC-2017

4.4. Comparison over CEC-2020

4.5. Analysis of the Q-Table

4.6. Analysis of the RLNOA’s Parameters

5. Conclusions

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Conflicts of Interest

Appendix A

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI