Ensembles of Biologically Inspired Optimization Algorithms for Training Multilayer Perceptron Neural Networks

Floria, Sabina-Adriana; Gavrilescu, Marius; Leon, Florin; Curteanu, Silvia

doi:10.3390/app12199997

by Sabina-Adriana Floria ¹, Marius Gavrilescu ¹, Florin Leon ¹,* and Silvia Curteanu ²

¹ Department of Computer Science and Engineering, Faculty of Automatic Control and Computer Engineering, “Gheorghe Asachi” Technical University of Iasi, Bd. Mangeron, No. 27, 700050 Iasi, Romania

² Department of Chemical Engineering, Faculty of Chemical Engineering and Environmental Protection, “Gheorghe Asachi” Technical University of Iasi, Bd. Mangeron, No. 73, 700050 Iasi, Romania

* Author to whom correspondence should be addressed.

Appl. Sci. 2022, 12(19), 9997; https://doi.org/10.3390/app12199997

Submission received: 11 September 2022 / Revised: 29 September 2022 / Accepted: 3 October 2022 / Published: 5 October 2022

Abstract:

Artificial neural networks have proven to be effective in a wide range of fields, providing solutions to various problems. Training artificial neural networks using evolutionary algorithms is known as neuroevolution. The idea of finding not only the optimal weights and biases of a neural network but also its architecture has drawn the attention of many researchers. In this paper, we use different biologically inspired optimization algorithms to train multilayer perceptron neural networks for generating regression models. Specifically, our contribution involves analyzing and finding a strategy for combining several algorithms into a hybrid ensemble optimizer, which we apply for the optimization of a fully connected neural network. The goal is to obtain good regression models for studying and making predictions for the process of free radical polymerization of methyl methacrylate (MMA). In the first step, we use a search procedure to find the best parameter values for seven biologically inspired optimization algorithms. In the second step, we use a subset of the best-performing algorithms and improve the search capability by combining the chosen algorithms into an ensemble of optimizers. We propose three ensemble strategies that do not involve changes in the logic of optimization algorithms: hybrid cascade, hybrid single elite solution, and hybrid multiple elite solutions. The proposed strategies inherit the advantages of each individual optimizer and have faster convergence at a computational effort very similar to an individual optimizer. Our experimental results show that the hybrid multiple elite strategy ultimately produces neural networks which constitute the most dependable regression models for the aforementioned process.

Keywords:

ensembles; neural networks; optimization algorithms; neuroevolution

1. Introduction

Neuroevolution consists of using evolutionary algorithms in training artificial neural networks. Unlike traditional, gradient-based training methods, neuroevolution can optimize the parameters of a neural network, i.e., weights and biases, and its hyperparameters, i.e., the number of hidden layers, activation functions, learning rate, etc. Neuroevolution is also suitable for both supervised and reinforcement learning applications.

In this paper, feed-forward neural networks are trained using biologically inspired optimization algorithms analyzed in previous work [1], and we propose different strategies for combining these algorithms. According to the No Free Lunch (NFL) theorem [2], there is no algorithm that can provide superior performance to all other techniques in solving all optimization problems [3,4]. Motivated by this theorem, we propose using different algorithms to learn the architecture, weights, and biases of a multilayer perceptron (MLP), in order to generate suitable regression models for the process of the free radical polymerization of methyl methacrylate (MMA).

Polymerization processes are notoriously difficult to describe with regression models that characterize them properly and allow reliable predictions. Aside from the complexity of the underlying reactions, the phenomenology behind these processes is often not fully understood. As such, approximations of the actual phenomena have to be made when analyzing them, which adversely affects the accuracy and convergence of most conventional regression methods. Furthermore, the related mathematical models are themselves highly complex, which makes them difficult to solve, demands considerable computational resources, and renders them unusable in online control and optimization scenarios. Under such circumstances, empirical modeling is often the preferred approach.

In this context, the free radical polymerization of MMA is among the chemical processes that are more difficult to model, as it consists of complex reactions that hinder the construction of a phenomenological model based on mass and energy balances. In addition to the multitude of elementary reactions and species, a volume contraction also takes place during the process. Moreover, the significant increase in viscosity from a certain point in the reaction causes a decrease in the diffusion rate of the polymer and monomer molecules. In free radical polymerization, such diffusional aspects (gel and glass effects) must be quantified by relations that describe the variation of the propagation and termination rate constants with conversion. This part is difficult to model, especially because not all aspects of these phenomena have been completely explained.

In order to overcome the difficulties related to the modeling of the controlled diffusion phenomena, neural networks prove to be suitable tools for modeling, provided that they are developed in a near-optimal version for both structure and parameters.

The optimization goal is to find MLPs with the best prediction performance, i.e., the minimization of prediction error, on the MMA data set (conversion and molecular masses depending on reaction conditions). The optimization algorithms we use are the Football Game Algorithm (FGA) [5], Imperialist Competitive Algorithm (ICA) [6], Simple Human Learning Optimization (SHLO) [7], Social Learning Optimization (SLO) [8], Teaching-Learning-Based Optimization (TLBO) [9], Viral System (VS) [10], and Virulence Optimization Algorithm (VOA) [11].

The main contribution of the paper is finding a neural network optimizer that generates optimal neural network-based regression models for a proper representation of the MMA process. Our studies show that conventional regression models fail to properly achieve this task. Consequently, we propose a multi-step process that ultimately analyses and tests several optimization algorithms and combines them into ensembles of optimizers using three proposed strategies: hybrid cascade, hybrid single elite solution, and hybrid multiple elite solutions. We also use an initial search procedure to determine the best parameter values for each algorithm considered in this paper. Based on the result provided by the search procedure, we identify the three best-performing algorithms and use them in each proposed ensemble strategy. Each individual algorithm is run for a longer period, while the algorithms in an ensemble are run for a significantly shorter time. For each simulation scenario, we collect performance statistics at specific iterations to perform a fair comparison between strong individual optimizers and the proposed ensembles of weak optimizers. As we demonstrate in Section 4, the most suitable model for our problem results from a combination of three biologically inspired optimizers using the hybrid multiple elite solutions strategy. To our knowledge, this is the first instance when multiple biologically inspired optimization algorithms have been combined in such a manner, for the study of complex chemical processes such as MMA.

The paper is structured as follows: Section 2 presents a brief review of some of the more notable results from the related literature; in Section 3 we provide a detailed description of our method: the dataset containing the experimental values pertaining to our problem, the encoding approach for transforming the neural network parameters into a format usable by an optimization algorithm, as well as the strategies for tuning, selecting and combining several algorithms into ensembles; Section 4 presents our experimental results, where we compare the best-performing algorithms with their various ensemble arrangements so as to find the approach that provides the best regression models for our problem; the paper ends with a Conclusions Section where we discuss our contribution and findings and point out the main directions for future work.

2. Related Work

In recent years, neuroevolution has received particular attention, and numerous works present successful applications in different fields, such as chemistry [12,13,14], medicine [15,16,17], and games [18,19]. However, one cannot identify a single evolutionary optimization technique that generally leads to the best results. Therefore, various algorithms for training neural networks are proposed in the literature [20,21,22,23,24,25]. For example, in [26], a new implementation for the Clonal Selection Algorithm (CSA) is proposed, which is used in training MLP neural networks. To significantly increase the classification accuracy of MLPs, CSA is used to find the optimal weights and biases. The proposed approach is compared with other training methods on five data sets, and the obtained results show that the new approach is a competitive method for training MLPs. A new learning strategy based on neuroevolution for designing and training optical neural networks (ONNs) is proposed in [27]. The authors use Genetic Algorithms (GA) and Particle Swarm Optimization (PSO) algorithms to determine the hyperparameters of ONNs and optimize the connection weights. Experimental results show that the proposed strategy is competitive with traditional learning algorithms such as Stochastic Gradient Descent (SGD) and Adjoint Variable Method (AVM). In [28], the problem of remaining useful life (RUL) prediction using Spiking Neural P (SN P) systems is addressed. The authors use the Neuro-Evolution of Augmenting Topologies (NEAT) algorithm to optimize the structure and parameters of SN P systems. The results show that the proposed approach provides a reasonable trade-off between performance and the number of trainable parameters. Additionally, in [29], the Moth–Flame Optimization (MFO) algorithm is used to train MLP networks. An autonomous navigation robot data set is used, and MFO is used as an optimizer to find the optimal weights and biases. The obtained results show the exploration and exploitation capabilities of MFO in comparison with other methods.

However, in the context of a prediction problem, it is known that ensemble-based approaches tend to have better accuracy, efficiency, and flexibility than approaches using a single classifier [30]. Due to these advantages, the use of ensembles has been addressed in the field of evolutionary algorithms.

A new neuroevolutionary model with quantum inspiration, called NEVE (Neuroevolutionary Ensemble), is proposed in [31] and is based on an ensemble of MLP neural networks to learn in nonstationary environments (when data distribution changes over time). Each neural network in NEVE is trained and has its parameters optimized by the quantum-inspired evolutionary algorithm with binary-real representation (QIEA-BR). The authors propose four variations of the NEVE algorithm that are evaluated on both real and synthetic data. The obtained results confirm that the neuroevolutionary ensemble approach is a suitable choice for those problems whose data sets are subject to sudden changes in behavior.

The idea of multiple subpopulations and bagging ensemble is used in [32] to generate new offspring in the multi-objective differential evolution (MODE) algorithm. In the proposed BagMPMODE algorithm, each subpopulation is regarded as a bootstrapped population, and the evolution process of each subpopulation is regarded as a base learner. The idea of cooperation between subpopulations is introduced by randomly sampling a solution from each subpopulation and generating new offspring. Depending on the quality of each subpopulation, specific weights are determined and used in the offspring generation procedure. A randomly selected parent from a better subpopulation has a larger contribution to the genes of the new offspring. Finally, the generated offspring replaces the weakest solution from a randomly selected subpopulation. The authors compare the efficiency of the BagMPMODE algorithm with the version of the algorithm where the bagging-based search is not adopted (MPMODE). Experimental results show that BagMPMODE significantly improves the search efficiency on 20 out of 22 multi-optimization problems compared to MPMODE.

Ensemble learning (EL) based on AdaBoost is adopted in [33] for a dynamic multi-objective optimization evolutionary algorithm (DMOEA). Multiple base models are used to predict new populations using a shared population, and the weight associated with each base model is determined based on the prediction error. A strong model is defined based on these weights as follows: a base model that has a higher weight has a higher chance of being incorporated into the strong model. At the end of an iteration, the strong model generates a new, improved population used in the next iteration. The proposed EL-DMOEA algorithm combines different strategies and benefits from improved convergence.

Different strategies for hybridizing a Genetic Algorithm (GA) with a Genetic Programming (GP) algorithm are proposed in [34]. The population of GP is regarded as a pool of base classifiers (i.e., arithmetic trees) that are improved during the GP search. However, at different iterations of GP, the authors choose to sample the current population of GP to create multiple subpopulations of a given size. Each subpopulation in GP is regarded as an ensemble of base classifiers and is coded as a single chromosome in GA. The GA search procedure aims to find an ensemble with the best combination of base classifiers. Experimental results show that the proposed hybridization approach and its different strategies provide ensembles of classifiers that significantly outperform the standard GP. It is also interesting that the authors observe degraded performance when ensembles contain a larger number of classifiers (greater than seven), possibly due to increased ensemble complexity.

Four individual niching genetic algorithms are used in [35] to form an ensemble. The authors choose two instantiations of the restricted tournament selection (RTS) and two of the clearing (CLR) algorithms as niching algorithms. The basic principle is to use four parallel populations, where a particular niching algorithm owns a population. Although a specific algorithm handles a population, a collaborative strategy between populations is achieved using a shared pool of newly created offspring from each niching algorithm. The parallel populations are iteratively evolved until a maximum number of function evaluations is reached. Experimental results show that the proposed ensemble scheme locates more optima than any of the individual niching algorithms in most cases.

In [36], an approach that adapts evolution strategies for evolving an ensemble model is presented. A subset of sensor data represents the input data of each model in the ensemble, and neuroevolution is used to optimize the architecture and hyperparameters of each model. Other studies also show the advantages of using ensemble learning with evolutionary computation [37,38,39].

3. Materials and Methods

Our approach consists of a multi-step process that involves the analysis, tuning, selection, and combination of multiple biologically inspired optimization algorithms. The overall method is depicted in Figure 1. The chemical process considered in this paper is influenced by three inputs. The experimental data are processed within a pipeline involving the following steps:

Analysis of several popular algorithms in terms of their usefulness. This involves a hyperparameter search in order to find the best versions of the algorithms (i.e., the parameter values which result in the lowest RMSE);
Selection of the best algorithms, out of previously tested ones. Out of all algorithms, only the top few are further used;
Incorporation of the selected algorithms into various hybrid ensemble strategies, the purpose being to find the strategy which ultimately leads to an optimal neural network architecture, with the potential to provide meaningful predictions for the three outputs of our studied process.

3.1. Data Set

The polymerization process is approached by modeling the conversion and numerical and gravimetrical average molecular masses (three outputs) depending on the reaction conditions: time, initiator concentration, and temperature (three inputs). The data set consists of 3217 samples split into 75% for training and 25% for testing.

Other methodologies have also been tested on this process by our research group. The first series of attempts [40,41] involved the design of feedforward neural networks, with satisfactory results for conversion but unacceptable results for molecular weights. A more complex approach, which led to better results [42], combined a simplified phenomenological model with neural networks to obtain hybrid models. Several modeling modalities were considered, in which neural networks replaced different parts of the model, generally the parts that are difficult to model due to diffusion-controlled phenomena (gel and glass effects). The results were much better than those of models represented by single neural networks, but still not very satisfactory for the gravimetrical molecular weight. Another attempt [43] was based on different regression methods: the Large Margin Nearest Neighbor Regression algorithm, trained either with an evolutionary algorithm or by gradient descent, and Nearest Neighbor Regression with Adaptive Distance Metrics Trained by Multiple Point Hill Climbing on Noisy Training Set Error. Acceptable results were obtained with these methods, but there is still room for improvement.

The main goal of the present approach is to test a series of algorithms, combined with different optimization strategies, in order to obtain a near-optimal artificial neural network, thus developing an efficient methodology that can be easily and successfully adapted to other processes (models). MMA polymerization, a complex real-world process, is a suitable choice for this purpose because it represents a difficult test for the optimization algorithms.

3.2. Neural Network Modeling

This section describes the use of biologically inspired optimization algorithms for training MLP neural networks. An MLP neural network architecture consists of an input layer, an output layer, and one or more hidden layers, and each of these layers contains a certain number of neurons. Adjacent layers of the MLP are fully connected, and each connection between two neurons has an associated weight. The values of weights and biases are adjusted in a supervised manner using an optimization algorithm. Since we are normalizing the data set in the range [−0.9, 0.9], we choose the hyperbolic tangent activation function (1) for each neuron:

\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}

(1)
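
As a minimal sketch of why the normalization range and the activation function fit together: scaling the data linearly into [−0.9, 0.9] keeps every target strictly inside the open output range (−1, 1) of the hyperbolic tangent. The min-max form of the scaling, the helper names, and the sample values are assumptions made for illustration; the text only states the target range.

```python
import math

def scale_to_range(values, low=-0.9, high=0.9):
    """Min-max scale a list of numbers into [low, high] (assumed scheme)."""
    v_min, v_max = min(values), max(values)
    span = v_max - v_min
    if span == 0:
        # Constant column: map every value to the middle of the range.
        return [(low + high) / 2.0 for _ in values]
    return [low + (v - v_min) * (high - low) / span for v in values]

def tanh(x):
    """Equation (1) written out explicitly (fine for moderate |x|)."""
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))

# Made-up input values, used only to exercise the helpers.
raw_values = [60.0, 70.0, 80.0, 90.0]
scaled = scale_to_range(raw_values)
activation = tanh(0.5)
```

In practice `math.tanh` is the safer choice, since the explicit form overflows for large arguments.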

We use the notation [ni-nh1- … -no] to describe the architecture of an MLP, where ni represents the number of neurons in the input layer, nh1 represents the number of neurons in the first hidden layer (the number of neurons of this layer is illustrated in bold), and no represents the number of neurons in the output layer. For example, the notation [7-5-2] describes an MLP with seven inputs, five neurons in the hidden layer, and two outputs.

In this paper, we use different algorithms to optimize the architecture, weights, and biases of MLP neural networks to obtain networks with the best prediction performance on the MMA data set. A population-based algorithm uses a set of candidate solutions (i.e., individuals) that evolve through an iterative process. Individuals in the population are created or modified using different operators specific to the algorithm. In our case, a solution represents a neural network, and a population is equivalent to a set of neural networks. An individual’s objective function is evaluated using the root mean square error (RMSE) (2) of the MLP on the training or testing data set. Therefore, minimizing the RMSE value is the primary goal of the algorithm:

\mathrm{RMSE} = \sqrt{\frac{1}{\mathit{ns} \cdot \mathit{no}} \cdot \sum_{i=1}^{\mathit{ns}} \sum_{j=1}^{\mathit{no}} (d_{ij} - y_{ij})^{2}}

(2)

where ns is the number of samples in the training or testing data set, no is the number of outputs of the MLP, d_ij is the desired jth output of the MLP for the ith training or testing sample, and y_ij is the actual jth output of the MLP for the ith training or testing sample.
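
The objective function in Equation (2) can be sketched as follows. This is an illustrative helper, not the authors' implementation; the function name and the tiny data set are assumptions.

```python
import math

def rmse(desired, actual):
    """Equation (2): both arguments are lists of ns rows with no outputs each."""
    ns = len(desired)
    no = len(desired[0])
    squared_error = 0.0
    for d_row, y_row in zip(desired, actual):
        for d, y in zip(d_row, y_row):
            squared_error += (d - y) ** 2
    # Average over all ns * no output values, then take the square root.
    return math.sqrt(squared_error / (ns * no))

# Two samples with three outputs each; only the last output differs (by 0.3).
desired = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
actual = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.9]]
error = rmse(desired, actual)
```

Here the single squared error of 0.09 is averaged over six output values, so `error` equals the square root of 0.015.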

A coding and decoding step is required in training MLP neural networks with such algorithms. The coding step consists of extracting information from an MLP structure and organizing it into a specific representation compatible with a candidate solution used by an algorithm. In this paper, we code the solution as a fixed-length one-dimensional array of real values. A general case of an MLP structure and its associated coded solution is illustrated in Figure 2.

The locations in the solution array illustrated in Figure 2 are described as follows:

nl: the number of hidden layers of the neural network. This number can be different from one network to another and is an integer value in the range [nl_min, nl_max];
nh1: the number of neurons in the first hidden layer. nh1 is an integer value in the range [nh1_min, nh1_max];
nh2: the number of neurons in the second hidden layer. nh2 is an integer value in the range [nh2_min, nh2_max]. This location is only used when nl = 2;
w_{i-h1}: the weight values associated with the connections between the neurons of the input layer (i) and the neurons of the first hidden layer (h1). These locations always exist in the solution array because all networks we use have at least one hidden layer;
w_{h1-h2}: the weight values associated with the connections between the neurons of the first hidden layer (h1) and the neurons of the second hidden layer (h2). These locations are only used when nl = 2;
w_{h2-o}: the weight values associated with the connections between the neurons of the second hidden layer (h2) and the neurons of the output layer (o). These locations are always used in the solution array. If the second hidden layer does not exist (nl = 1), then these locations represent the weights associated with the connections between the neurons of the first hidden layer (h1) and the neurons of the output layer (o);
bh1, bh2, bo: the biases of neurons in each layer. The locations for bh2 are only used when nl = 2.

The MMA data set we use has three inputs (ni = 3) and three outputs (no = 3). Therefore, we define certain limits on the number of neurons in each layer of the neural network as follows:

nl ∈ [nl_min = 1, nl_max = 2]: neural networks always have at least one hidden layer and at most two hidden layers. These limits are recommended in [44];
nh1 ∈ [nh1_min = 7, nh1_max = 12]: in the first hidden layer, the neural networks can have a minimum of seven and a maximum of twelve neurons. These limits are recommended in [45] as nh1_min = 2 ⋅ ni + 1 and nh1_max = no ⋅ (ni + 1);
nh2 ∈ [nh2_min = 2, nh2_max = 4]: in the second hidden layer, neural networks can have a minimum of two and a maximum of four neurons. These limits follow the recommendation in [45] that nh1_min ≈ 3 ⋅ nh2_min and nh1_max = 3 ⋅ nh2_max (i.e., the second hidden layer has about one third as many neurons as the first hidden layer);
The values of weights and biases are real values in the range [−3, 3].

Limiting the number of neurons in each layer of the neural network implies limiting the search space of the algorithms. Depending on the defined limits and the characteristics of the data set, the maximum length of the solution array is determined using Equation (3), where the value 3 represents the locations assigned to the structural information (i.e., nl, nh1, nh2), nw_max and nb_max represent the maximum number of locations assigned to the weights and biases, ni in Equation (4) represents the number of input features, and no in Equations (4) and (5) represents the number of output features specific to the data set:

len = 3 + nw_max + nb_max

(3)

\mathit{nw}_{max} = \begin{cases} \mathit{ni} \cdot \mathit{nh1}_{max} + \mathit{nh1}_{max} \cdot \mathit{no}, & \mathit{nl} = 1 \\ \mathit{ni} \cdot \mathit{nh1}_{max} + \mathit{nh1}_{max} \cdot \mathit{nh2}_{max} + \mathit{nh2}_{max} \cdot \mathit{no}, & \mathit{nl} = 2 \end{cases}

(4)

\mathit{nb}_{max} = \begin{cases} \mathit{nh1}_{max} + \mathit{no}, & \mathit{nl} = 1 \\ \mathit{nh1}_{max} + \mathit{nh2}_{max} + \mathit{no}, & \mathit{nl} = 2 \end{cases}

(5)
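
Equations (3) to (5) can be evaluated directly for the limits stated above (ni = 3, no = 3, nh1_max = 12, nh2_max = 4). The sketch below assumes that the array is sized for the largest admissible number of hidden layers (nl_max = 2); the function names are illustrative and not taken from the original implementation.

```python
def max_weight_count(ni, nh1_max, nh2_max, no, nl_max):
    """Equation (4): weight locations needed by the largest allowed network."""
    if nl_max == 1:
        return ni * nh1_max + nh1_max * no
    return ni * nh1_max + nh1_max * nh2_max + nh2_max * no

def max_bias_count(nh1_max, nh2_max, no, nl_max):
    """Equation (5): bias locations needed by the largest allowed network."""
    if nl_max == 1:
        return nh1_max + no
    return nh1_max + nh2_max + no

def solution_length(ni, nh1_max, nh2_max, no, nl_max):
    """Equation (3): three structural locations plus weights and biases."""
    return (3
            + max_weight_count(ni, nh1_max, nh2_max, no, nl_max)
            + max_bias_count(nh1_max, nh2_max, no, nl_max))

# Limits stated in the text for the MMA data set.
mma_weights = max_weight_count(ni=3, nh1_max=12, nh2_max=4, no=3, nl_max=2)
mma_biases = max_bias_count(nh1_max=12, nh2_max=4, no=3, nl_max=2)
mma_length = solution_length(ni=3, nh1_max=12, nh2_max=4, no=3, nl_max=2)
```

Under this reading, the formulas give 96 weight locations and 19 bias locations, for a total array length of 118.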

The limits chosen for each location of the solution array are constant throughout the simulations. The operators of the algorithms can change the value of any location in the array. However, these changes are limited by the predefined range of each location.

Since the coding has a fixed-length representation, there may be unused locations for some neural network structures. For example, consider the case of MLPs with one input (ni = 1), one hidden layer (nl = 1), at most four neurons in the hidden layer (nh1_max = 4), and one output (no = 1). Based on these bounds, the MLPs [1-4-1] and [1-2-1] are associated with an array of the same length (Figure 3). One can see in Figure 3 that MLP [1-2-1] contains unused locations (grey colored) for some weights and biases. A particular case can occur when two solutions must be combined. Since the algorithms do not check for invalid or unused locations, the combination of two individuals can unintentionally behave as a mutation operator when the value of a used location (i.e., meaningful data) is combined with the value of an unused location (i.e., noise).
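
The number of unused locations in the example above can be counted without committing to a particular array layout. The helper below is a sketch with an assumed name; it counts only the weight and bias locations and leaves out the three structural ones.

```python
def used_locations(ni, hidden_sizes, no):
    """Weights plus biases actually used by an MLP [ni-nh...-no]."""
    sizes = [ni] + list(hidden_sizes) + [no]
    # One weight per connection between adjacent, fully connected layers.
    weights = sum(a * b for a, b in zip(sizes, sizes[1:]))
    # One bias per neuron in every layer except the input layer.
    biases = sum(sizes[1:])
    return weights + biases

capacity = used_locations(1, [4], 1)     # largest allowed network, [1-4-1]
used_small = used_locations(1, [2], 1)   # smaller network, [1-2-1]
unused_small = capacity - used_small
```

For these bounds, the array reserves 13 weight and bias locations, of which the [1-2-1] network uses 7 and leaves 6 unused.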

3.3. Strategies for Combining Optimization Algorithms

In the following, different strategies are proposed for combining the algorithms in an ensemble of optimizers. We use the original implementation of each optimizer except for the TLBO algorithm. We generally observed long simulation times for TLBO, so we introduced a slight change in the algorithm to reduce the number of interactions between individuals. More specifically, in the teacher phase of TLBO, we allow the teacher to interact with only a random 30% of the students (instead of all students), and we apply the same constraint in the learner phase.
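
One possible reading of the modified teacher phase is sketched below: the usual TLBO teacher update is applied, but only to a randomly chosen 30% of the students. The function name, the greedy acceptance rule as written here, the omission of bound handling, and the sphere test function are assumptions for illustration and are not taken from the original implementation.

```python
import random

def teacher_phase_subset(population, objective, fraction=0.3, rng=random):
    """One TLBO teacher phase that updates only a random fraction of students."""
    scores = [objective(ind) for ind in population]
    teacher = population[scores.index(min(scores))]   # best individual teaches
    dim = len(teacher)
    mean = [sum(ind[d] for ind in population) / len(population)
            for d in range(dim)]
    n_selected = max(1, round(fraction * len(population)))
    selected = rng.sample(range(len(population)), n_selected)
    new_population = [list(ind) for ind in population]
    for i in selected:
        tf = rng.choice([1, 2])                       # teaching factor
        candidate = [x + rng.random() * (t - tf * m)
                     for x, t, m in zip(population[i], teacher, mean)]
        # Greedy acceptance: keep the move only if it lowers the objective.
        if objective(candidate) < scores[i]:
            new_population[i] = candidate
    return new_population

def sphere(x):
    """Toy objective standing in for the RMSE of a decoded MLP."""
    return sum(v * v for v in x)

rng = random.Random(0)
students = [[rng.uniform(-3.0, 3.0) for _ in range(4)] for _ in range(10)]
updated = teacher_phase_subset(students, sphere, fraction=0.3, rng=rng)
```

With ten students, at most three are modified per call, which is where the reduction in objective evaluations comes from.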

In the proposed ensembles of optimizers, we use some basic procedures that handle a single solution or a set of solutions. Although these procedures may vary from one strategy to another, their basic principles are described as follows:

Solution fetching procedure: one or more solutions are fetched from an algorithm that has reached its termination criterion;
The procedure for applying the mutation operator: some genes of one or more fetched solutions are modified;
Solution transfer procedure: a set of solutions (i.e., population) is provided as the initial population to an algorithm. In this procedure, we ensure that the population provided to the algorithm has a size compatible with the algorithm configuration.

The procedures described involve minimal changes to the existing implementation of the algorithms in [1]. Specifically, each algorithm is adapted to accept an input population as the initial population. The input population size must be compatible with the algorithm configuration. If no population of individuals is provided at the input of the algorithm, then the initial population is created according to the procedure in the algorithm (i.e., a population of randomly generated individuals).

Regarding the ensemble models in this paper, we use only the three best-performing algorithms in all the proposed strategies. Our reasoning is that eliminating the algorithms that are less efficient for the given problem maximizes the performance of an ensemble. We believe this is a good approach because the algorithms used are always those most suitable for the given optimization problem.

Moreover, we propose to use a minimal number of iterations for any algorithm used in an ensemble. This approach is based on the popular idea of an ensemble of weak learners [46]. In our case, a weak learner corresponds to a weak optimizer, the generation of a data subset corresponds to the generation of a population, and the combined responses of the weak learners correspond to an improved population containing the best solutions from the weak optimizers. The main focus is to use an ensemble of weak optimizers and compare its performance with that of strong individual optimizers. The motivation for this approach is that an ensemble of weak optimizers could lead to better convergence.

3.3.1. Choosing the Best Performing Algorithms

The step of choosing the algorithms with the best performance consists of a search procedure for the best parameter values for each of the implemented algorithms (FGA, ICA, SHLO, SLO, TLBO, VS, and VOA). This step is performed only once, and the best parameter values found are used in all subsequent runs.

The search procedure randomly generates n_conf sets of values for the algorithm parameters. We choose a predefined range of values for each parameter in a single algorithm. Based on each parameter’s predefined range of values, we randomly generate a set of parameter values for the chosen algorithm. However, we keep only that set for which the algorithm provides the solution with the best objective function value. Since the algorithms are based on random events, it is expected that the same set of parameter values will provide slightly different solutions. Therefore, we perform several runs (n_runs) for a single set of parameter values to obtain average results.
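
The search procedure described above can be sketched as a plain random search. The sketch assumes that every parameter is sampled uniformly from a continuous range and that configurations are ranked by the mean objective value over n_runs runs; the function names, the `rate` parameter, and the toy objective are hypothetical.

```python
import random

def random_parameter_search(run_algorithm, parameter_ranges, n_conf, n_runs,
                            rng=random):
    """Return the sampled parameter set with the lowest mean objective value."""
    best_params, best_score = None, float("inf")
    for _ in range(n_conf):
        # Draw one random configuration from the predefined ranges.
        params = {name: rng.uniform(low, high)
                  for name, (low, high) in parameter_ranges.items()}
        # Average several runs, because the algorithms are stochastic.
        mean_score = sum(run_algorithm(params, rng)
                         for _ in range(n_runs)) / n_runs
        if mean_score < best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score

def toy_run(params, rng):
    """Stand-in for one optimizer run; returns a noisy objective value."""
    return (params["rate"] - 0.3) ** 2 + abs(rng.gauss(0.0, 0.01))

best_params, best_score = random_parameter_search(
    toy_run, {"rate": (0.0, 1.0)}, n_conf=20, n_runs=5, rng=random.Random(0))
```

In an actual experiment, `run_algorithm` would execute one of the seven optimizers with the sampled parameters and return the RMSE of its best network.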

After determining the three best-performing algorithms, we use them in various ensemble strategies described in the following sections.

3.3.2. Hybrid Cascade Strategy

In the hybrid cascade strategy (Figure 4), we propose that a single solution be sequentially transferred from one algorithm to another. The initial solution is randomly generated and then used to generate a population of N individuals (population_1). We choose N to have the same value as the population size parameter of the first algorithm in the sequence (Algorithm_1). In this strategy, the procedure to generate a population is suggestively called GeneratePopulation_1-to-N to reflect that we are using a single individual in generating a population of N individuals. In the first step of this procedure, we perform a simple cloning operation of the initial solution N times. In the last step, we apply a mutation operator to each cloned solution, thus obtaining a population of modified individuals. The mutation operator we use is identical for all proposed ensemble strategies, and we describe it in the Mutation Operator section. However, it is worth mentioning that applying the mutation operator to the entire population can lead to the loss of the global best solution. Therefore, we always use elitism, i.e., the best individual is copied directly into the next population provided by the GeneratePopulation_1-to-N procedure. Elitism applies to any GeneratePopulation procedure used in other ensemble strategies.

The first population generated is provided to the first algorithm, which is run with a minimal number of iterations. At the end of the run, we take the best solution from the first algorithm and use it to generate a new population for the second algorithm. This sequence continues until the best solution is obtained from the last algorithm used in the hybrid cascade strategy. A complete run of this sequence constitutes a single ensemble iteration. The desired number of ensemble iterations (ensemble_iter) is performed by repeatedly running the described sequence and transferring the best solution from one ensemble iteration to the next.
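
The control flow of the hybrid cascade strategy can be sketched as follows. The Gaussian mutation, the toy hill-climbing optimizers, and the sphere objective are placeholders chosen for illustration; they stand in for the paper's mutation operator, the three selected algorithms, and the RMSE of a decoded MLP. Bound handling is omitted.

```python
import random

def mutate(solution, rng, rate=0.1, sigma=0.1):
    """Placeholder mutation (assumption): Gaussian noise on some genes."""
    return [g + rng.gauss(0.0, sigma) if rng.random() < rate else g
            for g in solution]

def generate_population_1_to_n(solution, n, rng):
    """Clone one solution n times and mutate the clones; the first
    individual is kept unchanged (elitism)."""
    return [list(solution)] + [mutate(solution, rng) for _ in range(n - 1)]

def hybrid_cascade(algorithms, initial_solution, objective, ensemble_iter, rng):
    """algorithms: list of (run, population_size) pairs, where
    run(population, objective, rng) returns the final population."""
    best = list(initial_solution)
    for _ in range(ensemble_iter):
        for run, population_size in algorithms:
            population = generate_population_1_to_n(best, population_size, rng)
            result = run(population, objective, rng)
            best = min(result, key=objective)   # handed to the next algorithm
    return best

def toy_optimizer(step):
    """Stand-in for a short run of a real optimizer (one greedy move each)."""
    def run(population, objective, rng):
        out = []
        for ind in population:
            trial = [g + rng.uniform(-step, step) for g in ind]
            out.append(trial if objective(trial) < objective(ind) else ind)
        return out
    return run

def sphere(x):
    return sum(v * v for v in x)

rng = random.Random(1)
start = [rng.uniform(-3.0, 3.0) for _ in range(5)]
algorithms = [(toy_optimizer(0.5), 20), (toy_optimizer(0.1), 30),
              (toy_optimizer(0.02), 10)]
best = hybrid_cascade(algorithms, start, sphere, ensemble_iter=5, rng=rng)
```

Because the elite is copied unchanged into every generated population and the toy optimizers never worsen an individual, the best objective value cannot increase from one stage to the next in this sketch.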

3.3.3. Hybrid Single Elite Solution Strategy

In the hybrid single elite solution strategy (Figure 5), we propose that an algorithm does not depend on the results of other algorithms at the same ensemble iteration. However, at the end of an ensemble iteration, we take the best solution from each algorithm in the ensemble to define an improved population. Thus, a new ensemble iteration will use the previous improved population as the initial population. The suggestive name of the “single elite solution” strategy is derived from the fact that we take only one solution (i.e., the elite) from each algorithm to define an improved population.

In the hybrid single elite solution strategy, an ensemble iteration always starts from a population of three individuals. Therefore, the population generation procedure is suggestively called GeneratePopulation_3-to-N. If a different number of algorithms is used in the ensemble (NoAlg ≥ 2), the population generation procedure is adapted for the general use case, i.e., GeneratePopulation_NoAlg-to-N. In this procedure, we propose that the cloning operation of the initial solutions is influenced by their objective function value. Let the initial solutions be s₁, s₂, …, s_n and their objective function values f₁, f₂, …, f_n. We use the objective function values f₁, f₂, …, f_n to determine the weights w₁, w₂, …, w_n associated with the initial solutions using the softmax function:

w_{i} = \frac{e^{-f_{i}}}{\sum_{k=1}^{n} e^{-f_{k}}} \qquad (6)

Since the goal is to minimize the RMSE value of an MLP, in (6) we use the objective function value with a negative sign to obtain higher weights for solutions with a lower objective function value. The obtained weights determine the number of cloning operations for each initial solution. In other words, we clone a larger number of fitter solutions and a smaller number of less fit solutions. For example, if the initial solutions s₁, s₂, s₃ have the objective function values f₁ = 0.01, f₂ = 0.5, f₃ = 0.9, then their weights are approximately w₁ = 0.5, w₂ = 0.3, w₃ = 0.2. Therefore, for a population size of N = 100 individuals, a population of 50 s₁ clones, 30 s₂ clones, and 20 s₃ clones will be generated.

In the last step of the GeneratePopulation_NoAlg-to-N procedure, we apply the mutation operator to each cloned solution to obtain the final population.
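The weighting in (6) and the resulting clone counts can be checked with a short Python sketch. The function names are illustrative, and the largest-remainder rounding used to make the counts sum to N is an assumption, since the text does not state how fractional shares are rounded.

```python
import math

def softmax_weights(objective_values):
    """Eq. (6): weights proportional to exp(-f_i), so a lower RMSE gives a higher weight."""
    exps = [math.exp(-f) for f in objective_values]
    total = sum(exps)
    return [e / total for e in exps]

def clone_counts(objective_values, n):
    """Split n clones among the solutions in proportion to their weights.

    Assumption: largest-remainder rounding, so that the counts sum exactly to n.
    """
    shares = [w * n for w in softmax_weights(objective_values)]
    counts = [int(s) for s in shares]  # floor of each (non-negative) share
    # Give the leftover clones to the solutions with the largest fractional parts.
    order = sorted(range(len(shares)), key=lambda i: shares[i] - counts[i], reverse=True)
    for i in order[: n - sum(counts)]:
        counts[i] += 1
    return counts

weights = softmax_weights([0.01, 0.5, 0.9])
print([round(w, 3) for w in weights])       # [0.494, 0.303, 0.203]
print(clone_counts([0.01, 0.5, 0.9], 100))  # [50, 30, 20]
```

With the objective values from the example above, the sketch reproduces the 50/30/20 split for N = 100.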

One can see in Figure 5 that the population generation procedure is used independently for each algorithm. The main reason we generate different populations is that each algorithm requires a population of a different size. Additionally, generating different populations leads to better diversity that could improve ensemble performance.

3.3.4. Hybrid Multiple Elite Solution Strategy

The third proposed ensemble strategy is similar to the hybrid single elite solution strategy, but the population generation procedure differs. In the hybrid multiple elite solution strategy (Figure 6), we use a larger initial population, and the population generation procedure is based on the bagging technique [47]. More precisely, in the procedure called GeneratePopulation_Bagging, we randomly sample N solutions from the initial population (sampling with replacement). The obtained intermediate population (i.e., bootstrapped population) is then modified by mutating each individual to obtain the final population.

In this paper, we use an ensemble of three algorithms. After the termination criterion of the algorithms is reached at the end of an ensemble iteration, the improved population is created based on the performance of each algorithm. We propose that the improved population contains 15% elites from the best algorithm, 10% elites from the second best algorithm, and 5% elites from the worst algorithm. Since each algorithm has a specific population size, the size of the improved population may vary when the algorithms are sorted according to their performance in a new order (i.e., the algorithms may perform differently on new iterations). Similar to the previous strategy, the improved population in the current ensemble iteration becomes the initial population in the next iteration.
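A possible reading of the two steps above is sketched below in Python. Several details are assumptions rather than statements from the text: algorithms are ranked by the best objective value in their final population, each elite fraction is taken from that algorithm's own population and rounded to the nearest integer, and elitism is implemented by copying the best individual into slot 0. The `mutate` argument stands for the mutation operator and is passed in by the caller.

```python
import random

def generate_population_bagging(initial_population, n, objective, mutate, rng):
    """Bootstrap n - 1 solutions (sampling with replacement), mutate them, and keep the elite."""
    elite = min(initial_population, key=objective)
    bootstrapped = rng.choices(initial_population, k=n - 1)
    return [list(elite)] + [mutate(ind) for ind in bootstrapped]

def improved_population(final_populations, objective, fractions=(0.15, 0.10, 0.05)):
    """Collect 15%, 10%, and 5% elites from the best, second best, and worst algorithm."""
    # Assumption: an algorithm's performance is the best objective value it reached.
    ranked = sorted(final_populations, key=lambda pop: min(objective(ind) for ind in pop))
    improved = []
    for pop, fraction in zip(ranked, fractions):
        k = max(1, round(fraction * len(pop)))  # fraction of that algorithm's own population
        improved.extend(sorted(pop, key=objective)[:k])
    return improved

def sphere(ind):
    return sum(g * g for g in ind)

rng = random.Random(7)
# Three hypothetical final populations with the sizes listed in Tables A2, A5, and A3.
final_pops = [[[rng.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(size)]
              for size in (109, 224, 133)]
improved = improved_population(final_pops, sphere)
next_pop = generate_population_bagging(
    improved, 109, sphere, lambda ind: [g + rng.gauss(0.0, 1.0) for g in ind], rng)
print(len(improved), len(next_pop))
```

Since the elite counts depend on which algorithm ranks first, the size of the improved population changes with the ranking, as noted above.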

3.3.5. Mutation Operator

The mutation operator in all population generation procedures consists of modifying individuals’ genes using a Gaussian random number. The mutation we use has a certain chance of altering an individual’s genes, thereby improving the diversity of the population. The mutation operator proposed in this paper is described in the following Algorithm 1:

Algorithm 1 The mutation operator.

Mutate-individual
inputs:  x: the individual’s genes
         mutProb: the individual’s chance to be mutated
         mutGain: control factor for genes mutation chance
         $σ^{2}$: variance
outputs: x*: the resulting individual
----------------------------------------------------------------------------------
// check if the individual should be mutated or not
if mutProb < rand() then
    x* $\leftarrow$ x // no mutation performed
else
    x* $\leftarrow$ x // start from a copy so that unmutated genes are kept
    // check which genes should be mutated
    geneMutProb $\leftarrow$ mutGain / noGenes
    foreach gene in x
        if geneMutProb > rand() then
            x*[gene] $\leftarrow$ x[gene] + Gaussian(gene, $σ^{2}$)
return x*

The terms used in Algorithm 1 are described as follows:

mutProb represents the probability that the mutation operator is applied to the given individual;
mutGain represents a factor that controls the mutation probability of each gene;
noGenes is the number of genes of the individual x;
Gaussian(gene, $σ^{2}$) is a Gaussian random number with a mean equal to the gene value and variance $σ^{2}$;
rand() is a uniformly distributed random number in the range [0, 1).

The parameter values we use in the mutation operator are mutProb = 0.5, mutGain = 2, and σ = 1.
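For illustration, Algorithm 1 can be transcribed into Python as in the sketch below. One point is an interpretation: the sketch assumes that the Gaussian term is a zero-mean perturbation with standard deviation σ added to the gene, which is equivalent to drawing the new value from a Gaussian centered on the current gene value. The function and argument names are ours, not those of the original C# code.

```python
import random

def mutate_individual(x, mut_prob=0.5, mut_gain=2.0, sigma=1.0, rng=random):
    """Return a possibly mutated copy of x (a list of float genes)."""
    child = list(x)  # copy, so that unmutated genes keep their values
    # With probability 1 - mut_prob the individual is returned unchanged.
    if mut_prob < rng.random():
        return child
    # Each gene is perturbed with probability mut_gain / number of genes.
    gene_mut_prob = mut_gain / len(child)
    for i in range(len(child)):
        if gene_mut_prob > rng.random():
            # Assumption: zero-mean Gaussian step, i.e., a draw centered on the gene.
            child[i] += rng.gauss(0.0, sigma)
    return child

rng = random.Random(0)
parent = [0.1, -0.4, 0.7, 0.0, 0.3]
print(mutate_individual(parent, rng=rng))
```

With mutGain = 2, a selected individual has on average two genes altered, regardless of the number of genes.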

4. Experiments and Results

In this section, we present the experiments performed and evaluate the MLP training efficiency of strong individual optimizers and ensembles of weak optimizers. Performance is evaluated using the root mean square error (RMSE) on the training and testing samples of the data set. We determine the RMSE at specific iterations to obtain a convergence curve graph during the optimization process.

Our original implementation for MLP allows us to access parameter information from the neural network easily. The C# implementation of the algorithms from paper [1] is used, and for the scope of this paper, we extend this implementation to training MLPs with these algorithms. We also implement the three ensemble strategies proposed in this work: hybrid cascade, hybrid single elite solution, and hybrid multiple elite solutions.

The experimental simulations are performed on machines with different specifications, so the execution time cannot be compared fairly. For this reason, we consider the number of evaluations as an index of the processing power used for a given individual optimizer or ensemble of optimizers. An evaluation counter is incremented each time an optimizer uses the objective function. We observed that the objective function is the most time-consuming procedure, accounting for ~98% of the processing time, which is similar to the timing profile of NEAT [48]. In this function, a solution array is decoded into an MLP structure, and then the RMSE of the MLP outputs is calculated using the training or testing samples.
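The evaluation counter can be pictured with the following Python sketch. The `decode_linear` function is a hypothetical stand-in for decoding a solution array into an MLP (here it builds a one-input linear model), so the sketch only illustrates the counting and the RMSE calculation, not the actual network.

```python
import math

def rmse(predicted, target):
    """Root mean square error over paired output values."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target))

class CountingObjective:
    """Objective function that counts how many times it is evaluated."""

    def __init__(self, decode, inputs, targets):
        self.decode = decode
        self.inputs = inputs
        self.targets = targets
        self.evaluations = 0

    def __call__(self, solution):
        self.evaluations += 1                 # one more objective function call
        model = self.decode(solution)         # solution array -> model
        return rmse([model(x) for x in self.inputs], self.targets)

def decode_linear(solution):
    """Hypothetical stand-in for decoding a solution array into an MLP."""
    slope, intercept = solution
    return lambda x: slope * x + intercept

objective = CountingObjective(decode_linear, inputs=[0.0, 1.0, 2.0], targets=[1.0, 3.0, 5.0])
print(objective([2.0, 1.0]), objective.evaluations)  # 0.0 1
```

Any optimizer that receives `objective` increments the counter implicitly, so the count is independent of the machine on which the run is executed.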

The data set consists of 3217 samples split into 75% (2412) training samples and 25% (805) testing samples.

Our experimental setup consists of two main steps, as illustrated in Figure 7. In the first step, we randomly search the parameter values for each algorithm. The search procedure is used to identify the three best-performing algorithms, and the best parameters found for these algorithms will be used in all subsequent simulations. We choose a fixed number of 300 iterations for each algorithm, while the other parameters can vary. The configuration of the search procedure is n_conf = 50 random sets of parameter values for each algorithm and n_runs = 2 independent runs to average the results. Since the search procedure is time-consuming, we do not use a larger number for n_runs.

The parameter limits for an algorithm are chosen to avoid inconsistent combinations of parameter values. For example, in the ICA algorithm, no. empires must not exceed pop. size, and the chosen limits do not allow such an inconsistent combination to be generated during the search procedure. Tables A1–A7 in Appendix A show the limits we choose for each parameter of an algorithm and the results of the search procedure: the best parameter values, the average RMSE calculated on the training samples, the average number of evaluations, and the structure of the best neural networks.
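The search procedure can be sketched in Python as follows, using the ICA limits from Table A2. The `dummy_train` function is a placeholder for "train an MLP with the given parameters for 300 iterations and return the training RMSE"; it returns an arbitrary value so that the sketch runs and does not reproduce any result of the paper. Sampling integers for integer limits and uniformly distributed reals otherwise is also an assumption.

```python
import random

# Limits for ICA taken from Table A2 (the fixed number of iterations is omitted).
ICA_LIMITS = {
    "pop_size": (50, 200),
    "no_empires": (5, 20),
    "alpha": (0.1, 1.5),
    "beta": (0.1, 2.0),
    "prob_revolution": (0.03, 0.2),
    "mu": (0.01, 0.1),
    "zeta": (0.01, 0.3),
}

def sample_parameters(limits, rng):
    """Draw one random parameter set; integer limits yield integer values."""
    params = {}
    for name, (low, high) in limits.items():
        if isinstance(low, int) and isinstance(high, int):
            params[name] = rng.randint(low, high)
        else:
            params[name] = rng.uniform(low, high)
    return params

def random_search(train, limits, n_conf=50, n_runs=2, rng=None):
    """Try n_conf random parameter sets and average each over n_runs runs."""
    rng = rng if rng is not None else random.Random()
    best_params, best_rmse = None, float("inf")
    for _ in range(n_conf):
        params = sample_parameters(limits, rng)
        mean_rmse = sum(train(params, rng) for _ in range(n_runs)) / n_runs
        if mean_rmse < best_rmse:
            best_params, best_rmse = params, mean_rmse
    return best_params, best_rmse

def dummy_train(params, rng):
    """Placeholder objective; NOT an actual MLP training run."""
    return abs(params["alpha"] - 0.5) + 0.01 * rng.random()

best_params, best_rmse = random_search(dummy_train, ICA_LIMITS, rng=random.Random(1))
print(best_params, best_rmse)
```

Because the upper limit of no. empires (20) is below the lower limit of pop. size (50), no sampled set can violate the consistency condition mentioned above.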

Based on Tables A1–A7, we show in Table 1 the algorithms sorted (from best to worst) according to their performance using the RMSE value averaged over n_runs = 2 runs. One can see that the three best-performing candidates are ICA, TLBO, and SHLO. Although the algorithms presented in Table 1 show a different number of evaluations, in the present paper we do not investigate the causes that lead to these obtained values (e.g., parameter settings or early stopping conditions). It is not the purpose of this work to present a detailed comparison between the individual algorithms. Therefore, we choose the three algorithms for which we obtain the lowest mean RMSE value (using the settings presented in Tables A1–A7) and compare their individual performance with the ensembles created with the same algorithms.

In the second step of our experiments, we choose ICA, TLBO, and SHLO as the base optimizers. In Figure 7, we illustrate that we perform six main simulations. In the first three simulations, each algorithm is run using 300 iterations. Since this is a large number of iterations, we regard an algorithm with this configuration as a strong individual optimizer. The other three simulations represent the run of each proposed ensemble strategy. A base optimizer is run with five iterations in a single ensemble iteration, and we regard it as a weak optimizer. Therefore, a single ensemble iteration is equivalent to 15 cumulative iterations from three weak optimizers. We choose 20 ensemble iterations for each ensemble strategy to obtain a cumulative number of 300 iterations (i.e., the same number of iterations as a strong individual optimizer). Our reasoning is to perform a fair comparison between strong individual optimizers and ensembles of weak optimizers. We run all six simulations with thirty independent runs to obtain statistically meaningful results.
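The iteration budget stated above can be written out explicitly; the annotation labels are ours and only restate the quantities from the text:

```latex
\underbrace{3}_{\text{weak optimizers}} \times \underbrace{5}_{\text{iterations each}} = 15
\ \text{iterations per ensemble iteration},
\qquad
15 \times \underbrace{20}_{\text{ensemble iterations}} = 300 \ \text{iterations}
```

This matches the 300 iterations given to each strong individual optimizer.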

Figure 8 shows the convergence curve graphs of all six simulations obtained on the training and test samples. The curve graphs of the individual optimizers ICA, TLBO, and SHLO are shown with continuous lines, while the curve graphs of the ensemble strategies are shown with dashed lines. We illustrate on the x-axis the current number of an ensemble iteration. In the case of individual optimizers, one ensemble iteration is equivalent to fifteen individual iterations. Therefore, we collect performance statistics from individual optimizers every 15 iterations. On the y-axis, we show the RMSE obtained at each ensemble iteration (i.e., at every 15 iterations of the individual ICA, TLBO, and SHLO algorithms) averaged over 30 independent runs.
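The way the curves of the individual optimizers are aligned with the ensemble iterations can be sketched as follows. The function names are illustrative, and the sketch assumes that each run stores one RMSE value per iteration in a list.

```python
def sample_curve(rmse_per_iteration, step=15):
    """Keep the RMSE recorded after every `step` iterations (iterations counted from 1)."""
    return rmse_per_iteration[step - 1::step]

def mean_curve(curves):
    """Average several equally long curves point by point (e.g., over 30 runs)."""
    return [sum(points) / len(points) for points in zip(*curves)]

# Toy data: one "run" whose value at iteration i is simply i, for 300 iterations.
run = [float(i) for i in range(1, 301)]
sampled = sample_curve(run)
print(len(sampled), sampled[0], sampled[-1])  # 20 15.0 300.0
```

A run of 300 iterations therefore yields 20 points, one per ensemble iteration, which can be averaged across runs with `mean_curve`.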

We can see in Figure 8 that the convergence curve graphs for the training and test samples are very similar in shape but have different scales. During the first ten ensemble iterations, all three ensemble strategies outperformed the individual optimizers in convergence speed. It is interesting to note that hybrid cascade ensemble has better convergence than hybrid single elite solution. However, these ensembles have similar performances after the 10th ensemble iteration. The best overall performance is observed for the hybrid multiple elite solutions ensemble, outperforming all individual algorithms and the other two ensemble strategies. A hybrid multiple elite solutions ensemble is an extension of hybrid single elite solution because it uses a larger population of elites. Therefore, we argue that the increased performance of the hybrid multiple elite solutions ensemble comes from creating a more diverse improved population at the end of each ensemble iteration.

The individual ICA optimizer has a slow convergence, but after 300 iterations, it achieves a performance similar to that of the hybrid multiple elite solutions ensemble (at ensemble iteration 20). However, for some problems, 300 iterations might be too many, and the hybrid multiple elite solutions strategy is then preferred.

In Table 2, we present the detailed simulation results for the individual ICA, TLBO, and SHLO optimizers at iteration 300 and the results of the proposed ensemble strategies at ensemble iteration 20. We can see that the hybrid multiple elite solutions ensemble provides the smallest errors with a mean RMSE train of 0.00582 and a mean RMSE test of 0.01029. The same ensemble provides the best solutions with an RMSE train of 0.00517 and an RMSE test of 0.00932. The number of evaluations of the hybrid multiple elite solutions ensemble (39,245) is competitive with the number of evaluations of the individual optimizers ICA (37,159), TLBO (40,424), and SHLO (40,033). Therefore, the faster convergence and smaller errors provided by the hybrid multiple elite solutions ensemble make this strategy the preferred choice.

We can see in Table 2 that each ensemble strategy has roughly the same number of evaluations, i.e., 39,200. This is an expected result because we use the same configuration of weak optimizers in all ensemble strategies. However, we emphasize that the value of 39,200 is the approximate average of the number of evaluations from the individual optimizers ICA (37,159), TLBO (40,424), and SHLO (40,033). Since we use an equal number of iterations (i.e., five) for each weak optimizer in all ensemble strategies, each ensemble inherits the average computational effort of the individual optimizers. However, choosing an unbalanced number of iterations for weak optimizers of an ensemble is an easy way to control the trade-off between inherited computational effort and inherited performance. To investigate the impact of an unbalanced number of iterations for weak optimizers, we present in Figure 9 the convergence curve graphs of each ensemble strategy for the following scenarios:

Scenario 1: five iterations are used for each weak optimizer in all ensemble strategies. This is the original experiment setup, and we use it as a baseline comparison. The convergence curve graphs for this scenario are shown in Figure 9 with continuous lines and are labeled Cascade-balanced, SingleElite-balanced, and MultipleElite-balanced;
Scenario 2: nine iterations for ICA weak optimizer and three iterations for TLBO and SHLO are used. In this scenario, we want the ensembles to inherit, to a greater extent, the characteristics of the best optimizer, i.e., ICA. The convergence curve graphs for this scenario are shown in Figure 9 with dashed red lines and are labeled Cascade-ICA, SingleElite-ICA, and MultipleElite-ICA;
Scenario 3: nine iterations for TLBO weak optimizer and three iterations for ICA and SHLO are used. In this scenario, we want the ensembles to inherit, to a greater extent, the characteristics of the second best optimizer, i.e., TLBO. The convergence curve graphs for this scenario are shown in Figure 9 with dashed blue lines and are labeled Cascade-TLBO, SingleElite-TLBO, and MultipleElite-TLBO;
Scenario 4: nine iterations for SHLO weak optimizer and three iterations for ICA and TLBO are used. In this scenario, we want the ensembles to inherit, to a greater extent, the characteristics of the worst optimizer, i.e., SHLO. The convergence curve graphs for this scenario are shown in Figure 9 with dashed gray lines and are labeled Cascade-SHLO, SingleElite-SHLO, and MultipleElite-SHLO.

Note that we use a cumulative number of 15 iterations for the weak optimizers in all scenarios. We choose this number of iterations for convenience because the results for Scenario 1 are already available, and new simulations are unnecessary. We also show in Figure 9 only the convergence curve graphs on train samples because the curve graphs on test samples are very similar in shape but have different scales.
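The four allocations can be checked with a short Python sketch. The last function is only a back-of-envelope estimate under the assumption that the number of evaluations of a base optimizer grows linearly with its number of iterations; it ignores the population generation step and is not a result reported in this paper.

```python
# Iterations per weak optimizer in one ensemble iteration (Scenarios 1 to 4).
SCENARIOS = {
    "balanced": {"ICA": 5, "TLBO": 5, "SHLO": 5},
    "ICA": {"ICA": 9, "TLBO": 3, "SHLO": 3},
    "TLBO": {"ICA": 3, "TLBO": 9, "SHLO": 3},
    "SHLO": {"ICA": 3, "TLBO": 3, "SHLO": 9},
}
ENSEMBLE_ITERATIONS = 20
# Mean evaluations of the individual optimizers over 300 iterations (quoted in the text).
EVALUATIONS_300 = {"ICA": 37159, "TLBO": 40424, "SHLO": 40033}

def cumulative_iterations(allocation):
    """Total iterations spent by all weak optimizers over the whole ensemble run."""
    return ENSEMBLE_ITERATIONS * sum(allocation.values())

def estimated_evaluations(allocation):
    """Rough estimate, ASSUMING evaluations scale linearly with iterations."""
    return sum(EVALUATIONS_300[name] * ENSEMBLE_ITERATIONS * iters / 300
               for name, iters in allocation.items())

for name, allocation in SCENARIOS.items():
    print(name, cumulative_iterations(allocation), round(estimated_evaluations(allocation)))
```

Every scenario sums to 15 iterations per ensemble iteration and 300 iterations overall; under the linear assumption, favoring the optimizer with the fewest evaluations lowers the estimated effort of the ensemble.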

We can see in Figure 9a,b that the Cascade-ICA and SingleElite-ICA ensembles inherit slower convergence from the ICA individual optimizer. The Cascade-TLBO and SingleElite-TLBO ensembles inherit faster convergence from the TLBO individual optimizer, but the Cascade-SHLO and SingleElite-SHLO ensembles show no promising improvement. Therefore, we can say that Cascade and SingleElite ensembles are sensitive to an unbalanced number of iterations for weak optimizers.

On the other hand, we can see in Figure 9c that the MultipleElite-ICA ensemble has a better convergence in the first seven ensemble iterations compared to the balanced ensemble, which is not an intuitive outcome. In the case of the MultipleElite-TLBO and MultipleElite-SHLO ensembles, we observe slightly poorer performance in the first seven ensemble iterations. However, we can see that the MultipleElite ensemble is stable because MultipleElite-ICA, MultipleElite-TLBO, and MultipleElite-SHLO perform similarly to MultipleElite-balanced after ensemble iteration 7. The performance stability of the MultipleElite ensemble provides the advantage of using an unbalanced number of iterations. In other words, in the MultipleElite ensemble, we can favor the weak optimizer with less computational effort without losing ensemble performance. We show in Table 3 the performance of each ensemble in the four scenarios (at ensemble iteration 20).

Based on the results in Table 3, we can see that the MultipleElite ensemble performs similarly for balanced and unbalanced numbers of iterations for weak optimizers. Additionally, the MultipleElite-ICA ensemble inherits a lower computational effort (i.e., 38,261) from the individual optimizer ICA (37,159). The presented results demonstrate that our proposed hybrid multiple elite solutions ensemble is a simple and promising ensemble strategy that benefits from the best performance of the base optimizers in terms of accuracy and computational effort.

5. Conclusions

In this paper, we addressed the concept of neuroevolution, where the algorithms FGA, ICA, SHLO, SLO, TLBO, VS, and VOA were used to find the architecture, weights, and biases of neural networks. Using a search procedure, we identified the three best-performing algorithms and combined them using three proposed ensemble strategies: hybrid cascade, hybrid single elite solution, and hybrid multiple elite solutions. The proposed ensemble strategies are easy to implement because they do not involve changes in the logic of the algorithms. Instead, the proposed strategies involve simple methods of generating populations and transferring them to the algorithms.

We used the MMA dataset to train neural networks with variable structures using each individual algorithm and the proposed ensembles constructed with the same algorithms. The training performance of individual optimizers was compared with that of the ensembles of optimizers, and we observed that hybrid multiple elite solutions outperformed all optimizers in convergence speed. We believe this performance is due to the more diverse population that we create at each iteration of the hybrid multiple elite solutions ensemble. Using a larger number of iterations, one of the individual optimizers, i.e., ICA, achieves a prediction accuracy similar to that of the ensemble. However, ICA has a very slow convergence. Furthermore, we analyzed the effect of using an unbalanced computational effort for weak optimizers, and the experimental results indicate that the hybrid multiple elite solutions ensemble is a stable strategy. Our experimental results show that the hybrid multiple elite solutions ensemble is the strategy that generates the neural network-based regression models that best represent the free radical polymerization of MMA and that have the potential to generate the most dependable related predictions. Given that there is no precise phenomenological model for this process, the predictions provided by the optimal neural network can be of real use in industrial practice, replacing a series of experiments that consume time, materials, and energy. In addition, the proposed model can be easily introduced in an online control procedure for optimal control. To our knowledge, such a neural network optimization strategy has not been used before for such a process, and it performs better than other well-established biologically inspired optimizers.

A future research direction is to evaluate the performance of the proposed ensembles of optimizers on a larger number of optimization problems. The proposed ensemble strategies can be seen as adaptive optimizers that combine the characteristics of individual optimizers. Therefore, we believe this approach would provide competitive results because an ensemble is always constructed with a subset of optimizers performing better on a given optimization problem.

Author Contributions

Conceptualization, S.C. and F.L.; methodology, S.-A.F., F.L. and M.G.; software, S.-A.F. and M.G.; validation, S.C. and F.L.; investigation, S.-A.F.; writing—original draft preparation, S.-A.F., S.C., F.L. and M.G.; writing—review and editing, S.-A.F., S.C., F.L. and M.G.; funding acquisition, S.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by UEFISCDI Romania, Exploratory Research Project PN-III-P4-ID-PCE-2020-0551, no. 91/2021.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

All data used for the experimental studies are available on request.

Acknowledgments

This work was supported by Exploratory Research Project PN-III-P4-ID-PCE-2020-0551, no. 91/2021, financed by UEFISCDI Romania.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

The best results provided by the search procedure for each algorithm are shown in Tables A1–A7.

Table A1. The best results provided by the search procedure for the Football Game algorithm.

Parameter	Min	Max	Best value	Mean RMSE	Mean no. evaluations	Best NN structure
no. iterations	300	300	300	0.01020	42,8156	[3-8-2-3]
no. players	35	50	39
max. strategies	10	30	21
flv reduction	0.3	1	0.91452
hrlv reduction	0.3	1	0.91280

Table A2. The best results provided by the search procedure for the Imperialist Competitive algorithm.

Parameter	Min	Max	Best value	Mean RMSE	Mean no. evaluations	Best NN structure
no. iterations	300	300	300	0.005810	37,202	[3-11-3-3]
pop. size	50	200	109
no. empires	5	20	18
alpha	0.1	1.5	0.44226
beta	0.1	2	0.76682
prob. revolution	0.03	0.2	0.14058
mu	0.01	0.1	0.07262
zeta	0.01	0.3	0.16426

Table A3. The best results provided by the search procedure for the Simple Human Learning Optimization algorithm.

Parameter	Min	Max	Best value	Mean RMSE	Mean no. evaluations	Best NN structure
no. iterations	300	300	300	0.00643	40,033	[3-7-2-3]
pop. size	50	150	133
no. bits	32	32	32

Table A4. The best results provided by the search procedure for the Social Learning Optimization algorithm.

Parameter	Min	Max	Best value	Mean RMSE	Mean no. evaluations	Best NN structure
pop. size	20	100	98	0.00663	60,998	[3-9-2-3]
no. generations	300	300	300
amplification factor	0.3	1	0.64947
crossover rate	0.3	1	0.84117
lambda	1	10	5
delta	1	10	8

Table A5. The best results provided by the search procedure for the Teaching Learning Based Optimization algorithm.

Parameter	Min	Max	Best value	Mean RMSE	Mean no. evaluations	Best NN structure
no. iterations	300	300	300	0.00612	40,424	[3-7-3]
pop. size	100	500	224

Table A6. The best results provided by the search procedure for the Viral System algorithm.

Parameter	Min	Max	Best value	Mean RMSE	Mean no. evaluations	Best NN structure
no. iterations	300	300	300	0.01179	188	[3-10-3-3]
no. cells	10	150	145
plt	0.1	1	0.10409
pi	0.1	1	0.30378
pr	0.1	1	0.65739
pan	0.1	0.5	0.13174
lnr init	5	15	10
lit init	5	15	7
max. neighborhood distance	5	15	14.37135
no. max. converge solutions	1	5	3
convergence epsilon	0.001	0.015	0.00802

Table A7. The best results provided by the search procedure for the Virulence Optimization algorithm.

Parameter	Min	Max	Best value	Mean RMSE	Mean no. evaluations	Best NN structure
no. iterations	300	300	300	0.01265	5558	[3-11-3-3]
no. cells	10	150	84
no. initial viruses	5	15	13
prob. to mutate	0.1	0.9	0.34932
prob. to recombine	0.1	0.5	0.28926
mutation sigma	0.5	1.5	1.38772
max. angle offset	1	2	1.32983
no. best viruses from cluster	1	7	5
no. best virus clones	1	5	3
no. max. converge solutions	1	5	4
convergence epsilon	0.001	0.015	0.00689

References

Anton, C.; Leon, F.; Gavrilescu, M.; Drăgoi, E.-N.; Floria, S.-A.; Curteanu, S.; Lisa, C. Obtaining Bricks Using Silicon-Based Materials: Experiments, Modeling and Optimisation with Artificial Intelligence Tools. Math 2022, 10, 1891.
Wolpert, D.H.; Macready, W.G. No free lunch theorems for optimization. IEEE Trans. Evolut. Comput. 1997, 1, 67–82.
Elsken, T.; Metzen, J.H.; Hutter, F. Neural Architecture Search: A Survey. J. Mach. Learn. Res. 2019, 20, 1997–2017.
Kaveh, M.; Khishe, M.; Mosavi, M.R. Design and implementation of a neighborhood search biogeography-based optimisation trainer for classifying sonar dataset using multilayer perceptron neural network. Analog Integr. Circuits Signal Process. 2018, 100, 405–428.
Fadakar, E.; Ebrahimi, M. A New Metaheuristic Football Game Inspired Algorithm. In Proceedings of the 1st Conference on Swarm Intelligence and Evolutionary Computation (CSIEC 2016), Higher Education Complex of Bam, Bam, Iran, 9–11 March 2016.
Atashpaz-Gargari, E.; Lucas, C. Imperialist Competitive Algorithm: An Algorithm for Optimization Inspired by Imperialistic Competition. In Proceedings of the 2007 IEEE Congress on Evolutionary Computation, Singapore, 25–28 September 2007; pp. 4661–4667.
Wang, L.; Ni, H.; Yang, R.; Fei, M.; Ye, W.A. Simple Human Learning Optimization Algorithm. In Communications Computer and Information Science; Book Series CCIS; Springer: Berlin/Heidelberg, Germany, 2014; Volume 462, pp. 56–65.
Liu, Z.-Z.; Chu, D.H.; Song, C.; Xue, X.; Lu, B.Y. Social learning optimization (SLO) algorithm paradigm and its application in QoS-aware cloud service composition. Inf. Sci. 2016, 326, 315–333.
Rao, R.V.; Savsani, V.J.; Vakharia, D.P. Teaching–learning-based optimization: A novel method for constrained mechanical design optimization problems. Comput. Aided Des. 2011, 43, 303–315.
Cortés, P.; García, J.M.; Muñuzuri, J.; Onieva, L. Viral systems: A new bio-inspired optimisation approach. Comput. Oper. Res. 2008, 35, 2840–2860.
Jaderyan, M.; Khotanlou, H. Virulence optimization algorithm. Appl. Soft Comput. 2016, 43, 596–618.
Drăgoi, E.-N.; Curteanu, S.; Cașcaval, D.; Galaction, A.-I. Artificial Neural Network Modeling of Mixing Efficiency in a Split-Cylinder Gas-Lift Bioreactor for Yarrowia lipolytica Suspensions. Chem. Eng. Commun. 2016, 203, 1600–1608.
Pirdashti, M.; Movagharnejad, K.; Curteanu, S.; Drăgoi, E.-N.; Rahimpour, F. Prediction of partition coefficients of guanidine hydrochloride in PEG–phosphate systems using neural networks developed with differential evolution algorithm. J. Ind. Eng. Chem. 2015, 27, 268–275.
Curteanu, S.; Suditu, G.D.; Buburuzan, A.M.; Drăgoi, E.-N. Neural networks and differential evolution algorithm applied for modelling the depollution process of some gaseous streams. Environ. Sci. Pollut. Res. 2014, 21, 12856–12867.
Si, T.; Bagchi, J.; Miranda, P.B.C. Artificial neural network training using metaheuristics for medical data classification: An experimental study. Expert Syst. Appl. 2022, 193, 116423.
Sharifi, A.; Alizadeh, K. Comparison of the Particle Swarm Optimization with the Genetic Algorithms as a Training for Multilayer Perceptron Technique to Diagnose Thyroid Functional Disease. Shiraz E-Med. J. 2020, 22.
Bhattacharjee, K.; Pant, M. Hybrid particle swarm optimization-genetic algorithm trained multi-layer perceptron for classification of human glioma from molecular brain neoplasia data. Cogn. Syst. Res. 2019, 58, 173–194.
Risi, S.; Togelius, J. Neuroevolution in Games: State of the Art and Open Challenges. IEEE Trans. Comput. Intell. AI Games 2017, 9, 25–41.
Parker, M.; Bryant, B.D. Neurovisual Control in the Quake II Environment. IEEE Trans. Comput. Intell. AI Games 2012, 4, 44–54.
Mirjalili, S.; Mirjalili, S.M.; Lewis, A. Let a biogeography-based optimizer train your Multi-Layer Perceptron. Inf. Sci. 2014, 269, 188–209.
Al Bataineh, A.; Jarrah, A. High Performance Implementation of Neural Networks Learning Using Swarm Optimization Algorithms for EEG Classification Based on Brain Wave Data. Int. J. Appl. Metaheur. Comput. 2022, 13, 1–17.
Jalali, S.M.J.; Karimi, M.; Khosravi, A.; Nahavandi, S. An efficient Neuroevolution Approach for Heart Disease Detection. In Proceedings of the IEEE International Conference on Systems, Man and Cybernetics (SMC), Bari, Italy, 6–9 October 2019.
Jalali, S.M.J.; Ahmadian, S.; Khosravi, A.; Mirjalili, S.; Mahmoudi, M.R.; Nahavandi, S. Neuroevolution-based autonomous robot navigation: A comparative study. Cogn. Syst. Res. 2020, 62, 35–43.
Khan, K.; Sahai, A. A Comparison of BA, GA, PSO, BP and LM for Training Feed forward Neural Networks in e-Learning Context. Int. J. Intell. Syst. Appl. 2012, 4, 23–29.
Hacibeyoglu, M.; Ibrahim, M.H. A Novel Multimean Particle Swarm Optimization Algorithm for Nonlinear Continuous Optimization: Application to Feed-Forward Neural Network Training. Sci. Program. 2018, 2018, 1–9.
Al Bataineh, A.; Kaur, D.; Jalali, S.M.J. Multi-Layer Perceptron Training Optimization Using Nature Inspired Computing. IEEE Access 2022, 10, 36963–36977.
Zhang, T.; Wang, J.; Dan, Y.; Lanqiu, Y.; Dai, J.; Han, X.; Sun, X.; Xu, K. Efficient training and design of photonic neural network through neuroevolution. Opt. Express 2019, 27, 37150.
Custode, L.L.; Mo, H.; Ferigo, A.; Iacca, G. Evolutionary Optimization of Spiking Neural P Systems for Remaining Useful Life Prediction. Algorithms 2022, 15, 98.
Jalali, S.M.J.; Hedjam, R.; Khosravi, A.; Heidari, A.A.; Mirjalili, A.; Nahavandi, S. Autonomous Robot Navigation Using Moth-Flame-Based Neuroevolution. In Evolutionary Machine Learning Techniques. Algorithms for Intelligent Systems; Mirjalili, S., Faris, H., Aljarah, I., Eds.; Springer: Singapore, 2022; pp. 67–83. Available online: https://www.researchgate.net/publication/337197045 (accessed on 13 August 2022).
Polikar, R. Ensemble based systems in decision making. IEEE Circuits Syst. Mag. 2006, 6, 21–45.
Escovedo, T.; Koshiyama, A.; da Cruz, A.A.; Vellasco, M. Neuroevolutionary learning in nonstationary environments. Appl. Intell. 2020, 50, 1590–1608.
Li, K.; Tian, H. A Bagging Based Multiobjective Differential Evolution with Multiple Subpopulations. IEEE Access 2021, 9, 105902–105913.
Wang, F.; Liao, F.; Li, Y.; Yan, X.; Chen, X. An ensemble learning based multi-objective evolutionary algorithm for the dynamic vehicle routing problem with time windows. Comput. Ind. Eng. 2021, 107131.
Dufourq, E.; Pillay, N. Hybridizing evolutionary algorithms for creating classifier ensembles. In Proceedings of the Sixth World Congress on Nature and Biologically Inspired Computing (NaBIC 2014), Porto, Portugal, 30 July 2014–1 August 2014.
Yu, E.L.; Suganthan, P.N. Ensemble of niching algorithms. Inf. Sci. 2010, 180, 2815–2833.
Faber, K.; Pietron, M.; Zurek, D. Ensemble Neuroevolution-Based Approach for Multivariate Time Series Anomaly Detection. Entropy 2021, 23, 1466.
Ngo, G.; Beard, R.; Chandra, R. Evolutionary Bagged Ensemble Learning. Available online: https://arxiv.org/pdf/2208.02400.pdf (accessed on 13 August 2022).
Da Silva, R.S.; Da Costa-Abreu, M.; Smith, S. Investigating the use of an ensemble of evolutionary algorithms for letter identification in tremulous medieval handwriting. Evol. Intell. 2020, 14, 1657–1669. [Google Scholar] [CrossRef]
Bhowan, U.; Johnston, M.; Zhang, M.; Yao, X. Evolving Diverse Ensembles Using Genetic Programming for Classification with Unbalanced Data. IEEE Trans. Evol. Comput. 2013, 17, 368–386. [Google Scholar] [CrossRef] [Green Version]
Curteanu, S.; Leon, F.; Gâlea, D. Neural network models for free radical polymerization of methyl methacrylate. Eurasian Chem. Technol. J. 2003, 5, 225–231. [Google Scholar]
Curteanu, S. Direct and inverse neural network modeling in free radical polymerization. Cent. Eur. J. Chem. 2004, 2, 113–140. [Google Scholar] [CrossRef]
Curteanu, S.; Leon, F. Hybrid neural network models applied to a free radical polymerization process. Polym. Plast. Technol. Eng. 2006, 45, 1013–1023. [Google Scholar] [CrossRef]
Curteanu, S.; Leon, F.; Vicoveanu, A.M.; Logofatu, D. Regression methods based on nearest neighbors with adaptive distance metrics applied to a polymerization process. Mathematics 2021, 9, 547. [Google Scholar] [CrossRef]
Dragoi, E.-N.; Curteanu, S.; Galaction, A.-I.; Cascaval, D. Optimization methodology based on neural networks and self-adaptive differential evolution algorithm applied to an aerobic fermentation process. Appl. Soft Comput. 2013, 13, 222–238. [Google Scholar] [CrossRef]
Khaw, J.F.C.; Lim, B.S.; Lim, L.E.N. Optimal design of neural networks using the Taguchi method. Neurocomputing 1995, 7, 225–245. [Google Scholar] [CrossRef]
Zhou, Z.-H. Ensemble Learning. Encycl. Biom. 2009, 1, 270–273. [Google Scholar] [CrossRef]
Breiman, L. Bagging predictors. Mach. Learn. 1996, 24, 123–140. [Google Scholar] [CrossRef] [Green Version]
Kao, S.-C.; Krishna, T. E3: A HW/SW Co-design Neuroevolution Platform for Autonomous Learning in Edge Device. In Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), Stony Brook, NY, USA, 28–30 March 2021. [Google Scholar] [CrossRef]

Figure 1. Illustration of the main steps of our approach.

Figure 2. A general structure of an MLP and its associated coded solution.

Figure 3. Example of two different MLPs that are associated with an array of the same length.

Figure 4. Hybrid cascade strategy. For each iteration of the ensemble, the initial population is generated from the improved solution of the previous iteration. Starting from this solution, a new population sol_i*, i = 1 … N, is generated by applying a mutation operator to clones sol_i, i = 1 … N, of the previously improved solution.
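
The population-generation step described in the Figure 4 caption can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the names `gaussian_mutation` and `cascade_population`, the Gaussian perturbation, and the `sigma` parameter are hypothetical stand-ins for the mutation operator defined in Section 3.3.5.

```python
import random

def gaussian_mutation(solution, sigma=0.1, rng=random):
    """Hypothetical mutation operator: add zero-mean Gaussian noise to each gene."""
    return [gene + rng.gauss(0.0, sigma) for gene in solution]

def cascade_population(improved_solution, n, sigma=0.1, rng=random):
    """Build sol_1*, ..., sol_N* from the single improved solution of the previous iteration."""
    # sol_i: N independent clones of the previously improved solution
    clones = [list(improved_solution) for _ in range(n)]
    # sol_i*: each clone is mutated to restore diversity in the new population
    return [gaussian_mutation(clone, sigma, rng) for clone in clones]

# Usage with a toy 3-gene solution (values are arbitrary)
best = [0.5, -1.2, 0.3]
population = cascade_population(best, n=4, rng=random.Random(0))
```

Because every individual descends from the same solution, the diversity of the new population depends entirely on the strength of the mutation, here controlled by the assumed `sigma`.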

Figure 5. Hybrid single elite solution strategy. For each iteration of the ensemble, an initial population sol_i*, i = 1 … N, is generated by cloning the best solutions of each algorithm from the previous iteration and applying a mutation operator to the resulting clones sol_i, i = 1 … N.
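
As a rough sketch of the step in the Figure 5 caption, the snippet below fills a population of size N with mutated clones of the best solution reported by each algorithm. The function name `single_elite_population`, the round-robin assignment of elites to population slots, and the Gaussian mutation with parameter `sigma` are assumptions made for illustration; the caption does not specify how clones are distributed.

```python
import random

def single_elite_population(elites, n, sigma=0.1, rng=random):
    """Build sol_1*, ..., sol_N* from the best solution of each algorithm.

    elites: one best solution per optimizer from the previous ensemble iteration.
    Clones are assigned round-robin (an assumption), then mutated.
    """
    # sol_i: clones of the elites, cycling through the optimizers
    clones = [list(elites[i % len(elites)]) for i in range(n)]
    # sol_i*: hypothetical Gaussian mutation applied gene by gene
    return [[gene + rng.gauss(0.0, sigma) for gene in clone] for clone in clones]

# Usage with three toy elites, e.g. one each from ICA, TLBO and SHLO
elites = [[0.0, 0.0], [10.0, 10.0], [-10.0, -10.0]]
population = single_elite_population(elites, n=6, sigma=0.01, rng=random.Random(1))
```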

Figure 6. Hybrid multiple elite solutions strategy. For each iteration of the ensemble, the initial population is generated from the improved population of the previous iteration. Starting from the improved population sol_i, i = 1 … N, an initial population sol_a*, sol_b*, …, sol_N* is generated for the next ensemble iteration by taking samples sol_a, sol_b, …, sol_N from sol_i and applying a mutation operator.
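
The sampling step in the Figure 6 caption might look like the sketch below. It assumes uniform sampling with replacement and a Gaussian mutation; neither choice is stated in the caption, and the names `multiple_elite_population` and `sigma` are hypothetical.

```python
import random

def multiple_elite_population(improved_population, n, sigma=0.1, rng=random):
    """Build sol_a*, sol_b*, ..., sol_N* by sampling the improved population.

    Sampling is uniform with replacement (an assumption), followed by a
    hypothetical Gaussian mutation of every gene.
    """
    # sol_a, sol_b, ...: samples drawn from the improved population sol_i
    samples = [list(rng.choice(improved_population)) for _ in range(n)]
    # Mutated copies form the initial population of the next ensemble iteration
    return [[gene + rng.gauss(0.0, sigma) for gene in s] for s in samples]

# Usage with a toy improved population of three 2-gene solutions
improved_population = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
next_population = multiple_elite_population(
    improved_population, n=5, sigma=0.01, rng=random.Random(2)
)
```

Unlike the cascade variant, the new population here starts from several distinct parents, so some diversity survives even with a weak mutation.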

Figure 7. The experiment setup.

Figure 8. Convergence curves on: (a) training samples; (b) test samples.

Figure 9. Convergence curve graphs of ensembles using a balanced number of iterations and an unbalanced number of iterations for weak optimizers: (a) cascade strategy; (b) single elite solution strategy; (c) multiple elite solutions strategy.

Table 1. The algorithms sorted according to their performance.

Optimization Algorithm	Mean RMSE	Mean No. Evaluations	Best NN Structure
ICA (best)	0.00581	37,202	[3-11-3-3]
TLBO	0.00612	40,424	[3-7-3]
SHLO	0.00643	40,033	[3-7-2-3]
SLO	0.00663	60,998	[3-9-2-3]
FGA	0.01020	42,8156	[3-8-2-3]
VS	0.01179	188	[3-10-3-3]
VOA (worst)	0.01265	5558	[3-11-3-3]

Table 2. Simulation results obtained on the MMA data set with the individual optimizers and the ensemble strategies.

Algorithm	Avg. RMSE Training (30 runs)	Avg. RMSE Testing (30 runs)	Avg. No. Evals (30 runs)	Best RMSE Training	Best RMSE Testing	Best MLP Structure
ICA	0.00609	0.01079	37,159	0.00553	0.00982	[3-8-2-3]
TLBO	0.00693	0.01210	40,424	0.00598	0.01053	[3-7-2-3]
SHLO	0.00719	0.01265	40,033	0.00653	0.01194	[3-11-2-3]
Hybrid cascade	0.00655	0.01159	39,136	0.00555	0.00983	[3-8-3-3]
Hybrid single elite solution	0.00646	0.01142	39,225	0.00548	0.00973	[3-11-2-3]
Hybrid multiple elite solutions	0.00582	0.01029	39,245	0.00517	0.00932	[3-8-4-3]
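
To put the gap between the best ensemble and the best individual optimizer in relative terms, the average testing RMSE values from Table 2 can be compared directly. The helper `relative_improvement` is a hypothetical name introduced here for illustration; the two RMSE values are taken from the table.

```python
def relative_improvement(baseline_rmse, candidate_rmse):
    """Percentage reduction of the candidate RMSE relative to the baseline."""
    return 100.0 * (baseline_rmse - candidate_rmse) / baseline_rmse

# Average testing RMSE over 30 runs, from Table 2
ica_test_rmse = 0.01079             # best individual optimizer (ICA)
multiple_elite_test_rmse = 0.01029  # hybrid multiple elite solutions

improvement = relative_improvement(ica_test_rmse, multiple_elite_test_rmse)
print(f"{improvement:.2f}%")
```

On these figures the hybrid multiple elite solutions strategy lowers the average testing RMSE by about 4.6% relative to ICA, at a comparable number of evaluations.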

Table 3. Performance of ensembles with a balanced number of iterations and an unbalanced number of iterations for weak optimizers.

Ensemble	Avg. RMSE Training (30 runs)	Avg. RMSE Testing (30 runs)	Avg. No. Evals (30 runs)
Cascade-balanced	0.00655	0.01159	39,136
Cascade-ICA	0.00702	0.01234	38,224
Cascade-TLBO	0.00629	0.01111	39,767
Cascade-SHLO	0.00699	0.01240	39,654
SingleElite-balanced	0.00646	0.01142	39,225
SingleElite-ICA	0.00685	0.01207	38,348
SingleElite-TLBO	0.00607	0.01070	39,735
SingleElite-SHLO	0.00650	0.01153	39,628
MultipleElite-balanced	0.00582	0.01029	39,245
MultipleElite-ICA	0.00591	0.01048	38,261
MultipleElite-TLBO	0.00571	0.01015	39,735
MultipleElite-SHLO	0.00584	0.01040	39,603

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Floria, S.-A.; Gavrilescu, M.; Leon, F.; Curteanu, S. Ensembles of Biologically Inspired Optimization Algorithms for Training Multilayer Perceptron Neural Networks. Appl. Sci. 2022, 12, 9997. https://doi.org/10.3390/app12199997

