Article

Gen2Gen: Efficiently Training Artificial Neural Networks Using a Series of Genetic Algorithms

by
Ioannis G. Tsoulos
* and
Vasileios Charilogis
Department of Informatics and Telecommunications, University of Ioannina, 45110 Ioannina, Greece
*
Author to whom correspondence should be addressed.
Knowledge 2025, 5(3), 17; https://doi.org/10.3390/knowledge5030017
Submission received: 9 May 2025 / Revised: 5 August 2025 / Accepted: 20 August 2025 / Published: 22 August 2025

Abstract

Artificial neural networks have been used in a multitude of applications in various research areas in recent decades, providing excellent results in both data classification and data fitting. Their success is based on the effective identification (training) of their parameters using optimization techniques, and hence a series of programming methods have been developed for training these models. However, these techniques often either identify only local minima of the error function, with poor overall results, or present overfitting problems, in which the performance of the artificial neural network is significantly reduced when it is applied to data different from the training set. This manuscript introduces a method for the efficient training of artificial neural networks, where a series of genetic algorithms is applied to the network parameters in several stages. In the first stage, an initial estimate of the value interval of the network parameters is obtained; in the second stage, this initial estimate of the value interval is improved; and in the third stage, the final adjustment of the network parameters within the previously identified value interval takes place. The new method was tested on classification and regression problems found in the relevant literature, and the experimental results were compared against the results obtained by the application of other well-known methods used for neural network training.

1. Introduction

A widely used machine learning model is the artificial neural network [1,2]. In most cases, artificial neural networks are defined as parametric machine learning models, where learning is achieved by calculating the values of their parameters through some optimization technique. The underlying optimization procedure should minimize the associated training error, calculated as
E\left(N(x,w)\right) = \sum_{i=1}^{M} \left( N(x_i, w) - y_i \right)^2 \qquad (1)
The function N(x, w) stands for the artificial neural network, the vector x represents the input pattern, and the vector w stands for the parameter (weight) vector of the neural network. The set \left\{ \left( x_i, y_i \right),\ i = 1, \ldots, M \right\} defines the so-called training set of the objective problem, where the values y_i are the expected outputs for every pattern x_i. A closed form that can be used to represent neural networks was used in [3], where it is defined as
N(x, w) = \sum_{i=1}^{H} w_{(d+2)i-(d+1)} \, \sigma\left( \sum_{j=1}^{d} x_j \, w_{(d+2)i-(d+1)+j} + w_{(d+2)i} \right) \qquad (2)
Here, the constant H defines the number of processing units and the symbol d stands for the dimension of the input pattern x . The function σ ( x ) represents the sigmoid function:
\sigma(x) = \frac{1}{1 + \exp(-x)}
Following Equation (2), it is deduced that the total number of elements in the weight vector is n = (d+2)H. Alternative activation functions can also be used; as an example, consider the tanh function, defined as
\tanh(x) = \frac{e^{2x} - 1}{e^{2x} + 1}
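To make the closed form of Equation (2) concrete, the following C++ sketch evaluates N(x, w) for a single input pattern. It is only an illustrative reading of the formula, with the 1-based indexing of Equation (2) shifted to 0-based arrays; it is not the implementation used in the experiments.

```cpp
// Minimal sketch (not the authors' code): evaluating the closed form of
// Equation (2) for a network with H hidden units and input dimension d.
// The weight indices of Equation (2) are 1-based and are shifted here to
// 0-based C++ array positions.
#include <cmath>
#include <vector>

double sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }

// x: input pattern of size d, w: parameter vector of size (d+2)*H
double neuralNetwork(const std::vector<double>& x,
                     const std::vector<double>& w, int H) {
    const int d = static_cast<int>(x.size());
    double out = 0.0;
    for (int i = 1; i <= H; ++i) {
        double arg = w[(d + 2) * i - 1];                         // bias w_{(d+2)i}
        for (int j = 1; j <= d; ++j)
            arg += x[j - 1] * w[(d + 2) * i - (d + 1) + j - 1];  // w_{(d+2)i-(d+1)+j}
        out += w[(d + 2) * i - (d + 1) - 1] * sigmoid(arg);      // w_{(d+2)i-(d+1)}
    }
    return out;
}
```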
Moreover, Guarnieri et al. introduced the usage of an adaptive spline function as the activation function of neural networks [4]. Similarly, Ertuğrul introduced the trained activation function [5]. Recently, Rasamoelina et al. [6] published a review on activation functions for artificial neural networks.
Artificial neural networks have been incorporated in a wide range of real-world problems, such as image processing [7], time series forecasting [8], credit card analysis [9], problems derived from physics [10], medicine [11,12], mechanical applications [13], etc. During the past years, a series of optimization methods have been incorporated to tackle the minimization of Equation (1). Among them one can find the Back Propagation algorithm [14], the RPROP algorithm [15,16], the ADAM optimization method [17], etc. Moreover, many advanced global optimization methods have also been used, such as genetic algorithms [18], the Particle Swarm Optimization (PSO) method [19], the Simulated Annealing method [20], the Differential Evolution technique [21], the Artificial Bee Colony (ABC) method [22], etc. In the same direction of research, Sexton et al. proposed the application of the tabu search algorithm for optimal neural network training [23]. Additionally, Zhang et al. introduced a hybrid algorithm that combines PSO and the Back Propagation algorithm to efficiently train artificial neural networks [24]. Also, in a recently published paper, Zhao et al. proposed the usage of a new Cascaded Forward algorithm to train artificial neural networks [25]. Additionally, the widespread use of parallel processing techniques in recent years has resulted in a series of related works on the training of artificial neural networks that utilize such techniques [26,27].
Nevertheless, the previous optimization methods face a number of problems, such as identifying only local minima of the error function or suffering from the phenomenon of overfitting. In overfitting, the artificial neural network exhibits reduced performance when it is applied to data that were not used in the training process. This problem has been tackled by various researchers in the past and some methods have been introduced, such as weight sharing [28], pruning [29,30], early stopping [31,32], weight decaying [33,34], etc. Furthermore, many studies proposed as a solution the evolution of the architecture of neural networks using programming techniques; for example, genetic algorithms [35,36] and the PSO method [37] have been used to construct the architecture of neural networks. Additionally, Siebel et al. introduced the usage of evolutionary reinforcement learning for the optimal design of artificial neural networks [38]. Also, a review of the usage of reinforcement learning for neural architecture search is provided by Jaafra et al. [39]. Recently, Pham et al. introduced a method for the optimal identification of the architecture of neural networks with parameter sharing [40]. Similarly, Xie et al. proposed the incorporation of Stochastic Neural Architecture Search [41] for the construction of the architecture of neural networks. Also, Zhou et al. used a Bayesian approach for neural architecture search [42].
In this paper, a three-stage technique is proposed, which aims, on the one hand, to effectively train artificial neural networks and, on the other, to limit the phenomenon of overfitting. In the first phase of the new method, a genetic algorithm is used to make an initial estimate of the value interval for the parameters of the artificial neural network. In this genetic algorithm, a modified version of the training error of the artificial neural network is used in order to prevent the network parameters from taking large values and, consequently, to avoid the phenomenon of overfitting. In the second phase of the process, an interval technique based on a genetic algorithm is utilized to locate the optimal interval for the parameters of the artificial neural network, using the best chromosome obtained by the algorithm of the first phase. Finally, in the third phase, a simple genetic algorithm is incorporated to train the artificial neural network using the bounds located in the second phase of the method. The proposed technique was tested on a series of classification and regression problems from various research fields and it was compared against traditional methods used for the training of neural networks.
The proposed method consists of three distinct stages whose ultimate goal is the improvement of the generalization ability of artificial neural networks. In the first phase, an initial estimate of the value interval for the parameters of the artificial neural network is obtained. This phase takes into account the use of sigmoid functions, as well as the observation that they can lose their generalization abilities when their inputs exceed certain values in absolute terms. For this reason, the parameters of the artificial neural network are limited in such a way that the inputs of the problem are also taken into account. In the second phase, an interval technique is used to identify a reliable value interval for the parameters of the artificial neural network, using the information from the first phase, and finally, in the third phase, an optimization method, such as a genetic algorithm, is used for the final training of the parameters of the artificial neural network.
The remainder of this article has the following sections: Section 2 describes the proposed method, Section 3 outlines the experimental datasets and the series of experiments conducted, and finally, Section 4 presents some conclusions.

2. Method Description

This section describes in detail the three distinct algorithms used in every stage of the proposed method.

2.1. The Algorithm of the First Stage

The activation function used commonly in neural networks is the sigmoid function, defined as
\sigma(x) = \frac{1}{1 + \exp(-x)}
An example plot for this function is outlined in Figure 1.
As is clearly shown in this figure, the sigmoid function very quickly takes on the value 1 as x goes towards infinity and very quickly takes on the value 0 as x goes towards minus infinity. This behavior has the direct result that the function loses its generalization capabilities very quickly, as these are limited to a small range of input values.
The sigmoid function converges very fast to 1 as x \to +\infty and to 0 as x \to -\infty. The consequence of this effect is that the corresponding computing unit produces the same output for widely different input values and therefore can no longer distinguish between them. Based on the previous consideration, one can define the function B\left(N(x, w), a\right), which stands for the percentage of times that the absolute value of the input argument of the sigmoid units exceeds a limit a. This function can be used to avoid the phenomenon of overfitting by limiting the values of the parameters of the artificial neural network to specified intervals which also depend directly on the inputs presented to the sigmoid functions. This function is described in Algorithm 1. The parameter a is used as a heuristic bound for the value that is fed to the sigmoid function. If the input to the function is greater in absolute value than this parameter, then we can consider that the sigmoid function loses its generalization ability, as the result of the sigmoid will be approximately the same (0 or 1) regardless of changes in its input.
Algorithm 1 Calculating the quantity B\left(N(x, w), a\right) with a > 0 for a provided neural network N(x, w).
  • Function B\left(N(x, w), a\right)
  • Inputs: The neural network N(x, w) and the bound factor a > 0.
  • Set k = 0
  • For i = 1..H Do
    (a)
         For j = 1..M Do
         i.         Set v = \sum_{l=1}^{d} w_{(d+2)i-(d+1)+l} \, x_{jl} + w_{(d+2)i}
         ii.        If |v| > a then k = k + 1
    (b)
         EndFor
  • EndFor
  • Return k / (H M)
  • End Function
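The following C++ sketch is one possible reading of Algorithm 1. The weight indexing follows Equation (2) shifted to 0-based arrays, and the training patterns are assumed to be stored row-wise; it is an illustration, not the code used in the paper.

```cpp
// Minimal sketch of Algorithm 1: the fraction of (hidden unit, pattern) pairs
// for which the absolute value of the argument fed to the sigmoid exceeds a.
#include <cmath>
#include <vector>

// train: M patterns, each of dimension d; w: weight vector of size (d+2)*H
double boundViolationRate(const std::vector<std::vector<double>>& train,
                          const std::vector<double>& w, int H, double a) {
    const int M = static_cast<int>(train.size());
    const int d = static_cast<int>(train[0].size());
    int count = 0;
    for (int i = 1; i <= H; ++i) {
        for (int j = 0; j < M; ++j) {
            double v = w[(d + 2) * i - 1];                        // bias w_{(d+2)i}
            for (int l = 1; l <= d; ++l)
                v += w[(d + 2) * i - (d + 1) + l - 1] * train[j][l - 1];
            if (std::fabs(v) > a) ++count;                        // unit saturates
        }
    }
    return static_cast<double>(count) / (H * M);                  // B(N(x,w), a)
}
```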
The steps of the genetic algorithm used in the first stage are outlined below.
  • Initialization step.
    (a)
    Set  N g as the maximum number of allowed generations.
    (b)
    Set N_c as the number of chromosomes used. Each chromosome is considered as a vector of n = (d+2)H double-precision values. The value d represents the dimension of the input pattern and the constant H defines the number of nodes of the neural network. Every value in the chromosomes is initialized randomly in the range \left[ -I_0, I_0 \right], I_0 > 0.
    (c)
    Set p_s as the selection rate, where p_s \le 1.
    (d)
    Set p_m as the mutation rate, where p_m \le 1.
    (e)
    Set  k = 0 as the generation counter.
  • Fitness calculation step.
    (a)
    For i = 1, \ldots, N_c perform the following.
    i.  
    Create the neural network N\left(x, g_i\right) for the chromosome g_i.
    ii. 
    Set E_i = \sum_{j=1}^{M} \left( N(x_j, g_i) - y_j \right)^2.
    iii.
    Set b_i = B\left(N(x, g_i), a\right) using the function of Algorithm 1.
    iv. 
    Set f_i = E_i \times \left( 1 + \lambda b_i^2 \right) as the fitness value of chromosome g_i, with \lambda > 1.
    (b)
    End For
  • Application of genetic operations.
    (a)
    Copy the best \left( 1 - p_s \right) \times N_c chromosomes of the current population intact to the next generation. The remaining chromosomes will be replaced by new chromosomes produced during crossover and mutation.
    (b)
    Perform the crossover procedure. For each pair of new offspring, two parents, z = \left( z_1, z_2, \ldots, z_n \right) and w = \left( w_1, w_2, \ldots, w_n \right), are selected from the current population using tournament selection. After the selection of the parents, the new offspring \tilde{z} and \tilde{w} are formed as follows:
    \tilde{z}_i = r_i z_i + \left( 1 - r_i \right) w_i, \qquad \tilde{w}_i = r_i w_i + \left( 1 - r_i \right) z_i
    where r_i are random numbers in the range [-0.5, 1.5] [43].
    (c)
    Perform the mutation procedure, as proposed in [44]: for every element g_{ij} of each chromosome, select a random number r \in [0, 1]. If r \le p_m, alter the corresponding element g_{ij} as
    g_{ij} = \begin{cases} g_{ij} + \Delta\left( k, b_{g,i} - g_{ij} \right), & t = 0 \\ g_{ij} - \Delta\left( k, g_{ij} - a_{g,i} \right), & t = 1 \end{cases}
    where a_{g,i} and b_{g,i} denote the lower and upper bounds of the corresponding element and t is a random number that can be 0 or 1. The function \Delta(k, y), with k denoting the current generation, is calculated as
    \Delta(k, y) = y \left( 1 - \omega^{\left( 1 - k / N_g \right)^{z}} \right)
    where \omega \in [0, 1] is a random number and z is a parameter defined by the user.
  • Termination check step.
    (a)
    Set  k = k + 1 .
    (b)
    If  k < N g go to fitness calculation step.
  • Final Step.
    (a)
    Obtain the chromosome g * having the lowest fitness value in the population.
    (b)
    Produce the vectors L^* and R^* as follows:
    L_i^* = -f \left| g_i^* \right|, \qquad R_i^* = f \left| g_i^* \right|, \qquad i = 1, \ldots, n
    where f > 1. These vectors will be used in the following phase of the proposed algorithm (a code sketch of this stage is given below).
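A compact C++ sketch of the first-stage fitness and of the bound construction of the final step follows. It builds on the neuralNetwork() and boundViolationRate() sketches above and reflects the reconstruction f_i = E_i (1 + λ b_i²) and L*_i = -f|g*_i|, R*_i = f|g*_i| used in this section; it is not an official implementation.

```cpp
// Minimal sketch of the first-stage penalized fitness and the bound vectors.
#include <cmath>
#include <vector>

double trainError(const std::vector<std::vector<double>>& X,
                  const std::vector<double>& y,
                  const std::vector<double>& g, int H) {
    double e = 0.0;
    for (size_t j = 0; j < X.size(); ++j) {
        double diff = neuralNetwork(X[j], g, H) - y[j];
        e += diff * diff;                          // sum of squared errors, Eq. (1)
    }
    return e;
}

double penalizedFitness(const std::vector<std::vector<double>>& X,
                        const std::vector<double>& y,
                        const std::vector<double>& g,
                        int H, double a, double lambda) {
    double b = boundViolationRate(X, g, H, a);     // fraction of saturated units
    return trainError(X, y, g, H) * (1.0 + lambda * b * b);
}

// Bound vectors for the second stage: L*_i = -f*|g*_i|, R*_i = f*|g*_i|, f > 1.
void makeBounds(const std::vector<double>& gBest, double f,
                std::vector<double>& L, std::vector<double>& R) {
    L.resize(gBest.size());
    R.resize(gBest.size());
    for (size_t i = 0; i < gBest.size(); ++i) {
        L[i] = -f * std::fabs(gBest[i]);
        R[i] =  f * std::fabs(gBest[i]);
    }
}
```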

2.2. The Algorithm of the Second Stage

In the second phase of the current work, a bound method is applied using the vectors L^* and R^* of the previous stage to discover the optimal bounds for the parameters of the network. During this phase, a modified genetic algorithm is incorporated, where the chromosomes are defined as sets of intervals \left[ L_k, R_k \right]. Also, the fitness value of each chromosome is considered as an interval f = \left[ f_1, f_2 \right]. The function D(a, b) is introduced here for the comparison of two intervals a = \left[ a_1, a_2 \right] and b = \left[ b_1, b_2 \right]. This function is defined as
D(a, b) = \begin{cases} \text{TRUE}, & a_1 < b_1 \ \text{OR} \ \left( a_1 = b_1 \ \text{AND} \ a_2 < b_2 \right) \\ \text{FALSE}, & \text{OTHERWISE} \end{cases}
A modified genetic algorithm is incorporated here to locate the most promising interval for the weights of the neural network. This procedure uses the vectors L * and R * of the previous algorithm. Each chromosome is considered as a set of intervals, which is initialized randomly inside the vectors L * and R * . The steps of the algorithm for the second phase are presented below.
  • Initialization step.
    (a)
    Set as N g the maximum number of allowed generations and as N c the total number of chromosomes.
    (b)
    Set as p s the selection rate and as p m the mutation rate.
    (c)
    Initialize every chromosome g_i = \left[ L_i, R_i \right], i = 1, \ldots, N_c, randomly inside the vectors L^* and R^* produced in the previous phase.
    (d)
    Set as N s the number of samples used in the fitness calculation step.
    (e)
    Set  k = 0 , the generation counter.
  • Fitness calculation step.
    (a)
    For i = 1, \ldots, N_c perform the following.
    • Calculate the fitness f i of each chromosome g i using the procedure provided in Algorithm 2.
    (b)
    End For.
  • Application of genetic operators.
    (a)
    Selection procedure. Copy the best \left( 1 - p_s \right) \times N_c chromosomes to the next generation without changes. The remaining ones will be replaced by offspring created using the crossover and mutation procedures. The sorting is performed using the operator D(a, b) for the fitness values.
    (b)
    Crossover procedure. Perform the crossover procedure, where for every couple z ˜ , w ˜ of produced chromosomes, two parents z , w will be chosen using tournament selection. The new chromosomes will be produced using the one-point crossover method, graphically presented in Figure 2.
    (c)
    Mutation procedure. For each element of each chromosome a random number r [ 0 , 1 ] is drawn. The corresponding element is altered randomly when r p m .
  • Termination check step.
    (a)
    Set  k = k + 1
    (b)
    If  k < N g , go to fitness calculation step.
  • Final step.
    (a)
    Obtain the best chromosome g^* from the population.
    (b)
    Produce the corresponding set of intervals \left[ L^*, R^* \right].
Algorithm 2 Fitness calculation function.
  • Function fitness(g, N_s)
  • Input: The chromosome g = \left[ L_g, R_g \right] and the number of random samples N_s.
  • Draw N_s random samples in g and create the set S_a = \left\{ s_1, s_2, \ldots, s_{N_s} \right\}.
  • Set f_{\min} = +\infty
  • Set f_{\max} = -\infty
  • For i = 1, \ldots, N_s do
    (a)
    Set E_i = \sum_{j=1}^{M} \left( N(x_j, s_i) - y_j \right)^2
    (b)
    If  E i < f min set f min = E i
    (c)
    If  E i > f max set f max = E i
  • End For
  • Return as fitness value the interval f_g = \left[ f_{\min}, f_{\max} \right]
  • End Function
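The following C++ sketch illustrates the two ingredients specific to this stage: the interval comparison D(a, b) and the sampling-based fitness of Algorithm 2. The uniform sampler inside each interval is an assumption, since the text does not specify the sampling distribution, and trainError() is the helper sketched in Section 2.1.

```cpp
// Minimal sketch of the second-stage interval comparison and interval fitness.
#include <limits>
#include <random>
#include <utility>
#include <vector>

using Interval = std::pair<double, double>;

// Lexicographic comparison of two fitness intervals a = [a1,a2], b = [b1,b2].
bool D(const Interval& a, const Interval& b) {
    return (a.first < b.first) ||
           (a.first == b.first && a.second < b.second);
}

// Chromosome g = (L_g, R_g): one interval per network parameter.
Interval intervalFitness(const std::vector<Interval>& g,
                         const std::vector<std::vector<double>>& X,
                         const std::vector<double>& y,
                         int H, int Ns, std::mt19937& rng) {
    double fmin = std::numeric_limits<double>::infinity();
    double fmax = -std::numeric_limits<double>::infinity();
    for (int s = 0; s < Ns; ++s) {
        std::vector<double> sample(g.size());
        for (size_t k = 0; k < g.size(); ++k) {
            std::uniform_real_distribution<double> U(g[k].first, g[k].second);
            sample[k] = U(rng);                    // draw a weight vector inside g
        }
        double E = trainError(X, y, sample, H);
        if (E < fmin) fmin = E;
        if (E > fmax) fmax = E;
    }
    return {fmin, fmax};                           // fitness interval [f_min, f_max]
}
```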

2.3. The Final Training Algorithm

During the final stage of the proposed method, a genetic algorithm is incorporated to train the artificial neural network inside the bounds \left[ L^*, R^* \right] produced in the final step of the previous phase. The main steps of this algorithm are listed below.
  • Initialization step.
    (a)
    Set as N g the maximum number of allowed generations and as N c the total number of chromosomes.
    (b)
    Set as p s the selection rate and as p m the mutation rate.
    (c)
    Randomly initialize the chromosomes g_i, i = 1, \ldots, N_c, as vectors with n = (d+2)H elements inside the bounds \left[ L^*, R^* \right].
    (d)
    Set  k = 0 , the generation counter.
  • Fitness calculation step.
    (a)
    For i = 1, \ldots, N_c perform the following.
    i. 
    Create the neural network N\left(x, g_i\right) for the chromosome g_i.
    ii.
    Calculate the associated fitness value f_i as f_i = \sum_{j=1}^{M} \left( N(x_j, g_i) - y_j \right)^2.
    (b)
    End For.
  • Incorporation of genetic operators. Apply the same genetic operators as in the first phase of the proposed algorithm, described in Section 2.1.
  • Termination check step.
    (a)
    Set  k = k + 1
    (b)
    If  k < N g , go to fitness calculation step of the current algorithm.
  • Testing step.
    (a)
    Obtain the chromosome with the lowest fitness value in the population and denote it as g * .
    (b)
    Produce the associated neural network N\left(x, g^*\right).
    (c)
    Apply a local search procedure to the error function for this network. The local search procedure used was the BFGS variant of Powell [45].
    (d)
    Apply the neural network to the associated test set of the problem to obtain the test error.
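Since the third stage reuses the genetic operators of Section 2.1 inside the bounds [L*, R*], a C++ sketch of those operators is given below. It follows the reconstructed crossover and mutation formulas above (r_i drawn in [-0.5, 1.5], non-uniform mutation Δ(k, y)) and should be read as an illustration rather than the actual code used in the experiments; the bounds a and b are the element-wise limits of the current stage, here L* and R*.

```cpp
// Minimal sketch of the genetic operators shared by the first and third stages.
#include <cmath>
#include <random>
#include <vector>

void crossover(const std::vector<double>& z, const std::vector<double>& w,
               std::vector<double>& childZ, std::vector<double>& childW,
               std::mt19937& rng) {
    std::uniform_real_distribution<double> R(-0.5, 1.5);
    childZ.resize(z.size());
    childW.resize(z.size());
    for (size_t i = 0; i < z.size(); ++i) {
        double r = R(rng);
        childZ[i] = r * z[i] + (1.0 - r) * w[i];   // z~_i = r z_i + (1-r) w_i
        childW[i] = r * w[i] + (1.0 - r) * z[i];   // w~_i = r w_i + (1-r) z_i
    }
}

// Delta(k, y) = y * (1 - omega^((1 - k/Ng)^z)), k = current generation.
double delta(int k, int Ng, double y, double zParam, std::mt19937& rng) {
    std::uniform_real_distribution<double> U(0.0, 1.0);
    double omega = U(rng);
    return y * (1.0 - std::pow(omega, std::pow(1.0 - double(k) / Ng, zParam)));
}

void mutate(std::vector<double>& g, const std::vector<double>& a,
            const std::vector<double>& b, double pm, int k, int Ng,
            double zParam, std::mt19937& rng) {
    std::uniform_real_distribution<double> U(0.0, 1.0);
    std::bernoulli_distribution bit(0.5);
    for (size_t j = 0; j < g.size(); ++j) {
        if (U(rng) > pm) continue;                 // mutate with probability pm
        if (!bit(rng))
            g[j] += delta(k, Ng, b[j] - g[j], zParam, rng); // move towards upper bound
        else
            g[j] -= delta(k, Ng, g[j] - a[j], zParam, rng); // move towards lower bound
    }
}
```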

3. Results

The validation of the proposed method was performed using a wide series of classification and regression datasets, available from various sources on the Internet. These datasets were downloaded from the UCI Machine Learning Repository [46] and the KEEL repository [47].

3.1. Experimental Datasets

The following datasets were utilized in the conducted experiments:
1. 
Appendicitis, which is a medical dataset [48].
2. 
Alcohol, which is a dataset regarding alcohol consumption [49].
3. 
Australian, which is a dataset produced from various bank transactions [50].
4. 
Balance dataset [51], produced from various psychological experiments.
5. 
Cleveland, a medical dataset which was discussed in a series of papers [52,53].
6. 
Circular dataset, which is an artificial dataset.
7. 
Dermatology, a medical dataset for dermatology problems [54].
8. 
The Hayes–Roth dataset, which was initially suggested in [55].
9. 
Heart, which is a dataset related to heart diseases [56].
10.
HeartAttack, which is related to heart diseases.
11.
Housevotes, a dataset which contains data from Congressional voting in the USA [57].
12.
Ionosphere, which is related to measurements derived from the ionosphere [58,59].
13.
Liverdisorder, a medical dataset [60,61].
14.
The Lymography dataset [62].
15.
Mammographic, which is related to the presence of breast cancer [63].
16.
Parkinsons, which is a medical dataset used for the detection of Parkinson’s disease [64,65].
17.
Pima, which is related to the presence of diabetes [66].
18.
Popfailures, a dataset related to experiments regarding climate [67].
19.
Regions2, a medical dataset related to liver problems [68].
20.
Saheart, which is a medical dataset concerning heart diseases [69].
21.
Segment dataset [70].
22.
The Sonar dataset, related to sonar signals [71].
23.
Statheart, a medical dataset related to heart diseases.
24.
Spiral, which was created artificially and contains two distinct classes.
25.
Student, which is a dataset regarding experiments in schools [72].
26.
Transfusion, which is also a dataset used for medical purposes [73].
27.
Wdbc, which is used for the detection of breast cancer [74,75].
28.
Wine, a dataset used to detect the quality of wines [76,77].
29.
EEG, which is a dataset regarding EEG recordings [78,79]; from this dataset the following cases were used: Z_F_S, ZO_NF_S, ZONF_S, and Z_O_N_F_S.
30.
Zoo, which is a dataset regarding animal classification [80].
Moreover, a series of regression datasets was adopted in the performed experiments. The list of the regression datasets is as follows:
1. 
Abalone, which is a dataset for the detection of the age of abalones [81].
2. 
Airfoil, a dataset provided by NASA [82].
3. 
Auto, a dataset used to predict the fuel consumption in cars.
4. 
BK, which is used to predict the points scored in basketball games.
5. 
BL, a dataset that contains measurements from electricity experiments.
6. 
Baseball, which is a dataset used to predict the income of baseball players.
7. 
Concrete, which is a civil engineering dataset [83].
8. 
DEE, a dataset that is used to predict the price of electricity.
9. 
Friedman, which is an artificial dataset [84].
10.
FY, which is a dataset regarding the longevity of fruit flies.
11.
HO, a dataset located in the STATLIB repository.
12.
Housing, regarding the price of houses [85].
13.
Laser, which is used in physics experiments.
14.
The MB dataset, which originated from Smoothing Methods in Statistics.
15.
The NT dataset [86].
16.
Mortgage, a dataset that contains data from the economy of the USA.
17.
PL dataset, located in the STATLIB repository.
18.
Plastic, a dataset regarding problems occurring with pressure on plastics.
19.
The PY dataset [87].
20.
Quake, a dataset regarding the measurements of earthquakes.
21.
SN, a dataset related to trellising and pruning.
22.
Stock, which is related to the prices of stocks.
23.
Treasury, a dataset that contains measurements from the economy of the USA.

3.2. Experimental Results

The software used in the experiments was coded in C++ with the assistance of the freely available Optimus environment [88]. Each experiment was conducted 30 times, and in every execution a different seed for the random number generator was used. For the validation of the experimental results, the ten-fold cross-validation technique was used. For the classification datasets, the average classification error, as measured on the corresponding test set, is reported. The classification error is computed using the following formula:
E_C\left(N(x,w)\right) = 100 \times \frac{\sum_{i=1}^{N} \left[ \operatorname{class}\left(N(x_i, w)\right) \neq y_i \right]}{N}
For the calculation of this error, the test set T = \left\{ \left( x_i, y_i \right),\ i = 1, \ldots, N \right\} is used. Similarly, the average regression error is reported for the regression datasets, and it is calculated as follows:
E_R\left(N(x,w)\right) = \frac{\sum_{i=1}^{N} \left( N(x_i, w) - y_i \right)^2}{N}
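As an illustration of how these two reported metrics can be computed on a test set, consider the following C++ sketch. The classOf() helper, which maps the real-valued network output to the nearest class label, is an assumption about how class(·) is implemented and is not taken from the paper.

```cpp
// Minimal sketch of the classification error E_C (percentage of misclassified
// test patterns) and the regression error E_R (mean squared test error).
#include <cmath>
#include <vector>

int classOf(double output, const std::vector<double>& labels) {
    int best = 0;
    for (size_t c = 1; c < labels.size(); ++c)
        if (std::fabs(output - labels[c]) < std::fabs(output - labels[best]))
            best = static_cast<int>(c);
    return best;                                   // index of the closest class label
}

double classificationError(const std::vector<double>& outputs,
                           const std::vector<int>& targets,
                           const std::vector<double>& labels) {
    int wrong = 0;
    for (size_t i = 0; i < outputs.size(); ++i)
        if (classOf(outputs[i], labels) != targets[i]) ++wrong;
    return 100.0 * wrong / outputs.size();         // E_C as a percentage
}

double regressionError(const std::vector<double>& outputs,
                       const std::vector<double>& targets) {
    double sum = 0.0;
    for (size_t i = 0; i < outputs.size(); ++i) {
        double diff = outputs[i] - targets[i];
        sum += diff * diff;
    }
    return sum / outputs.size();                   // E_R: mean squared error
}
```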
Table 1 contains the values used for the experimental parameters of the proposed method. The results obtained for the classification datasets are depicted in Table 2 and for the regression datasets in Table 3. The following notations were used for the experimental tables:
  • The column DATASET is used to denote the name of the dataset.
  • The column BFGS represents the results obtained by the training of a neural network with H = 10 processing nodes using the BFGS optimization method [45]. This method terminates either when the derivative is zero or when a maximum number of iterations is reached. In the experiments performed, this number was set to 2000.
  • The column ADAM is used to denote the training of a neural network with H = 10 processing nodes using the ADAM optimization method [17]. The parameters used for the conducted experiments were the following: b 1 = 0.9 , b 2 = 0.999 , and the maximum number of iterations was set to 10,000.
  • The column NEAT represents the incorporation of the NEAT method (NeuroEvolution of Augmenting Topologies) [89]. The population size was set to 500, as in the case of the proposed method.
  • The column RBF is used to denote the usage of a Radial Basis Function (RBF) network [90,91] with 10 processing nodes. The network was trained with the original training method incorporated in RBF networks with two distinct phases: during the first phase, the centers and the variances of the model were calculated using the k-means algorithm [92], and during the second phase, the weights of the network were obtained by solving a linear system of equations.
  • The column GENETIC denotes the usage of a genetic algorithm to train a neural network with H = 10 processing nodes. The parameters used in this algorithm are listed in Table 1.
  • The column PROPOSED denotes experimental results of the proposed method.
  • The row AVERAGE represents the average classification or regression error for all datasets.
Table 2 shows the error rates resulting from the application of the mentioned machine learning models to the classification datasets used. The columns refer to the models (BFGS, ADAM, NEAT, RBF, GENETIC, PROPOSED), while the rows correspond to the datasets. From the analysis of the data, it is observed that the PROPOSED model exhibits the lowest error rates in many datasets, such as “HouseVotes” (3.05%), “Dermatology” (5.97%), and “ZONF_S” (2.35%). Furthermore, it has the lowest average error rate (19.49%) when compared against the other models, indicating overall superior performance. The NEAT model shows the highest error rates in several cases, such as “Cleveland” (77.55%) and “Segment” (68.97%), while ADAM and BFGS exhibit high errors in datasets like “Z_F_S” (47.81% and 39.37%, respectively). In general, the average error rates of the traditional models ADAM (33.73%), BFGS (33.50%), and NEAT (32.77%) remain close to one another. The GENETIC model achieves a lower average error rate (25.68%) than RBF (28.54%), although it exhibits high errors in some datasets such as “Hayes–Roth” (56.18%) and “Z_O_N_F_S” (64.81%). RBF, though not the best in terms of accuracy, demonstrates balanced performance in many cases, with low error rates in datasets such as “Popfailures” (7.04%) and “HouseVotes” (6.13%). Overall, the PROPOSED model stands out as the most effective, with the lowest average error rate and strong performance across multiple dataset categories, making it the preferred choice for classification tasks.
Executions were carried out using scripts in the R language, based on the experimental measurement tables, to determine the significance levels of the experiments using the p-value. In Figure 3, the significance levels are presented, referring to the classification datasets and comparing the performance of the PROPOSED model with the other machine learning models. The comparisons include the following cases: PROPOSED vs. BFGS with p = **** (extremely significant), PROPOSED vs. ADAM with p = ****, PROPOSED vs. NEAT with p = ****, PROPOSED vs. RBF with p = ****, and PROPOSED vs. GENETIC with p = ****. The results provide a clear assessment of the statistical significance of the differences in performance between the PROPOSED model and the other models. The lower the p-value, the stronger the indication that the observed difference in performance is not due to random factors but reflects the genuine superiority of the PROPOSED model.
As an example of a comparison in terms of the drop in error in the control set between the proposed method and the simple genetic algorithm, consider Figure 4.
As can be seen from this figure, the proposed method achieves a lower error on the control set compared to the simple genetic algorithm and, furthermore, presents very small fluctuations in the value of this error.
Table 3 presents the regression errors obtained from the application of the different machine learning models to the regression datasets. The columns refer to the models (BFGS, ADAM, NEAT, RBF, GENETIC, PROPOSED), while the rows correspond to the datasets. From the analysis of the data, it is evident that the PROPOSED model achieved the lowest average error (5.33). This model exhibits exceptionally low errors in datasets such as “BL” (0.001), “HO” (0.012), and “Concrete” (0.004). The GENETIC model follows with an average error of 8.1, demonstrating competitive performance in certain datasets like “Plastic” (2.79) and “Treasury” (2.93), while the RBF model, with an average error of 9.19, stands out for its low values in datasets like “Laser” (0.03) and “PY” (0.012). The NEAT model has a higher average error (12.84), though it performs well in datasets such as “Stock” (12.23). The ADAM and BFGS models exhibit the highest average errors, 19.62 and 26.43, respectively, indicating less reliable overall performance. However, ADAM performs well in datasets like “BK” (0.03) and “FY” (0.038), while BFGS achieves good values in specific datasets like “Airfoil” (0.003). Overall, the PROPOSED model significantly outperforms the others across most datasets, showcasing the best overall accuracy. GENETIC and RBF are also reliable in specific cases, while ADAM and BFGS, although less competitive, deliver good results in certain datasets.
Figure 5 displays the results of statistical significance tests conducted on the regression datasets, aiming to evaluate the statistical significance of the performance differences between the proposed method (PROPOSED) and the other machine learning methods. The p-values obtained from the statistical tests are extremely low, indicating strongly statistically significant differences: for the comparison PROPOSED vs. BFGS, the p-value is ***; for the comparison PROPOSED vs. ADAM, the p-value is ***; for the comparison PROPOSED vs. NEAT, the p-value is ****; for the comparison PROPOSED vs. RBF, the p-value is ***; and for the comparison PROPOSED vs. GENETIC, the p-value is ***. These findings demonstrate that the proposed method does not differ randomly from the other methods but exhibits statistically significant superiority in performance. The presence of three or four asterisks indicates that the differences are at least highly significant, meaning that the probability of the observed differences being due to chance is less than 0.1%.
Furthermore, to clarify the effect of the number of generations parameter on the speed of the method, another experiment was performed, in which the number of generations was gradually increased from 50 to 400 and the execution times were compared between the simple genetic algorithm method and the proposed procedure. The results are presented graphically in Figure 6.
As expected, the time required by the proposed method increases significantly as the number of generations increases. This is, of course, because the proposed method consists of a series of genetic algorithms executed serially. However, the execution time could be significantly reduced by using parallel programming techniques, since genetic algorithms can be parallelized relatively easily, as the sketch below indicates.
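A minimal sketch of such a parallelization, applied to the fitness-calculation step where each chromosome can be evaluated independently, is shown below. It uses OpenMP as an example framework and the trainError() helper sketched in Section 2.1; it is not part of the reported experiments.

```cpp
// Minimal sketch of one easy parallelization point: evaluating all chromosomes
// of a population in parallel. Compiled with -fopenmp this distributes the loop
// over the available cores; without it the pragma is simply ignored.
#include <vector>

void evaluatePopulation(const std::vector<std::vector<double>>& pop,
                        const std::vector<std::vector<double>>& X,
                        const std::vector<double>& y, int H,
                        std::vector<double>& fitness) {
    fitness.resize(pop.size());
    #pragma omp parallel for schedule(dynamic)
    for (long i = 0; i < static_cast<long>(pop.size()); ++i)
        fitness[i] = trainError(X, y, pop[i], H);   // independent evaluations
}
```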
An additional experiment was performed using a variety of values for the initialization factor I 0 and the regression datasets. The average regression error from this experiment and for each value of I 0 is depicted graphically in Figure 7.
The obtained regression error remains low for lower values of the initialization factor and increases as this factor obtains higher values. This means that initializing the parameters of the neural network in a value interval with smaller extreme values and a smaller range gives the artificial neural network better generalization capabilities.
Also, a similar experiment was conducted using different values of the scale factor f and the utilized regression datasets. The average regression error for this experiment is outlined graphically in Figure 8.
As can be observed, the average regression error also increases as the scale factor increases. The conclusion from this experiment is that the artificial neural network is able to generalize more efficiently when its parameter values are limited to a smaller range of values than the one identified in the first phase.
Furthermore, in order to assess the contribution of the local optimization method BFGS to the performance of the proposed method, an additional experiment was performed where the proposed method was executed without the use of the method BFGS in the final stage. The experimental results from the above experiment are presented in detail in Table 4 for the classification datasets.
As can be seen from these results, the local optimization method BFGS improves the results of the proposed method in some problems; however, even without it, the average classification error of the proposed method remains low compared to that of a simple genetic algorithm.

3.3. A Practical Example

As a practical example of an application with many patterns, consider the PIRvision dataset, which was presented in 2023 [93]. This dataset contains data for occupancy detection; the associated data were collected from a Synchronized Low-Energy Electronically Chopped Passive Infra-Red sensing node in residential and office environments. The dataset has 15,302 patterns and each pattern has 59 features. The following methods were applied to this dataset in the conducted experiments:
  • RBF, which represents the application of the RBF network with 10 processing nodes.
  • BFGS, which stands for the BFGS method, used to train a neural network with H = 10 processing nodes.
  • GENETIC, which represents a genetic algorithm incorporated to train a neural network with H = 10 processing nodes.
GEN2GEN, which represents the proposed method.
The results were validated using the ten-fold cross validation method and they are presented graphically in Figure 9.
As is evident from the experimental results, the proposed method significantly reduces the classification error, which drops to around 5%, especially when compared to the simple genetic algorithm.

4. Conclusions

The proposed method for training artificial neural networks is based on the application of genetic algorithms in three distinct phases, with the primary objectives of efficient training and minimizing overfitting, a common challenge in modern optimization techniques. The first phase focuses on identifying an initial interval for the values of the network parameters. This phase is crucial, as it sets the initial positioning of the parameters within a range that avoids excessively large values, which could limit the model’s generalization capability. The proposed value range is determined using a genetic algorithm that incorporates a modified error calculation, penalizing large parameter values. This step reduces the risk of overfitting to the training data, enhancing the model’s capacity to respond effectively to unseen data. In the second phase, the method employs a modified genetic algorithm to identify the ideal parameter value bounds within the initially defined range. This process makes the method particularly effective, as it focuses on intervals already evaluated as suitable while incorporating representative samples from the initial value range to assess accuracy. The use of genetic algorithms allows for gradual and adaptive improvement, avoiding entrapment in local minima, a frequent issue in traditional optimization methods. The third phase focuses on training the neural network within the optimized parameter bounds. In this phase, a genetic algorithm is used to minimize the training error, followed by a local optimization step using the BFGS method. This local optimization ensures further accuracy improvement, fully utilizing the model’s potential.
The experiments conducted demonstrate the clear superiority of the proposed method compared to other established techniques. For the classification datasets, the method achieved significantly lower error rates compared to techniques like ADAM, BFGS, and NEAT. For example, in datasets such as Dermatology and HouseVotes, the error rate was nearly halved compared to alternative methods. Similar results were observed for the regression datasets, where the method achieved the highest accuracy across nearly all datasets. The lowest average error achieved highlights its consistent and versatile performance.
A notable innovation of the method is its approach to tackling overfitting. The genetic algorithms enable exploration across a wide range of values without being constrained to local minima. Simultaneously, the incorporation of penalties for large parameter values prevents excessive adaptation to the training data. This is particularly important, as overfitting often limits the performance of artificial neural networks when applied to unseen data. Experiments with varying initial parameters, such as the initialization factor and the scale factor, provide valuable insights into model configuration. For instance, smaller initial value ranges contributed to better generalization, while larger scale factor values led to higher errors, emphasizing the importance of tighter parameter bounds. This indicates that careful parameter selection is critical for overall performance.
The proposed method paves new paths for the application of genetic algorithms in the training process of neural networks. Its adaptive nature makes it suitable for a wide range of applications, from medical diagnosis and forecasting of physical phenomena to the optimization of industrial processes. Future steps could include its application in deep learning networks, the integration of hybrid methods combining genetic algorithms with other optimization techniques, and the use of distributed computing environments to accelerate training. This approach has the potential to become a benchmark for effective and reliable training of artificial neural networks.
Although the proposed method demonstrates clear superiority across a wide range of classification and regression problems, it is important to note that the experiments and analysis focus primarily on relatively simple neural network architectures. The present study does not provide a thorough discussion of the challenges that may arise when extending the method to deep neural networks or hybrid approaches. Deep neural networks, characterized by their multilayered structure and large number of parameters, introduce significant issues related to the stability of the evolutionary process, the efficiency of training, and the computational cost. Furthermore, the dynamic interaction between genetic algorithms and other optimization techniques may require specialized adaptations to ensure both effectiveness and scalability when applied to more complex or hybrid environments. These aspects constitute limitations of the current work; addressing them is expected to guide future research efforts towards adapting the method to truly deep and hybrid network architectures.

Author Contributions

V.C. and I.G.T. conducted the experiments, employing several datasets and provided the comparative experiments. V.C. performed the statistical analysis and prepared the manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

This research has been financed by the European Union: Next Generation EU through the Program Greece 2.0 National Recovery and Resilience Plan, under the call RESEARCH-CREATE-INNOVATE, project name “iCREW: Intelligent small craft simulator for advanced crew training using Virtual Reality techniques” (project code: TAEDK-06195).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Abiodun, O.I.; Jantan, A.; Omolara, A.E.; Dada, K.V.; Mohamed, N.A.; Arshad, H. State-of-the-art in artificial neural network applications: A survey. Heliyon 2018, 4, e00938. [Google Scholar] [CrossRef]
  2. Suryadevara, S.; Yanamala, A.K.Y. A Comprehensive Overview of Artificial Neural Networks: Evolution, Architectures, and Applications. Rev. Intel. Artif. Med. 2021, 12, 51–76. [Google Scholar]
  3. Tsoulos, I.G.; Gavrilis, D.; Glavas, E. Neural network construction and training using grammatical evolution. Neurocomputing 2008, 72, 269–277. [Google Scholar] [CrossRef]
  4. Guarnieri, S.; Piazza, F.; Uncini, A. Multilayer feedforward networks with adaptive spline activation function. IEEE Trans. Neural Netw. 1999, 10, 672–683. [Google Scholar] [CrossRef] [PubMed]
  5. Ertuğrul, Ö.F. A novel type of activation function in artificial neural networks: Trained activation function. Neural Netw. 2018, 99, 148–157. [Google Scholar] [CrossRef] [PubMed]
  6. Rasamoelina, A.D.; Adjailia, F.; Sinčák, P. A Review of Activation Function for Artificial Neural Network. In Proceedings of the 2020 IEEE 18th World Symposium on Applied Machine Intelligence and Informatics (SAMI), Herlany, Slovakia, 23–25 January 2020; pp. 281–286. [Google Scholar]
  7. Egmont-Petersen, M.; de Ridder, D.; Handels, H. Image processing with neural networks—A review. Pattern Recognit. 2002, 35, 2279–2301. [Google Scholar] [CrossRef]
  8. Zhang, G.P. Time series forecasting using a hybrid ARIMA and neural network model. Neurocomputing 2003, 50, 159–175. [Google Scholar] [CrossRef]
  9. Huang, Z.; Chen, H.; Hsu, C.-J.; Chen, W.-H.; Wu, S. Credit rating analysis with support vector machines and neural networks: A market comparative study. Decis. Support Syst. 2004, 37, 543–558. [Google Scholar] [CrossRef]
  10. Baldi, P.; Cranmer, K.; Faucett, T.; Sadowski, P.; Whiteson, D. Parameterized neural networks for high-energy physics. Eur. Phys. J. C 2016, 76, 235. [Google Scholar] [CrossRef]
  11. Baskin, I.I.; Winkler, D.; Tetko, I.V. A renaissance of neural networks in drug discovery. Expert Opin. Drug Discov. 2016, 11, 785–795. [Google Scholar] [CrossRef]
  12. Bartzatt, R. Prediction of Novel Anti-Ebola Virus Compounds Utilizing Artificial Neural Network (ANN). Chem. Fac. 2018, 49, 16–34. [Google Scholar]
  13. Peta, K.; Żurek, J. Prediction of air leakage in heat exchangers for automotive applications using artificial neural networks. In Proceedings of the 2018 9th IEEE Annual Ubiquitous Computing, Electronics & Mobile Communication Conference (UEMCON), New York, NY, USA, 8–10 November 2018; pp. 721–725. [Google Scholar]
  14. Vora, K.; Yagnik, S. A survey on backpropagation algorithms for feedforward neural networks. Int. J. Eng. Dev. Res. 2014, 1, 193–197. [Google Scholar]
  15. Pajchrowski, T.; Zawirski, K.; Nowopolski, K. Neural speed controller trained online by means of modified RPROP algorithm. IEEE Trans. Ind. Inform. 2014, 11, 560–568. [Google Scholar] [CrossRef]
  16. Hermanto, R.P.S.; Nugroho, A. Waiting-time estimation in bank customer queues using RPROP neural networks. Procedia Comput. Sci. 2018, 135, 35–42. [Google Scholar] [CrossRef]
  17. Kingma, D.P.; Ba, J.L. ADAM: A method for stochastic optimization. In Proceedings of the 3rd International Conference on Learning Representations (ICLR 2015), San Diego, CA, USA, 7–9 May 2015; pp. 1–15. [Google Scholar]
  18. Reynolds, J.; Rezgui, Y.; Kwan, A.; Piriou, S. A zone-level, building energy optimisation combining an artificial neural network, a genetic algorithm, and model predictive control. Energy 2018, 151, 729–739. [Google Scholar] [CrossRef]
  19. Das, G.; Pattnaik, P.K.; Padhy, S.K. Artificial neural network trained by particle swarm optimization for non-linear channel equalization. Expert Syst. Appl. 2014, 41, 3491–3496. [Google Scholar] [CrossRef]
  20. Sexton, R.S.; Dorsey, R.E.; Johnson, J.D. Beyond backpropagation: Using simulated annealing for training neural networks. J. Organ. End User Comput. 1999, 11, 3–10. [Google Scholar] [CrossRef]
  21. Wang, L.; Zeng, Y.; Chen, T. Back propagation neural network with adaptive differential evolution algorithm for time series forecasting. Expert Syst. Appl. 2015, 42, 855–863. [Google Scholar] [CrossRef]
  22. Karaboga, D.; Akay, B. Artificial bee colony (ABC) algorithm on training artificial neural networks. In Proceedings of the 2007 IEEE 15th Signal Processing and Communications Applications, Eskisehir, Turkey, 11–13 June 2007; IEEE: Piscataway, NJ, USA, 2007; pp. 1–4. [Google Scholar]
  23. Sexton, R.S.; Alidaee, B.; Dorsey, R.E.; Johnson, J.D. Global optimization for artificial neural networks: A tabu search application. Eur. J. Oper. Res. 1998, 106, 570–584. [Google Scholar] [CrossRef]
  24. Zhang, J.-R.; Zhang, J.; Lok, T.-M.; Lyu, M.R. A hybrid particle swarm optimization—Back-propagation algorithm for feedforward neural network training. Appl. Math. Comput. 2007, 185, 1026–1037. [Google Scholar] [CrossRef]
  25. Zhao, G.; Wang, T.; Jin, Y.; Lang, C.; Li, Y.; Ling, H. The Cascaded Forward algorithm for neural network training. Pattern Recognit. 2025, 161, 111292. [Google Scholar] [CrossRef]
  26. Oh, K.-S.; Jung, K. GPU implementation of neural networks. Pattern Recognit. 2004, 37, 1311–1314. [Google Scholar] [CrossRef]
  27. Zhang, M.; Hibi, K.; Inoue, J. GPU-accelerated artificial neural network potential for molecular dynamics simulation. Comput. Commun. 2023, 285, 108655. [Google Scholar] [CrossRef]
  28. Nowlan, S.J.; Hinton, G.E. Simplifying neural networks by soft weight sharing. Neural Comput. 1992, 4, 473–493. [Google Scholar] [CrossRef]
  29. Hanson, S.J.; Pratt, L.Y. Comparing biases for minimal network construction with back propagation. In Advances in Neural Information Processing Systems; Touretzky, D.S., Ed.; Morgan Kaufmann: San Mateo, CA, USA, 1989; Volume 1, pp. 177–185. [Google Scholar]
  30. Augasta, M.; Kathirvalavakumar, T. Pruning algorithms of neural networks—A comparative study. Cent. Eur. Comput. Sci. 2003, 3, 105–115. [Google Scholar] [CrossRef]
  31. Prechelt, L. Automatic early stopping using cross validation: Quantifying the criteria. Neural Netw. 1998, 11, 761–767. [Google Scholar] [CrossRef]
  32. Wu, X.; Liu, J. A New Early Stopping Algorithm for Improving Neural Network Generalization. In Proceedings of the 2009 Second International Conference on Intelligent Computation Technology and Automation, Changsha, China, 10–11 October 2009; pp. 15–18. [Google Scholar]
  33. Treadgold, N.K.; Gedeon, T.D. Simulated annealing and weight decay in adaptive learning: The SARPROP algorithm. IEEE Trans. Neural Netw. 1998, 9, 662–668. [Google Scholar] [CrossRef]
  34. Carvalho, M.; Ludermir, T.B. Particle Swarm Optimization of Feed-Forward Neural Networks with Weight Decay. In Proceedings of the 2006 Sixth International Conference on Hybrid Intelligent Systems (HIS’06), Rio de Janeiro, Brazil, 13–15 December 2006; p. 5. [Google Scholar]
  35. Arifovic, J.; Gençay, R. Using genetic algorithms to select architecture of a feedforward artificial neural network. Phys. A Stat. Mech. Appl. 2001, 289, 574–594. [Google Scholar] [CrossRef]
  36. Benardos, P.G.; Vosniakos, G.C. Optimizing feedforward artificial neural network architecture. Eng. Appl. Artif. Intell. 2007, 20, 365–382. [Google Scholar] [CrossRef]
  37. Garro, B.A.; Vázquez, R.A. Designing Artificial Neural Networks Using Particle Swarm Optimization Algorithms. Comput. Neurosci. 2015, 2015, 369298. [Google Scholar] [CrossRef]
  38. Siebel, N.T.; Sommer, G. Evolutionary reinforcement learning of artificial neural networks. Int. Hybrid Intell. Syst. 2007, 4, 171–183. [Google Scholar] [CrossRef]
  39. Jaafra, Y.; Laurent, J.L.; Deruyver, A.; Naceur, M.S. Reinforcement learning for neural architecture search: A review. Image Vis. Comput. 2019, 89, 57–66. [Google Scholar] [CrossRef]
  40. Pham, H.; Guan, M.; Zoph, B.; Le, Q.; Dean, J. Efficient neural architecture search via parameters sharing. In Proceedings of the International Conference on Machine Learning, Stockholm, Sweden, 10–15 July 2018; pp. 4095–4104. [Google Scholar]
  41. Xie, S.; Zheng, H.; Liu, C.; Lin, L. SNAS: Stochastic neural architecture search. arXiv 2018, arXiv:1812.09926. [Google Scholar]
  42. Zhou, H.; Yang, M.; Wang, J.; Pan, W. Bayesnas: A bayesian approach for neural architecture search. In Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; pp. 7603–7613. [Google Scholar]
  43. Huqqani, A.A.; Schikuta, E.; Ye, S.; Chen, P. Multicore and GPU Parallelization of Neural Networks for Face Recognition. Procedia Comput. Sci. 2013, 18, 349–358. [Google Scholar] [CrossRef]
  44. Kaelo, P.; Ali, M.M. Integrated crossover rules in real coded genetic algorithms. Eur. J. Oper. Res. 2007, 176, 60–76. [Google Scholar] [CrossRef]
  45. Powell, M.J.D. A Tolerant Algorithm for Linearly Constrained Optimization Calculations. Math. Program. 1989, 45, 547–566. [Google Scholar] [CrossRef]
  46. Kelly, M.; Longjohn, R.; Nottingham, K. The UCI Machine Learning Repository. Available online: https://archive.ics.uci.edu (accessed on 19 August 2025).
  47. Alcalá-Fdez, J.; Fernandez, A.; Luengo, J.; Derrac, J.; García, S.; Sánchez, L.; Herrera, F. KEEL Data-Mining Software Tool: Data Set Repository, Integration of Algorithms and Experimental Analysis Framework. J.-Mult. Log. Soft Comput. 2011, 17, 255–287. [Google Scholar]
  48. Weiss, S.M.; Kulikowski, C.A. Computer Systems That Learn: Classification and Prediction Methods from Statistics, Neural Nets, Machine Learning, and Expert Systems; Morgan Kaufmann Publishers Inc.: Burlington, MA, USA, 1991. [Google Scholar]
  49. Tzimourta, K.D.; Tsoulos, I.; Bilero, I.T.; Tzallas, A.T.; Tsipouras, M.G.; Giannakeas, N. Direct Assessment of Alcohol Consumption in Mental State Using Brain Computer Interfaces and Grammatical Evolution. Inventions 2018, 3, 51. [Google Scholar] [CrossRef]
  50. Quinlan, J.R. Simplifying Decision Trees. Int. J. Man-Mach. Stud. 1987, 27, 221–234. [Google Scholar] [CrossRef]
  51. Shultz, T.; Mareschal, D.; Schmidt, W. Modeling Cognitive Development on Balance Scale Phenomena. Mach. Learn. 1994, 16, 59–88. [Google Scholar] [CrossRef]
  52. Zhou, Z.H.; Jiang, Y. NeC4.5: Neural ensemble based C4.5. IEEE Trans. Knowl. Data Eng. 2004, 16, 770–773. [Google Scholar] [CrossRef]
  53. Setiono, R.; Leow, W.K. FERNN: An Algorithm for Fast Extraction of Rules from Neural Networks. Appl. Intell. 2000, 12, 15–25. [Google Scholar] [CrossRef]
  54. Demiroz, G.; Govenir, H.A.; Ilter, N. Learning Differential Diagnosis of Eryhemato-Squamous Diseases using Voting Feature Intervals. Artif. Intell. Med. 1998, 13, 147–165. [Google Scholar]
  55. Hayes-Roth, B.; Hayes-Roth, B.F. Concept learning and the recognition and classification of exemplars. J. Verbal Learn. Verbal Behav. 1977, 16, 321–338. [Google Scholar] [CrossRef]
  56. Kononenko, I.; Šimec, E.; Robnik-Šikonja, M. Overcoming the Myopia of Inductive Learning Algorithms with RELIEFF. Appl. Intell. 1997, 7, 39–55. [Google Scholar] [CrossRef]
  57. French, R.M.; Chater, N. Using noise to compute error surfaces in connectionist networks: A novel means of reducing catastrophic forgetting. Neural Comput. 2002, 14, 1755–1769. [Google Scholar] [CrossRef] [PubMed]
  58. Dy, J.G.; Brodley, C.E. Feature Selection for Unsupervised Learning. J. Mach. Learn. Res. 2004, 5, 845–889. [Google Scholar]
  59. Perantonis, S.J.; Virvilis, V. Input Feature Extraction for Multilayered Perceptrons Using Supervised Principal Component Analysis. Neural Process. Lett. 1999, 10, 243–252. [Google Scholar] [CrossRef]
  60. Garcke, J.; Griebel, M. Classification with sparse grids using simplicial basis functions. Intell. Data Anal. 2002, 6, 483–502. [Google Scholar] [CrossRef]
  61. Mcdermott, J.; Forsyth, R.S. Diagnosing a disorder in a classification benchmark. Pattern Recognit. Lett. 2016, 73, 41–43. [Google Scholar] [CrossRef]
  62. Cestnik, G.; Konenenko, I.; Bratko, I. Assistant-86: A Knowledge-Elicitation Tool for Sophisticated Users. In Progress in Machine Learning; Bratko, I., Lavrac, N., Eds.; Sigma Press: Wilmslow, UK, 1987; pp. 31–45. [Google Scholar]
  63. Elter, M.; Schulz-Wendtland, R.; Wittenberg, T. The prediction of breast cancer biopsy outcomes using two CAD approaches that both emphasize an intelligible decision process. Med. Phys. 2007, 34, 4164–4172. [Google Scholar] [CrossRef] [PubMed]
  64. Little, M.A.; Mcsharry, P.E.; Roberts, S.J.; Costello, D.; Moroz, I. Exploiting Nonlinear Recurrence and Fractal Scaling Properties for Voice Disorder Detection. BioMed Eng. OnLine 2007, 6, 23. [Google Scholar] [CrossRef] [PubMed]
  65. Little, M.A.; McSharry, P.E.; Hunter, E.J.; Spielman, J.; Ramig, L.O. Suitability of dysphonia measurements for telemonitoring of Parkinson’s disease. IEEE Trans. Biomed. Eng. 2009, 56, 1015–1022. [Google Scholar] [CrossRef] [PubMed]
  66. Smith, J.W.; Everhart, J.E.; Dickson, W.C.; Knowler, W.C.; Johannes, R.S. Using the ADAP learning algorithm to forecast the onset of diabetes mellitus. In Proceedings of the Symposium on Computer Applications and Medical Care, Washington, DC, USA, 6–9 November 1988; IEEE Computer Society Press: Piscataway, NJ, USA, 1988; pp. 261–265. [Google Scholar]
  67. Lucas, D.D.; Klein, R.; Tannahill, J.; Ivanova, D.; Brandon, S.; Domyancic, D.; Zhang, Y. Failure analysis of parameter-induced simulation crashes in climate models. Geosci. Model Dev. 2013, 6, 1157–1171. [Google Scholar] [CrossRef]
  68. Giannakeas, N.; Tsipouras, M.G.; Tzallas, A.T.; Kyriakidi, K.; Tsianou, Z.E.; Manousou, P.; Hall, A.; Karvounis, E.C.; Tsianos, V.; Tsianos, E. A clustering based method for collagen proportional area extraction in liver biopsy images. In Proceedings of the Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBS, Milano, Italy, 25–29 August 2015; art. no. 7319047. pp. 3097–3100. [Google Scholar]
  69. Hastie, T.; Tibshirani, R. Non-parametric logistic and proportional odds regression. JRSS-C (Appl. Stat.) 1987, 36, 260–276. [Google Scholar] [CrossRef]
  70. Dash, M.; Liu, H.; Scheuermann, P.; Tan, K.L. Fast hierarchical clustering and its validation. Data Knowl. Eng. 2003, 44, 109–138. [Google Scholar] [CrossRef]
  71. Gorman, R.P.; Sejnowski, T.J. Analysis of Hidden Units in a Layered Network Trained to Classify Sonar Targets. Neural Netw. 1988, 1, 75–89. [Google Scholar] [CrossRef]
  72. Cortez, P.; Silva, A.M.G. Using data mining to predict secondary school student performance. In Proceedings of the 5th FUture BUsiness TEChnology Conference (FUBUTEC 2008), Porto, Portugal, 9–11 April 2008; pp. 5–12. [Google Scholar]
  73. Yeh, I.C.; Yang, K.J.; Ting, T.M. Knowledge discovery on RFM model using Bernoulli sequence. Expert Syst. Appl. 2009, 36, 5866–5871. [Google Scholar] [CrossRef]
  74. Jeyasingh, S.; Veluchamy, M. Modified bat algorithm for feature selection with the Wisconsin diagnosis breast cancer (WDBC) dataset. Asian Pac. J. Cancer Prev. APJCP 2017, 18, 1257. [Google Scholar]
  75. Alshayeji, M.H.; Ellethy, H.; Gupta, R. Computer-aided detection of breast cancer on the Wisconsin dataset: An artificial neural networks approach. Biomed. Signal Process. Control 2022, 71, 103141. [Google Scholar] [CrossRef]
  76. Raymer, M.; Doom, T.E.; Kuhn, L.A.; Punch, W.F. Knowledge discovery in medical and biological datasets using a hybrid Bayes classifier/evolutionary algorithm. IEEE Trans. Syst. Man Cybern. Part B Cybern. 2003, 33, 802–813. [Google Scholar] [CrossRef] [PubMed]
  77. Zhong, P.; Fukushima, M. Regularized nonsmooth Newton method for multi-class support vector machines. Optim. Methods Softw. 2007, 22, 225–236. [Google Scholar] [CrossRef]
  78. Andrzejak, R.G.; Lehnertz, K.; Mormann, F.; Rieke, C.; David, P.; Elger, C.E. Indications of nonlinear deterministic and finite-dimensional structures in time series of brain electrical activity: Dependence on recording region and brain state. Phys. E 2007, 64, 061907. [Google Scholar] [CrossRef] [PubMed]
  79. Tzallas, A.T.; Tsipouras, M.G.; Fotiadis, D.I. Automatic Seizure Detection Based on Time-Frequency Analysis and Artificial Neural Networks. Comput. Intell. Neurosci. 2007, 2007, 80510. [Google Scholar] [CrossRef]
  80. Koivisto, M.; Sood, K. Exact Bayesian Structure Discovery in Bayesian Networks. J. Mach. Learn. Res. 2004, 5, 549–573. [Google Scholar]
  81. Nash, W.J.; Sellers, T.L.; Talbot, S.R.; Cawthor, A.J.; Ford, W.B. The Population Biology of Abalone (Haliotis Species) in Tasmania. I. Blacklip Abalone (H. rubra) from the North Coast and Islands of Bass Strait, Sea Fisheries Division; Technical Report No. 48; Department of Primary Industry and Fisheries, Tasmania: Hobart, Australia, 1994; ISSN 1034-3288.
  82. Brooks, T.F.; Pope, D.S.; Marcolini, A.M. Airfoil Self-Noise and Prediction. Technical Report, NASA RP-1218. July 1989. Available online: https://ntrs.nasa.gov/citations/19890016302 (accessed on 14 November 2024).
  83. Yeh, I.C. Modeling of strength of high performance concrete using artificial neural networks. Cem. Concr. Res. 1998, 28, 1797–1808. [Google Scholar] [CrossRef]
  84. Friedman, J. Multivariate Adaptative Regression Splines. Ann. Stat. 1991, 19, 1–141. [Google Scholar]
  85. Harrison, D.; Rubinfeld, D.L. Hedonic prices and the demand for clean ai. J. Environ. Econ. Manag. 1978, 5, 81–102. [Google Scholar] [CrossRef]
  86. Mackowiak, P.A.; Wasserman, S.S.; Levine, M.M. A critical appraisal of 98.6 degrees f, the upper limit of the normal body temperature, and other legacies of Carl Reinhold August Wunderlich. J. Am. Med. Assoc. 1992, 268, 1578–1580. [Google Scholar] [CrossRef]
  87. King, R.D.; Muggleton, S.; Lewis, R.A.; Sternberg, M.J.E. Drug design by machine learning: The use of inductive logic programming to model the structure-activity relationships of trimethoprim analogues binding to dihydrofolate reductase. Proc. Nat. Acad. Sci. USA 1992, 89, 11322–11326. [Google Scholar] [CrossRef]
  88. Tsoulos, I.G.; Charilogis, V.; Kyrou, G.; Stavrou, V.N.; Tzallas, A. OPTIMUS: A Multidimensional Global Optimization Package. J. Open Source Softw. 2025, 10, 7584. [Google Scholar] [CrossRef]
  89. Stanley, K.O.; Miikkulainen, R. Evolving Neural Networks through Augmenting Topologies. Evol. Comput. 2002, 10, 99–127. [Google Scholar] [CrossRef]
  90. Park, J.; Sandberg, I.W. Universal Approximation Using Radial-Basis-Function Networks. Neural Comput. 1991, 3, 246–257. [Google Scholar] [CrossRef]
  91. Montazer, G.A.; Giveki, D.; Karami, M.; Rastegar, H. Radial basis function neural networks: A review. Comput. Rev. J. 2018, 1, 52–74. [Google Scholar]
  92. MacQueen, J. Some methods for classification and analysis of multivariate observations. In Proceedings of the Berkeley Symposium on Mathematical Statistics & Probability; University of California Press: Oakland, CA, USA, 1967; Volume 1, pp. 281–297. [Google Scholar]
  93. Emad-Ud-Din, M.; Wang, Y. Promoting occupancy detection accuracy using on-device lifelong learning. IEEE Sens. J. 2023, 23, 9595–9606. [Google Scholar] [CrossRef]
Figure 1. An example plot of the sigmoid function in the range [−10, 10]. The function tends quickly to 0 as x → −∞ and approaches 1 as x → +∞.
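As a quick check of the saturation behaviour summarised in the caption of Figure 1, the short Python sketch below (not taken from the paper's implementation) evaluates the sigmoid at a few points of the plotted range.

```python
import math

def sigmoid(x: float) -> float:
    """Logistic sigmoid, sigma(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

# Evaluate on a few points of the plotted range [-10, 10]:
# the output saturates near 0 on the left and near 1 on the right.
for x in (-10.0, -5.0, 0.0, 5.0, 10.0):
    print(f"sigma({x:+.1f}) = {sigmoid(x):.5f}")
```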
Figure 2. An example of the one-point crossover procedure.
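For readers unfamiliar with the operator illustrated in Figure 2, the following Python sketch shows a generic one-point crossover on two real-valued chromosomes. It is a minimal illustration of the operator under our own naming, not the authors' exact implementation.

```python
import random
from typing import List, Tuple

def one_point_crossover(parent1: List[float],
                        parent2: List[float],
                        rng: random.Random) -> Tuple[List[float], List[float]]:
    """Cut both parents at the same random position and swap the tails."""
    assert len(parent1) == len(parent2)
    cut = rng.randint(1, len(parent1) - 1)   # cut point strictly inside the chromosome
    child1 = parent1[:cut] + parent2[cut:]
    child2 = parent2[:cut] + parent1[cut:]
    return child1, child2

# Usage example with two small chromosomes.
rng = random.Random(42)
p1 = [0.1, 0.2, 0.3, 0.4, 0.5]
p2 = [1.1, 1.2, 1.3, 1.4, 1.5]
print(one_point_crossover(p1, p2, rng))
```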
Figure 3. Comprehensive statistical evaluation of the performance of different machine learning algorithms on benchmark classification datasets.
Figure 4. Example of the execution progress of the genetic algorithm and the proposed method on the Dermatology classification problem. The figure shows the classification error as calculated on the test set.
Figure 5. Detailed statistical assessment of the experimental performance of machine learning algorithms on a range of regression datasets.
Figure 6. Comparison of the average execution time between the original genetic algorithm and the proposed method.
Figure 7. Experiments on the regression datasets using different values of the initialization factor I_0.
Figure 8. The average regression error obtained by the proposed method on the regression datasets for a variety of values of the scale factor f.
Figure 9. Results obtained for the PIRvision dataset using a variety of methods, including the proposed one.
Table 1. The values for the parameters of the proposed method.

Parameter | Meaning | Value
N_c | Chromosomes | 500
N_g | Maximum number of generations | 200
p_S | Selection rate | 0.1
p_M | Mutation rate | 0.05
H | Number of nodes | 10
I_0 | Initialization factor | 10.0
a | Bounding factor | 10.0
f | Scale factor for the margins | 2.0
λ | Value used for penalties | 100.0
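When reproducing the experiments, the settings of Table 1 can be collected in a single configuration object. The Python sketch below is only illustrative: the field names are hypothetical (the original code may use different identifiers), while the values are those listed in the table.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Gen2GenSettings:
    # Field names are illustrative; values follow Table 1.
    chromosomes: int = 500         # N_c
    max_generations: int = 200     # N_g
    selection_rate: float = 0.1    # p_S
    mutation_rate: float = 0.05    # p_M
    hidden_nodes: int = 10         # H
    init_factor: float = 10.0      # I_0
    bounding_factor: float = 10.0  # a
    margin_scale: float = 2.0      # f
    penalty_lambda: float = 100.0  # lambda

settings = Gen2GenSettings()
print(settings)
```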
Table 2. Experimental results using the incorporated machine learning methods on the classification datasets. The numbers in the cells represent the average classification error on the test set.

DATASET | BFGS | ADAM | NEAT | RBF | GENETIC | PROPOSED
Alcohol | 41.50% | 57.78% | 66.80% | 49.38% | 39.57% | 26.24%
Appendicitis | 18.00% | 16.50% | 17.20% | 12.23% | 18.10% | 14.90%
Australian | 38.13% | 35.65% | 31.98% | 34.89% | 32.21% | 31.64%
Balance | 8.64% | 7.87% | 23.14% | 33.42% | 8.97% | 7.80%
Cleveland | 77.55% | 67.55% | 53.44% | 67.10% | 51.60% | 47.51%
Circular | 6.08% | 19.95% | 35.18% | 5.98% | 5.99% | 5.42%
Dermatology | 52.92% | 26.14% | 32.43% | 62.34% | 30.58% | 5.97%
Hayes Roth | 37.33% | 59.70% | 50.15% | 64.36% | 56.18% | 39.28%
Heart | 39.44% | 38.53% | 39.27% | 31.20% | 28.34% | 16.85%
HeartAttack | 46.67% | 45.55% | 32.34% | 29.00% | 29.03% | 23.77%
HouseVotes | 7.13% | 7.48% | 10.89% | 6.13% | 6.62% | 3.05%
Ionosphere | 15.29% | 16.64% | 19.67% | 16.22% | 15.14% | 8.75%
Liverdisorder | 42.59% | 41.53% | 30.67% | 30.84% | 31.11% | 29.53%
Lymography | 35.43% | 29.26% | 33.70% | 25.50% | 28.42% | 17.17%
Mammographic | 17.24% | 46.25% | 22.85% | 21.38% | 19.88% | 16.45%
Parkinsons | 27.58% | 24.06% | 18.56% | 17.41% | 18.05% | 17.46%
Pima | 35.59% | 34.85% | 34.51% | 25.78% | 32.19% | 27.25%
Popfailures | 5.24% | 5.18% | 7.05% | 7.04% | 5.94% | 4.66%
Regions2 | 36.28% | 29.85% | 33.23% | 38.29% | 29.39% | 25.88%
Saheart | 37.48% | 34.04% | 34.51% | 32.19% | 34.86% | 31.59%
Segment | 68.97% | 49.75% | 66.72% | 59.68% | 57.72% | 42.43%
Sonar | 25.85% | 30.33% | 34.10% | 27.90% | 22.40% | 19.30%
Spiral | 47.99% | 48.90% | 50.22% | 44.87% | 48.66% | 44.67%
Statheart | 39.65% | 44.04% | 44.36% | 31.36% | 27.25% | 18.90%
Student | 7.14% | 5.13% | 10.20% | 5.49% | 5.61% | 4.33%
Transfusion | 25.84% | 25.68% | 24.87% | 26.41% | 24.87% | 23.60%
Wdbc | 29.91% | 35.35% | 12.88% | 7.27% | 8.56% | 8.69%
Wine | 59.71% | 29.40% | 25.43% | 31.41% | 19.20% | 7.27%
Z_F_S | 39.37% | 47.81% | 38.41% | 13.16% | 10.73% | 5.33%
Z_O_N_F_S | 65.67% | 78.79% | 77.08% | 48.70% | 64.81% | 53.15%
ZO_NF_S | 43.04% | 47.43% | 43.75% | 9.02% | 21.54% | 5.82%
ZONF_S | 15.62% | 11.99% | 5.44% | 4.03% | 4.36% | 2.35%
ZOO | 10.70% | 14.13% | 20.27% | 21.93% | 9.50% | 6.07%
AVERAGE | 33.50% | 33.73% | 32.77% | 28.54% | 25.68% | 19.49%
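The cells of Table 2 report average classification error on the test set. A minimal Python sketch of how such a figure can be computed and averaged over independent runs is given below; it assumes the per-run error is simply the percentage of misclassified test patterns, and the function names are ours rather than the paper's.

```python
from typing import Sequence

def classification_error_percent(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Percentage of test patterns whose predicted class differs from the true class."""
    assert len(y_true) == len(y_pred)
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return 100.0 * wrong / len(y_true)

def average_error(errors_per_run: Sequence[float]) -> float:
    """Average of the per-run test errors, as reported in the table cells."""
    return sum(errors_per_run) / len(errors_per_run)

# Usage example: one run with 1 mistake out of 4 patterns, then the average over runs.
print(classification_error_percent([0, 1, 1, 0], [0, 1, 0, 0]))  # 25.0
print(average_error([25.0, 20.0, 30.0]))                         # 25.0
```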
Table 3. Experimental results using the incorporated machine learning methods on the regression datasets. The numbers in the cells represent the average regression error on the test set.

DATASET | BFGS | ADAM | NEAT | RBF | GENETIC | PROPOSED
Abalone | 5.69 | 4.30 | 9.88 | 7.37 | 7.17 | 4.42
Airfoil | 0.003 | 0.005 | 0.067 | 0.27 | 0.003 | 0.003
Auto | 60.97 | 70.84 | 56.06 | 17.87 | 12.18 | 12.10
Baseball | 119.63 | 77.90 | 100.39 | 93.02 | 103.60 | 79.30
BK | 0.28 | 0.03 | 0.15 | 0.02 | 0.03 | 0.017
BL | 2.55 | 0.28 | 0.05 | 0.013 | 5.74 | 0.001
Concrete | 0.066 | 0.078 | 0.081 | 0.011 | 0.0099 | 0.004
Dee | 2.36 | 0.630 | 1.512 | 0.17 | 1.013 | 0.21
Housing | 97.38 | 80.20 | 56.49 | 57.68 | 43.26 | 20.74
Friedman | 1.26 | 22.90 | 19.35 | 7.23 | 1.249 | 3.569
FA | 0.426 | 0.11 | 0.19 | 0.015 | 0.025 | 0.011
FY | 0.22 | 0.038 | 0.08 | 0.041 | 0.65 | 0.038
HO | 0.62 | 0.035 | 0.169 | 0.03 | 2.78 | 0.012
Laser | 0.015 | 0.03 | 0.084 | 0.03 | 0.59 | 0.004
MB | 0.129 | 0.06 | 0.061 | 2.16 | 0.051 | 0.048
Mortgage | 8.23 | 9.24 | 14.11 | 1.45 | 2.41 | 0.85
NT | 0.129 | 0.12 | 0.33 | 8.14 | 0.006 | 0.006
PL | 0.29 | 0.117 | 0.098 | 2.12 | 0.28 | 0.022
Plastic | 20.32 | 11.71 | 20.77 | 8.62 | 2.79 | 2.20
PY | 0.578 | 0.09 | 0.075 | 0.012 | 0.564 | 0.016
Quake | 0.42 | 0.06 | 0.298 | 0.07 | 0.12 | 0.037
SN | 0.40 | 0.026 | 0.174 | 0.027 | 2.95 | 0.024
Stock | 302.43 | 180.89 | 12.23 | 12.23 | 3.88 | 3.25
Treasury | 9.91 | 11.16 | 15.52 | 2.02 | 2.93 | 1.11
AVERAGE | 26.43 | 19.62 | 12.84 | 9.19 | 8.10 | 5.33
Table 4. Experimental results using the genetic algorithm, the proposed method without the incorporation of the BFGS local search method, and the proposed method with the addition of the BFGS method. The experiments were conducted on the mentioned classification datasets, and the numbers in the cells represent the average classification error on the test set.

DATASET | GENETIC | PROPOSED (NO BFGS) | PROPOSED
Alcohol | 39.57% | 26.32% | 26.24%
Appendicitis | 18.10% | 16.00% | 14.90%
Australian | 32.21% | 28.09% | 31.64%
Balance | 8.97% | 7.81% | 7.80%
Cleveland | 51.60% | 46.24% | 47.51%
Circular | 5.99% | 5.51% | 5.42%
Dermatology | 30.58% | 8.83% | 5.97%
Hayes Roth | 56.18% | 42.38% | 39.28%
Heart | 28.34% | 18.37% | 16.85%
HeartAttack | 29.03% | 19.50% | 23.77%
HouseVotes | 6.62% | 3.48% | 3.05%
Ionosphere | 15.14% | 10.03% | 8.75%
Liverdisorder | 31.11% | 30.94% | 29.53%
Lymography | 28.42% | 20.79% | 17.17%
Mammographic | 19.88% | 16.59% | 16.45%
Parkinsons | 18.05% | 16.21% | 17.46%
Pima | 32.19% | 31.11% | 27.25%
Popfailures | 5.94% | 4.61% | 4.66%
Regions2 | 29.39% | 25.10% | 25.88%
Saheart | 34.86% | 31.20% | 31.59%
Segment | 57.72% | 40.87% | 42.43%
Sonar | 22.40% | 25.55% | 19.30%
Spiral | 48.66% | 46.13% | 44.67%
Statheart | 27.25% | 17.59% | 18.90%
Student | 5.61% | 3.88% | 4.33%
Transfusion | 24.87% | 22.99% | 23.60%
Wdbc | 8.56% | 8.43% | 8.69%
Wine | 19.20% | 6.53% | 7.27%
Z_F_S | 10.73% | 6.73% | 5.33%
Z_O_N_F_S | 64.81% | 49.68% | 53.15%
ZO_NF_S | 21.54% | 7.52% | 5.82%
ZONF_S | 4.36% | 2.28% | 2.35%
ZOO | 9.50% | 13.90% | 6.07%
AVERAGE | 25.68% | 20.04% | 19.49%