Article

A Gradient-Based Particle-Bat Algorithm for Stochastic Configuration Network

Jingjing Liu, Yefeng Liu and Qichun Zhang
1 Department of Basic Courses, Shenyang Institute of Technology, Shenfu Demonstration Area 113122, China
2 Liaoning Key Laboratory of Information Physics Fusion and Intelligent Manufacturing for CNC Machine, Shenyang Institute of Technology, Shenfu Demonstration Area 113122, China
3 Department of Computer Science, University of Bradford, Bradford BD7 1DP, UK
* Author to whom correspondence should be addressed.
Appl. Sci. 2023, 13(5), 2878; https://doi.org/10.3390/app13052878
Submission received: 28 December 2022 / Revised: 20 February 2023 / Accepted: 21 February 2023 / Published: 23 February 2023

Abstract

The stochastic configuration network (SCN) is an incrementally generated mathematical model built under a supervision mechanism; it has the universal approximation property and advantages in data modeling. However, the efficiency of SCN is affected by some of its network parameters. This paper proposes an optimized search algorithm for the input weights and biases. An optimization model with constraints is first established based on the convergence theory and the inequality supervision mechanism of SCN. Then, a hybrid bat–particle swarm optimization algorithm based on gradient information (G-BAPSO) is proposed under the framework of the PSO algorithm, which mainly uses gradient information and a local adaptive adjustment mechanism characterized by the pulse emission frequency to improve the search ability. The algorithm optimizes the input weights and biases to improve the convergence rate of the network. Simulation results on several datasets demonstrate the feasibility and validity of the proposed algorithm. The training RMSE of G-BAPSO-SCN decreased by 5.57 × 10^-5 and 3.2 × 10^-3 compared with that of SCN in the two regression experiments, and the recognition accuracy of G-BAPSO-SCN increased by 0.07% on average in the classification experiments.

1. Introduction

Randomized algorithms have the characteristics of fast learning and show great potential in machine learning [1,2,3]. These algorithms generally select the input parameters randomly and then calculate the output parameters by the least squares method. There are many categories of random-weight networks: in addition to the typical feedforward neural network, there are also recurrent neural networks with random weights and randomized kernel approximations [4].
In the early stages, most randomized-weight networks could not determine the appropriate number of hidden nodes. To solve this problem, incremental learning algorithms were proposed, in which the network gradually adds hidden nodes until the tolerance is reached. This strategy optimizes the network structure and avoids wasting hidden nodes. Based on this idea, an SLFNN with approximation properties was proposed by Kwok et al. [5], who also adopted a modified Quickprop algorithm to update the network weights. The Random Vector Functional Link (RVFL) network also used the incremental strategy to build its structure, which converges in probability when the input-side parameters of the network are appropriately selected [6,7]; otherwise, convergence cannot be guaranteed [8]. In 2009, Tyukin and Prokhorov argued that the RVFL network needed a supervision mechanism, and they verified experimentally that the RVFL network could fail to approximate a given objective function [9]. Subsequently, this phenomenon was further proved mathematically [10]. The proposal of SCN solved the problem that such networks lacked the universal approximation property [8]. SCN uses an inequality constraint mechanism to allocate the input parameters randomly and adaptively expands the selection range of the random parameters, so as to guarantee the approximation capability of the established stochastic learning network. It is an incrementally generated network: the input parameters are stochastically configured under the inequality constraint, and the output parameters are determined by a constructive method or by solving a least squares problem. Subsequently, an SCN based on a deep model was also proposed, and the deep SCN still has the universal approximation property under the inequality supervision mechanism [11]. At present, SCN has been successfully applied to underground airflow quantity modeling, ball mill load condition recognition, sewage quality index estimation, and intelligent modeling of nonlinear systems [12,13,14,15].
The universal approximation property of SCN is an advantage that other randomized learning techniques do not have, as its adaptive search algorithm for the weights and biases tries to keep the parameters in as small a range as possible. However, the output weights are usually large, so the generalization performance of the network is affected. Overall, the optimization of SCN focuses on the following aspects. The first is improving the generalization performance; for instance, L1- and L2-norm regularization methods based on SCN were proposed to avoid the risk of model overfitting [13,15], and the partial least squares method was used to calculate the output parameters in place of ordinary least squares [14]. In addition, the well-known negative correlation learning method can evaluate the output-side parameters based on the SCN model, and the block Jacobi and Gauss–Seidel methods have been used to iteratively solve the ill-posed equations for the output weights based on heterogeneous feature groups, for which a convergence analysis was given and the uniqueness of the iterative solutions was proved [16]. The second aspect is optimizing the generation mode of the network: some scholars have proposed changing the incremental mode of the hidden nodes in the process of SCN construction, from single-node increments to block increments [17,18]. The third aspect is optimizing the hyperparameters of the network. Concerning the input weights, some scholars point out that searching the weights under the inequality supervision mechanism leads to a certain rejection rate; they therefore relax the inequality constraints to improve the acceptance rate of the random values and reduce the number of iterations for searching the input weights [19]. However, the accuracy of the network is reduced once the inequality constraints are relaxed, which has a certain cost. Other scholars focus on the regularization parameter and the scale factors of the weights and biases that affect the network performance and optimize them with a chaotic sparrow search algorithm, which gives the network better performance [20]. Meanwhile, various feedback learning mechanisms have also been applied to optimize SCN [21,22].
There are few studies on how to optimize the input-side parameters directly. SCN searches the input-side parameters randomly until they satisfy the inequality constraints: the search interval is expanded gradually and linearly, a large number of weights and biases are randomly generated in the interval and checked against the inequality constraints, and the best one is then selected. This procedure affects the efficiency of SCN, so an efficient optimization algorithm needs to be designed. The gradient descent method is often used to solve optimization problems in practical engineering, that is, to find a minimum along the direction of gradient descent or a maximum along the direction of gradient ascent. However, this method easily falls into local minima for non-convex optimization problems, so it needs to be combined with an intelligent optimization algorithm to achieve better performance [23,24]. The bat algorithm [25,26] and the particle swarm optimization (PSO) algorithm [27] are population-based stochastic optimization techniques that seek the optimal solution by imitating group behaviors in the biological world. The bat algorithm simulates the predation behavior of bats and has been successfully applied to microgrid scheduling [28], harmful plant classification [29], text categorization [30], and the traveling salesman problem [31]. These algorithms can search many regions of the solution space of the objective function simultaneously, are flexible and easy to implement, and require few parameters to be adjusted. In addition, gradient information can be used to determine the search direction, compensating for the randomness of intelligent optimization algorithms. Therefore, using a hybrid optimization algorithm to search the input parameters of SCN can optimize the network structure and improve the convergence speed.
This paper focuses on improving the search algorithm for the input weights and biases. In order to accelerate convergence, an optimization model is established based on the inequality constraint mechanism by analyzing the convergence principle of SCN. Then, an improved optimization algorithm (G-BAPSO) is proposed, which combines the gradient information of the objective function and searches for the optimal weights and biases using a hybrid of PSO and the bat algorithm. Finally, an optimized stochastic configuration network based on the G-BAPSO algorithm is established.
The structure of this paper is as follows: the theory of SCN is introduced in Section 2; G-BAPSO-SCN is proposed in Section 3, including the optimization model and the algorithm; in Section 4, the G-BAPSO algorithm is verified on some benchmark functions, and the performance of G-BAPSO-SCN is compared with other models on some regression and classification problems; the experimental results are discussed in Section 5; Section 6 presents the limitations and future work; and Section 7 concludes the paper.

2. Preliminaries

2.1. Stochastic Configuration Network

SCN has become a powerful data modeling tool in recent years. The model is incrementally generated by the stochastic configuration algorithm, performs well in classification and regression problems, and has the universal approximation capability. The structure of SCN is shown in Figure 1, where x denotes the input data, s denotes the outputs of the hidden nodes, and f denotes the output of the network.
For the input dataset $X = [x_1, x_2, \ldots, x_N]$, $x_i \in \mathbb{R}^d$, $i = 1, 2, \ldots, N$, with the corresponding outputs $T = [t_1, t_2, \ldots, t_N]$, $t_i \in \mathbb{R}^m$: if $L-1$ nodes have already been configured in the network, the output of the $L$-th hidden node is calculated by Formula (1)

$$ s_L = h(w_L^T x_u + b_L), \quad (L = 1, 2, \ldots;\ u = 1, 2, \ldots, d) \qquad (1) $$

where $h$ stands for the activation function, and $w_L$ and $b_L$ are the weight vector and bias of the first (input) layer. The weights of the second (output) layer are calculated by the least squares construction shown in Formula (2).

$$ \beta_{L,q} = \frac{\langle e_{L-1,q}, s_L \rangle}{\| s_L \|^2} \qquad (2) $$

$$ e_{L-1,q} = t_q - f_q, \quad (q = 1, 2, \ldots, m) \qquad (3) $$

where $e_{L-1,q}$ stands for the $q$-th component of the network output error when there are already $L-1$ nodes in the hidden layer, and $f_q$ stands for the $q$-th output of the SCN, calculated by Formula (4); the error of the SCN with $L$ nodes is given in Formula (5).

$$ f_q = \sum_{i=1}^{L} \beta_{i,q} s_i, \quad (q = 1, 2, \ldots, m) \qquad (4) $$

$$ e_L = [e_{L,1}, \ldots, e_{L,m}] = t - f \qquad (5) $$

$$ f = [f_1, \ldots, f_m] \qquad (6) $$

$w_L$ and $b_L$ need to satisfy the supervision mechanism (7).

$$ \sum_{q=1}^{m} \langle e_{L-1,q}, s_L \rangle^2 \geq \mathrm{bounds}^2\, \delta_L \qquad (7) $$

$$ \delta_L = (1 - r - \mu_L) \| e_{L-1} \|^2 \qquad (8) $$

where $\mathrm{bounds}$ stands for the upper bound of the activation function, $\{\mu_L\}$ is a sequence that converges to zero, and $r$ is a regularization parameter close to 1. Hidden nodes are added in this way until the tolerance is reached, that is, when the error between the model outputs and the target outputs reaches the preset threshold, the structure, weights, and biases of the SCN are determined and the learning process stops. It has been proved mathematically that the model has the universal approximation property, namely $\lim_{L \to +\infty} \| e_L \| = 0$. The principles and Formulas (1)–(8) of SCN are from Ref. [8].
The specific search algorithm for the input weights and biases of SCN is as follows.
Step 1. Set the sequence of interval lengths for searching the parameters, $\lambda = \lambda_{min} : \Delta\lambda : \lambda_{max}$;
Step 2. Generate a set of random values in the interval $[-\lambda_{min}, \lambda_{min}]$, then judge whether they satisfy the inequality constraint or not;
Step 3. If there are parameters satisfying the inequality constraint, the parameters that maximize $\epsilon_L$ (Formula (9)) are selected to configure the new hidden node. Otherwise, expand the search interval to $[-(\lambda_{min} + \Delta\lambda), (\lambda_{min} + \Delta\lambda)]$ and repeat the above steps until the tolerance is reached [8].

$$ \epsilon_L = \sum_{q=1}^{m} \left( \frac{\big(e_{L-1,q}(X)^T s_L(X)\big)^2}{s_L(X)^T s_L(X)} - (1 - r - \mu_L)\, e_{L-1,q}(X)^T e_{L-1,q}(X) \right) \qquad (9) $$

$$ s_L(X) = \big[\, h(w_L^T x_1 + b_L),\; h(w_L^T x_2 + b_L),\; \ldots,\; h(w_L^T x_N + b_L) \,\big] \qquad (10) $$
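For readers who want to experiment with this baseline, the following is a minimal Python/NumPy sketch of the random search described above: candidates are sampled in a gradually enlarged interval and only those satisfying the supervision inequality are kept. It assumes a sigmoid activation and an N × m residual matrix E; the function and parameter names (candidate_score, n_candidates, the interval sequence) are illustrative and not the authors' implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def candidate_score(w, b, X, E, r, mu_L):
    """Formula (9); a positive value means the supervision inequality (7) holds."""
    s = sigmoid(X @ w + b)                         # s_L(X), shape (N,)
    proj = (E.T @ s) ** 2 / (s @ s)                # <e_{L-1,q}, s_L>^2 / ||s_L||^2 for each q
    return np.sum(proj - (1 - r - mu_L) * np.sum(E ** 2, axis=0))

def scn_random_search(X, E, r=0.99, mu_L=1e-3,
                      lambdas=(1, 5, 10, 50, 100), n_candidates=100):
    """Steps 1-3: expand the sampling interval until an admissible node is found."""
    d = X.shape[1]
    best, best_score = None, -np.inf
    for lam in lambdas:                            # gradually enlarged search interval
        for _ in range(n_candidates):
            w = np.random.uniform(-lam, lam, size=d)
            b = np.random.uniform(-lam, lam)
            score = candidate_score(w, b, X, E, r, mu_L)
            if score > 0 and score > best_score:   # keep the best admissible candidate
                best, best_score = (w, b), score
        if best is not None:
            return best
    return None                                    # no admissible candidate in any interval
```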

2.2. Particle Swarm Optimization Algorithm and Bat Algorithm

PSO is an iterative algorithm based on the simulation of the foraging process of birds. First, the population is randomly initialized, including the positions and velocities of the individuals. Second, the fitness of each particle is calculated to obtain its initial optimal solution. Third, the velocities and positions are updated by Formulas (11) and (12) in each iteration until the solution meets the stopping condition.

$$ v_{t+1} = v_t + c_1\, rand_1\, (pb_t - p_t) + c_2\, rand_2\, (gb_t - p_t) \qquad (11) $$

$$ p_{t+1} = p_t + v_{t+1} \qquad (12) $$

where $v_t$ stands for the velocity of the individual at time $t$, $p_t$ stands for the position of the individual at time $t$, and $pb_t$ and $gb_t$ stand for the individual extremum and the global extremum, respectively, at time $t$. $c_1$ and $c_2$ are learning factors, and $rand_1$ and $rand_2$ are random numbers.
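As an illustration only, a compact NumPy sketch of the swarm-wide update (11)–(12) is given below; the array names (positions, velocities, pbest, gbest) and the default c1 = c2 = 2 are our choices.

```python
import numpy as np

def pso_step(positions, velocities, pbest, gbest, c1=2.0, c2=2.0):
    """One PSO iteration over the whole swarm, Formulas (11)-(12)."""
    pop, dim = positions.shape
    r1 = np.random.rand(pop, dim)
    r2 = np.random.rand(pop, dim)
    velocities = (velocities
                  + c1 * r1 * (pbest - positions)     # pull toward each personal best
                  + c2 * r2 * (gbest - positions))    # pull toward the global best
    positions = positions + velocities
    return positions, velocities
```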
The bat algorithm is a heuristic search algorithm that simulates the predation behavior of bats. The bats constantly adjust their positions to approach the optimal solution according to a unique frequency, wavelength, and loudness. It has a simple model and a fast convergence speed; however, its optimization accuracy is low. It is based on the following assumptions: first, all bats sense distance by echolocation and distinguish targets from obstacles in a special way; second, bats fly randomly at position $p_i$ with velocity $v_i$, search for targets with a certain loudness, frequency, and variable wavelength, and automatically adjust the wavelength (or frequency) and pulse emission rate $r_p$ according to the distance from the target; third, the loudness gradually decreases from $A_0$ to $A_{min}$ [32].
The update formulas of frequency, velocity, and position are as follows:

$$ fre_i = fre_{min} + (fre_{max} - fre_{min})\, \beta \qquad (13) $$

$$ v_i^{t+1} = v_i^t + (p_i^t - p_*)\, fre_i \qquad (14) $$

$$ p_i^{t+1} = p_i^t + v_i^{t+1} \qquad (15) $$

where $fre_i$ stands for the frequency of the $i$-th bat, whose adjustment range is $[fre_{min}, fre_{max}]$, $\beta \in [0, 1]$, and $p_*$ is the current optimal solution.
A random number $rand_{01}$ is generated for the current local search. If $rand_{01} > r_{p_i}$, the position $p$ is updated by Formula (16).

$$ p_{new} = p_{old} + \epsilon\, A^t \qquad (16) $$

where $\epsilon \in [-1, 1]$ and $A^t$ stands for the average loudness of the bats at time $t$.
The loudness drops toward a fixed value as the bat approaches its target, while $r_p$ continues to increase. A random number $rand_{02}$ is generated; if $rand_{02} < A_i$ and $f(p_{new}) > f(p_{old})$, then $p_{new}$ is accepted, namely $p_i^{t+1} = p_{new}$. The loudness and the pulse emission rate are updated by Formulas (17) and (18).

$$ A_i^{t+1} = \alpha A_i^t \qquad (17) $$

$$ r_{p_i}^{t+1} = r_{p_i}^0 \big[ 1 - e^{-\delta t} \big] \qquad (18) $$

where $0 < \alpha < 1$ and $\delta > 0$, and $A_i^t$ and $r_{p_i}^t$ stand for the loudness and the pulse emission rate of the $i$-th bat at time $t$, respectively.
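The sketch below assembles Formulas (13)–(18) into a single bat-algorithm iteration for a maximization problem. It is a schematic reading of the description above rather than a reference implementation: the default frequency range and the alpha and delta values are assumptions, `fitness` is any user-supplied objective evaluated per position, and `rp0` holds the initial pulse emission rates.

```python
import numpy as np

def bat_step(p, v, p_star, A, rp, rp0, t, fitness,
             fre_min=0.0, fre_max=2.0, alpha=0.9, delta=0.9):
    """One bat-algorithm iteration (maximization), Formulas (13)-(18)."""
    pop, dim = p.shape
    beta = np.random.rand(pop, 1)
    fre = fre_min + (fre_max - fre_min) * beta          # (13) frequencies
    v_new = v + (p - p_star) * fre                      # (14) velocity update
    p_new = p + v_new                                   # (15) position update
    local = np.random.rand(pop) > rp                    # local search when rand01 > rp_i
    eps = np.random.uniform(-1, 1, size=(int(local.sum()), dim))
    p_new[local] = p[local] + eps * A.mean()            # (16) walk near current solutions
    old_fit = np.array([fitness(x) for x in p])
    new_fit = np.array([fitness(x) for x in p_new])
    accept = (np.random.rand(pop) < A) & (new_fit > old_fit)
    p[accept], v[accept] = p_new[accept], v_new[accept] # accept improved solutions
    A[accept] = alpha * A[accept]                       # (17) loudness decreases
    rp[accept] = rp0[accept] * (1 - np.exp(-delta * t)) # (18) pulse rate increases
    return p, v, A, rp
```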

3. G-BAPSO-SCN

3.1. Establish the Optimization Model

Based on Equation (19) [8], SCN will converge if inequality (20) is satisfied.

$$ \| e_L \|^2 - (r + \mu_L) \| e_{L-1} \|^2 = (1 - r - \mu_L) \| e_{L-1} \|^2 - \Big( \sum_{q=1}^{m} \langle e_{L-1,q}, s_L \rangle^2 \Big) \Big/ \| s_L \|^2 \qquad (19) $$

$$ \phi(w_L, b_L) = \frac{\sum_{q=1}^{m} \langle e_{L-1,q}, s_L \rangle^2}{\| s_L \|^2} - (1 - r - \mu_L) \| e_{L-1} \|^2 \geq 0 \qquad (20) $$

According to the convergence principle of SCN [8], taking the error reduction obtained by adding a new hidden node as the objective function directly helps to improve the convergence rate of the model: the larger $\phi(w_L, b_L)$ is, the faster the network converges. Therefore, the optimal configuration of the input parameters is transformed into solving the following optimization problem under the constraint of inequality (20). $\phi(w_L, b_L)$ is a continuous function, and the convergence speed of the network can be improved when the input weights and biases are configured by the solution of the optimization problem (21).

$$ \max:\ \phi(w_L, b_L) = \frac{\sum_{q=1}^{m} \langle e_{L-1,q}, s_L \rangle^2}{\| s_L \|^2} - (1 - r - \mu_L) \| e_{L-1} \|^2 \qquad (21) $$
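In code, the objective (21) is only a few lines. The sketch below assumes a sigmoid activation and stores the current residuals $e_{L-1,q}$ column-wise in an N × m matrix E (our convention); a non-negative return value is exactly the constraint (20).

```python
import numpy as np

def phi(w, b, X, E, r, mu_L):
    """Objective (21); phi(w, b) >= 0 is exactly the constraint (20)."""
    s = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # s_L(X), shape (N,)
    proj = (E.T @ s) ** 2 / (s @ s)          # <e_{L-1,q}, s_L>^2 / ||s_L||^2 for each q
    return proj.sum() - (1.0 - r - mu_L) * np.sum(E ** 2)
```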

3.2. Solve the Optimization Problem Based on the G-BAPSO Algorithm

The input parameters are generated randomly in SCN, and the selection range is gradually and linearly expanded to search for random parameters that meet criterion (7). In order to make the search algorithm more flexible and speed up the convergence, this section is devoted to improving the search algorithm for the input weights and biases. If the network output is one-dimensional (the argument is the same for multi-dimensional outputs), then by the Cauchy–Schwarz inequality $\langle e_{L-1}, s_L \rangle^2 \leq \langle e_{L-1}, e_{L-1} \rangle \langle s_L, s_L \rangle$, with equality when $e_{L-1}$ and $s_L$ are linearly dependent, and in that case the function $\phi(w_L, b_L)$ is maximized. However, it is obviously impossible to achieve such linear dependence consistently for different inputs, and the function is clearly non-convex, so it is not feasible to find the theoretical optimal solution by the gradient descent method alone. Therefore, an intelligent optimization algorithm combined with gradient information should be considered to optimize the objective function.

3.2.1. Calculate the Gradient Information

The gradient direction plays an important role in the search process. The gradient $\big[ \frac{\partial \phi}{\partial w_{L,1}}, \frac{\partial \phi}{\partial w_{L,2}}, \ldots, \frac{\partial \phi}{\partial w_{L,d}}, \frac{\partial \phi}{\partial b_L} \big]$ is calculated from the objective function (21). The results are shown in Formulas (22)–(25), where $d$ stands for the number of input features, $m$ stands for the output dimension, $N$ stands for the number of samples, and $L$ indexes the current hidden node.

$$ \frac{\partial \phi}{\partial w_{L,u}} = \frac{ 2 \sum_{q=1}^{m} \left[ \Big( \sum_{i=1}^{N} e_{L-1,q}^{i} s_L^{i} \Big) \left( \Big( \sum_{i=1}^{N} e_{L-1,q}^{i} \frac{\partial s_L^{i}}{\partial w_{L,u}} \Big) \sum_{i=1}^{N} (s_L^{i})^2 - \Big( \sum_{i=1}^{N} s_L^{i} \frac{\partial s_L^{i}}{\partial w_{L,u}} \Big) \Big( \sum_{i=1}^{N} e_{L-1,q}^{i} s_L^{i} \Big) \right) \right] }{ \| s_L \|^4 } \qquad (22) $$

$$ \frac{\partial s_L^{i}}{\partial w_{L,u}} = s_L^{i} (1 - s_L^{i})\, x_u^{i}, \quad u = 1, 2, \ldots, d \qquad (23) $$

$$ \frac{\partial \phi}{\partial b_L} = \frac{ 2 \sum_{q=1}^{m} \left[ \Big( \sum_{i=1}^{N} e_{L-1,q}^{i} s_L^{i} \Big) \left( \Big( \sum_{i=1}^{N} e_{L-1,q}^{i} \frac{\partial s_L^{i}}{\partial b_L} \Big) \sum_{i=1}^{N} (s_L^{i})^2 - \Big( \sum_{i=1}^{N} s_L^{i} \frac{\partial s_L^{i}}{\partial b_L} \Big) \Big( \sum_{i=1}^{N} e_{L-1,q}^{i} s_L^{i} \Big) \right) \right] }{ \| s_L \|^4 } \qquad (24) $$

$$ \frac{\partial s_L^{i}}{\partial b_L} = s_L^{i} (1 - s_L^{i}) \qquad (25) $$
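A direct, unoptimized translation of (22)–(25) into NumPy for the sigmoid case is shown below; the vectorized intermediate quantities (A, B, C, t) are our shorthand for the sums appearing in the formulas, and E is again the N × m residual matrix.

```python
import numpy as np

def phi_gradient(w, b, X, E):
    """Gradient [dphi/dw_{L,1}, ..., dphi/dw_{L,d}, dphi/db_L] of the objective (21)."""
    s = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # node outputs s_L^i, shape (N,)
    ds = s * (1.0 - s)                       # sigmoid derivative factor
    dS_dw = ds[:, None] * X                  # (23): ds_L^i/dw_{L,u}, shape (N, d)
    dS_db = ds                               # (25): ds_L^i/db_L, shape (N,)
    A = E.T @ s                              # A_q = sum_i e_{L-1,q}^i s_L^i, shape (m,)
    B = s @ s                                # ||s_L||^2 (scalar)
    C = E.T @ dS_dw                          # C[q, u] = sum_i e_{L-1,q}^i ds_L^i/dw_u
    t = s @ dS_dw                            # t[u] = sum_i s_L^i ds_L^i/dw_u
    grad_w = 2.0 * (B * (A @ C) - (A @ A) * t) / B ** 2                    # (22)
    grad_b = 2.0 * (B * (A @ (E.T @ dS_db)) - (A @ A) * (s @ dS_db)) / B ** 2  # (24)
    return np.concatenate([grad_w, [grad_b]])
```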

3.2.2. G-BAPSO Algorithm

The G-BAPSO algorithm introduces the local search mechanism and the adaptive individual elimination strategy of the bat algorithm under the framework of PSO. When the pulse frequency is low and the current individual's fitness is below the average fitness, Formula (16) is used to update the individual position locally; otherwise, the individual velocity is updated by combining gradient information in the global search. With the guidance of the gradient direction, the search is accelerated, while the bat–PSO search mechanism fully explores the solution space and helps the search jump out of local minima.
The combination of the two improves the search efficiency. The global update formulas are shown in Formulas (26)–(28). The specific description of the G-BAPSO algorithm is given in Algorithm 1, and the flow chart is presented in Figure 2.

$$ v_{t+1} = v_t + c_1\, rand_1\, (pb_t - p_t) + c_2\, rand_2\, (gb_t - p_t) + c_3\, grad \qquad (26) $$

$$ p_{t+1} = p_t + v_{t+1} \qquad (27) $$

$$ grad = \left[ \frac{\partial \phi}{\partial w_{L,1}}, \frac{\partial \phi}{\partial w_{L,2}}, \ldots, \frac{\partial \phi}{\partial w_{L,d}}, \frac{\partial \phi}{\partial b_L} \right] \qquad (28) $$

where $grad$ represents the gradient and $c_3$ is the corresponding learning factor.
Algorithm 1: The G-BAPSO algorithm
1: Parameter initialization: $c_1$, $c_2$, $c_3$, maxgen, popsize, popmin, popmax, initial pulse frequencies $r_p$, pulse frequency enhancement coefficient $\sigma$;
2: Initialize the population positions ($w$ and $b$) and the velocities $v$;
3: Calculate the fitness by (21), record the individual optimal position of the initial particles;
4: if the optimum fitness satisfies the inequality (20) then
5:     record the global best position;
6: else
7:     Adjust the value of $r$ and continue to judge until inequality (20) is satisfied;
8: end if
9: for $i = 1$ to maxgen do
10:     for $j = 1$ to sizepop do
11:         if $r_j < rand$ and $fitness_j < mean(fitness)$ then
12:             Update the individual position according to (16);
13:         else
14:             Calculate the gradient according to (22)–(25), update the individual velocity according to (26), and then update the individual position according to (27);
15:         end if
16:         Update the individual fitness value and pulse frequency;
17:     end for
18:     for $k = 1$ to sizepop do
19:         if $fitness_k > fitness_{gbest}(i)$ then
20:             Update the individual optimal position;
21:         end if
22:         if $fitness_k > fitness_{zbest}(i)$ then
23:             Update the optimal position of the group;
24:         end if
25:     end for
26: end for
27: return zbest
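A condensed Python sketch of Algorithm 1 is given below. To keep it self-contained it maximizes a generic `fitness` with gradient `grad_fn`; for SCN these would be the objective $\phi$ in (21) and its gradient (22)–(25). The loudness bookkeeping for the local move and the simple pulse-rate enhancement rule are simplifications of the pseudocode, and steps 4–8 (adjusting $r$ until (20) holds) are omitted here.

```python
import numpy as np

def g_bapso(fitness, grad_fn, dim, maxgen=20, popsize=30,
            popmin=-5.0, popmax=5.0, c1=0.3, c2=0.3, c3=0.4,
            alpha=0.9, sigma=0.9):
    """Maximize `fitness` over R^dim; `grad_fn` returns its gradient."""
    pos = np.random.uniform(popmin, popmax, size=(popsize, dim))
    vel = np.zeros((popsize, dim))
    A = np.ones(popsize)                           # loudness used by the local move
    rp = 0.5 * np.random.rand(popsize)             # pulse emission rates
    fit = np.array([fitness(p) for p in pos])
    pbest, pbest_fit = pos.copy(), fit.copy()
    g = int(np.argmax(fit))
    gbest, gbest_fit = pos[g].copy(), fit[g]
    for _ in range(maxgen):
        for j in range(popsize):
            if rp[j] < np.random.rand() and fit[j] < fit.mean():
                # bat-style local move around the current position, cf. (16)
                pos[j] = pos[j] + np.random.uniform(-1, 1, dim) * A.mean()
                A[j] *= alpha
            else:
                # gradient-guided PSO step, Formulas (26)-(27)
                vel[j] = (vel[j]
                          + c1 * np.random.rand() * (pbest[j] - pos[j])
                          + c2 * np.random.rand() * (gbest - pos[j])
                          + c3 * grad_fn(pos[j]))
                pos[j] = pos[j] + vel[j]
            fit[j] = fitness(pos[j])
            rp[j] = min(1.0, rp[j] * (1.0 + sigma))  # pulse-rate enhancement
            if fit[j] > pbest_fit[j]:
                pbest[j], pbest_fit[j] = pos[j].copy(), fit[j]
            if fit[j] > gbest_fit:
                gbest, gbest_fit = pos[j].copy(), fit[j]
    return gbest, gbest_fit
```

With `fitness` set to the $\phi$ sketch from Section 3.1 and `grad_fn` to the gradient sketch above, the returned position plays the role of $(w_L, b_L)$ in the construction algorithm that follows.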
Set $L_{max}$ as the maximum number of hidden layer nodes. Based on the G-BAPSO algorithm, the optimized configuration algorithm of SCN, called 'G-BAPSO-SC', is shown in Algorithm 2.
Algorithm 2: The G-BAPSO-SC algorithm
1: Initialize the network parameters $e_0 = [t_1, t_2, \ldots, t_N]^T$, $0 < r < 1$;
2: while $L \leq L_{max}$ do
3:     Calculate $w_L$ and $b_L$ by G-BAPSO;
4:     Calculate $W_{L,q} = (e_{L-1,q}^T h_L^*) / (h_L^{*T} h_L^*)$, $q = 1, 2, \ldots, m$;
5:         $W_L = [W_{L,1}, W_{L,2}, \ldots, W_{L,m}]^T$;
6:     Calculate $e_L = e_{L-1} - W_L h_L^*$;
7:     Update $e_0 := e_L$; $L = L + 1$;
8: end while
9: return $W_1, W_2, \ldots, W_L$,
10:        $w^* = [w_1^*, w_2^*, \ldots, w_L^*]$,
11:        $b^* = [b_1^*, b_2^*, \ldots, b_L^*]$.
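The outer loop of Algorithm 2 can be sketched as follows; `node_search(X, E)` stands for any routine returning $(w_L, b_L)$, for example the G-BAPSO sketch above, and the RMSE-based stopping tolerance is our addition.

```python
import numpy as np

def g_bapso_sc(X, T, node_search, L_max=50, tol=1e-4):
    """Incrementally build the SCN; node_search(X, E) must return (w_L, b_L)."""
    E = T.copy()                                   # e_0 = T (residual matrix, N x m)
    weights, biases, out_w = [], [], []
    for L in range(1, L_max + 1):
        w, b = node_search(X, E)                   # e.g. the g_bapso sketch above
        h = 1.0 / (1.0 + np.exp(-(X @ w + b)))     # h_L^*, shape (N,)
        W_L = (E.T @ h) / (h @ h)                  # output weights of node L, shape (m,)
        E = E - np.outer(h, W_L)                   # e_L = e_{L-1} - W_L h_L^*
        weights.append(w); biases.append(b); out_w.append(W_L)
        if np.sqrt(np.mean(E ** 2)) < tol:         # stop once the tolerance is reached
            break
    return np.array(weights), np.array(biases), np.array(out_w)
```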

4. Numerical Experiments

4.1. Evaluation of the G-BAPSO Algorithm

In order to illustrate the efficiency of G-BAPSO, four benchmark functions (29)–(32), whose minimum values are all zero, are selected to evaluate its performance. The G-BAPSO algorithm is used to optimize these functions and is compared with the gradient-based PSO (G-PSO), PSO, bat, and FOA algorithms. The error of the optimal solution found is used as the evaluation index.

$$ f_1 = \sum_{i=1}^{3} x_i^2, \quad x_i \in [-5.12, 5.12] \qquad (29) $$

$$ f_2 = 100 (x_1 - x_2)^2 + (1 - x_1)^2, \quad x_i \in [-2, 2] \qquad (30) $$

$$ f_3 = \frac{1}{4000} (x_1^2 + x_2^2) - \cos x_1 \cos \frac{x_2}{\sqrt{2}} + 1, \quad x_i \in [-6, 6] \qquad (31) $$

$$ f_4 = \sum_{i=1}^{3} \Big( \sum_{j=1}^{i} x_j \Big)^2, \quad x_i \in [-5, 5] \qquad (32) $$
Continuous functions are selected as benchmark functions in this section because the objective function proposed in Section 3 is a multivariate continuous function. $f_3$ is a multimodal function and the rest are unimodal. The graphs of the four functions are shown in Figure 3, Figure 4, Figure 5 and Figure 6; Figure 3 and Figure 6 show the two-variable versions of $f_1$ and $f_4$, respectively.
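For readers reproducing the tests, the four benchmark functions (29)–(32) can be coded as below; the minimum of each is zero, and any optimizer (for instance the G-BAPSO sketch applied to the negated functions) can be run against them.

```python
import numpy as np

def f1(x):                      # (29): sphere, x in [-5.12, 5.12]^3
    return np.sum(np.asarray(x)[:3] ** 2)

def f2(x):                      # (30): Rosenbrock-type valley, x in [-2, 2]^2
    return 100.0 * (x[0] - x[1]) ** 2 + (1.0 - x[0]) ** 2

def f3(x):                      # (31): two-dimensional Griewank, x in [-6, 6]^2
    return (x[0] ** 2 + x[1] ** 2) / 4000.0 \
        - np.cos(x[0]) * np.cos(x[1] / np.sqrt(2.0)) + 1.0

def f4(x):                      # (32): rotated hyper-ellipsoid, x in [-5, 5]^3
    return sum(np.sum(np.asarray(x)[:i + 1]) ** 2 for i in range(3))
```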
Popsize, $c_1$, $c_2$, $c_3$, and the number of iterations have an important influence on the G-BAPSO algorithm. To determine suitable values, Refs. [33,34] suggested that $c_1$ and $c_2$ should be 2, and Refs. [35,36] conducted sensitivity analyses on popsize, $c_1$, $c_2$, and other acceleration parameters. Based on the benchmark function $f_1$, the sensitivity analysis of the relevant parameters of G-BAPSO is shown in Table 1, Table 2 and Table 3, and the parameters used in the other experiments were chosen by the same method.
Table 1 compares the errors of the minimum value of $f_1$ as $c_1$, $c_2$, and popsize change. It can be seen that when $c_1$, $c_2$, and $c_3$ are fixed, the larger the popsize, the smaller the error. However, the order of magnitude of the error changes only slightly as the population size increases to 75 or 100, while a larger popsize costs more computation time, so a value that balances computation time and error should be selected; 50 is used as the population size for minimizing $f_1$. When popsize and $c_3$ are fixed, the smaller $c_1$ and $c_2$ are, the smaller the error. When $c_1$ and $c_2$ drop to 0.3, the error reaches the order of $10^{-23}$, which is within the acceptable range. Table 2 shows the sensitivity analysis for $c_3$. As $c_3$ increases from 0.2 to 0.8, the error tends to decrease first and then increase. This is because the BAPSO part plays the major role when $c_3$ is very small, and the gradient term plays the major role when $c_3$ becomes large; when $c_3$ is near 0.4 there is a balance between the two, which also verifies the effectiveness of the hybrid algorithm. Table 3 shows the sensitivity analysis for the number of iterations. As the number of iterations increases, the error of the optimal solution becomes smaller and smaller, but more iterations cost more time. When the number of iterations is 20, the error already reaches the order of $10^{-23}$, which is within the acceptable range.
Table 4 shows the parameter settings. The programs were written in MATLAB R2019a and run on a machine with 8 GB of memory and a 1.6 GHz CPU. Each algorithm was run independently 200 times, and the experimental results are shown in Table 5.
It can be seen from the results of the four groups of experiments that G-BAPSO performs well. Compared with PSO, the bat algorithm, and FOA, its optimal solution has a significant advantage in order of magnitude. It is worth noting that the performance of G-PSO becomes comparable to that of G-BAPSO once gradient information is introduced into PSO, indicating that the gradient term is quite important for these four functions, especially the unimodal ones, for which the optimal solution is easy to find quickly by searching along the gradient direction. Therefore, the first two algorithms are more efficient than the last three. Comparing the first two, G-BAPSO performs better because it adds the local search and adaptive elimination strategy of the bat algorithm, which further improves the convergence speed. Note that the error of G-BAPSO and G-PSO for $f_3$ is reported as zero, which means the error is extremely small, approaching zero.

4.2. Performance of SCN Based on the G-BAPSO Algorithm

In order to illustrate the performance of SCN based on the G-BAPSO algorithm, four groups of numerical simulation experiments are conducted in this section: two regression problems and two classification problems, namely function regression, synchronous motor excitation current prediction, banknote authentication, and iris classification.

4.2.1. Dataset Information

Dataset 1 is generated by the function $f(x) = 0.2\, e^{-(10x-4)^2} + 0.5\, e^{-(80x-40)^2} + 0.3\, e^{-(80x-20)^2}$, $x \in [0, 1]$; 1000 points are randomly collected as the training set, and the test set consists of 300 equidistant points on the interval $[0, 1]$ [9].
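A possible way to generate Dataset 1 in Python is sketched below; the 1000/300 split follows the text, while the random seed is arbitrary.

```python
import numpy as np

def target(x):
    """The target function of Dataset 1 on [0, 1]."""
    return (0.2 * np.exp(-(10 * x - 4) ** 2)
            + 0.5 * np.exp(-(80 * x - 40) ** 2)
            + 0.3 * np.exp(-(80 * x - 20) ** 2))

rng = np.random.default_rng(0)
x_train = rng.uniform(0.0, 1.0, size=(1000, 1))      # 1000 random training points
x_test = np.linspace(0.0, 1.0, 300).reshape(-1, 1)   # 300 equidistant test points
y_train, y_test = target(x_train), target(x_test)
```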
Dataset 2 is the current data (SM) [37,38] in the UCI database. The task is to find the nonlinear relationship between synchronous motor excitation current and the input features including power factor, load current, synchronous motor excitation current change, and power factor error. There are 557 groups of samples, where 400 samples are used as the training set and 157 samples are used as the test set.
Dataset 3 is the Banknote Authentication dataset in the UCI database, which is extracted from images of genuine and counterfeit banknotes. The task is to verify the authenticity of the banknotes (classification). The data contain 4 input features (the variance, the skewness, the kurtosis, and the entropy of the image), and the output corresponding to each group of input features is the banknote category (genuine or counterfeit). The dataset has 1372 samples, of which 1038 are used for training and 334 for testing.
Dataset 4 is the Iris dataset in the UCI database. There are three categories of iris, and the task is to identify the category based on four input attributes: the length and width of the calyx (sepal) and the length and width of the petal. The dataset contains 150 samples, 50 per category, of which 120 are used for training and 30 for testing.
Table 6 shows the relevant information of the four datasets.

4.2.2. Evaluate Metrics and Parameter Settings

The sigmoid function is selected as the activation function, as shown in Formula (33). For the regression problems, the root mean square error (RMSE) is used as the evaluation index, as shown in Formula (34); the smaller the RMSE, the better the network performance. For the classification problems, the accuracy rate (ACC) is taken as the evaluation index, as shown in Formula (35); the closer ACC is to 1, the better the network performance.

$$ Sigmoid(x) = 1 / (1 + \exp(-x)) \qquad (33) $$

$$ RMSE = \sqrt{ \frac{1}{N} \sum_{k=1}^{N} (y_k - o_k)^2 } \qquad (34) $$

$$ ACC = AcNum_i / TotalNum_i \qquad (35) $$

where $y_k$ stands for the ideal output of the $k$-th sample and $o_k$ stands for the network output of the $k$-th sample; $AcNum_i$ stands for the number of correctly identified samples of class $i$, and $TotalNum_i$ stands for the total number of samples of class $i$.
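The two evaluation metrics can be computed as below; the accuracy helper counts correct predictions over all classes jointly, which is one way to obtain the single ACC values reported later in Tables 10 and 11 (an assumption on our part).

```python
import numpy as np

def rmse(y, o):
    """Formula (34): root mean square error over N samples."""
    y, o = np.asarray(y), np.asarray(o)
    return np.sqrt(np.mean((y - o) ** 2))

def acc(labels, predictions):
    """Formula (35): fraction of correctly identified samples."""
    labels, predictions = np.asarray(labels), np.asarray(predictions)
    return np.mean(labels == predictions)
```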
Based on the above four datasets, G-BAPSO-SCN is compared with SCN [8], RVFL network [7] and ELM [39]. The parameters are shown in Table 7.

4.2.3. The Experimental Results

Due to the influence of the random parameters, each group of experiments was run independently 50 times, and the average results are used for the comparative analysis. The convergence curves of the average results of G-BAPSO-SCN and SCN over the 50 runs are drawn in Figure 7, Figure 8, Figure 9 and Figure 10, and the average results are listed in Table 8, Table 9, Table 10 and Table 11.

5. Discussion

Table 8, Table 9, Table 10 and Table 11 show that G-BAPSO-SCN converges faster than the other models for the same number of hidden nodes, while the RVFL network and ELM have relatively poor performance. In regression experiment 1, the training RMSE of G-BAPSO-SCN decreased by 3.2 × 10^-3 compared with that of SCN; in regression experiment 2, it decreased by 5.57 × 10^-5, indicating that the degree of improvement in training RMSE not only depends on the algorithm itself but is also closely related to the distribution characteristics of the data. In the classification problems, G-BAPSO-SCN also has an advantage in recognition accuracy, with an average improvement of about 0.07% in training accuracy over SCN. At the same time, the comparison of the convergence curves in Figure 7, Figure 8, Figure 9 and Figure 10 shows that G-BAPSO-SCN converges faster than SCN. Since RVFL and ELM are not incrementally generated networks and their hidden nodes are generated at once, their convergence speed is not compared with that of the former two.
G-BAPSO-SCN performs well because it introduces a more flexible mechanism into the search for the input parameters, allowing it to explore the solution space more thoroughly. The algorithm combines the advantages of intelligent optimization and traditional optimization, and the introduction of the gradient makes it possible to approach the optimal solution quickly. By introducing the pulse emission frequency mechanism of bats into the PSO framework, a local search can be carried out according to individual fitness and emission frequency in the initial stage, and poor individuals can be eliminated adaptively, thus accelerating the convergence rate. SCN uses a linearly expanding search interval to select the input parameters, which cannot detect an optimal solution far away in the initial stage; the initial configuration is not optimized, which increases the number of search iterations and the number of nodes to a certain extent. RVFLN is a typical incremental network, so we also compare it with G-BAPSO-SCN; however, RVFLN cannot guarantee convergence, and its accuracy is slightly inferior. ELM's performance is relatively poor under the same number of hidden nodes as G-BAPSO-SCN. In addition to optimizing the input weights and biases, CSSA-SCN [20] optimizes their search intervals and the regularization factor r of the model, which is hyperparameter optimization; its objective function and optimized parameters are different from those in this paper. Most of the other methods for optimizing SCNs focus on the output parameters or the generation of the model structure.
In conclusion, G-BAPSO-SCN, first proposed in this paper, has a faster convergence rate and a more optimized structure, which improves the training efficiency of the model and provides better performance in regression and prediction applications.

6. Limitations and Future Works

Table 12 presents the comparison of the running times of the four models on Dataset 1. It can be seen that G-BAPSO-SCN takes a relatively long time, which is determined by two factors. On the one hand, both G-BAPSO-SCN and SCN are incrementally generated models whose structure must be determined adaptively during training, so both of them take longer than RVFL and ELM. On the other hand, G-BAPSO-SCN includes the hybrid search for the optimal solution, and the gradient calculation obviously increases the time cost. Therefore, in terms of time cost, G-BAPSO-SCN cannot save training time, which is one of its limitations. However, G-BAPSO-SCN increases the convergence rate of the model, which is very meaningful for practical problems. Meanwhile, the G-BAPSO algorithm uses some empirical values to initialize the value range (the domain of the objective function) of the input weights and biases, which should be studied and optimized in the future. In addition, other intelligent optimization algorithms can also be used to optimize the input parameters of SCN; the genetic algorithm, for example, is an effective algorithm with a fast convergence rate [40] and can be used in subsequent research.

7. Conclusions

The proposed G-BAPSO algorithm provides an alternative approach to improving the convergence speed of SCN. The G-BAPSO algorithm focuses on the search for the input weights and biases, considering the gradient information, the global optimal solution, and the individual optimal solutions. In the process of searching for the optimal solution, the algorithm adaptively adjusts local solutions according to the individual emission frequency and fitness in the initial stage; thus, the convergence speed is improved by combining a traditional optimization algorithm with an intelligent optimization algorithm. At the same time, the algorithm strictly follows the inequality constraint mechanism of SCN, and each optimal solution is generated under the inequality supervision mechanism, which guarantees the convergence of the SCN. Numerical experiments also illustrate that SCN based on the G-BAPSO algorithm can be used to solve regression or classification problems. Compared with SCN, the training RMSE of G-BAPSO-SCN decreased by 5.57 × 10^-5 and 3.2 × 10^-3 in the two regression experiments, and the recognition accuracy of G-BAPSO-SCN increased by 0.07% on average. In industrial applications, the G-BAPSO-SCN proposed in this paper is suitable for nonlinear approximation, regression prediction, and pattern recognition based on industrial big data.

Author Contributions

Conceptualization, J.L.; methodology, J.L. and Y.L.; software, J.L. and Q.Z.; writing—original draft preparation, J.L.; writing—review and editing, Y.L. All authors have read and agreed to the final version of the manuscript.

Funding

This work was partly supported by the National Natural Science Foundation of China (62073226), the Liaoning Province Natural Science Foundation (2020-KF-11-09, 2022-KF-11-01), the Shen-Fu Demonstration Zone Science and Technology Plan Project (2020JH13, 2021JH07), the Major Science and Technology Projects of Liaoning Province (2022JH1/10400033), and the Young Teacher Foundation of Shenyang Institute of Technology (QN202210).

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The data included in this paper are available at http://archive.ics.uci.edu/ml/index.php (accessed on 19 October 2022).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Lukosevicius, M.; Jaeger, H. Reservoir computing approaches to recurrent neural network training. Comput. Sci. Rev. 2009, 3, 127–149.
  2. Scardapane, S.; Wang, D. Randomness in neural networks: An overview. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2017, 7, e1200.
  3. Wang, D. Editorial: Randomized algorithms for training neural networks. Inf. Sci. 2016, 126–128.
  4. Rahimi, A.; Recht, B. Advances in Neural Information Processing Systems; MIT Press: Vancouver, BC, Canada, 2007; pp. 1177–1184.
  5. Kwok, T.Y.; Yeung, D.Y. Objective functions for training new hidden units in constructive neural networks. IEEE Trans. Neural Netw. 1997, 8, 1131–1148.
  6. Igelnik, B.; Pao, Y.H. Stochastic choice of basis functions in adaptive function approximation and the functional-link net. IEEE Trans. Neural Netw. 1995, 6, 1320–1329.
  7. Pao, Y.H.; Takefuji, Y. Functional-link net computing: Theory, system architecture, and functionalities. Computer 1992, 25, 76–79.
  8. Wang, D.; Li, M. Stochastic configuration networks: Fundamentals and algorithms. IEEE Trans. Cybern. 2017, 47, 3466–3479.
  9. Tyukin, I.; Prokhorov, D. Feasibility of random basis function approximators for modeling and control. In Proceedings of the IEEE Multi-Conference on Systems and Control, Saint Petersburg, Russia, 8–10 July 2009; pp. 1391–1396.
  10. Gorban, A.; Tyukin, I.; Prokhorov, D.; Sofeikov, K.I. Approximation with random bases: Pro et contra. Inf. Sci. 2016, 364, 129–145.
  11. Wang, D.; Li, M. Deep stochastic configuration networks with universal approximation property. In Proceedings of the 2018 International Joint Conference on Neural Networks (IJCNN), Rio de Janeiro, Brazil, 8–13 July 2018; pp. 1–8.
  12. Tao, J.; Niu, H.; Zhang, Y.; Li, X. An intelligent modeling method for nonlinear systems based on random configuration networks. Control Decis. 2022, 37, 2559–2564.
  13. Wang, Q.; Yang, C.; Ma, X.; Zhang, C.; Peng, S. Underground airflow quantity modeling based on SCN. Acta Autom. Sin. 2021, 47, 1963–1975.
  14. Zhao, L.; Wang, J.; Huang, M.; Wang, G. Estimation of effluent quality index based on partial least squares stochastic configuration networks. CIESC J. 2020, 71, 5672–5680.
  15. Zhao, L.; Zou, S.; Guo, S.; Huang, M. Ball mill load condition recognition model based on regularized stochastic configuration networks. Control Eng. China 2020, 27, 1–7.
  16. Wang, D.; Cui, C. Stochastic configuration networks ensemble with heterogeneous features for large-scale data analytics. Inf. Sci. 2017, 417, 55–71.
  17. Dai, W.; Li, D.; Zhou, P. Stochastic configuration networks with block increments for data modeling in process industries. Inf. Sci. 2019, 484, 367–386.
  18. Tian, Q.; Yuan, S.; Qu, H. Intrusion signal classification using stochastic configuration network with variable increments of hidden nodes. Opt. Eng. 2019, 58, 026105.
  19. Zhu, X.; Feng, X.; Wang, W. A further study on the inequality constraints in stochastic configuration networks. Inf. Sci. 2019, 487, 77–83.
  20. Zhang, C.; Ding, S. A stochastic configuration network based on chaotic sparrow search algorithm. Knowl.-Based Syst. 2021, 220, 106924.
  21. Li, W.; Tao, H.; Li, H. Greengage grading using stochastic configuration networks and a semi-supervised feedback mechanism. Inf. Sci. 2019, 488, 1–12.
  22. Zhang, Q.; Li, W.; Li, H. Self-blast state detection of glass insulators based on stochastic configuration networks and a feedback transfer learning mechanism. Inf. Sci. 2020, 522, 259–274.
  23. Momeni, E.; Armaghani, D.J.; Hajihassani, M.; Amin, M.M. Prediction of uniaxial compressive strength of rock samples using hybrid particle swarm optimization-based artificial neural networks. Measurement 2015, 60, 50–63.
  24. Armaghani, D.J.; Mohamad, E.T.; Narayanasamy, M.S.; Narita, N.; Yagiz, S. Development of hybrid intelligent models for predicting TBM penetration rate in hard rock condition. Tunn. Undergr. Space Technol. 2017, 63, 29–43.
  25. Gandomi, A.H.; Yang, X.S. Chaotic bat algorithm. J. Comput. Sci. 2014, 5, 224–232.
  26. Yang, X.S.; Gandomi, A.H. BAT Algorithm: A Novel Approach for Global Engineering Optimization; Professional Publications: Hyderabad, India, 2012.
  27. Kennedy, J.; Eberhart, R. Particle swarm optimization. In Proceedings of ICNN'95—International Conference on Neural Networks, Perth, WA, Australia, 27 November–1 December 1995; pp. 1942–1948.
  28. Whz, A.; Yxw, B.; Yan, Z.C.; Jian, X.C. Research on multi-energy complementary microgrid scheduling strategy based on improved bat algorithm. Energy Rep. 2022, 8, 1258–1272.
  29. Ibrahim, M.H. WBA-DNN: A hybrid weight bat algorithm with deep neural network for classification of poisonous and harmful wild plants. Comput. Electron. Agric. 2021, 190, 106478.
  30. Eliguzel, N.; Cetinkaya, C.; Dereli, T. A novel approach for text categorization by applying hybrid genetic bat algorithm through feature extraction and feature selection methods. Expert Syst. Appl. 2022, 202, 117433.
  31. Saji, Y.; Barkatou, M. A discrete bat algorithm based on Lévy flights for Euclidean traveling salesman problem. Expert Syst. Appl. 2021, 172, 114639.
  32. Fan, Q.; Fan, T. A hybrid model of extreme learning machine based on bat and cuckoo search algorithm for regression and multiclass classification. J. Math. 2021, 2021, 4404088.
  33. Shi, Y.H.; Eberhart, R.C. Empirical study of particle swarm optimization. In Proceedings of the 1999 Congress on Evolutionary Computation—CEC99, Washington, DC, USA, 6–9 July 1999.
  34. Rui, M.; Cortez, P.; Rocha, M.; Neves, J. Particle swarms for feedforward neural net training. In Proceedings of the IEEE International Joint Conference on Neural Networks, Honolulu, HI, USA, 12–17 May 2002; pp. 1895–1899.
  35. Gordan, B.; Armaghani, D.J.; Hajihassani, M.; Monjezi, M. Prediction of seismic slope stability through combination of particle swarm optimization and neural network. Eng. Comput. 2019, 32, 85–97.
  36. Armaghani, D.J.; Asteris, P.G.; Fatemi, S.A.; Hasanipanah, M.; Tarinejad, R.; Rashid, A.S.A.; Huynh, V.V. On the use of neuro-swarm system to forecast the pile settlement. Appl. Sci. 2020, 10, 1904.
  37. Kahraman, H.T.; Bayindir, R.; Sagiroglu, S. A new approach to predict the excitation current and parameter weightings of synchronous machines based on genetic algorithm-based k-NN estimator. Energy Convers. Manag. 2012, 64, 129–138.
  38. Kahraman, H.T. Metaheuristic linear modeling technique for estimating the excitation current of a synchronous motor. Turk. J. Electr. Eng. Comput. Sci. 2014, 22, 1637–1652.
  39. Deng, C.; Huang, G.; Xu, J. Extreme learning machines: New trends and applications. Sci. China Inf. Sci. 2015, 58, 020301.
  40. Tsoulos, I.G.; Stavrou, V.; Mastorakis, N.E.; Tsalikakis, D. Genconstraint: A programming tool for constraint optimization problems. SoftwareX 2019, 10, 100355.
Figure 1. The structure of SCN with a hidden layer.
Figure 2. The flow chart of G-BAPSO.
Figure 3. The figure of function $f_1$.
Figure 4. The figure of function $f_2$.
Figure 5. The figure of function $f_3$.
Figure 6. The figure of function $f_4$.
Figure 7. The average convergence curve (Dataset 1).
Figure 8. The average convergence curve (Dataset 2).
Figure 9. The average convergence curve (Dataset 3).
Figure 10. The average convergence curve (Dataset 4).
Table 1. The sensitivity analysis of $c_1$, $c_2$ and popsize (the error of the minimum value of $f_1$).
Popsize | $c_1, c_2, c_3 = 0.3, 0.3, 0.4$ | $c_1, c_2, c_3 = 1, 1, 0.4$ | $c_1, c_2, c_3 = 2, 2, 0.4$
25 | 3.4575 × 10^-23 | 1.0590 × 10^-20 | 6.4554 × 10^-7
50 | 1.1816 × 10^-23 | 3.1654 × 10^-21 | 1.8236 × 10^-10
75 | 6.2293 × 10^-24 | 1.9501 × 10^-21 | 4.0223 × 10^-12
100 | 4.5259 × 10^-24 | 1.2117 × 10^-21 | 3.8343 × 10^-13

Table 2. The sensitivity analysis of $c_3$.
$c_3$ | The error of the minimum value of $f_1$
0.2 | 6.2255 × 10^-18
0.4 | 1.1816 × 10^-23
0.6 | 2.9205 × 10^-23
0.8 | 4.1324 × 10^-18

Table 3. The sensitivity analysis of the number of iterations.
Iterations | The error of the minimum value of $f_1$ | Run time (s)
10 | 8.0748 × 10^-12 | 0.0202
20 | 1.1816 × 10^-23 | 0.0246
50 | 7.7729 × 10^-70 | 0.0511
Table 4. Parameter settings.
Function | Popsize | Iterations | $c_1$ | $c_2$ | $c_3$ | R | $fr_0$
$f_1$ | 50 | 20 | 0.3 | 0.3 | 0.4 | 0.9 | 3
$f_2$ | 200 | 20 | 2 | 2 | 0.00001 | 0.9 | 1
$f_3$ | 100 | 50 | 0.2 | 0.3 | 0.5 | 0.9 | 0.7
$f_4$ | 50 | 10 | 0.4 | 0.4 | 0.2 | 0.9 | 1
Table 5. The average error of the optimal solution in 200 experiments.
Function | G-BAPSO | G-PSO | PSO | Bat | FOA
$f_1$ | 1.1816 × 10^-23 | 2.6410 × 10^-23 | 5.4593 × 10^-6 | 4.0801 × 10^-3 | 1.2026 × 10^-2
$f_2$ | 2.3441 × 10^-15 | 3.1405 × 10^-15 | 2.8753 × 10^-15 | 5.0260 × 10^-3 | 2.2302 × 10^-3
$f_3$ | 0 | 0 | 1.3776 × 10^-13 | 9.3501 × 10^-5 | 0.0114 × 10^-2
$f_4$ | 6.8003 × 10^-6 | 7.9036 × 10^-6 | 3.2363 × 10^-5 | 3.9358 × 10^-3 | 1.8720 × 10^-1
Table 6. Information about the datasets.
Dataset | Samples | Input variables | Output variables | Training samples | Test samples
1 | 1300 | 1 | 1 | 1000 | 300
2 | 557 | 4 | 1 | 400 | 157
3 | 1372 | 4 | 2 | 1038 | 334
4 | 150 | 4 | 3 | 120 | 30
Table 7. Parameter settings for the performance test.
Dataset | Hidden nodes | Average runs | Popsize | Iterations | Search scope
1 | 80 | 50 | 30 | 20 | [-300, 300]
2 | 20 | 50 | 30 | 20 | [-0.3, 0.3]
3 | 20 | 50 | 30 | 20 | [-5, 5]
4 | 40 | 50 | 30 | 20 | [-5, 5]
Table 8. Performance comparison results (Dataset 1).
Network | Training RMSE | Test RMSE
G-BAPSO-SCN | 0.0008 | 0.0021
SCN | 0.0040 | 0.0050
RVFL | 0.1491 | 0.1645
ELM | 0.2056 | 0.2235

Table 9. Performance comparison results (Dataset 2).
Network | Training RMSE | Test RMSE
G-BAPSO-SCN | 6.4809 × 10^-6 | 4.0114 × 10^-5
SCN | 6.2139 × 10^-5 | 5.7800 × 10^-4
RVFL | 2.0778 × 10^-2 | 1.0210 × 10^-1
ELM | 2.2864 × 10^-1 | 2.7515 × 10^-1

Table 10. Performance comparison results (Dataset 3).
Network | Training ACC | Test ACC
G-BAPSO-SCN | 99.93% | 99.96%
SCN | 99.79% | 99.80%
RVFL | 97.88% | 97.90%
ELM | 99.04% | 98.50%

Table 11. Performance comparison results (Dataset 4).
Network | Training ACC | Test ACC
G-BAPSO-SCN | 98.58% | 94.20%
SCN | 98.58% | 91.27%
RVFL | 82.50% | 90.00%
ELM | 98.33% | 86.67%

Table 12. The comparison of run time based on Dataset 1.
Network | Run time (s)
G-BAPSO-SCN | 1.8874
SCN | 1.6420
RVFL | 0.0063
ELM | 0.0153