1. Introduction
Randomized algorithms are characterized by fast learning and show great potential in machine learning [1,2,3]. These algorithms generally select the input parameters randomly and then compute the output parameters by the least squares method. Random weight networks fall into many categories: in addition to the typical feedforward neural network, there are recurrent neural networks with random weights and randomized kernel approximations [4].
At the beginning, most randomized weight networks could not determine the appropriate number of hidden nodes. To solve this problem, the incremental learning algorithm was proposed: the network gradually adds hidden nodes until the tolerance is reached. This method optimizes the network structure and avoids wasting hidden nodes. Based on this idea, Kwok et al. proposed an SLFNN with approximation properties [5] and adopted a modified Quickprop algorithm to update the network weights. The Random Vector Functional Link (RVFL) network also uses the incremental strategy to build its structure and converges in probability when the input-side parameters of the network are appropriately selected [6,7]; otherwise, convergence cannot be guaranteed [8]. In 2009, Tyukin et al. argued that the RVFL network needed a supervision mechanism and verified through experiments that the RVFL network could fail to approximate a given objective function [9]; this phenomenon was subsequently proved mathematically [10]. The proposal of SCN solved the problem that such networks lacked the universal approximation property [8]. SCN uses an inequality constraint mechanism to assign the input parameters randomly and adaptively expands the selection range of the random parameters, so as to guarantee the approximation capability of the constructed stochastic learning network. It is an incremental generative network: the input parameters are stochastically configured under the inequality constraint, and the output parameters are determined by a constructive method or by solving a least squares problem. A deep SCN model was subsequently proposed, which retains the universal approximation property under the inequality supervision mechanism [11]. At present, SCN has been successfully applied to underground airflow quantity modeling, ball mill load condition recognition, sewage quality index estimation, and nonlinear system intelligent modeling [12,13,14,15].
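To make the supervision mechanism concrete, the following is a minimal single-output sketch of the incremental SCN construction under the inequality constraint; the activation function, candidate counts, and interval schedule here are illustrative assumptions, not the exact settings of any cited implementation:

```python
import numpy as np

def scn_fit(X, y, L_max=50, tol=1e-2, r=0.99, T_max=30, lambdas=(1, 5, 10, 30)):
    """Sketch of incremental SCN construction (single output).

    A candidate node g = tanh(Xw + b) is accepted only if it satisfies
    xi = <e, g>^2 / <g, g> - (1 - r - mu_L) <e, e> > 0,
    where e is the current residual; this is the inequality constraint
    that underpins SCN's universal approximation property.
    """
    N, d = X.shape
    e = y.astype(float).copy()          # residual e_{L-1}; starts as the target
    H, beta = [], None
    for L in range(1, L_max + 1):
        if np.linalg.norm(e) < tol:     # tolerance reached: stop adding nodes
            break
        mu = (1 - r) / (L + 1)          # relaxing sequence, mu_L -> 0
        best_xi, best_g = -np.inf, None
        for lam in lambdas:             # adaptively widen the sampling range
            for _ in range(T_max):      # random candidates in [-lam, lam]
                w = np.random.uniform(-lam, lam, d)
                b = np.random.uniform(-lam, lam)
                g = np.tanh(X @ w + b)
                xi = (e @ g) ** 2 / (g @ g) - (1 - r - mu) * (e @ e)
                if xi > best_xi:
                    best_xi, best_g = xi, g
            if best_xi > 0:             # constraint satisfied: stop widening
                break
        if best_xi <= 0:                # no admissible node in any interval
            break
        H.append(best_g)
        Hm = np.column_stack(H)
        beta, *_ = np.linalg.lstsq(Hm, y, rcond=None)  # global least squares
        e = y - Hm @ beta               # update the residual
    return beta
```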
The universal approximation property of SCN is an advantage that other randomized learning techniques do not have, as the adaptive search algorithm for the weights and biases tries to keep the parameters in as small a range as possible. However, the output weights are usually large, so the generalization performance of the network is affected. Overall, the optimization of SCN focuses on the following aspects. The first is improving the generalization performance: for instance, L1 and L2 norm regularization methods based on SCN have been proposed, respectively, to avoid the risk of model overfitting [13,15], and the partial least squares method has been used to calculate the output parameters in place of ordinary least squares [14]. In addition, the well-known negative correlation learning method can evaluate the output-side parameters under the SCN framework. The block Jacobi and Gauss–Seidel methods have been used to iteratively solve the ill-posed equations for the output weights based on heterogeneous feature groups, with a convergence analysis and a proof of the uniqueness of the iterative solutions [16]. The second aspect is optimizing the generation mode of the network: some scholars have proposed changing the incremental mode of the hidden nodes during SCN construction from single incremental to block incremental [17,18]. The third aspect is optimizing the hyperparameters of the network. Regarding the input weights, some scholars note that searching weights under the inequality supervision mechanism leads to a certain rejection rate, and therefore the inequality constraints should be relaxed so as to improve the acceptance rate of random candidates and reduce the number of iterations in the search for input weights [19]; however, relaxing the inequality constraints reduces the accuracy of the network, which has a certain cost. Other scholars focus on the regularization parameters and the scale factors of the weights and biases that affect network performance, optimizing them with a chaotic sparrow search algorithm to achieve better performance [20]. Meanwhile, various feedback learning mechanisms have also been applied to optimize SCN [21,22].
There are few studies on how to optimize the input-side parameters directly. SCN searches the input-side parameters randomly until they satisfy the inequality constraints: the search interval is expanded gradually and linearly, a large number of weights and biases are randomly generated in the interval and checked against the inequality constraints, and the best candidate is selected. This procedure limits the efficiency of SCN, so a more efficient optimization algorithm needs to be designed. The gradient descent method is often used to solve optimization problems in practical engineering, that is, to seek a minimum along the direction of gradient descent or a maximum along the direction of gradient ascent. However, for non-convex optimization problems this method easily falls into local minima, so it needs to be combined with an intelligent optimization algorithm to achieve better performance [23,24]. The bat algorithm [25,26] and the Particle Swarm Optimization (PSO) algorithm [27] are population-based stochastic optimization techniques that seek the optimal solution by imitating group behaviors in the biological world. The bat algorithm simulates the predation behavior of bats and has been successfully applied to microgrid scheduling [28], harmful plant classification [29], text categorization [30], and the traveling salesman problem [31]. These algorithms can search many regions of the solution space of the objective function simultaneously, are flexible and easy to implement, and require few parameters to be tuned. In addition, because of the randomness of intelligent optimization algorithms, gradient information can be used to determine the search direction. Therefore, using a hybrid optimization algorithm to search the input parameters of SCN can optimize the network structure and improve the convergence speed, as sketched below.
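As an illustration of how gradient information can be blended into such a population-based search, the following sketch augments a standard PSO velocity rule with a bat-style random frequency and a gradient term; the weighting scheme and all constants are assumptions made for illustration, not the exact G-BAPSO update derived later in this paper:

```python
import numpy as np

def hybrid_step(x, v, pbest, gbest, grad, f_min=0.0, f_max=2.0,
                w=0.7, c1=1.5, c2=1.5, eta=0.1):
    """One illustrative velocity/position update for a single particle,
    blending PSO attraction, a bat-style random pulse frequency, and a
    gradient-descent term on the objective."""
    r1, r2 = np.random.rand(2)
    freq = f_min + (f_max - f_min) * np.random.rand()  # bat pulse frequency
    v = (w * v
         + c1 * r1 * (pbest - x)         # cognitive pull toward personal best
         + c2 * r2 * freq * (gbest - x)  # social pull, scaled by frequency
         - eta * grad)                   # gradient of the objective at x
    return x + v, v
```

The gradient term biases the stochastic moves toward descent directions, while the random frequency and the two attraction terms preserve the population's exploratory behavior.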
This paper focuses on improving the search algorithm for the input weights and biases. In order to accelerate the convergence speed, an optimization model is established based on the inequality constraint mechanism by analyzing the convergence principle of SCN. Then, an improved optimization algorithm (G-BAPSO) is proposed, which exploits the gradient information of the objective function and searches for the optimal weights and biases using a hybrid of PSO and the bat algorithm. Finally, an optimized stochastic configuration network based on the G-BAPSO algorithm is established.
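Under the inequality constraint mechanism, a natural form for such an optimization model is to maximize the constraint margin of each new node; the following formulation is a hedged reconstruction consistent with the SCN constraint, not necessarily the exact objective used later in this paper:

```latex
\max_{w,\,b}\ \xi_L(w,b) \;=\;
\frac{\langle e_{L-1},\, g_L(w,b)\rangle^{2}}{\langle g_L(w,b),\, g_L(w,b)\rangle}
\;-\;(1-r-\mu_L)\,\langle e_{L-1},\, e_{L-1}\rangle,
\qquad \text{s.t.}\ \xi_L(w,b) > 0,
```

where e_{L-1} is the current residual, g_L the candidate node output, r the constraint parameter, and mu_L a relaxing sequence tending to zero.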
The structure of this paper is as follows: the theory of SCN is introduced in Section 2; G-BAPSO-SCN, including the optimization model and algorithm, is proposed in Section 3; in Section 4, the G-BAPSO algorithm is verified on some benchmark functions, and the performance of G-BAPSO-SCN is compared with that of other models on several regression and classification problems; the experimental results are analyzed in Section 5; and Section 6 gives the conclusion of this paper.
5. Discussion
Table 8, Table 9, Table 10 and Table 11 show that G-BAPSO-SCN converges faster than the other models when the number of hidden nodes is the same, while the RVFL network and ELM perform relatively poorly. In regression experiment 1, the training RMSE of G-BAPSO-SCN improved compared with that of SCN; in regression experiment 2, the training RMSE improved by 0.0032. This indicates that the degree of improvement in training RMSE depends not only on the algorithm itself but also on the distribution characteristics of the data. In the classification problems, G-BAPSO-SCN also has an advantage in recognition accuracy, with an average improvement of about 0.07% over SCN. The comparison of the convergence curves in Figure 7, Figure 8, Figure 9 and Figure 10 likewise shows that G-BAPSO-SCN converges faster than SCN. Since RVFL and ELM are not incremental networks and their hidden nodes are generated all at once, their convergence speed is not compared with that of the former two models.
G-BAPSO-SCN performs well because it introduces a more flexible mechanism into the search for the input parameters, allowing it to explore the solution space more thoroughly. The algorithm combines the advantages of intelligent and traditional optimization methods, and the introduction of the gradient makes it possible to approach the optimal solution quickly. By introducing the pulse emission frequency mechanism of bats into the PSO framework, local search can be carried out according to individual fitness and emission frequency in the initial stage, and poor individuals can be eliminated adaptively, which accelerates the convergence rate. SCN uses the linear-expansion search method to select the input parameters, which cannot detect an optimal solution lying far away in the initial stage; since the initial configuration is not optimized, the number of search iterations and of hidden nodes increases to a certain extent. RVFL is a typical incremental network, so we also compare it with G-BAPSO-SCN; however, RVFL cannot guarantee convergence, and its accuracy is slightly inferior. ELM performs relatively poorly with the same number of hidden nodes as G-BAPSO-SCN. In addition to optimizing the input weights and biases, CSSA-SCN [20] optimizes their search intervals and the regularization factor r of the model, which is hyperparameter optimization; its objective function and optimized parameters are different from those in this paper. Most other methods for optimizing SCNs focus on the output parameters or on the generation of the model structure.
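As an illustration of the two mechanisms described above, the following sketch shows how a bat-style pulse emission rate can gate local search around the global best while the worst individuals are replaced adaptively; the gate, noise scale, and culling rule are assumptions for illustration rather than the exact G-BAPSO implementation:

```python
import numpy as np

def local_search_and_cull(pop, fitness, gbest, pulse_rate=0.5,
                          sigma=0.01, cull_frac=0.2, bounds=(-1.0, 1.0)):
    """Pulse-gated local search plus adaptive elimination (illustrative).

    Assumes lower fitness is better, as in a minimization problem.
    """
    n, d = pop.shape
    for i in range(n):
        if np.random.rand() > pulse_rate:                # pulse emission gate
            pop[i] = gbest + sigma * np.random.randn(d)  # walk near the best
    k = max(1, int(cull_frac * n))                       # poor individuals to cull
    worst = np.argsort(fitness)[-k:]                     # highest (worst) fitness
    pop[worst] = np.random.uniform(*bounds, size=(k, d)) # re-initialize them
    return pop
```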
In conclusion, G-BAPSO-SCN, first proposed in this paper, has a faster convergence rate and a better-optimized network structure, which can improve the training efficiency of the model and yield better performance in regression and prediction applications.
6. Conclusions
The proposed G-BAPSO algorithm provides an alternative means of improving the convergence speed of SCN. G-BAPSO focuses on the search for the input weights and biases, considering the gradient information, the global optimal solution, and the individual optimal solutions. In the process of searching for the optimal solution, the algorithm adaptively adjusts the local solutions according to the individual emission frequencies and fitness in the initial stage; the convergence speed is thus improved by combining a traditional optimization algorithm with an intelligent one. At the same time, the algorithm strictly follows the inequality constraint mechanism of SCN, and each optimal solution is generated under the inequality supervision mechanism, which guarantees the convergence of the SCN. Numerical experiments also illustrate that SCN based on the G-BAPSO algorithm can be used to solve regression and classification problems: compared with SCN, the training RMSE of G-BAPSO-SCN improved in the two regression experiments, and the recognition accuracy increased by 0.07% on average. In industrial applications, the G-BAPSO-SCN proposed in this paper is suitable for nonlinear approximation, regression prediction, and pattern recognition based on industrial big data.