Article

A Hybrid Sparrow Search Algorithm of the Hyperparameter Optimization in Deep Learning

1 School of Mechanical Engineering, Yanshan University, Qinhuangdao 066004, China
2 School of Electrical Engineering, Yanshan University, Qinhuangdao 066004, China
3 Department of Mechanical Engineering, University of Manitoba, Winnipeg, MB R3T 5V6, Canada
* Author to whom correspondence should be addressed.
Mathematics 2022, 10(16), 3019; https://doi.org/10.3390/math10163019
Submission received: 9 July 2022 / Revised: 11 August 2022 / Accepted: 12 August 2022 / Published: 22 August 2022
(This article belongs to the Section Engineering Mathematics)

Abstract

Deep learning has been widely used in different fields such as computer vision and speech processing. The performance of deep learning algorithms is greatly affected by their hyperparameters. For complex machine learning models such as deep neural networks, it is difficult to determine their hyperparameters. In addition, existing hyperparameter optimization algorithms easily converge to a local optimal solution. This paper proposes a method for hyperparameter optimization that combines the Sparrow Search Algorithm and Particle Swarm Optimization, called the Hybrid Sparrow Search Algorithm. This method combines the ability of the Sparrow Search Algorithm to avoid local optimal solutions with the search efficiency of Particle Swarm Optimization to achieve global optimization. Experiments verified the proposed algorithm in simple and complex networks. The results show that the Hybrid Sparrow Search Algorithm has strong global search capability to avoid local optimal solutions and satisfactory search efficiency in both low- and high-dimensional spaces. The proposed method provides a new solution for hyperparameter optimization problems in deep learning models.

1. Introduction

The abilities of data collection, storage, and processing in scientific research and engineering applications have been greatly improved with the rapid development of technologies [1]. Deep learning has been introduced in different fields of science and engineering, especially in data processing and analysis [2]. Based on artificial neural networks, deep learning imitates the mechanism of the human brain to process data information such as images, sounds, and texts. Deep learning has shown its potential in having great learning ability, wide coverage, high adaptability, and excellent portability in data processing and analysis [3].
Deep learning uses a network structure with multiple layers. Each layer processes the received signal and passes it to the next layer. In a deep neural network, many layers can be used between the input and output layers. These layers can perform linear and non-linear data transformations [4].
Hyperparameters are parameters that need to be determined before forming a neural network, such as the batch size, learning rate, and dropout rate. The performance of a deep learning model will be greatly impacted by the hyperparameter configuration.
In general, building an effective deep learning model is a complex and time-consuming process that involves determining an appropriate structure of a deep learning model and its hyperparameters. The performance of a deep learning algorithm is greatly affected by its hyperparameters. Therefore, determining hyperparameters is an important task in applications of deep learning. The purpose of this paper is to propose a hyperparameter optimization algorithm with superior performance.
Different methods have been applied in setting hyperparameters for deep learning, such as the manual search, grid search, and random search. However, these methods have the problems of poor performance in high-dimensional models, inefficiency, and low accuracy. Therefore, an effective method is required for the optimization of hyperparameters for deep learning algorithms.
Hyperparameter optimization searches for a set of hyperparameters that fits the needs of a deep learning model, and the search is formulated as a mathematical optimization problem [5]. Hyperparameter optimization is normally treated as a “black box” search process that determines the hyperparameter configuration of a deep learning model [6].
This paper proposes a method for hyperparameter optimization that combines the Sparrow Search Algorithm and Particle Swarm Optimization, called the Hybrid Sparrow Search Algorithm. This method combines the ability of the Sparrow Search Algorithm to avoid local optimal solutions with the search efficiency of Particle Swarm Optimization to achieve global optimization.
The contributions of this research are as follows: (1) development of a heuristic algorithm with a strong global search ability; (2) improvement of the hyperparameter optimization; (3) application of the proposed algorithm in neural networks.
The remainder of this paper is organized as follows. Section 2 introduces the existing research. Section 3 provides details of the proposed method. Experiments to verify the proposed method are discussed in Section 4. Section 5 presents a discussion and future work, and the conclusion of this paper is presented in Section 6.

2. Related Research

2.1. Hyperparameter Optimization

Different methods have been proposed for the hyperparameter optimization including manual search, grid search, random search, and Bayesian optimization.
Manual search [7] can determine hyperparameters for simple models. Before the emergence of the big data era, neural network models used to process data were generally not complicated. Hyperparameters of a model could be manually decided by field experts. However, with big data applications, there are increased hyperparameters in a neural network model, for which manual search is not able to meet the demand [8]. Automatic hyperparameter design has become a new research field.
Larochelle [9] proposed a grid search algorithm to balance the system performance and computation efficiency in determining hyperparameters. Grid search is an exhaustive method that takes a compromise between computational overhead and performance. For a large dataset, it has an “exponential explosion” problem [10] with reduced search efficiency. Bergstra [11] proposed a random search method that is simple and easy to use. However, this method is largely blind and has low adaptability. It is not a high-performance hyperparameter search method.
For sequential models [12,13], Bayesian optimization [14,15,16,17] is one of the most classic methods of hyperparameter optimization. Compared with manual and random search methods, Bayesian optimization can fully use the information of previous searches in solving some complex problems [18]. However, this method can easily fall into a local optimal solution because it samples only around the current optimal point.
Heuristic algorithms are based on experience that provides a feasible solution to the problem at an acceptable cost (computing time and space) [19]. They have been widely used in hyperparameter optimization. Sun [20] used a Simulated Annealing (SA) algorithm with a fast convergence rate for clustering problems of neural networks. Zhang [21] and Francescomarino [22] used Genetic Algorithms (GAs) to optimize network models based on ideas of heredity, crossover, and mutation in practical problems. As an improved evolutionary algorithm, Covariance Matrix Adaptation Evolutionary Strategies (CMA-ESs) [23] have been used to optimize neural networks. Lorenzo [24] and Djenouri [25] used Particle Swarm Optimization (PSO) with a fast convergence rate to optimize a deep learning network. However, these heuristic algorithms have a common problem, namely, that they easily fall into a local optimal solution when the objective function is non-convex. They are not effective in applications.
There are some recently proposed hyperparameter optimization methods. Ozturk [26] used a stochastic gradient descent to optimize the echo state network. Hu [27] used the Chimp Optimization Algorithm to increase the reliability and real-time capability of the network for classifying chest X-ray images. Kalita [28] developed Moth flame optimization and knowledge-based-search to optimize the hyperparameters of Support Vector Machine (SVM) [29]. Wu [30] used the sine–cosine algorithm to tune parameters of the Extreme Learning Machine (ELM) for diagnosing COVID-19 positive cases. Wang [31] used Whale Optimization for a real-time COVID-19 detector with parallel implementation capability. Several other algorithms have also been used to diagnose COVID-19 in recent years [32,33,34]. However, these methods only process some specific models.
In summary, the existing hyperparameter optimization algorithms generally have the problems of poor global search ability and propensity to fall into local optimal solutions [35,36].

2.2. Particle Swarm Optimization (PSO) and Sparrow Search Algorithm (SSA)

PSO [37] is an evolutionary algorithm inspired by the regularity of bird swarm activities. Based on the bird swarm activity behavior, PSO shares information about individuals in the entire swarm, for the evolution process from disorder to order in a problem space, to obtain the optimal solution.
PSO is an optimization algorithm based on iteration. The system is initialized as a set of random solutions to iteratively search for the optimal value. The algorithm searches through particles following the optimal particle in the solution space. Although this algorithm has a high search efficiency, it cannot avoid local optimal solutions when all individuals are concentrated near a local optimal solution.
The Sparrow Search Algorithm (SSA) [38] uses heuristic search to simulate the foraging process of sparrows as a kind of discoverer–follower model with a scouting and early warning mechanism. SSA has the ability to avoid the local optimal solution. However, it converges slowly and cannot obtain the optimal solution within an acceptable time [38]. In addition, our research found that this algorithm has data overflow problems caused by excessive exponents when the solution value is large. As a result, a large number of solutions are concentrated on the upper bound of the feasible region, which reduces the diversity of solutions and affects the algorithm performance. In addition, SSA has not been used to optimize hyperparameters of the neural network model.
In summary, the existing methods generally have poor global search ability. Although SSA has the ability to avoid local optimal solutions, it cannot be used in hyperparameter optimization problems due to the problem mentioned above. As a swarm intelligence algorithm, PSO is efficient for complex problems such as hyperparameter optimization. However, it also easily falls into a local optimal solution. This paper proposes a Hybrid Sparrow Search Algorithm (HSSA) having the advantages of both SSA and PSO for strong global search ability in complex problems to obtain the optimal solution within an acceptable time for hyperparameter optimization.

3. Proposed Approach

3.1. Hybrid Sparrow Search Algorithm (HSSA)

To address the problem that existing hyperparameter optimization methods easily fall into local optimal solutions, we combine SSA with PSO into HSSA by incorporating the velocity and displacement formulas of PSO into the framework of SSA. The details are as follows.
(1)
Algorithmic Modeling
Among sparrows, individuals with a high fitness value act as discoverers, and the other individuals act as followers. At the same time, a certain proportion of individuals in the population is selected for detection and early warning. If any danger is found, they search for alternative areas. The population with n sparrows is as follows:
$$X = \begin{bmatrix} x_1^1 & x_1^2 & \cdots & x_1^m \\ x_2^1 & x_2^2 & \cdots & x_2^m \\ \vdots & \vdots & \ddots & \vdots \\ x_n^1 & x_n^2 & \cdots & x_n^m \end{bmatrix} \tag{1}$$
where m is the dimension of the variables to be optimized, determined by the problem, and n is the number of individuals, i.e., the size of the population. In general, a large n leads to high population diversity and high optimization accuracy, but the iteration speed is slow. The fitness values of all sparrows can be expressed as follows:
$$F = \begin{bmatrix} f(x_1^1, x_1^2, \ldots, x_1^m) \\ f(x_2^1, x_2^2, \ldots, x_2^m) \\ \vdots \\ f(x_n^1, x_n^2, \ldots, x_n^m) \end{bmatrix} \tag{2}$$
where f is the fitness value. F contains the fitness of all individuals in the entire population.
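For concreteness, the population matrix X of Equation (1) and the fitness vector F of Equation (2) could be represented as NumPy arrays, as in the minimal sketch below. The bounds and the sphere-function objective are placeholders (our own assumptions); in HSSA the fitness comes from training a network, as defined in Section 3.2.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_population(n, m, lower, upper):
    """Randomly initialize n individuals (rows) in an m-dimensional search space."""
    lower, upper = np.asarray(lower, dtype=float), np.asarray(upper, dtype=float)
    return lower + rng.random((n, m)) * (upper - lower)

def evaluate(X, fitness_fn):
    """Evaluate the fitness of every row of the population matrix X (Eq. (2))."""
    return np.array([fitness_fn(x) for x in X])

# Placeholder objective (sphere function) standing in for network training.
X = init_population(n=10, m=6, lower=[0.0] * 6, upper=[1.0] * 6)
F = evaluate(X, fitness_fn=lambda x: float(np.sum(x ** 2)))
```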
(2)
Basic Rules
Discoverers usually have high energy reserves and are responsible for searching for food-rich areas. They provide foraging areas and directions for all followers. The level of energy reserve depends on the fitness value of every individual.
As soon as predators are detected, sparrows begin to chirp to send alarm signals. If the alarm value is greater than the safe value, discoverers will take followers to other safe areas.
Discoverers and followers change dynamically. As long as a better source of food can be found, a sparrow can become a discoverer, but the proportion of discoverers and followers in the entire population remains unchanged. In other words, whenever a sparrow becomes a discoverer, another sparrow becomes a follower.
Followers with less food have poor foraging positions in the entire population. Hungry followers are more likely to fly to other places to get food.
During the foraging process, followers can always search for the discoverer who provides the best food, or forage around the discoverer. At the same time, in order to increase their food reserves, some followers may constantly monitor discoverers for food resources.
Once aware of danger, individuals at the edge of the population will quickly move to a safe area to obtain better positions. Individuals located in the middle of the population will randomly fly toward other sparrows.
(3)
Discoverers
Discoverers account for 10–20% of the entire population. Location updates of the discoverers are given as follows:
$$X_{i,j}^{s+1} = \begin{cases} X_{i,j}^{s} \cdot \exp\!\left(\dfrac{-i}{\alpha \cdot s_{\max}}\right), & R < T \\ X_{i,j}^{s} + g \cdot L, & R \ge T \end{cases} \tag{3}$$
where i, j, and s denote the ith sparrow, the jth dimension, and the sth iteration, respectively. X represents location information. smax is the maximum number of iterations. α (α ∈ (0, 1]) is a random number. R (R ∈ [0, 1]) and T (T ∈ [0.5, 1]) represent the warning value and the safety value, respectively. g is a random number drawn from a normal distribution. L is a 1 × m matrix in which every element is 1. In general, a larger smax gives a better optimization result, but it takes more time. The likelihood of an individual being frightened depends on T.
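As an illustration, a minimal NumPy sketch of the discoverer update in Equation (3) is given below; the 1-based sparrow index and the per-individual draw of α are assumptions of the sketch, not details fixed by the text.

```python
import numpy as np

rng = np.random.default_rng(0)

def update_discoverers(X_d, s_max, R, T):
    """Discoverer position update following Equation (3).

    X_d is the (n_d, m) block of discoverer positions; R and T are the warning
    and safety values; alpha is redrawn per individual and g is drawn from
    N(0, 1), both of which are choices made for this sketch.
    """
    n_d, m = X_d.shape
    L = np.ones(m)
    X_new = X_d.copy()
    for i in range(n_d):
        alpha = rng.uniform(1e-6, 1.0)                       # alpha in (0, 1]
        if R < T:                                            # no predator detected: exploit
            X_new[i] = X_d[i] * np.exp(-(i + 1) / (alpha * s_max))
        else:                                                # alarm raised: move randomly
            X_new[i] = X_d[i] + rng.standard_normal() * L
    return X_new
```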
(4)
Followers
All sparrows except discoverers are followers. The original location update formula in SSA is as follows:
$$X_{i,j}^{s+1} = \begin{cases} g \cdot \exp\!\left(\dfrac{X_w^{s} - X_{i,j}^{s}}{i^2}\right), & i > n/2 \\ X_P^{s+1} + \left|X_{i,j} - X_P^{s+1}\right| \cdot A^{+} \cdot L, & i \le n/2 \end{cases} \tag{4}$$
where Xw is the worst position. Our research found that Equation (4) has data overflow problems caused by excessive exponents in the case i > n/2 when the solution value is large. For example, if $X_w^s = 5000$, $X_{i,j}^s = 1000$, and $i = 4$, then $X_{i,j}^{s+1} = g \cdot \exp(250)$, which clearly causes a data overflow.
In order to solve this problem and improve the global search ability, we combine Equation (4) with the velocity and displacement formula in PSO. The new location updates of followers are as follows:
$$V_{i,j}^{s+1} = \omega V_{i,j}^{s} + c_1 r_1 \left(B_{i,j}^{s} - X_{i,j}^{s}\right) + c_2 r_2 \left(B_{g,j}^{s} - X_{i,j}^{s}\right) \tag{5}$$
$$X_{i,j}^{s+1} = \begin{cases} X_{i,j}^{s} + V_{i,j}^{s+1}, & i > n/2 \\ X_P^{s+1} + \left|X_{i,j} - X_P^{s+1}\right| \cdot A^{+} \cdot L, & i \le n/2 \end{cases} \tag{6}$$
where V represents speed. ω (ω ∈ [0, 1]) is the inertia weight. c1 and c2 are learning factors that generally take values between 0 and 4. r1 and r2 are random numbers between 0 and 1. Bi is the historical optimal solution of the ith sparrow. Bg is the global optimal solution of the entire population. XP is the best position occupied by the discoverers. A is a 1 × m matrix in which every element is 1 or −1, and $A^{+} = A^{T}(AA^{T})^{-1}$. ω is non-negative; in general, when it is large, the global search ability is strong, and when it is small, the local search ability is strong. c1 and c2 are the individual learning factor and the social learning factor, respectively. Lynn’s experiment [39] showed that satisfactory solutions can be obtained when c1 and c2 are constants; usually, c1 = c2 = 2.
Equation (5) is the speed update formula. Equation (6) is formed by adding the speed update to the position update formula of the followers. This improvement not only solves the data overflow problem but also improves the search speed, enabling HSSA to find optimal hyperparameters within an acceptable time.
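The follower update of Equations (5) and (6) can be sketched as follows. Treating A as a fresh random ±1 vector on each call and assuming that followers occupy the later indices of the population are choices of this sketch rather than details stated in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def update_followers(X_f, V_f, B_ind, B_g, X_p, n, omega=0.6, c1=2.0, c2=2.0):
    """Follower update combining the PSO velocity rule (Eq. (5)) with Eq. (6).

    X_f, V_f : follower positions and velocities, shape (n_f, m)
    B_ind    : per-follower historical best positions, shape (n_f, m)
    B_g, X_p : global best position and best discoverer position, shape (m,)
    n        : total population size; the i > n/2 split uses the follower's
               global index, assumed here to come after the discoverers.
    """
    n_f, m = X_f.shape
    L = np.ones(m)
    A = rng.choice([-1.0, 1.0], size=(1, m))             # 1 x m matrix of +/- 1
    A_plus = A.T @ np.linalg.inv(A @ A.T)                # A+ = A^T (A A^T)^-1
    X_new, V_new = X_f.copy(), V_f.copy()
    for i in range(n_f):
        global_index = (n - n_f) + i + 1
        if global_index > n / 2:                         # PSO branch of Eq. (6)
            r1, r2 = rng.random(), rng.random()
            V_new[i] = omega * V_f[i] + c1 * r1 * (B_ind[i] - X_f[i]) + c2 * r2 * (B_g - X_f[i])
            X_new[i] = X_f[i] + V_new[i]
        else:                                            # original SSA branch of Eq. (6)
            X_new[i] = X_p + (np.abs(X_f[i] - X_p) @ A_plus) * L
    return X_new, V_new
```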
(5)
Vigilantes
Location updates of vigilantes are as follows:
$$X_{i,j}^{s+1} = \begin{cases} B_{g,j}^{s} + \beta \left|X_{i,j}^{s} - B_{g,j}^{s}\right|, & f_i > f_g \\ X_{i,j}^{s} + k \left(\dfrac{X_{i,j}^{s} - X_w^{s}}{(f_i - f_w) + \varepsilon}\right), & f_i \le f_g \end{cases} \tag{7}$$
where Xw is the worst position. β is a parameter controlling the step length; it is a random number that obeys the standard normal distribution. k is a random number between −1 and 1. fi is the fitness value of the current sparrow. fg and fw are the global best and global worst fitness values, respectively. ε is a small constant that avoids a zero denominator.
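A corresponding sketch of the vigilante update in Equation (7) is shown below; the function name and array layout are ours.

```python
import numpy as np

rng = np.random.default_rng(0)

def update_vigilantes(X_v, f_v, B_g, X_w, f_g, f_w, eps=1e-8):
    """Vigilante position update following Equation (7).

    X_v : positions of the randomly selected vigilantes, shape (n_v, m)
    f_v : their fitness values f_i
    B_g, X_w : global best and global worst positions
    f_g, f_w : global best and global worst fitness values
    """
    X_new = X_v.copy()
    for i in range(len(X_v)):
        beta = rng.standard_normal()        # step-length control, beta ~ N(0, 1)
        k = rng.uniform(-1.0, 1.0)          # random number in [-1, 1]
        if f_v[i] > f_g:                    # individual at the edge of the population
            X_new[i] = B_g + beta * np.abs(X_v[i] - B_g)
        else:                               # individual in the middle of the population
            X_new[i] = X_v[i] + k * (X_v[i] - X_w) / ((f_v[i] - f_w) + eps)
    return X_new
```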
(6)
Algorithm Framework
HSSA is composed of SSA and PSO. As shown in Figure 1, the left side of the flowchart is the SSA part, and the right side is the PSO part. The SSA part includes the position calculation of discoverers, followers, and vigilantes. The PSO part calculates the position of followers when i > n/2. The steps of HSSA are as follows.
Firstly, the initial population is randomly generated. Secondly, a fitness function is determined to evaluate the fitness of each individual and update the best solution. Thirdly, the population is divided into discoverers and followers based on fitness to update positions of discoverers and followers. If i > n/2, the PSO part is used to calculate the position of followers. Vigilantes are then randomly generated to update their position. Finally, the global best solution is calculated to determine whether the ending condition is met. The above steps are repeated until an individual that meets the ending condition is found. An individual who meets the ending condition is considered the best solution of HSSA. The pseudo-code is shown in Algorithm 1.
Algorithm 1. Procedure Hybrid Sparrow Search Algorithm
Input: individuals n, dimension m, iterations smax
Output: optimal value
Initialize the population n, the individual optimal values fi, and the global optimal value fg
for s in smax do
    Divide the population n into discoverers nd and followers nf
    for i in nd do
        for j in m do
            if R < T then
                X_{i,j}^{s+1} = X_{i,j}^{s} · exp(−i/(α·smax))
            else
                X_{i,j}^{s+1} = X_{i,j}^{s} + g·L
    for i in nf do
        for j in m do
            if i > n/2 then
                V_{i,j}^{s+1} = ω·V_{i,j}^{s} + c1·r1·(B_{i,j}^{s} − X_{i,j}^{s}) + c2·r2·(B_{g,j}^{s} − X_{i,j}^{s})
                X_{i,j}^{s+1} = X_{i,j}^{s} + V_{i,j}^{s+1}
            else
                X_{i,j}^{s+1} = X_{P}^{s+1} + |X_{i,j} − X_{P}^{s+1}|·A⁺·L
    Randomly generate vigilantes nv
    for i in nv do
        for j in m do
            if fi > fg then
                X_{i,j}^{s+1} = B_{g,j}^{s} + β·|X_{i,j}^{s} − B_{g,j}^{s}|
            else
                X_{i,j}^{s+1} = X_{i,j}^{s} + k·(X_{i,j}^{s} − X_{w}^{s})/((fi − fw) + ε)
    for i in n do
        Update fi
    Update fg
    if fg meets the requirement then
        exit for
Return the optimal value

3.2. Fitness Function

In order to evaluate the algorithm objectively, we use the average accuracy on a validation set as the training accuracy of the learning algorithm [40]. To facilitate the processing and comparison of results, the error is defined as the fitness function value [41]. Equation (8) is the fitness function used for the evaluation of hyperparameters; a smaller fitness function value indicates better hyperparameters.
$$\mathrm{Fitness} = 1 - \frac{\sum_{i=1}^{n} \mathrm{accuracy}_i}{n} \tag{8}$$
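As a sketch, the fitness of Equation (8) can be wrapped around model training as below. The build_model callable, the number of repeated runs, and the default training settings are our assumptions; in the experiments the fixed settings of Table 3 or Table 6 would apply, and the model is assumed to be compiled with accuracy as a metric.

```python
import numpy as np

def fitness(build_model, train_data, val_data, hyperparams, runs=1, epochs=10, batch_size=64):
    """Fitness of a candidate: 1 minus the mean validation accuracy (Eq. (8))."""
    accuracies = []
    for _ in range(runs):
        model = build_model(hyperparams)                    # fresh Keras model per run
        model.fit(train_data[0], train_data[1], epochs=epochs,
                  batch_size=batch_size, verbose=0)
        _, acc = model.evaluate(val_data[0], val_data[1], verbose=0)
        accuracies.append(acc)
    return 1.0 - float(np.mean(accuracies))
```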

4. Experiments

Experiments were conducted to verify the performance of HSSA by comparing it with several algorithms that are currently recognized as being excellent. A computer with an Intel i7 CPU, NVIDIA RTX3070 GPU, and 32 GB memory was used. The neural network model was built using TensorFlow. Random Search, Bayesian Optimization, CMA-ES, SA, GA, PSO, SSA, and HSSA were all written in Python 3. The details are as follows.

4.1. Convolutional Neural Network

The convolutional neural network (CNN) is a well-known deep learning architecture inspired by the biological vision mechanism [42]. Convolutional neural networks rely on convolution and pooling to extract information [43] and have been widely used in many fields such as target detection and image classification [44]. LeNet-5 [45] is a classic convolutional neural network and is often used to test algorithms. We used LeNet-5 in subsequent experiments to verify the performance of our method in low-dimensional space. A LeNet-5 model is shown in Figure 2.
In Figure 2, C1 and C2 are convolutional layers, P1 and P2 are pooling layers, and F1 and F2 are fully connected layers. In order to verify HSSA in complex networks, we propose a more complex convolutional neural network based on AlexNet [46], as shown in Figure 3.
In Figure 3, C1–C5 are convolutional layers, P1–P3 are pooling layers, and F1–F4 are fully connected layers. The role of each layer in the neural network is as follows; a parameterized sketch of the LeNet-5 model follows the list.
  • The input layer determines the type and format of the input data.
  • The convolutional layer performs a convolution between the input matrix and a kernel. The kernel moves over the input matrix with a certain step length (stride), and the output matrix is obtained after the convolution operation.
  • The pooling layer is used for down-sampling. It continuously reduces the size of the data space, decreasing the number of parameters and calculations and helping to control over-fitting.
  • Fully connected layers connect to all nodes of the previous layer, thus integrating the extracted features and mapping the distributed features to the sample label space.
  • The output layer outputs the final result.
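To make the hyperparameter dependence concrete, the LeNet-5 model of Figure 2 can be written as a Keras builder parameterized by the Table 2 hyperparameters. This is a minimal sketch under our own assumptions (Adam optimizer, 6 and 16 convolution filters as in the classic LeNet-5, dropout applied after F1 and F2), not the authors' exact implementation; the fixed settings (5 × 5 kernels, 2 × 2 pooling, ReLU/Softmax) follow Table 3.

```python
import tensorflow as tf
from tensorflow.keras import layers, models, regularizers

def build_lenet5(f1_units, f2_units, l2_decay, dropout_rate, learning_rate):
    """LeNet-5-style network from Figure 2 with the tunable hyperparameters of Table 2."""
    reg = regularizers.l2(l2_decay)
    model = models.Sequential([
        layers.Input(shape=(28, 28, 1)),
        layers.Conv2D(6, 5, strides=1, activation="relu", padding="same",
                      kernel_regularizer=reg),                               # C1
        layers.MaxPooling2D(2, strides=2),                                   # P1
        layers.Conv2D(16, 5, strides=1, activation="relu",
                      kernel_regularizer=reg),                               # C2
        layers.MaxPooling2D(2, strides=2),                                   # P2
        layers.Flatten(),
        layers.Dense(f1_units, activation="relu", kernel_regularizer=reg),   # F1
        layers.Dropout(dropout_rate),
        layers.Dense(f2_units, activation="relu", kernel_regularizer=reg),   # F2
        layers.Dropout(dropout_rate),
        layers.Dense(10, activation="softmax"),                              # output layer
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate),
                  loss="sparse_categorical_crossentropy", metrics=["accuracy"])
    return model
```

The batch size, the sixth hyperparameter in Table 2, would be passed to model.fit rather than to the builder.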

4.2. Performance Verification

The performance of HSSA was verified using two experiments with the MNIST and Five Flowers datasets. The MNIST dataset is relatively simple, and a high classification accuracy can be achieved on it. The Five Flowers dataset is relatively complex, and it is difficult to achieve a high classification accuracy on it. To examine the search behavior in low-dimensional and high-dimensional spaces, two neural network models with different levels of complexity were used.
The experiment with the MNIST dataset uses the LeNet-5 neural network model, which is relatively simple, and the number of optimized hyperparameters is small; as an optimization process of HSSA in low-dimensional space, the experimental process is relatively simple. The experiment on the Five Flowers dataset uses a deep convolutional neural network with a complex structure and more parameters; as an optimization process of HSSA in high-dimensional space, the experimental process is relatively complex. The experimental results were used to compare HSSA with Random Search, Bayesian Optimization, CMA-ES, SA, and GA, which are all recognized as excellent methods in the field.
The settings of all algorithms are shown in Table 1.

4.2.1. Experiment on MNIST Dataset

MNIST [47,48] is a classic dataset used for simple classification problems in machine learning. The dataset consists of 70,000 handwritten digital grayscale images, which are divided into 10 categories including numbers 0–9. These 70,000 images are divided into training and verification sets. Some sample pictures in the MNIST dataset are shown in Figure 4.
LeCun evaluated hyperparameters in very high dimensions and found that performance changes were attributable to only a few hyperparameters [45]. We selected six important hyperparameters of the LeNet-5 convolutional neural network as the optimization objects: the number of F1 units, the number of F2 units, the L2 weight decay, the batch size, the learning rate, and the dropout rate. The range of each hyperparameter is listed in Table 2.
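For reference, these ranges can be encoded as simple per-hyperparameter bounds, as in the sketch below; the dictionary keys are our own illustrative names, and integer-valued parameters (unit counts, batch size) are rounded when a candidate position is decoded into a concrete configuration.

```python
# Illustrative encoding of the Table 2 search space as per-hyperparameter bounds.
SEARCH_SPACE = {
    "f1_units":      (128, 1024),
    "f2_units":      (128, 1024),
    "l2_decay":      (1e-4, 1e-2),
    "batch_size":    (16, 128),
    "learning_rate": (1e-4, 1e-2),
    "dropout_rate":  (0.1, 0.5),
}
```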
Table 3 shows the parameter settings of the LeNet-5 convolutional neural network that remain unchanged during the simple-network experiment.
A criterion for evaluating algorithm performance in the experiments is the validation error generated during training when the training time is fixed [49]. Since the running time of the experiment is mainly consumed by neural network training, the running time of the optimization algorithm itself is negligible. Let N be the number of neural network training runs, where N = iterations × individuals; the complexity of the optimization is then O(N). Moreover, different algorithms require different training times per iteration, so it is unreasonable to use only the number of iterations to represent the training time. Therefore, in order to ensure that the training time of each algorithm was the same, each algorithm was configured according to Table 4.
The error of each algorithm as a function of the number of iterations is shown in Figure 5.
The comparison shows that HSSA becomes more effective than the other five algorithms after 280 iterations and achieves the best result at 530 iterations. HSSA has a strong global search capability and finds better hyperparameters than the other algorithms in simple neural networks. The experiment shows that HSSA performs well in a simple neural network.

4.2.2. Experiment on Five Flowers Dataset

The Five Flowers dataset (https://www.kaggle.com/alxmamaev/flowers-recognition, accessed on 3 April 2021) is a classic dataset used for complex classification problems in the machine learning field. The dataset consists of 3670 RGB images in 5 categories: daisies, dandelions, roses, sunflowers, and tulips. These images are divided into a training set and a validation set. Some sample pictures in the Five Flowers dataset are shown in Figure 6.
In this experiment, 11 important hyperparameters of complex convolutional neural networks were selected as optimization objects, including the number of F1 units, number of F2 units, number of F3 units, number of F4 units, L2 weight decay, batch size, learning rate, F1 dropout rate, F2 dropout rate, F3 dropout rate, and F4 dropout rate. The range of each hyperparameter is shown in Table 5.
Table 6 shows the parameter settings of the complex convolutional neural network that remain unchanged during the Five Flowers experiment.
Settings of each algorithm in the experiment are shown in Table 7.
The error of each algorithm as a function of the number of iterations is shown in Figure 7. Figure 7 shows that the performance of SA in the complex convolutional neural network model is unexpectedly poor. To test whether this was caused by errors in the experimental operation, the same experiment was carried out twice under the same settings. We found that this behavior is not caused by experimental errors but by a defect of SA itself: SA tends to fall into a local optimal solution and continues to oscillate around it during the search process. This shows that SA is not appropriate in high-dimensional spaces. The errors of the models optimized by the other four methods are all around 0.3; however, HSSA is still the best. Compared with the other excellent algorithms, HSSA shows a better global search capability and does not easily fall into local optimal solutions. HSSA can find better hyperparameters than the other algorithms in complex neural networks. The experiment shows that HSSA performs well in a complex neural network.

4.3. Meaning Verification

The meaning verification experiment compares the performance of HSSA, SSA, and PSO in neural networks and analyzes the improvement. Although the results of the performance verification experiment show that HSSA is superior to several other excellent algorithms, they do not prove that the improvement over the original SSA and PSO is meaningful. In order to further verify the significance of the algorithm improvement, the meaning verification experiment was carried out.
The meaning verification experiment used the same neural network models and datasets. We selected the original SSA and PSO as the comparison objects of HSSA. The algorithm settings used in the experiment are shown in Table 8.
Results of the meaning verification experiment on the MNIST dataset are shown in Figure 8.
Figure 8 shows that the effect of HSSA surpasses that of the other two algorithms at 280 iterations, and the best effect is achieved at 530 iterations. Results of the meaning verification experiment on the Five Flowers dataset are shown in Figure 9.
Figure 9 shows that effects of SSA and PSO are almost the same. HSSA is more effective than the other two algorithms at 80 iterations and achieves the best effect at 470 iterations.
According to the experimental results on both the MNIST and Five Flowers datasets, HSSA is better than SSA and PSO. This shows that the optimization performance of the algorithm is greatly improved in both low-dimensional and high-dimensional spaces, and it can be concluded that the improvement is significant.

4.4. Result Analysis

4.4.1. Optimization Effect Analysis

In order to further compare the eight methods used in experiments, experimental results are analyzed in Table 9 and Table 10 for the mean error, the minimum error, and the number of iterations required to reach the minimum error. The minimum error represents the ability of the algorithm to search for the optimal solution. A small minimum error indicates strong global search capability. The mean error represents the overall effect of the algorithm in the iterative process. It does not change dramatically for several generations. The number of iterations represents the speed of convergence. A small number of iterations indicates fast convergence.
From Table 9, it is observed that, except for the random search, the mean error of each algorithm is concentrated around 0.01, indicating that the overall effect of each algorithm is not significantly different. The minimum error of HSSA is at least 10.42% lower than that of the other classic algorithms, indicating that HSSA has strong optimization performance. Compared with PSO and SSA, the minimum error of HSSA is reduced by 12.24% and 17.31%, respectively, indicating that our improvement is significant.
From Table 10, SA tends to fall into a local optimal solution in high-dimensional problems and is therefore not suitable for complex neural networks. The mean error of HSSA is at least 4.34% lower than that of the other algorithms, which shows that HSSA has the best overall optimization result. The minimum error of HSSA is at least 8.16% lower than that of the other classic algorithms, indicating that the optimization of HSSA is excellent. The minimum error of HSSA is 16.82% and 16.37% lower than that of PSO and SSA, respectively, indicating that our improvement is significant.
In summary, HSSA has an excellent performance in both low-dimensional and high-dimensional spaces. Its performance has been greatly improved.

4.4.2. Global Search Capability Analysis

Although the previous experiments demonstrate the excellent performance of HSSA, they do not directly demonstrate the global search behavior of the improved algorithm.
In order to verify the global search capability of the improved HSSA, further analysis was conducted, as shown in Table 11, Table 12, Table 13, Table 14, Table 15 and Table 16, which list the position of each individual in the population at the end of the algorithm run. Small differences between individuals in the population indicate that the algorithm easily falls into a local optimal solution. On the contrary, large differences between individuals indicate that the population retains more global information owing to its global search ability [39]. In these tables, Bs is the batch size, L2 is the L2 weight decay, F1–F4 are the numbers of F1–F4 units, Dr is the dropout rate, and Lr is the learning rate.
In order to visually compare the degree of dispersion of each group of data, their standard deviations were used as evaluation criteria. A large standard deviation indicates a high degree of dispersion [50]. Because these data are multi-dimensional, they can be regarded as discrete points in a high-dimensional space, and the difference between the data is the spatial distance between the points. For n points in the m-dimensional space, the standard deviation can be calculated using the following formulas:
$$\left\| x_a - x_b \right\| = \sqrt{\sum_{i=1}^{m} \left( x_a^i - x_b^i \right)^2} \tag{9}$$
$$\bar{x} = \left( \frac{1}{n}\sum_{i=1}^{n} x_i^1,\ \frac{1}{n}\sum_{i=1}^{n} x_i^2,\ \ldots,\ \frac{1}{n}\sum_{i=1}^{n} x_i^m \right) \tag{10}$$
$$S = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left\| x_i - \bar{x} \right\|^2} \tag{11}$$
In addition, due to the large differences in the value ranges of the parameters, using the original parameter values would cause different parameters to influence the standard deviation to different degrees. Therefore, each parameter was converted to its relative position within its value range [51]. The processing method is shown in Equation (12):
$$t' = \frac{t - t_{\min}}{t_{\max} - t_{\min}} \tag{12}$$
where t is the original value of the parameter and t′ is its relative position. The converted relative positions are not affected by the value range, so each parameter has the same influence on the standard deviation.
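The normalization of Equation (12) and the dispersion measure of Equations (10) and (11) can be computed together, for example as in the following sketch; the function name and array layout are ours.

```python
import numpy as np

def dispersion(positions, lower, upper):
    """Standard deviation of a final population (Eqs. (10)-(12)).

    positions    : (n, m) array of individual positions (e.g., rows of Tables 11-16)
    lower, upper : per-parameter bounds used to map each value to its relative
                   position t' = (t - t_min) / (t_max - t_min).
    """
    positions = np.asarray(positions, dtype=float)
    lower, upper = np.asarray(lower, dtype=float), np.asarray(upper, dtype=float)
    t = (positions - lower) / (upper - lower)         # Eq. (12): normalize each parameter
    centroid = t.mean(axis=0)                         # Eq. (10): component-wise mean point
    dists_sq = np.sum((t - centroid) ** 2, axis=1)    # squared distance to the mean point
    return float(np.sqrt(dists_sq.mean()))            # Eq. (11)
```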
Figure 10 shows the standard deviations of PSO, SSA, and HSSA at the end of their iterations in the experiments on the MNIST and Five Flowers datasets. It can be observed that, in both experiments, the standard deviation of PSO is very small, which means that PSO easily falls into a local optimal solution, as expected. In the MNIST and Five Flowers experiments, the standard deviation of HSSA is larger than that of PSO by 0.5574 and 0.5323, respectively. In the MNIST experiment, the standard deviation of HSSA is 0.1174 higher than that of SSA, which shows that the individuals of HSSA are even more scattered than those of SSA. In the Five Flowers experiment, the standard deviation of HSSA is 0.085 lower than that of SSA, which is acceptable. In summary, the global search ability of HSSA is verified by the experimental results.

4.5. Stability Analysis of HSSA

In the experiments, the initial values of the hyperparameters in the optimization algorithms were randomly generated, and different initial values may lead to different results. In order to verify the optimization ability of HSSA in neural networks under this randomness, we conducted five runs on MNIST and five runs on Five Flowers with identical settings. The mean and standard deviation of the results represent the effect of randomness on the HSSA performance.
Table 17 and Table 18 show the mean and standard deviation of the experimental results of HSSA for LeNet-5 and the complex network. Table 17 shows that the LeNet-5 model optimized by HSSA obtains a low error on the MNIST dataset. Both the average and minimum errors are very low, and their standard deviations are also very low. This shows that the results of the five runs are all very satisfactory and that the optimization performance of HSSA in a simple convolutional neural network model is very stable. Table 18 shows that the complex neural network optimized by HSSA has a gap between the mean and minimum errors on the Five Flowers dataset. The main reason for this phenomenon is that Five Flowers is a complex dataset, unlike MNIST, and a high classification accuracy cannot be achieved in the initial and middle stages of optimization; the final mean is therefore raised by the high errors of these stages. Even so, the standard deviations of the mean and minimum errors in the complex neural network are within an acceptable range. This shows that the optimization performance of HSSA in the complex convolutional neural network model is relatively stable.
Comparing Table 17 and Table 18 with Table 9 and Table 10, respectively, it can be seen that the minimum error of each experiment of HSSA is smaller than that of the other algorithms. This shows that the optimization performance of HSSA is not affected by randomness and verifies the stability of HSSA.

5. Discussion

Bayesian optimization, random search, and grid search are classic methods in neural network hyperparameter optimization. Although optimization methods based on heuristic algorithms and other methods proposed in recent years have also been used in the hyperparameter optimization of neural networks, they generally have poor global search capabilities and easily fall into local optimal solutions. Most of them are only suitable for solving a specific problem without universality. Therefore, the current heuristic algorithms are not effective for hyperparameter optimization. This paper improves heuristic algorithms for the hyperparameter optimization of neural networks. It provides a new research direction for solving problems of neural network hyperparameter optimization, which is to study new heuristic algorithms and apply them in the hyperparameter optimization.
The research in this paper validates the hybrid heuristic algorithm. The combination of different algorithms results in a new method that can benefit from the advantages of each algorithm. As a hybrid heuristic algorithm, HSSA embodies advantages of both PSO and SSA.
However, HSSA only begins to outperform the other methods after 280 iterations in the simple network. Although the optimal solution of HSSA is better than that of the other methods, it requires a sufficient number of iterations. In addition, HSSA contains some parameters that need to be set manually. If these parameters are not set properly, the algorithm may not be as effective as some traditional methods. This aspect also needs to be studied through a large number of experiments in future work.
HSSA is essentially an optimization method, and it can also be used to solve other optimization problems. Moreover, as a novel heuristic algorithm, HSSA is a swarm intelligence algorithm with potential parallelism that has not been fully exploited. Because SSA is still relatively new, its full potential has not yet been explored. In future work, we will further improve SSA to take full advantage of its global search capability.

6. Conclusions

In order to improve neural network hyperparameter optimization for global search ability, we propose a new HSSA algorithm to avoid local optimal solutions. This algorithm fixes the data overflow defect of the original SSA and combines the advantages of the strong ability of SSA to find a global optimization solution with the high search speed of PSO. It performs well in both simple and complex networks.
The performance verification experiments on simple and complex networks prove that HSSA is an excellent hyperparameter optimization method. The minimum error of HSSA in simple networks is about 10% lower than that of the other classic algorithms. It is about 8% lower than that of the other classic algorithms in complex networks. Experimental results show that HSSA has excellent optimization performance to find better solutions than other algorithms.
Results of the meaning verification experiment on simple and complex networks prove the significance of HSSA’s improvement. By combining SSA and PSO, the data overflow problem of SSA is solved and the search speed is improved, so the performance of the algorithm in neural networks is improved. In a simple network, the minimum error of HSSA is about 12% and 17% lower than that of PSO and SSA, respectively. In a complex network, the minimum error of HSSA is about 16% lower than that of PSO and SSA.
In addition, the stability analysis of HSSA proves that the optimization performance of HSSA is not affected by randomness, and shows that HSSA is stable and adaptable. In short, based on the research results of this article, HSSA is proven to be an excellent hyperparameter optimization algorithm.
However, this study has limitations. During the optimization process, the neural network may run hundreds or thousands of times. It takes hours to process a large dataset for some of the new network models proposed in recent years, which thus have a high time cost. Therefore, it is difficult to experiment with overly complex cases. In the future, the optimization algorithm will be applied to more complex situations with the continuous improvement in computer performance.
Although some achievements were made in this study, some future work needs to be undertaken. Firstly, in simple networks, the effect of HSSA is not significant when the number of iterations is small. Speeding up the early search speed will be a future research direction. Secondly, some parameters have an impact on the optimization performance of HSSA. Another future research direction will be to study setting appropriate parameters to improve the optimization performance. Thirdly, the combination of two algorithms provides a direction for new algorithms possessing the advantages of these two algorithms. Finally, limited by computer performance, current hyperparameter optimization algorithms are not suitable for particularly complex situations. Simplifying neural networks or improving computational efficiency will be a future research focus.

Author Contributions

Funding acquisition, B.G. and X.L.; Methodology, Y.F., Y.Z. and B.G.; Software, Y.Z.; Supervision, X.L., Q.P. and Z.J.; Writing—original draft, Y.F. and Y.Z.; Writing—review & editing, Y.Z., B.G., Q.P. and Z.J. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Natural Science Foundation of Hebei Province (F2020202103), the Natural Science Foundation of China (52175488), and the Scientific Research Youth Top Talent Project of Hebei Province (BJ2021045). The APC was funded by the Scientific Research Youth Top Talent Project of Hebei Province (BJ2021045).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Acknowledgments

The authors would like to thank the School of Mechanical Engineering of Yanshan University for providing experimental conditions. Funding from the Natural Science Foundation of China (52175488) and the Scientific research youth top talent project of Hebei Province (BJ2021045) is gratefully acknowledged. The authors would like to thank the editors and the reviewers for reviewing this paper.

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

  1. Gorshenin, A.; Kuzmin, V. Statistical Feature Construction for Forecasting Accuracy Increase and Its Applications in Neural Network Based Analysis. Mathematics 2022, 10, 589. [Google Scholar] [CrossRef]
  2. Yuan, X.; Shi, J.; Gu, L. A review of deep learning methods for semantic segmentation of remote sensing imagery. Expert Syst. Appl. 2021, 169, 114417. [Google Scholar] [CrossRef]
  3. Althubiti, S.A.; Escorcia-Gutierrez, J.; Gamarra, M.; Soto-Diaz, R.; Mansour, R.F.; Alenezi, F. Improved Metaheuristics with Machine Learning Enabled Medical Decision Support System. Comput. Mater. Contin. 2022, 73, 2423–2439. [Google Scholar] [CrossRef]
  4. Xiong, J.; Zuo, M. What does existing NeuroIS research focus on? Inf. Syst. 2020, 89, 101462. [Google Scholar] [CrossRef]
  5. Tantithamthavorn, C.; McIntosh, S.; Hassan, A.E.; Matsumoto, K. The Impact of Automated Parameter Optimization on Defect Prediction Models. IEEE Trans. Softw. Eng. 2019, 45, 683–711. [Google Scholar] [CrossRef] [Green Version]
  6. Li, W.; Ng, W.W.Y.; Wang, T.; Pelillo, M.; Kwong, S. HELP: An LSTM-based approach to hyperparameter exploration in neural network learning. Neurocomputing 2021, 442, 161–172. [Google Scholar] [CrossRef]
  7. van Rijn, J.N.; Hutter, F. Hyperparameter Importance Across Datasets. In Proceedings of the 24th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), London, UK, 19–23 August 2018; pp. 2367–2376. [Google Scholar] [CrossRef] [Green Version]
  8. Wang, Z.; Xuan, J. Intelligent fault recognition framework by using deep reinforcement learning with one dimension convolution and improved actor-critic algorithm. Adv. Eng. Inform. 2021, 49, 101315. [Google Scholar] [CrossRef]
  9. Larochelle, H.; Erhan, D.; Courville, A.; Bergstra, J.; Bengio, Y. An empirical evaluation of deep architectures on problems with many factors of variation. In Proceedings of the 24th International Conference on Machine Learning (ICML), Corvalis, OR, USA, 20–24 June 2007; pp. 473–480. [Google Scholar]
  10. Lerman, P.M. Fitting Segmented Regression Models by Grid Search. J. R. Stat. Soc. Ser. C Appl. Stat. 1980, 29, 77–84. [Google Scholar] [CrossRef]
  11. Bergstra, J.; Bengio, Y. Random Search for Hyper-Parameter Optimization. J. Mach. Learn. Res. 2012, 13, 281–305. [Google Scholar]
  12. Frank, H.; Holger, H.H.; Kevin, L.B. Sequential Model-Based Optimization for General Algorithm Configuration. In Proceedings of the 5th International Conference on Learning and Intelligent Optimization, Rome, Italy, 17 January 2011; pp. 507–523. [Google Scholar] [CrossRef] [Green Version]
  13. Talathi, S.S. Hyper-parameter optimization of deep convolutional networks for object recognition. In Proceedings of the 2015 IEEE International Conference on Image Processing (ICIP), Quebec City, QC, Canada, 27–30 September 2015; pp. 3982–3986. [Google Scholar]
  14. Cui, J.; Tan, Q.; Zhang, C.; Yang, B. A novel framework of graph Bayesian optimization and its applications to real-world network analysis. Expert Syst. Appl. 2021, 170, 114524. [Google Scholar] [CrossRef]
  15. Lee, M.; Bae, J.; Kim, S.B. Uncertainty-aware soft sensor using Bayesian recurrent neural networks. Adv. Eng. Inform. 2021, 50, 101434. [Google Scholar] [CrossRef]
  16. Kong, H.; Yan, J.; Wang, H.; Fan, L. Energy management strategy for electric vehicles based on deep Q-learning using Bayesian optimization. Neural Comput. Appl. 2019, 32, 14431–14445. [Google Scholar] [CrossRef]
  17. Jin, N.; Yang, F.; Mo, Y.; Zeng, Y.; Zhou, X.; Yan, K.; Ma, X. Highly accurate energy consumption forecasting model based on parallel LSTM neural networks. Adv. Eng. Inform. 2021, 51, 101442. [Google Scholar] [CrossRef]
  18. Chanona, E.A.d.R.; Petsagkourakis, P.; Bradford, E.; Graciano, J.E.A.; Chachuat, B. Real-time optimization meets Bayesian optimization and derivative-free optimization: A tale of modifier adaptation. Comput. Chem. Eng. 2021, 147, 107249. [Google Scholar] [CrossRef]
  19. Zhou, P.; El-Gohary, N. Semantic information alignment of BIMs to computer-interpretable regulations using ontologies and deep learning. Adv. Eng. Inform. 2021, 48, 101239. [Google Scholar] [CrossRef]
  20. Sun, L.-X.; Xie, Y.; Song, X.-H.; Wang, J.-H.; Yu, R.-Q. Cluster analysis by simulated annealing. Comput. Chem. 1994, 18, 103–108. [Google Scholar] [CrossRef]
  21. Zhang, Y.; Huang, G. Traffic flow prediction model based on deep belief network and genetic algorithm. IET Intell. Transp. Syst. 2018, 12, 533–541. [Google Scholar] [CrossRef]
  22. Di Francescomarino, C.; Dumas, M.; Federici, M.; Ghidini, C.; Maggi, F.M.; Rizzi, W.; Simonetto, L. Genetic algorithms for hyperparameter optimization in predictive business process monitoring. Inf. Syst. 2018, 74, 67–83. [Google Scholar] [CrossRef]
  23. Perera, R.; Guzzetti, D.; Agrawal, V. Optimized and autonomous machine learning framework for characterizing pores, particles, grains and grain boundaries in microstructural images. Comput. Mater. Sci. 2021, 196, 110524. [Google Scholar] [CrossRef]
  24. Lorenzo, P.R.; Nalepa, J.; Ramos, L.S.; Pastor, J.R. Hyper-parameter selection in deep neural networks using parallel particle swarm optimization. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO), Berlin, Germany, 15–19 July 2017; pp. 1864–1871. [Google Scholar] [CrossRef]
  25. Djenouri, Y.; Srivastava, G.; Lin, J.C.-W. Fast and Accurate Convolution Neural Network for Detecting Manufacturing Data. IEEE Trans. Ind. Inform. 2021, 17, 2947–2955. [Google Scholar] [CrossRef]
  26. Öztürk, M.M.; Cankaya, I.A.; Ipekçi, D. Optimizing echo state network through a novel fisher maximization based stochastic gradient descent. Neurocomputing 2020, 415, 215–224. [Google Scholar] [CrossRef]
  27. Hu, T.; Khishe, M.; Mohammadi, M.; Parvizi, G.-R.; Karim, S.H.T.; Rashid, T.A. Real-time COVID-19 diagnosis from X-Ray images using deep CNN and extreme learning machines stabilized by chimp optimization algorithm. Biomed. Signal Process. Control 2021, 68, 102764. [Google Scholar] [CrossRef]
  28. Kalita, D.J.; Singh, V.P.; Kumar, V. A dynamic framework for tuning SVM hyper parameters based on Moth-Flame Optimization and knowledge-based-search. Expert Syst. Appl. 2021, 168, 114139. [Google Scholar] [CrossRef]
  29. Cervantes, J.; Garcia-Lamont, F.; Rodríguez-Mazahua, L.; Lopez, A. A comprehensive survey on support vector machine classification: Applications, challenges and trends. Neurocomputing 2020, 408, 189–215. [Google Scholar] [CrossRef]
  30. Wu, C.; Khishe, M.; Mohammadi, M.; Karim, S.H.T.; Rashid, T.A. Evolving deep convolutional neutral network by hybrid sine–cosine and extreme learning machine for real-time COVID19 diagnosis from X-ray images. Soft Comput. 2021, 1–20. [Google Scholar] [CrossRef]
  31. Wang, X.; Gong, C.; Khishe, M.; Mohammadi, M.; Rashid, T.A. Pulmonary Diffuse Airspace Opacities Diagnosis from Chest X-Ray Images Using Deep Convolutional Neural Networks Fine-Tuned by Whale Optimizer. Wirel. Pers. Commun. 2022, 124, 1355–1374. [Google Scholar] [CrossRef]
  32. Yutong, G.; Khishe, M.; Mohammadi, M.; Rashidi, S.; Nateri, M.S. Evolving Deep Convolutional Neural Networks by Extreme Learning Machine and Fuzzy Slime Mould Optimizer for Real-Time Sonar Image Recognition. Int. J. Fuzzy Syst. 2021, 24, 1371–1389. [Google Scholar] [CrossRef]
  33. Khishe, M.; Caraffini, F.; Kuhn, S. Evolving Deep Learning Convolutional Neural Networks for Early COVID-19 Detection in Chest X-ray Images. Mathematics 2021, 9, 1002. [Google Scholar] [CrossRef]
  34. Chen, F.; Yang, C.; Khishe, M. Diagnose Parkinson’s disease and cleft lip and palate using deep convolutional neural networks evolved by IP-based chimp optimization algorithm. Biomed. Signal Process. Control 2022, 77, 103688. [Google Scholar] [CrossRef]
  35. Yang, X.-S.; Deb, S. Cuckoo search: Recent advances and applications. Neural Comput. Appl. 2014, 24, 169–174. [Google Scholar] [CrossRef] [Green Version]
  36. Ozcan, T.; Basturk, A. Transfer learning-based convolutional neural networks with heuristic optimization for hand gesture recognition. Neural Comput. Appl. 2019, 31, 8955–8970. [Google Scholar] [CrossRef]
  37. Freitas, D.; Lopes, L.G.; Morgado-Dias, F. Particle Swarm Optimisation: A Historical Review Up to the Current Developments. Entropy 2020, 22, 362. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  38. Xue, J.; Shen, B. A novel swarm intelligence optimization approach: Sparrow search algorithm. Syst. Sci. Control Eng. 2020, 8, 22–34. [Google Scholar] [CrossRef]
  39. Lynn, N.; Suganthan, P.N. Ensemble particle swarm optimizer. Appl. Soft Comput. 2017, 55, 533–548. [Google Scholar] [CrossRef]
  40. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. Commun. ACM 2017, 60, 84–90. [Google Scholar] [CrossRef]
  41. Gašperov, B.; Begušić, S.; Šimović, P.P.; Kostanjčar, Z. Reinforcement Learning Approaches to Optimal Market Making. Mathematics 2021, 9, 2689. [Google Scholar] [CrossRef]
  42. Wu, X.; Sahoo, D.; Hoi, S.C.H. Recent advances in deep learning for object detection. Neurocomputing 2020, 396, 39–64. [Google Scholar] [CrossRef] [Green Version]
  43. Trappey, C.V.; Trappey, A.J.C.; Lin, S.C.-C. Intelligent trademark similarity analysis of image, spelling, and phonetic features using machine learning methodologies. Adv. Eng. Inform. 2020, 45, 101120. [Google Scholar] [CrossRef]
  44. Escalante, H.J. Automated Machine Learning—A Brief Review at the End of the Early Years. In Automated Design of Machine Learning and Search Algorithms; Natural Computing Series; Pillay, N., Qu, R., Eds.; Springer: Cham, Switzerland, 2021; pp. 11–28. [Google Scholar] [CrossRef]
  45. Lecun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324. [Google Scholar] [CrossRef] [Green Version]
  46. Schneider, P.; Biehl, M.; Hammer, B. Hyperparameter learning in probabilistic prototype-based models. Neurocomputing 2010, 73, 1117–1124. [Google Scholar] [CrossRef] [Green Version]
  47. Baldominos, A.; Saez, Y.; Isasi, P. A Survey of Handwritten Character Recognition with MNIST and EMNIST. Appl. Sci. 2019, 9, 3169. [Google Scholar] [CrossRef] [Green Version]
  48. Kido, D.; Fukuda, T.; Yabuki, N. Assessing future landscapes using enhanced mixed reality with semantic segmentation by deep learning. Adv. Eng. Inform. 2021, 48, 101281. [Google Scholar] [CrossRef]
  49. Omri, M.; Abdel-Khalek, S.; Khalil, E.M.; Bouslimi, J.; Joshi, G.P. Modeling of Hyperparameter Tuned Deep Learning Model for Automated Image Captioning. Mathematics 2022, 10, 288. [Google Scholar] [CrossRef]
  50. Quiroz, J.; Baumgartner, R. Interval Estimations for Variance Components: A Review and Implementations. Stat. Biopharm. Res. 2019, 11, 162–174. [Google Scholar] [CrossRef]
  51. Zhang, J.; Kou, G.; Peng, Y.; Zhang, Y. Estimating priorities from relative deviations in pairwise comparison matrices. Inf. Sci. 2021, 552, 310–327. [Google Scholar] [CrossRef]
Figure 1. Steps of HSSA.
Figure 2. LeNet-5 neural network model.
Figure 3. Complex neural network model.
Figure 4. MNIST dataset example.
Figure 5. Result of the performance verification experiment on MNIST.
Figure 6. Five Flowers dataset example.
Figure 7. Result of the performance verification experiment on Five Flowers.
Figure 8. Results of the meaning verification experiment on MNIST.
Figure 9. Results of the meaning verification experiment on Five Flowers.
Figure 10. Standard deviations of PSO, SSA, and HSSA.
Table 1. Settings of all algorithms.

Method | Setup
Random search | Completely random
Bayesian Optimization | Tree Parzen Estimator; Gaussian process; EI function
CMA-ES | Initial step: σ(0) = 0.618 (ub − lb); Initial evolutionary path: pσ(0) = 0, pc(0) = 0; Initial covariance matrix: C = I
SA | Initial temperature: T0 = 100; Descent rate: α = 0.99
GA | Variation rate: Pm = 0.2; Roulette wheel selection
PSO | Inertia weight: ω = 0.6; Learning factors: c1 = 2, c2 = 2
SSA | Discoverer ratio: 20%; Detective ratio: 10%; Alert threshold: 0.8
HSSA | Discoverer ratio: 20%; Detective ratio: 10%; Alert threshold: 0.8; Inertia weight: ω = 0.6; Learning factors: c1 = 2, c2 = 2
Table 2. Range of hyperparameters to be optimized on MNIST.

Name | Range
Number of F1 units | 128–1024
Number of F2 units | 128–1024
L2 weight decay | 0.0001–0.01
Batch size | 16–128
Learning rate | 0.0001–0.01
Dropout rate | 0.1–0.5
Table 3. Parameter settings of the LeNet-5 convolutional neural network that remain unchanged.

Name | Value
Epochs | 10
Input | Shape: 28 × 28; Dimensions: 1
Convolution layer 1 | Size: 5 × 5; Strides: 1
Pooling layer 1 | Size: 2 × 2; Strides: 2
Convolution layer 2 | Size: 5 × 5; Strides: 1
Pooling layer 2 | Size: 2 × 2; Strides: 2
Activation function | Relu; Softmax
Table 4. Settings of each algorithm in the performance verification experiment on MNIST.
Method | Setup
Random search | 700 iterations
Bayesian Optimization | 700 iterations
CMA-ES | 700 iterations
SA | 700 iterations
GA | 50 initial individuals; 700 generations
HSSA | 10 individuals per generation; 70 generations
Table 5. The range of hyperparameters to be optimized on Five Flowers.
Name | Range
Number of F1 units | 128–1024
Number of F2 units | 128–1024
Number of F3 units | 128–1024
Number of F4 units | 128–1024
L2 weight decay | 0.0001–0.01
Batch size | 16–128
Learning rate | 0.0001–0.01
F1 dropout rate | 0.1–0.5
F2 dropout rate | 0.1–0.5
F3 dropout rate | 0.1–0.5
F4 dropout rate | 0.1–0.5
Table 6. Parameter settings of the complex convolutional neural network that remain unchanged.
Name | Value
Epochs | 20
Input | Shape: 32 × 32; Channels: 3
Convolution layer 1 | Size: 3 × 3; Strides: 2
Pooling layer 1 | Size: 2 × 2; Strides: 2
Convolution layer 2 | Size: 3 × 3; Strides: 2
Pooling layer 2 | Size: 2 × 2; Strides: 2
Convolution layer 3 | Size: 3 × 3; Strides: 1
Convolution layer 4 | Size: 3 × 3; Strides: 1
Convolution layer 5 | Size: 3 × 3; Strides: 1
Pooling layer 3 | Size: 2 × 2; Strides: 2
Activation functions | ReLU; Softmax
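The sketch below assembles a network consistent with Table 6 (a 32 × 32 × 3 input, five 3 × 3 convolutions, and three 2 × 2 poolings) followed by the four tunable dense layers F1–F4 of Table 5. The filter counts, 'same' padding, optimizer, and the five-class softmax output are assumptions made so the sketch runs; they are not stated in Table 6.

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

def build_complex_cnn(f_units, dropouts, l2_decay, learning_rate, n_classes=5):
    """f_units and dropouts are 4-element sequences for F1-F4 (Table 5);
    the conv/pool layout follows Table 6; filter counts and padding are assumed."""
    reg = regularizers.l2(l2_decay)
    model = tf.keras.Sequential([layers.Input(shape=(32, 32, 3))])
    model.add(layers.Conv2D(32, 3, strides=2, padding="same", activation="relu"))
    model.add(layers.MaxPooling2D(2, strides=2))
    model.add(layers.Conv2D(64, 3, strides=2, padding="same", activation="relu"))
    model.add(layers.MaxPooling2D(2, strides=2))
    for filters in (128, 128, 128):  # convolution layers 3-5, stride 1
        model.add(layers.Conv2D(filters, 3, strides=1, padding="same", activation="relu"))
    model.add(layers.MaxPooling2D(2, strides=2))
    model.add(layers.Flatten())
    for units, rate in zip(f_units, dropouts):  # dense layers F1-F4 with their dropout rates
        model.add(layers.Dense(units, activation="relu", kernel_regularizer=reg))
        model.add(layers.Dropout(rate))
    model.add(layers.Dense(n_classes, activation="softmax"))
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```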
Table 7. Settings of each algorithm in the experiment on Five Flowers.
Method | Setup
Random search | 700 iterations
Bayesian Optimization | 700 iterations
CMA-ES | 700 iterations
SA | 700 iterations
GA | 50 initial individuals; 700 generations
HSSA | 10 individuals per generation; 70 generations
Table 8. Settings of each algorithm in the meaning verification experiment.
Method | Setup
SSA | 10 individuals per generation; 70 generations
PSO | 10 individuals per generation; 70 generations
HSSA | 10 individuals per generation; 70 generations
Table 9. Results analysis on MNIST.
Method | Mean Error | Minimum Error | Number of Iterations
Random search | 0.0119 | 0.0115 | 277
Bayesian Optimization | 0.0100 | 0.0097 | 62
CMA-ES | 0.0107 | 0.0102 | 612
SA | 0.0099 | 0.0096 | 68
GA | 0.0109 | 0.0107 | 404
PSO | 0.0102 | 0.0098 | 330
SSA | 0.0106 | 0.0104 | 230
HSSA | 0.0097 | 0.0086 | 530
Table 10. Results analysis on Five Flowers.
Method | Mean Error | Minimum Error | Number of Iterations
Random search | 0.3537 | 0.3148 | 312
Bayesian Optimization | 0.2837 | 0.2692 | 371
CMA-ES | 0.2895 | 0.2830 | 580
SA | 0.7555 | 0.7555 | 1
GA | 0.3250 | 0.2869 | 570
PSO | 0.3113 | 0.2973 | 90
SSA | 0.3063 | 0.2957 | 190
HSSA | 0.2714 | 0.2473 | 460
Table 11. Positions of individuals of PSO in the experiment on MNIST (Bs: batch size; L2: L2 weight decay; F1, F2: numbers of F1 and F2 units; Dr: dropout rate; Lr: learning rate).
Individual | Bs | L2 | F1 | F2 | Dr | Lr
1 | 127 | 0.0032 | 716 | 340 | 0.10 | 0.0008
2 | 127 | 0.0032 | 716 | 340 | 0.10 | 0.0008
3 | 127 | 0.0032 | 716 | 340 | 0.10 | 0.0008
4 | 127 | 0.0032 | 716 | 340 | 0.10 | 0.0008
5 | 127 | 0.0032 | 716 | 340 | 0.10 | 0.0008
6 | 127 | 0.0032 | 716 | 340 | 0.10 | 0.0008
7 | 127 | 0.0032 | 716 | 340 | 0.10 | 0.0008
8 | 127 | 0.0032 | 716 | 340 | 0.10 | 0.0008
9 | 128 | 0.0017 | 521 | 409 | 0.17 | 0.0029
10 | 98 | 0.0035 | 1003 | 507 | 0.24 | 0.0014
Table 12. Positions of individuals of SSA in the experiment on MNIST.
Individual | Bs | L2 | F1 | F2 | Dr | Lr
1 | 128 | 0.0100 | 1024 | 1024 | 0.50 | 0.0100
2 | 107 | 0.0084 | 954 | 973 | 0.41 | 0.0094
3 | 99 | 0.0097 | 1017 | 912 | 0.47 | 0.0086
4 | 68 | 0.0006 | 579 | 1015 | 0.15 | 0.0059
5 | 97 | 0.0029 | 629 | 283 | 0.49 | 0.0031
6 | 85 | 0.0061 | 358 | 420 | 0.30 | 0.0089
7 | 61 | 0.0065 | 690 | 421 | 0.36 | 0.0099
8 | 51 | 0.0074 | 466 | 212 | 0.34 | 0.0011
9 | 91 | 0.0066 | 306 | 677 | 0.29 | 0.0055
10 | 27 | 0.0081 | 447 | 379 | 0.35 | 0.0004
Table 13. Positions of individuals of HSSA in the experiment on MNIST.
Individual | Bs | L2 | F1 | F2 | Dr | Lr
1 | 16 | 0.0001 | 128 | 128 | 0.10 | 0.0001
2 | 21 | 0.0001 | 132 | 130 | 0.11 | 0.0001
3 | 19 | 0.0008 | 146 | 183 | 0.13 | 0.0001
4 | 17 | 0.0077 | 838 | 959 | 0.16 | 0.0066
5 | 72 | 0.0020 | 310 | 914 | 0.34 | 0.0089
6 | 123 | 0.0066 | 436 | 498 | 0.28 | 0.0041
7 | 45 | 0.0036 | 451 | 180 | 0.23 | 0.0003
8 | 111 | 0.0037 | 314 | 924 | 0.14 | 0.0024
9 | 116 | 0.0058 | 1019 | 658 | 0.45 | 0.0009
10 | 42 | 0.0046 | 957 | 614 | 0.43 | 0.0024
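Each row in Tables 11–13 is a position in the six-dimensional MNIST search space of Table 2, listed in the column order Bs, L2, F1, F2, Dr, Lr. The helper below is a hypothetical sketch of how such a continuous position vector might be decoded into a trainable configuration, clipping to the ranges of Table 2 and rounding the integer-valued hyperparameters; it is not the authors' code.

```python
import numpy as np

# Bounds in the column order Bs, L2, F1, F2, Dr, Lr (Table 2); illustrative only.
LOWER = np.array([16, 1e-4, 128, 128, 0.1, 1e-4])
UPPER = np.array([128, 1e-2, 1024, 1024, 0.5, 1e-2])

def decode_position(position):
    """Map a raw position vector to a valid hyperparameter configuration."""
    bs, l2, f1, f2, dr, lr = np.clip(position, LOWER, UPPER)
    return {
        "batch_size": int(round(bs)),
        "l2_decay": float(l2),
        "f1_units": int(round(f1)),
        "f2_units": int(round(f2)),
        "dropout_rate": float(dr),
        "learning_rate": float(lr),
    }

# Example: individual 9 of Table 13 (HSSA) decodes to batch size 116,
# 1019 and 658 dense units, dropout 0.45, and learning rate 0.0009.
print(decode_position([116, 0.0058, 1019, 658, 0.45, 0.0009]))
```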
Table 14. Positions of individuals of PSO in the experiment on Five Flowers (Bs: batch size; L2: L2 weight decay; F1–F4: numbers of F1–F4 units; Dr1–Dr4: dropout rates of F1–F4; Lr: learning rate).
Individual | Bs | L2 | F1 | F2 | F3 | F4 | Dr1 | Dr2 | Dr3 | Dr4 | Lr
1 | 68 | 0.0010 | 544 | 137 | 241 | 232 | 0.50 | 0.50 | 0.50 | 0.50 | 0.0010
2 | 68 | 0.0010 | 544 | 137 | 241 | 232 | 0.50 | 0.50 | 0.50 | 0.50 | 0.0010
3 | 68 | 0.0010 | 544 | 137 | 241 | 232 | 0.50 | 0.50 | 0.50 | 0.50 | 0.0010
4 | 68 | 0.0010 | 544 | 137 | 241 | 232 | 0.50 | 0.50 | 0.50 | 0.50 | 0.0010
5 | 68 | 0.0010 | 544 | 137 | 241 | 232 | 0.50 | 0.50 | 0.50 | 0.50 | 0.0010
6 | 68 | 0.0010 | 544 | 137 | 241 | 232 | 0.50 | 0.50 | 0.50 | 0.50 | 0.0010
7 | 68 | 0.0010 | 544 | 137 | 241 | 232 | 0.50 | 0.50 | 0.50 | 0.50 | 0.0010
8 | 83 | 0.0010 | 472 | 189 | 380 | 176 | 0.45 | 0.43 | 0.49 | 0.48 | 0.0010
9 | 49 | 0.0038 | 718 | 444 | 426 | 264 | 0.34 | 0.40 | 0.37 | 0.39 | 0.0007
10 | 72 | 0.0027 | 385 | 235 | 192 | 370 | 0.25 | 0.38 | 0.32 | 0.47 | 0.0015
Table 15. Positions of individuals of SSA in the experiment on Five Flowers.
Individual | Bs | L2 | F1 | F2 | F3 | F4 | Dr1 | Dr2 | Dr3 | Dr4 | Lr
1 | 61 | 0.0012 | 128 | 184 | 184 | 184 | 0.48 | 0.47 | 0.50 | 0.47 | 0.0014
2 | 119 | 0.0001 | 153 | 455 | 616 | 510 | 0.26 | 0.40 | 0.35 | 0.25 | 0.0022
3 | 78 | 0.0008 | 997 | 208 | 482 | 1010 | 0.39 | 0.23 | 0.20 | 0.16 | 0.0002
4 | 61 | 0.0015 | 630 | 798 | 176 | 703 | 0.47 | 0.48 | 0.47 | 0.46 | 0.0090
5 | 66 | 0.0048 | 832 | 745 | 389 | 446 | 0.48 | 0.43 | 0.18 | 0.28 | 0.0046
6 | 67 | 0.0060 | 432 | 944 | 593 | 631 | 0.27 | 0.37 | 0.22 | 0.24 | 0.0002
7 | 64 | 0.0097 | 771 | 199 | 326 | 454 | 0.43 | 0.10 | 0.14 | 0.32 | 0.0093
8 | 21 | 0.0020 | 598 | 480 | 466 | 830 | 0.43 | 0.17 | 0.49 | 0.49 | 0.0030
9 | 51 | 0.0005 | 426 | 737 | 957 | 870 | 0.36 | 0.25 | 0.31 | 0.15 | 0.0094
10 | 43 | 0.0057 | 610 | 722 | 810 | 908 | 0.49 | 0.45 | 0.11 | 0.27 | 0.0008
Table 16. Positions of individuals of the HSSA in the experiment on Five Flowers.
Individual | Bs | L2 | F1 | F2 | F3 | F4 | Dr1 | Dr2 | Dr3 | Dr4 | Lr
1 | 128 | 0.0001 | 572 | 497 | 555 | 222 | 0.41 | 0.41 | 0.41 | 0.41 | 0.0010
2 | 35 | 0.0021 | 354 | 500 | 573 | 193 | 0.43 | 0.30 | 0.33 | 0.35 | 0.0080
3 | 106 | 0.0032 | 181 | 326 | 481 | 655 | 0.23 | 0.26 | 0.48 | 0.13 | 0.0034
4 | 55 | 0.0040 | 138 | 714 | 244 | 265 | 0.13 | 0.42 | 0.31 | 0.13 | 0.0059
5 | 28 | 0.0059 | 294 | 858 | 276 | 150 | 0.22 | 0.34 | 0.17 | 0.14 | 0.0007
6 | 63 | 0.0047 | 856 | 520 | 211 | 285 | 0.49 | 0.17 | 0.47 | 0.27 | 0.0008
7 | 23 | 0.0096 | 241 | 575 | 647 | 966 | 0.39 | 0.37 | 0.21 | 0.19 | 0.0026
8 | 43 | 0.0070 | 715 | 410 | 489 | 378 | 0.49 | 0.14 | 0.24 | 0.45 | 0.0068
9 | 74 | 0.0031 | 263 | 618 | 652 | 158 | 0.34 | 0.30 | 0.40 | 0.25 | 0.0081
10 | 28 | 0.0028 | 922 | 505 | 162 | 617 | 0.34 | 0.48 | 0.15 | 0.12 | 0.0028
Table 17. Mean and standard deviation of 5 experiments on MNIST.
Experiment | Mean Error | Minimum Error
Experiment 1 | 0.0095 | 0.0089
Experiment 2 | 0.0104 | 0.0092
Experiment 3 | 0.0094 | 0.0085
Experiment 4 | 0.0099 | 0.0081
Experiment 5 | 0.0108 | 0.0093
Mean Value | 0.0100 | 0.0088
Standard Deviation | 0.00053 | 0.00048
Table 18. Mean and standard deviation of 5 experiments on Five Flowers.
Experiment | Mean Error | Minimum Error
Experiment 1 | 0.2697 | 0.2457
Experiment 2 | 0.2821 | 0.2472
Experiment 3 | 0.2749 | 0.2464
Experiment 4 | 0.2701 | 0.2403
Experiment 5 | 0.2663 | 0.2396
Mean Value | 0.2726 | 0.2438
Standard Deviation | 0.00548 | 0.00322
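The summary rows of Tables 17 and 18 can be checked, up to rounding and the choice between the population and sample standard deviation, with a few lines of NumPy; the snippet below uses the five mean-error values of Table 17 as an example.

```python
import numpy as np

# Mean errors of the five HSSA runs on MNIST (Table 17).
mean_errors = np.array([0.0095, 0.0104, 0.0094, 0.0099, 0.0108])

print(mean_errors.mean())       # 0.0100
print(mean_errors.std(ddof=0))  # population standard deviation, about 0.00053
print(mean_errors.std(ddof=1))  # sample standard deviation, slightly larger
```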