1. Introduction
With the rapid development of the information revolution and cyberspace, the Internet has greatly promoted economic and social prosperity and progress. It has brought great convenience to people, along with new security risks and challenges. Traditional network security defense systems provide mainly passive defense based on firewalls and anti-virus software. Such systems have serious shortcomings in preventing security threats: they cannot deal with new viruses and cannot prevent attacks in advance, and once a system is invaded, it can suffer huge losses. As the final step of big data incorporated with AI-enabled cyber security situational awareness, network security situation prediction analyzes the previous and current states of the network and then predicts its future situation. It can formulate safe and effective preventive measures before the network is attacked, shifting from passive defense to active defense that performs dynamic analysis, real-time monitoring, and trend prediction. Therefore, designing an effective and accurate network security situation prediction model is a key step in the transition from passive defense to active defense.
At present, there are various network security situation prediction models [1]. Traditional prediction models are divided into gray prediction models [2,3], D-S evidence theory models [4], and artificial intelligence models [5]. Gray prediction models require accurate mathematical expressions for prediction; they are computationally intensive and can only predict the general trend of a network situation. Meanwhile, D-S evidence theory models are qualitative knowledge models based on expert experience and qualitatively described data. Such models cannot make effective use of quantitative data and may face combinatorial explosion, and their prediction accuracy is biased by the uncertainty of expert experience. Neither class of model can meet the situation prediction requirements of large, complex networks. Artificial intelligence models use various neural networks to train on quantitative data, but their learning speed is slow and they are prone to overfitting. Zhang et al. [6] used the gray correlational-entropy method to analyze the correlation of the factors that affect network security, selected the key factors, established the corresponding process equations and prediction equations based on these factors, and predicted the network security situation recursively using Kalman filtering. Although its prediction accuracy is higher than that of the RBF algorithm, its computational cost is also higher. Furthermore, Wang et al. [7] proposed an improved D-S evidence theory for correlation analysis that fuses the reliability and rationality of the prediction results; it reduces the false-alarm rate of the system, but it cannot be used in large-scale networks. Ren et al. [8] established an improved BP neural network prediction model whose predictions are consistent with actual situations, but the gradient-descent characteristics of the BP neural network give the algorithm a lengthy training time. In order to reduce the training time, Zhu et al. [9] used two nonlinear mathematical modeling methods, the extreme learning machine (ELM) and the multilayer perceptron neural network (MLPNN), to establish a prediction model that also has good adaptability.
In order to seek better algorithms for network security situation prediction models, this paper introduces a meta-heuristic search algorithm into an ELM, using the unique optimization capability of the meta-heuristic search algorithm to overcome the shortcomings of the ELM itself and thus achieve better training results. In recent years, many researchers have proposed a variety of meta-heuristic search algorithms and improved versions of them. Meta-heuristic search algorithms are bio-inspired optimization algorithms derived from observing and simulating the natural behavior of biological populations. These algorithms generally have multiple advantages, such as proximity, stability, and adaptability, and are widely used in image processing [10] and feature selection [11]. Common meta-heuristic search algorithms include particle-swarm optimization [12], gray wolf optimization [13], whale optimization [14], sine cosine [15], salp swarm [16], and sparrow search [17] algorithms. Meta-heuristic search algorithms tend to fall into local optima, reducing the diversity of the population in later iterations. In response to this problem, many researchers have proposed improvements to various algorithms. For example, Liu et al. [18] introduced an adaptive leader-follower adjustment strategy to address the unstable solution results of the salp swarm algorithm, which enhanced the stability of the algorithm. Zhou et al. [19] used cat-mapped chaotic sequences combined with the inverse-solution method instead of randomly generated initial populations in order to avoid premature convergence of the whale optimization algorithm, enhancing its initial population diversity and solution-seeking traversal. Zhou et al. [20] used tent chaotic mapping to improve the wolf initialization method, making the initial distribution of wolves more uniform and enhancing the global search capability of the algorithm. Zhang et al. [21] proposed an improved whale optimization algorithm (NGS-WOA) based on nonlinear adaptive weights and a golden sine operator. First, NGS-WOA introduces a nonlinear adaptive weight that enables the search agent to explore the search space adaptively and balance the exploitation and exploration stages. Second, an improved golden sine operator is introduced into the WOA. These strategies effectively improve the performance of the algorithm, giving NGS-WOA strong global convergence and helping it avoid local optima. Zhang et al. [22] proposed a new Gaussian mutation operator for the fireworks algorithm, which makes sparks learn from more samples; combined the rule-explosion operator of the fireworks algorithm with the migration operator of biogeography-based optimization (BBO) to increase information sharing; and adopted a new overall selection strategy that gives high-quality solutions a high probability of entering the next generation without high computing cost. Cheng et al. [23] used an improved tent chaos mapping to initialize the population, increasing population diversity, and added an adaptive local search strategy to improve the global search ability. Liang et al. [24] proposed an improved SSA based on adaptive weights and improved boundary constraints: the adaptive weights improve the convergence speed of the algorithm, and the improved boundary-handling strategy improves the convergence accuracy to a certain extent.
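Several of the improvements surveyed above rely on chaotic initialization, for instance the tent chaotic mapping used in [20,23]. The idea can be sketched as follows (a minimal illustrative sketch in Python; the function name, parameter values, and bounds are our own, not taken from the cited works):

```python
import numpy as np

def tent_map_init(pop_size, dim, lb, ub, mu=2.0):
    """Initialize a population with a tent chaotic map instead of
    uniform random sampling, so that individuals are spread more
    evenly over the search space."""
    chaos = np.empty((pop_size, dim))
    x = np.random.rand(dim)            # chaotic seed in [0, 1)
    x[x == 0.5] = 0.499                # avoid the map's degenerate point
    for i in range(pop_size):
        # tent map: x -> mu*x if x < 0.5, else mu*(1 - x)
        x = np.where(x < 0.5, mu * x, mu * (1.0 - x))
        chaos[i] = x
    # scale the chaotic sequence from [0, 1] into [lb, ub]
    return lb + chaos * (ub - lb)

pop = tent_map_init(30, 10, lb=-5.0, ub=5.0)
print(pop.shape)  # (30, 10)
```

Compared with purely random initialization, the chaotic sequence covers the interval more uniformly, so the subsequent global search is less likely to start from a clustered population.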
In this paper, we weigh the advantages and disadvantages of the ELM and meta-heuristic search algorithms, and improve the SSA among the meta-heuristic search algorithms. Then, by combining the improved SSA (ISSA) with an ELM, we propose an ISSA-ELM network security situation prediction model. By comparing the ISSA with six other algorithms on 15 benchmark functions, we verify the superior performance of the improved algorithm. We also conduct network situation prediction experiments against the traditional ELM algorithm and the GA-ELM algorithm presented by Gokul et al. [25]. The comparison verifies the practicability and accuracy of our model.
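The coupling between a meta-heuristic and the ELM that underlies this approach can be summarized as follows: each candidate solution encodes the ELM's input weights and hidden biases, and its fitness is the resulting training error. The sketch below illustrates only this encoding and fitness evaluation; a plain random search stands in for the ISSA, whose actual update rules are beyond this sketch, and all names and sizes are our own:

```python
import numpy as np

rng = np.random.default_rng(1)

def elm_error(params, X, T, n_hidden):
    """Fitness of one candidate: decode input weights W and biases b
    from the flat parameter vector, solve the output weights in closed
    form, and return the training mean squared error."""
    n_features = X.shape[1]
    W = params[: n_features * n_hidden].reshape(n_features, n_hidden)
    b = params[n_features * n_hidden:]
    H = np.tanh(X @ W + b)                 # hidden-layer output matrix
    beta = np.linalg.pinv(H) @ T           # closed-form output weights
    return float(np.mean((H @ beta - T) ** 2))

def optimize_elm(X, T, n_hidden=10, pop=20, iters=30):
    """Stand-in optimizer: keep the best of pop*iters random candidates.
    A real (I)SSA would instead update candidates with its own rules."""
    dim = X.shape[1] * n_hidden + n_hidden
    best, best_err = None, np.inf
    for _ in range(iters):
        for cand in rng.standard_normal((pop, dim)):
            err = elm_error(cand, X, T, n_hidden)
            if err < best_err:
                best, best_err = cand, err
    return best, best_err

# toy data: learn y = x^2 on [0, 1]
X = np.linspace(0.0, 1.0, 100).reshape(-1, 1)
T = X ** 2
_, err = optimize_elm(X, T)
print(err)  # small training error
```

The key design point is that the output weights never enter the search space: they are always recovered analytically, so the meta-heuristic only has to optimize the randomly initialized part of the network.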
2. Extreme Learning Machine
The extreme learning machine was first proposed by Huang et al. [26] in 2004. It is based on a single hidden-layer feedforward neural network: the input-layer weights and hidden-layer biases are selected at random, and the output-layer weights are then calculated analytically according to Moore–Penrose generalized inverse matrix theory. The extreme learning machine has the advantages of requiring few training parameters and being a fast learner with strong generalization ability. Let the numbers of nodes in the input layer, hidden layer, and output layer of the ELM be $n$, $l$, and $m$, respectively. The network structure is shown in Figure 1.
For $N$ given arbitrarily different samples $(\mathbf{x}_i, \mathbf{t}_i)$, where $\mathbf{x}_i = [x_{i1}, x_{i2}, \ldots, x_{in}]^{\mathrm{T}} \in \mathbb{R}^n$ and $\mathbf{t}_i = [t_{i1}, t_{i2}, \ldots, t_{im}]^{\mathrm{T}} \in \mathbb{R}^m$, the output of the ELM is as follows:
$$\sum_{i=1}^{l} \boldsymbol{\beta}_i \, g(\mathbf{w}_i \cdot \mathbf{x}_j + b_i) = \mathbf{o}_j, \quad j = 1, 2, \ldots, N,$$
where $\mathbf{w}_i = [w_{i1}, w_{i2}, \ldots, w_{in}]^{\mathrm{T}}$ is the input weight between the input-layer neurons and the $i$-th hidden-layer neuron; $\boldsymbol{\beta}_i = [\beta_{i1}, \beta_{i2}, \ldots, \beta_{im}]^{\mathrm{T}}$ is the output weight between the $i$-th hidden-layer neuron and the output-layer neurons; $b_i$ is the bias of the $i$-th hidden-layer neuron; and $g(\cdot)$ is the activation function of the hidden-layer neurons. The matrix expression of the ELM system is as follows:
$$\mathbf{H}\boldsymbol{\beta} = \mathbf{T},$$
where
$$\mathbf{H} = \begin{bmatrix} g(\mathbf{w}_1 \cdot \mathbf{x}_1 + b_1) & \cdots & g(\mathbf{w}_l \cdot \mathbf{x}_1 + b_l) \\ \vdots & \ddots & \vdots \\ g(\mathbf{w}_1 \cdot \mathbf{x}_N + b_1) & \cdots & g(\mathbf{w}_l \cdot \mathbf{x}_N + b_l) \end{bmatrix}_{N \times l}; \quad \boldsymbol{\beta} = \begin{bmatrix} \boldsymbol{\beta}_1^{\mathrm{T}} \\ \vdots \\ \boldsymbol{\beta}_l^{\mathrm{T}} \end{bmatrix}_{l \times m}; \quad \mathbf{T} = \begin{bmatrix} \mathbf{t}_1^{\mathrm{T}} \\ \vdots \\ \mathbf{t}_N^{\mathrm{T}} \end{bmatrix}_{N \times m}.$$
In order to achieve the final training effect of the ELM, the least-squares solution $\hat{\boldsymbol{\beta}}$ needs to be obtained so that:
$$\left\| \mathbf{H}\hat{\boldsymbol{\beta}} - \mathbf{T} \right\| = \min_{\boldsymbol{\beta}} \left\| \mathbf{H}\boldsymbol{\beta} - \mathbf{T} \right\|,$$
where $\mathbf{H}$ is the hidden-layer output matrix of the ELM network and $\mathbf{T}$ is the expected output matrix of the network's samples. Finally, the output weight is obtained by solving the formula:
$$\hat{\boldsymbol{\beta}} = \mathbf{H}^{\dagger}\mathbf{T},$$
where $\mathbf{H}^{\dagger}$ is the Moore–Penrose generalized inverse of the output matrix $\mathbf{H}$.
It can be concluded that the ELM does not need the gradient-descent method when training samples. Compared with a traditional back-propagation neural network trained by gradient descent, the ELM greatly reduces training time while retaining accurate prediction capability.
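As an illustration of this closed-form training, a minimal ELM regression can be written in a few lines of NumPy (an illustrative sketch, not the implementation used in this paper; the hidden-layer size, activation, and toy data are our own choices):

```python
import numpy as np

rng = np.random.default_rng(0)

def elm_train(X, T, n_hidden=50):
    """Random input weights and biases; output weights solved in one
    step via the Moore-Penrose pseudoinverse (beta = pinv(H) @ T)."""
    W = rng.standard_normal((X.shape[1], n_hidden))  # input weights w_i
    b = rng.standard_normal(n_hidden)                # hidden biases b_i
    H = np.tanh(X @ W + b)                           # hidden-layer output matrix H
    beta = np.linalg.pinv(H) @ T                     # no gradient descent needed
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta

# toy regression: learn y = sin(x) on [-3, 3]
X = np.linspace(-3.0, 3.0, 200).reshape(-1, 1)
T = np.sin(X)
W, b, beta = elm_train(X, T)
mse = float(np.mean((elm_predict(X, W, b, beta) - T) ** 2))
print(mse)  # small training error
```

Training reduces to a single pseudoinverse computation, which is why the ELM is orders of magnitude faster to train than an iteratively optimized back-propagation network of similar size.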
6. Conclusions
To tackle the problem of accuracy in network security situation prediction, we improve the sparrow search algorithm and combine it with the extreme learning machine to propose the ISSA-ELM model. The ELM neural network trains samples quickly, while the ISSA optimizes its initial weights; together, they can accurately predict the future network security situation. The ISSA overcomes the original algorithm's tendency to fall into local optima, has good global convergence performance and robustness, shows stronger optimization capability, and outperforms the original algorithm overall.
Experimental comparisons show that the ISSA-ELM model has certain advantages over GA-ELM and SSA-ELM in a real network environment: it converges quickly and achieves higher prediction accuracy. However, ISSA-ELM also has shortcomings. For example, the hidden-layer node-selection process involves great uncertainty, and an overly large sliding-window size makes ISSA-ELM prone to overfitting. Future studies should focus on adaptively selecting the number of hidden-layer nodes to further improve the convergence speed and prediction accuracy.