Article

Network Security Situation Element Extraction Algorithm Based on Hybrid Deep Learning

1 School of Computer Science and Technology, Zhengzhou University of Light Industry, Zhengzhou 450002, China
2 James Watt School of Engineering, University of Glasgow, Glasgow G12 8QQ, UK
* Author to whom correspondence should be addressed.
Electronics 2025, 14(3), 553; https://doi.org/10.3390/electronics14030553
Submission received: 15 December 2024 / Revised: 24 January 2025 / Accepted: 27 January 2025 / Published: 29 January 2025
(This article belongs to the Special Issue AI-Based Solutions for Cybersecurity)

Abstract

Accurately extracting network security situation elements is an important basis for improving the situational awareness of industrial Internet security. This paper proposes an industrial Internet security situation element extraction algorithm based on a hybrid neural network. Firstly, the powerful local feature extraction ability of convolutional neural networks (CNNs) is used to extract the features of key situation elements, and the obtained features are flattened and then input into long short-term memory networks (LSTMs) to compensate for the poor temporal feature extraction ability of CNNs. Then, the output features of the fully connected layer are input to the backpropagation (BP) network for classification, and the LSTM is used to correct the prediction residual of the BP network, optimizing the parameters of each module in the model and improving the classification effect and generalization ability. Comparative experiments show that the accuracy of the model reaches 98.03% on the KDD Cup99 dataset and 98.96% on the SCADA2014 dataset. Compared with other models, it achieves higher classification accuracy and can provide more effective indicator data for security situation assessment.


1. Introduction

The industrial Internet is the extension and deepening of the Internet into the industrial field: it is based on Internet technology, optimized and expanded for the specific needs of industry. Compared with the ordinary Internet, the data processed by the industrial Internet include not only transaction and social data but also large volumes of sensor data and machine operation data. In addition, the industrial Internet has higher requirements for security, reliability, stability, real-time performance, and data privacy, because these directly affect production safety and product quality.
The key to the stable operation of industrial Internet systems lies in security. In recent years, more and more new technologies have been integrated into these systems, bringing high efficiency to the industrial Internet; at the same time, a large number of network security issues have emerged [1]. Massive numbers of industrial devices are widely interconnected, and the industrial Internet is characterized by wide coverage, complex data systems, high-impact security events, weak enterprise security foundations, and a number of security vulnerabilities that grows year by year [2]. To cope with the severe security challenges brought by the rapid development of the industrial Internet, the State Council and the Ministry of Industry and Information Technology attach great importance to industrial Internet security situational awareness; they have issued various documents and guidance requiring that situational awareness capability be improved and that emerging technologies such as artificial intelligence be explored to raise the level of security protection.
The concept of cyberspace situational awareness (CSA) was first proposed by Tim Bass in 1999. It aims to acquire, understand, and display the security factors that drive changes in the network situation in a large-scale network environment and to predict their development trend. The main process of CSA includes data collection, situation element extraction, situation assessment, and prediction. Extracting the elements of a network security situation is the basis of network security situation assessment and prediction: the various data collected from network security equipment must be analyzed and processed, the factors affecting the normal operation of the network extracted, and these factors classified and identified according to specific rules, so as to distinguish the different types of elements affecting the network security situation [3,4]. The purpose of extracting these elements is to comprehensively evaluate and understand the security status of the network so that corresponding security measures can be formulated and implemented. This process involves the collection, analysis, and interpretation of various security-related data in the network to identify potential security risks and vulnerabilities, and it usually requires advanced technologies such as big data analysis, artificial intelligence, and visualization to help analysts extract valuable information from massive data and form a macro understanding of the network security situation. Intrusion detection, by contrast, focuses on identifying and preventing specific attacks. Its purpose is to protect the network from malicious acts; it relies mainly on security policies, attack signature matching, anomaly detection, and similar methods to identify known attacks and abnormal behaviors, and it triggers responses when abnormal behavior is detected, so as to prevent or reduce the impact of attacks.
A real-time security situation awareness system can guide decision-makers to formulate and implement effective security strategies in time, ensuring the stability and security of enterprise network operation [5]. The quality of the extracted situation elements directly affects the accuracy of situation understanding, assessment, and prediction, and thus the network administrator's understanding of the security situation of the whole network system and the security measures taken. Accurate and effective extraction of security situation elements is therefore of great significance for network security situation awareness.

2. Related Work

Extracting elements of network security situations is fundamental to developing an awareness of network security, a task that has attracted the attention of many scholars. Researchers usually use feature extraction and dimensionality reduction methods to enhance the performance of situation element extraction models [6,7]. Principal component analysis (PCA) is an unsupervised learning method that needs no label information for feature extraction and dimension reduction, so researchers often combine PCA with classifiers to extract situation elements. The authors of [8] combine PCA with random forest for intrusion detection to improve classification efficiency. In [9], a hybrid recurrent neural network was used to construct an intrusion detection model named SPIDER, with PCA employed for data dimensionality reduction. The work in [10] combines probabilistic PCA with a generalized additive model to reduce features and capture nonlinear relationships, enhancing the understanding of intrusion detection. The disadvantages of PCA for extracting industrial Internet situation elements are its limited ability to capture nonlinear relationships in the data, its sensitivity to outliers, which may lead to information loss, and its high computational cost on large-scale datasets.
In recent years, with ongoing research into deep learning algorithms, introducing deep learning into situation element extraction has gradually become a research hotspot [11,12]. In [13], the authors used a denoising autoencoder (DAE) to compress the feature dimension of intrusion data and then used part of the compressed data to train a deep neural network classifier based on a multi-class supervision method. The work in [14] presents a predictive maintenance system designed for industrial Internet of Things (IoT) environments; it utilizes Nicla Sense ME sensors, a Raspberry Pi-based concentrator for real-time monitoring, and a long short-term memory (LSTM) machine-learning model for predictive analysis. The attention mechanism is a deep learning technique often used to enhance the performance of neural network models: addressing problems of model construction and evaluation accuracy, the authors of [15] propose analyzing time series data with a self-attention mechanism and an LSTM neural network to generate security situation predictions and improve evaluation accuracy. The transformer is a deep learning architecture based on the attention mechanism, usually composed of an encoder and a decoder, with strong parallel computing ability and high flexibility. The work in [16] proposes the CGAN-transformer model: the CGAN addresses unbalanced data samples and improves detection accuracy on minority classes, while the transformer contributes excellent long-distance feature extraction; combining the two effectively improves the accuracy of network traffic detection.
The above methods have contributed to the extraction of network security situation elements, but significant deficiencies remain. Firstly, these methods are limited in their ability to capture complex data, and it is difficult for them to fully mine nonlinear features. Secondly, some methods rely on feature selection and are easily affected by prior knowledge, which limits the adaptability of the model. In addition, computational complexity restricts real-time detection in high-dimensional data environments [17]. It is therefore necessary to explore more advanced feature extraction techniques, such as methods based on deep learning. Convolutional neural networks (CNNs) play an important role in deep learning; their characteristics include automatic feature extraction, spatial invariance, parameter sharing, and multi-level feature learning, which give CNNs unique advantages in the extraction of network security situation elements and make them an important deep learning model. This paper proposes an industrial Internet security situation element extraction model based on a hybrid deep neural network. The method uses data filling to process the original data into a two-dimensional matrix, and the key features are extracted by CNNs. Because CNNs cannot effectively deal with time series data, an LSTM is used to learn temporal features from the key features output by the CNN. Finally, the output feature data are input into the BP neural network for classification, and the LSTM is used to correct the prediction residual of the BP network, enhancing the effect of situation element extraction.

3. Network Security Situation Element Extraction Algorithm Based on CNN-LSTM-BP

3.1. Convolutional Neural Networks

Convolutional neural networks (CNNs) are a class of feedforward neural networks that employ convolution operations, representing a key algorithm in deep learning [18,19]. In 1998, Yann LeCun introduced LeNet-5, one of the earliest CNN models, applying it to handwritten character recognition on the MNIST dataset; this pioneering work laid the foundation for future CNN research and deep learning [20]. Compared to a general neural network, CNNs introduce convolutional and pooling layers, which give them more powerful feature extraction ability. The structure of a CNN is shown in Figure 1.
  • Input layer: The input layer receives and preprocesses data for the convolutional layer, ensuring effective feature learning by handling outliers and missing values.
  • Convolution layer: The convolution process takes the position of the receptive field as the benchmark, extracts features by sliding the convolution kernel at a specific stride and multiplying it with the input data [21,22], and then activates the feature mapping through a nonlinear function. This design enables the network to automatically learn and capture the spatial hierarchical features in the input data, providing a more meaningful representation for subsequent layers. To reduce the number of parameters, CNNs employ a strategy known as "shared parameters". Parameter sharing not only helps to realize translation-invariant classification but also makes the model lighter, enhances its scalability, and improves the training speed and generalization ability of the model [23].
  • Pooling layer: After feature extraction, a pooling layer can be added to reduce the size of the feature representation or to eliminate redundant features. The pooling operation thus compresses the original data with only minor changes to it, further decreasing the number of relevant parameters and simplifying the network's calculations [24].
CNNs can automatically learn the hierarchical features of input data without manual feature engineering. The convolutional kernels use the same parameters as they slide across the entire input, which significantly decreases the complexity of the operation, and CNNs can be easily scaled to the data format, which is crucial for network security systems that require real-time monitoring and response. This paper leverages these characteristics of CNNs, namely automatic feature extraction, parameter sharing, and strong scalability, in network security situation analysis: features are extracted from data sources such as network traffic and system logs to identify abnormal behaviors or potential security threats. However, CNNs are adept at capturing local features and cannot effectively recognize anomalies that unfold over time.
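To make the shape bookkeeping concrete, the toy PyTorch sketch below (PyTorch is the framework used in Section 4.2) passes a batch of records, reshaped into small 2D grids, through one convolution and one pooling layer; the channel counts and the 8 × 8 input size are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

# Illustrative convolution + pooling pipeline; channel counts and the 8x8
# input shape are assumptions for demonstration, not the paper's model.
conv = nn.Conv2d(in_channels=1, out_channels=4, kernel_size=3, stride=1, padding=1)
pool = nn.MaxPool2d(kernel_size=2)

x = torch.randn(32, 1, 8, 8)   # 32 records reshaped into 8x8 grids
h = conv(x)                    # one shared kernel slides over every position
print(h.shape)                 # torch.Size([32, 4, 8, 8])
print(pool(h).shape)           # torch.Size([32, 4, 4, 4]): compressed features
```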

3.2. Long Short-Term Memory Neural Networks

Long short-term memory neural networks (LSTMs) are a unique type of recurrent neural network. Although recurrent neural networks remove the limitation of traditional neural networks that cannot associate sequence information, they are prone to vanishing gradients due to parameter sharing and repeated multiplication during backpropagation, which makes the model difficult to train or unable to converge. LSTMs address this problem: their core idea is to pass information through a cell state and to decide how information is transmitted via gating units, namely the input gate, the forget gate, and the output gate [25,26]. The input gate controls what information updates the cell state, the forget gate determines which information is discarded, and the output gate determines the output based on the current input and cell state. The gates' weights are learned throughout training, enhancing the network's memory capacity and preventing forgetting. The structure of the LSTM network is shown in Figure 2. An LSTM is essentially the reuse of one neural network along the time axis: at time t = 1 it is an ordinary neural network, and as time progresses, the hidden information Ht and cell state Ct from the previous training cycle are reused in the next.
The LSTM architecture enables more effective learning of long-term dependencies, making it better equipped to deal with long-sequence data. In this article, the data processed by the CNN convolutional layers are therefore restructured and fed into the LSTM network to alleviate drawbacks such as catastrophic forgetting and vanishing gradients in CNN models.

3.3. Backpropagation Algorithm

The backpropagation (BP) algorithm is a training method for multi-layer feedforward artificial neural networks. It trains the network through optimization methods and gradient search technology to minimize the error between the actual output and the expected output value [27]. Theoretically, a three-layer BP neural network can approximate any nonlinear continuous function with an error that can be made arbitrarily small [28]. The BP algorithm consists of signal forward propagation and error backward propagation, as shown in Figure 3.
The BP algorithm adjusts the model's parameters according to the magnitude of the error produced during forward propagation, enabling the model to classify or regress input data more accurately. At the same time, the LSTM module is used to correct the prediction residuals of the BP neural network, achieving higher precision in classification prediction.
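The paper does not spell out the wiring of this residual correction; one plausible reading, sketched below with placeholder tensors, is that the LSTM estimates the gap between the BP output and the target, and the corrected prediction is their sum.

```python
import torch

# Hedged sketch of the residual-correction idea; all tensors here are
# hypothetical placeholders, and the exact mechanism is our reading of the text.
bp_logits = torch.randn(128, 5)          # raw BP-network scores (5 classes)
residual_estimate = torch.randn(128, 5)  # LSTM's estimate of the BP residual
corrected = bp_logits + residual_estimate
pred = corrected.argmax(dim=1)           # final class prediction
```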

3.4. A Cybersecurity Situational Element Extraction Model Based on CNN-LSTM-BP

The cybersecurity situational element extraction model based on CNN-LSTM-BP mainly consists of six modules (Figure 4), in the following order: the data input module, the convolution module, the LSTM module, the fully connected module, the BP (backpropagation) module, and the output module.
The model algorithm mainly includes the following steps:
(1)
Preprocess data samples.
The original training data are processed by batch division, data format conversion, and other operations to ensure that the input data format meets the requirements of the neural network and that data quality is adequate.
(2)
Convolution layer and pooling layer.
The preprocessed data pass through the convolution layer and the pooling layer in turn, and the key information is extracted through the movement of the convolution kernel, which consists of a weight matrix and a bias. The receptive field is the area of the input that each pixel of a layer's output feature map is mapped from, that is, a sliding window on the input feature with the same size as the convolution kernel. The convolution operation extracts the local features of the data and is given by
$$x_j^l = f\Big(\sum_{i \in P_j} x_i^{l-1} * k_{ij}^l + b_j^l\Big).$$
In this formula, $f(\cdot)$ represents the activation function and $P_j$ denotes the local receptive field corresponding to the convolution kernel; $x_i^{l-1}$ is the value of the feature from the $(l-1)$-th layer at the $i$-th window; $k_{ij}^l$ is the weight of the convolutional kernel at position $(i, j)$ in layer $l$; and $b_j^l$ is the bias of the $j$-th window of the $l$-th layer. The formula extracts the local features of the input data through local connection, parameter (weight) sharing, and translation invariance. This not only reduces the model parameters and computational complexity but also improves the generalization ability and robustness to input changes.
The pooling layer reduces the spatial dimension of the feature map, lowering the amount of calculation while maintaining important features. The pooling operation adopted is
$$x_j^l = f\big(\mathrm{pool}(x_j^{l-1}) + b_j^l\big).$$
In this formula, $f(\cdot)$ represents the activation function, $x_j^{l-1}$ the value of the $j$-th window in the input feature of the $(l-1)$-th layer, $b_j^l$ the bias of the $j$-th window of the $l$-th layer, and $\mathrm{pool}(\cdot)$ the sampling function.
To prevent overfitting, a relatively simple convolution module is built in this paper: after each convolution layer, a batch normalization layer and a ReLU activation function are added. Batch normalization accelerates convergence, reduces the risk of overfitting, and improves the stability of the model by standardizing the layer inputs (reducing internal covariate shift) [29]. ReLU introduces nonlinearity, enabling the neural network to learn more complex function mappings; it is simple to compute, converges quickly, and is linear on the positive interval, which helps reduce the vanishing gradient problem [30]. A minimal sketch of such a module is shown below.
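In the sketch, the kernel size 3, stride 1, padding 1, and channel widths 6 and 16 follow the hyperparameters reported later in this section for KDD Cup99; the rest of the wiring is our reconstruction, not the authors' released code.

```python
import torch.nn as nn

# Two Conv -> BatchNorm -> ReLU blocks, each followed by 2x2 max pooling.
# Kernel/stride/padding and channel widths follow the reported hyperparameters;
# the overall wiring is a reconstruction.
conv_module = nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=3, stride=1, padding=1),
    nn.BatchNorm2d(6),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(6, 16, kernel_size=3, stride=1, padding=1),
    nn.BatchNorm2d(16),
    nn.ReLU(),
    nn.MaxPool2d(2),
)
# For an 8x8 KDD Cup99 record, the output is a (16, 2, 2) feature map.
```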
(3)
LSTM layer.
To fit the LSTM's expected input shape, the view function is used to reshape the data before it is fed to the LSTM, as in the sketch below. The gating mechanism in the LSTM then extracts temporal features, mitigating the vanishing gradient problem and allowing the model to learn long-term dependencies, thereby improving the memory and temporal feature extraction ability of the whole model.
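The sketch uses the LSTM hyperparameters reported later in this section for KDD Cup99 (input_dim 9, hidden_size 128, one layer); the CNN output shape here is a hypothetical example chosen only so the reshape works out.

```python
import torch
import torch.nn as nn

# Reshape CNN feature maps into a (batch, seq_len, input_dim) sequence via
# view(), then extract temporal features with the LSTM.
lstm = nn.LSTM(input_size=9, hidden_size=128, num_layers=1, batch_first=True)

feats = torch.randn(128, 16, 3, 3)   # hypothetical CNN output: 16*3*3 = 144 values
seq = feats.view(128, -1, 9)         # -> (128, 16, 9): 16 steps of 9 features
out, (h, c) = lstm(seq)
last_step = out[:, -1, :]            # (128, 128): fed to the FC/BP layers
```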
(4)
Full connection layer and BP layer.
The last time step of the LSTM output is selected as the input feature of this module, which then performs prediction and classification on it.
In forward propagation, information enters the BP network at the input layer and passes through the nonlinear transformation of each layer in turn to produce the final result. There will be errors between the output layer's results and the actual results; the network parameters, such as the weight matrices and biases, must then be adjusted according to the error in the error backpropagation process. The nonlinear transformation in forward propagation is calculated as follows:
$$a^l = f\big(w^l a^{l-1} + b^l\big).$$
In this formula, $f(\cdot)$ represents the activation function, $w^l$ the weight matrix of the $l$-th layer of the BP network, $a^{l-1}$ the signal of the neurons in the $(l-1)$-th layer, and $b^l$ the bias matrix of the $l$-th layer. The signal of the neurons in the $l$-th layer is obtained by multiplying the signal from the $(l-1)$-th layer by the corresponding weight matrix and passing the result through the activation function. Through this formula, the data are mapped into a nonlinear space so as to achieve classification.
The error function evaluates the prediction performance of the model and guides the updating of model parameters during training. Error backpropagation mainly includes error calculation and parameter updating. With the mean square error, the error is calculated as follows:
$$E = \frac{1}{2}\sum_{k=1}^{m} (y_k - T_k)^2.$$
In this formula, $y_k$ represents the desired output and $T_k$ the actual output, with $m$ denoting the number of neurons.
The module outputs a tensor of log-probabilities and selects the class with the largest probability as the predicted category (a sketch follows). Dropout layers are placed after the fully connected layer and the BP layer, respectively; they randomly discard some network connections during training, reducing the model's dependence on the training data, improving generalization, and preventing overfitting [31].
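A sketch of this head, continuing the LSTM snippet above: dense layers with dropout (p = 0.5, as reported later in this section), log-probabilities out, and an argmax to pick the predicted class. The layer widths are illustrative assumptions.

```python
import torch.nn as nn

# Fully connected + BP classification head; widths are assumptions.
head = nn.Sequential(
    nn.Linear(128, 64),
    nn.ReLU(),
    nn.Dropout(p=0.5),     # dropout probability as reported in this section
    nn.Linear(64, 5),      # 5 output classes for the KDD Cup99 task
    nn.LogSoftmax(dim=1),  # tensor of log-probabilities
)

log_probs = head(last_step)       # last_step from the LSTM sketch above
pred = log_probs.argmax(dim=1)    # class with the largest probability
```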
(5)
Error propagation.
The error between the model's predicted and actual outputs is backpropagated, and a stochastic gradient descent (SGD) optimizer adjusts the model parameters. The optimizer selects training samples randomly, computes gradients, and updates the parameters based on the gradient values and the learning rate, minimizing the loss function to enhance model performance.
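For reference, the plain SGD update applied to each parameter $\theta$ with learning rate $\eta$ and loss $L$ is

$$\theta_{t+1} = \theta_t - \eta \, \nabla_{\theta} L(\theta_t).$$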
(6)
Model evaluation.
After the model is trained, in order to comprehensively and objectively measure the performance and generalization ability of the model, the independent test data are used to conduct a detailed and multi-dimensional evaluation of the effect of the model.
In this study, a grid search strategy is used to find the best combination of hyperparameters. For the KDD Cup99 dataset, the convolution kernel size is set to 3, the stride to 1, the padding to 1, the parameters of the batch normalization layers to 6 and 16, and the pooling parameter to 2; the input_dim, hidden_size, and num_layers of the LSTM network are set to 9, 128, and 1, respectively (for the SCADA2014 dataset, the hidden_size is set to 1); the dropout probability in the BP network is set to 0.5; and the other parameters are set according to the data format.
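A minimal sketch of such a grid search follows; the candidate values and the train_and_evaluate helper are hypothetical, since the paper does not report its full search space.

```python
from itertools import product

# Hypothetical search space; the paper does not report its candidate values.
grid = {
    "hidden_size": [64, 128, 256],
    "lr": [0.1, 0.01, 0.001],
    "dropout": [0.3, 0.5],
}

best_acc, best_cfg = 0.0, None
for hidden_size, lr, dropout in product(*grid.values()):
    # train_and_evaluate is a hypothetical helper returning validation accuracy
    acc = train_and_evaluate(hidden_size=hidden_size, lr=lr, dropout=dropout)
    if acc > best_acc:
        best_acc, best_cfg = acc, (hidden_size, lr, dropout)
```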
This algorithm combines CNN, LSTM networks, and BP neural networks, with each component responsible for different tasks: CNNs extract local features, LSTMs capture temporal dependencies, and BP maps extracted features to the correct categories and continuously optimizes the network. Techniques like batch normalization, ReLU, and dropout further enhance the model’s performance and generalization.

4. Experimental Results and Analysis

The experiment uses two datasets to test the effectiveness of the algorithm: the public KDD Cup99 dataset from MIT Lincoln Laboratory and the real data recorded by the SCADA system of the natural gas pipeline test platform at Mississippi State University, referred to as SCADA2014 in this paper. The KDD Cup99 dataset provides a wide range of network attack scenarios, while SCADA system data provides a practical industrial control system environment. Through testing on these two datasets, we can comprehensively evaluate the performance of the algorithm in different environments, so as to ensure the robustness and practicability of the algorithm.

4.1. Data Preprocessing and Parameter Setting

Each record in the KDD Cup99 dataset contains 41 features describing various aspects of network connectivity and covers a total of four types of attacks [32]. The KDD Cup99 dataset provides a unified performance evaluation benchmark in the field of network security and a wealth of comparison data for subsequent research, which helps expose the shortcomings of existing algorithms and promotes the development and improvement of new ones. Balancing computing resources, time efficiency, and model performance, while considering the fairness and reliability of the experiment, 10% of the dataset is used. We use the fetch_kddcup99 function to randomly extract this 10% subset; it preserves the statistical characteristics of the original data.
The samples in the SCADA2014 dataset genuinely reflect the complexity and potential security threats of the industrial Internet environment and provide an experimental platform close to actual application scenarios. The dataset also has high feature dimensions, comprehensive attack types, and a wide range of application scenarios, giving it important application value. SCADA2014 contains about 10,000 records; every sample contains 26 features and one label value, covering a total of seven types of attacks. After dimensionality reduction, a dataset with 17 effective eigenvalues is finally obtained [33].
For both datasets, this experiment divides the training and test sets at a ratio of 8:2 and preprocesses the data before training. For the SCADA2014 dataset, the processing steps are 1. separating features and labels; 2. padding the feature data with constant values; 3. reshaping the features into four-dimensional tensors of shape (128, 1, 5, 5); and 4. converting the data into TensorDataset objects to fit the network model. For the KDD Cup99 dataset, the processing is similar, except that the label data are encoded as integers and the features are reshaped to (128, 1, 8, 8). The preprocessed data are input into the model for training; after training, the model enters the prediction stage, and the prediction results are finally classified. The specific attack types and codes contained in each category are shown in Table 1 and Table 2.
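A sketch of the KDD Cup99 branch of this preprocessing, assuming a feature matrix X of shape (N, 41) and integer-encoded labels y_int are already in hand: the 41 features are zero-padded to 64 values so each record reshapes into a 1 × 8 × 8 grid, and the result is wrapped in a TensorDataset with an 8:2 split.

```python
import numpy as np
import torch
from torch.utils.data import TensorDataset, random_split

# Assumes X is an (N, 41) float array and y_int an (N,) integer label array.
X_padded = np.pad(X, ((0, 0), (0, 64 - X.shape[1])), constant_values=0.0)
X_tensor = torch.tensor(X_padded, dtype=torch.float32).reshape(-1, 1, 8, 8)
y_tensor = torch.tensor(y_int, dtype=torch.long)

dataset = TensorDataset(X_tensor, y_tensor)
n_train = int(0.8 * len(dataset))          # 8:2 train/test split
train_set, test_set = random_split(dataset, [n_train, len(dataset) - n_train])
```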

4.2. Comparative Experiment and Result Analysis

The experiment was conducted on a computer with 8 GB of RAM and an Intel(R) Core(TM) i5-8250U CPU @ 1.60 GHz (Lenovo, Zhengzhou, China). It ran Windows 10, used Python 3.12.0 for programming, and employed the PyTorch platform to build the model framework. The optimizer was SGD, with the StepLR function used to schedule the learning rate and a weight decay regularization coefficient of 0.001. Training ran for five epochs with a data batch size of 128. The loss function is calculated as follows:
$$CE(p, q) = -\sum_{i=1}^{c} p_i \log q_i,$$
where $c$ represents the number of classes, $CE(p, q)$ the loss between $p$ and $q$, and $p_i$ and $q_i$ the true value and predicted value of the model, respectively.
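A sketch of the corresponding training setup: SGD with weight decay 0.001, a StepLR schedule, five epochs, and batches of 128 match the reported configuration; the learning rate, the StepLR step_size and gamma, and the build_model name are assumptions.

```python
import torch
import torch.nn as nn
from torch.optim.lr_scheduler import StepLR
from torch.utils.data import DataLoader

model = build_model()  # hypothetical constructor for the CNN-LSTM-BP model
train_loader = DataLoader(train_set, batch_size=128, shuffle=True)  # train_set from the Section 4.1 sketch

optimizer = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=0.001)
scheduler = StepLR(optimizer, step_size=1, gamma=0.9)  # assumed schedule values
criterion = nn.NLLLoss()  # pairs with the head's log-probability output

for epoch in range(5):    # five training rounds, as reported
    for xb, yb in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(xb), yb)
        loss.backward()
        optimizer.step()
    scheduler.step()
```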
Each evaluation index is calculated as follows:
$$\mathrm{Accuracy\;(ACC)} = \frac{TP + TN}{TP + FN + FP + TN},$$
$$\mathrm{Recall} = \frac{TP}{TP + FN},$$
$$\mathrm{Precision} = \frac{TP}{TP + FP},$$
$$\mathrm{F1\text{-}score} = \frac{2 \cdot \mathrm{Recall} \cdot \mathrm{Precision}}{\mathrm{Recall} + \mathrm{Precision}},$$
where TP and TN represent the numbers of samples correctly classified into the attack and non-attack classes, respectively; FN represents attack samples incorrectly classified as normal, and FP normal samples incorrectly classified as attacks. Considering the unbalanced number of samples in the datasets, the evaluation indexes in this experiment adopt the weighted-average method, taking each class's proportion of the total number of samples as its weight [34]. The weighted-average method can evaluate model performance under data imbalance and alleviates, to a certain extent, the impact of imbalance on performance evaluation. First, the weights are determined in proportion to the number of samples in each category; second, the per-category accuracy, recall, and F1-score are calculated from the confusion matrix; finally, each category's index value is multiplied by its weight and the weighted values are summed (see the sketch below). In view of the nature of the data, this experiment was designed as a five-category task and an eight-category task.
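In practice the weighted averages can be computed directly, for example with scikit-learn, as in the sketch below; y_true and y_pred are assumed to be integer label arrays from the test set.

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Weighted averages: each class's metric is weighted by its sample share.
acc = accuracy_score(y_true, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="weighted", zero_division=0
)
```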
  • Effect comparison of situation element extraction models
In the experiment, the proposed CNN-LSTM-BP model is compared with several other situation element extraction models in terms of ACC, recall, precision, and F1-score. Among them, the probabilistic neural network (PNN) is a widely used situation element extraction classifier [35], while k-nearest neighbor (KNN) and random forest (RF) are commonly used classification methods in machine learning [36,37]. Deep learning models, namely the transformer [38], CGAN-transformer [16], CNN-GRU [39], Res-CNN-SRU [40], and a plain CNN, are also compared with the method in this paper to evaluate their effectiveness in situation element extraction. The results on the two datasets are shown in Table 3 and Table 4, respectively.
Observing Table 3 and Table 4, the RF method, a relatively simple approach to extracting situation elements, performs poorly on all indicators on both datasets, which relates to its low feature extraction ability. Compared with deep learning models, random forest, as an ensemble learning method, has limited model complexity and expressive power; it may also overfit more easily on large amounts of data, which explains its slightly worse performance on the KDD Cup99 dataset. The remaining results show that most hybrid models outperform standalone models. As an integrated model, the method proposed in this paper is superior to the other methods on all evaluation indexes, demonstrating its effectiveness in the element extraction task.
  • Comparison of classification effects of typical CNN models
To verify the effect of optimizing convolutional neural network models with LSTM-BP, this experiment added LSTM-BP to the classical convolutional neural network models LeNet5 and MiniVGGNet, respectively, and compared them with the original CNN models. To adapt to the data format, the model parameters were slightly adjusted; the parameters of the LeNet5 [20] and MiniVGGNet [41] models are shown in Table 5. To ensure the fairness of the experiment, several runs were carried out, and the test results are shown in Table 6.
As Table 6 shows, the convolutional neural network models improve on both datasets, to varying degrees, after fusing the LSTM and BP modules. Although the CNN model defined in this paper performs slightly worse than LeNet5 on the KDD Cup99 dataset, it has the best performance on both datasets after adding the LSTM and BP modules. This shows that the LSTM and BP modules can improve the classification accuracy of convolutional neural network models and that the method has a certain generalization ability.
  • Ablation experiment
In deep learning research, ablation experiments are an important method for systematically analyzing the specific impact of different components or factors of a neural network on model performance. By selectively removing or modifying parts of the network, researchers can observe the impact of these changes on the model output and identify the components or parameters that contribute most to performance. The ablation experiments include five steps: defining the network structure, training the network, determining the ablation object, performing the ablation operation, and comparative analysis. Performing an ablation operation means removing or changing the part to be ablated; the model is then retrained on the same dataset after the corresponding modules are removed. The ablation objects in this experiment are the LSTM and CNN modules, the BP and CNN modules, the LSTM and BP modules, the BP module, and the LSTM module. After removing each ablation object from the model, experiments were carried out and compared in terms of ACC, recall, precision, and F1-score. The results are shown in Table 7.
Table 7 shows that the BP model performs poorly on both datasets. The BP neural network lacks a specially designed structure (such as a convolution layer) to extract features effectively, which suggests that using it alone is not suitable for such complex situation element extraction tasks. However, the CNN-LSTM-BP model integrating the BP module has the best performance in terms of ACC, recall, precision, and F1-score; it is the model with the best overall performance, and removing any module reduces its accuracy. This shows that, in this task, no single model achieves optimal performance, while the CNN model combined with the LSTM and BP modules can make full use of the advantages of each to achieve a better classification effect.

4.3. Algorithm Complexity Analysis

The complexity of the model is concentrated mainly in the convolution and LSTM layers. The time complexity of a convolution layer is $O(k^2 \cdot m^2 \cdot C_{in} \cdot C_{out})$, where $k$ is the size of the convolution kernel, $m$ is the side length of the layer's output feature map, and $C_{in}$ and $C_{out}$ are the numbers of input and output channels, respectively, with $m = (X - k + 2 \cdot Padding)/Stride + 1$, where $X$ is the side length of the layer's input data. The spatial complexity of the convolution layers is $O\big(\sum_{l=1}^{D} k_l^2 C_{l-1} C_l + \sum_{l=1}^{D} m_l^2 C_l\big)$, where $D$ is the number of convolution layers in the model, $C_l$ represents the output channels ($C_{out}$) of the $l$-th convolution layer, and $C_{l-1}$ the input channels ($C_{in}$) of the $l$-th convolution layer. The time complexity of the LSTM layer is $O(4T \cdot input\_dim \cdot hidden\_size + 4T \cdot hidden\_size^2)$, where $T$ is the sequence length, and its space complexity is $O(input\_dim \cdot hidden\_size + hidden\_size^2)$ [18].
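As a worked example under the Section 3.4 hyperparameters for the first KDD Cup99 convolution layer (k = 3, stride 1, padding 1, one input channel, six output channels, 8 × 8 input), the arithmetic is sketched below; this is illustrative only.

```python
# Plug the first KDD Cup99 convolution layer into the expressions above.
k, stride, padding, c_in, c_out, x = 3, 1, 1, 1, 6, 8
m = (x - k + 2 * padding) // stride + 1   # output side length: (8-3+2)/1+1 = 8
time_ops = k**2 * m**2 * c_in * c_out     # 9 * 64 * 1 * 6 = 3456 multiply-adds
print(m, time_ops)                        # 8 3456
```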
In general, the time and space complexity of the model are related to the width and depth of the network. The CNN in this model uses a parameter-sharing strategy, which reduces the burden of the overall model. In terms of scalability, the CNN can process text data and extract local features, while the LSTM can process sequence data and capture temporal features; this structure enables the network to process multimodal data. Each module in the model is independent and can adapt to different data by adjusting network parameters. In addition, the model covers the whole process from data preprocessing to model deployment, ensuring the efficiency and repeatability of the experiments. This automated approach not only improves experimental efficiency but also helps to reduce human error, ensuring the stability and reliability of the model in industrial scenarios.

5. Conclusions

Building on existing CNN-based models for extracting network security situation elements, this paper proposes a network security situation element extraction model based on hybrid deep learning. The model introduces the LSTM and BP modules to make up for the shortcomings of CNNs in processing long-sequence data, enhance classification accuracy, and improve the efficiency and accuracy of network security situation element extraction. The precision comparison and ablation experiments show that the proposed method effectively improves the classification effect of the convolutional neural network model, classifies and extracts the elements of the network security situation more accurately, and lays a good foundation for network security situation assessment. However, the model does not yet consider issues such as data privacy and data imbalance. In future research, we will therefore look for more effective methods to extract situation elements and integrate federated learning to extract network security situation elements while ensuring the data privacy and security of industrial Internet users.

Author Contributions

R.Z., methodology, software, formal analysis, writing—original draft, writing—review and editing; Q.W., software, formal analysis, data curation, writing—original draft; Y.Z., investigation, resources. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Natural Science Foundation of China No. 61902361, the Natural Science Foundation of Henan Province No. 202300410508, and the Zhengzhou Collaborative Innovation Project No. 2021ZDPY0106.

Data Availability Statement

The original contributions presented in this study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Barak, I. Critical infrastructure under attack: Lessons from a honeypot. Netw. Secur. 2020, 2020, 16–17. [Google Scholar] [CrossRef]
  2. Yang, J.; Chen, K.; Cao, K.; Guo, X. The core technology analysis of industrial Internet security situational awareness. Cyberspace Secur. 2019, 10, 61–66. [Google Scholar]
  3. Endsley, M.R. Design and evaluation for situation awareness enhancement. In Proceedings of the Human Factors Society Annual Meeting, Sage CA, LA, USA, 24–28 October 1988; pp. 97–101. [Google Scholar]
  4. Bass, T.; Gruber, D. A glimpse into the future of id. Mag. USENIX SAGE 1999, 24, 40–49. [Google Scholar]
  5. You, J.B.; Kim, S.K.; Jun, H.I.; Suh, D.H. A Novel Way of Recognition and Avoidance of Risk Factors in Residential Environments. In Proceedings of the 2015 9th International Conference on Future Generation Communication and Networking (FGCN), Jeju, Republic of Korea, 25–28 November 2015; pp. 45–48. [Google Scholar]
  6. Alavizadeh, H.; Jang-Jaccard, J.; Enoch, S.Y.; Al-Sahaf, H.; Welch, I.; Camtepe, S.A.; Kim, D.D. A survey on cyber situation-awareness systems: Framework, techniques, and insights. ACM Comput. Surv. 2022, 55, 1–37. [Google Scholar] [CrossRef]
  7. Fukushima, K. Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biol. Cybern. 2020, 36, 193–202. [Google Scholar] [CrossRef] [PubMed]
  8. Waskle, S.; Parashar, L.; Singh, U. Intrusion detection system using PCA with random forest approach. In Proceedings of the 2020 International Conference on Electronics and Sustainable Communication Systems (ICESC), Coimbatore, India, 2–4 July 2020; pp. 803–808. [Google Scholar]
  9. Udas, P.B.; Karim, M.E.; Roy, K.S. SPIDER: A shallow PCA based network intrusion detection system with enhanced recurrent neural networks. J. King Saud Univ. Comput. Inf. Sci. 2022, 34, 10246–10272. [Google Scholar] [CrossRef]
  10. Singh, A.; Nagar, J.; Amutha, J.; Sharma, S. P2CA-GAM-ID: Coupling of probabilistic principal components analysis with generalised additive model to predict the k− barriers for intrusion detection. Eng. Appl. Artif. Intell. 2023, 126, 107137. [Google Scholar] [CrossRef]
  11. Wang, R.; Ma, C.; Wu, P. An intrusion detection method based on federated learning and convolutional neural network. Netinfo Secur. 2020, 20, 47–54. [Google Scholar]
  12. Sikiru, I.A.; Kora, A.D.; Ezin, E.C.; Imoize, A.L.; Li, C.-T. Hybridization of Learning Techniques and Quantum Mechanism for IIoT Security: Applications, Challenges, and Prospects. Electronics 2024, 13, 4153. [Google Scholar] [CrossRef]
  13. Lopes, I.O.; Zou, D.; Abdulqadder, I.H.; Ruambo, F.A.; Yuan, B. Effective network intrusion detection via representation learning: A Denoising AutoEncoder approach. Comput. Commun. 2022, 194, 55–65. [Google Scholar] [CrossRef]
  14. D’Agostino, P.; Violante, M.; Macario, G. A Scalable Fog Computing Solution for Industrial Predictive Maintenance and Customization. Electronics 2025, 14, 24. [Google Scholar] [CrossRef]
  15. Liu, Y.; Sun, Y.; Liu, C.; Weng, Y. Industrial Internet Security Situation Assessment Method Based on Self-Attention Mechanism. In Proceedings of the 2024 3rd International Conference on Artificial Intelligence, Internet of Things and Cloud Computing Technology (AIoTC), Wuhan, China, 13–15 September 2024; pp. 148–151. [Google Scholar]
  16. Yang, Y.; Yao, C.; Yang, J.; Yin, K. A network security situation element extraction method based on conditional generative adversarial network and transformer. IEEE Access 2022, 10, 107416–107430. [Google Scholar] [CrossRef]
  17. Taheri, R.; Ahmadzadeh, M.; Kharazmi, M. A new approach for feature selection in intrusion detection system. Fen Bilim. Derg. (CFD) 2015, 36, 1344–1357. [Google Scholar]
  18. LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444. [Google Scholar] [CrossRef] [PubMed]
  19. Gu, J.; Wang, Z.; Kuen, J.; Ma, L.; Shahroudy, A.; Shuai, B.; Chen, T. Recent advances in convolutional neural networks. Pattern Recognit. 2018, 77, 354–377. [Google Scholar] [CrossRef]
  20. LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324. [Google Scholar] [CrossRef]
  21. LeCun, Y.; Kavukcuoglu, K.; Farabet, C. Convolutional networks and applications in vision. In Proceedings of the 2010 IEEE International Symposium on Circuits and Systems, Paris, France, 30 May–2 June 2010; pp. 253–256. [Google Scholar]
  22. Gao, Y.; Rong, W.; Shen, Y.; Xiong, Z. Convolutional neural network based sentiment analysis using Adaboost combination. In Proceedings of the 2016 International Joint Conference on Neural Networks (IJCNN), Vancouver, BC, Canada, 24–29 July 2016; pp. 1333–1338. [Google Scholar]
  23. Song, J.; Park, S.; Lim, M. Detection of Limit Situation in Segmentation Network via CNN. In Proceedings of the 2020 20th International Conference on Control, Automation and Systems (ICCAS), Busan, Republic of Korea, 13–16 October 2020; pp. 892–894. [Google Scholar]
  24. Yu, D.; Wang, H.; Chen, P.; Wei, Z. Mixed pooling for convolutional neural networks. In Proceedings of the Rough Sets and Knowledge Technology: 9th International Conference, RSKT 2014, Shanghai, China, 24–26 October 2014; pp. 364–375. [Google Scholar]
  25. Haralabopoulos, G.; Razis, G.; Anagnostopoulos, I. A Modified Long Short-Term Memory Cell. Int. J. Neural Syst. 2023, 33, 2350039. [Google Scholar] [CrossRef] [PubMed]
  26. Graves, A.; Graves, A. Long short-term memory. Superv. Seq. Label. Recurr. Neural Netw. 2012, 385, 37–45. [Google Scholar]
  27. Rumelhart, D.E.; Hinton, G.E.; Williams, R.J. Learning representations by back-propagating errors. Nature 1986, 323, 533–536. [Google Scholar] [CrossRef]
  28. McInerney, J.M.; Haines, K.G.; Biafore, S.; Hecht-Nielsen, R. Back propagation error surfaces can have local minima. In Proceedings of the International 1989 Joint Conference on Neural Networks, Washington, DC, USA, 18–22 June 1989; p. 627. [Google Scholar]
  29. Ioffe, S. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv 2015, arXiv:1502.03167. [Google Scholar]
  30. Nair, V.; Hinton, G.E. Rectified linear units improve restricted boltzmann machines. In Proceedings of the 27th International Conference on Machine Learning (ICML-10), Haifa, Israel, 21–24 June 2010; pp. 807–814. [Google Scholar]
  31. Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. Dropout: A simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 2014, 15, 1929–1958. [Google Scholar]
  32. Siddique, K.; Akhtar, Z.; Khan, F.A.; Kim, Y. KDD cup 99 data sets: A perspective on the role of data sets in network intrusion detection research. Computer 2019, 52, 41–51. [Google Scholar] [CrossRef]
  33. Morris, T.; Gao, W. Industrial control system traffic data sets for intrusion detection research. In Proceedings of the Critical Infrastructure Protection VIII: 8th IFIP WG 11.10 International Conference, ICCIP 2014, Arlington, VA, USA, 17–19 March 2014; pp. 65–78. [Google Scholar]
  34. Parsaei, M.; Taheri, R.; Javidan, R. Perusing the effect of discretization of data on accuracy of predicting naive bayes algorithm. J. Curr. Res. Sci. 2016, 2016, 457. [Google Scholar]
  35. Specht, D.F. Probabilistic neural networks. Neural Netw. 1990, 3, 109–118. [Google Scholar] [CrossRef]
  36. Peterson, L.E. K-nearest neighbor. Scholarpedia 2009, 4, 1883. [Google Scholar] [CrossRef]
  37. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]
  38. Subakan, C.; Ravanelli, M.; Cornell, S.; Bronzi, M.; Zhong, J. Attention is all you need in speech separation. In Proceedings of the ICASSP 2021–2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, ON, Canada, 6–11 June 2021; pp. 21–25. [Google Scholar]
  39. Cao, B.; Li, C.; Song, Y.; Qin, Y.; Chen, C. Network intrusion detection model based on CNN and GRU model. Appl. Sci. 2022, 12, 4184. [Google Scholar] [CrossRef]
  40. Cai, Z.; Si, Y.; Zhang, J.; Zhu, L.; Li, P.; Feng, Y. Industrial Internet Intrusion Detection Based on Res-CNN-SRU. Electronics 2023, 12, 3267. [Google Scholar] [CrossRef]
  41. Kranthi Kumar, K.; Bharadwaj, R.; Ch, S.; Sujana, S. Effective deep learning approach based on VGG-mini architecture for iris recognition. Ann. Rom. Soc. Cell Biol. 2021, 25, 4718–4726. [Google Scholar]
Figure 1. Structure of CNN.
Figure 2. Structure of LSTM.
Figure 3. Propagation diagram of BP.
Figure 4. Schematic diagram of CNN-LSTM-BP model.
Table 1. Attack classification and encoding on KDD Cup99.

| Label | Label Description | Encode | Samples in Training Set | Samples in Test Set |
| --- | --- | --- | --- | --- |
| Normal | Normal data | 0 | 78,903 | 19,395 |
| DOS | Denial-of-service attack | 1 | 312,906 | 78,552 |
| Probing | Surveillance and other probing | 2 | 1997 | 521 |
| R2L | Unauthorized access from a remote machine | 3 | 82 | 24 |
| U2R | Unauthorized access to local superuser (root) privileges | 4 | 1328 | 313 |
Table 2. Attack classification and encoding on SCADA2014.

| Label | Label Description | Encode | Samples in Training Set | Samples in Test Set |
| --- | --- | --- | --- | --- |
| Normal | Normal data | 0 | 5340 | 1335 |
| NMRI | Naive malicious response injection attack | 1 | 268 | 67 |
| CMRI | Complex malicious response injection attack | 2 | 1331 | 333 |
| MSCI | Malicious state command injection attack | 3 | 74 | 19 |
| MPCI | Malicious parameter command injection attack | 4 | 674 | 168 |
| MFCI | Malicious function command injection attack | 5 | 32 | 8 |
| DOS | Denial-of-service attack | 6 | 150 | 37 |
| Recon | Reconnaissance attack | 7 | 626 | 157 |
Table 3. Comparison of classification accuracy of several situation element extraction models on KDD Cup99.

| Model | ACC (%) | Recall (%) | Precision (%) | F1-Score (%) |
| --- | --- | --- | --- | --- |
| PNN | 90.76 | 90.76 | 98.18 | 94.32 |
| KNN | 90.19 | 90.18 | 98.20 | 94.02 |
| RF | 87.95 | 87.94 | 89.03 | 88.48 |
| Transformer | 91.01 | 91.01 | 91.85 | 91.43 |
| CGAN-transformer | 93.07 | 93.07 | 94.29 | 93.68 |
| Ours | 98.03 | 98.03 | 98.41 | 98.22 |
Table 4. Comparison of classification accuracy of several situation element extraction models on SCADA2014.

| Model | ACC (%) | Recall (%) | Precision (%) | F1-Score (%) |
| --- | --- | --- | --- | --- |
| PNN | 91.25 | 91.25 | 91.85 | 91.52 |
| KNN | 93.44 | 93.44 | 94.07 | 93.75 |
| RF | 89.98 | 89.98 | 90.17 | 90.07 |
| Transformer | 95.07 | 95.07 | 91.85 | 93.43 |
| CNN-GRU | 94.69 | 78.92 | 78.94 | 75.45 |
| Res-CNN-SRU | 98.79 | 95.04 | 95.34 | 95.38 |
| Ours | 98.96 | 98.96 | 98.53 | 98.74 |
Table 5. Parameters of the LeNet5 and MiniVGGNet models.

| Layer Name | LeNet5 | MiniVGGNet |
| --- | --- | --- |
| Conv 1 | Kernel size: (3,3); out channels: 6; stride: 1 | Kernel size: (3,3); out channels: 64; stride: 1 |
| Pooling 1 | Average pooling (2,2) | Max pooling (2,2) |
| Conv 2 | Kernel size: (3,3); out channels: 16; stride: 1 | Kernel size: (3,3); out channels: 128; stride: 1 |
| Pooling 2 | Average pooling (2,2) | Max pooling (2,2) |
| Conv 3 | / | Kernel size: (3,3); out channels: 256; stride: 1 |
| Pooling 3 | / | Max pooling (2,2) |
| Fc 1 | Hidden size × 256 | Hidden size × 512 |
| Fc 3 | 128 × n_class | n_class |
Table 6. Results of comparative experiments (all values in %).

| Model | KDD Cup99 ACC | KDD Cup99 Recall | KDD Cup99 Precision | KDD Cup99 F1-Score | SCADA2014 ACC | SCADA2014 Recall | SCADA2014 Precision | SCADA2014 F1-Score |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| LeNet5 | 97.52 | 97.52 | 97.30 | 97.41 | 96.38 | 96.38 | 96.53 | 96.45 |
| LeNet5-LSTM-BP | 97.66 | 97.66 | 97.83 | 97.74 | 96.52 | 96.52 | 96.40 | 96.46 |
| MiniVGGNet | 80.60 | 80.60 | 87.50 | 83.91 | 92.58 | 92.58 | 93.01 | 92.79 |
| MiniVGGNet-LSTM-BP | 97.30 | 97.30 | 97.25 | 97.27 | 95.29 | 95.29 | 96.47 | 95.88 |
| CNN | 97.10 | 97.10 | 97.04 | 97.07 | 97.35 | 97.35 | 97.55 | 97.45 |
| CNN-LSTM-BP | 98.03 | 98.03 | 98.41 | 98.22 | 98.96 | 98.96 | 98.53 | 98.74 |
Table 7. Results of ablation experiments on KDD Cup99 and SCADA2014 (all values in %).

| Model | KDD Cup99 ACC | KDD Cup99 Recall | KDD Cup99 Precision | KDD Cup99 F1-Score | SCADA2014 ACC | SCADA2014 Recall | SCADA2014 Precision | SCADA2014 F1-Score |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| BP | 80.25 | 80.25 | 87.14 | 83.55 | 90.13 | 90.13 | 92.46 | 91.20 |
| CNN | 97.10 | 97.10 | 97.04 | 97.07 | 97.35 | 97.35 | 97.55 | 97.45 |
| LSTM | 97.10 | 97.10 | 97.25 | 97.17 | 92.77 | 92.77 | 93.04 | 92.90 |
| CNN-LSTM | 97.38 | 97.38 | 96.44 | 96.91 | 98.01 | 98.01 | 98.17 | 98.09 |
| CNN-BP | 97.44 | 97.44 | 97.54 | 97.49 | 97.30 | 97.30 | 98.41 | 97.85 |
| CNN-LSTM-BP | 98.03 | 98.03 | 98.41 | 98.22 | 98.96 | 98.96 | 98.53 | 98.74 |

