1. Introduction
In the current 5G era, the Internet penetration rate is rising rapidly, the network environment is becoming more complex, and network security problems are becoming increasingly serious. Traditional protection methods such as firewalls and data encryption can no longer meet the needs of network security protection [1]. Continuously improving the defensive capabilities of networks has therefore become an important research topic. Intrusion detection enhances a network’s security defenses by collecting information in the network, detecting abnormal behavior, and raising alarms when dangers are identified. An intrusion detection system works by classifying the collected network traffic data into normal or abnormal types, where the abnormal data can be further refined into different attack categories [2]. The field of machine learning has grown rapidly in recent years, and many machine learning algorithms have been applied to intrusion detection systems, such as the support vector machine [3], decision tree [4], and naive Bayes [5].
However, with the advent of the era of big data, these traditional machine learning algorithms struggle to handle intrusion detection. Deep learning has shown good feature extraction ability on massive, high-dimensional data, so some scholars have applied deep learning algorithms such as the convolutional neural network [6] and the recurrent neural network to intrusion detection. For example, Henry [7] combined convolutional neural networks with a gated recurrent unit for intrusion detection, used different CNN-GRU combination sequences to optimize the parameters, alleviated the dependence of deep learning on feature learning, and achieved a good intrusion detection classification effect. Chen [8] conducted experiments on recurrent neural networks, used a window-based instance selection algorithm to further streamline the data, then tuned the model parameters, and finally applied the model to intrusion detection to improve its overall performance. Compared with these two kinds of neural networks, a self-organizing neural network can automatically adjust its structure and parameters to adapt to the distribution and characteristics of the intrusion detection data and can dynamically learn and adjust its own representation ability. Therefore, some scholars have considered using self-organizing incremental learning to handle intrusion detection data. Li [9] combined possibility theory with the SOINN algorithm, defined a possibility membership to judge the sample category, and fed the SOINN clusters into a probabilistic neural network to obtain the PS-PNN network, which improved the precision and recall of the intrusion detection system.
While self-organizing incremental neural networks can handle massive, high-dimensional, dynamic data, this does not mean that more data and more attributes are always better. Many attributes in these datasets are not conducive to detecting network intrusions. An indicator, attribute importance, is therefore used to select attributes, as it measures the degree of association between an attribute and the intrusion detection result. Attribute importance is an indicator used in rough set theory to measure the degree of correlation between conditional attributes and the decision attribute, and it is often used for decision tree node division; for example, Wang [10] defined a heuristic function based on it as the decision tree splitting criterion. Attribute importance is also often used for attribute reduction. For example, in order to improve the efficiency and accuracy of classification algorithms, Zhou [11] proposed a fast attribute reduction algorithm based on an improved attribute importance for neighborhood rough sets, and Zheng [12] gave an evaluation strategy for attribute importance and designed an algorithm integrating multi-fork tree theory. All the above studies used attribute importance to reduce the data, and the results were significantly improved.
In the aforementioned studies, only attributes in the positive domain were considered when calculating attribute importance, and attributes in the boundary domain were ignored. The attributes in the boundary domain also contribute to the decision result, so it is clearly inappropriate to delete them directly; they need to be taken into account. The decision boundary entropy proposed by Zhang [13] considers not only the information in the boundary domain but also the information entropy, which can measure uncertain information. On this basis, they defined a new measure of attribute importance and proposed a three-way attribute random selection algorithm for random forests, which achieved good results on intrusion detection datasets [14]. Building on this analysis, the algorithm is improved in this paper: a three-way attribute reduction algorithm is proposed and combined with the self-organizing incremental neural network to obtain the T-SOINN algorithm. Precision and recall, which are commonly used evaluation indicators for intrusion detection, are selected to validate the performance of the model, and the results are used to assess the strengths and weaknesses of T-SOINN.
The contributions of this article are as follows:
The three-way attribute selection algorithm is improved, and a new attribute reduction method is proposed;
The distance between nodes is introduced into the weight update formula, so that the weight update value is adjusted with the distance between neurons, to optimize the dynamic update operation of the topology structure;
SOINN is used for network intrusion detection to obtain an incremental intrusion detection model, which is combined with three-way attribute reduction to form the T-SOINN algorithm.
The structure of this paper is as follows:
Section 1 provides an introduction.
Section 2 provides the related work on incremental learning, attribute reduction, and three-way decisions.
Section 3 briefly introduces incremental learning, the self-organization of an incremental neural network, and attribute reduction.
Section 4 introduces the T-SOINN algorithm in detail.
Section 5 presents the experimental verification of the algorithm, and finally,
Section 6 provides a summary.
3. The Basics
3.1. Incremental Learning
Incremental learning is a simulation of humans’ learning ability and a solution to some real-life problems [16]. Incremental learning can alleviate several drawbacks of traditional machine learning: traditional machine learning suffers from low recognition accuracy when faced with new data and is prone to forgetting old data when learning new data types. Incremental learning, in which new knowledge is learned while old knowledge is selectively preserved or discarded, effectively mitigates this catastrophic forgetting problem. Furthermore, since current intrusion detection datasets are large and have high feature dimensionality, one-time complete learning produces many problems, such as high space requirements, long learning times, the need to relearn when new data arrive, and high data consumption; incremental learning, which trains on dynamically read partial data, further improves the efficiency of the model. The incremental learning process [16] is shown in Figure 1.
3.2. Self-Organized Incremental Learning Neural Network
In 2006, Professor Shen Furao [31] first proposed SOINN, an unsupervised clustering neural network model with a two-layer structure and competitive learning ability. The first-layer network receives the initial training data and adaptively generates prototype neurons. The second-layer network then estimates the intra-class and inter-class distances based on the results of the first layer and, using the prototype neurons as its initial data, obtains a stable topological structure.
The SOINN algorithm uses a set of neurons distributed over the feature space to approximate the distribution of the input data [32]; that is, the distribution of the original dataset is represented by a small number of neurons. During learning, neurons belonging to the same cluster are connected to form a topology, similar to a connected graph, that represents the current distribution of the learned data. As new data continue to arrive, the topology is adjusted and updated according to the distance between the new data and the surrounding neurons. When a new sample differs greatly from the existing neurons, an inter-class insertion operation is performed to generate a new category; when the difference is small, a node fusion operation is performed. Furthermore, noisy nodes are removed based on the number of times a neuron has won and the number of its neighboring nodes, which ensures not only the incremental learning ability of SOINN but also the stability of the learning results.
3.3. Attribute Importance
In real datasets, different attributes may have different importance. Rough set theory can be used to measure the importance of attributes, and three-way decisions are then used to select attributes.
The decision table is $DT = (U, C \cup D, V, f)$, where $U$ is the universe, a non-empty finite set containing all instances; $C$ denotes the conditional attributes and $D$ denotes the decision attribute, so $A = C \cup D$ contains all attributes; $V$ is the set of attribute values; and $f: U \times A \to V$ is a mapping. For any $B \subseteq A$, the indiscernibility relation on $U$ is defined as $IND(B) = \{(x, y) \in U \times U \mid \forall a \in B,\ f(x, a) = f(y, a)\}$.
Definition 1. Given a decision table $DT = (U, C \cup D, V, f)$, for $X \subseteq U$ and $B \subseteq C$, the lower and upper approximations of $X$ are expressed as $\underline{B}(X) = \{x \in U \mid [x]_B \subseteq X\}$ and $\overline{B}(X) = \{x \in U \mid [x]_B \cap X \neq \emptyset\}$, where $[x]_B$ is the equivalence class of $x$ under $IND(B)$. $BND_B(X) = \overline{B}(X) - \underline{B}(X)$ is called the boundary domain of $X$ with respect to $B$, and $NEG_B(X) = U - \overline{B}(X)$ is called the negative domain of $X$ with respect to $B$.
Definition 2 (Approximate classification accuracy). Given a decision table $DT = (U, C \cup D, V, f)$, for $B \subseteq C$ and the partition $U/D = \{X_1, X_2, \ldots, X_n\}$ induced by the decision attribute, the approximate classification accuracy of $D$ with respect to $B$ is defined as $\alpha_B(D) = \frac{\sum_{i=1}^{n} |\underline{B}(X_i)|}{\sum_{i=1}^{n} |\overline{B}(X_i)|}$. Obviously, the approximate classification accuracy describes the percentage of certainly correct decisions among all possible decisions when classifying the decision attribute $D$ under $B$. $\alpha_B(D) = 1$ means that $B$ can completely describe the decision attribute $D$; $\alpha_B(D) = 0$ means that $B$ cannot describe the decision attribute $D$ at all.
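To make Definitions 1 and 2 concrete, the following minimal Python sketch computes the equivalence classes induced by an attribute subset, the lower and upper approximations of each decision class, and the approximate classification accuracy. The record layout (a list of tuples plus attribute indices) is an illustrative assumption, not the implementation used in this paper.

```python
from collections import defaultdict

def equivalence_classes(records, attrs):
    """Group record indices by their values on the attribute subset attrs (the relation IND(B))."""
    blocks = defaultdict(set)
    for i, row in enumerate(records):
        blocks[tuple(row[a] for a in attrs)].add(i)
    return list(blocks.values())

def approximate_classification_accuracy(records, cond_attrs, decision_attr):
    """Definition 2: sum of lower-approximation sizes over sum of upper-approximation sizes."""
    cond_blocks = equivalence_classes(records, cond_attrs)
    decision_blocks = equivalence_classes(records, [decision_attr])
    lower_total = upper_total = 0
    for X in decision_blocks:
        lower_total += sum(len(b) for b in cond_blocks if b <= X)   # [x]_B fully inside X
        upper_total += sum(len(b) for b in cond_blocks if b & X)    # [x]_B overlaps X
    return lower_total / upper_total if upper_total else 0.0

# Toy decision table: columns 0-1 are conditional attributes, column 2 is the decision attribute.
table = [(1, 0, "normal"), (1, 0, "attack"), (0, 1, "attack"), (0, 0, "normal")]
print(approximate_classification_accuracy(table, cond_attrs=[0, 1], decision_attr=2))  # ~0.333
```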
Definition 3 (Decision boundary entropy, DBE) [14]. Given a decision table $DT = (U, C \cup D, V, f)$, for any $B \subseteq C$, the decision boundary entropy of $B$ with respect to $D$ is defined in terms of the approximate classification accuracy $\alpha_B(D)$ and the boundary domain $BND_B(D)$ of the decision classes; the full expression is given in [14].
Definition 4 (DBE-based importance of an attribute) [14]. Given a decision table $DT = (U, C \cup D, V, f)$, for any attribute $a \in C$ and $B \subseteq C$, the significance $Sig(a)$ of attribute $a$ with respect to $B$ and $D$ is defined based on the decision boundary entropy; the full expression is given in [14].
3.4. Three-Way Decisions
Three-way decisions based on an evaluation function give an uncertain division of the elements $x$ of a target object set $U$ according to an evaluation function $g$. A threshold pair $(\alpha, \beta)$, where $\alpha > \beta$, is introduced. For any $x \in U$, when $g(x) \geq \alpha$, the element $x$ is divided into the positive domain $POS(U)$ of the set $U$; when $g(x) \leq \beta$, the element $x$ is divided into the negative domain $NEG(U)$ of the set $U$; and when $\beta < g(x) < \alpha$, the element $x$ is divided into the delay domain $BND(U)$ of the set $U$. Then, $U$ can be divided into three pairwise disjoint regions, as shown in Figure 2.
It should be noted that in this paper the function $g$ evaluates the attributes, and by comparing $g$ with the parameters $\alpha$ and $\beta$, the attributes are divided into the positive, boundary, and negative domains. This is explained in detail in Section 4.1.
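A minimal sketch of the three-way split itself follows: given an evaluation function $g$ and a threshold pair $(\alpha, \beta)$ with $\alpha > \beta$, each element is routed to the positive, delay (boundary), or negative domain. The toy evaluation function and thresholds are placeholders for illustration only.

```python
def three_way_partition(elements, g, alpha, beta):
    """Split elements into positive, delay (boundary), and negative domains
    according to the evaluation function g and thresholds alpha > beta."""
    assert alpha > beta
    pos, bnd, neg = [], [], []
    for x in elements:
        v = g(x)
        if v >= alpha:
            pos.append(x)      # accept: positive domain
        elif v <= beta:
            neg.append(x)      # reject: negative domain
        else:
            bnd.append(x)      # defer: delay/boundary domain
    return pos, bnd, neg

# Example with a toy evaluation function (stand-in for attribute importance)
scores = {"a1": 0.9, "a2": 0.4, "a3": 0.1}
print(three_way_partition(scores, scores.get, alpha=0.7, beta=0.2))
# (['a1'], ['a2'], ['a3'])
```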
4. T-SOINN Incremental Intrusion Detection Model
The T-SOINN model is divided into two parts: three-way attribute reduction and SOINN incremental clustering. The specific process is described as follows:
- (1)
Preprocess the initial dataset: symbol digitization and K-means clustering discretization;
- (2)
Calculate the importance of each attribute in the original dataset;
- (3)
Set the attribute importance as the evaluation function and the threshold to realize three-way attribute reduction;
- (4)
Select $k/2$ of the $k$ original attributes, and delete the rest to obtain a subset of the dataset;
- (5)
Transfer the incremental dataset subset into SOINN for initialization;
- (6)
Find the first and second winning neurons;
- (7)
If $d_1 > T_{s_1}$ or $d_2 > T_{s_2}$, insert a new node;
- (8)
Update the age value of the edge connecting the two winning neurons;
- (9)
Update the age values of all edges connected to the first winning neuron;
- (10)
If $d_1 \leq T_{s_1}$ and $d_2 \leq T_{s_2}$, update the weights of the winning neurons;
- (11)
Accumulate the quantization error of the winning node;
- (12)
Judge the age values of the connection edges, and delete over-aged connections;
- (13)
After every $\lambda$ input samples, perform intra-class node insertion and denoising operations;
- (14)
End the process.
The flow of the T-SOINN algorithm is shown in
Figure 3.
4.1. Three-Way Attribute Reduction
The attribute importance based on DBE is set as the evaluation function in the three-way decisions, that is, $g(a) = Sig(a)$ for each conditional attribute $a$. The threshold pair $(\alpha, \beta)$ is then set, and for any $a \in C$: $a$ is classified into the positive decision domain if $Sig(a) \geq \alpha$; $a$ is classified into the negative decision domain if $Sig(a) \leq \beta$; and $a$ is classified into the delay decision domain if $\beta < Sig(a) < \alpha$.
After dividing the attributes into three domains, the three-way attribute selection rules are defined as follows:
- (1)
If $p \geq k/2$ is satisfied, $k/2$ attributes drawn at random from the positive domain constitute the attribute subset.
- (2)
If $p < k/2 \leq p + q$ is satisfied, all attributes in the positive domain together with $k/2 - p$ attributes drawn at random from the boundary domain constitute the attribute subset.
- (3)
If $p + q < k/2$ is satisfied, all attributes in the positive and boundary domains, plus $k/2 - p - q$ attributes drawn at random from the negative domain, constitute the attribute subset.
Here, $k$ denotes the total number of conditional attributes, and $p$, $q$, and $r$, respectively, denote the number of attributes in the positive domain, the boundary domain, and the negative domain.
In the process of applying the three-way attribute selection rules, rule (1) is considered first, followed by rule (2) and rule (3).
The detailed steps of the three-way attribute reduction algorithm are shown in Algorithm 1.
Algorithm 1 Three-way attribute reduction algorithm
Input: Dataset $S$, decision table $DT = (U, C \cup D, V, f)$, three-way attribute selection thresholds $(\alpha, \beta)$
Output: Dataset $S'$ containing $k/2$ attributes
1: Initialization: $POS = \emptyset$, $BND = \emptyset$, $NEG = \emptyset$
2: For each $a \in C$ do
3:   Calculate $Sig(a)$
4:   If $Sig(a) \geq \alpha$ then
5:     $POS = POS \cup \{a\}$
6:   Else
7:     If $Sig(a) \leq \beta$ then
8:       $NEG = NEG \cup \{a\}$
9:     Else
10:      $BND = BND \cup \{a\}$
11:    End if
12:  End if
13: End for
14: If $p \geq k/2$ then
15:   Draw $k/2$ attributes at random from $POS$ to constitute an attribute subset $R$
16: Else
17:   If $p + q \geq k/2$ then
18:     Draw all attributes from $POS$ and $k/2 - p$ attributes at random from $BND$ to constitute an attribute subset $R$
19:   Else
20:     Draw all attributes from $POS$ and $BND$, and draw $k/2 - p - q$ attributes at random from $NEG$ to constitute an attribute subset $R$
21:   End if
22: End if
23: Return the dataset $S'$ restricted to the $k/2$ attributes in $R$
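The selection rules (1)-(3) and lines 14-22 of Algorithm 1 can be sketched as follows. The attribute-importance values are assumed to have been computed already (e.g., via the DBE-based significance), and the attribute names and scores below are illustrative only.

```python
import random

def three_way_attribute_selection(sig, alpha, beta, seed=0):
    """Pick k/2 attributes: first from the positive domain, then the boundary
    domain, and only then the negative domain (rules (1)-(3))."""
    rng = random.Random(seed)
    pos = [a for a, s in sig.items() if s >= alpha]
    neg = [a for a, s in sig.items() if s <= beta]
    bnd = [a for a in sig if a not in pos and a not in neg]
    need = len(sig) // 2                       # k/2 attributes are retained
    if len(pos) >= need:                       # rule (1)
        return rng.sample(pos, need)
    if len(pos) + len(bnd) >= need:            # rule (2)
        return pos + rng.sample(bnd, need - len(pos))
    return pos + bnd + rng.sample(neg, need - len(pos) - len(bnd))   # rule (3)

# Toy importance scores for six attributes
sig = {"duration": 0.8, "src_bytes": 0.7, "dst_bytes": 0.35, "flag": 0.3,
       "land": 0.05, "urgent": 0.02}
print(three_way_attribute_selection(sig, alpha=0.6, beta=0.1))
```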
When determining the number of selected attributes, if too many attributes are kept, the effect of dimensionality reduction is not achieved; if too few are kept, the accuracy of the model may be affected. In reference [13], the authors select a small subset of attributes to train each subtree and obtain a random forest. Since selecting so few attributes is clearly not appropriate here, half of the original attributes ($k/2$) are retained after attribute selection.
4.2. SOINN Incremental Clustering
- (1)
The data subset after attribute reduction is received, and two samples are randomly selected to initialize the neuron set $N = \{c_1, c_2\}$ together with the set of connection relations $E$. The two initial neurons have no connection, and their weights are $W_{c_1}$ and $W_{c_2}$, respectively.
- (2)
Finding the winning neurons: let the newly received sample be $x$; the two neurons most similar to $x$ are sought as the first and second winning neurons, $s_1$ and $s_2$, respectively. The calculation formulas are $s_1 = \arg\min_{c \in N} \lVert x - W_c \rVert$ and $s_2 = \arg\min_{c \in N \setminus \{s_1\}} \lVert x - W_c \rVert$.
At this point, the winning count $M_{s_1}$ of the first winning neuron is updated: if node $s_1$ becomes the winning neuron for the first time, $M_{s_1}$ is set to 1; otherwise, $M_{s_1}$ is increased by 1.
- (3)
The similarity thresholds $T_{s_1}$ and $T_{s_2}$ of the winning neurons $s_1$ and $s_2$ are then calculated. For any neuron $c$, if it has connected neurons, these neighbors form a class with it; in this case, the similarity threshold is the maximum intra-class distance, calculated as $T_c = \max_{i \in N_c} \lVert W_c - W_i \rVert$, where $N_c$ is the set of neighbors of $c$. If no neurons are connected to $c$, the similarity threshold is defined as the minimum inter-class distance, calculated as $T_c = \min_{i \in N \setminus \{c\}} \lVert W_c - W_i \rVert$.
- (4)
According to the similarity thresholds, the corresponding operation is performed. The distances between $x$ and the first and second winning neurons are $d_1 = \lVert x - W_{s_1} \rVert$ and $d_2 = \lVert x - W_{s_2} \rVert$, respectively. If $d_1 > T_{s_1}$ or $d_2 > T_{s_2}$, the current sample $x$ differs greatly from the previously learned data, so $x$ is inserted as a new neuron and the neuron set $N$ is updated. Next, return to step (2) to continue processing new samples. The sample insertion diagram is shown in Figure 4.
- (5)
Update the age parameter of the connection edges. When the first and second winning neurons are activated simultaneously by the input sample $x$, the two winning neurons are related to some extent. Thus, if there is already a connection between the two winning neurons, its age is reset; if there is no connection, the two winning neurons are connected and the age of the new edge is set to 1.
- (6)
The age value of every edge connected to the first winning neuron is increased by one: $age_{(s_1, i)} = age_{(s_1, i)} + 1$ for every neighbor $i$ of $s_1$.
- (7)
The quantization error of the winning neuron is accumulated: $Err_{s_1} = Err_{s_1} + \lVert x - W_{s_1} \rVert$.
- (8)
If $d_1 \leq T_{s_1}$ and $d_2 \leq T_{s_2}$, the current sample $x$ is similar to the existing topology and there is no need to generate a new neuron. Instead, the weights of the winning neuron are updated based on $x$ and moved toward $x$; that is, the sample $x$ is fused into the topology. When a sample is fused, the winning neuron moves by updating its weights. A sample fusion diagram is shown in Figure 5.
Updating the weights: since the coefficient of the original weight update is a predefined fixed value and cannot be adjusted to the actual situation, the update is changed to $W_{s_1} = W_{s_1} + \frac{1}{M_{s_1}\,(1 + \lVert x - W_{s_1} \rVert)}\,(x - W_{s_1})$.
The coefficient of the weight change is the learning rate; the win count is multiplied by one plus the Euclidean distance so that the denominator of the coefficient is greater than one, where $M_{s_1}$ is the total number of wins of the node. Neurons with a high number of wins are highly representative, and neurons that are close to each other have a high probability of belonging to the same class and may move a relatively large distance. Therefore, the Euclidean distance is introduced into the coefficient of the node movement, so that the movement of a node is adjusted by both its number of wins and the distance between nodes (a condensed sketch of this update step is given after this list).
- (9)
The learning cycle is $\lambda$; after every $\lambda$ learned samples, the denoising operation is performed. If $age_{(i,j)} > age_{max}$, the connection edge between neurons $i$ and $j$ is over-aged and is deleted, and the set of connection relations $E$ is updated. If learning is not complete, go back to step (2) and continue processing new samples.
If the value of $\lambda$ is small, it is suitable for datasets with many noise points, because noisy nodes and connections can be found and deleted in time. Conversely, a large $\lambda$ is suitable for datasets with fewer noise points. When the dataset structure is more complex, the smaller $\lambda$ is, the more accurate the topology. Generally, the value of $\lambda$ is between 50 and 100.
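The following condensed Python sketch illustrates one learning step as described above (winning neurons, similarity thresholds, insert-or-fuse decision, and the distance-adjusted weight update). The class layout and names are illustrative assumptions, and the edge/age bookkeeping is reduced to the parts needed to show the update rule; it is not the full T-SOINN implementation.

```python
import numpy as np

class SimpleSOINN:
    """Minimal illustration of one SOINN learning step (not the full algorithm)."""
    def __init__(self):
        self.W = []         # neuron weight vectors
        self.M = []         # win counts per neuron
        self.edges = set()  # undirected connections {(i, j)}

    def _neighbors(self, i):
        return [j for a, b in self.edges for j in (a, b) if i in (a, b) and j != i]

    def _threshold(self, i):
        """Similarity threshold: max intra-class distance if i has neighbors,
        otherwise min distance to any other neuron."""
        nbrs = self._neighbors(i)
        others = nbrs if nbrs else [j for j in range(len(self.W)) if j != i]
        dists = [np.linalg.norm(self.W[i] - self.W[j]) for j in others]
        return max(dists) if nbrs else min(dists)

    def learn_one(self, x):
        x = np.asarray(x, dtype=float)
        if len(self.W) < 2:                        # bootstrap with the first two samples
            self.W.append(x); self.M.append(1)
            return
        d = [np.linalg.norm(x - w) for w in self.W]
        s1, s2 = np.argsort(d)[:2]                 # first and second winning neurons
        if d[s1] > self._threshold(s1) or d[s2] > self._threshold(s2):
            self.W.append(x); self.M.append(1)     # x is too far: insert a new neuron
            return
        self.edges.add((min(s1, s2), max(s1, s2))) # both winners activated: connect them
        self.M[s1] += 1
        # Distance-adjusted learning rate: 1 / (M_s1 * (1 + ||x - W_s1||))
        eta = 1.0 / (self.M[s1] * (1.0 + d[s1]))
        self.W[s1] = self.W[s1] + eta * (x - self.W[s1])

net = SimpleSOINN()
for point in np.random.RandomState(0).rand(200, 4):
    net.learn_one(point)
print(len(net.W), "neurons,", len(net.edges), "edges")
```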
The dataset after three-way attribute reduction is used for SOINN incremental clustering, and the T-SOINN algorithm is obtained. The detailed steps of T-SOINN are shown in Algorithm 2.
Algorithm 2 T-SOINN algorithm
Input: Dataset $S$, decision table $DT = (U, C \cup D, V, f)$, three-way attribute selection thresholds $(\alpha, \beta)$
Output: The set of neurons $N$ and the set of connection relations $E$
1: Algorithm 1 is used to reduce the attributes of the original dataset and obtain $S'$ containing $k/2$ attributes
2: Initialize the set of neurons within a single competitive learning cycle, $N = \{c_1, c_2\}$, and the set of neuronal connection relations $E = \emptyset$
3: Input $x$, and compute the winning neurons $s_1$ and $s_2$
4: $s_1 = \arg\min_{c \in N} \lVert x - W_c \rVert$, $s_2 = \arg\min_{c \in N \setminus \{s_1\}} \lVert x - W_c \rVert$
5: $W_c$ denotes the weight of neuron $c$
6: Let the set of neighbor neurons of $c$ be $N_c$
7: Calculate $d_1 = \lVert x - W_{s_1} \rVert$, $d_2 = \lVert x - W_{s_2} \rVert$, and the similarity thresholds $T_{s_1}$ and $T_{s_2}$ of the winning neurons $s_1$ and $s_2$
8: If $N_c \neq \emptyset$ then
9:   $T_c = \max_{i \in N_c} \lVert W_c - W_i \rVert$
10: Else
11:   $T_c = \min_{i \in N \setminus \{c\}} \lVert W_c - W_i \rVert$
12: End if
13: If $d_1 > T_{s_1}$ or $d_2 > T_{s_2}$
14:   $N = N \cup \{x\}$ (insert $x$ as a new neuron)
15:   If there is no connection between $s_1$ and $s_2$
16:     $E = E \cup \{(s_1, s_2)\}$
17:     $age_{(s_1, s_2)} = 1$
18:   Else
19:     Reset $age_{(s_1, s_2)}$
20:   End if
21:   $age_{(s_1, i)} = age_{(s_1, i)} + 1$ for every neighbor $i$ of $s_1$
22:   $Err_{s_1} = Err_{s_1} + \lVert x - W_{s_1} \rVert$
23: Else
24:   $d_1 \leq T_{s_1}$ and $d_2 \leq T_{s_2}$, and $x$ is fused into the topology
25:   $W_{s_1} = W_{s_1} + \frac{1}{M_{s_1}(1 + \lVert x - W_{s_1} \rVert)}(x - W_{s_1})$
26:   $M_{s_1} = M_{s_1} + 1$, where $M_{s_1}$ is the number of wins of node $s_1$
27: End if
28: If the number of input samples is an integer multiple of $\lambda$
29:   Perform intra-class node insertion and denoising operations
30: End if
31: If the input has not ended
32:   Return to step 3 to continue processing the next sample
33: End if
34: Stop running the algorithm, and output the set of neurons $N$ and the set of connection relations $E$
The symbols used in the algorithms and their meanings are shown in
Table 1.
5. Experimental Results and Analysis
5.1. Experimental Dataset
KDD CUP99 is selected as the experimental dataset. The KDD-CUP99 dataset contains network connection data collected from a simulated Air Force local area network. This dataset has been widely used in experiments with different intrusion detection models and is a classical benchmark for intrusion detection and machine learning research [33].
5.2. Experimental Environment and Evaluation Indicators
The experimental hardware environment was a Windows 10 operating system with an AMD Ryzen 5 3500U CPU and 8 GB of memory. The software environment was Python 3.6 with the Keras deep learning framework based on TensorFlow, which was used to study the performance of the T-SOINN model.
Since the experimental results fluctuate only within a small range, the average of ten experimental runs is taken as the final result for comparison across the different experiments. Precision, recall, F1-score, false positive rate (FPR), and false negative rate (FNR) were selected as the evaluation indicators of intrusion detection. For an intrusion detection model, the higher the precision and recall and the lower the false positive and false negative rates, the better the performance.
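For reference, these indicators can be computed from the binary confusion matrix as follows; this is a generic sketch with illustrative counts, not the evaluation code used in the experiments.

```python
def detection_metrics(tp, fp, tn, fn):
    """Precision, recall, F1, false positive rate, and false negative rate
    from binary confusion-matrix counts (attack = positive class)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)          # also called the detection rate
    f1 = 2 * precision * recall / (precision + recall)
    fpr = fp / (fp + tn)             # normal traffic wrongly flagged as attack
    fnr = fn / (fn + tp)             # attacks missed by the detector
    return {"precision": precision, "recall": recall, "f1": f1,
            "FPR": fpr, "FNR": fnr}

print(detection_metrics(tp=950, fp=20, tn=980, fn=50))
```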
5.3. Data Preprocessing
Some of the features in the dataset are of the string type, and symbolic feature digitization is needed to convert these non-numeric features into numeric identifiers [33].
The KDD-CUP99 dataset contains properties with character values. For example, the protocol_type attribute has values for TCP, UDP, and ICMP, so it is necessary to convert these properties to numbers. Since the calculation method of attribute significance based on a rough set requires that the attribute values are discrete values, K-means clustering is used to discretize the data.
The 32 continuously valued attributes of the original dataset are visualized separately to obtain the distribution of each attribute and thus to determine the number of clusters for each attribute. Figure 6 shows the distribution of the attribute dst_host_srv_serror_rate. The K value of each continuous attribute is determined from the visualization results, and then the K-means clustering discretization is carried out.
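A small sketch of the two preprocessing steps (symbol digitization and K-means discretization of a continuous attribute) using scikit-learn is given below; the toy column values and the cluster count are illustrative, with K chosen per attribute from the visualized distributions as described above.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.cluster import KMeans

# Toy frame standing in for two KDD-CUP99 columns
df = pd.DataFrame({
    "protocol_type": ["tcp", "udp", "icmp", "tcp"],
    "dst_host_srv_serror_rate": [0.0, 0.03, 0.97, 1.0],
})

# 1) Symbol digitization: map string-valued features to integer identifiers
df["protocol_type"] = LabelEncoder().fit_transform(df["protocol_type"])

# 2) K-means discretization: replace a continuous attribute by its cluster index
k = 2   # chosen per attribute from its visualized distribution
values = df[["dst_host_srv_serror_rate"]].to_numpy()
df["dst_host_srv_serror_rate"] = KMeans(n_clusters=k, n_init=10,
                                        random_state=0).fit_predict(values)
print(df)
```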
5.4. Experimental Comparison
5.4.1. Parameter Analysis
The importance of all attributes is calculated, and the importance of some attributes is shown in
Table 2.
According to the threshold pair $(\alpha, \beta)$, the attribute set is divided into a positive region, a negative region, and a boundary region. The differences in the importance of the attributes in Table 2 are relatively large. The range of attribute importance values is divided into several equidistant intervals, and the average attribute importance in each interval is taken as a candidate threshold value for $\alpha$ and $\beta$, where $\alpha > \beta$. Due to the randomness of the three-way attribute reduction method, the selected attribute subset differs for different $(\alpha, \beta)$, and the model detection results also change.
The changes in precision and recall under different $\alpha$ and $\beta$ are shown in Figure 7 and Figure 8, respectively. At the best-performing threshold pair, the precision and recall of the model are optimal; at the worst-performing threshold pair, they are the lowest. In the latter case, the original attributes are divided into three parts, and the part with the lowest attribute importance contains the largest number of attributes, so many low-importance attributes are chosen during random selection and the overall precision and recall of the model are low, indicating that a large number of low-importance attributes reduces the detection performance.
The changes in FPR and FNR under different $\alpha$ and $\beta$ are shown in Figure 9 and Figure 10, respectively. The experiments show that, at the threshold pair for which the precision and recall of the model are optimal, the FPR and FNR are also the lowest, and the model reaches its best performance. At this setting, most attributes were selected from the positive domain and the boundary domain, while few attributes were selected from the negative domain. That is, the more high-importance attributes are selected, the better the detection performance of the T-SOINN model.
With the above threshold values fixed, the data subset after attribute reduction was passed into SOINN for training. The value of $\lambda$ was varied between 50 and 100 and the value of $age_{max}$ between 100 and 200, with each range divided equally: $\lambda$ took values of 50, 60, 70, 80, 90, and 100, and $age_{max}$ took values of 100, 120, 140, 160, 180, and 200.
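The parameter grid described above can be enumerated programmatically as sketched below; evaluate_t_soinn is a hypothetical placeholder for training T-SOINN with a given $(\lambda, age_{max})$ pair and returning its precision and recall, not the implementation used in these experiments.

```python
from itertools import product

lambdas = [50, 60, 70, 80, 90, 100]          # learning-cycle lengths (lambda)
age_maxes = [100, 120, 140, 160, 180, 200]   # maximum edge ages

def evaluate_t_soinn(lam, age_max):
    """Hypothetical stub: train T-SOINN with (lam, age_max) and return (precision, recall)."""
    return 0.0, 0.0

# Pick the combination with the best precision + recall over the grid
best = max(product(lambdas, age_maxes), key=lambda p: sum(evaluate_t_soinn(*p)))
print("best (lambda, age_max):", best)
```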
With different values of $\lambda$ and $age_{max}$, the changes in precision and recall of the model are shown in Figure 11 and Figure 12, and the changes in FPR and FNR are shown in Figure 13 and Figure 14, respectively. According to Figure 11, Figure 12, Figure 13 and Figure 14, at the best combination of $\lambda$ and $age_{max}$, the precision and recall are the highest and the FPR and FNR are the lowest. The values of $(\alpha, \beta)$, $\lambda$, and $age_{max}$ used in the following experiments were determined from this analysis.
5.4.2. Analysis of Experimental Results
The parameter values were obtained from the above analysis. The changes in the precision, recall, false positive rate, and false negative rate of the model as the stream data arrive incrementally are shown in Figure 15. With incremental data input, the precision and recall gradually show an upward trend after some fluctuations, while the FPR and FNR show a downward trend. In other words, when the amount of training data is small, the detection results fluctuate considerably, while the detection performance gradually stabilizes as the amount of training data increases.
The total number of nodes changes constantly during the training of the T-SOINN model. When the difference between a new sample and the data previously learned by the T-SOINN model is too large, that is, $d_1 > T_{s_1}$ or $d_2 > T_{s_2}$, a new node is inserted and the total number of nodes increases. After every $\lambda$ samples, intra-class insertion and denoising operations are applied to the entire SOINN structure. The intra-class node insertion operation selects the two nodes with the largest quantization error among all nodes and inserts a new node between them to reduce the quantization error of the neurons, which increases the total number of nodes. The denoising operation removes the noise and outlier nodes in the network structure together with their associated connections, which reduces the total number of nodes. The growth changes in the nodes and edges are shown in Figure 16. Figure 16a shows the change in the total number of nodes during the topology adjustment process as incremental data are input, and Figure 16b shows the corresponding change in the total number of connected edges.
As new data are input, the winning neurons $s_1$ and $s_2$ are activated simultaneously and connections are created between them. When the age of a connection exceeds $age_{max}$, the connection is removed and the number of connection edges decreases. In the denoising operation performed after every $\lambda$ samples, the noise and outlier points in the network structure and their related connections are deleted, and the number of connection edges decreases further. The changes in the numbers of deleted nodes and edges are shown in Figure 17. Figure 17a shows the change in the total number of deleted nodes during the topology adjustment process as incremental data are input, and Figure 17b shows the corresponding change in the total number of deleted edges.
In order to verify the experimental effect, the T-SOINN model is compared with SOINN and LASSO-SOINN on the KDD-CUP99 dataset. Like three-way attribute reduction, LASSO is a dimensionality reduction method; it is therefore combined with the SOINN algorithm to obtain LASSO-SOINN, which is applied to the same dataset. The performance comparison results of the three algorithms are shown in
Table 3.
When only the SOINN algorithm is used on the KDD-CUP99 dataset, the precision and recall are 97.45% and 97.73%, respectively. Compared with SOINN, the precision and recall increase by 0.31% and 0.26%, respectively, when LASSO is used for dimensionality reduction. When three-way attribute reduction is used for dimensionality reduction, the precision and recall improve by 1.98% and 1.48%, respectively, which not only significantly improves the performance of SOINN but also outperforms LASSO. Similarly, with three-way attribute reduction, the FPR and FNR are not only significantly reduced but also lower than those obtained with LASSO. This demonstrates the effectiveness of three-way attribute reduction for optimizing the SOINN algorithm and shows that T-SOINN is a good method for processing intrusion detection data.
6. Conclusions and Prospect
In view of the large scale of intrusion detection data, which cannot be completely collected all at once, the T-SOINN incremental intrusion detection model was proposed. Through three-way attribute reduction, attribute importance was used as the evaluation function to compute the attribute subset and achieve data dimensionality reduction. After receiving the dimensionality-reduced data, SOINN generated a topological representation of the original data and achieved incremental learning by dynamically adjusting the topological structure. The results show that the precision and recall of the T-SOINN model are effectively improved.
In the future, we will further consider optimizing the inter-class insertion operation, introduce three-way decisions into the inter-class insertion judgment, and re-input uncertain data into the topological structure after combining them with more information, so as to further improve the overall performance of the model. In addition, the comparison of the similarity threshold with the Euclidean distance is relatively independent and introduces some errors, so in future work we will continue to enrich and improve this comparison process.