4.2. Experimental Dataset
This section reviews the intrusion detection datasets used to evaluate the performance of the IDS proposed in this study. Several types of publicly available intrusion detection datasets are used to evaluate IDSs. Datasets can be classified according to whether they contain complete packets, real data, zero-day attacks, or modern attacks [33]. The KDDCUP-99 and NSL-KDD datasets produced by DARPA are widely used to study anomaly-based IDSs [34]. These datasets have had an enormous impact on IDS research. However, more than 20 years after their publication, they are no longer suitable for system evaluation because of their outdated attack types and their lack of attack-data complexity and diversity [35]. The University of New Brunswick (UNB) has also published datasets containing the latest attacks, such as CICIDS2017, on the grounds that older datasets are unsuitable due to their limited traffic diversity and volume, their lack of anonymized packet information and payloads, and their limited variety of attacks [16].
Although many protocols are used for data communication in the IoT environment, market research shows that the MQTT protocol is the most frequently used [36]. Many IoT IDS studies have presented systematic approaches to IoT security; still, performance measurements showed reliability problems when old datasets unrelated to the IoT were used [18,19,20,21]. Therefore, to reliably measure the performance of IoT IDSs, it is necessary to include an MQTT dataset in the experiment.
In this study, we conduct experiments using two datasets, CICIDS2017 and MQTTset. CICIDS2017 is a reliable benchmark dataset that has been used in numerous existing IDS studies. MQTTset is a recent MQTT-based dataset intended for IoT IDSs. Experiments were first performed on each dataset independently. In addition, the datasets were merged and evaluated to confirm that the reinforcement learning algorithm was trained well enough to judge intrusions as the network environment changes. The characteristics of each dataset are as follows.
4.2.1. CICIDS2017
UNB released CICIDS2017 to compensate for the weaknesses of previously announced intrusion detection datasets: insufficient data volume, insufficient attack diversity, and the absence of the latest attacks. Reflecting recent network trends, CICIDS2017 contains seven attack types: brute force, DoS, heartbleed, web attack, infiltration, botnet, and distributed DoS (DDoS). CICIDS2017 was built according to 10 criteria for data reliability [37].
Table 4 lists the number of data samples in CICIDS2017 by type. Of the 2,830,743 data samples, about 80% are normal and about 20% are attacks. In addition, because the seven attack types comprise 14 specific attack subtypes, the dataset is suitable for judging the performance of IDSs across various attack types.
4.2.2. MQTTset
The MQTT protocol is one of the most frequently used protocols in the IoT. Although many IoT IDS studies have been conducted, in the absence of an IoT-specific dataset, they have generally relied on the intrusion detection datasets used in general network IDS research, such as KDDCUP-99, NSL-KDD, and CICIDS2017. However, various IoT-based intrusion detection datasets have recently been released as the number of IoT networks has increased. In this study, system performance is measured using MQTTset, the most widely used IoT dataset.
MQTTset was built using IoT-Flock [38], a network traffic generation tool that can emulate IoT devices and networks based on the MQTT and CoAP protocols. It uses sensors for temperature, light intensity, humidity, motion, smoke, door opening, fan status, etc., to communicate data according to 10 scenarios, and it contains both normal and attack data.
Table 5 lists the number of data observations in MQTTset by type. The SlowITe attack is a new type of DoS attack that exploits an MQTT vulnerability in the IoT [39].
The attack types in the two datasets take very different forms. Because the kinds of attacks differ substantially between the datasets, the two can be combined to cover a wide range of attacks. In addition, unlike the CICIDS2017 dataset, which mainly consists of TCP and UDP traffic, the MQTTset dataset consists of MQTT and CoAP traffic, so a greater variety of protocols can be learned and used to detect a greater diversity of attacks. Using a merged dataset does not, of course, guarantee that new types of attacks will be detected; however, high performance on a complex dataset suggests that new types of attacks may be detected. As such, a merged dataset is very useful for evaluating the performance of an IDS.
4.5. Experimental Method
The experiment was performed as shown in Figure 4, where the reinforcement learning agent is a PPO agent and the environment is the object from which the agent learns. In this experiment, the environment is the IDS composed of the DNN feature extractor and the k-means clustering module. The agent updates, as the policy for the environment, the hidden-layer model, the number of features to be extracted by the DNN feature extractor, and the number of k-means clusters. The environment performs intrusion detection based on the policy and sends the resulting value, the F1-score, to the agent. The agent improves the performance of the IDS by iteratively evaluating and improving the policy based on the F1-score.
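The agent–environment control loop described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the random policy sampling stands in for PPO's policy network, and `evaluate_policy` stubs out the real DNN/k-means IDS with a placeholder score; the parameter ranges in `POLICY_SPACE` are assumptions based on the values reported later in the section.

```python
import random

# Hypothetical policy space: hidden-layer count, activation function,
# number of extracted features, and number of k-means clusters.
POLICY_SPACE = {
    "hidden_layers": range(2, 11),
    "activation": ["relu", "tanh", "selu", "elu"],
    "num_features": range(2, 21),
    "num_clusters": [2, 8, 64, 128, 512],
}

def sample_policy(rng):
    """Agent proposes a policy (random here; the paper uses PPO)."""
    return {k: rng.choice(list(v)) for k, v in POLICY_SPACE.items()}

def evaluate_policy(policy, rng):
    """Environment: would run feature extraction + k-means intrusion
    detection under this policy and return the F1-score. Stubbed."""
    return 0.85 + 0.1 * rng.random()

def control_loop(iterations=50, seed=0):
    """Iterate policy proposal and evaluation, tracking the best policy."""
    rng = random.Random(seed)
    best_policy, best_f1 = None, -1.0
    for _ in range(iterations):
        policy = sample_policy(rng)        # agent -> environment
        f1 = evaluate_policy(policy, rng)  # environment -> agent (reward)
        if f1 > best_f1:
            best_policy, best_f1 = policy, f1
    return best_policy, best_f1
```

A real PPO agent would replace the random sampler with a learned stochastic policy and update it from the F1-score reward, but the message flow between agent and environment is the same.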
Experiments were performed in three different environments. The first experiment used only the CICIDS2017 dataset, and the second used only the MQTTset dataset. Finally, the CICIDS2017 and MQTTset datasets were used as a merged set. The merge was performed by integrating the features extracted by the DNN feature extractor into one file. Whenever the agent sent a new number of features, the features were re-extracted and an integrated file was created. Because the two datasets contain different attack types, they were combined to verify that the proposed IDS with the reinforcement learning control algorithm could cover a wide range of attacks. Combining two datasets cannot represent all rapidly changing network environments; however, due to the diversity of attacks and normal data patterns in both datasets, the merged dataset could better reflect changes in the network environment than either dataset alone. PPO improves performance by iterating policy refinement, policy evaluation, and replay buffers until the appropriate performance level, which is set by the administrator, is reached.
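The merge step described above can be sketched as a simple concatenation of the two feature matrices produced by the DNN feature extractor; the function name is hypothetical, and writing the result to a single file (e.g., with `np.save`) is omitted for brevity. The key constraint is that both datasets must have been re-extracted with the same feature count requested by the agent.

```python
import numpy as np

def merge_extracted_features(feats_a, labels_a, feats_b, labels_b):
    """Merge DNN-extracted features from two datasets into one set.
    Both must have been extracted with the same number of features."""
    assert feats_a.shape[1] == feats_b.shape[1], "feature counts must match"
    features = np.vstack([feats_a, feats_b])       # stack rows of both sets
    labels = np.concatenate([labels_a, labels_b])  # keep labels aligned
    return features, labels
```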
In this experiment, the PPO reinforcement learning agent sets its hyperparameters according to the ranges listed in Table 8.
4.6. Experimental Evaluation
Table 9 lists the experimental results of measuring the performance of ID-HyConSys on the CICIDS2017 and MQTTset datasets. ID-HyConSys achieved an F1-score of 0.9707 on CICIDS2017 and an F1-score of 0.9973 on MQTTset. An F1-score of 0.9901 was obtained in the experiment in which the datasets were merged. The merged-dataset experiment provided excellent results despite the wide range of attack types and the increased complexity caused by the larger volume and greater number of protocols. Each experiment showed performance better than or similar to that of other studies [17,22,23,24]. On the CICIDS2017 dataset, the results were similar to those of the IDS using a DBN in [22]. On the other datasets, ID-HyConSys outperformed the results reported in other studies.
Additionally, the earlier version of ID-HyConSys published in a previous paper [15] achieved an F1-score of 0.96552 on CICIDS2017; the revised version improved this to 0.9707 by adjusting the hidden layer. Because ID-HyConSys is designed to respond flexibly to changes in the network environment, it is expected to be useful in real environments.
Figure 5, Figure 6 and Figure 7 show the training results of ID-HyConSys over 2000 iterations for each dataset. Figure 5 shows the performance measured on the CICIDS2017 dataset, with the F1-score curve rising from a minimum of 0.8799 to a maximum of 0.9707. Figure 6 shows the results on the MQTTset dataset, with the F1-score rising from a minimum of 0.9736 to a maximum of 0.9973. Finally, Figure 7 shows the merged-dataset results, with the F1-score rising from a minimum of 0.8747 to a maximum of 0.9901. The red dotted line in each figure is a logarithmic trend line, which shows a gentle upward curve in all graphs. The F1-score fluctuates due to the exploratory reinforcement learning process; however, as training proceeds, the F1-score increases and the fluctuation range decreases, indicating stabilization. Because PPO pursues stability when updating the policy, a very stable F1-score is expected with sufficient training.
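A logarithmic trend line of the kind drawn in these figures can be obtained by fitting F1 ≈ a·ln(t) + b to the training curve, with t the 1-based iteration index. A minimal sketch (the fitted curve in the usage below is synthetic, not the experimental data):

```python
import numpy as np

def log_trend(f1_scores):
    """Fit a logarithmic trend line f1 ~ a*ln(t) + b to a training curve.

    A positive slope a indicates the F1-score is still rising; the
    flattening of ln(t) mirrors the stabilization seen late in training.
    """
    t = np.arange(1, len(f1_scores) + 1)
    a, b = np.polyfit(np.log(t), f1_scores, deg=1)  # least-squares fit
    return a, b
```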
Table 10 and Table 11 show the results of analyzing the models with the best and worst performance on each dataset. Like the number of extracted features, changes in the hidden-layer model and in the number of clusters affected intrusion detection performance.
In the case of CICIDS2017, the best performance was achieved when four features were extracted from a 10-layer hidden-layer model with SELU and ELU activation functions and then grouped into 512 clusters. The worst performance occurred when four features were extracted from a four-layer hidden-layer model using the ReLU activation function and then grouped into eight clusters.
In the case of MQTTset, the best performance was achieved when 12 features were extracted from a seven-layer hidden-layer model using the ReLU activation function and then grouped into 64 clusters. The worst performance occurred when four features were extracted from a four-layer hidden-layer model using the TanH activation function and then grouped into two clusters.
In the case of the merged dataset, the best performance was achieved when 14 features were extracted from a four-layer hidden-layer model using the TanH activation function and then grouped into 128 clusters. The worst performance occurred when 17 features were extracted from a six-layer hidden-layer model using the ReLU activation function and then grouped into eight clusters.
Looking at the three models, CICIDS2017 contains many attack types and highly complex data, so clustering with many clusters is required to achieve good performance. A large number of clusters may increase the intrusion detection time, but reducing the number of clusters does not significantly degrade performance. Therefore, the number of clusters can be reduced somewhat when data traffic is heavy and fast processing is required.
In the case of MQTTset, the optimal model can easily be found because the performance change is small according to the model change. In addition, because the number of clusters in the model having the best performance is not large, intrusion detection is expected to proceed quickly.
The merged dataset shows an extensive range of fluctuation between the best and worst performance compared with CICIDS2017 and MQTTset. Additionally, the worst performance on the merged dataset is worse than that on CICIDS2017 or MQTTset due to the increased complexity. Because performance varies widely across model configurations, the hyperparameters will likely need to be fine-tuned to find the optimal model. However, the merged-dataset experiment confirmed that high performance could be obtained, aided by the wide output range of TanH, despite the increased complexity.
To improve the performance of the IDS, we adjusted the hidden-layer model, the number of output features of the DNN feature extractor, and the number of clusters used by the k-means clustering algorithm. Performance generally improves when a large number of k-means clusters is used, but the intrusion detection speed may be affected by the number of clusters; therefore, an appropriate number can be selected according to the computing power of the IDS. The configuration of the hidden layers and the number of output features of the DNN feature extractor are important factors, and appropriately adjusting them through PPO makes it possible to build an IDS that can respond quickly to a changing network environment.
The DNN assigns weights to the input data according to the configuration of the hidden layers and applies a non-linear activation function to produce its output. A general DNN improves its model through error backpropagation based on repeated learning against preset ground-truth labels, but the DNN feature extractor used in this study has no ground truth for the extracted features. Therefore, it does not improve the model through functions such as error backpropagation; instead, the performance of the model itself is judged and adjusted through PPO. Although the DNN feature extractor plays a dimensionality-reduction role, adjusting the number of features to an appropriate level, it is structurally similar to data-generating models such as the auto-encoder and the GAN. The DNN feature extractor creates new features from the input network data, and intrusions are detected by applying k-means to the newly created features.
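A feature extractor of this kind can be sketched as a forward-only network built from the policy parameters the agent chooses; there is no backpropagation, since the configuration itself is what PPO adjusts. The hidden-layer width of 32 and the random initialization are assumptions for illustration and are not specified in the paper.

```python
import numpy as np

# Activation functions matching those named in the experiments.
ACTIVATIONS = {
    "relu": lambda x: np.maximum(x, 0.0),
    "tanh": np.tanh,
    "elu": lambda x: np.where(x > 0, x, np.exp(x) - 1),
    "selu": lambda x: 1.0507 * np.where(x > 0, x, 1.67326 * (np.exp(x) - 1)),
}

def build_extractor(input_dim, num_layers, activation, num_features, seed=0):
    """Build a forward-only DNN feature extractor from policy parameters.

    The layer count, activation, and output feature count are the values
    the PPO agent sets; weights are randomly initialized, not trained.
    """
    rng = np.random.default_rng(seed)
    dims = [input_dim] + [32] * (num_layers - 1) + [num_features]
    weights = [rng.standard_normal((d_in, d_out)) / np.sqrt(d_in)
               for d_in, d_out in zip(dims[:-1], dims[1:])]
    act = ACTIVATIONS[activation]

    def extract(x):
        """Map raw network data to the reduced feature space."""
        for w in weights:
            x = act(x @ w)
        return x

    return extract
```

For example, `build_extractor(78, 4, "relu", 14)` would map 78 raw flow features to the 14 extracted features that are then handed to k-means.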
If we analyze the structure of the model that achieves the best performance on each dataset, the number of clusters is generally set very high. Increasing the number of clusters significantly affects the learning time, but it does not hinder real-time detection. When the number of clusters increases during training, separating the training data and partitioning the space takes a long time. However, the number of clusters barely affects real-time detection, because detection mode only processes the extracted data as a vector and checks whether it falls into an attack or normal region of the already partitioned space. However, as the number of clusters increases, more memory is needed; therefore, in terms of memory efficiency, the number of clusters should be adjusted to an appropriate level according to the computing power of the IDS.
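The split between the expensive training-time partitioning and the cheap detection-time lookup can be illustrated with a plain Lloyd's k-means sketch. The majority-vote rule for marking each cluster region as attack or normal is an assumption about how the regions are labeled; the deterministic centroid initialization is also for illustration only.

```python
import numpy as np

def kmeans_fit(X, k, iters=20):
    """Training: Lloyd's k-means partitions the feature space (slow part)."""
    idx = np.linspace(0, len(X) - 1, k).astype(int)  # simple deterministic init
    centroids = X[idx].astype(float).copy()
    for _ in range(iters):
        # Assign each point to its nearest centroid, then recompute means.
        assign = np.argmin(((X[:, None] - centroids) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(assign == j):
                centroids[j] = X[assign == j].mean(axis=0)
    return centroids, assign

def label_clusters(assign, y, k):
    """Mark each cluster region as attack (1) or normal (0): majority vote
    over the training labels falling in that region (assumed rule)."""
    return np.array([int(round(y[assign == j].mean())) if np.any(assign == j)
                     else 0 for j in range(k)])

def detect(x, centroids, cluster_labels):
    """Detection: a single nearest-centroid lookup per extracted vector,
    so the cluster count barely affects real-time detection speed."""
    j = np.argmin(((centroids - x) ** 2).sum(-1))
    return cluster_labels[j]
```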