Improved Resnet Model Based on Positive Traffic Flow for IoT Anomalous Traffic Detection

Li, Qingfeng; Liu, Yaqiu; Niu, Tong; Wang, Xiaoming

doi:10.3390/electronics12183830

¹ Network Information Center, Northeast Forestry University, Harbin 150040, China

² College of Computer and Control Engineering, Northeast Forestry University, Harbin 150040, China

* Authors to whom correspondence should be addressed.

Electronics 2023, 12(18), 3830; https://doi.org/10.3390/electronics12183830

Submission received: 20 August 2023 / Revised: 7 September 2023 / Accepted: 8 September 2023 / Published: 10 September 2023

(This article belongs to the Section Computer Science & Engineering)

Abstract

The Internet of Things (IoT) has been highly appreciated by several nations and societies as a worldwide strategic developing sector. However, IoT security is seriously threatened by anomalous traffic in the IoT. Therefore, creating a detection model that can recognize such aberrant traffic is essential to ensuring the overall security of the IoT. We outline the main approaches that are used today to detect anomalous network traffic and suggest a Resnet detection model based on fused one-dimensional convolution (Conv1D) for this purpose. Our method combines one-dimensional convolution and a Resnet network to create a new network model. This network model improves the residual block by including Conv1D and Conv2D layers for two-dimensional convolution. This change enhances the model’s ability to identify aberrant traffic by enabling the network to extract feature information from one-dimensional linearity and two-dimensional space. The CIC IoT Dataset from the Canadian Institute for Cybersecurity Research was used to assess the effectiveness of the proposed enhanced residual network technique. The outcomes demonstrate that the algorithm performs better at identifying aberrant traffic in the IoT than the original residual neural network. The accuracy achieved can be as high as 99.9%.

Keywords: deep learning; anomalous traffic detection; IoT; feature engineering

1. Introduction

IoT technology [1] has advanced significantly in recent years, and its industrial chain has developed steadily. IoT technology is now used extensively across industries, and its integration with infrastructure expansion, industrial transformation, and consumer upgrading has driven rapid growth of the IoT industry as a whole. The arrival of 5G has further transformed the IoT [2] by extending communication beyond people to the vast network of interconnected devices and systems known as the Internet of Everything. However, the rapid growth of the IoT also raises distinct challenges, particularly in security. The IoT needs a practical framework capable of detecting anomalous network activity. This is particularly important because most IoT devices have limited computational capabilities, which leaves them vulnerable to malicious attacks that can degrade service quality or expose sensitive information.

Machine and deep-learning-based approaches [3] have been widely used for abnormal traffic identification in the IoT as artificial intelligence technologies gain prominence. The efficacy of feature engineering is significant for the approaches based on machine learning [4,5]. Nevertheless, IoT devices can see and gather external data, which is facilitated by a diverse array of sensors. Every sensor functions as a source of information, and various types of sensors collect information that varies in substance and format. Therefore, extracting network traffic data presents a substantial obstacle for detection algorithms. Moreover, the IoT is a pervasive network that relies on the Internet as its fundamental infrastructure and core technology. By integrating diverse wired and wireless networks with the Internet, IoT enables the precise and real-time delivery of object information. This entails transmitting the data collected by IoT sensors at regular intervals to designated devices via the network. However, the extraction of features from traffic in IoT devices poses a significant challenge due to the limited processing capacity of these devices [6], and the large volume of data involved could overload the devices. Additionally, if the feature extraction model is implemented in a data center or cloud server, data transmission becomes a complex issue. On the other hand, deep learning has recently arisen as a novel approach to learning, characterized by its robust learning skills, capacity to adapt to changing settings, and lack of reliance on manual feature engineering [7,8]. This approach has effectively mitigated and resolved several challenges often encountered with conventional methodologies [9].

The Spring 2022 State of IoT study, published by IoT Analytics, reported that the worldwide number of IoT devices grew by 8% in 2021, reaching 12.2 billion units. This growth rate was notably lower than in preceding years, primarily because of the limited availability of semiconductor chips. Most traditional detection algorithms require a training dataset that contains a large number of varied negative (malicious) examples. Given the vast range of IoT devices, this paper posits that acquiring a complete collection of negative samples is a significant challenge.

This study takes a different approach and trains the model using only positive (normal) traffic. After training, the training data were fed into the model for detection, and the lowest confidence index among the correctly classified samples of each class was recorded. This index was then used as the threshold value for that class throughout fine-tuning. Traffic data with a confidence index lower than the threshold of its corresponding class were classified as negative, that is, as malicious traffic. This study presents a new network model that extracts feature information only from positive traffic. The proposed model employs one-dimensional convolution to extract temporal information from time series data and two-dimensional convolution to extract spatial information, and it combines the two to enhance overall performance. This extensive feature extraction allows the model to assign high confidence to regular traffic patterns and to detect malicious traffic instances accurately. The approach also reduces the amount of input data needed while improving the detection capabilities of the model. The contributions of this study may be briefly summarized as follows.

(1) Our proposal entails using normal traffic to train the model. The proposed model can detect malicious network traffic without including negative samples during training. This approach effectively mitigates many issues arising from the inadequate collection of complete negative data and the inherent imbalance between positive and negative samples.

(2) In this study, we provide a novel network model. The extraction of sequence information is achieved through one-dimensional convolution, while spatial information extraction is accomplished through two-dimensional convolution. These convolutional operations are combined to enable the model to comprehensively extract feature information from the data. This allows the model to achieve a high level of recognition for normal traffic, resulting in an increased confidence index.

(3) This study uses only the first 32 bytes of each packet's data in the Pcap files, after skipping the 24-byte global file header and removing each 16-byte packet header. Because such a short prefix rarely needs padding, this choice mitigates the impact of data filling, cleansing, and other operations that may distort the real data. Additionally, this approach effectively reduces the length of the input data.

The subsequent sections of this manuscript are as follows: Section 2 provides an overview of the existing literature and research in the field. Section 3 provides an overview of the data used in the experimental analysis and a detailed description of the suggested model for monitoring anomalous traffic in the context of the IoT. In Section 4, the practical methodology and findings are presented. Section 5 comprehensively summarizes the present study and outlines future research directions.

2. Related Works

In recent years, there has been a conspicuous rise in the prominence of deep learning as an emerging field of research. One of the primary advantages of deep learning is its capacity to autonomously acquire feature representations from raw data, thereby eliminating the need for manual feature engineering. This feature boosts the system’s proficiency in representing information and contributes to its overall resilience and flexibility. Table 1 presents the latest studies in this specific field, including the used datasets, methodology, and relevant particulars.

The OwLEye method, created by Yong et al. [10], utilizes a machine learning model to evaluate the malevolent score of online requests to identify web traffic. A threshold is established to ascertain whether a query constitutes a web assault. In their study, Ageev et al. [11] used fuzzy logic inference to detect abnormal traffic patterns in IoT networks. This was achieved by the analysis of stationary Poisson or self-similar traffic that is characteristic of these networks. In their study, Zhu et al. [12] presented Fed-SOINN, an attention-based federal incremental learning algorithm. The technique described in this study enhances the efficacy and speed of model optimization by using asynchronous updates on a central server. The authors in reference [13] used non-linear transformation and structural risk reduction to convert the Internet traffic categorization issue into a quadratic optimization problem. This methodology does not need the feature selection process and exhibits commendable stability and accuracy.

In their research, N. Islam et al. [14] investigated the application of intrusion detection systems (IDSs) in IoT environments. The study concentrated on decision tree (DT), random forest (RF), support vector machine (SVM), and deep machine learning techniques, including deep neural networks (DNNs), deep belief networks (DBNs), long short-term memory (LSTM), stacked LSTM, and bi-directional LSTM (Bi-LSTM). This research assessed the efficacy of shallow and deep machine learning methodologies using diverse variables, including accuracy, precision, recall, and F1 score. The study’s findings revealed that deep learning intrusion detection systems (IDSs) had superior performance in detecting IoT threats compared to shallow machine learning methods.

Abdel-Basset et al. [15] adopted the LocalGRU method to achieve local representation and utilized a multi-headed attention layer to gain global representation. An intrusion detection model was developed to analyze IoT traffic inside a fog computing environment. The experimental findings revealed that the model had a remarkable accuracy rate of 99.75% when evaluated on the UNSW-NB15 dataset. In their study, Putchala [16] introduced a novel multilayer IoT architecture that integrates deep learning techniques, namely long short-term memory (LSTM) and gated recurrent neural networks (GRUs). This suggested design aimed to achieve a lightweight implementation for IoT systems. The performance of the design architecture was assessed using the DARPA/KDD Cup 1999 intrusion detection dataset for every layer, resulting in an accuracy rate of 98.91%.

In research undertaken by Lopez-Martin et al. [17], characteristics were retrieved from the packet headers transmitted throughout the stream lifespan. A distinct attribute was developed for every data stream, whereby just the features of the packets were used, with the exclusion of IP addresses. The collected characteristics were used to investigate several model architectures, such as recurrent neural networks (RNNs), individual convolutional neural networks (CNNs), and hybrid combinations of CNNs and RNNs. The researchers obtained a peak accuracy of 96.3% by performing comparison studies. In contrast, M. B. Umair et al. suggested a classification method using multilayer deep learning [18], resulting in a notable accuracy rate of 99.23%. This classification method’s accuracy was superior to that of support vector machine (SVM)- and K-nearest neighbor (KNN)-based classification algorithms.

Atayero [19] proposed the development of DRNN and SMOTE-DRNN models using the Bot-IoT dataset. The models sought to mitigate the issue of class imbalance by using the synthetic minority oversampling method (SMOTE) to generate supplementary samples representing the minority class. Deep recurrent neural networks (DRNNs) enabled the models to acquire hierarchical feature representations from the balanced network traffic data, facilitating discriminative classification. The findings from the simulation indicate that the performance metrics of the DRNN model, including accuracy, recall, F1 score, AUC, GM, and MCC, were negatively impacted by the presence of a high-class imbalance within the dataset. The SMOTE-DRNN model demonstrated a notable accuracy rate of 99.50%.

Rezvy et al. [20] employed a deep self-coding dense neural network technique in their research to proficiently identify intrusions or threats inside 5G and IoT networks. The findings revealed a noteworthy improvement in the precision and efficiency of detection. In contrast, Sarika [21] and colleagues introduced a technique that employs deep self-encoders to identify potentially malicious network behaviors through IoT devices. Nevertheless, the methodology used by these researchers resulted in an accuracy rate that fell below 85% for both datasets.

This study presents a new methodology that extracts data features by combining one-dimensional and two-dimensional convolutions. The model is trained on normal traffic patterns and identifies atypical traffic instances that depart from the established norm.

3. System Modeling

The dataset used in this research will be discussed in this section, along with a comprehensive explanation of the residual network model.

3.1. Data Preparation

The dataset used in this study was gathered from the Canadian Institute for Cybersecurity Research (CIC), a comprehensive and interdisciplinary organization focused on training, research and development, and innovative endeavors. Its primary objective is to promote excellence and establish a position of authority in cybersecurity research, innovation, and education. Additionally, the CIC aims to assume a leadership role in driving change within the cybersecurity domain and to foster thought leadership while facilitating con.

The selection process included choosing the appropriate data from three distinct datasets: the Android malware dataset (CIC-AndMal2017) [22], the CIC IoT dataset 2022 [23], and the VPN–nonVPN traffic dataset (ISCXVPN2016) [24]. Each dataset was carefully tagged. Table 2 displays the data that were chosen for inclusion from each dataset.

The Pcap file format [25] is widely used for storing network traffic data. Each Pcap file begins with a global header, which is followed by one or more packet records. The global header, which is 24 bytes in size, is followed by packet records that contain a 16-byte packet header and the corresponding packet data. The data format of a Pcap file is illustrated in Figure 1.

Due to the substantial volume of packet traffic and the primary objective of this study to use less significant dimensional data to detect anomalies in IoT traffic, it is essential to segment and analyze the packets to obtain the requisite data.

In terms of data selection, since we needed to extract rich features from normal traffic, we needed to cover a wide range of traffic, and the presence of fewer repetitive packets could enable the model to better extract data features. ACK packets may be used for network intrusion, and we included ACK packets in the training so that when there was a situation where ACK packets were utilized for an attack, we were able to detect this type of attack. Moreover, our model is based on packet detection and does not need special treatment for disordered packets; therefore, we chose to use all the packets in Pcap for our experiments.

In the processing of the Pcap file, Wang et al. [26] chose the first 1000 bytes of TCP/UDP for processing, while Chen [27] intercepted the first 1480 bytes of the payload. In our study, we addressed the challenge of constantly changing types of malicious traffic and the difficulty of adequately collecting data on such traffic. To overcome this, we opted to focus on normal traffic and employed a confidence-level approach in developing our model. This approach enabled us to detect most types of malicious traffic, including emerging threats. Additionally, we observed that the length of the packet data varied significantly. Selecting an excessively long prefix would require padding the shorter packets, which could reduce the accuracy of the model's detection. Therefore, we chose to process only the first 32 bytes of the packet data, which also reduces the data dimensions. Each Pcap file consists of a collection of traffic data. After the global header of each Pcap file was read, the traffic samples inside the file were read in a loop. For each packet header and packet data pair, the 16-byte packet header was removed, the first 32 bytes of the packet data were converted to a binary format, and a numerical label was assigned to the traffic based on its type.
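The extraction steps described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' code. It assumes the classic Pcap layout described in this section (24-byte global header, 16-byte packet headers), interprets "binary format" as one bit per input value, and zero-pads packets shorter than 32 bytes; the function name `extract_samples` and both of these choices are assumptions.

```python
import struct

GLOBAL_HEADER_LEN = 24   # pcap global (file) header
PACKET_HEADER_LEN = 16   # per-packet record header
PAYLOAD_LEN = 32         # bytes of packet data kept per sample


def extract_samples(pcap_bytes, keep=PAYLOAD_LEN):
    """Return one list of bits per packet, built from its first `keep` data bytes."""
    magic = pcap_bytes[:4]
    # The magic number reveals the byte order of the header fields.
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"
    else:
        raise ValueError("not a classic pcap file")

    samples = []
    offset = GLOBAL_HEADER_LEN  # skip the global header
    while offset + PACKET_HEADER_LEN <= len(pcap_bytes):
        # Packet header fields: ts_sec, ts_usec, incl_len (bytes stored), orig_len.
        _, _, incl_len, _ = struct.unpack_from(endian + "IIII", pcap_bytes, offset)
        start = offset + PACKET_HEADER_LEN  # drop the 16-byte packet header
        data = pcap_bytes[start:start + incl_len]
        # Assumption: packets shorter than `keep` bytes are zero-padded.
        head = data[:keep].ljust(keep, b"\x00")
        # Assumption: "binary format" means most-significant-bit-first bits.
        bits = [(byte >> shift) & 1 for byte in head for shift in range(7, -1, -1)]
        samples.append(bits)
        offset = start + incl_len
    return samples
```

Each returned sample has 32 × 8 = 256 values, which is the reduced input length the text refers to; label assignment is left to the caller.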

3.2. Model Design

Numerous models using neural networks have been examined, including those using a fusion of one-dimensional convolution (sequence model) and two-dimensional convolution (spatial model). This section outlines the models generated in this study.

Sequence models (Figure 2) often receive sequence data as their input and successfully capture the relationship properties between sequences by structurally designing the network internals. These relational qualities are commonly outputted in sequence form [28]. The spatial model, as described in reference [29], typically operates on two-dimensional data and employs the convolution operation inside the design model to extract the two-dimensional properties of the data (Figure 3). Generally, the resulting data after each convolution operation remain in a two-dimensional format.

In the feature engineering part of this research, we used one-dimensional convolution to extract features from sequences and two-dimensional convolution to extract features from spatial data. In addition, we used multi-layer convolution to extract data characteristics comprehensively. Nevertheless, as the depth of the network increases, the objective function becomes more likely to converge to a local optimum. Furthermore, the vanishing gradient problem becomes increasingly pronounced as the number of layers increases, which leads to inadequate learning of the network parameters in the shallower layers. A residual unit creates a residual (skip) connection by adding the input of a block to its output. This allows information to pass through network shortcuts, preventing the gradient from vanishing after multiple nonlinear transformations. With residual connections, the gradient can be propagated back efficiently to shallower layers, which enables more effective gradient updates and improves training. Skip connections also allow information to pass directly between layers without loss or decay, so the network can retain and transfer important feature information and the deeper layers can learn and represent complex patterns. The construction of the residual network is shown in Figure 4, whereby the weight layer represents both the convolutional layer and the BN layer.

The residual network consists of a sequence of residual blocks, as seen in Figure 4. The residual block consists of two components: the direct mapping portion and the residual section. The function $h(x_l)$ represents the direct mapping, and the resulting curve is shown on the right side of Figure 4. The residual component, $F(x_l, W_l)$, often comprises two or three convolutional operations; it is the section of Figure 4 that contains the convolutions. With the identity mapping $h(x_l) = x_l$, a residual block can be expressed as:

$$x_{l+1} = x_l + F(x_l, W_l) \tag{1}$$

The feature $x_L$ of any deeper unit $L$ can be expressed by recursion, as shown below.

$$x_L = x_l + \sum_{i=l}^{L-1} F(x_i, W_i) \tag{2}$$

That is, the feature $x_L$ of any deep unit $L$ can be expressed as the feature $x_l$ of a shallower unit $l$ plus a residual function of the form $\sum_{i=l}^{L-1} F$. Similarly, for a unit of arbitrary depth $L$, its features can be described as:

$$x_L = x_0 + \sum_{i=0}^{L-1} F(x_i, W_i) \tag{3}$$

which is the sum of the outputs of all previous residual functions plus $x_0$. In the context of backpropagation, if we denote the loss function as $\varepsilon$, the application of the chain rule allows us to obtain:

$$\frac{\partial \varepsilon}{\partial x_l} = \frac{\partial \varepsilon}{\partial x_L} \frac{\partial x_L}{\partial x_l} = \frac{\partial \varepsilon}{\partial x_L} \left( 1 + \frac{\partial}{\partial x_l} \sum_{i=l}^{L-1} F(x_i, W_i) \right) \tag{4}$$

Expanding this equation yields two terms, $\frac{\partial \varepsilon}{\partial x_L}$ and $\frac{\partial \varepsilon}{\partial x_L} \frac{\partial}{\partial x_l} \sum_{i=l}^{L-1} F(x_i, W_i)$. The former ensures that the signal can be directly transmitted back to an arbitrary shallow layer, and Equation (4) therefore also ensures that the gradient does not vanish. Furthermore, in the convolutional network, the number of feature maps of $x_l$ may not be the same as that of $x_{l+1}$; in this case, it is necessary to use a 1 × 1 convolution between the two to raise or lower the dimensions. At this point, the residual block is represented as

$$x_{l+1} = h(x_l) + F(x_l, W_l) \tag{5}$$
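The "1 +" term in Equation (4) can be checked numerically on a toy network. The sketch below is only an illustration under stated assumptions: it uses scalar features and a hypothetical residual function $F(x, W) = W \tanh(x)$, neither of which comes from the paper, and it estimates both derivatives with central differences.

```python
import math


def residual(x, w):
    """Toy scalar residual function F(x, W) = w * tanh(x) (an assumption)."""
    return w * math.tanh(x)


def forward(x_l, weights):
    """Apply x_{i+1} = x_i + F(x_i, W_i) for each weight; return all activations."""
    xs = [x_l]
    for w in weights:
        xs.append(xs[-1] + residual(xs[-1], w))
    return xs


def gradient_terms(x_l, weights, eps=1e-6):
    """Return (dx_L/dx_l, d/dx_l of sum_i F(x_i, W_i)) via central differences."""
    def x_last(x):
        return forward(x, weights)[-1]

    def sum_f(x):
        xs = forward(x, weights)
        # zip stops at the last weight, pairing x_i with W_i for i = l..L-1.
        return sum(residual(xi, w) for xi, w in zip(xs, weights))

    d_total = (x_last(x_l + eps) - x_last(x_l - eps)) / (2 * eps)
    d_resid = (sum_f(x_l + eps) - sum_f(x_l - eps)) / (2 * eps)
    return d_total, d_resid
```

For any weights, `d_total` agrees with `1 + d_resid` up to numerical error. When the residual branches contribute almost nothing (very small weights), the derivative stays close to 1 rather than shrinking toward 0, which is the behaviour the shortcut path is meant to provide.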

The conventional residual network [30] is limited to extracting spatial characteristics from the input data. While this is practical for spatially structured data, such as images, additional considerations are necessary when dealing with data that exhibit typical time series characteristics, such as IoT traffic. This article proposes modifying the direct mapping of the residual block while keeping the residual component unchanged. This improvement aims to enhance the feature extraction process and reduce the loss of the model. The configuration of the altered residual network is shown in Figure 5.

At this point, its equation can be expressed as:

$$x_{l+1} = h(x_l, W_{1,l}) + F(x_l, W_{2,l}) \tag{6}$$
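The structure of Equation (6) can be illustrated with a small, dependency-free sketch in which the direct mapping $h$ is a one-dimensional convolution over the flattened input and the residual branch $F$ is a two-dimensional convolution over the same input arranged as a grid. This is a toy under several assumptions: a single channel, odd-sized kernels, zero padding, a single convolution per branch, no batch normalization, and a ReLU after the sum. The way the Conv1D output is reshaped and added is also an assumption, since Figure 5 is not reproduced here. It is not the authors' implementation, and all function names are hypothetical.

```python
def conv1d_same(seq, kernel):
    """1-D cross-correlation with zero padding; output length equals input length."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(seq) + [0.0] * pad
    return [sum(k * padded[i + j] for j, k in enumerate(kernel))
            for i in range(len(seq))]


def conv2d_same(grid, kernel):
    """2-D cross-correlation with zero padding; output shape equals input shape."""
    rows, cols = len(grid), len(grid[0])
    pr, pc = len(kernel) // 2, len(kernel[0]) // 2

    def at(r, c):
        return grid[r][c] if 0 <= r < rows and 0 <= c < cols else 0.0

    return [[sum(kernel[a][b] * at(r + a - pr, c + b - pc)
                 for a in range(len(kernel)) for b in range(len(kernel[0])))
             for c in range(cols)]
            for r in range(rows)]


def fused_block(grid, w1, w2):
    """x_{l+1} = relu(h(x_l, W1) + F(x_l, W2)) with a Conv1D shortcut h."""
    rows, cols = len(grid), len(grid[0])
    flat = [v for row in grid for v in row]          # read the grid as a sequence
    h = conv1d_same(flat, w1)                        # sequence (1-D) features
    h = [h[r * cols:(r + 1) * cols] for r in range(rows)]  # back to 2-D
    f = conv2d_same(grid, w2)                        # spatial (2-D) features
    return [[max(0.0, h[r][c] + f[r][c]) for c in range(cols)]
            for r in range(rows)]
```

With `w1 = [0, 1, 0]` the shortcut reduces to the identity mapping, so the block falls back to the form of Equation (1); learned values of `w1` let the shortcut carry sequence features as well.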

In the detection model, the last layer is the SoftMax layer, which utilizes the output value as a measure of confidence. Additionally, to address the issue of gradient vanishing/exploding [31,32], we included batch normalization [33] into our approach to expedite the training process. The process of batch normalization entails the normalization of each feature at the batch level during the training phase. This normalization comprises scaling the inputs to achieve a zero mean and unit variance. Subsequently, the normalized features are re-scaled, taking into account the whole of the training dataset. The mean and variance acquired at the batch level are replaced with the newly learned values. The residual blocks inside the network were substituted with the recently devised residual block constructions to generate a novel model, as seen in Figure 6.

4. Experiments and Results

This section begins by conducting performance evaluations of the enhanced model using experiments on the CIC-AndMal2017 dataset. Subsequently, the model was applied to the CIC IoT Dataset 2022 to assess its capacity for traffic classification and anomalous traffic detection in the IoT context. Furthermore, an examination was conducted to determine the model’s generalization capability, exploring its potential applicability in domains beyond IoT. The computations were executed on a laptop with a 12th Generation Intel(R) Core(TM) i9-12900H processor running at a clock speed of 2.50 GHz. The laptop had 16.0 GB of onboard RAM. The calculations were performed using the torch package in Python.

The most effective approach for evaluating the model's performance is to execute it inside an actual network environment. However, due to the high cost associated with this strategy, we assessed the network model by examining its performance on the test set. Accuracy (acc) is a widely used evaluation metric in classification models. It represents the proportion of samples correctly classified by the model. In the context of IoT anomalous traffic detection, accuracy provides an overview of the overall prediction results. Here, normal traffic is the positive class. Precision (P) is the proportion of traffic classified as normal that is actually normal. Recall (R) is the proportion of actual normal traffic that the model correctly classifies as normal. In IoT anomalous traffic detection, high precision means that little anomalous traffic is accepted as normal (few false positives), while high recall means that little normal traffic is flagged as anomalous (few false negatives). Both precision and recall are crucial metrics in this context. To provide a comprehensive evaluation of the model's performance in anomalous traffic detection, we used the F1 score, which combines precision and recall into a balanced assessment. With TP and TN denoting correctly classified positive and negative samples, FP denoting negative samples misclassified as positive, and FN denoting positive samples misclassified as negative, the metrics are formulated as follows:

$$\mathrm{accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{7}$$

$$P = \frac{TP}{TP + FP} \tag{8}$$

$$R = \frac{TP}{TP + FN} \tag{9}$$

$$F_1 = \frac{2 \times P \times R}{P + R} \tag{10}$$
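As a small worked example of Equations (7) to (10), the following sketch computes the four metrics from confusion-matrix counts. The function name and the convention of returning 0.0 when a denominator is zero are assumptions made for illustration.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return (accuracy, precision, recall, F1) from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    # Guard against empty denominators (assumed convention: report 0.0).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return accuracy, precision, recall, f1
```

For instance, with hypothetical counts TP = 90, TN = 80, FP = 10, and FN = 20, the accuracy is 170/200 = 0.85, the precision is 90/100 = 0.9, the recall is 90/110 ≈ 0.818, and the F1 score is 180/210 ≈ 0.857.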

Based on our understanding of neural networks [34,35,36,37,38], we hypothesized that the Resnet residual network effectively captures the spatial (2D) feature information of traffic data. However, it overlooks the sequence features inherent in traffic data, which may result in incomplete feature extraction and consequently impair the detection of malicious traffic. For these reasons, we modified the direct mapping part of the original Resnet18, incorporating one-dimensional convolution to extract sequence features and thereby combining sequence feature extraction with spatial feature extraction. To verify the effect of replacing the constant mapping layer with one-dimensional convolution, we experimentally compared the performance of the original model with that of the modified model.

Initial experiments were undertaken to assess the efficacy of the loss function using both the original residual network model without any enhancements and the updated residual network model with alterations. In this study, we used a selection of 10 distinct categories of traffic classified as adware, extracted from the Pcap file inside the CIC-AndMal2017 dataset. A total of 50,000 data points were obtained by selecting 5000 data points for each traffic class. Furthermore, we included ten distinct data labels. We partitioned the dataset into training and test sets at a ratio of 8:2. The data in the test set were mixed randomly to mitigate bias during the training phase and improve the model’s capacity for generalization.

As seen in Figure 7, the ultimate loss function of both the initial model (Mal_R0.01) and the novel model (Mal_RL0.01) exhibited similar results after adequate training iterations. The novel model demonstrated a marginal decrease in loss. This finding illustrates that substituting the constant mapping layer of the residual network with a one-dimensional convolutional layer does not result in a decline in the model’s performance. Additionally, it was noticed that the ultimate stable value of the loss function was unaltered with learning rate values of 0.01, 0.005, and 0.001. The loss function reached a steady state after 20 epochs. Hence, a learning rate of 0.01 was selected for the remaining trials, and the training process was conducted for 20 epochs.

The performance of the two models was then assessed based on their accuracy. In the trials, the framework parameters of both models were the same, except for the residual blocks, which perform the feature extraction. We ran five rounds of experiments to mitigate the influence of random variation. The two models differed in their accuracy on the data classification task: the mean accuracy of the original residual network model was 0.9254, while the mean accuracy of the new residual network model was 0.9303, an improvement of 0.0049.

In the context of IoT traffic data, our objective was to detect anomalous network traffic data. Previous methodologies have primarily focused on identifying anomalous network traffic, but they have experienced several obstacles. The prevalence of various forms of atypical traffic has shown a rising trend over time, posing challenges in procuring an adequate number of negative samples. To tackle these issues, we propose a unique methodology that evaluates the confidence index of normal traffic data. The characterization of traffic data as normal is contingent upon the presence of a high confidence index, which indicates the extent of our faith in its accuracy. In contrast, when the confidence index of regular traffic data is deemed inadequate, it is categorized as abnormal traffic. To accomplish this objective, extracting an adequate number of characteristics from the traffic data is necessary. This allows us to give a high level of confidence to regular traffic data and a low level of confidence to anomalous network traffic.

The experimental results of our study indicate that the model we created is capable of extracting information from traffic data with a better feature extraction capability. The experiments were conducted using the CIC IoT Dataset 2022 to categorize IoT devices [39,40] and detect aberrant traffic.

Initially, the positive samples in the dataset (Table 2) were partitioned into a training set and a test set at a ratio of 8:2. Subsequently, the newly formed model was trained using the training set. Subsequently, the minimum confidence index was extracted from each category (Figure 8) and uniformly adjusted to serve as a threshold for the test set. The experiment was performed using both the old and new models.

When the confidence index is used to identify malicious traffic, the model typically produces high-confidence classification results for normal traffic, whereas abnormal traffic may receive a lower confidence score. With an appropriate threshold, normal traffic passes the detection while abnormal traffic is still detected. This method addresses the sample imbalance caused by insufficient negative sample collection. It also helps detect new types of malicious traffic: because no negative samples are needed for training, emerging malicious traffic can still be flagged whenever its confidence index falls below the threshold.
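The decision rule described above can be sketched as follows. This is a minimal illustration, assuming the confidence index is the largest softmax probability of the classifier and that one threshold is kept per device class; the function name `is_normal` and the example values are hypothetical.

```python
def is_normal(probabilities, thresholds):
    """Apply the per-class confidence threshold to one classifier output.

    probabilities: list of class probabilities (assumed to be a softmax output).
    thresholds: dict mapping class index -> minimum accepted confidence.
    Returns (passes, predicted_class, confidence).
    """
    predicted = max(range(len(probabilities)), key=lambda i: probabilities[i])
    confidence = probabilities[predicted]
    return confidence >= thresholds[predicted], predicted, confidence


# Hypothetical three-class example.
thresholds = {0: 0.90, 1: 0.95, 2: 0.90}
confident = is_normal([0.01, 0.98, 0.01], thresholds)  # passes as class 1
uncertain = is_normal([0.40, 0.35, 0.25], thresholds)  # flagged as abnormal
```

A sample is flagged as abnormal whenever the first element of the result is `False`, regardless of which class the classifier nominally predicted.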

To evaluate the model’s performance, the test set was partitioned into positive and negative subsets, which were tested independently. Three quantities were compared (Figure 9): the minimum confidence index for each class in the training set, the minimum confidence index for each category in the positive samples, and the maximum confidence index for each class in the negative samples. The positive samples exhibited values similar to the threshold value of the confidence index, and only a limited number of negative samples had confidence indexes close to the threshold. Consequently, the threshold can be adjusted so that the positive samples satisfy it and the negative samples fail to meet it, which facilitates the identification of abnormal traffic.

In setting the confidence index threshold, our goal was to ensure that a large share of positive samples have a confidence level above the threshold while minimizing the number of malicious traffic samples whose confidence level reaches the threshold. This allows abnormal traffic to be detected effectively while normal traffic passes through the detection process. In our experiments, we observed the F1 scores of the model and determined that setting the threshold to the minimum confidence index for each class in the training set minus 0.02 improved the F1 score. This adjustment allowed the model to maintain a high level of accuracy in recognizing both types of traffic, so we applied the same threshold in all subsequent experiments.
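One way to picture the threshold selection is a sweep over candidate offsets, scoring each by F1. The sketch below is an assumption-laden illustration, not the authors' procedure: it treats normal traffic as the positive class, uses hypothetical toy confidences, and the names `sweep_offsets` and `f1_score` are invented for this example.

```python
def f1_score(tp, fp, fn):
    """F1 from confusion counts; returns 0.0 when undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def sweep_offsets(pos, neg, minima, offsets):
    """Score each offset by F1.

    pos / neg: lists of (predicted_class, confidence) for normal / attack samples.
    minima: {class: minimum training-set confidence}.
    A sample passes when confidence >= minima[class] - offset.
    """
    results = []
    for offset in offsets:
        tp = sum(conf >= minima[c] - offset for c, conf in pos)  # normal, accepted
        fn = len(pos) - tp                                       # normal, rejected
        fp = sum(conf >= minima[c] - offset for c, conf in neg)  # attack, accepted
        results.append((offset, f1_score(tp, fp, fn)))
    return results


# Hypothetical toy data: one class, three normal and two attack samples.
minima = {0: 0.95}
pos = [(0, 0.96), (0, 0.94), (0, 0.99)]
neg = [(0, 0.50), (0, 0.91)]
results = sweep_offsets(pos, neg, minima, [0.0, 0.02, 0.05])
best = max(results, key=lambda r: r[1])
```

In this toy data, an offset of 0 rejects one normal sample and an offset of 0.05 admits one attack sample, so the middle value scores best; real data would need its own sweep.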

The confusion matrices give a detailed analysis of the Resnet-18 and Conv1+Resnet-18 models on positive and negative samples. The classification accuracy for each data category consistently exceeded 99.9%, matching the accuracy of the original model. The negative samples were deliberately chosen to include the Flood and RTSP Brute Force (RSTP) types. The Flood category consisted entirely of Camera data, whereas the RTSP Brute Force category contained HTTP, UDP, and TCP traffic data, so that the model’s capacity to detect anomalous traffic could be evaluated comprehensively. According to Figure 10, the original model exhibited a false positive rate of 0.02% for Flood detection and 11.75% for RSTP detection, so its overall accuracy was at most 97.64%. Its performance was also somewhat unstable, occasionally dropping to an accuracy of approximately 90%. By contrast, the new model (Figure 11) performed consistently and achieved a detection accuracy of 99.9% for negative samples.

The evaluation metrics for the Conv1+Resnet-18 model are displayed in Table 3. The accuracy of identifying both types of attack traffic surpassed 99%, with the identification rate of positive samples reaching 100% in the best case. These results suggest that the approach presented in this article shows considerable promise for anomaly detection in IoT traffic.
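For reference, the four indicators in Table 3 are presumably the standard confusion-matrix metrics. The block below writes them out under the assumption, not stated explicitly in this section, that normal traffic is the positive class: TP counts normal samples that pass the threshold, FN normal samples that are rejected, FP attack samples that pass, and TN attack samples that are rejected.

```latex
\begin{align}
\text{Accuracy}  &= \frac{TP + TN}{TP + TN + FP + FN}, &
\text{Precision} &= \frac{TP}{TP + FP}, \\
\text{Recall}    &= \frac{TP}{TP + FN}, &
F_1 &= \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}.
\end{align}
```

As a consistency check, the Positive/RSTP row of Table 3 (precision 0.995, recall 1.0) gives F1 = 2(0.995)(1.0)/(0.995 + 1.0) ≈ 0.997, which matches the F1 value listed in that row.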

To compare our model against alternatives, we trained six comparison models on the dataset: the traditional network models LSTM, Lenet-5, and Resnet-18; the newer CNN+LSTM model; and two network models designed specifically for IoT, FedAVG and CNN(1D)+GRU. Table 4 compares these models with the model proposed in this paper.

The proposed method requires a high confidence index to be assigned to the positive samples, and it uses only 32 bytes of data. Because only positive samples are used as training data, the model must effectively extract the information embedded in them in order to distinguish positive from negative samples. The LSTM model has limited feature extraction capability and struggled to stabilize its confidence index for normal traffic, resulting in an accuracy of only 90.88% on the test set. FedAVG showed a clear improvement in accuracy, but because of its non-centralized training, the confidence index of positive samples was not sufficiently large, making it difficult to distinguish negative samples. Both Lenet-5 and Resnet-18 exceeded 99% accuracy on positive samples; however, owing to its superior information extraction capability, Resnet-18 outperformed Lenet-5 in recognizing negative samples. The CNN+LSTM and CNN(1D)+GRU models perform well when trained with both positive and negative samples, but when trained solely on normal traffic they were not as accurate as our proposed model in detecting malicious traffic.

In our study, we used the binary format of data packets for model training, which helps overcome the challenge of handling diverse data from heterogeneous IoT devices. The data we select for analysis exhibit characteristics common to most IoT devices, and the model was designed to extract data features independently of any specific traffic model. As a result, our findings can be generalized to various types of IoT devices without the limitations imposed by different device types or traffic models. In future research, we aim to develop a framework for detecting malicious traffic in which cloud-side and end-side components collaborate, with the objective of applying this method in real-world scenarios.
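To make the idea of training on raw packet bytes concrete, the sketch below turns a packet into a fixed-length numeric vector. It is only an illustration: this section states that 32 bytes are used but not which ones, so taking the first 32 bytes, zero-padding short packets, and scaling by 255 are all assumptions, and `packet_to_features` is a hypothetical name.

```python
def packet_to_features(packet: bytes, length: int = 32):
    """Convert raw packet bytes into a fixed-length list of floats in [0, 1].

    Assumptions (not taken from the paper): the first `length` bytes are kept,
    shorter packets are right-padded with zero bytes, and each byte is divided
    by 255 so that the values fall in [0, 1].
    """
    head = packet[:length].ljust(length, b"\x00")
    return [b / 255.0 for b in head]


# Hypothetical inputs: a 3-byte packet and a 64-byte packet.
short_features = packet_to_features(bytes([0, 255, 51]))
long_features = packet_to_features(bytes(range(64)))
```

Because the representation depends only on byte values and not on device-specific fields, the same function applies to traffic from any device type, which is the property the paragraph above relies on.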

In the preceding experiments, the accuracy of the classification model and the confidence index for correctly classifying positive samples were relatively low for the CIC-AndMal2017 traffic data, which can be attributed to the similarity between the data instances within this category. Having also conducted experiments on the CIC IoT Dataset 2022, we next assessed the method’s applicability to other relevant domains. To this end, we selected four distinct types of traffic data from the ISCXVPN2016 dataset: chat (4000), email (4000), files (4000), and audio (2000), for a total of 14,000 instances used for testing. Three of these types were selected as positive samples, while the audio samples were chosen as negative samples.

The experimental results indicate that the trained model is highly stable, identifying anomalous traffic and categorizing normal traffic with a success rate exceeding 99%. The model parameters were kept unchanged from the two preceding experiments.

5. Conclusions and Future Work

This study compared our modified residual network model with the original model in terms of accuracy. Our model consistently performed better than the original model. We then applied our model to identify abnormal network traffic in the Internet of Things (IoT) domain, achieving an accuracy of 99% or above when detecting Flood and RSTP anomalous traffic, together with high accuracy in the classification of normal traffic. In addition, the model was evaluated on the ISCXVPN2016 dataset to identify abnormal network traffic not included in the training set, with satisfactory results.

Our methodology has yielded encouraging outcomes for the IoT. The experiments demonstrated the potential of our methods for detecting anomalous traffic in both application software and Internet networks. However, many challenges remain in effectively identifying and detecting malicious network traffic. Based on these findings, we propose two directions for future investigation.

(1) Privacy and data transmission: The utilization of federated learning mitigates the issue of privacy breaches among clients. Rather than retaining the entirety of clients’ data, federated learning ensures privacy protection through model aggregation. Additionally, it alleviates the challenge of excessive data transmission in network settings. This approach can be effectively employed to address the issue of data transfer during the process of model training in network environments.

(2) Computational resources: Traditional centralized detection methods place a significant computational burden on data centers while leaving the computational resources of edge devices underused. This issue can be addressed by adopting edge computing techniques. Investigating approaches for transferring detection responsibilities from the central data center to the edge, and thereby making better use of edge computing resources, is therefore a promising avenue for further research.

Author Contributions

Conceptualization, T.N.; methodology, Q.L.; software, T.N.; validation, Q.L.; investigation, Y.L.; writing—original draft preparation, T.N.; writing—review and editing, T.N. and X.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Ministry of Education Industry-University Cooperation Project (Grant no. 220604719062313) and the 2020 New Generation Information Technology Innovation Project of the Science and Technology Development Center of the Ministry of Education (Grant no. 2020ITA02032).

Data Availability Statement

Publicly available datasets were analyzed in this study. These data can be found here: https://www.unb.ca/cic/datasets/andmal2017.html, https://www.unb.ca/cic/datasets/iotdataset-2022.html, https://www.unb.ca/cic/datasets/vpn.html, accessed on 8 December 2020.

Conflicts of Interest

The authors declare that they have no conflict of interest.

References

1. Miorandi, D.; Sicari, S.; De Pellegrini, F.; Chlamtac, I. Internet of things. Ad Hoc Netw. 2012, 10, 1497–1516.
2. Li, S.; Da Xu, L.; Zhao, S. 5G Internet of Things: A survey. J. Ind. Inf. Integr. 2018, 10, 1–9.
3. Al-Garadi, M.A.; Mohamed, A.; Al-Ali, A.K.; Du, X.; Ali, I.; Guizani, M. A Survey of Machine and Deep Learning Methods for Internet of Things (IoT) Security. IEEE Commun. Surv. Tutor. 2020, 22, 1646–1685.
4. Moustafa, N.; Turnbull, B.; Choo, K.-K.R. An Ensemble Intrusion Detection Technique Based on Proposed Statistical Flow Features for Protecting Network Traffic of Internet of Things. IEEE Internet Things J. 2018, 6, 4815–4830.
5. Alsulami, A.A.; Abu Al-Haija, Q.; Tayeb, A.; Alqahtani, A. An Intrusion Detection and Classification System for IoT Traffic with Improved Data Engineering. Appl. Sci. 2022, 12, 12336.
6. Jing, Q.; Vasilakos, A.V.; Wan, J.; Lu, J.; Qiu, D. Security of the Internet of Things: Perspectives and challenges. Wirel. Netw. 2014, 20, 2481–2501.
7. Shadroo, S.; Rahmani, A.M.; Rezaee, A. Survey on the Application of Deep Learning in Internet of Things (IoT), 05 April 2021, PREPRINT (Version 1) Available at Research Square. Available online: https://www.researchsquare.com/article/rs-271839/v1 (accessed on 1 June 2023).
8. Schmidhuber, J. Deep learning in neural networks. Neural Netw. 2015, 61, 85–117.
9. Liu, J.; Li, T.; Xie, P.; Du, S.; Teng, F.; Yang, X. Urban big data fusion based on deep learning: An overview. Inf. Fusion 2020, 53, 123–133.
10. Yong, B.; Liu, X.; Yu, Q.; Huang, L.; Zhou, Q. Malicious Web traffic detection for Internet of Things environments. Comput. Electr. Eng. 2019, 77, 260–272.
11. Ageev, S.; Kopchak, Y.; Kotenko, I.; Saenko, I. Abnormal traffic detection in networks of the Internet of things based on fuzzy logical inference. In Proceedings of the XVIII International Conference on Soft Computing & Measurements, Haryana, India, 21–22 February 2015; IEEE: Piscataway, NJ, USA, 2015.
12. Zhu, M.-Y.; Chen, Z.; Chen, K.-F.; Lv, N.; Zhong, Y. Attention-based federated incremental learning for traffic classification in the Internet of Things. Comput. Commun. 2022, 185, 168–175.
13. Xu, P.; Liu, Q.; Lin, S. Internet traffic classification using support vector machine. J. Comput. Res. Dev. 2009.
14. Islam, N.; Farhin, F.; Sultana, I.; Kaiser, M.S.; Rahman, S.; Mahmud, M.; Hosen, A.S.M.S.; Cho, G.H. Towards Machine Learning Based Intrusion Detection in IoT Networks. Comput. Mater. Contin. 2021, 69, 1801–1821.
15. Abdel-Basset, M.; Chang, V.; Hawash, H.; Chakrabortty, R.K.; Ryan, M. Deep-IFS: Intrusion Detection Approach for Industrial Internet of Things Traffic in Fog Environment. IEEE Trans. Ind. Inform. 2020, 17, 7704–7715.
16. Putchala, M.K. Deep Learning Approach for Intrusion Detection System (IDS) in the Internet of Things (IoT) Network using Gated Recurrent Neural Networks (GRU). Master’s Thesis, Wright State University, Dayton, OH, USA, 2017. Available online: http://rave.ohiolink.edu/etdc/view?acc_num=wright1503680452498351 (accessed on 1 June 2023).
17. Lopez-Martin, M.; Carro, B.; Sanchez-Esguevillas, A.; Lloret, J. Network Traffic Classifier with Convolutional and Recurrent Neural Networks for Internet of Things. IEEE Access 2017, 5, 18042–18050.
18. Umair, M.B.; Iqbal, Z.; Bilal, M.; Almohamad, T.A.; Nebhen, J.; Mehmood, R.M. An Efficient Internet Traffic Classification System Using Deep Learning for IoT: Computers, Materials and Continua. arXiv 2021, arXiv:2107.12193.
19. Popoola, S.I.; Adebisi, B.; Ande, R.; Hammoudeh, M.; Anoh, K.; Atayero, A.A. SMOTE-DRNN: A Deep Learning Algorithm for Botnet Detection in the Internet-of-Things Networks. Sensors 2021, 21, 2985.
20. Rezvy, S.; Luo, Y.; Petridis, M.; Lasebae, A.; Zebin, T. An efficient deep learning model for intrusion classification and prediction in 5G and IoT networks. In Proceedings of the 2019 53rd Annual Conference on Information Sciences and Systems (CISS), Baltimore, MD, USA, 20–22 March 2019.
21. Sarika, S.; Velliangiri, S.; Ravi, M. A detection of IoT based IDS attacks using deep neural network. In AIP Conference Proceedings; AIP Publishing LLC: Melville, NY, USA, 2021; Volume 2358, p. 130001.
22. Lashkari, A.H.; Kadir, A.F.A.; Taheri, L.; Ghorbani, A.A. Toward developing a systematic approach to generate benchmark android malware datasets and classification. In Proceedings of the 2018 International Carnahan Conference on Security Technology (ICCST), Montreal, QC, Canada, 22–25 October 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 1–7.
23. Dadkhah, S.; Mahdikhani, H.; Danso, P.K.; Zohourian, A.; Truong, K.A.; Ghorbani, A.A. Towards the development of a realistic multidimensional IoT profiling dataset. In Proceedings of the 2022 19th Annual International Conference on Privacy, Security & Trust (PST), Fredericton, NB, Canada, 22–24 August 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 1–11.
24. Draper-Gil, G.; Lashkari, A.H.; Mamun, M.S.I.; Ghorbani, A.A. Characterization of encrypted and VPN traffic using time-related features. In Proceedings of the 2nd International Conference on Information Systems Security and Privacy (ICISSP), Rome, Italy, 19–21 February 2016; pp. 407–414.
25. Harris, G.; Richardson, M.C. PCAP Capture File Format. 22 December 2020. Available online: https://datatracker.ietf.org/doc/id/draft-gharris-opsawg-pcap-00.html (accessed on 1 June 2023).
26. Wang, Z. The applications of deep learning on traffic identification. BlackHat USA 2015, 24, 1–10.
27. Chen, Z.; Yu, B.; Zhang, Y.; Zhang, J.; Xu, J. Automatic mobile application traffic identification by convolutional neural networks. In Proceedings of the 2016 IEEE Trustcom/BigDataSE/ISPA, Tianjin, China, 23–26 August 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 301–307.
28. Jiang, M.; Wu, C.; Zhang, M.; Zhang, M.; Hu, D. Comparative Study of Time Series Models in Network Traffic Forecasting. Acta Electron. Sin. 2009, 37, 2353–2358.
29. Chang, L.; Deng, X.M.; Zhou, M.Q.; Wu, Z.K.; Yuan, Y.; Yang, S.; Wang, H.A. Convolutional Neural Networks in Image Understanding. Acta Autom. Sin. 2016, 42, 1300–1312.
30. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 770–778.
31. Glorot, X.; Bengio, Y. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, Sardinia, Italy, 13–15 May 2010.
32. He, K.; Zhang, X.; Ren, S.; Sun, J. Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015.
33. Ioffe, S.; Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the International Conference on Machine Learning, Lille, France, 6–11 July 2015.
34. Zhao, Z.; Luo, Z.; Wang, P. Survey on Image Classification Algorithms Based on Deep Residual Network. Comput. Syst. Appl. 2020, 29, 14–21.
35. Liu, X.; You, J.; Wu, Y.; Li, T.; Li, L.; Zhang, Z.; Ge, J. Attention-Based Bidirectional GRU Networks for Efficient HTTPS Traffic Classification. Inf. Sci. 2020, 541, 297–315.
36. Li, H.; Ge, H.; Yang, H.; Yan, J.; Sang, Y. An Abnormal Traffic Detection Model Combined BiIndRNN with Global Attention. IEEE Access 2022, 10, 30899–30912.
37. Ying, T.; Jian, Y.; Liu, X. Image Super-Resolution via Deep Recursive Residual Network. In Proceedings of the IEEE Conference on Computer Vision & Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; IEEE: Piscataway, NJ, USA, 2017.
38. Huang, K.; Li, S.; Deng, W.; Yu, Z.; Ma, L. Structure inference of networked system with the synergy of deep residual network and fully connected layer network. Neural Netw. 2022, 145, 288–299.
39. Meidan, Y.; Bohadana, M.; Shabtai, A.; Guarnizo, J.D.; Ochoa, M.; Tippenhauer, N.O.; Elovici, Y. ProfilIoT: A Machine Learning Approach for IoT Device Identification Based on Network Traffic Analysis. In Proceedings of the Symposium on Applied Computing, Marrakech, Morocco, 3–7 April 2017.
40. Kohout, J.; Grill, M.; Kopp, M.; Bajer, L. Classification of IoT Devices Based on Their Network Traffic. U.S. Patent 10,749,770, 18 August 2020.

Figure 1. Pcap data format.

Figure 2. RNN network structure.

Figure 3. Example of spatial convolution.

Figure 4. Residual block structure.

Figure 5. Modified residual block structure.

Figure 6. Conv1+Resnet-18 structure diagram.

Figure 7. Loss function.

Figure 8. Minimum confidence index for each type of sample in the training set.

Figure 9. Confidence index of various samples.

Figure 10. Resnet model confusion matrix.

Figure 11. Conv1+Resnet model confusion matrix.

Table 1. Datasets used.

Dataset Name	Dataset File	Subset	Data Volume
CIC-AndMal2017	Adware	--	50,000
CIC IoT Dataset 2022	1-Power	Cameras	18,268
CIC IoT Dataset 2022	6-Attacks	1-Flood	22,000
CIC IoT Dataset 2022	6-Attacks	2-RTSP Brute Force	4153
ISCXVPN2016	VPN-PCAPs-01/VPN-PCAPs-02	--	14,000

Table 2. Data used in the Internet of Things.

Sample	Traffic Type	Device Name	Data Volume	Tags
Positive	Camera	Amcrest	708	0
Positive	Camera	Arlo Basestation	1593	1
Positive	Camera	ArloQ	1314	2
Positive	Camera	Borun	659	3
Positive	Camera	DLink	521	4
Positive	Camera	HeimVision	602	5
Positive	Camera	Home Eye	2473	6
Positive	Camera	Luohe	183	7
Positive	Camera	Nest	5620	8
Positive	Camera	Netatmo	3920	9
Positive	Camera	SimCam	675	10
Negative	Flood	Camera	22,000	--
Negative	RTSP Brute Force	all	4153	--

Table 3. Relevant evaluation indicators.

Flow Type	Accuracy	F1	Precision	Recall
Positive/Flood	1.0	1.0	1.0	1.0
Positive/RSTP	0.996	0.997	0.995	1.0
Positive/Flood and RSTP	0.999	0.997	0.995	1.0

Table 4. Accuracy rate of each model.

Model	Positive (Accuracy)/Precision	Negative (Accuracy)	Recall	F1
LSTM	0.908	0.409	0.177	0.296
Lenet-5	0.998	0.688	0.309	0.472
Resnet-18	0.999	0.976	0.855	0.922
CNN+LSTM	0.990	0.763	0.369	0.538
FedAVG	0.961	0.584	0.244	0.389
CNN(1D)+GRU	0.973	0.193	0.144	0.251
Conv1+Resnet-18	1.0	0.999	0.996	0.998

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
