A Network Traffic Abnormal Detection Method: Sketch-Based Profile Evolution

Yi, Junkai; Zhang, Shuo; Tan, Lingling; Tian, Yongbo

doi:10.3390/app13169087

¹ Key Laboratory of Modern Measurement and Control Technology, Ministry of Education, School of Automation, Beijing Information Science & Technology University, Beijing 100096, China

² School of Automation, Beijing Information Science & Technology University, Beijing 100096, China

* Author to whom correspondence should be addressed.

Appl. Sci. 2023, 13(16), 9087; https://doi.org/10.3390/app13169087

Submission received: 7 July 2023 / Revised: 2 August 2023 / Accepted: 8 August 2023 / Published: 9 August 2023

(This article belongs to the Special Issue Machine Learning for Graph Pattern Mining and Its Applications)

Abstract

Network anomaly detection faces unique challenges from dynamic traffic, including large data volume, few attributes, and human factors that influence it, making it difficult to identify typical behavioral characteristics. To address this, we propose using Sketch-based Profile Evolution (SPE) to detect network traffic anomalies. Firstly, the Traffic Graph (TG) of the network terminal is generated using Sketch to identify abnormal data flow positions. Next, the Convolutional Neural Network and Long Short-Term Memory Network (CNN-LSTM) are used to develop traffic behavior profiles, which are then continuously updated using Evolution to detect behavior pattern changes in real-time data streams. SPE allows for direct processing of raw traffic datasets and continuous detection of constantly updated data streams. In experiments using real network traffic datasets, the SPE algorithm was found to be far more efficient and accurate than PCA and Basic Evolution for outlier detection. It is important to note that the value of φ can affect the results of anomaly detection.

Keywords: network traffic; traffic graph; abnormal detection; sketch; evolution

1. Introduction

With the rapid growth in data volume and the widespread use of network applications, network attacks are becoming an increasingly serious security problem, and detecting anomalies in network traffic is now a critical task. Abnormal network traffic refers to a state of network traffic that deviates from its normal state. It is most often caused by malicious network attacks [1], such as denial-of-service attacks [2], port scanning [3], password brute-forcing, and remote control, as well as by network configuration errors and other exceptions. Network traffic anomaly detection [4,5,6,7,8,9] is therefore a necessary function for maintaining security in cyberspace.

Anomaly detection of network traffic derives from outlier detection [10]. The purpose of outlier detection is to identify objects that differ significantly from most data objects in the dataset. Outlier detection can therefore be applied to the analysis of network behavior to detect anomalies generated in the network.

Previous methods for detecting network traffic anomalies rely too heavily on manual feature selection, lack adaptability, and cannot directly process the original network traffic. Moreover, when faced with massive high-dimensional traffic data, they cannot handle dynamic data flows, and they struggle to extract key features effectively while meeting the real-time requirements of the system.

To solve the above problems, Sketch-based Profile Evolution (SPE) is proposed for network traffic anomaly detection. The SPE algorithm consists of three parts. The first part analyzes the historical network traffic by generating the Traffic Graph (TG) based on Sketch. The second part builds the normal behavior model (Profile) of the network traffic, using CNN and LSTM to extract spatial and temporal behavior characteristics. The third part calculates the deviation between newly obtained network traffic and the Profile to decide whether the traffic is abnormal. Profiles are not fixed: they evolve as new data streams are generated.

The rest of this paper is organized as follows. Section 2 presents related work in the field of abnormal network traffic detection. Section 3 introduces the detection model: the Sketch-based generation of the TG (that is, the network traffic structuring process), the Profile modeling method, and the transition calculation process. Section 4 analyzes the experimental results, including the Profile analysis experiment and the network traffic evolution analysis experiment. Section 5 gives the conclusions and future work.

In addition, the technique proposed in this paper is the subject of a commercial co-operation with Beijing Esafenet Development Technology Co., Ltd. (Beijing, China). It has been applied in Esafenet’s DLP (Data Leakage Prevention) system to efficiently detect traffic with abnormal features in a continuous and dynamic network environment and to report it to the DLP system in time for defensive measures.

2. Related Works

Anomaly detection is an important task in wireless network data evaluation and management; it helps to improve the intelligent management of the network and to optimize the allocation of network resources. Network anomaly detection can be divided into four categories: statistics-based, time series-based, sketch-based, and machine learning-based. The statistics-based and time series-based methods are unsupervised and do not require labeled data, while the sketch-based and machine learning-based methods require the detection system to train the detection model on labeled data, which is very time consuming but can achieve higher accuracy. In addition, the sketch-based method requires little memory in a fast and complex network environment, so it can store the characteristic information of network traffic in real time. Therefore, this paper applies the Sketch-based method to the abnormal detection of network traffic.

Statistics-based methods are well suited for anomaly detection. Wang et al. [11] proposed a network anomaly detection method (PCSS) that combines principal component analysis (PCA) and a single-stage headless face detection (SSH) algorithm. It addresses the problems that existing detection methods cannot learn the spatio-temporal characteristics of the data, that their classification accuracy is not high, and that detection time and accuracy are easily affected by redundant data in the sample. Patil et al. [12] proposed an abnormal network traffic detection framework that uses PCA mainly for feature extraction and dimensionality reduction and a bidirectional generative adversarial network (BiGAN) model to detect abnormal network traffic. Ibrahim et al. [13] proposed a comprehensive entropy-based method for network traffic anomaly classification that protects against the deception of entropy detection through a novel protection mechanism. It analyzes changes in different entropy types and monitors the number of distinct elements in the feature distribution as a unique detection metric to implement this protection mechanism. Then, based on multivariate analysis of the entropy changes of multiple features and the aggregation of complex feature combinations, an entropy-based anomaly classification rule is proposed, extending the entropy-based anomaly detection method. Ren et al. [14] proposed an anomaly detection method based on dynamic Markov models. This method segments the sequence data using sliding windows. Within each sliding window, the state of the data is defined by its value, and a high-order Markov model is built whose order is chosen to balance the length of the memory attribute against keeping up with the trend of the sequence. In addition, an anomaly replacement strategy is proposed to prevent detected anomalies from affecting model building and to maintain the continuity of anomaly detection.

Time series-based methods include autoregressive moving average (ARMA) regression, the empirical mode decomposition (EMD) transform, the wavelet transform [15,16], instantaneous frequency analysis, etc. These methods are suitable for processing network traffic data: they meet the quantification requirements of network traffic anomaly detection and make flexible use of signal processing techniques. Yu et al. [17] proposed a traffic anomaly detection algorithm for wireless sensor networks (WSNs) that improves the traditional autoregressive integrated moving average (ARIMA) time series model to predict traffic in WSNs and judge traffic anomalies. Yang et al. [18] proposed a threshold model based on Fractional Autoregressive Integrated Moving Average (FARIMA) to describe SCN traffic and detect anomalies. Cao et al. [19] proposed MPTCP-EMD, a network traffic anomaly detection model for Multipath TCP (MPTCP) networks. The model combines multi-scale detection and digital signal processing theory to realize anomaly detection based on the self-similarity of MPTCP network traffic. It uses EMD to decompose MPTCP traffic data and reconstructs the effective signal by removing high-frequency noise and residual trend terms. It then uses a sliding window to compare the changes in the Hurst exponent of the MPTCP network under different attack conditions and decide whether an anomaly exists.

The Sketch is a distributed profile data structure that is widely used in network traffic anomaly detection and can process a large amount of data in a short time. Ippoliti et al. [20] proposed and developed a dynamic method for enhanced network flow anomaly detection. They characterize the network state during flow creation, enabling general-purpose threat detection, and describe an efficient flow augmentation method based on a count-min sketch that provides per-flow-, per-node-, and per-network-level statistics in parallel with flow record generation. Tong et al. [21] first proposed a general architecture on FPGA to accelerate Sketch and deployed it for two widely used Sketches: Count-min Sketch and K-ary Sketch. They then proposed online sketch-based algorithms for two key network anomaly detection tasks, Heavy-Hitter Detection and Heavy-Change Detection, and used the proposed general Sketch architecture to accelerate these online algorithms.

Machine learning methods mainly include classification, clustering, pattern recognition, neural networks, and decision trees. Machine learning-based methods can process large volumes of network traffic data quickly and accurately through self-learning. Pu et al. [22] proposed an unsupervised anomaly detection method that combines Sub-Space Clustering (SSC) and One-Class Support Vector Machine (OCSVM) and can detect attacks without any prior knowledge. Baek et al. [23] established a new attribute that can efficiently identify anomalous events using clustering; it makes it possible to construct label information for individual data points, called estimated samples, while preserving the local neighborhood information of the connections’ features by means of the Laplacian eigenmap technique. Jain et al. [24] proposed two techniques, Error Rate Based Concept Drift Detection and Data Distribution Based Concept Drift Detection, and investigated their effects. They also combined sliding window-based data collection and drift analysis with K-Means Clustering to reduce the amount of data and improve the training datasets. A Support Vector Machine (SVM) classifier was used for anomaly detection, and retraining of the model was initiated based solely on statistical tests. Hwang et al. [25] proposed an abnormal traffic detection method, D-PACK, which consists of a Convolutional Neural Network (CNN) and an unsupervised deep learning model (e.g., Autoencoder) and can automatically analyze traffic patterns and filter abnormal traffic. The CNN module automatically extracts features from the original network data. Garg et al. [26] proposed a robust anomaly detection technique, the Fuzzified Cuckoo-based Clustering Technique (F-CBCT), which is divided into two stages and is based on the hybridization of Cuckoo Search Optimization and K-means clustering. The advantages and disadvantages of some of the above references are listed in Table 1.

Anomaly detection consists of two main parts, model building and detection, of which model building is the more important. The core idea is to compare the new data pattern with the existing behavior: if the new data deviate from the existing behavior, they are classified as an anomaly. In the existing literature, the user model is generally called the Profile, an outline or overview of normal behavioral characteristics. The main purpose of establishing the model is to “digitize” the Profile by various methods so that an algorithm can process it. Numerous methods for modeling and transition computation have been proposed in the related literature; they can be broadly classified into three categories: statistical-based methods, Soft Computing (SC), and distance-based methods.

The statistical-based approach is based on whether the sample obeys the global distribution. The network data flow to be processed in anomaly detection varies in scale and level. Although some studies can deal with real network traffic, they mainly take the data of large networks as the analysis target, for example Campus LAN [29], the trans-Pacific backbone [30], the Abilene and GÉANT networks, and CERNET [31]. Such anomaly detection mainly serves to keep large networks running stably. The detected abnormal conditions mostly relate to a part of the network area, so it is difficult to obtain fine-grained abnormal behavior for a single user or IP address. Current anomaly detection research tends instead to analyze anomalies for specific targets.

SC is the counterpart of traditional computation, which seeks exact solutions to problems. In practical applications, however, some problems have no exact solution, or the exact solution is very costly to compute. Finding a feasible solution at acceptable computational cost is therefore the defining feature of soft computing. SC includes many intelligent computation methods related to artificial intelligence, such as genetic algorithms, ant colony algorithms, and neural networks, and is commonly used to establish a Profile for network traffic. D’angelo et al. [32] preprocessed data to generate the Digital Signature of Network Segment using Flow Analysis (DSNSF); 1440 samples were selected, crossed, and replicated by a genetic algorithm to create a Traffic Profile that describes traffic behavior. Hamamoto et al. [33] improved the clustering algorithm and proposed the Linear Grouping Algorithm (LGA) to cluster the samples, and then combined it with the particle swarm optimization algorithm to optimize the resampling process and obtain more stable grouping results. This grouping result is used in the next step of Single-Hop Access Profile modeling.

The distance-based method, i.e., the method based on similarity [34], performs anomaly detection by calculating the degree of similarity between the Profile and the sample data. Many methods are used in the existing literature to calculate the distance. Some are established measures, such as the Mahalanobis Distance [35], the Weighted Jaccard Distance [36], and the KL divergence [37]; others are self-defined distance functions, such as Similarity search [38]. Wang et al. [39] used the Affinity Propagation (AP) clustering algorithm to cluster normal behaviors and detected anomalies by calculating the Euclidean distance between the samples and the normal behavior clusters. The dataset used in the experiment comprised only HTTP packets. Vieira et al. [40] applied a signal processing framework to anomaly detection, on the view that network traffic is similar to a signal. Traffic is defined as a mixture of normal traffic, abnormal traffic, and noise; the traffic is modeled, and anomaly detection is then performed using intrinsic similarity. Bi et al. [41] represented each sample as an n-dimensional vector and used the cosine of the angle between two vectors as the similarity coefficient. Ding et al. [42] used PCA to generate the digital features of network traffic and used Dynamic Time Warping (DTW) to calculate the similarity of two sequences of network traffic. DTW is an algorithm originally used in speech recognition to find the optimal alignment between two sequences.
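To make two of the similarity measures mentioned above concrete, the following is a minimal Python sketch of cosine similarity and of the classic DTW recurrence. It is an illustration only, not the implementation used in the cited works: the function names are hypothetical, and the absolute difference is assumed as the local cost for DTW.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen for this sketch: zero vector => no similarity
    return dot / (norm_u * norm_v)


def dtw_distance(a, b):
    """Dynamic Time Warping distance between two numeric sequences.

    Local cost is |a_i - b_j| (an assumption for this sketch).
    """
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]
```

Unlike a point-by-point distance, DTW tolerates sequences of different length or speed: a repeated value in one sequence can be absorbed by the alignment at no cost.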

Most existing anomaly detection methods are tied to intrusion detection datasets, and the small fraction of methods based on traffic statistics cannot solve the fine-grained anomaly problem for a single IP user. The Sketch-based traffic modeling method proposed in this paper is a distance-based method. It maps the network traffic to specific areas according to the network traffic structuring process to form a Traffic Graph (TG), so that the anomaly detection method can directly process the original network traffic and generate fine-grained features. CNN and LSTM neural networks are then used to extract the Profile of network traffic behavior and realize evolving behavior modeling. The corresponding distance formula and the TG are used to determine the position of outliers, which realizes fine-grained outlier detection. The anomaly detection model improves on existing anomaly detection and enables direct processing of the original network data, but analyzing the type of anomaly requires further research.

3. Sketch-Based Profile Evolution for Network Traffic Abnormal Detection

The difficulty in analyzing network behavior lies in modeling dynamic traffic behavior. Traffic is characterized by few attributes, a large amount of data, many human influence factors, and no typical behavior characteristics. Existing anomaly detection methods use many features but are built for specific systems; when the scope is extended to the actual communication data in the network, they cannot be applied. Based on an analysis of the most common packets in the network, this paper generates a Profile of user behavior through the network traffic structuring process and uses Profile Evolution to describe changes in user behavior. Finally, anomaly detection is completed by calculating the offset.

3.1. Detection Model

The word detection here comes from motion detection [43], whose key is to detect changes in images and which is commonly used in automatic alarm systems and unattended monitoring systems. Motion detection dynamically captures images over an extended period of time, detects abnormal conditions by comparing the images before and after, and provides early warning of unusual locations. We apply motion detection to network traffic analysis. The network data flow is converted into the TG of traffic behavior via the Sketch method, which corresponds to the actual image captured by the camera. The Profile of traffic behavior is generated through modeling to detect anomalies and locate them in the data flow. The core idea of motion detection is to determine anomalies automatically through image changes, so the detection model needs two capabilities: processing dynamically changing targets, and detecting and locating anomalies so that follow-up processing can be taken. Motion detection can be divided into the following five parts: image generation, determination of “default” behavior, anomaly detection, anomaly localization, and early warning, as shown in Figure 1.

We introduce the concept of motion detection in daily life monitoring into traffic monitoring in a network environment to realize the modeling and anomaly analysis of network behavior. The anomaly detection model for motion detection in the network environment should solve the following two problems:

(1) The model needs to be able to process dynamic data. In actual monitoring, analysis is not performed statically after a segment of images has been collected; to ensure security, image acquisition and analysis must be performed in real time. The detection model in this paper is based on the anomaly detection method. Because previous work on anomaly detection analyzes static datasets, there is no dynamic anomaly detection for original network packets. Therefore, based on the network structuring results, this paper uses the Sketch method to generate a TG, which completes the network traffic structuring process, realizes dynamic processing of the network data flow, and provides features for Profile generation and model transitions in the next step. In addition, the TG retains information about the network traffic, which is equivalent to the summary in data stream mining. When an anomaly is found, it can be located according to the TG, which is very important in practice: because the volume of data generated by network traffic is so large, searching for the anomaly manually would take a lot of time. It is therefore necessary to store the summary information of network traffic in the form of a TG.

(2) The Profile needs to be established from dynamic network traffic and updated in real time. The modeling of network traffic behavior in the existing literature focuses on anomaly detection over static data and mostly uses statistics and probability distributions. The problem to be solved in this paper is therefore how to establish a Profile from dynamic network traffic and change it in real time based on new data. To this end, this paper takes the original network data as the analysis target: the Sketch-based method is used to generate the TG, and, based on the TG, CNN and LSTM neural networks extract spatial and temporal features and generate Profiles. If the newly generated network traffic does not exceed the threshold for determining outliers, the Profile evolves based on the current Profile and the newly generated network traffic.
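The threshold rule described above (flag traffic whose deviation from the Profile exceeds a threshold, otherwise let the Profile evolve) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's method: the Profile and each new sample are assumed to be equal-length feature vectors, the deviation is assumed to be the Euclidean distance, and the update rule is assumed to be an exponential moving average with a hypothetical rate `alpha`.

```python
import math


def deviation(profile, sample):
    """Euclidean distance between profile and sample (assumed metric)."""
    return math.sqrt(sum((p - s) ** 2 for p, s in zip(profile, sample)))


def evolve(profile, sample, threshold, alpha=0.1):
    """Return (new_profile, is_anomaly).

    If the sample deviates by more than `threshold`, it is flagged and the
    profile is left unchanged. Otherwise the profile moves toward the sample
    by a fraction `alpha` (hypothetical update rule for this sketch).
    """
    if deviation(profile, sample) > threshold:
        return list(profile), True
    updated = [(1 - alpha) * p + alpha * s for p, s in zip(profile, sample)]
    return updated, False
```

Leaving the Profile untouched when an anomaly is flagged keeps outliers from contaminating the model of normal behavior, which is the reason the text restricts evolution to traffic below the threshold.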

3.2. Sketch-Based Network Traffic Structuring Process

Abnormal network traffic detection requires a Sketch-based network traffic structuring process to generate a TG. The structuring process first takes the original network traffic as the analysis target, extracts its features (i.e., the application fingerprints) by analyzing the communication sessions, classifies the obtained traffic, and finally generates the TG according to the obtained application fingerprints and classification results.

In data stream mining, the data stream is generated continuously, which consumes a lot of memory, and searching historical data is also very resource intensive, so the Sketch method is introduced. The Sketch data structure [44] supports fast queries over a large volume of traffic that passes through in sequence, i.e., it provides up-to-date summary data about the data stream in real time and answers the user’s queries quickly with a certain accuracy guarantee. It can also store the characteristic information of the traffic in real time in a high-speed and complicated network environment, and it takes up little space.

In addition, it is possible to compress a large amount of information into a smaller data structure using hash functions, i.e., mapping the data stream into a two-dimensional space [45] to create an abbreviated version of the dataset. The specific method is to first extract keywords from the captured data stream to form a vector, and then to use two sets of hash functions to map the keyword vector into a two-dimensional array to complete the compression of the data stream. In doing so, the two groups of hash functions can choose different mapping methods.
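The compression step described above (a keyword vector mapped into a two-dimensional array by two groups of hash functions) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the class name `TwoDimSketch`, the use of salted SHA-256 as the two hash groups, and the string flow key are all assumptions made for this example. It uses one row hash and one column hash per key, which is simpler than a count-min sketch.

```python
import hashlib


def _bucket(key, salt, size):
    """Map a string key to an index in [0, size) using a salted hash."""
    digest = hashlib.sha256(f"{salt}:{key}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % size


class TwoDimSketch:
    """Fixed-size 2-D summary of a stream of (key, value) updates."""

    def __init__(self, rows, cols):
        self.rows, self.cols = rows, cols
        self.table = [[0] * cols for _ in range(rows)]

    def cell(self, key):
        # One hash picks the row, a second, differently salted hash the column.
        return _bucket(key, "row", self.rows), _bucket(key, "col", self.cols)

    def add(self, key, value=1):
        r, c = self.cell(key)
        self.table[r][c] += value

    def query(self, key):
        # Colliding keys share a cell, so this is an upper bound on the
        # total value added for `key`.
        r, c = self.cell(key)
        return self.table[r][c]
```

The memory footprint depends only on `rows * cols`, not on the number of packets seen, which is what makes this kind of summary suitable for continuously generated streams.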

For the hash function [46], given two datasets $X$ and $Y$, the mapping relation between $X$ and $Y$ is $R$, which is usually represented by a function: $f_{R} : x \overset{R}{\to} y \ (\forall x \in X,\ y \in Y)$. In the Sketch-based network measurement method, a hash conflict corresponds to a stable state of network traffic behavior: the recurrence of an identical data flow indicates that the behavior has not changed at that time. Because the generated TGs are stored continuously, historical data can be searched for information and used to locate anomalies. Figure 2 shows the Sketch data flow summary method.

When anomaly analysis is carried out on a single user in a network environment, network traffic is primarily affected by humans. However, from a macro perspective, such as the backbone network or the network of an Internet service provider (ISP), the human factor almost disappears. In large networks, the characteristics of the overall network are not affected by changes in individual users but change over time. Therefore, to eliminate the influence of the human factor on behavior modeling and generate a TG for the behavior over a given period, the concept of “structured” is proposed.

Network traffic takes the concrete form of a continuously generated packet stream. Each packet has few original attributes, including IP address, format, size, and time. From these attributes, features that can be analyzed at a fine granularity can be extracted, and the data stream can be processed quickly so that the analysis is not overwhelmed by the large number of packets. In this paper, the original network traffic is taken directly as the analysis object, and the behavior of the mobile terminal is obtained without analyzing the content of the packets. Because the amount of data is huge, manual analysis is impossible in practice; instead, the fine-grained application behavior characteristics of the mobile terminal [47], i.e., the application fingerprints, are obtained online in real time through the principle of the communication session. Application fingerprints are extracted as follows. First, IP Address Transition (IAT) is used to find the “breakpoints” of the network traffic and divide it according to communication sessions, so that complete application fingerprints are obtained. Then, borrowing from natural language processing, the tf-idf algorithm and an N-gram model are used to calculate the utility of candidate application fingerprints, the candidate set is saved in an N-gram-based “tree”, and the application fingerprints are extracted from this tree structure, which achieves dynamic extraction of application fingerprints without manual effort.
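The first step above, dividing traffic at IP address transitions, can be illustrated with a small Python sketch. The exact IAT rule is not given in this section, so the example assumes the simplest reading: each packet is a hypothetical `(remote_ip, length)` pair, and a breakpoint occurs wherever the remote IP differs from that of the previous packet. The `ngrams` helper shows how candidate sub-sequences of packet lengths could then be enumerated for the N-gram step; both function names are illustrative.

```python
from itertools import groupby


def split_sessions(packets):
    """Split (remote_ip, length) packets into runs with the same remote IP.

    Returns one list of packet lengths per run, in arrival order.
    """
    return [
        [length for _, length in run]
        for _, run in groupby(packets, key=lambda p: p[0])
    ]


def ngrams(seq, n):
    """All contiguous length-n sub-sequences of seq, as tuples."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]
```

For example, a stream that talks to one address, switches to another, and then returns to the first yields three sessions, since only consecutive packets are grouped.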

The structuring process uses the acquired application fingerprints. Let the network traffic be $seq_{T} = \{p_{1}, p_{2}, p_{3}, \dots\}$, let $FP = \{fp_{1}, fp_{2}, \dots, fp_{N}\}$ be the $N$ application fingerprints extracted from the network traffic, and let $APP = \{app_{1}, app_{2}, \dots, app_{M}\}$ be the $M$ apps. The relationship is shown in Equation (1):

$$R_{map} : seq_{T} \to FP \to TG_{M \times N} = \{p_{i}, \dots, p_{j}\} \to TG_{mn} \quad (1 < m < M,\ 1 < n < N) \tag{1}$$
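As a rough illustration of the mapping in Equation (1), the sketch below places each fingerprinted group of packets into the cell of an M × N grid selected by its app (row m) and its fingerprint (column n). The lookup tables `fp_index`, `app_index`, and `fp_to_app` are hypothetical inputs, and accumulating the total packet length in each cell is an assumption made for this example rather than a detail confirmed by the text.

```python
def build_tg(sessions, fp_index, app_index, fp_to_app):
    """Build an M x N traffic grid from (fingerprint, packet_lengths) pairs.

    fp_index:  fingerprint id -> column n
    app_index: app id -> row m
    fp_to_app: fingerprint id -> app id
    """
    tg = [[0] * len(fp_index) for _ in range(len(app_index))]
    for fp, lengths in sessions:
        m = app_index[fp_to_app[fp]]
        n = fp_index[fp]
        tg[m][n] += sum(lengths)  # assumed aggregation: total bytes per cell
    return tg
```

Because every packet group lands in a known cell, an unusual value in the grid points back to a specific app and fingerprint, which is how the TG supports locating an anomaly.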

The structuring process also requires the received traffic to be classified. In a network environment, the packet exchange between two communicating parties is a continuous process, and the type, length, and IP address of the packets are obtained constantly. The packet length is taken as an important feature for analysis, and a Gaussian mixture model is used to solve the problem of duplicated application fingerprints. Through restricted clustering, fingerprints of the same application with similar lengths are placed in the same cluster as far as possible, which realizes the traffic classification of different states. Identical fingerprints are clustered according to their length, and the probability distribution of fingerprints with similar lengths is calculated to complete their classification. In restricted clustering, fingerprints from the same IP address should also be placed in one cluster as far as possible, so that the fingerprints in a cluster are similar and likely to come from the same application, which improves the classification effect.

The TG is generated through the network traffic structuring process, and an image can be generated according to $TG_{M \times N}$, as shown in Figure 3.

3.3. Network Spatiotemporal Feature Extraction Based on CNN-LSTM

When detecting network traffic anomalies, transition anomaly detection is also required. The Sketch method is used to continuously extract TGs, and the resulting set of TGs is used to generate the behavior model. To do so, the TGs are first preprocessed in space and time by CNN and LSTM neural networks to generate Profiles. Finally, the new data stream is processed by the transition method to realize Profile Evolution.

The network data stream [48] is formed by the continuously generated data packets $T_{f} = (p_{1}, p_{2}, \dots, p_{i}, \dots)$. The packet stream $T_{f}$ is converted into a combination of characteristic fingerprints $F_{p} = (f_{1}, f_{2}, \dots, f_{i}, \dots)$, and the application fingerprints are then mapped to the specified interval according to the Sketch-based method to generate a TG:

$$M_{l : n} = \begin{bmatrix} f_{1}^{1} & f_{2}^{1} & \dots & f_{m}^{1} \\ f_{1}^{2} & f_{2}^{2} & \dots & f_{m}^{2} \\ \vdots & \vdots & \ddots & \vdots \\ f_{1}^{l} & f_{2}^{l} & \dots & f_{m}^{l} \end{bmatrix} \tag{2}$$

The elements of the matrix are all represented by the length of the data packet. Figure 4 shows a TG represented as a grayscale image.

According to the obtained behavioral TG map, CNN [49] is used to obtain the user’s spatial features due to its excellent ability to extract features from images. Figure 5 is a structural diagram of a CNN.

The main functional modules of the deep convolutional neural network (CNN) fall into two categories: a feature learning module and a classification module. The feature learning module is mainly composed of a convolution layer and a pooling layer. The convolution layer performs a convolution operation on the obtained behavior TG to extract the features of local areas. This process uses a filter to analyze the image and extract the corresponding feature information. Assuming that the filter is a square matrix of order $|w|$, let $l = n$ in $M_{l : n}$; then $\frac{l}{|w|}$ is the number of new features, $w$ is the weight of the filter, and $b$ is the bias vector, as shown in Equation (3):

$$h_{i} = f\left(w \cdot M_{i \frac{l}{|w|}} + b\right) \tag{3}$$

The generated new feature vectors are then combined, as shown in Equation (4):

H = [h_{1}, h_{2}, \dots, h_{\frac{l}{| w |}}]

(4)

Pooling layers can reduce the number of parameters and reduce the possibility of overfitting. Pooling functions include average pooling and max pooling. Here, max pooling is applied to the vector $H$ to generate a new feature vector, as shown in Equation (5):

\hat{H} = \max {H}

(5)
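Equations (3)–(5) can be illustrated with a small NumPy sketch. Several details are assumptions for this example rather than the paper's configuration: the filter is applied to non-overlapping $| w | \times | w |$ blocks (stride equal to the filter size), the activation $f$ is taken to be ReLU, the bias is a scalar, and the pooling is a global maximum. The function names are hypothetical.

```python
import numpy as np

def conv_blocks(M, w, b):
    """Apply filter w to non-overlapping blocks of M: h = f(w . block + b)."""
    k = w.shape[0]                                   # filter order |w|
    rows, cols = M.shape[0] // k, M.shape[1] // k
    H = np.empty((rows, cols))
    for i in range(rows):
        for j in range(cols):
            block = M[i * k:(i + 1) * k, j * k:(j + 1) * k]
            # element-wise product summed, plus bias, then ReLU as f (assumed)
            H[i, j] = max(0.0, float(np.sum(w * block) + b))
    return H

def max_pool(H):
    """Global max pooling over the feature map, as in Equation (5)."""
    return float(H.max())

M = np.arange(16, dtype=float).reshape(4, 4)
H = conv_blocks(M, w=np.ones((2, 2)), b=-10.0)
H_hat = max_pool(H)
```

In practice a deep learning framework would learn `w` and `b` and use several filters; the sketch only shows the arithmetic of a single filter.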

The network data stream generates a TG through the structured process of the Sketch summary method and then extracts the spatial feature vector through CNN. The data stream is continuously generated, and the user’s behavior also changes over time. Therefore, the LSTM [50] recurrent neural network is used to extract the temporal characteristics of the network data stream $T_{f}$.

In the LSTM network, the current state is affected by the previous state, and the network can choose to save or forget information. Its structure is shown in Figure 6. It is composed of three gates: the Forget Gate $f_{t}$, the Input Gate $i_{t}$, and the Output Gate $o_{t}$. $C_{t}$ is the current state of the cell, $h_{t}$ is the state of the hidden layer, and $x_{t}$ is the input data.

The Forget Gate selectively discards the historical information of the cell state. As shown in Equation (6), the output of the Forget Gate $f_{t}$ is determined by the state $h_{t - 1}$ of the previous hidden layer and the input data $x_{t}$; $σ (\cdot)$ is the sigmoid function, and $M_{f}$ and $a_{f}$ are the Forget Gate weights and biases:

f_{t} = σ (M_{f} [h_{t - 1}, x_{t}] + a_{f})

(6)

The Input Gate selectively saves new information into the cell state and updates the state information, as shown in Equations (7)–(9). The current state $C_{t}$ of the LSTM network is affected by the previous state $C_{t - 1}$; $M_{i}$ and $a_{i}$ are the weights and biases of the Input Gate; $M_{g}$ and $a_{s}$ are the weights and biases of the cell state; and $C_{t}^{'}$ is the state of the candidate cell:

i_{t} = σ (M_{i} [h_{t - 1}, x_{t}] + a_{i})

(7)

C_{t}^{'} = \tanh (M_{g} [h_{t - 1}, x_{t}] + a_{s})

(8)

C_{t} = f_{t} \cdot C_{t - 1} + i_{t} \cdot C_{t}^{'}

(9)

The Output Gate determines the output information according to the cell state, as shown in Equations (10) and (11), where $M_{o}$ and $a_{o}$ are the weights and biases of the Output Gate:

o_{t} = σ (M_{o} [h_{t - 1}, x_{t}] + a_{o})

(10)

h_{t} = \tanh (C_{t}) \cdot o_{t}

(11)
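The gate equations (6)–(11) correspond to a single LSTM time step, which can be written out as a minimal NumPy sketch. The function name `lstm_step` and the dictionary layout of the parameters are assumptions for this example; the parameter names follow the symbols in the text, and the output-gate bias is placed inside the sigmoid, as in the standard LSTM formulation and in Equations (6) and (7).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM step; params holds M_f, a_f, M_i, a_i, M_g, a_s, M_o, a_o."""
    z = np.concatenate([h_prev, x_t])                     # [h_{t-1}, x_t]
    f_t = sigmoid(params["M_f"] @ z + params["a_f"])      # Eq. (6): forget gate
    i_t = sigmoid(params["M_i"] @ z + params["a_i"])      # Eq. (7): input gate
    c_cand = np.tanh(params["M_g"] @ z + params["a_s"])   # Eq. (8): candidate state
    c_t = f_t * c_prev + i_t * c_cand                     # Eq. (9): new cell state
    o_t = sigmoid(params["M_o"] @ z + params["a_o"])      # Eq. (10): output gate
    h_t = np.tanh(c_t) * o_t                              # Eq. (11): hidden state
    return h_t, c_t

hidden, n_in = 2, 3
params = {}
for gate, bias in (("M_f", "a_f"), ("M_i", "a_i"), ("M_g", "a_s"), ("M_o", "a_o")):
    params[gate] = np.zeros((hidden, hidden + n_in))
    params[bias] = np.zeros(hidden)

h_t, c_t = lstm_step(np.ones(n_in), np.zeros(hidden), np.array([1.0, -1.0]), params)
```

With all weights set to zero, every gate outputs 0.5 and the candidate state is 0, so the cell state is simply halved; this makes the step easy to check by hand.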

The network continuously generates new data streams. A new spatial feature vector $R = \{r_{1}, r_{2}, \dots, r_{n}\}$ is extracted through the CNN network, the vector $R$ is fed as the input data $x_{t}$ into the LSTM network structure, and the output vector $h_{t}$ is obtained, which is used to describe the user behavior over time.

3.4. Behavior Pattern Detection Method Based on Profile Evolution

During a period, the user’s behavior is relatively stable. When performing anomaly detection, the user’s Profile needs to be established first. Suppose the feature vector extracted by the CNN-LSTM network is $T = (t_{1}, t_{2}, \dots, t_{i})$, and the basis of the user behavior Profile is $B = [b_{1}, b_{2}, \dots, b_{i}]$. Equation (12) linearly expresses the relationship between $T$ and $B$:

L_{w, b} = ‖ T - B ‖ = ‖ T - \sum_{l = 1}^{i} w_{l} \cdot b_{l} ‖

(12)

The basis generation problem is expressed as computing the basis set with the smallest reconstruction error, as shown in Equation (13), where $T$ is the input signal and $w_{l}$ is the weight of the basis vector $b_{l}$ in $B = [b_{1}, b_{2}, \dots, b_{i}]$. Equation (13) yields the parameters $w$ and $b$ that minimize $L_{w, b}$:

\arg \min L_{w, b} = \arg \min_{w, b} ‖ T - \sum_{l = 1}^{i} w_{l} \cdot b_{l} ‖

(13)

The vectors of the newly generated network data stream are $T^{(1)}, T^{(2)}, \dots, T^{(n)}$. Equation (14) is obtained according to Equation (13):

L_{w, b}^{(n)} = ‖ T^{(n)} - \sum_{l = 1}^{n} w_{l}^{(n - 1)} \cdot b_{l}^{(n - 1)} ‖

(14)

Equation (14) can be refactored into a new objective function to solve the minimization problem:

\underset{w, b}{\arg \min} ‖ L_{w, b}^{(n)} - \sum_{k = 1}^{n} \sum_{l = 1}^{i} w_{l}^{(k)} \cdot b_{l}^{(k)} ‖

(15)

Singular value decomposition (SVD) [51] is used to solve Equation (15). The features continuously generated by the CNN-LSTM network form the matrix $L^{(n)}$, as shown in Equation (16):

L^{(n)} = \begin{bmatrix} t_{1}^{(1)} & t_{2}^{(1)} & \dots & t_{i}^{(1)} \\ t_{1}^{(2)} & t_{2}^{(2)} & \dots & t_{i}^{(2)} \\ \vdots & \vdots & \ddots & \vdots \\ t_{1}^{(n)} & t_{2}^{(n)} & \dots & t_{i}^{(n)} \end{bmatrix}

(16)

SVD is performed on $L^{(n)}$, as shown in Equation (17):

L^{(n)} = U Σ V^{T}

(17)

where $U$ and $V^{T}$ are unitary matrices and $Σ$ is a diagonal matrix, as shown in Equation (18):

Σ = \begin{bmatrix} τ_{1} & 0 & \dots & 0 \\ 0 & τ_{2} & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & τ_{n} \end{bmatrix}

(18)

Because $Σ$ is a diagonal matrix, the SVD of $L^{(n)}$ can also be written as shown in Equation (19):

L^{(n)} = \sum_{l = 1}^{n} u_{l} τ_{l} v_{l}^{T}

(19)
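The equivalence of the matrix form in Equation (17) and the rank-one sum in Equation (19) can be checked numerically. The sketch below uses `numpy.linalg.svd` on a small random feature matrix; the matrix size, the random seed, and the helper name `svd_rank_one_sum` are arbitrary choices for illustration.

```python
import numpy as np

def svd_rank_one_sum(L):
    """Return the SVD factors of L and its reconstruction as a sum of rank-one terms."""
    U, tau, Vt = np.linalg.svd(L, full_matrices=False)   # L = U diag(tau) Vt
    # Eq. (19): sum over l of u_l * tau_l * v_l^T
    recon = sum(tau[k] * np.outer(U[:, k], Vt[k]) for k in range(len(tau)))
    return U, tau, Vt, recon

rng = np.random.default_rng(0)
L = rng.normal(size=(5, 3))          # 5 feature vectors of length 3 (illustrative)
U, tau, Vt, recon = svd_rank_one_sum(L)
```

The singular values in `tau` are returned in descending order, so truncating the sum after the first few terms gives a low-rank approximation dominated by the strongest components.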

Rearrange Equation (17) as $U Σ = L^{(n)} V$, and take $U = B$ (the energy of $Σ$ is concentrated on the diagonal) to obtain the corresponding weight $w_{l}$, as shown in Equation (20):

\hat{w_{l}} = \frac{L^{(n)} U}{| L^{(n)} |}

(20)

Therefore, the user’s Profile is shown in Equation (21):

B = [b_{1}, b_{2}, \dots, b_{i}] = \sum_{l = 1}^{i} \hat{w_{l}} \cdot b_{l}

(21)

Assuming that the newly extracted feature vector is $T^{'} = (x_{1}^{'}, x_{2}^{'}, \dots, x_{i}^{'})$, a threshold $δ$ is set, and the distance between the new feature and the Profile is calculated by Power Distance [52], as shown in Equation (22):

D_{B, X'} = \left( \sum_{l = 1}^{i} | x_{l}^{'} - \hat{w_{l}} |^{p} \right)^{\frac{1}{r}}

(22)

Power $(p, r)$ is a measure of the distance between two vectors of length $i$. It is usually illustrated as the physical distance between two locations in three-dimensional space, but it can be applied to vectors of any dimension when the data type is numeric. The specific distance metric is determined by the values of $p$ and $r$. For example, if $p = r = 2$, Equation (22) becomes the Euclidean distance. If the calculated distance is greater than $δ$, the new feature vector is flagged as an abnormal point.
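The distance test of Equation (22) is short enough to write out directly. The following is a minimal sketch; the function names `power_distance` and `is_abnormal` and the example vectors are assumptions for illustration, and `profile` stands for the weight vector of the Profile.

```python
import numpy as np

def power_distance(x, profile, p=2.0, r=2.0):
    """Power (p, r) distance: (sum_l |x_l - profile_l|^p)^(1/r)."""
    x = np.asarray(x, dtype=float)
    profile = np.asarray(profile, dtype=float)
    return float(np.sum(np.abs(x - profile) ** p) ** (1.0 / r))

def is_abnormal(x, profile, delta, p=2.0, r=2.0):
    """Flag x as abnormal when its distance to the Profile exceeds delta."""
    return power_distance(x, profile, p, r) > delta

d_euclid = power_distance([3.0, 4.0], [0.0, 0.0])               # p = r = 2: Euclidean
d_manhattan = power_distance([3.0, 4.0], [0.0, 0.0], p=1, r=1)  # p = r = 1: Manhattan
```

For the vector (3, 4) measured from the origin, the Euclidean case gives 5 and the Manhattan case gives 7, so a threshold of, say, $δ = 4.9$ would flag the point under either metric.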

The newly acquired sample vector is not only compared with the Profile for anomaly detection but also has an impact on the Profile. The new data flow evolves the old Basis into a new Basis, so the detection model follows changes in the data flow. Given the newly generated vector $T^{'} = (x_{1}^{'}, x_{2}^{'}, \dots, x_{i}^{'})$, a new Profile $L_{φ, w, b}^{'}$ is generated according to Equation (23):

\arg \min L_{φ, w, b}^{'} = \underset{φ, w, b}{\arg \min} ‖ t^{(n + 1)} - \sum_{k = 1}^{n + 1} \sum_{l = 1}^{i + 1} w_{l}^{(k)} \cdot b_{l}^{(k)} ‖ + φ d (b^{(n + 1)})

(23)

Here, $d (b^{(n + 1)})$ is the basis generated by the newly generated vector $T^{'} = (x_{1}^{'}, x_{2}^{'}, \dots, x_{i}^{'})$, and the parameter $φ$ is a weighting factor that controls the degree of influence of the new data on the Profile.

The steps of the SPE algorithm are as follows:

Input: data stream $T_{f}$, $Δ L$, $δ$, $p$, $r$

Output: $B$

Extract the characteristics of the data stream to obtain the application fingerprint: $T_{f} \to F_{p}$;
Generate the TG according to $F_{p}$: $F_{p} \to TG$;
Use CNN to extract the spatial features of the TG map: $TG \overset{CNN}{\to} H$;
Use LSTM to extract the temporal features of the user:

$H \overset{LSTM}{\to} T = (t_{1}, t_{2}, \dots, t_{i})$

(24)
Calculate the weight $\hat{w_{l}}$ according to Equation (13), and obtain the user’s Profile as:

$\sum_{l = 1}^{i} \hat{w_{l}} \cdot b_{l}$

(25)
Calculate the abnormal point according to the distance:

$D_{B, X^{'}} = \left( \sum_{l = 1}^{i} | x_{l}^{'} - \hat{w_{l}} |^{p} \right)^{\frac{1}{r}}$

(26)
Profile Evolution:

$\arg \min L_{φ, w, b}^{'}$

(27)

The specific implementation process of the SPE algorithm is shown in Figure 7.

4. Experiment and Result Analysis

The experimental dataset is network data captured in a real environment with Wireshark [53]. The capture devices are two Android smartphones. The phones run video software such as Douyin (v4.2.0) and Tencent Video (v7.1.0.19832), news software such as Weibo (v9.4.0) and Toutiao (v7.1.3), communication software such as WeChat (v7.0.3) and QQ (v7.9.2), and life software such as Meituan (v9.11.802) and Ctrip (v8.1.0). We collected data over a long period in these running environments and then cut the data packets of specific time intervals from the collected data to obtain the dataset used in the experiment. The specific values of the dataset are shown in Table 2.

The Background is the number of data packets captured by the mobile phone without any application running; it consists mainly of data sent by the system itself and pushed by some applications. As can be seen from Table 2, only 932 data packets are obtained after 15 min of rest, which is negligible compared with the number of data packets captured with applications running. Video, News, Communication, and Life are the traffic obtained by running only video software, news software, communication software, and life software, respectively. Mix1, Mix2, and Mix3 are the mixed traffic obtained by running all the software programs on the mobile phone for 5, 10, and 15 min, respectively.

At the same time, to simulate the reception of abnormal traffic during actual use, we turn the above datasets into abnormal traffic datasets by irregularly sending malicious code to the device whose traffic is being captured.

To evaluate the performance of the SPE algorithm, we conducted an experiment in three parts:

The first part of the experiment is to study the performance of the TG generated by the structured network traffic. Consistency Degree (CD) is used to measure the stability and sensitivity of the TG. The similarity of the TG formed by the same type of network traffic is relatively high. The similarity of TGs formed by different types of network traffic is relatively low, as shown in Equation (28):

CD = \frac{(2 μ_{a} μ_{b} + l_{1}) (2 σ_{a b} + l_{2})}{(μ_{a}^{2} + μ_{b}^{2} + l_{1}) (σ_{a}^{2} + σ_{b}^{2} + l_{2})}

(28)

where $μ_{a}$ and $μ_{b}$ are the mean values of the pixels of a and b in the graph, $σ_{a}^{2}$ and $σ_{b}^{2}$ are the variances of a and b, $σ_{a b}$ is the covariance of a and b, $l_{1}$ and $l_{2}$ are the stabilization parameters, and scale is the range of pixel values. Figure 8 shows the CD (continuity) of adjacent TGs for the Video, News, Communication, and Life datasets, respectively. Each dataset generates a TG every 10 s, and the continuity of adjacent images is calculated. The x-axis gives the index of the generated TG.
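Equation (28) has the same form as the structural similarity index and can be computed directly from two TG images. The sketch below is illustrative only: the function name `consistency_degree` and the default values of `l1` and `l2` are assumptions, since the text does not give the stabilization constants used in the experiment.

```python
import numpy as np

def consistency_degree(a, b, l1=1e-4, l2=9e-4):
    """CD of Eq. (28) for two equally sized TG images (l1, l2 are assumed defaults)."""
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    mu_a, mu_b = a.mean(), b.mean()                 # pixel means
    var_a, var_b = a.var(), b.var()                 # pixel variances
    cov_ab = np.mean((a - mu_a) * (b - mu_b))       # covariance of a and b
    num = (2 * mu_a * mu_b + l1) * (2 * cov_ab + l2)
    den = (mu_a ** 2 + mu_b ** 2 + l1) * (var_a + var_b + l2)
    return float(num / den)

tg_1 = np.array([[0.0, 1.0], [2.0, 3.0]])
tg_2 = np.array([[3.0, 2.0], [1.0, 0.0]])           # same pixels, reversed pattern
cd_same = consistency_degree(tg_1, tg_1)
cd_diff = consistency_degree(tg_1, tg_2)
```

Identical TGs give a CD of 1, while TGs with opposite structure give a negative value; applying the function to each pair of adjacent TGs in a dataset yields a curve of the kind plotted in Figure 8.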

According to the curves in Figure 8, the continuity of the TGs of video traffic varies little and the curve fluctuates only slightly, because the data packets captured from video applications are relatively similar. News applications, meanwhile, carry various types of data such as pictures, video, and text, so the fluctuation range of the TG is relatively large and the continuity is relatively low, indicating that the Profile of the network traffic has changed.

The second part evaluates the Profile generated from the TGs by comparing SPE with PCA [54] and Basic Evolution [55], two algorithms for generating traffic characteristics. PCA converts the original data into a set of linearly independent vectors by finding the eigenvectors of the matrix. Basic Evolution divides the data flow by time, uses the size of the network traffic as the feature for anomaly analysis, and, finally, uses SVD for anomaly detection. PCA is a widely used feature extraction method, and Basic Evolution was proposed in recent years and has been shown to have a lower error rate. Therefore, we chose to compare SPE with them through experiments.

This experiment uses a synthetic dataset [56] because it is difficult to perform artificial outlier analysis on real network traffic. To obtain the synthetic dataset, points are randomly selected in the Mix3 dataset, and either the order of the data packets is adjusted or part of the data is replaced.

The detection probability of outliers is used as an evaluation criterion, as shown in Equation (29), where $N_{TP}$ is the number of outliers detected correctly and $N_{a}$ is the total number of outliers:

DP\_TPR = \frac{N_{TP}}{N_{a}}

(29)

The proportion of incorrectly detected non-abnormal points is used as another evaluation criterion, as shown in Equation (30), where $N_{FP}$ is the number of non-abnormal points that the algorithm wrongly reports as abnormal and $N_{DP}$ is the number of abnormal points detected by the algorithm:

DP\_FPR = \frac{N_{FP}}{N_{DP}}

(30)
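The two criteria can be computed from the set of injected outlier positions and the set of positions reported by a detector. The following is a minimal sketch; the function name `detection_rates` and the example index sets are hypothetical, and $N_{a}$ is taken to be the total number of injected outliers, which is an assumption based on the definition of a detection probability.

```python
def detection_rates(true_outliers, detected):
    """Return (DP_TPR, DP_FPR) following Equations (29) and (30)."""
    true_outliers, detected = set(true_outliers), set(detected)
    n_tp = len(true_outliers & detected)      # outliers detected correctly
    n_fp = len(detected - true_outliers)      # normal points reported as abnormal
    dp_tpr = n_tp / len(true_outliers) if true_outliers else 0.0   # N_TP / N_a
    dp_fpr = n_fp / len(detected) if detected else 0.0             # N_FP / N_DP
    return dp_tpr, dp_fpr

dp_tpr, dp_fpr = detection_rates(true_outliers={1, 2, 3, 4}, detected={2, 3, 9})
```

Note that DP_FPR as defined in Equation (30) is normalized by the number of detections rather than by the number of normal points, so it measures the share of reported anomalies that are false alarms.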

Figure 9 shows the comparative outlier detection results of the three algorithms PCA, Basic Evolution, and SPE. Both the detection probability of outliers and the proportion of wrongly detected non-outliers of the SPE algorithm are better than those of the other two algorithms. As the number of outliers increases, the detection probability of outliers keeps increasing, while the proportion of wrongly detected non-outliers keeps decreasing. When the number of outliers reaches a certain size, the proportion of wrongly detected non-outliers varies very little between Basic Evolution and SPE, while that of the SPE algorithm continues to decrease, so the SPE algorithm is far superior to the other two algorithms in avoiding wrongly detected non-outliers. Since the positions of the abnormal points are random, the DP_TPR and DP_FPR change for different abnormal points, which suggests that the position of an anomaly affects the detection result.

The third part studies the influence of the Profile Evolution parameter $φ$ on outlier detection. A total of 100 outliers are selected, and the value of $φ$ is varied within [0.1, 1]. The result of each experiment is measured by the value of DP_TPR. The results of the experiments are shown in Figure 10.

The result of anomaly detection is affected by the value of $φ$. According to Equation (23), $d (b^{(n + 1)})$ is the result generated by the new data stream, so the value of $φ$ determines the impact of the new data stream on the Profile. When $φ$ is small, the impact of the new data stream on the Profile is relatively small, and when $φ$ is large, the Profile approximates the behavior of the new data stream. As shown in Figure 10, when $φ = 0.6$, the value of DP_TPR is the largest, and the effect is the best for the selected 100 outliers. In reality, the behavior of a new data stream is subject to change, and the value of $φ$ should be selected according to the actual situation.

5. Conclusions

In this paper, we propose a method based on an SPE algorithm for better anomaly detection of network traffic. The TG is established through the network traffic structuring process so that the SPE algorithm can directly deal with the original network traffic data, generate the user’s Profile via CNN-LSTM, and use Power Distance to decide whether or not the user’s behavior is abnormal. The Profile is then continuously updated with the data acquired in real time, which realizes the evolutionary analysis of user behavior and yields both the location of abnormal points and anomaly detection over the entire dynamic data flow. The results obtained from experiments using real traffic show that the SPE algorithm can directly process the original dataset, can continuously process dynamic data streams, and can detect anomalies in the data stream with better performance than PCA and Basic Evolution.

This study has shown that SPE can perform anomaly detection on a small-scale dynamic dataset. However, SPE only detects outliers and cannot analyze the types of outliers, and both CNN and LSTM suffer from a high demand for computing resources, long training times, and difficulty in interpreting training and feature extraction results. These issues can lead to insufficient timeliness when dealing with dynamic real-time traffic and to difficulty in explaining the types of outliers. Meanwhile, SPE has not yet been used in a practical environment on a large scale, so we are still unable to determine its performance in this kind of environment. In the future, we will study its behavior in depth on large-scale datasets in real-world settings.

Author Contributions

Conceptualization, J.Y.; Methodology, J.Y.; Software, S.Z. and Y.T.; Formal analysis, J.Y. and L.T.; Investigation, S.Z., L.T. and Y.T.; Resources, J.Y. and L.T.; Data curation, S.Z.; Writing—original draft, S.Z.; Writing—review & editing, J.Y., L.T. and Y.T.; Funding acquisition, J.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data presented in this study are not publicly available due to the agreement with Esafenet.

Conflicts of Interest

The authors declare no conflict of interest.

References

Shafiq, M.; Tian, Z.; Bashir, A.K.; Du, X.; Guizani, M. CorrAUC: A Malicious Bot-IoT Traffic Detection Method in IoT Network Using Machine-Learning Techniques. IEEE Internet Things J. 2021, 8, 3242–3254.
Kasim, Ö. An efficient and robust deep learning based network anomaly detection against distributed denial of service attacks. Comput. Networks 2020, 180, 107390.
Ring, M.; Landes, D.; Hotho, A. Detection of slow port scans in flow-based network traffic. PLoS ONE 2018, 13, e0204507.
Kurniabudi, K.; Purnama, B.; Sharipuddin, S.; Darmawijoyo, D.; Stiawan, D.; Samsuryadi, S.; Heryanto, A.; Budiarto, R. Network anomaly detection research: A survey. Indones. J. Electr. Eng. Inform. 2019, 7, 37–50.
Ahmed, M.; Mahmood, A.N.; Hu, J. A survey of network anomaly detection techniques. J. Netw. Comput. Appl. 2016, 60, 19–31.
Wang, S.; Balarezo, J.F.; Kandeepan, S.; Al-Hourani, A.; Chavez, K.G.; Rubinstein, B. Machine Learning in Network Anomaly Detection: A Survey. IEEE Access 2021, 9, 152379–152396.
Moustafa, N.; Hu, J.; Slay, J. A holistic review of Network Anomaly Detection Systems: A comprehensive survey. J. Netw. Comput. Appl. 2019, 128, 33–55.
Kwon, D.; Kim, H.; Kim, J.; Suh, S.C.; Kim, I.; Kim, K.J. A survey of deep learning-based network anomaly detection. Clust. Comput. 2017, 22, 949–961.
Fernandes, G.; Rodrigues, J.J.; Carvalho, L.F.; Al-Muhtadi, J.F.; Proença, M.L. A comprehensive survey on network anomaly detection. Telecommun. Syst. 2019, 70, 447–489.
Boukerche, A.; Zheng, L.; Alfandi, O. Outlier detection: Methods, models, and classification. ACM Comput. Surv. 2020, 53, 1–37.
Wang, Z.; Han, D.; Li, M.; Liu, H.; Cui, M. The abnormal traffic detection scheme based on PCA and SSH. Connect. Sci. 2022, 34, 1201–1220.
Patil, R.; Biradar, R.; Ravi, V.; Biradar, P.; Ghosh, U. Network traffic anomaly detection using PCA and BiGAN. Internet Technol. Lett. 2022, 5, e235.
Ibrahim, J.; Gajin, S. Entropy-based network traffic anomaly classification method resilient to deception. Comput. Sci. Inf. Syst. 2022, 19, 87–116.
Ren, H.; Ye, Z.; Li, Z. Anomaly detection based on a dynamic Markov model. Inf. Sci. 2017, 411, 52–65.
Ji, S.-Y.; Jeong, B.K.; Kamhoua, C.; Leslie, N.; Jeong, D.H. Forecasting network events to estimate attack risk: Integration of wavelet transform and vector auto regression with exogenous variables. J. Netw. Comput. Appl. 2022, 203, 103392.
Ning, D.; Hou, J.; Gong, Y.; Zhang, Z.; Sun, C. Auto-identification of engine fault acoustic signal through inverse trigonometric instantaneous frequency analysis. Adv. Mech. Eng. 2016, 8, 1687814016641840.
Yu, Q.; Jibin, L.; Jiang, L. An Improved ARIMA-Based Traffic Anomaly Detection Algorithm for Wireless Sensor Networks. Int. J. Distrib. Sens. Netw. 2016, 12, 9653230.
Yang, Q.; Hao, W.; Ge, L.; Ruan, W.; Chi, F. FARIMA model-based communication traffic anomaly detection in intelligent electric power substations. IET Cyber-Physical Syst. Theory Appl. 2019, 4, 22–29.
Cao, Y.; Ji, R.; Huang, X.; Lei, G.; Shao, X.; You, I. Empirical Mode Decomposition-empowered Network Traffic Anomaly Detection for Secure Multipath TCP Communications. Mob. Netw. Appl. 2022, 27, 2254–2263.
Ippoliti, D.; Jiang, C.; Ding, Z.; Zhou, X. Online Adaptive Anomaly Detection for Augmented Network Flows. ACM Trans. Auton. Adapt. Syst. 2016, 11, 1–28.
Tong, D.; Prasanna, V.K. Sketch Acceleration on FPGA and its Applications in Network Anomaly Detection. IEEE Trans. Parallel Distrib. Syst. 2017, 29, 929–942.
Pu, G.; Wang, L.; Shen, J.; Dong, F. A hybrid unsupervised clustering-based anomaly detection method. Tsinghua Sci. Technol. 2020, 26, 146–153.
Baek, S.; Kwon, D.; Suh, S.C.; Kim, H.; Kim, I.; Kim, J. Clustering-based label estimation for network anomaly detection. Digit. Commun. Networks 2021, 7, 37–44.
Jain, M.; Kaur, G.; Saxena, V. A K-Means clustering and SVM based hybrid concept drift detection technique for network anomaly detection. Expert Syst. Appl. 2022, 193, 116510.
Hwang, R.-H.; Peng, M.-C.; Huang, C.-W.; Lin, P.-C.; Nguyen, V.-L. An Unsupervised Deep Learning Model for Early Network Traffic Anomaly Detection. IEEE Access 2020, 8, 30387–30399.
Garg, S.; Batra, S. Fuzzified Cuckoo based Clustering Technique for Network Anomaly Detection. Comput. Electr. Eng. 2018, 71, 798–817.
Amaouche, S.; Guezzaz, A.; Benkirane, S.; Azrour, M.; Khattak, S.B.A.; Farman, H.; Nasralla, M.M. FSCB-IDS: Feature Selection and Minority Class Balancing for Attacks Detection in VANETs. Appl. Sci. 2023, 13, 7488.
Douiba, M.; Benkirane, S.; Guezzaz, A.; Azrour, M. Anomaly detection model based on gradient boosting and decision tree for IoT environments security. J. Reliab. Intell. Environ. 2022, 1–12.
Sait, S.Y.; Bhandari, A.; Khare, S.; James, C.; Murthy, H.A. Multi-level anomaly detection: Relevance of big data analytics in networks. Sadhana 2015, 40, 1737–1767.
Yang, L.; Hu, G.; Li, D.; Wang, Y.; Jia, B.; Pan, Z. Anomaly detection based on efficient Euclidean projection. Secur. Commun. Networks 2015, 8, 3229–3237.
Qin, T.; Guan, X.; Li, W.; Wang, P.; Zhu, M. A new connection degree calculation and measurement method for large scale network monitoring. J. Netw. Comput. Appl. 2014, 41, 15–26.
D’angelo, G.; Palmieri, F.; Ficco, M.; Rampone, S. An uncertainty-managing batch relevance-based approach to network anomaly detection. Appl. Soft Comput. 2015, 36, 408–418.
Hamamoto, A.H.; Carvalho, L.F.; Sampaio, L.D.H.; Abrão, T.; Proença, M.L., Jr. Network anomaly detection system using genetic algorithm and fuzzy logic. Expert Syst. Appl. 2018, 92, 390–402.
Abbasi, B.; Calder, J.; Oberman, A.M. Anomaly Detection and Classification for Streaming Data using PDEs. SIAM J. Appl. Math. 2018, 78, 921–941.
Han, D.; Bi, K.; Xie, B.; Huang, L.; Wang, R. An anomaly detection on the application-layer-based QoS in the cloud storage system. Comput. Sci. Inf. Syst. 2016, 13, 659–676.
Feng, P.; Ma, J.; Sun, C. Selecting Critical Data Flows in Android Applications for Abnormal Behavior Detection. Mob. Inf. Syst. 2017, 2017, 7397812.
Nevat, I.; Divakaran, D.M.; Nagarajan, S.G.; Zhang, P.; Su, L.; Ko, L.L.; Thing, V.L. Anomaly Detection and Attribution in Networks with Temporally Correlated Traffic. IEEE/ACM Trans. Netw. 2017, 26, 131–144.
Drašar, M.; Vizváry, M.; Vykopal, J. Similarity as a central approach to flow-based anomaly detection. Int. J. Netw. Manag. 2014, 24, 318–336.
Wang, W.; Guyet, T.; Quiniou, R.; Cordier, M.-O.; Masseglia, F.; Zhang, X. Autonomic intrusion detection: Adaptively detecting anomalies over unlabeled audit data streams in computer networks. Knowl.-Based Syst. 2014, 70, 103–117.
Vieira, T.P.; Tenório, D.F.; da Costa, J.P.C.; de Freitas, E.P.; Del Galdo, G.; de Sousa Júnior, R.T. Model order selection and eigen similarity based framework for detection and identification of network attacks. J. Netw. Comput. Appl. 2017, 90, 26–41.
Bi, M.; Xu, J.; Wang, M.; Zhou, F. Anomaly detection model of user behavior based on principal component analysis. J. Ambient. Intell. Humaniz. Comput. 2016, 7, 547–554.
Ding, M.; Tian, H. PCA-based network Traffic anomaly detection. Tsinghua Sci. Technol. 2016, 21, 500–509.
Chen, Q.; Dong, M. Detection and Adaptive Video Processing of Hyperopia Scene in Sports Video. Complexity 2021, 2021, 6610760.
Wellem, T.; Lai, Y.-K.; Huang, C.-Y.; Chung, W.-Y. A Flexible Sketch-Based Network Traffic Monitoring Infrastructure. IEEE Access 2019, 7, 92476–92498.
Xiao, Y.; Xing, C.; Zhang, T.; Zhao, Z. An Intrusion Detection Model Based on Feature Reduction and Convolutional Neural Networks. IEEE Access 2019, 7, 42210–42219.
Liu, H.; Kadir, A.; Liu, J. Keyed Hash Function Using Hyper Chaotic System with Time-Varying Parameters Perturbation. IEEE Access 2019, 7, 37211–37219.
Ma, Q.; Sun, C.; Cui, B.; Jin, X. A novel model for anomaly detection in network traffic based on kernel support vector machine. Comput. Secur. 2021, 104, 102215.
Zubaroğlu, A.; Atalay, V. Data stream clustering: A review. Artif. Intell. Rev. 2021, 54, 1201–1236.
ElSayed, M.S.; Le-Khac, N.-A.; Albahar, M.A.; Jurcut, A. A novel hybrid model for intrusion detection systems in SDNs based on CNN and a new regularization technique. J. Netw. Comput. Appl. 2021, 191, 103160.
Bi, J.; Zhang, X.; Yuan, H.; Zhang, J.; Zhou, M. A Hybrid Prediction Method for Realistic Network Traffic with Temporal Convolutional Network and LSTM. IEEE Trans. Autom. Sci. Eng. 2021, 19, 1869–1879.
Subba, B.; Gupta, P. A tfidfvectorizer and singular value decomposition based host intrusion detection system framework for detecting anomalous system processes. Comput. Secur. 2021, 100, 102084.
Carrera, F.; Dentamaro, V.; Galantucci, S.; Iannacone, A.; Impedovo, D.; Pirlo, G. Combining Unsupervised Approaches for Near Real-Time Network Traffic Anomaly Detection. Appl. Sci. 2022, 12, 1759.
Fang, L.; Li, Y.; Liu, Z.; Yin, C.; Li, M.; Cao, Z.J. A Practical Model Based on Anomaly Detection for Protecting Medical IoT Control Services Against External Attacks. IEEE Trans. Ind. Inform. 2020, 17, 4260–4269.
Pérez-Bueno, F.; García, L.; Maciá-Fernández, G.; Molina, R. Leveraging a Probabilistic PCA Model to Understand the Multivariate Statistical Network Monitoring Framework for Network Security Anomaly Detection. IEEE/ACM Trans. Netw. 2022, 30, 1217–1229.
Xia, H.; Fang, B.; Roughan, M.; Cho, K.; Tune, P. A BasisEvolution framework for network traffic anomaly detection. Comput. Netw. 2018, 135, 15–31.
Luo, M.; Wang, K.; Cai, Z.; Liu, A.; Li, Y.; Cheang, C.F. Using Imbalanced Triangle Synthetic Data for Machine Learning Anomaly Detection. Comput. Mater. Contin. 2019, 58, 15–26.

Figure 1. Motion detection models.

Figure 2. Sketch data stream summarization.

Figure 3. Traffic graph mapping.

Figure 4. Traffic Graph Grayscale.

Figure 5. Convolutional Neural Network.

Figure 6. Long Short-Term Memory Network.

Figure 7. SPE algorithm.

Figure 8. Consistency Degree comparison of Traffic Graphs across different datasets.

Figure 9. Outlier detection comparison.

Figure 10. Different φ comparison.

Table 1. The Comparison of Existing Recent Network Anomaly Detection Algorithms.

Algorithm	ML/EL Method	Advantages	Disadvantages
PCSS [11]	PCA, SSH	Superior to other detection models in detection speed and accuracy	It cannot meet the detection requirements of new network abnormal traffic and has poor scalability. It may even be incorrectly classified as a training dataset.
Framework [12]	PCA, BiGAN	Further enhance the performance of the BiGAN model	Cannot improve the feature engineering by auto-generation of meaningful derived features and find ways to interpret anomaly score
Entropy-based [13]	Entropy	More feasible for practical implementation and general use	Unsupervised machine learning, with no training required; needs further research
ARIMA-based [17]	ARIMA	Have better anomaly detection accuracy	Need to reduce the false alarm rate
MPTCP-EMD [19]	EMD	Improve the robustness of MPTCP transport systems	EMD decomposition will produce modal aliasing
Online Adaptive [20]	SVM	Maintain high accuracy without the need for offline training	Cannot adaptively tune itself to meet performance goals and constraints
Unsupervised Clustering-Based [22]	SSC, OCSVM	The method performs better than some of the existing techniques	Does not develop an effective feature selection method and implement the parallelization of the algorithm
Clustering-based label estimation [23]	Naive Bayes, Adaboosting, SVM, RF	Improve the quality of estimated labels	There should be an error in the label estimation process based on clustering
D-PACK [25]	CNN, Autoencoder	Consume much less flow pre-processing time and detection time, speed up the detection	The proper configuration cannot be automatically calculated. Unable to balance the factor of acceptable training time and still gain high classification performance, the DL-based classification approach is highly susceptible to the data poisoning attack
SMOTE [27]	RF classifier	Great performance for intrusion detection in VANETs	The fast mobility of nodes in VANETs and the dynamic changes in network typology pose challenges for intrusion detection
Catboost in NIDS [28]	Catboost	Make high and more robust performance with low cost in time	Lack of trust and reliability between Fog nodes
SPE	CNN, LSTM	The network traffic can directly process the original dataset during anomaly detection and can anomaly-detect dynamic data streams	This study is only an experiment on a small-scale dataset and has not been applied to a practical environment on a large scale

Table 2. Dataset.

Name	Nature Category	Time/min	Number of Packets
Background	Background traffic	15	932
Video	Video traffic	10	230,211
News	News traffic	10	278,386
Communication	Communication traffic	10	147,823
Life	Life traffic	10	152,538
Mix1	Mixed traffic	5	67,986
Mix2	Mixed traffic	10	108,726
Mix3	Mixed traffic	15	180,578


© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

