1. Introduction
With the rapid development of the Internet, network traffic has grown explosively, and the abnormal traffic within it poses a serious threat to network security. As a result, network traffic anomaly detection has received widespread attention and in-depth research [1]. Anomaly detection monitors network traffic in real time and identifies abnormal behaviors, such as distributed denial of service (DDoS) attacks, port scanning, and worm propagation, so that appropriate protective measures can be taken to keep the network operating normally. Techniques that can effectively monitor and identify such behaviors have therefore become a research hotspot in network security.
In traditional network architectures, the control plane and the data plane are tightly coupled and integrated into dedicated devices, which limits the flexibility of network management and the capacity for service innovation. The control plane maintains the routing table of the switch and determines the best path for forwarding packets, while the data plane forwards packets according to the instructions given by the control plane. In Software Defined Networking (SDN), control functions are separated out and centralized in a software-based SDN controller, and the underlying network devices only need to receive and execute the controller's instructions. The basic SDN architecture consists of three layers and two interfaces. The application layer communicates with the SDN controller by calling its API. The SDN controller communicates with the switches in the data plane through the southbound interface, for which OpenFlow is the mainstream protocol, to control how network traffic is forwarded. The data plane consists of many network switches that are only responsible for forwarding and switching packets; each switch processes or forwards packets according to the instructions it receives from the controller over OpenFlow.
SDN has been widely deployed in data centers and cloud computing owing to its centralized control and flexible programmability [2]. However, decoupling the control plane from the data plane also makes the SDN architecture a key target for cyberattacks, which are mainly launched against the forwarding plane and the control plane. Among the attacks possible on these two planes, DoS attacks take the most forms and are the easiest to organize: large volumes of malicious traffic overload the SDN controller or switches and congest the control channel, seriously degrading quality of service and potentially making the entire network unavailable. Because the forwarding plane is directly exposed to external users, the attacks it faces are more diverse, including switch hijacking, SDN scanning, address resolution protocol (ARP) attacks, and virus attacks. In an ARP attack, for example, the attacker sends a large number of ARP packets containing incorrect mappings between MAC addresses and IP addresses; the switch then cannot store correct MAC address information and can no longer forward traffic normally. On the control plane, attackers can launch resource-exhaustion attacks directly against controllers, sending large numbers of Packet-in messages that block the controller's processing queue or even crash the entire network. Attackers can also forge messages on the northbound and southbound interfaces to mount a variety of attacks, such as DDoS attacks, black hole attacks, and malicious insertion of flow rules. Because of its centralized control, SDN sometimes faces more complex attack challenges than traditional networks.
In the SDN environment, traditional anomaly detection methods mainly rely on rule-based matching and statistical analysis, both of which have clear limitations against increasingly complex and rapidly evolving attacks. Rule-based matching usually relies on a predefined database of attack signatures and identifies abnormal behavior by matching network traffic against known attack patterns. As attack methods continue to evolve, however, new attacks may not match the established rules, so these methods struggle to detect emerging threats promptly and accurately. Statistical analysis methods build a model of normal traffic behavior and flag situations that deviate from it, using techniques such as threshold setting and statistical modeling. In practice, however, frequent fluctuations in normal traffic lead to high false positive rates and reduce detection reliability.
With the rapid advancement of deep learning, remarkable progress has been made in fields such as image recognition and natural language processing, opening new opportunities for network anomaly detection. Deep learning models can automatically learn complex features from network traffic data without manually designed feature extractors, and they offer stronger adaptability and generalization. Current deep learning-based methods for detecting anomalous network traffic fall primarily into those based on convolutional neural networks (CNN) [3], recurrent neural networks (RNN) [4] and their variants, such as long short-term memory (LSTM) [5] and gated recurrent units (GRU) [6], and hybrid models. Most CNN models are used to extract spatial features of network traffic data but are weak at modeling time series. RNNs and their variants can capture temporal dependencies, but they are prone to vanishing or exploding gradients on long sequences and train slowly. To overcome these limitations, some studies combine the strengths of different models into hybrid models to improve detection performance. However, these hybrid models remain complex in terms of feature extraction and training, and their convergence speed and generalization ability leave room for further improvement.
However, one of the key challenges in anomaly detection, especially in encrypted traffic scenarios, is the effective extraction of representative features from raw network data. Traditional methods that rely on payload inspection become ineffective when traffic is encrypted, necessitating the exploration of alternative feature extraction approaches. In network anomaly detection, features such as packet length distribution, inter-arrival times, flow duration, and statistical properties of network sessions can provide valuable insights even when payload data are unavailable. The effectiveness of any machine learning or deep learning model heavily depends on the quality of these extracted features, as poorly chosen or redundant features may lead to high false positive rates and degraded detection performance. Therefore, advanced feature engineering techniques, including automated feature selection and representation learning through deep models, have become essential for improving detection accuracy and robustness.
In recent years, the Temporal Convolutional Network (TCN) [7] has emerged in time series prediction and speech recognition because of its distinctive sequence modeling capability. TCN builds a deep network by stacking causal and dilated convolutional layers, combining the efficient parallel computation of convolutional neural networks with the long-range sequence modeling capability of recurrent neural networks. In anomaly detection, the dilation mechanism of TCN can effectively capture multi-scale temporal features, and its fixed-length history window avoids the gradient propagation problems of traditional RNNs. These characteristics give TCN distinct advantages in the temporal analysis of network traffic.
To address the above challenges, this paper proposes an anomalous traffic detection method that uses packet length sequences as features and learns from them with a TCN model. The TCN processes time-series data with a convolutional network and models temporal dependencies through causal convolution, which ensures that the prediction at the current time step is not affected by future time steps, and dilated convolution, which expands the receptive field. The approach aims to improve detection accuracy and efficiency, reduce the false alarm rate, and enhance the generalization ability of the model so that it copes better with complex network environments and varied attack methods.
This paper presents several significant contributions:
In this study, TCN is applied to SDN traffic anomaly detection. The TCN model effectively captures long-range dependencies in packet length sequences while exploiting parallel computation to improve training efficiency, overcoming the slow training and vanishing gradient problems of RNN models.
Unlike methods that rely on traditional flow statistics features, this paper uses the packet length sequence as the core feature representation, which reduces feature dimensionality while preserving the key characteristics of attack behavior and improving detection accuracy. A five-tuple grouping strategy (source IP address, destination IP address, source port, destination port, and protocol) is used to organize feature extraction, enhancing the expressiveness of traffic behavior features and improving classification accuracy.
Experiments on the public InSDN dataset show that the proposed method achieves high accuracy in classifying normal and malicious traffic and that it is superior to the baseline method in detection performance and computational efficiency.
In summary, the anomaly detection framework proposed in this study offers a new approach to network anomaly detection. As 5G and subsequent technologies develop, the framework shows potential application value in many fields, such as the industrial Internet of Things [8], intelligent mission-critical services [9], network function virtualization [10], and virtualized network slicing environments [11]. Its practical effectiveness in 5G and future networks will be explored and verified in future work.
The rest of the paper is organized as follows. Section 2 presents related work. Section 3 describes the proposed methodology. Section 4 presents the experiments. Finally, Section 5 concludes the paper.
2. Related Work
To counter the anomalous traffic attacks that SDN architectures face, and to meet the challenges posed by complex attack methods and dynamic traffic patterns, researchers have proposed a variety of detection methods. Current research on SDN anomalous traffic detection mainly falls into three categories. Detection techniques based on traditional statistical methods identify anomalies by analyzing differences in the statistical distribution of traffic feature parameters. Machine learning methods show strong adaptability; typical algorithms include supervised models such as Support Vector Machine (SVM), Decision Tree, and K-Nearest Neighbor (KNN). With breakthroughs in deep learning, neural network-based detection methods have gradually become the mainstream of research and can be categorized into supervised, unsupervised, and semi-supervised learning approaches.
2.1. SDN
SDN, as a new network architecture, separates network control from data forwarding. The control plane is responsible for network policy control and resource scheduling for the forwarding plane, while the forwarding plane forwards data according to the control plane's dynamic policies, enabling flexible control of network traffic. The controller sends flow table rules to the switches through southbound interfaces (e.g., OpenFlow, NETCONF) and provides network state queries and policy invocation to third-party applications through northbound interfaces (e.g., REST APIs). The basic SDN architecture consists of three planes and two interfaces, as shown in Figure 1. The three planes are the application plane, the control plane, and the data plane, and the two interfaces are the southbound and northbound interfaces. The application plane mainly implements functions such as load balancing, traffic monitoring, and security protection. The control plane issues and updates routing and forwarding rules. The data plane forwards and switches packets.
However, the wide adoption of SDN also brings new security issues. In the application plane, malicious applications can implement covert attacks through API vulnerabilities in the northbound interface. By constructing illegal network policy requests, the attacker injects false flow rules into the controller, thus realizing network topology tampering, traffic hijacking, and other invasive behaviors. Such attacks are highly stealthy, as they can pass through the authentication mechanism disguised as normal control commands. Most of the security threats in the control plane are aimed at paralyzing the controller or affecting the interaction between the controller and the switch, so the attacker mainly overloads the computing resources and network links of the control plane as a means to realize the attack on the controller. The attack vectors in the data plane focus on resource overloading attacks on switching devices. This type of flooding attack can lead to the degradation of switches into traditional Layer 2 devices, undermining the centralized control advantages of SDN.
2.2. Traditional Anomalous Traffic Detection Methods
Traditional abnormal traffic detection methods mainly rely on predefined rules and statistical analysis. Rule-based methods identify abnormal traffic using a set of fixed rules, for example, deciding whether traffic is abnormal based on specific port numbers, protocol types, packet sizes, and other characteristics. These methods are simple to implement and fast, but their drawbacks are obvious: writing the rules requires expert knowledge, the rules can hardly cover all possible anomalies, and they are weak at detecting new types of attacks and unknown threats.
Methods based on statistical analysis have been applied to abnormal traffic detection since the 1990s [12]. They model normal traffic by analyzing statistical characteristics of network traffic, such as packet arrival rate, average packet size, and flow duration, and treat traffic that differs significantly from the normal model as abnormal. However, in complex network environments with variable traffic patterns, statistical methods are susceptible to noise and normal traffic fluctuations, which lowers detection accuracy and increases false alarms. He et al. [13] proposed SDCC, a DDoS defense scheme that integrates bandwidth detection and data flow detection and uses a confidence-based filtering (CBF) method to compute a CBF score for each packet; a packet whose CBF score falls below a specific threshold is classified as an attack packet. The algorithm is computationally simple, but it must be continually updated and tuned as new attacks emerge.
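To make the threshold idea concrete, the following Python sketch fits a baseline on a single traffic metric and flags values that lie more than k standard deviations from its mean. It is a generic k-sigma rule, not the SDCC/CBF scheme of He et al.; the metric (packets per second), the sample values, and k = 3 are hypothetical choices for illustration.

```python
from statistics import mean, pstdev
from typing import Sequence, Tuple


def fit_baseline(normal_values: Sequence[float]) -> Tuple[float, float]:
    """Estimate the mean and (population) standard deviation of a traffic
    metric, e.g. packets per second, from attack-free traffic."""
    return mean(normal_values), pstdev(normal_values)


def is_anomalous(value: float, mu: float, sigma: float, k: float = 3.0) -> bool:
    """Flag a measurement that deviates from the baseline mean by more than
    k standard deviations (a simple static threshold rule)."""
    return abs(value - mu) > k * sigma


# Hypothetical packet rates (packets/s) observed during normal operation.
normal_rates = [100, 104, 98, 101, 97, 103, 99, 102]
mu, sigma = fit_baseline(normal_rates)

print(is_anomalous(101, mu, sigma))  # ordinary fluctuation -> False
print(is_anomalous(450, mu, sigma))  # flooding-like spike -> True
print(is_anomalous(115, mu, sigma))  # legitimate burst, still flagged -> True
```

The last case illustrates the weakness described above: a legitimate burst that falls outside the narrow baseline is reported as an attack, which is one source of false alarms in threshold-based detection.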
2.3. Machine Learning-Based Anomaly Traffic Detection Methods
Machine learning-based methods are also widely used in SDN traffic anomaly detection [14,15,16], but they usually depend on careful feature engineering. For example, Tayfour [17] proposed a Voting-based Non-Kernel Density Estimation (V-NKDE) classifier that combines a voting mechanism with four classical machine learning methods: Naive Bayes, KNN [18], Decision Tree [19], and Extremely Randomized Trees. Each method classifies the SDN traffic feature data, and the voting mechanism determines the final result. This method effectively reduces the false alarm rate and the risk of overfitting, improving the accuracy and robustness of traffic anomaly detection. Satheesh et al. [20] proposed a machine learning-based anomaly detection model that classifies packets by analyzing packet information in detail, including the source IP, destination IP, port number, and other packet features. The model uses the SDN controller to adjust the forwarding rules in the flow table so that malicious traffic is blocked promptly and prevented from propagating further. Sebbar et al. [21] proposed a security model based on the Random Forest algorithm that decides whether traffic constitutes an attack using pre-established security policies and Time To Live (TTL) delay criteria. Ali et al. [22] proposed a blockchain-based intelligent link failure recovery framework for software-defined Internet of Things (SD-IoT) environments. The framework adopts a TOPSIS (Technique for Order of Preference by Similarity to Ideal Solution) module for link failure recovery, which considers multiple quality metrics such as latency, jitter, and bandwidth when selecting alternative paths rather than being limited to the shortest path. To enhance the security of IoT systems, the work further combines blockchain technology with an Artificial Neural Network (ANN) for distributed DDoS attack detection and defense: the blockchain ensures data immutability, while the ANN identifies and defends against DDoS attacks by analyzing attack patterns in network traffic. The framework both improves the efficiency of link recovery and strengthens protection against DDoS attacks, supporting the stable operation of SD-IoT. Wei et al. [23] proposed a hybrid deep learning method for DDoS attack detection and classification that combines an Auto-Encoder (AE) with a Multi-Layer Perceptron (MLP). The AE first performs unsupervised feature extraction, mapping the high-dimensional features of network traffic samples into a low-dimensional space and retaining the most discriminative features while reducing noise and irrelevant features. The compressed features are then fed into the MLP to classify DDoS attack types. Experimental results show that the AE-MLP model achieves high accuracy and robustness in detecting and classifying DDoS attacks.
2.4. Deep Learning Based Anomaly Traffic Detection Methods
In anomalous traffic detection, many researchers have adopted a variety of deep learning models to improve detection effectiveness. CNN-based methods map network traffic data into a multidimensional space and use convolutional layers to extract spatial features, which lets them identify local patterns and structural features in traffic effectively. However, CNNs have limitations in processing time series data and struggle to capture long-term dependencies. RNNs and their variants (e.g., LSTM, GRU), in contrast, are specialized for time series data: they retain previous information and use it for the current prediction, thus capturing dynamic changes over time. However, RNNs are prone to vanishing or exploding gradients on long sequences, which makes them difficult and slow to train. Researchers have also proposed methods based on AE networks [24], deep belief networks (DBN) [25], and generative adversarial networks (GAN) [26].
To overcome the shortcomings of these single models, some studies have proposed hybrid models. Although they achieve better detection results, these hybrid models remain complex in feature extraction and training, and their convergence speed and generalization ability still need improvement. Elsayed et al. [27] proposed a hybrid method based on CNN and LSTM that first structures the detection data, then extracts spatiotemporal features with CNN and LSTM, and finally uses Softmax to complete the detection. Wei et al. [28] proposed a two-branch feature extraction network that uses CNN and RNN to extract the spatial and temporal features of the data, respectively. This method requires no manual traffic feature extraction and learns complex patterns in the data automatically; by combining CNN and RNN, it captures spatial features while also handling time series data effectively, but the model generalizes poorly. Bai et al. [29] used a bidirectional LSTM to learn data features, identified anomalies with a classifier, and deployed idle edge nodes in the Internet of Things to make detection more flexible. To overcome the limitations of single-dimensional feature extraction in complex cyberattack scenarios, Lin et al. [30] proposed a detection method based on multilevel feature fusion. The method uses a bidirectional long short-term memory network (BiLSTM) and a CNN to extract spatial, temporal, and byte-level features of the traffic data and then fuses these multidimensional features. This fusion strategy captures the intrinsic characteristics of network traffic more comprehensively, enabling more accurate anomaly detection in complex network environments. Liu et al. [31] proposed an HTTPS traffic detection method based on a Bidirectional Gated Recurrent Unit (BiGRU) and an attention mechanism. The BiGRU extracts forward and backward features of the byte sequences in a session, capturing the temporal dependencies of the data, and the attention mechanism lets the model assign different weights to different features, strengthening its focus on key features. However, the method is highly sensitive to hyperparameters, which complicates hyperparameter setting and requires many experiments to find the optimal parameter combination, limiting the model's practicality and generalization to some extent. Luo et al. [32] proposed an intrusion detection model based on a Recombination Generative Adversarial Network (RGAN), which optimizes the generator and discriminator through two-stage adversarial learning to improve recognition of minority attack samples. Minority-class attack samples are first generated by combining a self-attention mechanism with a GAN, and traffic features are extracted using GRUs and CNNs. A reconstruction loss is then introduced to reduce the false alarm rate and further improve detection of minority-class samples. The method achieves good detection results on the CSE-CIC-IDS2018 dataset.
3. Proposed Method
In this paper, we propose a TCN-based method for detecting abnormal traffic in SDN, as shown in Figure 2. It combines lightweight feature design with efficient time series modeling to accurately identify abnormal traffic in dynamic network environments. At the feature engineering level, the packet length sequence serves as the core feature and is combined with a five-tuple grouping strategy, which avoids the computational cost of traditional multi-dimensional feature extraction while retaining the key dynamic information of traffic behavior. At the model level, causal convolution and dilated convolution work together to capture long-term dependencies while preserving temporal causality, which addresses the low training efficiency and vanishing gradients that sequential dependencies cause in traditional RNN-like models. In addition, residual connections stabilize gradient propagation in the deep network, improving the convergence speed and generalization ability of the model. Training uses the cross-entropy loss function and the Adam optimizer, together with dropout regularization to suppress overfitting.
3.1. Data Preprocessing
Before abnormal traffic detection, the raw network traffic data must be preprocessed to extract useful feature information. Packet length is a key feature of network traffic: it reflects the size and transmission of packets and is closely related to network load and transmission efficiency. Specifically, packet length reveals the volume of data being transmitted; the arrival frequency of packets reflects how active the network is, with high-frequency arrivals indicating more frequent data transmission and low-frequency arrivals indicating sparser transmission; and sudden changes in packet length are often a sign of unexpected behavior in the network, such as DDoS attacks and port scanning, since anomalous traffic usually causes significant changes in packet lengths.
In this paper, the packet length sequence is chosen as the network traffic feature. Specifically, the network traffic data are extracted from PCAP files, the packets are grouped by five-tuple, and packets sharing the same five-tuple form a packet sequence, yielding multiple packet sequences. Each sequence contains a certain number of packets, and the sequence length is adjusted according to practical requirements and the characteristics of the dataset. The data preprocessing flow is shown in Figure 3. We tested packet sequences of different lengths and found that the model performed best at capturing the key dynamic behavior of network traffic and detecting abnormal traffic when the sequence length was limited to at most 750. If the sequence is too short, the model cannot obtain enough traffic information, which hurts detection accuracy; if it is too long, the model takes in excessive redundant information. We therefore cap the sequence length at 750, which ensures that each sequence contains enough information without becoming so long that it increases computational complexity.
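The grouping step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: packets are assumed to be already parsed from the PCAP file (parsing is out of scope here), the five-tuple is treated as unidirectional, and short flows are right-padded with zeros so every sequence has the same length for batching; the `Packet` type, the padding value, and the function name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Packet(NamedTuple):
    """Hypothetical parsed-packet record (fields needed for grouping)."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str
    length: int  # packet length in bytes


FiveTuple = Tuple[str, str, int, int, str]


def packet_length_sequences(packets: Iterable[Packet], max_len: int = 750,
                            pad_value: int = 0) -> Dict[FiveTuple, List[int]]:
    """Group packets by five-tuple in arrival order and return one
    packet-length sequence per flow, truncated to max_len and right-padded
    with pad_value (padding is an assumption made for batching)."""
    flows: Dict[FiveTuple, List[int]] = defaultdict(list)
    for p in packets:
        key = (p.src_ip, p.dst_ip, p.src_port, p.dst_port, p.protocol)
        flows[key].append(p.length)

    result: Dict[FiveTuple, List[int]] = {}
    for key, lengths in flows.items():
        seq = lengths[:max_len]                        # cap long flows
        seq = seq + [pad_value] * (max_len - len(seq))  # pad short flows
        result[key] = seq
    return result


pkts = [
    Packet("10.0.0.1", "10.0.0.2", 5000, 80, "TCP", 60),
    Packet("10.0.0.1", "10.0.0.2", 5000, 80, "TCP", 1514),
    Packet("10.0.0.3", "10.0.0.2", 5001, 53, "UDP", 74),
    Packet("10.0.0.1", "10.0.0.2", 5000, 80, "TCP", 52),
]
seqs = packet_length_sequences(pkts, max_len=4)
print(seqs[("10.0.0.1", "10.0.0.2", 5000, 80, "TCP")])  # -> [60, 1514, 52, 0]
print(seqs[("10.0.0.3", "10.0.0.2", 5001, 53, "UDP")])  # -> [74, 0, 0, 0]
```

A small `max_len` is used here only to keep the output readable; the default of 750 matches the cap chosen above. Interleaved packets from different flows are separated correctly because each packet is appended to the list for its own five-tuple.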
3.2. TCN Model
TCN is a deep learning model specialized for time series data. Its core idea is to capture long-term dependencies in time series with convolutional networks. Compared with traditional RNNs, TCN parallelizes better and has fewer parameters, making it more efficient and better performing on long sequences.
A TCN consists mainly of an input layer, multiple residual blocks, and an output layer. Each residual block contains several convolutional layers, and residual connections enhance the model's learning capability. TCN also uses dilated convolution, which expands the receptive field by inserting gaps between the taps of the convolution kernel, capturing longer temporal dependencies without increasing the kernel size or the number of parameters. The structure of the TCN convolutional network is shown in Figure 4. Its convolutions sample the input at intervals and look only backward in time, so future information in the time series never affects the current prediction, preserving causality in time series analysis. Unlike ordinary convolution, each dilated convolution skips a fixed number of steps between the inputs it combines, so it obtains a wider receptive field and captures dependencies on more distant time points while the output length remains unchanged. This design makes TCN more efficient at analyzing time-series data and better at learning long-term dependency patterns.
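The effect of dilation on the receptive field can be checked with a short calculation. The Python sketch below assumes the common TCN residual-block layout with two dilated causal convolutions per block and dilations that double from block to block; the kernel size and dilation values are hypothetical and are not hyperparameters reported in this paper.

```python
from typing import Sequence


def receptive_field(kernel_size: int, dilations: Sequence[int],
                    convs_per_block: int = 2) -> int:
    """Number of input time steps that can influence one output of a stack
    of dilated causal convolutions. A causal convolution with kernel size k
    and dilation d reaches (k - 1) * d steps further into the past."""
    rf = 1  # the current time step itself
    for d in dilations:
        rf += convs_per_block * (kernel_size - 1) * d
    return rf


# Hypothetical configuration: kernel size 3, three residual blocks with
# dilations 1, 2, 4, and two causal convolutions per block.
print(receptive_field(3, [1, 2, 4]))                   # -> 29
print(receptive_field(3, [1, 1, 1]))                   # no dilation -> 13
print(receptive_field(3, [2 ** i for i in range(8)]))  # -> 1021
```

With doubling dilations the receptive field grows exponentially with depth, so in this hypothetical setting eight blocks would already span a 750-packet sequence, whereas undilated layers grow the receptive field only linearly.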
Figure 5 presents the TCN model for SDN anomalous traffic detection. The model is designed to handle long time series efficiently, and its main components are residual blocks and dilated causal convolutional layers. Stacking multiple residual blocks makes training of the deep network more stable and easier to optimize. The input passes through three residual blocks, each of which adds a skip connection around its convolutions to avoid the vanishing gradient problem common in deep networks and to speed up convergence. Dilated causal convolution expands the receptive field so that the network can capture more complex temporal dependencies, while causality ensures that the output depends only on data at the current and previous time steps. Finally, a 1 × 1 convolutional layer maps the features to the desired output.
The causal convolutional layer is the basis of the TCN model, which employs causal convolutional operations to ensure that only information from previous moments is used in predicting the output at the current moment, thus maintaining the sequential nature of the time series. The causal convolutional layer extracts local features in the sequence through convolutional operations of the convolution kernel with the input sequence and passes these local features to the next layer. The output of the causal convolutional layer can be expressed as
$$y_t = f(W * x_t + b) \tag{1}$$
where $y_t$ is the output of the causal convolutional layer at time step $t$, $f(\cdot)$ is the activation function, $W$ is the convolution kernel weight, $b$ is the bias term, $*$ denotes the convolution operation, and $x_t$ is the value of the input sequence at time step $t$. The activation function used in this paper is ReLU, a non-saturating activation function: it does not saturate for positive inputs and saturates hard (outputs zero) for negative inputs. Because ReLU is piecewise linear, models using it converge faster, and it is a commonly used activation function in CNNs. ReLU is defined in Equation (2):
$$\mathrm{ReLU}(z) = \max(0, z) \tag{2}$$
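A minimal single-channel sketch of the causal convolution with ReLU described above is given below. It is illustrative only: real TCN layers are multi-channel and are typically built from a framework's 1-D convolution with left padding, and the convention that `w[0]` weights the current input is an assumption of this sketch.

```python
from typing import List, Sequence


def relu(z: float) -> float:
    """ReLU(z) = max(0, z)."""
    return max(0.0, z)


def causal_conv1d(x: Sequence[float], w: Sequence[float], b: float) -> List[float]:
    """Single-channel causal convolution followed by ReLU:
    y_t = ReLU(sum_i w[i] * x[t - i] + b).
    w[0] weights the current input x[t], w[1] weights x[t - 1], and so on.
    Indices before the start of the sequence are treated as zero (left
    padding), so the output has the same length as the input and y_t never
    depends on later inputs."""
    y = []
    for t in range(len(x)):
        s = b
        for i, wi in enumerate(w):
            if t - i >= 0:
                s += wi * x[t - i]
        y.append(relu(s))
    return y


x = [1.0, 2.0, 3.0, 4.0]
print(causal_conv1d(x, [0.5, 0.5], 0.0))  # -> [0.5, 1.5, 2.5, 3.5]

# Changing a future value leaves all earlier outputs unchanged.
x_future = [1.0, 2.0, 3.0, 100.0]
assert causal_conv1d(x_future, [0.5, 0.5], 0.0)[:3] == causal_conv1d(x, [0.5, 0.5], 0.0)[:3]

# A negative pre-activation is clipped to zero by ReLU.
print(causal_conv1d([3.0, 1.0, 4.0], [1.0, -1.0], 0.0))  # -> [3.0, 0.0, 3.0]
```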
The dilated convolutional layer introduces a dilation rate on top of the causal convolutional layer and expands the model's receptive field through dilated convolution, so that the model can capture dependencies over a longer time range. The dilation rate determines the time interval spanned by the convolution kernel: the larger the dilation rate, the wider the time range covered by the kernel and the larger the receptive field. The output of the dilated convolutional layer can be expressed as:
$$y_t = f(W *_d x_t + b) \tag{3}$$
where $*_d$ denotes the dilated convolution operation and $d$ is the dilation rate. For a kernel of size $k$, $W *_d x_t = \sum_{i=0}^{k-1} W_i \, x_{t - d \cdot i}$, so Equation (1) is the special case $d = 1$.
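The dilated causal convolution described above differs from the plain causal convolution only in how far apart the kernel taps are placed. The single-channel Python sketch below makes that spacing explicit; it is an illustration under the same zero-left-padding assumption, not the model's actual layer implementation, and the input values are hypothetical.

```python
from typing import List, Sequence


def dilated_causal_conv1d(x: Sequence[float], w: Sequence[float], b: float,
                          d: int) -> List[float]:
    """Single-channel dilated causal convolution followed by ReLU:
    y_t = ReLU(sum_i w[i] * x[t - d * i] + b).
    With d = 1 this reduces to the ordinary causal convolution. Indices
    before the start of the sequence are treated as zero."""
    y = []
    for t in range(len(x)):
        s = b
        for i, wi in enumerate(w):
            j = t - d * i  # taps are spaced d steps apart
            if j >= 0:
                s += wi * x[j]
        y.append(max(0.0, s))
    return y


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
print(dilated_causal_conv1d(x, [1.0, 1.0], 0.0, d=1))  # -> [1.0, 3.0, 5.0, 7.0, 9.0, 11.0, 13.0, 15.0]
print(dilated_causal_conv1d(x, [1.0, 1.0], 0.0, d=4))  # -> [1.0, 2.0, 3.0, 4.0, 6.0, 8.0, 10.0, 12.0]
```

The same two-tap kernel combines inputs one step apart when d = 1 and four steps apart when d = 4, so it relates each packet length to the one observed four packets earlier without adding any weights.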
To alleviate the vanishing-gradient problem and improve training stability, residual connections are introduced into the TCN model. A residual connection adds the input of a block directly to its output, which gives gradients a shortcut path during back-propagation and keeps them from vanishing as the network becomes deeper. The output of the residual connection can be expressed as
$$o = F(x) + x \tag{4}$$
where $F(x)$ is the output of the main part of the TCN model, $x$ is the input, and $o$ is the output after the residual connection.
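To illustrate why the identity shortcut helps gradient flow, the sketch below wraps a hypothetical main branch $F$ (a single causal convolution with deliberately tiny weights, chosen only for illustration) in the residual connection $o = F(x) + x$ and estimates derivatives by central finite differences. Because $\partial o_t / \partial x_t = \partial F_t / \partial x_t + 1$, the shortcut contributes a gradient of exactly 1 even when the branch’s own gradient is small. The helper names are illustrative, not part of any library.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def residual(x: Sequence[float], branch: Callable[[Sequence[float]], Vector]) -> Vector:
    """Residual connection o = F(x) + x, applied element-wise.

    Assumes the branch preserves the sequence length (same shape as x).
    """
    fx = branch(x)
    if len(fx) != len(x):
        raise ValueError("branch output must have the same length as its input")
    return [f_t + x_t for f_t, x_t in zip(fx, x)]


def tiny_causal_branch(x: Sequence[float]) -> Vector:
    """Hypothetical main branch F: causal conv (k=2, weights 0.01) + ReLU."""
    w = (0.01, 0.01)
    return [max(0.0, sum(w_i * x[t - i] for i, w_i in enumerate(w) if t - i >= 0))
            for t in range(len(x))]


def last_step_grad(fn: Callable[[Sequence[float]], Vector], x: Sequence[float],
                   eps: float = 1e-6) -> float:
    """Central finite-difference estimate of d fn(x)[-1] / d x[-1]."""
    up, down = list(x), list(x)
    up[-1] += eps
    down[-1] -= eps
    return (fn(up)[-1] - fn(down)[-1]) / (2 * eps)


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0]
    g_branch = last_step_grad(tiny_causal_branch, x)                         # about 0.01
    g_resid = last_step_grad(lambda s: residual(s, tiny_causal_branch), x)  # about 1.01
    print(g_branch, g_resid)
```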
3.3. Model Training and Optimization
In the process of model training, the selection of an appropriate loss function and optimization algorithm is crucial, because they directly affect the training efficiency, convergence speed, and generalization ability of the model. The loss function measures the difference between the model’s predicted output and the true label, while the optimization algorithm adjusts the model’s parameters to reduce the loss step by step, so that the model continuously approaches the optimal solution on the training data. To ensure that the model converges effectively during training and generalizes well, this paper adopts the cross-entropy loss function and the Adam optimizer, and applies dropout to reduce overfitting.
The cross-entropy loss function is commonly used in classification tasks because it effectively measures the difference between the model output and the true label. Let the true label be $y$ and the model’s predicted value be $\hat{y}$; the cross-entropy loss $L$ can then be expressed as
$$L = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i \log \hat{y}_i + \left(1 - y_i\right)\log\left(1 - \hat{y}_i\right)\right] \tag{5}$$
where $N$ denotes the number of samples, $y_i$ is the true label of the $i$th sample, and $\hat{y}_i$ is the model’s predicted probability for the $i$th sample. Because the per-sample loss grows without bound as the probability assigned to the true class approaches zero, this loss penalizes confident incorrect predictions heavily and pushes the model toward more accurate outputs.
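A minimal sketch of the binary form of the cross-entropy loss, averaged over $N$ samples, is shown below. It assumes binary labels (which class is coded as 1 is a hypothetical convention here) and clips predictions to $[\epsilon, 1-\epsilon]$ to avoid evaluating $\log 0$; the clipping constant is an implementation detail added for this sketch, not something specified in the paper.

```python
import math
from typing import Sequence


def binary_cross_entropy(y_true: Sequence[int], y_pred: Sequence[float],
                         eps: float = 1e-12) -> float:
    """Mean of -[y log p + (1 - y) log(1 - p)] over N samples."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)


if __name__ == "__main__":
    print(binary_cross_entropy([1, 0], [0.9, 0.1]))  # confident and correct: about 0.105
    print(binary_cross_entropy([1, 0], [0.1, 0.9]))  # confident and wrong: about 2.303
```

The roughly twentyfold gap between the two calls shows the heavy penalty on confident mistakes described above.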
During training, the optimization algorithm updates the model parameters using gradients computed by back-propagation. The Adam (Adaptive Moment Estimation) optimizer combines the advantages of the RMSprop and Momentum methods and adaptively adjusts the learning rate of each parameter to improve training efficiency. Specifically, Adam maintains exponential moving averages of the gradient (the first-order moment, or mean) and of the squared gradient (the second-order moment, or uncentered variance) and uses them to compute a per-parameter step size. Adam’s update rule is as follows:
$$\begin{aligned} m_t &= \beta_1 m_{t-1} + \left(1-\beta_1\right) g_t \\ v_t &= \beta_2 v_{t-1} + \left(1-\beta_2\right) g_t^2 \\ \hat{m}_t &= \frac{m_t}{1-\beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1-\beta_2^t} \\ \theta_t &= \theta_{t-1} - \eta\,\frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon} \end{aligned} \tag{6}$$
where $m_t$ and $v_t$ denote the estimates of the first-order and second-order moments, respectively, $\hat{m}_t$ and $\hat{v}_t$ are their bias-corrected versions, $g_t$ is the gradient of the loss with respect to the parameters $\theta$ at step $t$, $\beta_1$ and $\beta_2$ are the decay rates, $\eta$ is the learning rate, and $\epsilon$ is a small constant that prevents division by zero. By combining first-order and second-order moment information, Adam controls both the direction and the magnitude of each update, which allows the model parameters to converge quickly and stably during training.
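The update rule above can be sketched in plain Python as a single in-place step over a list of parameters, using the bias-corrected moment estimates. The defaults `lr=1e-3`, `beta1=0.9`, `beta2=0.999`, `eps=1e-8` are the values commonly recommended for Adam, not this paper’s hyperparameters, and the quadratic objective in the usage example is a hypothetical toy problem.

```python
import math
from typing import List, Sequence, Tuple


def adam_step(theta: List[float], grad: Sequence[float], m: List[float], v: List[float],
              t: int, lr: float = 1e-3, beta1: float = 0.9, beta2: float = 0.999,
              eps: float = 1e-8) -> None:
    """Apply one Adam update in place; t is the 1-based step counter."""
    for j, g in enumerate(grad):
        m[j] = beta1 * m[j] + (1.0 - beta1) * g      # first-moment (mean) estimate
        v[j] = beta2 * v[j] + (1.0 - beta2) * g * g  # second-moment estimate
        m_hat = m[j] / (1.0 - beta1 ** t)            # bias-corrected first moment
        v_hat = v[j] / (1.0 - beta2 ** t)            # bias-corrected second moment
        theta[j] -= lr * m_hat / (math.sqrt(v_hat) + eps)


def loss_and_grad(theta: Sequence[float]) -> Tuple[float, List[float]]:
    """Toy objective (theta0 - 3)^2 + (theta1 + 1)^2 and its gradient."""
    a, b = theta[0] - 3.0, theta[1] + 1.0
    return a * a + b * b, [2.0 * a, 2.0 * b]


if __name__ == "__main__":
    theta, m, v = [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]
    for t in range(1, 1001):
        _, g = loss_and_grad(theta)
        adam_step(theta, g, m, v, t, lr=0.1)
    print(theta, loss_and_grad(theta)[0])
```

On the first step the bias correction makes $\hat{m}_1/\sqrt{\hat{v}_1} \approx \operatorname{sign}(g_1)$, so every parameter moves by roughly $\eta$ regardless of the gradient’s scale; this scale invariance is the adaptive behaviour described above.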
To prevent the model from overfitting during training, this paper introduces dropout. Dropout is a regularization method that randomly deactivates a portion of the neurons in the network during training, which reduces the model’s reliance on specific neurons and thus improves its generalization ability. Specifically, in each training iteration dropout sets the output of each neuron to zero with probability $p$, which breaks the co-adaptation between neurons; at inference time dropout is disabled. In practice, dropout is an effective way to curb overfitting and improve the robustness of the model.
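A minimal sketch of dropout in the "inverted" form used by common frameworks (for example, PyTorch’s `nn.Dropout`) follows: during training each activation is zeroed with probability `p` and the survivors are scaled by $1/(1-p)$ so that the expected activation is unchanged, and at inference the input passes through untouched. The value `p=0.5` in the example is hypothetical; the paper’s dropout rate is not stated here.

```python
import random
from typing import List, Optional, Sequence


def dropout(h: Sequence[float], p: float, training: bool = True,
            rng: Optional[random.Random] = None) -> List[float]:
    """Inverted dropout on a vector of activations."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        return list(h)  # inference: no units are dropped, no rescaling needed
    rng = rng or random.Random()
    keep = 1.0 - p
    # Each unit is dropped independently with probability p; kept units are rescaled.
    return [x / keep if rng.random() >= p else 0.0 for x in h]


if __name__ == "__main__":
    out = dropout([1.0] * 10, p=0.5, rng=random.Random(0))
    print(out)  # each entry is either 0.0 or 2.0
```

Rescaling at training time rather than at inference keeps the deployed model identical to a network without dropout, which is convenient when inference speed matters.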
5. Conclusions
With the wide adoption of the centralized SDN architecture, the security threats it faces are becoming increasingly complex and covert. Traditional anomalous traffic detection methods rely on manually designed rules and struggle to capture dynamic temporal features, so in dynamic SDN environments they suffer from high false alarm rates and limited generalization. To address this challenge, this paper proposes a TCN-based anomalous traffic detection method for SDN, which detects anomalous traffic efficiently through packet-length sequence modeling and multi-level temporal feature extraction. Compared with existing research, this paper introduces the TCN model into the SDN security domain and combines causal and dilated convolutions to capture long-term dependencies in traffic data while preserving temporal causality. Gradient propagation is improved through residual connections, which significantly improves the training stability and convergence efficiency of the deep network. At the feature level, the packet-length sequence is used as the core feature, combined with a five-tuple (source IP, destination IP, source port, destination port, protocol) grouping strategy, which reduces the complexity of feature engineering while retaining the key information about dynamic traffic behavior. Experiments on the publicly available InSDN dataset show that the proposed method achieves high detection accuracy. The advantages of the TCN model, such as parallel computation, fast training, and a relatively small number of parameters, make it well suited to practical abnormal network traffic detection.
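The packet-length feature with five-tuple grouping can be sketched as follows. The `Packet` record, its field names, the unidirectional flow key, and the fixed sequence length with truncation and zero padding are all hypothetical choices for illustration; the paper’s actual preprocessing of InSDN may differ, for example in the sequence length or in whether both directions of a connection are merged into one flow.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class Packet(NamedTuple):
    """Hypothetical packet record; field names are illustrative."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str
    length: int


FiveTuple = Tuple[str, str, int, int, str]


def packet_length_sequences(packets: Iterable[Packet],
                            seq_len: int = 8) -> Dict[FiveTuple, List[int]]:
    """Group packets by five-tuple (in arrival order) and keep each flow's first
    seq_len packet lengths, right-padded with zeros to a fixed length."""
    flows: Dict[FiveTuple, List[int]] = defaultdict(list)
    for pkt in packets:
        key = (pkt.src_ip, pkt.dst_ip, pkt.src_port, pkt.dst_port, pkt.protocol)
        if len(flows[key]) < seq_len:  # truncate long flows
            flows[key].append(pkt.length)
    return {k: v + [0] * (seq_len - len(v)) for k, v in flows.items()}


if __name__ == "__main__":
    pkts = [
        Packet("10.0.0.1", "10.0.0.2", 1234, 80, "TCP", 60),
        Packet("10.0.0.3", "10.0.0.2", 5555, 53, "UDP", 72),
        Packet("10.0.0.1", "10.0.0.2", 1234, 80, "TCP", 1500),
    ]
    for flow, seq in packet_length_sequences(pkts, seq_len=4).items():
        print(flow, seq)
```

Each fixed-length sequence can then be fed to the TCN as a one-dimensional input, which is what keeps the feature engineering lightweight.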
The innovations of this study are mainly reflected in three aspects: the parallel computing capability and long-range temporal modeling of the TCN model overcome the low training efficiency and vanishing-gradient problems of traditional RNN-like models; the lightweight feature design uses a one-dimensional packet-length sequence to capture the dynamic behavior of anomalous traffic, such as sudden traffic floods and low-frequency long-lived connections; and the model’s small number of parameters makes deployment in resource-constrained edge networks feasible.
However, there are still limitations in the current work. First, the experimental validation in this study is primarily based on the InSDN dataset, which is a simulated dataset. While it effectively demonstrates the feasibility of our approach, further validation using real-world SDN traffic datasets and heterogeneous architectures is required to ensure the model’s generalizability. Future research will aim to evaluate the proposed method on diverse datasets, including real-world encrypted traffic scenarios, to enhance its robustness across different SDN environments.
Second, more advanced feature extraction techniques are needed to improve anomaly detection performance, particularly in encrypted traffic scenarios. Packet-length sequences, while effective in capturing temporal traffic patterns, may not provide sufficient discriminatory power when dealing with encrypted communication or protocol obfuscation. Feature extraction plays a crucial role in network anomaly detection by enabling the identification of key behavioral patterns without relying on packet payload analysis. Advanced techniques, such as flow-based statistical analysis, entropy-based metrics, and graph-based representations, can enhance detection capabilities by capturing higher-order dependencies within traffic flows. Furthermore, deep learning-based feature extraction methods, such as attention mechanisms and autoencoders, can learn hierarchical representations of network traffic, improving the adaptability of models to evolving attack patterns. In future research, integrating multi-modal feature extraction could provide a more comprehensive understanding of anomalous behaviors, leading to more robust and generalizable detection models.
Third, real-time feasibility remains a challenge. While TCN’s parallel computation offers an advantage in efficiency, achieving low-latency detection in high-speed SDN environments requires further optimization. Future work will focus on introducing knowledge distillation and quantization compression techniques to optimize inference speed while maintaining detection accuracy. Additionally, incremental learning mechanisms will be explored to dynamically adapt the model to new attack patterns, reducing reliance on frequent retraining.
Finally, cross-domain collaborative detection will be investigated by integrating logs from northbound interfaces and state data from the control plane to provide a more holistic view of network security. By combining multi-modal feature fusion, real-time optimization, incremental learning, and cross-domain analysis, future research aims to construct a more robust and adaptive SDN security protection system, providing both theoretical support and practical implementation for intelligent network defense.