The Attention-Based Autoencoder for Network Traffic Classification with Interpretable Feature Representation
Abstract
1. Introduction
- Extracting global structural features: To preserve the original spatial structure of network flows and overcome the difficulty of extracting global spatial structure, we employ a spatial attention mechanism. By dynamically learning features at each position of the grayscale image, it produces attention weights that reflect the importance of the network-flow bytes at those positions, thereby capturing the global structural features of the entire image. These weights are then applied to the original network-flow matrix to enhance relevant features and suppress irrelevant ones.
- Resolving the issue of the model generating different FlowSpectrum intervals: We introduce a self-attention mechanism and an autoencoder model structure. A single-channel attention module is added to the model so that channel attention weights and spatial attention weights are generated simultaneously, producing blended attention weights for use in the self-attention mechanism.
- The primary contributions of this paper are as follows: We propose a “global + local” structural feature capture scheme to comprehensively extract features from encrypted flows. First, global spatial structural features of the original bytes are captured through spatial-domain attention and single-channel attention mechanisms. The data weighted by these global structural features then undergo dimensionality reduction in an autoencoder module, whose integrated CNN convolutional layers simultaneously capture local structural features.
- Proposing the AMAE model for generating interpretable representations of encrypted-flow features: Using the self-attention mechanism, we blend the spatial attention weights and single-channel attention weights into the original byte feature matrix and then reduce its dimensionality, yielding the desired encrypted FlowSpectrum and its attention weights. Through the FlowSpectrum, the distinct ranges of encrypted flows within the coordinate system can be observed clearly, and analysis of the attention-weight values explains why the model generates different spectrum-line intervals.
- Utilizing the encrypted FlowSpectrum for classification, improving encrypted-flow classification accuracy: This work uses the ISCX-VPN2016 dataset. The AMAE model is first trained on the training set to obtain the FlowSpectrum, which is then used to classify the test set. Experiments validate the excellent performance of the AMAE model in encrypted traffic classification and demonstrate the symmetry between the differences in FlowSpectrum intervals and the attention-weight values.
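The blended-attention idea in the contributions above can be sketched in miniature. The toy below is our own illustrative assumption, not the paper's implementation: spatial attention scores each byte position of the grayscale matrix, single-channel attention blends average- and max-pooled summaries into one channel gate, and their product re-weights the original byte matrix (the sigmoid scoring stands in for the learned layers).

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def spatial_attention(matrix):
    # One weight per position, scored from the normalized byte value --
    # a stand-in for the learned 1x1 convolution over pooled features.
    return [[sigmoid(v / 255.0 - 0.5) for v in row] for row in matrix]

def channel_attention(matrix):
    # Single-channel attention: blend average- and max-pooled summaries
    # into one scalar gate for the whole channel.
    flat = [v for row in matrix for v in row]
    avg, mx = sum(flat) / len(flat), max(flat)
    return sigmoid((avg + mx) / (2 * 255.0) - 0.5)

def blend(matrix):
    # Blended weights enhance informative bytes and suppress the rest.
    s = spatial_attention(matrix)
    c = channel_attention(matrix)
    return [[v * s[i][j] * c for j, v in enumerate(row)]
            for i, row in enumerate(matrix)]

flow = [[0, 128], [255, 64]]  # a tiny 2x2 "grayscale" byte matrix
weighted = blend(flow)
```

Zero bytes stay zero, while larger byte values keep proportionally more weight, which is the enhance/suppress behavior the text describes.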
2. Related Work
2.1. Machine Learning-Based Methods
2.2. Deep Learning-Based Methods
2.3. FlowSpectrum
3. Methodology
3.1. Input Type
- (1) Packet segmentation.
- (2) Session filtering.
- (3) IDX file generation.
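A common realization of these preprocessing steps truncates or zero-pads each session's bytes to a fixed length and reshapes them into a square grayscale image. The sketch below follows the usual 784-byte / 28×28 convention from prior traffic-as-image work; the exact length used by this paper is an assumption here.

```python
def session_to_grayscale(payload: bytes, side: int = 28):
    """Truncate or zero-pad a session's bytes to side*side values and
    reshape them into a side x side grayscale matrix (values 0-255)."""
    n = side * side
    buf = payload[:n].ljust(n, b"\x00")  # pad short sessions with zeros
    return [list(buf[r * side:(r + 1) * side]) for r in range(side)]

# Example: a session whose payload repeats TLS-record-like bytes.
img = session_to_grayscale(b"\x16\x03\x01" * 300)
```

Each row of `img` is one scanline of the grayscale image that the attention modules later weight.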
3.2. FlowSpectrum Theory
3.3. Frame Design
4. Experimental Results and Discussion
4.1. Experiment Setup
- (1) Dataset.
- (2) Analysis and comparison of scheme design.
- (3) Experimental tools.
- (4) Evaluation indicators.
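For reference, the four evaluation indicators used later (accuracy, precision, recall, F1-score) can be computed from confusion-matrix counts in a few lines. This is a generic sketch, not tied to this paper's code.

```python
def metrics(tp: int, fp: int, fn: int, tn: int):
    """Compute the four standard indicators from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Example: 90 true positives, 10 false positives/negatives each.
acc, prec, rec, f1 = metrics(tp=90, fp=10, fn=10, tn=90)
```

For multi-class traffic classification these are typically averaged over classes (macro or weighted), which is the usual convention for the ISCX-VPN2016 benchmarks.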
4.2. Parameter
- (1) Alleviating the dying-neuron problem: Training the model with the PReLU function ensures that neurons are still activated even when the input contains negative values, avoiding neuron death. With ReLU, negative inputs are set directly to zero, which can leave neurons permanently inactive during training and impair the model’s learning ability.
- (2) Increasing model expressiveness: PReLU introduces a learnable parameter that dynamically adjusts the shape of the activation function to the characteristics of the data, enhancing the model’s flexibility and expressiveness on complex data distributions.
- (3) Mitigating the vanishing-gradient problem: Under ReLU, gradients for negative inputs are zero, which can cause vanishing gradients, especially in deep networks. Under PReLU, gradients also propagate for negative inputs, helping alleviate the problem.
- (4) Improving robustness: PReLU is more robust to outliers. ReLU is highly sensitive to negative outliers, since zeroing them loses information; PReLU retains some information in the negative part, making the model more robust.
4.3. Analysis of Interpretability Characterization of Features
4.4. Comparison and Analysis of Classification Tests
5. Conclusions and Future Work
5.1. Conclusions
5.2. Future Work
- (1) Multidimensional characterization of FlowSpectrum.
- (2) The interpretability of FlowSpectrum.
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Module | Layer | Setting
---|---|---
Self-attention | -- | --
Single-channel attention | AvgPool | --
Single-channel attention | MaxPool | --
Spatial attention | AvgPool | --
Spatial attention | MaxPool | --
Spatial attention | Conv2D | kernel_size = 1, #kernels = 1
Spatial attention | PReLU | --
Autoencoder | Conv2D | kernel_size = 3, #kernels = 64
Autoencoder | Conv2D | kernel_size = 3, #kernels = 32
Autoencoder | PReLU | --
Autoencoder | MaxPool | pool_size = 2
Model | Accuracy | Precision | Recall | F1-Score |
---|---|---|---|---|
1D-CNN | 76.1 ± 0.014% | 81.4 ± 0.056% | 80.8 ± 0.006% | 79.4 ± 0.039% |
CNN+RNN | 89.6 ± 0.073% | 91.8 ± 0.037% | 90.9 ± 0.035% | 91.2 ± 0.063% |
Semi-AE | 81.4 ± 0.083% | 83.2 ± 0.067% | 81.4 ± 0.083% | 79.5 ± 0.005% |
AE | 97.3 ± 0.003% | 97.6 ± 0.076% | 97.3 ± 0.003% | 97.2 ± 0.082% |
SAAE | 99.9 ± 0.098% | 99.9 ± 0.098% | 99.9 ± 0.082% | 99.9 ± 0.097% |
Model | Accuracy | Precision | Recall | F1-Score |
---|---|---|---|---|
1D-CNN | 91.4 ± 0.060% | 85.8 ± 0.067% | 82.1 ± 0.078% | 83.3 ± 0.051% |
CNN+RNN | 98.5 ± 0.063% | 96.8 ± 0.052% | 96.9 ± 0.085% | 96.8 ± 0.034% |
Semi-AE | 91.2 ± 0.067% | 92.1 ± 0.080% | 91.2 ± 0.067% | 91.1 ± 0.030% |
AE | 98.5 ± 0.063% | 96.8 ± 0.052% | 96.9 ± 0.085% | 96.8 ± 0.034% |
SAAE | 99.7 ± 0.066% | 99.7 ± 0.066% | 99.7 ± 0.066% | 99.7 ± 0.066% |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Cui, J.; Bai, L.; Zhang, X.; Lin, Z.; Liu, Q. The Attention-Based Autoencoder for Network Traffic Classification with Interpretable Feature Representation. Symmetry 2024, 16, 589. https://doi.org/10.3390/sym16050589