TSFN: A Novel Malicious Traffic Classification Method Using BERT and LSTM
Abstract
1. Introduction
2. Related Work
2.1. Malicious Traffic Classification
2.2. Pre-Training Models
3. Methodology
3.1. Model Architecture
3.2. Datasets
3.3. Data Preprocessing
4. Experiment
4.1. Experiment Setting
4.2. Evaluation Metrics
4.3. Effect of Different Network Traffic Representations
4.4. Effect of Sequence Length and the Number of Layers (num_layers)
4.5. Comparison with Different Methods
4.5.1. The Benchmark Methods
4.5.2. Experimental Result
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
1. Zhang, Z.; Han, X.; Liu, Z.; Jiang, X.; Sun, M.; Liu, Q. ERNIE: Enhanced language representation with informative entities. arXiv 2019, arXiv:1905.07129.
2. Bader, O.; Lichy, A.; Hajaj, C.; Dubin, R.; Dvir, A. MalDIST: From Encrypted Traffic Classification to Malware Traffic Detection and Classification. In Proceedings of the 2022 IEEE 19th Annual Consumer Communications & Networking Conference (CCNC), Las Vegas, NV, USA, 8–11 January 2022; pp. 527–533.
3. Wang, W.; Zhu, M.; Wang, J.; Zeng, X.; Yang, Z. End-to-end encrypted traffic classification with one-dimensional convolution neural networks. In Proceedings of the 2017 IEEE International Conference on Intelligence and Security Informatics (ISI), Beijing, China, 22–24 July 2017; pp. 43–48.
4. Lin, X.; Xiong, G.; Gou, G.; Li, Z.; Shi, J.; Yu, J. ET-BERT: A Contextualized Datagram Representation with Pre-training Transformers for Encrypted Traffic Classification. In Proceedings of the ACM Web Conference 2022, Lyon, France, 25–29 April 2022; pp. 633–642.
5. Wang, W.; Zhu, M.; Zeng, X.; Ye, X.; Sheng, Y. Malware traffic classification using convolutional neural network for representation learning. In Proceedings of the 2017 IEEE International Conference on Information Networking (ICOIN), Da Nang, Vietnam, 11–13 January 2017; pp. 712–717.
6. Lin, P.C.; Lin, Y.D.; Lai, Y.C.; Lee, T.H. Using string matching for deep packet inspection. Computer 2008, 41, 23–28.
7. van Ede, T.; Bortolameotti, R.; Continella, A.; Ren, J.; Dubois, D.J.; Lindorfer, M.; Choffnes, D.; van Steen, M.; Peter, A. FlowPrint: Semi-supervised mobile-app fingerprinting on encrypted network traffic. In Proceedings of the Network and Distributed System Security Symposium (NDSS), San Diego, CA, USA, 23–26 February 2020; Volume 27.
8. Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv 2018, arXiv:1810.04805.
9. Shi, Z.; Luktarhan, N.; Song, Y.; Tian, G. BFCN: A Novel Classification Method of Encrypted Traffic Based on BERT and CNN. Electronics 2023, 12, 516.
10. Qi, Y.; Xu, L.; Yang, B.; Xue, Y.; Li, J. Packet classification algorithms: From theory to practice. In Proceedings of the IEEE INFOCOM 2009, Rio de Janeiro, Brazil, 19–25 April 2009; pp. 648–656.
11. Madhukar, A.; Williamson, C. A longitudinal study of P2P traffic classification. In Proceedings of the 14th IEEE International Symposium on Modeling, Analysis, and Simulation, Monterey, CA, USA, 11–14 September 2006; pp. 179–188.
12. Taylor, V.F.; Spolaor, R.; Conti, M.; Martinovic, I. Robust smartphone app identification via encrypted network traffic analysis. IEEE Trans. Inf. Forensics Secur. 2017, 13, 63–78.
13. Al-Naami, K.; Chandra, S.; Mustafa, A.; Khan, L.; Lin, Z.; Hamlen, K.; Thuraisingham, B. Adaptive encrypted traffic fingerprinting with bi-directional dependence. In Proceedings of the 32nd Annual Conference on Computer Security Applications, Los Angeles, CA, USA, 5–9 December 2016; pp. 177–188.
14. Sirinam, P.; Imani, M.; Juarez, M.; Wright, M. Deep fingerprinting: Undermining website fingerprinting defenses with deep learning. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, Toronto, ON, Canada, 15–19 October 2018; pp. 1928–1943.
15. Liu, C.; He, L.; Xiong, G.; Cao, Z.; Li, Z. FS-Net: A flow sequence network for encrypted traffic classification. In Proceedings of the IEEE INFOCOM 2019-IEEE Conference on Computer Communications, Rabat, Morocco, 12–19 April 2019; pp. 1171–1179.
16. Lotfollahi, M.; Jafari Siavoshani, M.; Shirali Hossein Zade, R.; Saberian, M. Deep Packet: A novel approach for encrypted traffic classification using deep learning. Soft Comput. 2020, 24, 1999–2012.
17. Lin, K.; Xu, X.; Gao, H. TSCRNN: A novel classification scheme of encrypted traffic based on flow spatiotemporal features for efficient management of IIoT. Comput. Netw. 2021, 190, 107974.
18. Sinha, J.; Manollas, M. Efficient deep CNN-BiLSTM model for network intrusion detection. In Proceedings of the 2020 3rd International Conference on Artificial Intelligence and Pattern Recognition, Online, 26–28 August 2020; pp. 223–231.
19. Khan, M.A. HCRNNIDS: Hybrid convolutional recurrent neural network-based network intrusion detection system. Processes 2021, 9, 834.
20. Shieh, C.S.; Nguyen, T.T.; Horng, M.F. Detection of Unknown DDoS Attack Using Convolutional Neural Networks Featuring Geometrical Metric. Mathematics 2023, 11, 2145.
21. Sengupta, S.; Ganguly, N.; De, P.; Chakraborty, S. Exploiting diversity in Android TLS implementations for mobile app traffic classification. In Proceedings of the World Wide Web Conference, San Francisco, CA, USA, 13–17 May 2019; pp. 1657–1668.
22. He, H.Y.; Yang, Z.G.; Chen, X.N. PERT: Payload encoding representation from transformer for encrypted traffic classification. In Proceedings of the 2020 IEEE ITU Kaleidoscope: Industry-Driven Digital Transformation (ITU K), Online, 7–11 December 2020; pp. 1–8.
23. Viji, D.; Revathy, S. A hybrid approach of Weighted Fine-Tuned BERT extraction with deep Siamese Bi-LSTM model for semantic text similarity identification. Multimed. Tools Appl. 2022, 81, 6131–6157.
24. Tavallaee, M.; Bagheri, E.; Lu, W.; Ghorbani, A.A. A detailed analysis of the KDD CUP 99 data set. In Proceedings of the 2009 IEEE Symposium on Computational Intelligence for Security and Defense Applications, Ottawa, ON, Canada, 8–10 July 2009; pp. 1–6.
25. Moustafa, N.; Slay, J. The evaluation of Network Anomaly Detection Systems: Statistical analysis of the UNSW-NB15 data set and the comparison with the KDD99 data set. Inf. Secur. J. Glob. Perspect. 2016, 25, 18–31.
26. Zhao, Z.; Chen, H.; Zhang, J.; Zhao, X.; Liu, T.; Lu, W.; Chen, X.; Deng, H.; Ju, Q.; Du, X. UER: An Open-Source Toolkit for Pre-training Models. arXiv 2019, arXiv:1909.05658.
27. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.
28. Liu, C.; Wang, W.; Wang, M.; Lv, F.; Konan, M. An efficient instance selection algorithm to reconstruct training set for support vector machine. Knowl.-Based Syst. 2017, 116, 58–73.
29. Panchenko, A.; Lanze, F.; Pennekamp, J.; Engel, T.; Zinnen, A.; Henze, M.; Wehrle, K. Website Fingerprinting at Internet Scale. In Proceedings of the NDSS, San Diego, CA, USA, 21–24 February 2016.
30. Hayes, J.; Danezis, G. k-fingerprinting: A robust scalable website fingerprinting technique. In Proceedings of the 25th USENIX Security Symposium (USENIX Security 16), Austin, TX, USA, 10–12 August 2016; pp. 1187–1203.
31. Shen, M.; Zhang, J.; Zhu, L.; Xu, K.; Du, X. Accurate decentralized application identification via encrypted traffic analysis using graph neural networks. IEEE Trans. Inf. Forensics Secur. 2021, 16, 2367–2380.
Traffic Type | Application |
---|---|
Malicious traffic | Htbot, Cridex, Neris, Nsis-ay, Shifu, Virut, Zeus, Tinba, Miuref, Geodo |
Normal traffic | Outlook, BitTorrent, FTP, Warcraft, MySQL, Skype, Facetime, SMB, Weibo, Gmail |
Label | Content |
---|---|
0 | 021a 1ac5 c502 0200 0000 0002 021a 1ac5 c501 0100 0000 0008 0800 0045 4500 0000 0091 9134 3419 1940 4000 0020 2006 0653 53af af01 0101 01be be9a 9a01 0102 0212 1202 02aa aaba ba01 01bb bbbe be57 57d1 d122 22c8 c853 53cd cd9c 9c80 8018 189e 9e60 60c3 c3e3 e300 0000 0001 0101 0108 080a 0a11 11f5 f594 94b9 b923 2325 2537 |
1 | 021a 1ac5 c502 0200 0000 0002 021a 1ac5 c501 0100 0000 0008 0800 0045 4500 0001 0112 1201 016e 6e40 4000 0020 2011 1198 98ac ac01 0101 015e 5e67 6701 0102 025f 5f57 5740 4013 1340 4013 1300 00fe fe1f 1f34 3490 9068 688d 8da2 a257 5732 32af af27 27e4 e49a 9aa8 a8f4 f444 4400 0000 0002 0228 28f5 f527 2773 7300 0000 0000 |
2 | 021a 1ac5 c502 0200 0000 0002 021a 1ac5 c501 0100 0000 0008 0800 0045 4500 0005 05b1 b10d 0ddf df40 4000 0020 2006 0617 175c 5c01 0101 015b 5b9f 9f01 0102 02d2 d26a 6ad7 d726 2601 01bb bbb7 b755 5520 2010 10d1 d186 8615 15fd fd80 8018 189e 9e60 6033 33d4 d400 0000 0001 0101 0108 080a 0a11 11f7 f71e 1e54 5423 2326 2638 |
3 | 021a 1ac5 c501 0100 0000 0002 021a 1ac5 c502 0200 0000 0008 0800 0045 4500 0000 005e 5e28 2858 5840 4000 0020 2006 06fe fe8c 8c01 0102 02b2 b259 5901 0101 017f 7f59 5900 0015 15b7 b719 19e7 e778 78b4 b400 00e1 e1bd bd05 05e2 e280 8018 1843 43e0 e033 337e 7e00 0000 0001 0101 0108 080a 0ad7 d7f9 f9b9 b95f 5fc6 c671 7188 |
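The Content column appears to encode each packet as overlapping two-byte tokens: every token joins one byte with the byte that follows it, so neighbouring tokens share a byte (021a, 1ac5, c502, ...). The samples start at the Ethernet header (EtherType 0x0800, then the IPv4 byte 0x45). A minimal sketch of this encoding follows; it is inferred from the rows above, not taken from the authors' code, and the names `bigram_tokens` and `max_tokens` are hypothetical. The truncation length stands in for the sequence length studied in Section 4.4, whose actual value is not assumed here.

```python
from typing import List, Optional


def bigram_tokens(packet: bytes, max_tokens: Optional[int] = None) -> List[str]:
    """Split raw packet bytes into overlapping two-byte hex tokens (stride of one byte).

    max_tokens is a hypothetical truncation length, not the value used in the paper.
    """
    tokens = [packet[i:i + 2].hex() for i in range(len(packet) - 1)]
    return tokens if max_tokens is None else tokens[:max_tokens]


# First 16 bytes of the label-0 sample: two Ethernet MAC addresses,
# EtherType 0x0800, and the start of the IPv4 header (0x45 0x00).
sample = bytes.fromhex("021ac5020000" "021ac5010000" "0800" "4500")
print(" ".join(bigram_tokens(sample)))
print(bigram_tokens(sample, max_tokens=3))
```

Because the stride is one byte, a packet of n bytes yields n - 1 tokens, so the token sequence is almost twice as long as a non-overlapping split would be; this is why the chosen sequence length directly limits how many header and payload bytes the model sees.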
Type | Method |
---|---|
Fingerprint construction approach | FlowPrint [7] |
Statistical feature approaches | AppScanner [12], CUMUL [29], BIND [13], k-fingerprinting (K-fp) [30] |
Deep learning approaches | Deep Fingerprinting (DF) [14], FS-Net [15], GraphDApp [31], TSCRNN [17], Deeppacket [16], Wang [5] |
Pre-training approaches | PERT [22], ET-BERT (flow) [4], ET-BERT (packet) [4], BFCN [9] |
Method | Accuracy (%) | Precision (%) | Recall (%) | F1-Score (%) |
---|---|---|---|---|
AppScanner [12] | 89.54 | 89.84 | 89.68 | 88.92 |
CUMUL [29] | 56.75 | 61.71 | 57.38 | 55.13 |
BIND [13] | 84.57 | 86.81 | 83.82 | 83.96 |
FlowPrint [7] | 81.46 | 65.34 | 70.02 | 65.73 |
DF [14] | 77.87 | 78.86 | 78.19 | 75.93 |
FS-Net [15] | 88.46 | 88.46 | 89.20 | 88.40 |
GraphDApp [31] | 87.89 | 82.26 | 82.60 | 82.34 |
TSCRNN [17] | N/A | 98.70 | 98.60 | 98.70 |
Deeppacket [16] | 96.40 | 96.50 | 96.31 | 96.41 |
Wang [5] | 99.17 | 99.20 | 99.23 | 99.21 |
PERT [22] | 99.09 | 99.11 | 99.10 | 99.11 |
ET-BERT (packet) [4] | 99.15 | 99.16 | 99.16 | 99.16 |
ET-BERT (flow) [4] | 99.29 | 99.30 | 99.30 | 99.30 |
BFCN [9] | 99.39 | 99.41 | 99.40 | 99.40 |
Proposed | 99.49 | 99.51 | 99.50 | 99.50 |
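The table above reports accuracy together with precision, recall and F1-score. The averaging scheme over the 20 classes is not stated in this excerpt, so the sketch below assumes macro averaging (an unweighted mean of per-class scores) computed from a confusion matrix. The helper name `classification_metrics` and the toy matrix are hypothetical and do not reproduce any result in the table.

```python
from typing import Dict, List


def classification_metrics(cm: List[List[int]]) -> Dict[str, float]:
    """Accuracy plus macro-averaged precision, recall and F1 from a confusion matrix.

    cm[i][j] is the number of samples of true class i predicted as class j.
    Macro averaging (unweighted mean over classes) is an assumption of this sketch.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[k][k] for k in range(n))
    precisions, recalls, f1s = [], [], []
    for k in range(n):
        tp = cm[k][k]
        predicted_k = sum(cm[i][k] for i in range(n))  # column sum = TP + FP
        actual_k = sum(cm[k])                          # row sum = TP + FN
        p = tp / predicted_k if predicted_k else 0.0
        r = tp / actual_k if actual_k else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f1)
    return {
        "accuracy": correct / total,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Toy two-class confusion matrix (illustrative numbers, not from the paper)
cm = [[8, 2],
      [1, 9]]
print({name: round(100 * value, 2) for name, value in classification_metrics(cm).items()})
```

Macro and support-weighted averages coincide when every class has the same number of test samples; otherwise they can differ, so it is worth checking which scheme an evaluation script uses before comparing against the table.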
Class | ET-BERT [4] Precision | ET-BERT [4] Recall | ET-BERT [4] F1-Score | Proposed Precision | Proposed Recall | Proposed F1-Score | Wang [5] Precision | Wang [5] Recall | Wang [5] F1-Score |
---|---|---|---|---|---|---|---|---|---|
BitTorrent | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
Cridex | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
Facetime | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
FTP | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
Geodo | 100 | 99.9 | 99.9 | 100 | 100 | 100 | 100 | 100 | 100 |
Gmail | 98.4 | 99.3 | 98.8 | 98 | 100 | 99.0 | 98 | 100 | 99 |
Htbot | 99.8 | 100 | 99.9 | 100 | 100 | 100 | 100 | 100 | 100 |
Miuref | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
MySQL | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
Neris | 96.3 | 92.9 | 94.6 | 97.9 | 94.0 | 95.9 | 94 | 94 | 94 |
Nsis-ay | 99.8 | 99.0 | 99.4 | 100 | 98.0 | 99.0 | 100 | 98 | 99 |
Outlook | 99.2 | 98.1 | 98.7 | 100 | 100 | 100 | 100 | 100 | 100 |
Shifu | 99.9 | 99.9 | 99.9 | 100 | 100 | 100 | 100 | 100 | 100 |
Skype | 99.8 | 100 | 99.9 | 100 | 100 | 100 | 100 | 100 | 100 |
SMB | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
Tinba | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
Virut | 90.7 | 95.6 | 93.1 | 94.3 | 100 | 97.1 | 94.1 | 96 | 95 |
Warcraft | 100 | 99.9 | 99.9 | 100 | 100 | 100 | 100 | 100 | 100 |
Weibo | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 99 |
Zeus | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
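Each F1-score in the per-class table should equal the harmonic mean of the precision P and recall R beside it, F1 = 2PR/(P + R). The short check below recomputes it for three rows of the proposed model; `f1_from_pr` is a hypothetical helper name. Because the table stores rounded values, recomputing from rounded precision and recall can differ from the printed F1 in the last digit for some rows (for example, ET-BERT on Outlook gives 98.6 against the printed 98.7).

```python
def f1_from_pr(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (class, precision, recall, reported F1) for the proposed model, copied from the table
rows = [("Neris", 97.9, 94.0, 95.9), ("Virut", 94.3, 100.0, 97.1), ("Gmail", 98.0, 100.0, 99.0)]
for name, p, r, reported in rows:
    print(f"{name}: computed F1 = {f1_from_pr(p, r):.1f}, reported = {reported}")
```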
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Shi, Z.; Luktarhan, N.; Song, Y.; Yin, H. TSFN: A Novel Malicious Traffic Classification Method Using BERT and LSTM. Entropy 2023, 25, 821. https://doi.org/10.3390/e25050821