Unsupervised Anomaly Detection and Explanation in Network Traffic with Transformers
Abstract
1. Introduction
1.1. Motivation
1.2. Related Work
1.3. Paper’s Contribution and Organization
- We outline a transformer-based autoencoder with feature- and self-attention mechanisms to reconstruct discrete and continuous information from network packet sequences;
- We detect malicious network packets in packet sequences in an unsupervised manner using transformer-based autoencoders;
- We propose a sequential attention perturbation method for explaining the detection of benign and malicious network packets;
- We evaluate the explanation results by comparison with benchmark methods and expert-based true explanations.
2. T-NAE: Transformer-Based Network Traffic Autoencoder
2.1. Preprocessing of Discrete and Continuous Network Packet Information
- We reduce the extracted packet structures to a combination of common network layers (e.g., Ethernet, IP, TCP, HTTP);
- We transform the IP addresses and the TCP ports into a 4-tuple (see Table 1).
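The categorical encoding of ports and addresses listed in Table 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `encode_port` and `encode_ip` are hypothetical, the port boundaries assume the IANA ranges (system ports 0 to 1023, registered ports 1024 to 49151, dynamic ports 49152 to 65535), and "previous" is assumed to refer to the immediately preceding packet in the sequence.

```python
from typing import Optional


def encode_port(port: Optional[int]) -> int:
    """Map a TCP port to one of the four categories from Table 1."""
    if port is None:
        return 0  # no port (e.g. packet without a TCP layer)
    if not 0 <= port <= 65535:
        raise ValueError(f"invalid TCP port: {port}")
    if port <= 1023:
        return 1  # system port (assumed IANA range)
    if port <= 49151:
        return 2  # registered port (assumed IANA range)
    return 3  # dynamic port


def encode_ip(addr: Optional[str],
              prev_src: Optional[str],
              prev_dst: Optional[str]) -> int:
    """Encode an IP address relative to the previous packet's addresses."""
    if addr is None:
        return 0  # no address (e.g. packet without an IP layer)
    if prev_src is not None and addr == prev_src:
        return 1  # equals previous source address
    if prev_dst is not None and addr == prev_dst:
        return 2  # equals previous destination address
    return 3  # new address
```

Under these assumed ranges, `encode_port(443)` returns 1 and `encode_port(8080)` returns 2. A relative encoding of this kind keeps the input vocabulary small and independent of the concrete addresses seen during training.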
2.2. Two-Stage Encoder and Decoder Structure
2.3. Model Training and Hyperparameter Optimization
3. Cyber-Attack Detection and Explanation
3.1. Histogram-Based Threshold Computation
3.2. Explanation via Perturbation of Attention Weights
- A positive manipulation factor (suppression) to decrease the influence of a specific attention weight;
- A negative manipulation factor (amplification) to increase the influence of a specific attention weight.
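The effect of the two kinds of manipulation factor can be illustrated with a small sketch. It assumes that the factor scales a single normalized attention weight by `(1 - factor)` and that each row is renormalized afterwards; the outline above does not state whether the factor is applied before or after the softmax, so this choice, the function name `perturb_attention`, and its parameters are all assumptions made for illustration.

```python
import numpy as np


def perturb_attention(weights: np.ndarray, position: int, factor: float) -> np.ndarray:
    """Scale the attention paid to one key position and renormalize each row.

    factor > 0 suppresses the position, factor < 0 amplifies it.
    Assumes `weights` holds non-negative attention weights whose last axis
    sums to 1 (i.e. values taken after the softmax).
    """
    if factor >= 1.0:
        raise ValueError("factor must be smaller than 1")
    perturbed = weights.astype(float).copy()
    perturbed[..., position] *= 1.0 - factor  # 0.99 keeps 1 %, -1.0 doubles
    return perturbed / perturbed.sum(axis=-1, keepdims=True)


# Hypothetical attention row over three packets
attention = np.array([[0.5, 0.3, 0.2]])
suppressed = perturb_attention(attention, position=0, factor=0.99)
amplified = perturb_attention(attention, position=0, factor=-1.0)
```

Comparing the reconstruction error of the model with the original and the perturbed weights would then indicate how strongly the selected packet influences the result.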
4. Results and Discussion
4.1. CIC-IDS-2017 Dataset and Explanation Evaluation
- Ten normal samples indicating web browsing and FTP file transfer activities;
- Eighteen attack samples indicating FTP and SSH brute-force attacks.
- For normal samples, we assign influence values of for packets containing TCP, HTTP, or FTP protocol layers and specific MAC addresses;
- For attack samples, we assign influence values of for packets containing TCP, SSH, or FTP protocol layers as well as the attacker’s MAC address.
4.2. Reconstruction Accuracies and Detection Results
4.3. Explanation of Benign and Malicious Network Packet Sequences
5. Summary and Outlook
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Appendix A
Appendix B
Parameter | Value |
---|---|
Number of bins | 100 |
Maximum cumulative sum | 0.9 |
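The two parameters above (number of bins and maximum cumulative sum) suggest a threshold derived from the cumulative histogram of reconstruction errors. The following sketch shows one plausible reading: the threshold is the upper edge of the first bin at which the normalized cumulative count reaches the maximum cumulative sum. The function name `histogram_threshold` and this exact rule are assumptions, not a description of the authors' code.

```python
import numpy as np


def histogram_threshold(errors: np.ndarray,
                        n_bins: int = 100,
                        max_cumsum: float = 0.9) -> float:
    """Return the upper edge of the first bin whose cumulative share
    of samples reaches `max_cumsum`."""
    counts, edges = np.histogram(errors, bins=n_bins)
    cumulative = np.cumsum(counts) / counts.sum()
    # First bin index with cumulative share >= max_cumsum
    idx = int(np.searchsorted(cumulative, max_cumsum, side="left"))
    idx = min(idx, n_bins - 1)  # guard against max_cumsum > 1
    return float(edges[idx + 1])


# Hypothetical reconstruction errors of normal training packets
rng = np.random.default_rng(0)
train_errors = rng.exponential(scale=1.0, size=10_000)
threshold = histogram_threshold(train_errors)
```

Packets whose reconstruction error exceeds `threshold` would be flagged as anomalous under this reading.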
Parameter | Value |
---|---|
Number of embedding dimensions | 50 |
Hidden unit linear layer (encoder) | [900, 300] |
Hidden unit linear layer (decoder, discrete outputs) | [900, 200] |
Hidden unit linear layer (decoder, continuous outputs) | [1000, 500] |
Number of transformer blocks | 1 |
Number of attention heads | 3 |
Size of attention head | 10 |
Hidden unit linear layer (transformer block) | 400 |
Number of latent dimensions | 160 |
Optimizer | Adam |
Learning rate | 0.0005 |
Batch size | 100 |
Loss weight (discrete outputs) | 1.0 |
Loss weight (continuous outputs) | 1.4 |
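The two loss weights at the end of the table above imply a weighted sum of a loss on the discrete outputs and a loss on the continuous outputs. A minimal sketch follows, assuming cross-entropy for the discrete part and mean squared error for the continuous part; the table gives only the weights, so both loss choices and all function names are assumptions.

```python
import numpy as np


def cross_entropy(logits: np.ndarray, targets: np.ndarray) -> float:
    """Mean cross-entropy for integer class targets (numerically stable)."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    return float(-log_probs[np.arange(len(targets)), targets].mean())


def mean_squared_error(pred: np.ndarray, target: np.ndarray) -> float:
    return float(np.mean((pred - target) ** 2))


def combined_loss(disc_logits, disc_targets, cont_pred, cont_target,
                  w_discrete: float = 1.0, w_continuous: float = 1.4) -> float:
    """Weighted sum of the discrete and continuous reconstruction losses."""
    return (w_discrete * cross_entropy(disc_logits, disc_targets)
            + w_continuous * mean_squared_error(cont_pred, cont_target))
```

A weight above 1 on the continuous part, as in the table, makes errors in the numeric features count more strongly during training than errors in the categorical features.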
Appendix C
No. | | | | | [%] |
---|---|---|---|---|---|
Suppression experiments | |||||
1 | 0.1 … 0.99 | 18 | 1 | 0.4 | 16.9 … 22.3 |
2 | 0.99 | 18 … 42 | 1 | 0.4 | 12.5 … 16.2 |
3 | 0.99 | 18 | 1 … 4 | 0.4 | 11.7 … 16.1 |
4 | 0.99 | 18 | 1 | 0.1 … 0.6 | 16.9 … 17.0 |
Amplification experiments | |||||
5 | −4.0 … −0.2 | 18 | 1 | 0.2 | 12.2 … 21.6 |
6 | −1.0 | 18 … 40 | 1 | 0.4 | 15.5 … 18.2 |
7 | −1.0 | 20 | 1 … 4 | 0.4 | 12.2 … 18.1 |
8 | −1.0 | 18 | 1 | 0.1 … 0.6 | 18.7 … 19.2 |
No. | | | | | [%] |
---|---|---|---|---|---|
Suppression experiments | |||||
1 | 0.05 … 0.995 | 28 | 1 | 0.4 | 0.0 … 11.7 |
2 | 0.99 | 18 … 40 | 1 | 0.4 | 8.4 … 13.3 |
3 | 0.99 | 20 | 1 … 4 | 0.4 | 7.6 … 10.9 |
4 | 0.99 | 28 | 1 | 0.1 … 0.6 | 11.1 … 12.5 |
Amplification experiments | |||||
5 | −4.0 … −0.2 | 18 | 2 | 0.3 | 0.0 … 8.8 |
6 | −1.0 | 18 … 40 | 2 | 0.4 | 7.6 … 7.9 |
7 | −1.0 | 20 | 1 … 4 | 0.4 | 5.8 … 7.6 |
8 | −1.0 | 18 | 2 | 0.1 … 0.6 | 7.7 … 7.9 |
No. | | | | | [%] |
---|---|---|---|---|---|
Suppression experiments | |||||
1 | 0.05 … 0.995 | 22 | 1 | 0.3 | 40.7 … 52.6 |
2 | 0.99 | 18 … 42 | 1 | 0.4 | 51.0 … 54.6 |
3 | 0.99 | 20 | 1 … 4 | 0.4 | 28.0 … 51.0 |
4 | 0.99 | 22 | 1 | 0.1 … 0.6 | 49.4 … 54.6 |
Amplification experiments | |||||
5 | −4.0 … −0.2 | 26 | 1 | 0.2 | 35.1 … 53.8 |
6 | −1.0 | 18 … 40 | 1 | 0.4 | 46.1 … 46.7 |
7 | −1.0 | 20 | 1 … 4 | 0.4 | 27.2 … 46.1 |
8 | −1.0 | 26 | 1 | 0.1 … 0.6 | 44.7 |
No. | |||||
---|---|---|---|---|---|
Suppression experiments | |||||
1 | 0.05 … 0.995 | 26 | 1 | 0.4 | 35.7 … 69.5 |
2 | 0.99 | 18 … 40 | 1 | 0.4 | 52.0 … 54.1 |
3 | 0.99 | 20 | 1 … 4 | 0.4 | 31.2 … 52.2 |
4 | 0.99 | 26 | 1 | 0.1 … 0.6 | 54.0 … 54.8 |
Amplification experiments | |||||
5 | −4.0 … −0.2 | 18 | 1 | 0.3 | 32.8 … 76.6 |
6 | −1.0 | 18 … 40 | 1 | 0.4 | 51.2 … 54.1 |
7 | −1.0 | 20 | 1 … 4 | 0.4 | 28.3 … 53.7 |
8 | −1.0 | 18 | 1 | 0.1 … 0.6 | 53.3 … 54.1 |
Input Feature | Encoding Type |
---|---|
Packet structure (trimmed) | [1…N] |
MAC source address, MAC destination address | [1…N] |
IP source address, IP destination address | 0: no address; 1: equals previous source address; 2: equals previous destination address; 3: new address |
TCP source port, TCP destination port | 0: no port; 1: system port; 2: registered port; 3: dynamic port |
TCP flag | [1…N] |
Input Feature (Unit) | Encoding Type |
---|---|
Packet time difference (ms); Packet length (bytes); IP checksum; IP length (bytes); TCP checksum; TCP payload size (bytes); TCP stream number; TCP time delay (ms); TCP window size | Numeric |
Dataset | Characteristics |
---|---|
Training | Normal packets: 750,000; Attack packets: 0; Attack ratio: 0% |
Test | Normal packets: 752,600; Attack packets: 97,400; Attack ratio: 13% |
Pool Size | F1 | BA | FPR |
---|---|---|---|
25 | 95.36 | 95.47 | 4.53 |
50 | 91.59 | 92.17 | 7.83 |
100 | 74.02 | 79.42 | 20.58 |
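For reference, the three detection metrics in the table above can be computed from a confusion matrix as sketched below. The sketch uses the standard definitions of F1 score, balanced accuracy (BA), and false positive rate (FPR), with attack packets as the positive class. Whether the reported values follow exactly these definitions is an assumption, and the function name `detection_metrics` is hypothetical.

```python
def _ratio(numerator: float, denominator: float) -> float:
    """Safe division that returns 0.0 for an empty denominator."""
    return numerator / denominator if denominator else 0.0


def detection_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Return F1, balanced accuracy, and false positive rate in percent."""
    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)          # true positive rate
    specificity = _ratio(tn, tn + fp)     # true negative rate
    f1 = _ratio(2 * precision * recall, precision + recall)
    return {
        "F1": 100.0 * f1,
        "BA": 100.0 * (recall + specificity) / 2.0,
        "FPR": 100.0 * _ratio(fp, fp + tn),
    }


# Hypothetical confusion matrix, not taken from the paper
metrics = detection_metrics(tp=90, fp=10, tn=90, fn=10)
```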
Samples | | | | | Fβ (Ours) [%] | Fβ (IG) [%] |
---|---|---|---|---|---|---|
Normal | 0.10 | 18 | 1 | 0.4 | 22.3 | 7.4 |
Attack | 0.99 | 22 | 1 | 0.3 | 54.6 | 18.6 |
Samples | | | | | Fβ (Ours) [%] | Fβ (IG) [%] |
---|---|---|---|---|---|---|
Normal | 0.8 | 28 | 1 | 0.4 | 24.3 | 6.0 |
Attack | 0.2 | 26 | 1 | 0.4 | 69.5 | 19.4 |
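The Fβ scores in the tables above compare computed explanations with expert-based true explanations. A minimal sketch of the general Fβ formula is given below, assuming that an explanation can be reduced to the set of packet indices marked as influential. This reduction, the function names, and the default `beta=1.0` are assumptions; the value of β used in the paper is not stated in this outline.

```python
def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """General F-beta score; beta > 1 weights recall more than precision."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta ** 2
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)


def explanation_f_beta(predicted: set, expected: set, beta: float = 1.0) -> float:
    """Score a set of packets marked as influential against an expert set."""
    hits = len(predicted & expected)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(expected) if expected else 0.0
    return f_beta(precision, recall, beta)


# Hypothetical example: four packets marked, two of them expected by the expert
score = explanation_f_beta({1, 2, 3, 4}, {2, 3})
```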
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Kummerow, A.; Abrha, E.; Eisenbach, M.; Rösch, D. Unsupervised Anomaly Detection and Explanation in Network Traffic with Transformers. Electronics 2024, 13, 4570. https://doi.org/10.3390/electronics13224570