LogEDL: Log Anomaly Detection via Evidential Deep Learning
Abstract
1. Introduction
The main contributions of this paper are as follows:
- We analyze the similarity between log anomaly detection and open-set recognition, describe how open-set recognition relates to and differs from evidential learning, and clarify why evidential learning is applicable to log anomaly detection.
- We propose LogEDL, an evidential learning network for log anomaly detection that identifies unknown logs and quantifies the uncertainty of unknown data, which helps identify anomalous data.
- We apply evidential learning to log anomaly detection on public datasets and demonstrate the effectiveness and generalization ability of the proposed method.
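The uncertainty quantification named in the second contribution can be illustrated with a minimal sketch. It assumes the standard subjective-logic formulation used in evidential deep learning, where non-negative evidence e_k for each of K classes gives Dirichlet parameters alpha_k = e_k + 1, belief masses b_k = e_k / S, and uncertainty u = K / S with S = sum_k alpha_k. The function name and the toy evidence values are hypothetical and are not taken from the LogEDL implementation.

```python
def dirichlet_opinion(evidence):
    """Turn non-negative per-class evidence into a subjective-logic opinion.

    Returns (belief, uncertainty, expected_prob). Assumes the standard
    EDL mapping alpha_k = e_k + 1; this is a sketch, not LogEDL's code.
    """
    if not evidence:
        raise ValueError("evidence must contain at least one class")
    if any(e < 0 for e in evidence):
        raise ValueError("evidence must be non-negative")
    num_classes = len(evidence)
    alpha = [e + 1.0 for e in evidence]            # Dirichlet parameters
    strength = sum(alpha)                          # S = sum_k alpha_k
    belief = [e / strength for e in evidence]      # b_k = e_k / S
    uncertainty = num_classes / strength           # u = K / S
    expected_prob = [a / strength for a in alpha]  # mean of the Dirichlet
    return belief, uncertainty, expected_prob


# Hypothetical evidence over three candidate log keys.
confident = dirichlet_opinion([9.0, 0.0, 0.0])  # strong evidence for key 0
unknown = dirichlet_opinion([0.0, 0.0, 0.0])    # no evidence at all
print(confident[1])  # 0.25
print(unknown[1])    # 1.0
```

Belief masses and uncertainty always sum to one, so an input with little supporting evidence receives an uncertainty close to 1, which is the kind of signal a detector can threshold.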
2. Related Work
2.1. Paradigm of Self-Training
2.2. Semi-Supervised Anomaly Detection
3. Preliminary
3.1. Background of Open-World Learning
3.1.1. Open-World Learning Algorithms
3.1.2. Open-World Learning Processes
3.2. Background of Evidential Deep Learning
4. Methodology
4.1. Masked Language Model
4.2. Uncertainty in Log Sequence Detection
4.3. Preprocessing
4.4. Problem Definition
4.5. LogEDL
4.5.1. Transformer Encoder
4.5.2. ENN Head
4.5.3. Uncertainty
4.5.4. Anomaly Detection
4.5.5. Evidential Uncertainty Loss
5. Experiment and Analysis
5.1. Experiment Setting
5.1.1. Dataset
5.1.2. Implementation Details
5.1.3. Evaluation Metrics
5.1.4. Baselines
5.2. Results and Analysis
5.2.1. Comparison Methods
5.2.2. Ablation Studies
6. Discussion
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
ML | Machine learning |
DL | Deep learning |
MLM | Masked language modeling |
NSP | Next sentence prediction |
EDL | Evidential deep learning |
VHM | Volume of hypersphere minimization |
OSR | Open-set recognition |
DNNs | Deep neural networks |
MLE | Maximum likelihood estimation |
DST | Dempster-Shafer theory |
SL | Subjective logic |
FC | Fully connected |
AC | Accurate and certain |
AU | Accurate and uncertain |
IC | Inaccurate and certain |
IU | Inaccurate and uncertain |
EMSE | Evidential mean squared error loss |
ENL | Evidential negative log likelihood loss |
ECE | Evidential cross-entropy loss |
LLNL | Lawrence Livermore National Laboratory |
PCA | Principal component analysis |
iForest | Isolation Forest |
OCSVM | One-class SVM |
Dataset | Log Messages | Normal | Anomaly |
---|---|---|---|
HDFS | 11,175,629 | 10,887,339 | 288,290 |
BGL | 4,747,963 | 4,399,503 | 348,460 |
Thunderbird | 20,000,000 | 1,241,438 | 758,562 |
Dataset | Log Sequences | Log Keys | Train (Normal) | Test (Normal) | Test (Anomaly) | Average Length |
---|---|---|---|---|---|---|
HDFS | 742,527 | 47 (18) | 167,466 | 558,223 | 16,838 | 19 |
BGL | 37,315 | 334 (175) | 13,718 | 20,579 | 3,018 | 562 |
Thunderbird | 122,540 | 1,165 (866) | 46,293 | 30,862 | 45,385 | 326 |
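As a quick consistency check on the table above, the training-normal, test-normal, and test-anomaly counts should partition the log sequences of each dataset. The sketch below copies the figures from the table; the dictionary layout and variable names are illustrative.

```python
# (log sequences, train normal, test normal, test anomaly), copied from the table.
splits = {
    "HDFS": (742_527, 167_466, 558_223, 16_838),
    "BGL": (37_315, 13_718, 20_579, 3_018),
    "Thunderbird": (122_540, 46_293, 30_862, 45_385),
}

for name, (total, train_normal, test_normal, test_anomaly) in splits.items():
    parts = train_normal + test_normal + test_anomaly
    # Share of anomalous sequences within the test split only.
    anomaly_share = test_anomaly / (test_normal + test_anomaly)
    print(f"{name}: parts sum to total = {parts == total}, "
          f"test anomaly share = {anomaly_share:.1%}")
```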
Method | HDFS Precision | HDFS Recall | HDFS F1 | BGL Precision | BGL Recall | BGL F1 | Thunderbird Precision | Thunderbird Recall | Thunderbird F1 |
---|---|---|---|---|---|---|---|---|---|
PCA | 5.89 | 100.00 | 11.12 | 9.07 | 98.23 | 16.61 | 37.35 | 100.00 | 54.39 |
iForest | 53.60 | 69.41 | 60.49 | 99.70 | 18.11 | 30.65 | 34.45 | 1.68 | 3.20 |
OCSVM | 2.54 | 100.00 | 4.95 | 1.06 | 12.24 | 1.96 | 18.89 | 39.11 | 25.48 |
LogCluster | 99.26 | 37.08 | 53.99 | 95.46 | 64.01 | 76.63 | 98.28 | 42.78 | 59.61 |
DeepLog | 88.44 | 69.49 | 77.34 | 89.74 | 82.78 | 86.12 | 87.34 | 99.61 | 93.08 |
LogAnomaly | 94.15 | 40.47 | 56.19 | 73.12 | 76.09 | 74.08 | 86.72 | 99.63 | 92.73 |
LogBERT | 87.02 | 78.10 | 82.32 | 89.40 | 92.32 | 90.83 | 96.75 | 96.52 | 96.64 |
LogBERT (EVT) | 29.24 | 69.01 | 41.07 | 28.11 | 73.05 | 40.60 | 30.78 | 74.24 | 43.51 |
LogEDL (ours) | 90.06 | 92.80 | 91.41 | 99.84 | 97.26 | 98.53 | 97.80 | 98.08 | 97.91 |
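The F1 columns are the harmonic mean of precision and recall. As a small check, the sketch below recomputes F1 for the LogEDL row on HDFS and BGL from the reported precision and recall; the helper name is illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same scale as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall (in percent) for LogEDL, taken from the table.
print(round(f1_score(90.06, 92.80), 2))  # HDFS -> 91.41
print(round(f1_score(99.84, 97.26), 2))  # BGL  -> 98.53
```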
Method | FLOPs | Parameters | Time (s) |
---|---|---|---|
PCA | - | - | |
iForest | - | - | |
OCSVM | - | - | |
LogCluster | - | - | |
DeepLog | 1.0 M | 0.1 M | |
LogAnomaly | 2.0 M | 0.1 M | |
LogBERT | 628.6 M | 2.1 M | |
LogEDL (ours) | 1076.8 M | 2.1 M | |
Method | HDFS Precision | HDFS Recall | HDFS F1 | BGL Precision | BGL Recall | BGL F1 | Thunderbird Precision | Thunderbird Recall | Thunderbird F1 |
---|---|---|---|---|---|---|---|---|---|
LogBERT | 87.02 | 78.10 | 82.32 | 89.40 | 92.32 | 90.83 | 96.75 | 96.52 | 96.64 |
LogEDL (w. ENL) | 89.14 | 84.59 | 86.81 | 97.43 | 92.54 | 94.92 | 96.70 | 96.96 | 96.83 |
LogEDL (w. EMSE) | 89.93 | 84.62 | 87.19 | 97.46 | 92.34 | 94.83 | 96.99 | 97.80 | 97.39 |
LogEDL (w. ECE) | 90.06 | 92.80 | 91.41 | 99.84 | 97.26 | 98.53 | 97.80 | 98.08 | 97.91 |
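The three LogEDL rows differ only in the evidential loss. For reference, the block below gives the standard forms of these losses from the evidential deep learning literature. It is an assumption that the ENL, ECE, and EMSE variants follow these forms, since the exact definitions used by LogEDL are not reproduced in this section. Here $y$ is a one-hot label over $K$ classes, $\alpha_k$ are the Dirichlet parameters, $S = \sum_k \alpha_k$ is the Dirichlet strength, $\hat{p}_k = \alpha_k / S$ is the expected probability, and $\psi$ is the digamma function.

```latex
\begin{align}
\mathcal{L}_{\mathrm{ENL}}  &= \sum_{k=1}^{K} y_k \bigl(\log S - \log \alpha_k\bigr), \\
\mathcal{L}_{\mathrm{ECE}}  &= \sum_{k=1}^{K} y_k \bigl(\psi(S) - \psi(\alpha_k)\bigr), \\
\mathcal{L}_{\mathrm{EMSE}} &= \sum_{k=1}^{K} \Bigl[(y_k - \hat{p}_k)^2 + \frac{\hat{p}_k\,(1 - \hat{p}_k)}{S + 1}\Bigr].
\end{align}
```

Under these forms, all three losses shrink as evidence concentrates on the labeled class; the EMSE form additionally carries an explicit variance term that decreases as the total strength $S$ grows.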
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Duan, Y.; Xue, K.; Sun, H.; Bao, H.; Wei, Y.; You, Z.; Zhang, Y.; Jiang, X.; Yang, S.; Chen, J.; Duan, B.; Ou, Z. LogEDL: Log Anomaly Detection via Evidential Deep Learning. Appl. Sci. 2024, 14(16), 7055. https://doi.org/10.3390/app14167055