Attention-Based Automated Feature Extraction for Malware Analysis
Abstract
1. Introduction
2. Related Work
3. Deep Learning Based Malware Detection
3.1. Feature Extraction for Malware Detection
3.1.1. Feature Extraction Using Static Analysis
3.1.2. Feature Extraction Using Dynamic Analysis
3.2. DL-Based Malware Detection Model
3.2.1. Recurrent Neural Network (RNN)
3.2.2. Long Short-Term Memory (LSTM)
3.2.3. Skip-Connected LSTM
4. Automated Feature Extraction Based on Attention
4.1. Attention Mechanism
4.2. Feature Extraction Based on Attention Mechanism
5. Experimental Results
5.1. Setup
5.2. Data
5.3. Performance Metric
5.4. Results
5.4.1. Accuracy
5.4.2. Training Time
5.4.3. Test Time
6. Discussion
7. Conclusions
Author Contributions
Funding
Conflicts of Interest
References
API Num | API System Call | Category |
---|---|---|
1 | CreateProcess | 1 |
2 | ExitProcess | 1 |
3 | TerminateProcess | 1 |
4 | OpenProcess | 1 |
5 | SearchProcess | 1 |
6 | ProcessDEPPolicy | 1 |
7 | InformationProcess | 1 |
8 | CreateLocalThread | 2 |
9 | ExitThread | 2 |
10 | TerminateThread | 2 |
… | … | … |
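The table above maps each API system call to a number and a category. A minimal sketch of how a recorded call trace could be turned into those numbers is shown below. It covers only the ten rows listed (the remaining rows are elided in the table). The names `API_TABLE`, `UNKNOWN`, `encode_calls`, and `encode_categories` are hypothetical, and reserving 0 for calls missing from the table is an assumption, not something the table states.

```python
# First ten rows of the API table: call name -> (API number, category).
API_TABLE = {
    "CreateProcess": (1, 1),
    "ExitProcess": (2, 1),
    "TerminateProcess": (3, 1),
    "OpenProcess": (4, 1),
    "SearchProcess": (5, 1),
    "ProcessDEPPolicy": (6, 1),
    "InformationProcess": (7, 1),
    "CreateLocalThread": (8, 2),
    "ExitThread": (9, 2),
    "TerminateThread": (10, 2),
}

# Assumption: 0 is reserved for calls that do not appear in the table.
UNKNOWN = (0, 0)


def encode_calls(calls):
    """Map a sequence of API call names to their API numbers."""
    return [API_TABLE.get(name, UNKNOWN)[0] for name in calls]


def encode_categories(calls):
    """Map a sequence of API call names to their category numbers."""
    return [API_TABLE.get(name, UNKNOWN)[1] for name in calls]


trace = ["OpenProcess", "CreateLocalThread", "ExitThread", "TerminateProcess"]
print(encode_calls(trace))       # [4, 8, 9, 3]
print(encode_categories(trace))  # [1, 2, 2, 1]
```

An integer sequence of this kind is the usual input form for sequence models such as the RNN and LSTM variants named in the outline.
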
Name | Specification |
---|---|
OS | Ubuntu 14.04 (64 bit) |
CPU | Intel i7-7700 (4.2 GHz) |
RAM | 32 GB |
GPU | GTX 1060 |
CUDA | 8.0 |
Malware Type | Number |
---|---|
Trojan | 646 |
Win32 | 281 |
Backdoor | 23 |
Worm | 25 |
Dropper | 6 |
Malware | 4 |
Virus | 4 |
Total | 1000 |
 | Actual Malware | Actual Benign File |
---|---|---|
Predicted Malware | TP | FP |
Predicted Benign File | FN | TN |
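Assuming the standard definition of accuracy over the four confusion-matrix cells, the metric can be written as:

$$
\text{Accuracy} = \frac{TP + TN}{TP + FP + FN + TN}
$$

For example, the CNN row with sequence length 200 in the results table gives $(115 + 129) / (115 + 28 + 23 + 129) = 244 / 295 \approx 82.71\%$, which matches the reported value.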
Model | Seq Length | TP | FP | FN | TN | Accuracy (%) |
---|---|---|---|---|---|---|
CNN | 200 | 115 | 28 | 23 | 129 | 82.71 |
CNN | 400 | 106 | 14 | 32 | 143 | 84.40 |
CNN | 600 | 107 | 9 | 31 | 148 | 86.44 |
CNN | 800 | 107 | 13 | 31 | 144 | 85.08 |
SC-LSTM | 200 | 121 | 11 | 17 | 146 | 90.50 |
SC-LSTM | 400 | 123 | 12 | 15 | 145 | 90.84 |
SC-LSTM | 600 | 123 | 7 | 15 | 150 | 92.54 |
SC-LSTM | 800 | 125 | 15 | 13 | 142 | 90.50 |
Attention | 200 | 130 | 5 | 8 | 152 | 95.59 |
Attention | 400 | 131 | 5 | 7 | 152 | 95.93 |
Attention | 600 | 132 | 5 | 6 | 152 | 96.27 |
Attention | 800 | 125 | 15 | 13 | 142 | 90.50 |
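The accuracy column can be recomputed from the four counts in each row. The sketch below assumes the standard definition, correct predictions divided by all predictions, and checks the rows for sequence length 600. The function name `accuracy` and the `rows` list are illustrative only.

```python
def accuracy(tp, fp, fn, tn):
    """Return accuracy in percent: correct predictions over all predictions."""
    return 100.0 * (tp + tn) / (tp + fp + fn + tn)


# (model, sequence length, TP, FP, FN, TN), copied from the results table.
rows = [
    ("CNN", 600, 107, 9, 31, 148),
    ("SC-LSTM", 600, 123, 7, 15, 150),
    ("Attention", 600, 132, 5, 6, 152),
]

for model, seq_len, tp, fp, fn, tn in rows:
    print(f"{model:<10} {seq_len:>4} {accuracy(tp, fp, fn, tn):.2f}")
# CNN         600 86.44
# SC-LSTM     600 92.54
# Attention   600 96.27
```

Each row sums to 295 test samples, so the gap between models at this sequence length comes from 29 additional correct predictions for the attention model over the CNN (284 versus 255).
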
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Choi, S.; Bae, J.; Lee, C.; Kim, Y.; Kim, J. Attention-Based Automated Feature Extraction for Malware Analysis. Sensors 2020, 20, 2893. https://doi.org/10.3390/s20102893