Ensemble Malware Classification System Using Deep Neural Networks
Abstract
1. Introduction
2. Materials and Methods
3. Dataset Distribution
4. Classification Architecture
4.1. CNN Architecture
4.2. RNN Architecture
4.3. Ensemble Architecture
5. Results
6. Discussion
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
References
- Symantec Internet Security Threat Report. 2019. Available online: https://www-west.symantec.com/content/dam/symantec/docs/reports/istr-24-2019-en.pdf (accessed on 25 April 2020).
- Tian, R.; Batten, L.M.; Versteeg, S.C. Function length as a tool for malware classification. In Proceedings of the 2008 3rd International Conference on Malicious and Unwanted Software (MALWARE), Fairfax, VA, USA, 7–8 October 2008; pp. 69–76.
- Karim, M.E.; Walenstein, A.; Lakhotia, A.; Parida, L. Malware phylogeny generation using permutations of code. J. Comput. Virol. 2005, 1, 13–23.
- Kolter, J.Z.; Maloof, M.A. Learning to detect malicious executables in the wild. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (ACM), Seattle, WA, USA, 22–25 August 2004; pp. 470–478.
- Park, Y.; Reeves, D.; Mulukutla, V.; Sundaravel, B. Fast malware classification by automated behavioral graph matching. In Proceedings of the Sixth Annual Workshop on Cyber Security and Information Intelligence Research (ACM), Oak Ridge, TN, USA, 21–23 April 2010; p. 45.
- Cesare, S.; Xiang, Y. Classification of malware using structured control flow. In Proceedings of the Eighth Australasian Symposium on Parallel and Distributed Computing, Darlinghurst, Australia, 18 January 2010; Australian Computer Society, Inc.: Darlinghurst, Australia, 2010; Volume 107, pp. 61–70.
- Narayanan, B.N.; Djaneye-Boundjou, O.; Kebede, T.M. Performance analysis of machine learning and pattern recognition algorithms for malware classification. In Proceedings of the 2016 IEEE National Aerospace and Electronics Conference (NAECON) and Ohio Innovation Summit (OIS), Dayton, OH, USA, 25–29 July 2016; pp. 338–342.
- Kebede, T.M.; Djaneye-Boundjou, O.; Narayanan, B.N.; Ralescu, A.; Kapp, D. Classification of malware programs using autoencoders based deep learning architecture and its application to the Microsoft Malware Classification Challenge (BIG 2015) dataset. In Proceedings of the 2017 IEEE National Aerospace and Electronics Conference (NAECON), Dayton, OH, USA, 27–30 June 2017; pp. 70–75.
- Messay-Kebede, T.; Narayanan, B.N.; Djaneye-Boundjou, O. Combination of Traditional and Deep Learning based Architectures to Overcome Class Imbalance and its Application to Malware Classification. In Proceedings of the NAECON 2018-IEEE National Aerospace and Electronics Conference, Dayton, OH, USA, 23–26 July 2018; pp. 73–77.
- Davuluru, V.S.P.; Narayanan, B.N.; Balster, E.J. Convolutional Neural Networks as Classification Tools and Feature Extractors for Distinguishing Malware Programs. In Proceedings of the 2019 IEEE National Aerospace and Electronics Conference (NAECON), Dayton, OH, USA, 15–19 July 2019; pp. 273–278.
- Nataraj, L.; Karthikeyan, S.; Jacob, G.; Manjunath, B.S. Malware images: Visualization and automatic classification. In Proceedings of the 8th International Symposium on Visualization for Cyber Security (ACM), Pittsburgh, PA, USA, 20 July 2011; pp. 1–7.
- Yoo, I. Visualizing windows executable viruses using self-organizing maps. In Proceedings of the 2004 ACM Workshop on Visualization and Data Mining for Computer Security, Washington, DC, USA, 29 October 2004; pp. 82–89.
- Quist, D.A.; Liebrock, L.M. Visualizing compiled executables for malware analysis. In Proceedings of the 2009 6th International Workshop on Visualization for Cyber Security (IEEE), Atlantic City, NJ, USA, 11 October 2009; pp. 27–32.
- Goodall, J.R.; Radwan, H.; Halseth, L. Visual analysis of code security. In Proceedings of the Seventh International Symposium on Visualization for Cyber Security (ACM), Ottawa, ON, Canada, 14 September 2010; pp. 46–51.
- Trinius, P.; Holz, T.; Göbel, J.; Freiling, F.C. Visual analysis of malware behavior using treemaps and thread graphs. In Proceedings of the 2009 6th International Workshop on Visualization for Cyber Security, Atlantic City, NJ, USA, 11 October 2009; pp. 33–38.
- Conti, G.; Bratus, S.; Shubina, A.; Sangster, B.; Ragsdale, R.; Supan, M.; Lichtenberg, A.; Perez-Alemany, R. Automated mapping of large binary objects using primitive fragment type classification. Digit. Investig. 2010, 7, S3–S12.
- Kaggle BIG 2015 Dataset. Available online: https://www.kaggle.com/c/malware-classification (accessed on 11 November 2019).
- Ronen, R.; Radu, M.; Feuerstein, C.; Yom-Tov, E.; Ahmadi, M. Microsoft malware classification challenge. arXiv 2018, arXiv:1802.10135. Available online: https://arxiv.org/abs/1802.10135 (accessed on 25 April 2020).
- Yan, J.; Qi, Y.; Rao, Q. Detecting malware with an ensemble method based on deep neural network. Secur. Commun. Netw. 2018.
- Garcia, F.C.C.; Muga, I.I.; Felix, P. Random forest for malware classification. arXiv 2016, arXiv:1609.07770. Available online: https://arxiv.org/abs/1609.07770 (accessed on 25 April 2020).
- Burnaev, E.; Smolyakov, D. One-class SVM with privileged information and its application to malware detection. In Proceedings of the 2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW), Barcelona, Spain, 12–15 December 2016; pp. 273–280.
- Drew, J.; Moore, T.; Hahsler, M. Polymorphic malware detection using sequence classification methods. In Proceedings of the 2016 IEEE Security and Privacy Workshops (SPW), San Jose, CA, USA, 22–26 May 2016; pp. 81–87.
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. In Proceedings of the Advances in Neural Information Processing Systems, Lake Tahoe, NV, USA, 3–8 December 2012; pp. 1097–1105.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556. Available online: https://arxiv.org/abs/1409.1556 (accessed on 25 April 2020).
- Sundermeyer, M.; Schlüter, R.; Ney, H. LSTM neural networks for language modeling. In Proceedings of the Thirteenth Annual Conference of the International Speech Communication Association, Portland, OR, USA, 9–13 September 2012.
Class ID | Malware Category |
---|---|
1 | Gatak |
2 | Kelihos_ver1 |
3 | Kelihos_ver3 |
4 | Lollipop |
5 | Obfuscator_ACY |
6 | Ramnit |
7 | Simda |
8 | Tracur |
9 | Vundo |
Class ID | Number of Training Samples | Number of Validation Samples | Number of Testing Samples |
---|---|---|---|
1 | 729 | 81 | 203 |
2 | 286 | 32 | 80 |
3 | 2119 | 235 | 588 |
4 | 1784 | 198 | 496 |
5 | 884 | 98 | 246 |
6 | 1110 | 123 | 308 |
7 | 31 | 3 | 8 |
8 | 541 | 60 | 150 |
9 | 342 | 38 | 95 |
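The split sizes in the table above can be tallied to check the overall proportions and the degree of class imbalance. The following is a minimal Python sketch: the counts are copied from the table, while the names `SPLITS` and `summarize` are illustrative and do not come from the paper.

```python
# Per-class sample counts copied from the table above:
# class ID -> (training, validation, testing)
SPLITS = {
    1: (729, 81, 203),
    2: (286, 32, 80),
    3: (2119, 235, 588),
    4: (1784, 198, 496),
    5: (884, 98, 246),
    6: (1110, 123, 308),
    7: (31, 3, 8),
    8: (541, 60, 150),
    9: (342, 38, 95),
}


def summarize(splits):
    """Return per-split totals, the grand total, and per-class totals."""
    train = sum(counts[0] for counts in splits.values())
    val = sum(counts[1] for counts in splits.values())
    test = sum(counts[2] for counts in splits.values())
    per_class = {cid: sum(counts) for cid, counts in splits.items()}
    return {
        "train": train,
        "val": val,
        "test": test,
        "total": train + val + test,
        "per_class": per_class,
    }


summary = summarize(SPLITS)
total = summary["total"]

# Share of the whole dataset that falls into each split.
for name in ("train", "val", "test"):
    share = 100 * summary[name] / total
    print(f"{name}: {summary[name]} samples ({share:.1f}%)")

# Ratio between the largest and smallest class, a rough imbalance measure.
largest = max(summary["per_class"].values())
smallest = min(summary["per_class"].values())
print(f"largest/smallest class ratio: {largest / smallest:.1f}")
```

With these counts the dataset totals 10,868 samples, split roughly 72/8/20 across training, validation, and testing. Class 3 is about 70 times larger than class 7, which has only 42 samples in total.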
N (n-gram length) | Accuracy (%) |
---|---|
2 | 91.2 |
3 | 95.6 |
4 | 96.2 |
5 | 95.7 |
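Assuming that N in the table above denotes the n-gram length applied to token sequences taken from the assembly files (this section does not spell that out), n-gram counting can be sketched as follows. The function name `ngram_counts` and the sample opcode list are hypothetical, chosen only for illustration; they are not the authors' feature pipeline.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count contiguous n-grams (as tuples) in a token sequence."""
    if n < 1:
        raise ValueError("n must be >= 1")
    # A sequence shorter than n yields an empty range and thus no n-grams.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


# Hypothetical opcode sequence; not taken from the dataset.
opcodes = ["push", "mov", "call", "push", "mov", "call", "ret"]

counts = ngram_counts(opcodes, 3)
for gram, freq in counts.most_common():
    print(" ".join(gram), freq)
```

In a setup of this kind, the counts for a fixed vocabulary of n-grams would typically be arranged into a fixed-length vector per file before being passed to a classifier such as an SVM. Larger N gives more specific patterns but a sparser feature space.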
Algorithm | Accuracy (%) |
---|---|
One-Class SVM [21] | 92 |
Random Forest [20] | 95.6 |
N-gram with SVM with only Assembly Files | 96.2 |
PCA and kNN [7] | 96.6 |
Proposed LSTM with only Assembly Files | 97.2 |
Strand Gene Sequence [22] | 98.59 |
Autoencoders [8] | 99.1 |
MalNet [19] | 99.3 |
Our CNN Feature Extraction and SVM [10] | 99.4 |
Proposed Ensemble Approach using Logistic Regression | 99.5 |
Proposed Ensemble Approach using SVM | 99.8 |
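Accuracies this close to 100% are easier to compare as error rates. The sketch below converts a few of the reported accuracies into error percentages and a relative error reduction; the helper names are illustrative. It also estimates misclassification counts under the assumption that the figures were measured on the 2174 test samples implied by the dataset distribution table. This section does not confirm that, and results quoted from other work may have been obtained on different splits.

```python
# Accuracies (%) copied from the comparison table above.
REPORTED = {
    "MalNet [19]": 99.3,
    "Our CNN Feature Extraction and SVM [10]": 99.4,
    "Proposed Ensemble Approach using Logistic Regression": 99.5,
    "Proposed Ensemble Approach using SVM": 99.8,
}

# Assumption: sum of the testing column in the dataset distribution table.
ASSUMED_TEST_SIZE = 2174


def error_rate(accuracy_pct):
    """Error rate in percent, rounded to suppress floating-point noise."""
    return round(100.0 - accuracy_pct, 4)


def relative_error_reduction(baseline_acc, new_acc):
    """Fraction of the baseline's errors removed by the new method."""
    base = error_rate(baseline_acc)
    return (base - error_rate(new_acc)) / base


for name, acc in REPORTED.items():
    err = error_rate(acc)
    approx_errors = round(ASSUMED_TEST_SIZE * err / 100)
    print(f"{name}: error {err:.1f}% (about {approx_errors} samples)")

reduction = relative_error_reduction(99.4, 99.8)
print(f"Ensemble (SVM) vs. CNN + SVM: {100 * reduction:.0f}% fewer errors")
```

Read this way, the step from 99.4% to 99.8% removes about two thirds of the remaining errors, although under the assumed test size it corresponds to only a handful of samples.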
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Narayanan, B.N.; Davuluru, V.S.P. Ensemble Malware Classification System Using Deep Neural Networks. Electronics 2020, 9, 721. https://doi.org/10.3390/electronics9050721