A New Framework for Visual Classification of Multi-Channel Malware Based on Transfer Learning
Abstract
1. Introduction
- We propose a new malware visualization method that transforms the information extracted from a malware binary file into three different grayscale images and fuses them into a single three-channel RGB image. This lets us analyze malware from several perspectives while retaining the relevant information in the binary file.
- We propose a new malware classification framework based on convolutional neural networks. We apply transfer learning by combining the ResNet34 convolutional neural network with a model pre-trained on the ImageNet dataset, and we compare its performance with other ResNet models. The method requires no reverse analysis, trains well in a short time, improves the accuracy of malware classification, and enhances the model's generalization ability.
- Instead of cropping or padding the sample images, this paper uses image interpolation algorithms to resize the grayscale images, and it compares and analyzes the performance of different interpolation algorithms. Because no image content is discarded, this effectively avoids the loss of feature information.
- The generated RGB images are processed with the contrast limited adaptive histogram equalization (CLAHE) data enhancement method, which can better deal with the problem of data imbalance. At the same time, it limits noise amplification, increases local contrast, and reveals more detail in smooth areas.
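The fusion of three grayscale views into one RGB image, described in the first contribution, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names (`bytes_to_gray`, `ascii_gray`, `entropy_gray`, `fuse_rgb`), the fixed image width, the 16-byte entropy window, and the way each channel is derived from the bytes are all hypothetical choices made only to show the idea.

```python
import math
from collections import Counter

def bytes_to_gray(data, width):
    """Lay raw bytes out row by row as a grayscale image (0-255), zero-padding the last row."""
    rows = []
    for start in range(0, len(data), width):
        row = list(data[start:start + width])
        row += [0] * (width - len(row))
        rows.append(row)
    return rows

def ascii_gray(data, width):
    """Hypothetical ASCII channel: keep printable ASCII bytes, zero out everything else."""
    masked = bytes(b if 32 <= b <= 126 else 0 for b in data)
    return bytes_to_gray(masked, width)

def shannon_entropy(chunk):
    """Shannon entropy of a byte chunk in bits per byte (between 0.0 and 8.0)."""
    if not chunk:
        return 0.0
    n = len(chunk)
    return -sum(c / n * math.log2(c / n) for c in Counter(chunk).values())

def entropy_gray(data, width, window=16):
    """Hypothetical entropy channel: every byte takes the scaled entropy of its window."""
    values = bytearray()
    for start in range(0, len(data), window):
        chunk = data[start:start + window]
        level = round(shannon_entropy(chunk) / 8.0 * 255)
        values.extend([level] * len(chunk))
    return bytes_to_gray(bytes(values), width)

def fuse_rgb(red, green, blue):
    """Stack three equally sized grayscale images into one image of (R, G, B) pixels."""
    if not (len(red) == len(green) == len(blue)):
        raise ValueError("all channels must have the same height")
    return [list(zip(r, g, b)) for r, g, b in zip(red, green, blue)]

# Toy input standing in for a malware binary.
sample = bytes(range(64))
rgb = fuse_rgb(bytes_to_gray(sample, 8), ascii_gray(sample, 8), entropy_gray(sample, 8))
print(len(rgb), len(rgb[0]), rgb[4][1])
```

Each pixel keeps one value per view, so a classifier that expects three-channel input can see all three representations at the same spatial position.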
2. Related Work
2.1. Methods Based on Static Features
2.2. Methods Based on Dynamic Features
2.3. Methods Based on Visualization Techniques’ Features
3. Methodology
3.1. Image Representation of Malware
3.1.1. ASCII Images
3.1.2. Hexadecimal Images
3.1.3. Entropy Images
3.2. Interpolation Algorithms
3.2.1. Nearest Neighbor Interpolation Algorithm
3.2.2. Bilinear Interpolation Algorithm
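As a rough illustration of bilinear resizing for a grayscale image, the sketch below interpolates each output pixel from its four nearest input pixels. It is a hedged example rather than the paper's procedure: the function name `resize_bilinear` and the corner-aligned coordinate mapping are assumptions, and image libraries may use a different coordinate convention.

```python
def resize_bilinear(img, out_h, out_w):
    """Resize a 2D grayscale image (list of rows) using bilinear interpolation."""
    in_h, in_w = len(img), len(img[0])
    out = []
    for y in range(out_h):
        # Map the output row to a fractional input row (corners aligned).
        fy = y * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(fy)
        y1 = min(y0 + 1, in_h - 1)
        wy = fy - y0
        row = []
        for x in range(out_w):
            fx = x * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(fx)
            x1 = min(x0 + 1, in_w - 1)
            wx = fx - x0
            # Interpolate horizontally on the two rows, then vertically between them.
            top = img[y0][x0] * (1 - wx) + img[y0][x1] * wx
            bottom = img[y1][x0] * (1 - wx) + img[y1][x1] * wx
            row.append(round(top * (1 - wy) + bottom * wy))
        out.append(row)
    return out

small = [[0, 100],
         [100, 200]]
print(resize_bilinear(small, 3, 3))
```

Unlike cropping or padding, every output pixel here is computed from the input content, which is the property the contribution list emphasizes.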
3.2.3. Bicubic Interpolation Algorithm
3.3. Contrast Limited Adaptive Histogram Equalization
3.4. Transfer Learning
3.5. Convolutional Neural Network Classification
4. Experimental Evaluation
4.1. Dataset Statistics and Preprocessing
4.2. Evaluation Indicators
4.3. Experimental Results
4.3.1. Comparison of Malware Classification Performance with Different Dataset Division Ratios
4.3.2. Comparison of Interpolation Algorithms for Classification Performance of Malware Samples
4.3.3. Comparison of Transfer Learning for Classification Performance of Malware Samples
4.3.4. Comparison of Image Enhancement for Malware Sample Classification Performance
4.3.5. Comparison of Grayscale and Multi-Channel Images for Classification Performance of Malware Samples
4.3.6. Comparison of Different Classification Models for Classification Performance of Malware Samples
4.4. Comparison with Other Methods
5. Conclusions and Prospects
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
1. Shabtai, A.; Moskovitch, R.; Elovici, Y.; Glezer, C. Detection of malicious code by applying machine learning classifiers on static features: A state-of-the-art survey. Inf. Secur. Tech. Rep. 2009, 14, 16–29.
2. David, B.; Filiol, E.; Gallienne, K. Structural analysis of binary executable headers for malware detection optimization. J. Comput. Virol. Hacking Tech. 2017, 13, 87–93.
3. Yuxin, D.; Siyi, Z. Malware detection based on deep learning algorithm. Neural Comput. Appl. 2019, 31, 461–472.
4. Liu, Y.S.; Lai, Y.K.; Wang, Z.H.; Yan, H.B. A new learning approach to malware classification using discriminative feature extraction. IEEE Access 2019, 7, 13015–13023.
5. Darabian, H.; Dehghantanha, A.; Hashemi, S.; Homayoun, S.; Choo, K.K.R. An opcode-based technique for polymorphic Internet of Things malware detection. Concurr. Comput. Pract. Exp. 2020, 32, e5173.
6. San, C.C.; Thwin, M.M.S.; Htun, N.L. Malicious software family classification using machine learning multi-class classifiers. In Computational Science and Technology; Springer: Berlin/Heidelberg, Germany, 2019; pp. 423–433.
7. Xiao, F.; Lin, Z.; Sun, Y.; Ma, Y. Malware detection based on deep learning of behavior graphs. Math. Probl. Eng. 2019, 2019, 8195395.
8. Ficco, M. Comparing API call sequence algorithms for malware detection. In Proceedings of the Workshops of the International Conference on Advanced Information Networking and Applications, Caserta, Italy, 15–17 April 2020; Springer: Berlin/Heidelberg, Germany, 2020; pp. 847–856.
9. Xu, Z.; Fang, X.; Yang, G. Malbert: A novel pre-training method for malware detection. Comput. Secur. 2021, 111, 102458.
10. Jian, Y.; Kuang, H.; Ren, C.; Ma, Z.; Wang, H. A novel framework for image-based malware detection with a deep neural network. Comput. Secur. 2021, 109, 102400.
11. Tekerek, A.; Yapici, M.M. A novel malware classification and augmentation model based on convolutional neural network. Comput. Secur. 2022, 112, 102515.
12. Kancherla, K.; Mukkamala, S. Image visualization based malware detection. In Proceedings of the 2013 IEEE Symposium on Computational Intelligence in Cyber Security (CICS), Singapore, 16–19 April 2013; pp. 40–44.
13. Kancherla, K.; Donahue, J.; Mukkamala, S. Packer identification using Byte plot and Markov plot. J. Comput. Virol. Hacking Tech. 2016, 12, 101–111.
14. Rezende, E.; Ruppert, G.; Carvalho, T.; Theophilo, A.; Ramos, F.; Geus, P.d. Malicious software classification using VGG16 deep neural network’s bottleneck features. In Information Technology-New Generations; Springer: Berlin/Heidelberg, Germany, 2018; pp. 51–59.
15. Zhao, Y.; Xu, C.; Bo, B.; Feng, Y. Maldeep: A deep learning classification framework against malware variants based on texture visualization. Secur. Commun. Netw. 2019, 2019, 4895984.
16. Ren, Z.; Chen, G.; Lu, W. Malware visualization methods based on deep convolution neural networks. Multimed. Tools Appl. 2020, 79, 10975–10993.
17. Khan, R.U.; Zhang, X.; Kumar, R. Analysis of ResNet and GoogleNet models for malware detection. J. Comput. Virol. Hacking Tech. 2019, 15, 29–37.
18. Qiao, Y.; Jiang, Q.; Jiang, Z.; Gu, L. A multi-channel visualization method for malware classification based on deep learning. In Proceedings of the 2019 18th IEEE International Conference on Trust, Security and Privacy in Computing and Communications/13th IEEE International Conference on Big Data Science And Engineering (TrustCom/BigDataSE), Rotorua, New Zealand, 5–8 August 2019; pp. 757–762.
19. Jang, S.; Li, S.; Sung, Y. Fasttext-based local feature visualization algorithm for merged image-based malware classification framework for cyber security and cyber defense. Mathematics 2020, 8, 460.
20. Narayanan, B.N.; Davuluru, V.S.P. Ensemble malware classification system using deep neural networks. Electronics 2020, 9, 721.
21. Yuan, B.; Wang, J.; Liu, D.; Guo, W.; Wu, P.; Bao, X. Byte-level malware classification based on markov images and deep learning. Comput. Secur. 2020, 92, 101740.
22. Pinhero, A.; Anupama, M.; Vinod, P.; Visaggio, C.A.; Aneesh, N.; Abhijith, S.; AnanthaKrishnan, S. Malware detection employed by visualization and deep neural network. Comput. Secur. 2021, 105, 102247.
23. Yadav, P.; Menon, N.; Ravi, V.; Vishvanathan, S.; Pham, T.D. EfficientNet convolutional neural networks-based Android malware detection. Comput. Secur. 2022, 115, 102622.
24. Ding, Y.; Dai, W.; Yan, S.; Zhang, Y. Control flow-based opcode behavior analysis for malware detection. Comput. Secur. 2014, 44, 65–74.
25. Shalaginov, A.; Banin, S.; Dehghantanha, A.; Franke, K. Machine Learning Aided Static Malware Analysis: A Survey and Tutorial; Cyber Threat Intelligence; Springer: Cham, Switzerland, 2018; pp. 7–45.
26. Gibert, D.; Mateu, C.; Planes, J. HYDRA: A multimodal deep learning framework for malware classification. Comput. Secur. 2020, 95, 101873.
27. Wu, X.W.; Wang, Y.; Fang, Y.; Jia, P. Embedding vector generation based on function call graph for effective malware detection and classification. Neural Comput. Appl. 2022, 34, 8643–8656.
28. Kakisim, A.G.; Gulmez, S.; Sogukpinar, I. Sequential opcode embedding-based malware detection method. Comput. Electr. Eng. 2022, 98, 107703.
29. Bonfante, G.; Kaczmarek, M.; Marion, J.Y. Architecture of a morphological malware detector. J. Comput. Virol. 2009, 5, 263–270.
30. Christodorescu, M.; Jha, S.; Seshia, S.A.; Song, D.; Bryant, R.E. Semantics-aware malware detection. In Proceedings of the 2005 IEEE Symposium on Security and Privacy (S&P’05), Oakland, CA, USA, 8–11 May 2005; pp. 32–46.
31. Bruschi, D.; Martignoni, L.; Monga, M. Detecting self-mutating malware using control-flow graph matching. In Proceedings of the International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment, Berlin, Germany, 13–14 July 2006; Springer: Berlin/Heidelberg, Germany, 2006; pp. 129–143.
32. Lin, C.H.; Pao, H.K.; Liao, J.W. Efficient dynamic malware analysis using virtual time control mechanics. Comput. Secur. 2018, 73, 359–373.
33. Sun, Y.; Bashir, A.K.; Tariq, U.; Xiao, F. Effective malware detection scheme based on classified behavior graph in IIoT. Ad Hoc Netw. 2021, 120, 102558.
34. Amer, E.; Zelinka, I.; El-Sappagh, S. A multi-perspective malware detection approach through behavioral fusion of api call sequence. Comput. Secur. 2021, 110, 102449.
35. Li, C.; Cheng, Z.; Zhu, H.; Wang, L.; Lv, Q.; Wang, Y.; Li, N.; Sun, D. DMalNet: Dynamic malware analysis based on API feature engineering and graph learning. Comput. Secur. 2022, 122, 102872.
36. Nataraj, L.; Jacob, G.; Manjunath, B. Detecting Packed Executables Based on Raw Binary Data; Technical Report; University of California: Santa Barbara, CA, USA, 2010.
37. Liu, X.; Lin, Y.; Li, H.; Zhang, J. A novel method for malware detection on ML-based visualization technique. Comput. Secur. 2020, 89, 101682.
38. Ni, S.; Qian, Q.; Zhang, R. Malware identification using visualization images and deep learning. Comput. Secur. 2018, 77, 871–885.
39. Zhao, Z.; Zhao, D.; Li, S.; Yang, S. Malware classification based on visualization and feature fusion. In Proceedings of the 2021 IEEE Sixth International Conference on Data Science in Cyberspace (DSC), Shenzhen, China, 9–11 October 2021; pp. 53–60.
40. Conti, M.; Khandhar, S.; Vinod, P. A few-shot malware classification approach for unknown family recognition using malware feature visualization. Comput. Secur. 2022, 122, 102887.
41. Vasan, D.; Alazab, M.; Wassan, S.; Naeem, H.; Safaei, B.; Zheng, Q. IMCFN: Image-based malware classification using fine-tuned convolutional neural network architecture. Comput. Netw. 2020, 171, 107138.
42. Chaganti, R.; Ravi, V.; Pham, T.D. Image-based malware representation approach with EfficientNet convolutional neural networks for effective malware classification. J. Inf. Secur. Appl. 2022, 69, 103306.
43. Fu, J.; Xue, J.; Wang, Y.; Liu, Z.; Shan, C. Malware visualization for fine-grained classification. IEEE Access 2018, 6, 14510–14523.
44. Gibert, D.; Mateu, C.; Planes, J. An end-to-end deep learning architecture for classification of malware’s binary content. In Artificial Neural Networks and Machine Learning–ICANN 2018: 27th International Conference on Artificial Neural Networks, Rhodes, Greece, 4–7 October 2018, Proceedings, Part III 27; Springer: Berlin/Heidelberg, Germany, 2018; pp. 383–391.
45. Alaeiyan, M.; Dehghantanha, A.; Dargahi, T.; Conti, M.; Parsa, S. A multilabel fuzzy relevance clustering system for malware attack attribution in the edge layer of cyber-physical networks. ACM Trans. Cyber-Phys. Syst. 2020, 4, 1–22.
46. Zhu, X.; Huang, J.; Wang, B.; Qi, C. Malware homology determination using visualized images and feature fusion. PeerJ Comput. Sci. 2021, 7, e494.
47. Kumar, S.; Janet, B. DTMIC: Deep transfer learning for malware image classification. J. Inf. Secur. Appl. 2022, 64, 103063.
File Size | Image Pixel | File Size | Image Pixel |
---|---|---|---|
<10 KB | 32 | 100 KB∼200 KB | 384 |
10 KB∼30 KB | 64 | 200 KB∼500 KB | 612 |
30 KB∼60 KB | 128 | 500 KB∼1000 KB | 864 |
60 KB∼100 KB | 256 | >1000 KB | 1024 |
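The file-size table above can be read as a simple lookup. The sketch below is one way to encode it, offered as an assumption-laden illustration: the function name `image_pixels_for` is hypothetical, 1 KB is taken as 1024 bytes, and a size that falls exactly on a boundary is assigned to the larger bucket, since the table does not say which side the boundaries belong to.

```python
import bisect

# Upper bounds (in KB) of the first seven buckets, in table order.
BOUNDS_KB = [10, 30, 60, 100, 200, 500, 1000]
# "Image Pixel" value for each bucket; the last entry covers files above 1000 KB.
PIXELS = [32, 64, 128, 256, 384, 612, 864, 1024]

def image_pixels_for(file_size_bytes):
    """Return the table's image pixel value for a file of the given size."""
    size_kb = file_size_bytes / 1024  # assumption: 1 KB = 1024 bytes
    # bisect_right sends a size equal to a boundary into the next (larger) bucket.
    return PIXELS[bisect.bisect_right(BOUNDS_KB, size_kb)]

for kb in (5, 10, 150, 300, 2000):
    print(kb, "KB ->", image_pixels_for(kb * 1024))
```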
Class ID | Family | Samples | Percentage |
---|---|---|---|
1 | Ramnit | 1541 | 0.1418 |
2 | Lollipop | 2478 | 0.2280 |
3 | Kelihos_Ver3 | 2942 | 0.2707 |
4 | Vundo | 475 | 0.0437 |
5 | Simda | 42 | 0.0039 |
6 | Tracur | 751 | 0.0691 |
7 | Kelihos_Ver1 | 398 | 0.0366 |
8 | Obfuscator.ACY | 1228 | 0.1130 |
9 | Gatak | 1013 | 0.0932 |
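The Percentage column in the table above appears to be each family's share of the total sample count. The short check below recomputes those shares from the Samples column; the variable names are illustrative, and the largest-to-smallest ratio is added here only as a quick way to see how imbalanced the classes are.

```python
# Sample counts copied from the table above.
samples = {
    "Ramnit": 1541,
    "Lollipop": 2478,
    "Kelihos_Ver3": 2942,
    "Vundo": 475,
    "Simda": 42,
    "Tracur": 751,
    "Kelihos_Ver1": 398,
    "Obfuscator.ACY": 1228,
    "Gatak": 1013,
}

total = sum(samples.values())
# Share of each family, rounded to four decimals as in the table.
proportions = {name: round(count / total, 4) for name, count in samples.items()}
# Ratio between the largest and smallest family.
imbalance_ratio = max(samples.values()) / min(samples.values())

print(total)
for name, share in proportions.items():
    print(f"{name:15s} {share:.4f}")
print(f"largest/smallest = {imbalance_ratio:.1f}")
```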
Proportion | Accuracy | Precision | Recall | F-Score |
---|---|---|---|---|
9:1 | 0.9999 | 0.9934 | 0.9936 | 0.9935 |
7.5:2.5 | 0.9959 | 0.9917 | 0.9911 | 0.9914 |
6:4 | 0.9921 | 0.9908 | 0.9928 | 0.9918 |
5:5 | 0.9908 | 0.9905 | 0.9897 | 0.9901 |
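The F-Score column above is consistent with the harmonic mean of precision and recall (the F1 score). The snippet below recomputes it from the Precision and Recall columns; treating the reported F-Score as F1 is an assumption, since the table itself does not give the formula.

```python
def f_score(precision, recall):
    """F1 score: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (precision, recall, reported F-Score) per dataset division ratio, from the table.
rows = {
    "9:1": (0.9934, 0.9936, 0.9935),
    "7.5:2.5": (0.9917, 0.9911, 0.9914),
    "6:4": (0.9908, 0.9928, 0.9918),
    "5:5": (0.9905, 0.9897, 0.9901),
}

recomputed = {ratio: f_score(p, r) for ratio, (p, r, _) in rows.items()}
for ratio, (p, r, reported) in rows.items():
    print(f"{ratio:8s} reported={reported:.4f} recomputed={recomputed[ratio]:.4f}")
```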
Method | Accuracy | Micro Avg | Macro Avg |
---|---|---|---|
Proposed | 0.9999 | 0.9935 | 0.9935 |
Without enhancement | 0.9989 | 0.9890 | 0.9898 |
Model | Accuracy | Average Type | Precision | Recall | F-Score |
---|---|---|---|---|---|
Proposed | 0.9999 | micro | 0.9933 | 0.9937 | 0.9935 |
Proposed | 0.9999 | macro | 0.9935 | 0.9935 | 0.9935 |
ResNet18 | 0.9897 | micro | 0.9837 | 0.9768 | 0.9801 |
ResNet18 | 0.9897 | macro | 0.9824 | 0.9824 | 0.9824 |
ResNet34 | 0.9912 | micro | 0.9653 | 0.9719 | 0.9685 |
ResNet34 | 0.9912 | macro | 0.9738 | 0.9821 | 0.9779 |
ResNet50 | 0.9874 | micro | 0.9547 | 0.9671 | 0.9608 |
ResNet50 | 0.9874 | macro | 0.9612 | 0.9612 | 0.9612 |
ResNet101 | 0.9741 | micro | 0.9555 | 0.9602 | 0.9578 |
ResNet101 | 0.9741 | macro | 0.9531 | 0.9508 | 0.9519 |
ResNet50_32x4d | 0.9841 | micro | 0.9607 | 0.9587 | 0.9596 |
ResNet50_32x4d | 0.9841 | macro | 0.9528 | 0.9536 | 0.9531 |
ResNet101_32x8d | 0.9695 | micro | 0.9571 | 0.9529 | 0.9545 |
ResNet101_32x8d | 0.9695 | macro | 0.9526 | 0.9571 | 0.9548 |
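The micro and macro rows above refer to two common ways of averaging per-class scores. The sketch below shows the textbook definitions on a toy confusion matrix; the source does not state how its own averages were computed, so the function name `averaged_scores` and the toy counts are illustrative assumptions only.

```python
def averaged_scores(confusion):
    """Micro- and macro-averaged precision and recall.

    confusion[i][j] is the number of samples of true class i predicted as class j.
    """
    k = len(confusion)
    tp = [confusion[i][i] for i in range(k)]
    fp = [sum(confusion[r][i] for r in range(k)) - tp[i] for i in range(k)]
    fn = [sum(confusion[i]) - tp[i] for i in range(k)]

    def safe_div(a, b):
        return a / b if b else 0.0

    # Macro: compute the score per class, then take the unweighted mean.
    macro_p = sum(safe_div(tp[i], tp[i] + fp[i]) for i in range(k)) / k
    macro_r = sum(safe_div(tp[i], tp[i] + fn[i]) for i in range(k)) / k
    # Micro: pool the counts over all classes, then compute one score.
    micro_p = safe_div(sum(tp), sum(tp) + sum(fp))
    micro_r = safe_div(sum(tp), sum(tp) + sum(fn))
    return {"micro": (micro_p, micro_r), "macro": (macro_p, macro_r)}

# Toy two-class example with one large and one small class.
toy = [[90, 10],
       [5, 5]]
scores = averaged_scores(toy)
print(scores)
```

Macro averaging gives the small class the same weight as the large one, so it tends to drop more sharply when a rare family such as Simda is misclassified.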
Reference | Method | Accuracy | F-Score |
---|---|---|---|
Gibert et al. [44] | Denoising autoencoders + dilated residual networks | 0.9894 | 0.9813 |
Alaeiyan et al. [45] | Multi-label fuzzy clustering | 0.9756 | 0.8921 |
Zhu et al. [46] | Opcode + bytecode | 0.9905 | 0.9852 |
Kumar et al. [47] | CNN + transfer learning + early stopping | 0.9319 | N/A |
This paper | Proposed method | 0.9999 | 0.9935 |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Zhao, Z.; Yang, S.; Zhao, D. A New Framework for Visual Classification of Multi-Channel Malware Based on Transfer Learning. Appl. Sci. 2023, 13, 2484. https://doi.org/10.3390/app13042484