Deep Learning Model with Sequential Features for Malware Classification
Abstract
1. Introduction
- A malware detection and classification method (TCN-BiGRU) that fuses the temporal convolutional network and the bidirectional gated recurrent unit was proposed to improve the overall performance of the malware detection and classification model.
- Opcode and bytecode sequences were fused to obtain their occurrence frequencies, reduce interference from shelling and obfuscation techniques, and improve the accuracy rate.
- The feature extraction capability of temporal convolutional networks (TCN) for temporal data was introduced to fully learn the dependency relationship among data.
- The output of the maximum pooling layer and the output of the average pooling layer were fused for relatively comprehensive extraction of data features.
- The nonlinear fitting ability of a bidirectional gated recurrent unit (BiGRU) was used for further feature extraction, learning the dependency between preceding and following information in the opcode sequence and extracting opcode features along the time series to improve the model's classification and detection performance.
2. Related Technology
2.1. N-Gram Method
2.2. Temporal Convolutional Network (TCN)
- (a) Causal convolution: The output at any moment t depends only on the inputs at or before moment t in the previous layer [18]. Whereas a traditional CNN can see future information, causal convolution sees only past information; this strict temporal constraint makes it a one-way structure. When the convolution kernel size is 4, a single causal convolutional structure is shown in the left panel of Figure 1, and the overall structure in the right panel. A kernel size of 4 means that four points are sampled from the previous layer as input to the next layer.
- (b) Dilated convolution: As the number of dilated convolution layers grows, the dilation coefficient increases exponentially, so each layer's receptive field widens and fewer convolution layers are needed, reducing computation and simplifying the network structure. Traditional neural networks must linearly stack many convolution layers to model long time series; TCN avoids this by enlarging the receptive field of each layer with dilated convolution [19]. Figure 2 shows a convolution kernel of size 4 with a dilation coefficient of 1: when the dilation coefficient of the input layer is 1, samples are taken from the previous layer at an interval of 1 and input to the next layer.
- (c) Residual block: This is another important structure in the TCN network. The residual block, shown in Figure 3, contains two layers of dilated causal convolution and a nonlinear mapping, with an identity-mapping connection across layers that lets the network transfer information over the skip connection. The skip connection not only speeds up the response and convergence of a deep network but also alleviates the slow learning caused by an overly deep stack of layers. Dropout and batch normalization are also added to prevent overfitting and speed up training [20].
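The causal and dilated convolutions described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: causality is enforced by left-padding with (kernel size − 1) × dilation zeros, so the output at step t uses only inputs at or before t; the kernel values and input sequence are arbitrary.

```python
import numpy as np

def causal_dilated_conv1d(x, kernel, dilation=1):
    """1-D causal convolution: the output at step t depends only on
    inputs at steps <= t. Causality comes from left-padding the
    sequence with (kernel_size - 1) * dilation zeros."""
    k = len(kernel)
    pad = (k - 1) * dilation
    x_padded = np.concatenate([np.zeros(pad), np.asarray(x, dtype=float)])
    out = np.empty(len(x), dtype=float)
    for t in range(len(x)):
        # taps at t-pad, t-pad+dilation, ..., t (in padded coordinates:
        # indices t, t+dilation, ..., t+pad), exactly k samples
        taps = x_padded[t : t + pad + 1 : dilation]
        out[t] = np.dot(taps, kernel)
    return out
```

With kernel size 4 and dilation 1 this mirrors the Figure 1 structure (four points sampled from the previous layer); raising the dilation to 2 spaces the sampled points one step apart, as in Figure 2.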
2.3. Bidirectional Gated Recurrent Unit (BiGRU)
3. Malware Classification Method Based on Sequence Features and Deep Learning
3.1. Features Extraction
- (1) Malware opcode features
- (2) Malware bytecode features
Algorithm 1: Convert a hex file to a sequence of decimal values in [0, 256).
Input: hexadecimal file
Output: a one-dimensional vector representation of the file byte sequence
1. function getMatrixfrom(file)
2. f = open(file, "rb"); /* read the file in binary */
3. /* convert the binary file to a hexadecimal string */
4. /* split the string by byte and convert each byte to an unsigned decimal number */
5. return byte;
6. end function
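Algorithm 1 can be written out as a short Python function. This is a minimal sketch under the assumption that each byte of the file read in binary mode maps directly to an unsigned decimal value in [0, 256); the function name is adapted from the pseudocode's `getMatrixfrom`.

```python
def get_matrix_from(path):
    """Read a file in binary mode and return its byte sequence as a
    one-dimensional list of unsigned decimal values in [0, 256)."""
    with open(path, "rb") as f:
        data = f.read()
    # in Python 3, iterating over a bytes object already yields
    # unsigned integers 0..255, one per byte
    return list(data)
```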
3.2. Feature Pre-Processing
3.3. Combine TCN and BiGRU for Feature Extraction
- Input layer: receives the processed malicious code opcode feature data, with shape (total number of samples, time step, feature dimension).
- Temporal convolutional network layer: feature vectors were extracted via TCN, with residual units arranged in two layers. A residual unit consisted of two convolutional units and one nonlinear mapping, and the convolutional kernel weights were normalized. The residual unit in Figure 8 served only as the connection from the input layer to the hidden layer; the same applies from the hidden layer to the output layer. The convolution kernel size was 4, the dilation coefficients were (1, 2), and dropout was added to prevent overfitting during training.
- The different features extracted by the average pooling layer and the maximum pooling layer were fused as the pooling output.
- The combined pooling layer consisted of a maximum pooling layer and an average pooling layer, each calculated as shown in Equation (6). Maximum and average pooling were obtained by traversing the pooling window over the input from the previous network layer; the pooled maximum and average values were then summed and passed to the next layer of the model.
- Bidirectional gated recurrent unit layer: the figure shows the structure of the two-layer GRU unit. The output vector of the TCN model was first used as the input of the GRU to extract long-term correlations in the time series; the data were then output with the results obtained from the two BiGRU layers.
- Output layer: the output of the BiGRU at the last moment is passed to the classification layer.
3.4. Classification Output Layer
4. Experiments and Analysis of Results
4.1. Experimental Setup
4.2. Experimental Environment and Data Set
4.3. Experimental Evaluation Criteria
4.4. Feature Selection Experiments
4.5. TCN-BiGRU Model Performance Analysis Experiments
4.6. Model Ablation Experiments
4.7. Comparison Experiments of Different Pooling Methods
4.8. Comparison Experiments for Classification Algorithms
Model | Features | Accuracy |
---|---|---|
One-class SVM [26] | Opcode + Grayscale map | 92% |
PCA and kNN [27] | Grayscale map | 96.6% |
Strand Gene Sequence [28] | Asm sequence | 98.59% |
Orthrus [10] | Byte + Opcode | 99.24% |
MalNet [11] | Opcode + Grayscale map | 99.36% |
Model in this paper | Opcode + Byte | 99.72% |
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Zhao, J.; Zhang, S.; Liu, B.; Cui, B. Malware detection using machine learning based on the combination of dynamic and static features. In Proceedings of the 27th International Conference on Computer Communication and Networks (ICCCN), Hangzhou, China, 11 October 2018. [Google Scholar]
- Guo, H.; Wu, J.T.; Huang, S.G.; Pan, Z.L.; Shi, F.; Yan, Z.H. Research on malware detection based on vector features of assembly instructions. Inf. Secur. Res. 2020, 6, 113–121. [Google Scholar]
- Raff, E.; Sylvester, J.; Nicholas, C. Learning the pe header, malware detection with minimal domain knowledge. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security; Association for Computing Machinery: New York, NY, USA, 2017; pp. 121–132. [Google Scholar]
- Zhao, S.; Ma, X.; Zou, W.; Bai, B. DeepCG: Classifying metamorphic malware through deep learning of call graphs. In Proceedings of the International Conference on Security and Privacy in Communication Systems; Springer: Berlin, Germany, 2019; pp. 171–190. [Google Scholar]
- Santos, I.; Brezo, F.; Ugarte-Pedrero, X.; Bringas, P.G. Opcode sequences as representation of executables for data-mining-based unknown malware detection. Inf. Sci. 2013, 227, 64–82. [Google Scholar] [CrossRef]
- Kang, B.; Yerima, S.Y.; McLaughlin, K.; Sezer, S. N-opcode Analysis for Android Malware Classification and Categorization. In Proceedings of the 2016 International Conference on Cyber Security and Protection of Digital Services (Cyber Security), London, UK, 9 July 2016. [Google Scholar]
- Pascanu, R.; Stokes, J.W.; Sanossian, H.; Marinescu, M.; Thomas, A. Malware classification with recurrent networks. In Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), South Brisbane, QLD, Australia, 19–24 April 2015; pp. 1916–1920. [Google Scholar]
- Kwon, I.; Im, E.G. Extracting the Representative API Call Patterns of Malware Families Using Recurrent Neural Network. In Proceedings of the International Conference on Research in Adaptive and Convergent Systems; Association for Computing Machinery: New York, NY, USA, 2017; pp. 202–207. [Google Scholar]
- Messay-Kebede, T.; Narayanan, B.N.; Djaneye-Boundjou, O. Combination of Traditional and Deep Learning based Architectures to Overcome Class Imbalance and its Application to Malware Classification. In Proceedings of the NAECON 2018-IEEE National Aerospace and Electronics Conference, Dayton, OH, USA, 23–26 July 2018; pp. 73–77. [Google Scholar]
- Gibert, D.; Mateu, C.; Planes, J. Orthrus: A Bimodal Learning Architecture for Malware Classification. In Proceedings of the 2020 International Joint Conference on Neural Networks (IJCNN), Glasgow, UK, 19–24 July 2020; pp. 1–8. [Google Scholar]
- Yan, J.; Qi, Y.; Rao, Q. Detecting malware with an ensemble method based on deep neural network. Secur. Commun. Netw. 2018, 2018, 7247095. [Google Scholar] [CrossRef] [Green Version]
- Narayanan, B.N.; Davuluru, V.S.P. Ensemble Malware Classification System Using Deep Neural Networks. Electronics 2020, 9, 721. [Google Scholar] [CrossRef]
- Ahmadi, M.; Ulyanov, D.; Semenov, S.; Trofimov, M.; Giacinto, G. Novel Feature Extraction, Selection and Fusion for Effective Malware Family Classification. In Proceedings of the 6th ACM Conference on Data and Application Security and Privacy; Association for Computing Machinery: New York, NY, USA, 2016; pp. 183–194. [Google Scholar]
- Zhang, Y.; Huang, Q.; Ma, X.; Yang, Z.; Jiang, J. Using Multi-features and Ensemble Learning Method for Imbalanced Malware Classification. In Proceedings of the 2016 IEEE Trustcom/BigDataSE/ISPA, Tianjin, China, 23–26 August 2016; pp. 965–973. [Google Scholar]
- Bai, J.R.; Wang, J.F. Improving malware detection using multiview ensemble learning. Secur. Commun. Netw. 2016, 9, 4227–4241. [Google Scholar] [CrossRef] [Green Version]
- Bai, S.; Kolter, J.Z.; Koltun, V. An empirical evaluation of generic convolutional and recurrent networks for sequence modeling. arXiv 2018, arXiv:1803.01271. [Google Scholar]
- Fan, Y.Y.; Li, C.J.; Yi, Q.; Li, B.Q. Classification of Field Moving Targets Based on Improved TCN Network. Comput. Eng. 2021, 47, 106–112. [Google Scholar]
- Yating, G.; Wu, W.; Qiongbin, L.; Fenghuang, C.; Qinqin, C. Fault Diagnosis for Power Converters Based on Optimized Temporal Convolutional Network. IEEE Trans. Instrum. Meas. 2020, 70, 1–10. [Google Scholar] [CrossRef]
- Huang, Q.; Hain, T. Improving Audio Anomalies Recognition Using Temporal Convolutional Attention Network. In Proceedings of the ICASSP 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, ON, Canada, 6–11 June 2021; pp. 6473–6477. [Google Scholar]
- Zhu, R.; Liao, W.; Wang, Y. Short-term prediction for wind power based on temporal convolutional network. Energy Rep. 2020, 6, 424–429. [Google Scholar] [CrossRef]
- Xu, Z.; Zeng, W.; Chu, X.; Cao, P. Multi-Aircraft Trajectory Collaborative Prediction Based on Social Long Short-Term Memory Network. Aerospace 2021, 8, 115. [Google Scholar] [CrossRef]
- Liu, Y.; Ma, J.; Tao, Y.; Shi, L.; Wei, L.; Li, L. Hybrid Neural Network Text Classification Combining TCN and GRU. In Proceedings of the 2020 IEEE 23rd International Conference on Computational Science and Engineering (CSE), Guangzhou, China, 29 December–1 January 2020; pp. 30–35. [Google Scholar]
- Sun, Y.C.; Tian, R.L.; Wang, X.F. Emitter signal recognition based on improved CLDNN. Syst. Eng. Electron. 2021, 43, 42–47. [Google Scholar]
- Wang, Y.; Liao, W.L.; Chang, Y.Q. Gated Recurrent Unit Network-Based Short-Term Photovoltaic Forecasting. Energies 2018, 11, 2163. [Google Scholar] [CrossRef] [Green Version]
- Qi An Xin Technology Research Institute. DataCon: Multidomain Large-Scale Competition Open Data for Security Research. Available online: https://datacon.qianxin.com/opendata (accessed on 11 November 2021). (In Chinese).
- Burnaev, E.; Smolyakov, D. One-class SVM with privileged information and its application to malware detection. In Proceedings of the 2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW), Barcelona, Spain, 12–15 December 2016; pp. 273–280. [Google Scholar]
- Narayanan, B.N.; Djaneye-Boundjou, O.; Kebede, T.M. Performance analysis of machine learning and pattern recognition algorithms for malware classification. In Proceedings of the 2016 IEEE National Aerospace and Electronics Conference (NAECON) and Ohio Innovation Summit (OIS), Dayton, OH, USA, 25–29 July 2016; pp. 338–342. [Google Scholar]
- Drew, J.; Hahsler, M.; Moore, T. Polymorphic malware detection using sequence classification methods and ensembles: BioSTAR 2016 Recommended Submission. EURASIP J. Inf. Secur. 2017, 2017, 2. [Google Scholar] [CrossRef]
- Guo, H.; Huang, S.; Zhang, M. Classification of malware variant based on ensemble learning. In International Conference on Machine Learning for Cyber Security; Springer: Cham, Switzerland, 2020; pp. 125–139. [Google Scholar]
- Saadat, S.; Joseph Raymond, V. Malware classification using CNN-Xgboost model. In Artificial Intelligence Techniques for Advanced Computing Applications; Springer: Singapore, 2021; pp. 191–202. [Google Scholar]
- Liu, Y.; Wang, Z.; Hou, Y. A method for feature extraction of malicious code based on probabilistic topic models. J. Comput. Res. Dev. 2019, 56, 2339–2348. [Google Scholar]
Category | Operation Codes |
---|---|
Data move | mov, movzx, push, pop, lea, xchg |
Arithmetic/logic | add, sub, inc, dec, imul, or, xor, shl, shr, ror, rol |
Control flow | jmp, jz, cmp, jnb, call, retf, retn |
Other | nop |
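As a sketch of the N-gram method from Section 2.1 applied to opcode sequences like those in the table above: n-grams are counted by sliding a window of length n over the sequence and tallying occurrence frequencies. The value n = 2 and the opcode sequence are arbitrary choices for illustration, not the paper's settings.

```python
from collections import Counter

def opcode_ngrams(opcodes, n=2):
    """Count n-gram occurrence frequencies over an opcode sequence
    by zipping n staggered views of the list (a standard sliding-
    window idiom)."""
    grams = zip(*(opcodes[i:] for i in range(n)))
    return Counter(grams)
```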
Prediction | Real Label: Malware | Real Label: Not Malware |
---|---|---|
Malware | TP | FP |
Not malware | FN | TN |
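The evaluation criteria can be computed from the confusion-matrix entries in the standard way. This is a sketch of the usual definitions; the paper's exact per-class averaging scheme is not reproduced here.

```python
def metrics(tp, fp, fn, tn):
    """Precision, recall, F1-score, and accuracy from the
    confusion-matrix counts TP, FP, FN, TN."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, f1, accuracy
```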
Model Parameter | Value |
---|---|
Batch size setting | 64 |
Optimizer | Adamax |
Optimizer learning rate | 0.002 |
Epoch setting | 50 |
Number of TCN filters | 7 |
Number of TCN convolution kernels | 4 |
TCN dilation coefficient | (1, 2) |
Number of BiGRU units | 32, 32 |
Dropout rate | 0.2 |
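Given the parameters above (kernel size 4, dilation coefficients (1, 2)) and two dilated causal convolutions per residual block as described in Section 2.2, the TCN's receptive field can be checked with a short calculation: each layer with kernel size k and dilation d widens the field by (k − 1) × d steps.

```python
def tcn_receptive_field(kernel_size, dilations, convs_per_block=2):
    """Receptive field of stacked dilated causal convolutions:
    start from 1 (the current step) and add (k - 1) * d for each
    convolution layer; a residual block contributes two layers.
    The two-convolutions-per-block count follows Section 2.2."""
    rf = 1
    for d in dilations:
        rf += convs_per_block * (kernel_size - 1) * d
    return rf
```

With kernel size 4 and dilations (1, 2), this gives 1 + 2·3·1 + 2·3·2 = 19 time steps of history visible at each output position.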
Malicious Code Family | Precision | Recall | F1-Score |
---|---|---|---|
1 | 0.99 | 1.00 | 1.00 |
2 | 1.00 | 1.00 | 1.00 |
3 | 0.99 | 1.00 | 1.00 |
4 | 1.00 | 1.00 | 1.00 |
5 | 1.00 | 1.00 | 1.00 |
6 | 0.99 | 1.00 | 0.99 |
7 | 0.94 | 1.00 | 0.97 |
8 | 0.99 | 0.99 | 0.99 |
9 | 0.99 | 0.99 | 0.99 |
accuracy | - | - | 0.99 |
Overall | 99.55% | 99.54% | 99.54% |
Malicious Code Family | Precision | Recall | F1-Score |
---|---|---|---|
0 | 0.94 | 0.93 | 0.93 |
1 | 0.96 | 0.97 | 0.96 |
accuracy | - | - | 0.95 |
Overall | 96.37% | 96.63% | 96.50% |
Model | Dataset | Accuracy | Precision | Recall | F1-Score |
---|---|---|---|---|---|
TCN | 9-class-data | 99.36% | 99.37% | 99.36% | 99.36% |
TCN | 2-class-Datacon | 94.62% | 95.98% | 95.81% | 95.89% |
GRU | 9-class-data | 99.36% | 99.29% | 99.35% | 99.32% |
GRU | 2-class-Datacon | 95.7% | 95.8% | 95.72% | 95.76% |
TCN-GRU | 9-class-data | 99.54% | 99.46% | 99.54% | 99.50% |
TCN-GRU | 2-class-Datacon | 95.52% | 95.62% | 95.63% | 95.62% |
TCN-BiGRU | 9-class-data | 99.72% | 99.55% | 99.54% | 99.54% |
TCN-BiGRU | 2-class-Datacon | 96.54% | 96.37% | 96.63% | 96.50% |
Dataset | No Pooling | Average Pooling | Maximum Pooling | Pooling Fusion |
---|---|---|---|---|
9-class-data | 99.45% | 99.54% | 99.45% | 99.72% |
2-class-Datacon | 94.92% | 95.10% | 95.28% | 96.54% |
Wu, X.; Song, Y.; Hou, X.; Ma, Z.; Chen, C. Deep Learning Model with Sequential Features for Malware Classification. Appl. Sci. 2022, 12, 9994. https://doi.org/10.3390/app12199994