Android Malware Detection Using TCN with Bytecode Image
Abstract
1. Introduction
- We investigate, through experiments, how different combinations of texture images built from AndroidManifest.xml, classes.dex, and the data section of classes.dex affect Android malware detection. The experimental results show that fusing AndroidManifest.xml with the classes.dex data section gives the detection model its most accurate texture image feature;
- We propose a new Android malware detection model that, for the first time, applies a temporal convolutional network (TCN) to bytecode images of malware. Compared with the convolutions of a traditional two-dimensional CNN and of the lightweight MobileNetV2, a TCN built on dilated convolutions and residual connections makes better use of the sequential features of the bytecode, reduces computation, and improves detection efficiency.
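To make the efficiency argument concrete, the sketch below computes how far back a stack of dilated causal convolutions can see. It assumes the residual-block layout of Bai et al.'s TCN (two dilated causal convolutions per block, with dilations doubling 1, 2, 4, ...) and a kernel size of 3; the kernel size and the two ways of reading a 28 × 28 image as a sequence are illustrative assumptions, not settings reported in this paper.

```python
def tcn_receptive_field(kernel_size, dilations, convs_per_block=2):
    """Receptive field (in time steps) of stacked blocks of dilated causal convolutions.

    A causal convolution with dilation d widens the field by (kernel_size - 1) * d.
    Bai et al.'s TCN places two such convolutions in each residual block,
    hence the default convs_per_block=2.
    """
    return 1 + convs_per_block * (kernel_size - 1) * sum(dilations)


def levels_needed(seq_len, kernel_size, convs_per_block=2):
    """Smallest number of blocks, with dilations 1, 2, 4, ..., whose field covers seq_len steps."""
    levels = 0
    while tcn_receptive_field(kernel_size, [2 ** i for i in range(levels)], convs_per_block) < seq_len:
        levels += 1
    return levels


# Hypothetical readings of a 28 x 28 bytecode image: 28 rows as 28 steps, or 784 flattened bytes.
for seq_len in (28, 784):
    print(seq_len, "steps ->", levels_needed(seq_len, kernel_size=3), "dilated blocks")
```

With every dilation fixed at 1, covering 784 steps would take 196 blocks, because each block then widens the field by only 4 steps; doubling the dilation makes the required depth grow logarithmically with sequence length.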
2. Related Work
3. Methodology
3.1. Overview
- Extracting and processing XML and DEX: We extract AndroidManifest.xml and classes.dex from the APK and use DEXparser to parse classes.dex and extract its data section, which we save as a data_section.dex file. We then merge AndroidManifest.xml and data_section.dex by reading both files as binary data and concatenating them;
- Building the grayscale image dataset: The merged binary data is read as a sequence of 8-bit values and converted into grayscale images with a uniform size of 28 × 28. The images are stored in the IDX format, and the dataset is divided into a training set and a test set at a ratio of 8:2;
- Classification: Finally, the classification module feeds the dataset into our TCN model. We train and validate the model with tenfold cross-validation and obtain the detection result with the Softmax function. Figure 1 shows an overview of our method.
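The IDX format mentioned above is the binary layout used by the MNIST dataset: a 4-byte magic number (two zero bytes, a type code, and the number of dimensions), each dimension as a big-endian 32-bit integer, then the raw data. A minimal sketch of encoding 28 × 28 grayscale images and their labels follows; the function names and the 0/1 label encoding are hypothetical, and only Python's standard library is used.

```python
import struct


def idx_ubyte(data: bytes, dims) -> bytes:
    """Encode raw unsigned-byte data as an IDX blob (the MNIST file format).

    Header: two zero bytes, type code 0x08 (unsigned byte), the number of
    dimensions, then each dimension size as a big-endian 32-bit integer.
    """
    expected = 1
    for d in dims:
        expected *= d
    if len(data) != expected:
        raise ValueError(f"expected {expected} bytes for dims {dims}, got {len(data)}")
    header = struct.pack(">BBBB", 0, 0, 0x08, len(dims)) + struct.pack(f">{len(dims)}I", *dims)
    return header + data


def idx_images(images, rows=28, cols=28) -> bytes:
    """Stack grayscale images (each a bytes object of rows*cols pixels) into an IDX3 blob."""
    return idx_ubyte(b"".join(images), (len(images), rows, cols))


def idx_labels(labels) -> bytes:
    """Encode hypothetical class labels (e.g. 0 = benign, 1 = malware) as an IDX1 blob."""
    return idx_ubyte(bytes(labels), (len(labels),))


# Usage (hypothetical file name, MNIST-style):
# with open("train-images-idx3-ubyte", "wb") as f:
#     f.write(idx_images(train_images))
```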
3.2. APK File Structure
3.3. Visualization of Android APK File
Algorithm 1: Gray image generation algorithm
Input: binary files XML (AndroidManifest.xml) and dexData (data_section.dex)
Output: xml_dexData (gray_image)
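Since only the inputs and output of Algorithm 1 survive here, the following is a hedged sketch of the steps the overview describes: concatenate the two binaries, lay the bytes out row by row as 8-bit pixels, and shrink the result to 28 × 28. The row width (the document's width-by-file-size table suggests how it may be chosen) and the nearest-neighbour resampling are assumptions; the paper cites Pillow, whose resize filters may produce different pixel values. All function names are hypothetical.

```python
def merge_binaries(xml_bytes: bytes, dex_data_bytes: bytes) -> bytes:
    """Concatenate AndroidManifest.xml and data_section.dex into one byte stream."""
    return xml_bytes + dex_data_bytes


def bytes_to_matrix(data: bytes, width: int):
    """Lay the bytes out row by row (one 8-bit pixel each), zero-padding the final row."""
    if not data:
        raise ValueError("no bytes to convert")
    padded = data + bytes(-len(data) % width)
    return [list(padded[r:r + width]) for r in range(0, len(padded), width)]


def resize_nearest(matrix, size=28):
    """Nearest-neighbour resize of a 2-D pixel matrix to size x size."""
    h, w = len(matrix), len(matrix[0])
    return [[matrix[r * h // size][c * w // size] for c in range(size)] for r in range(size)]


def gray_image(xml_bytes: bytes, dex_data_bytes: bytes, width=32, size=28) -> bytes:
    """Merged bytes -> width-wide pixel matrix -> flattened size x size grayscale image."""
    matrix = bytes_to_matrix(merge_binaries(xml_bytes, dex_data_bytes), width)
    return bytes(p for row in resize_nearest(matrix, size) for p in row)


# Usage with toy inputs standing in for the two files read in binary mode:
image = gray_image(b"\x01" * 100, b"\x02" * 100, width=32)
print(len(image))  # 784 pixels = 28 x 28
```

Nearest-neighbour sampling keeps original byte values but discards most bytes of a large file; an averaging filter would instead blend neighbouring bytes, so the choice affects the texture the classifier sees.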
3.4. Experiment Features
4. Malware Detection Introduction
4.1. CNN Model
4.2. MobileNetV2 Model
4.2.1. Core Architecture of MobileNetV2
4.2.2. MobileNetV2 Network Architecture
4.3. TCN Model
4.3.1. Dilated Convolution
4.3.2. Residual Connection
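As an illustration of the two ingredients named in Sections 4.3.1 and 4.3.2, the sketch below implements a single-channel dilated causal convolution and a simplified residual block in plain Python, following the formulation of Bai et al.: the convolution computes y[t] = Σᵢ f(i)·x[t − d·i], and the block outputs ReLU(x + F(x)), where F is two dilated causal convolutions with ReLU. It omits weight normalisation, dropout, multiple channels, and the 1 × 1 convolution that TCNs apply on the skip path when channel counts differ, so it is a teaching sketch rather than the model trained in this paper.

```python
def dilated_causal_conv1d(x, weights, dilation):
    """y[t] = sum_i weights[i] * x[t - i * dilation], treating x[<0] as 0 (left zero-padding).

    Output t depends only on inputs at or before t, which is what makes it causal.
    """
    return [
        sum(w * x[t - i * dilation] for i, w in enumerate(weights) if t - i * dilation >= 0)
        for t in range(len(x))
    ]


def relu(values):
    return [max(0.0, v) for v in values]


def residual_block(x, w1, w2, dilation):
    """Simplified TCN residual block: output = ReLU(x + F(x)).

    F(x) = ReLU(conv(ReLU(conv(x, w1)), w2)), both convolutions dilated and causal.
    """
    h = relu(dilated_causal_conv1d(x, w1, dilation))
    h = relu(dilated_causal_conv1d(h, w2, dilation))
    return relu([a + b for a, b in zip(x, h)])  # identity skip connection


# Usage: dilation 2 lets a 2-tap filter mix each byte with the byte two steps earlier.
print(dilated_causal_conv1d([1, 2, 3, 4, 5, 6], [1, 1], dilation=2))
print(residual_block([0.1, 0.5, 0.9, 0.2], w1=[0.5, 0.5], w2=[1.0, -0.5], dilation=2))
```

The skip connection means a block only has to learn a correction F(x) to its input; when F contributes nothing, the block passes x through unchanged.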
5. Experiment
5.1. Experiment Conditions
5.2. Dataset
5.3. Evaluation Metrics
5.4. Model Training and Result Analysis
5.4.1. CNN (Base Model)
5.4.2. MobileNetV2
5.4.3. TCN
5.4.4. Result and Analysis
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
Abbreviation | Definition |
---|---|
CNN | Convolutional Neural Network |
TCN | Temporal Convolutional Network |
dex | The dataset built from classes.dex |
xml_dex | The dataset combining AndroidManifest.xml and classes.dex |
dexData | The dataset built from data_section.dex |
xml_dexData | The dataset combining AndroidManifest.xml and data_section.dex |
TP | True Positives |
TN | True Negatives |
FP | False Positives |
FN | False Negatives |
References
- National Internet Emergency Center. Overview of China’s Internet Network Security Situation in 2019. Available online: https://www.cert.org.cn/publish/main/46/2020/20200811124544754595627/20200811124544754595627_.html (accessed on 1 October 2020).
- Google Play Protect. 2018. Android. Available online: https://www.android.com/play-protect/ (accessed on 15 August 2020).
- Android’s Built-In Google Play Protect Protection Is Useless. Available online: https://www.cnbeta.com/articles/tech/759727.htm (accessed on 20 August 2020).
- Naway, A.; Li, Y. A Review on The Use of Deep Learning in Android Malware Detection. arXiv 2020, arXiv:1812.10360.
- Ganesh, M.; Pednekar, P.; Prabhuswamy, P.; Nair, D.S.; Park, Y.; Jeon, H. CNN-based android malware detection. In Proceedings of the 2017 International Conference on Software Security and Assurance (ICSSA), Altoona, PA, USA, 24–25 July 2017; pp. 60–65.
- Ding, Y.; Zhao, W.; Wang, Z.; Wang, L. Automaticlly Learning Featurs Of Android Apps Using CNN. In Proceedings of the 2018 International Conference on Machine Learning and Cybernetics (ICMLC), Chengdu, China, 15–18 July 2018; pp. 331–336.
- McLaughlin, N.; del Rincon, J.M.; Kang, B.; Yerima, S.; Miller, P.; Sezer, S.; Safaei, Y.; Trickel, E.; Zhao, Z.; Doupé, A.; et al. Deep Android Malware Detection. In Proceedings of the Seventh ACM on Conference on Data and Application Security and Privacy—CODASPY ’17, Scottsdale, AZ, USA, 22–24 March 2017; pp. 301–308.
- Salah, A.; Shalabi, E.; Khedr, W. A Lightweight Android Malware Classifier Using Novel Feature Selection Methods. Symmetry 2020, 12, 858.
- Wang, X.; Yang, Y.; Zeng, Y. Accurate mobile malware detection and classification in the cloud. SpringerPlus 2015, 4, 1–23.
- Afonso, V.M.; de Amorim, M.F.; Grégio, A.R.A.; Junquera, G.B.; de Geus, P.L. Identifying Android malware using dynamically obtained features. J. Comput. Virol. Hack. Tech. 2015, 11, 9–17.
- Bagheri, H.; Sadeghi, A.; Jabbarvand, R.; Malek, S. Practical, Formal Synthesis and Automatic Enforcement of Security Policies for Android. In Proceedings of the 2016 46th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), Toulouse, France, 28 June–1 July 2016; pp. 514–525.
- Arshad, S.; Shah, M.A.; Wahid, A.; Mehmood, A.; Song, H.; Yu, H. SAMADroid: A novel 3-level hybrid malware detection model for Android operating system. IEEE Access 2018, 6, 4321–4339.
- Kouliaridis, V.; Kambourakis, G.; Geneiatakis, D.; Potha, N. Two Anatomists Are Better than One—Dual-Level Android Malware Detection. Symmetry 2020, 12, 1128.
- Spreitzenbarth, M.; Schreck, T.; Echtler, F.; Arp, D.; Hoffmann, J. Mobile-sandbox: Combining static and dynamic analysis with machine-learning techniques. Int. J. Inf. Secur. 2015, 14, 141–153.
- Manzhi, Y.; Qiaoyan, W. Detecting android malware by applying classification techniques on images patterns. In Proceedings of the 2017 IEEE 2nd International Conference on Cloud Computing and Big Data Analysis (ICCCBDA), Chengdu, China, 28–30 April 2017; pp. 344–347.
- Torralba, A.; Murphy, K.P.; Freeman, W.T.; Rubin, M.A. Context-based vision systems for place and object recognition. In Proceedings of the International Conference on Computer Vision (ICCV), Nice, France, 13–16 October 2003.
- Oliva, A.; Torralba, A. Modeling the shape of a scene: A holistic representation of the spatial envelope. Int. J. Comput. Vis. 2001, 42, 145–175.
- Xiao, X. An Image-Inspired and CNN-Based Android Malware Detection Approach. In Proceedings of the 2019 34th IEEE/ACM International Conference on Automated Software Engineering (ASE), San Diego, CA, USA, 11–15 November 2019; pp. 1259–1261.
- Radanliev, P.; De Roure, D.C.; Nurse, J.R.C.; Montalvo, R.M.; Cannady, S.; Santos, O.; Maddox, L.; Burnap, P.; Maple, C. Future developments in standardisation of cyber risk in the Internet of Things (IoT). SN Appl. Sci. 2020, 2, 169.
- Dexparser. Available online: https://pypi.org/project/dexparser/0.0.1/ (accessed on 20 November 2020).
- Nataraj, L.; Karthikeyan, S.; Jacob, G.; Manjunath, B. Malware images: Visualization and automatic classification. In Proceedings of the 8th International Symposium on Visualization for Cyber Security, VizSec’11, Pittsburgh, PA, USA, 20 July 2011.
- Jung, D.-S.; Lee, S.-J.; Euom, I.-C. ImageDetox: Method for the Neutralization of Malicious Code Hidden in Image Files. Symmetry 2020, 12, 1621.
- Kumar, A.; Sagar, K.P.; Kuppusamy, K.S.; Aghila, G. Machine learning based malware classification for Android applications using multimodal image representations. In Proceedings of the 2016 10th International Conference on Intelligent Systems and Control (ISCO), Coimbatore, India, 7–8 January 2016; pp. 1–6.
- Darus, F.M.; Ahmad, S.N.A.; Ariffin, A.F.M. Android Malware Detection Using Machine Learning on Image Patterns. In Proceedings of the 2018 Cyber Resilience Conference (CRC), Putrajaya, Malaysia, 13–15 November 2018; pp. 1–2.
- Huang, T.H.; Kao, H. R2-D2: ColoR-inspired Convolutional NeuRal Network (CNN)-based AndroiD Malware Detections. In Proceedings of the 2018 IEEE International Conference on Big Data (Big Data), Seattle, WA, USA, 10–13 December 2018; pp. 2633–2642.
- Jung, I.; Choi, J.; Cho, S.; Han, S.; Park, M.; Hwang, Y.-S. Android malware detection using convolutional neural networks and data section images. In Proceedings of the 2018 Conference on Research in Adaptive and Convergent Systems, Honolulu, HI, USA, 9–12 October 2018; pp. 149–153.
- Pillow (PIL Fork). Available online: https://pillow.readthedocs.io/en/stable/index.html (accessed on 1 October 2020).
- Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.-C. MobileNetV2: Inverted Residuals and Linear Bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 4510–4520.
- Bai, S.; Kolter, J.Z.; Koltun, V. An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling. arXiv 2018, arXiv:1803.01271.
- Python. Available online: https://www.python.org/ (accessed on 20 January 2020).
- TensorFlow. Available online: https://www.tensorflow.org/ (accessed on 20 January 2020).
- Keras. Available online: https://keras.io/ (accessed on 1 January 2020).
- Canadian Institute for Cybersecurity. Available online: https://www.unb.ca/cic/datasets (accessed on 20 January 2020).
- Ding, Y.; Zhang, X.; Hu, J.; Xu, W. Android malware detection method based on bytecode image. J. Ambient. Intell. Human Comput. 2020.
File or Directories | Function Description |
---|---|
assets | Stores static files that are packaged into the APK |
META-INF | Stores the application signature and certificates, which ensure the integrity of the APK package and system security |
res | All the resource files needed by the APK |
libs | The library folder |
AndroidManifest.xml | The application's configuration file, which includes the application name, version number, required permissions, registered services, links to other applications, declarations of the four component types, and invocation information |
classes.dex | The executable file for the Dalvik virtual machine, which contains all the operating instructions of the application and its runtime data |
resources.arsc | The compiled binary resource file |
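Because an APK is a ZIP archive, the first step of the pipeline can be sketched with Python's standard zipfile module. The data-section slice below is an alternative to DEXparser, not its API: it reads the data_size and data_off fields of the DEX header (32-bit values at offsets 0x68 and 0x6C of the Dalvik executable format) and assumes a well-formed, standard little-endian classes.dex. Function names are hypothetical.

```python
import io
import struct
import zipfile


def extract_xml_and_dex(apk_bytes: bytes):
    """Return the raw bytes of AndroidManifest.xml (binary XML) and classes.dex from an APK."""
    with zipfile.ZipFile(io.BytesIO(apk_bytes)) as apk:
        return apk.read("AndroidManifest.xml"), apk.read("classes.dex")


def dex_data_section(dex: bytes) -> bytes:
    """Slice the data section out of a DEX file using its header fields.

    data_size and data_off are little-endian uint32 values at header
    offsets 0x68 and 0x6C; the section is dex[data_off : data_off + data_size].
    """
    if not dex.startswith(b"dex\n"):
        raise ValueError("not a DEX file")
    data_size, data_off = struct.unpack_from("<II", dex, 0x68)
    return dex[data_off:data_off + data_size]


# Usage (hypothetical path):
# with open("sample.apk", "rb") as f:
#     xml, dex = extract_xml_and_dex(f.read())
# data_section = dex_data_section(dex)
```

The sketch reads only classes.dex, as described in the overview; multidex APKs also contain classes2.dex and later files, which it ignores.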
File Size Range | Image Width (pixels) |
---|---|
<10 KB | 32 |
10 KB~30 KB | 64 |
30 KB~60 KB | 128 |
60 KB~100 KB | 256 |
100 KB~200 KB | 384 |
200 KB~500 KB | 512 |
500 KB~1000 KB | 768 |
>1000 KB | 1024 |
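A lookup for this width table might look like the following. It assumes 1 KB = 1024 bytes and that each lower bound is inclusive, and it follows the original scheme of Nataraj et al., which assigns width 256 to 60–100 KB files and width 768 to 500–1000 KB files. The names are hypothetical.

```python
# (exclusive upper bound in KB, image width in pixels), after Nataraj et al.
WIDTH_TABLE = [(10, 32), (30, 64), (60, 128), (100, 256), (200, 384), (500, 512), (1000, 768)]


def image_width(size_bytes: int) -> int:
    """Pick the image width for a binary file of the given size."""
    size_kb = size_bytes / 1024
    for upper_kb, width in WIDTH_TABLE:
        if size_kb < upper_kb:
            return width
    return 1024  # files of 1000 KB and above


print(image_width(80 * 1024))  # 256
```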
Operator | t | c | n | s |
---|---|---|---|---|
Conv2d | - | 32 | 1 | 2 |
Bottleneck | 1 | 16 | 1 | 1 |
Bottleneck | 6 | 24 | 2 | 2 |
Bottleneck | 6 | 32 | 3 | 2 |
Bottleneck | 6 | 64 | 4 | 2 |
Bottleneck | 6 | 96 | 3 | 1 |
Bottleneck | 6 | 160 | 3 | 2 |
Conv2d | - | 1280 | 1 | 1 |
Avgpool | - | - | 1 | 1 |
Conv2d | - | k | - | - |
t: expansion factor; c: number of output channels; n: number of times the layer is repeated; s: stride of the first layer in each sequence (all other layers use stride 1); k: number of output classes.
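To see what this architecture table implies for 28 × 28 inputs (rather than the 224 × 224 images MobileNetV2 was designed for), the sketch below traces the spatial size through each row. It assumes "same" padding, so a stride-s stage maps side h to ceil(h/s), and that only the first of the n repeated layers uses stride s; the names are hypothetical.

```python
import math

# Rows of the architecture table: (operator, t, c, n, s); None stands for "-".
STAGES = [
    ("Conv2d", None, 32, 1, 2),
    ("Bottleneck", 1, 16, 1, 1),
    ("Bottleneck", 6, 24, 2, 2),
    ("Bottleneck", 6, 32, 3, 2),
    ("Bottleneck", 6, 64, 4, 2),
    ("Bottleneck", 6, 96, 3, 1),
    ("Bottleneck", 6, 160, 3, 2),
    ("Conv2d", None, 1280, 1, 1),
]


def trace_shapes(side=28, stages=STAGES):
    """Return (operator, side, channels) after each table row, assuming 'same' padding."""
    shapes = []
    for op, _t, c, _n, s in stages:
        # Only the first of the n repeats is strided; the others keep the size.
        side = math.ceil(side / s)
        shapes.append((op, side, c))
    return shapes


for op, side, channels in trace_shapes():
    print(f"{op:<10} -> {side} x {side} x {channels}")
```

Under these assumptions the feature map is already 1 × 1 at the 160-channel stage, so the later layers operate on a single spatial position.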
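Input | Operator | Output |
---|---|---|
h × w × k | 1 × 1 Conv2d, ReLU6 | h × w × (tk) |
h × w × tk | 3 × 3 Dwise s = s, ReLU6 | h/s × w/s × (tk) |
h/s × w/s × tk | Linear 1 × 1 Conv2d | h/s × w/s × k′ |
Here k and k′ are the input and output channel counts, t is the expansion factor, and s is the stride.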
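The block's efficiency comes from the depthwise step. The sketch below counts the weights of the three convolutions of one bottleneck block. It ignores biases and batch normalisation and keeps the expansion convolution even when t = 1 (implementations often skip it in that case). The example channels come from the 16 → 24 stage of the MobileNetV2 architecture table; the function name is hypothetical.

```python
def bottleneck_weights(k, k_out, t, kernel=3):
    """Weight count of one MobileNetV2 bottleneck block (no biases, no batch norm)."""
    hidden = t * k                         # expanded channel count t*k
    expand = k * hidden                    # 1x1 conv2d, k -> t*k (followed by ReLU6)
    depthwise = kernel * kernel * hidden   # 3x3 depthwise conv, one filter per channel
    project = hidden * k_out               # linear 1x1 conv2d, t*k -> k'
    return expand + depthwise + project


print(bottleneck_weights(16, 24, t=6))  # 1536 + 864 + 2304 = 4704
```

For comparison, a full 3 × 3 convolution on the 96 expanded channels would need 9 × 96 × 96 = 82,944 weights, against 864 for the depthwise version.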
Samples | Type and Number | Source |
---|---|---|
malware (5826) | Adware (1104); Ransomware (101); Scareware (112); SMSmalware (1680); Banking (1190) | CICAndMal2017; CICInvesAndMal2019; CICMalDroid2020 |
benign (5687) | - | Google Play |
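The 8:2 split and tenfold cross-validation from the overview can be sketched on this dataset's 11,513 samples (5826 malware + 5687 benign). The sketch assumes a seeded random shuffle and contiguous, unstratified folds drawn from the training portion; the paper does not say how it shuffles, whether it stratifies, or which portion the folds are drawn from, so these choices and the function names are hypothetical.

```python
import random


def train_test_split(samples, train_ratio=0.8, seed=0):
    """Shuffle the samples with a fixed seed and split them 8:2 into training and test sets."""
    items = list(samples)
    random.Random(seed).shuffle(items)
    cut = int(len(items) * train_ratio)
    return items[:cut], items[cut:]


def k_fold_indices(n, k=10):
    """Yield (train_idx, val_idx) for k contiguous folds whose sizes differ by at most 1."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, val
        start += size


train, test = train_test_split(range(5826 + 5687))
print(len(train), len(test))  # 9210 2303
```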
Actual / Predicted | Benign | Malware |
---|---|---|
Benign | TP | FN |
Malware | FP | TN |
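The evaluation metrics follow directly from the four counts in the confusion matrix. The sketch treats benign as the positive class, as the matrix is laid out; the results tables do not state whether their values are per-class or averaged over the two classes, so this is one reading. The counts in the usage line are made up for illustration.

```python
def metrics(tp, fn, fp, tn):
    """Accuracy, precision, recall and F1-score from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)   # share of all samples classified correctly
    precision = tp / (tp + fp)                   # share of predicted positives that are positive
    recall = tp / (tp + fn)                      # share of actual positives that are found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of precision and recall
    return accuracy, precision, recall, f1


print(metrics(tp=90, fn=10, fp=5, tn=95))  # hypothetical counts
```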
Data Set | Accuracy | Precision | Recall | F1-Score |
---|---|---|---|---|
dex | 0.9404 | 0.9407 | 0.9403 | 0.9404 |
xml_dex | 0.9411 | 0.9411 | 0.9410 | 0.9410 |
dexData | 0.9457 | 0.9459 | 0.9457 | 0.9456 |
xml_dexData | 0.9470 | 0.9474 | 0.9470 | 0.9470 |
Data Set | Accuracy | Precision | Recall | F1-Score |
---|---|---|---|---|
dex | 0.9484 | 0.9484 | 0.9483 | 0.9483 |
xml_dex | 0.9489 | 0.9490 | 0.9490 | 0.9489 |
dexData | 0.9482 | 0.9483 | 0.9482 | 0.9482 |
xml_dexData | 0.9506 | 0.9506 | 0.9505 | 0.9505 |
Data Set | Accuracy | Precision | Recall | F1-Score |
---|---|---|---|---|
dex | 0.9499 | 0.9499 | 0.9499 | 0.9499 |
xml_dex | 0.9524 | 0.9526 | 0.9524 | 0.9524 |
dexData | 0.9517 | 0.9518 | 0.9518 | 0.9517 |
xml_dexData | 0.9544 | 0.9545 | 0.9545 | 0.9544 |
Method | Accuracy | Precision | Recall | F1-Score |
---|---|---|---|---|
CNN | 0.9470 | 0.9474 | 0.9470 | 0.9470 |
MobileNetV2 | 0.9506 | 0.9506 | 0.9505 | 0.9505 |
TCN | 0.9544 | 0.9545 | 0.9545 | 0.9544 |
Model | Parameters | Training Time per Epoch | Accuracy |
---|---|---|---|
CNN | 1,198,850 | 9.63 s | 0.9470 |
MobileNetV2 | 2,278,210 | 44.76 s | 0.9506 |
TCN | 19,538 | 1.08 s | 0.9544 |
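The efficiency gap in the model comparison table is easier to read as ratios relative to the TCN. The snippet below only recomputes them from the table's own parameter counts and per-epoch training times; the function name is hypothetical.

```python
# Parameter counts and per-epoch training times (seconds) copied from the table above.
MODELS = {
    "CNN": (1_198_850, 9.63),
    "MobileNetV2": (2_278_210, 44.76),
    "TCN": (19_538, 1.08),
}


def relative_to(reference="TCN", models=MODELS):
    """Return {model: (parameter ratio, training-time ratio)} relative to the reference model."""
    ref_params, ref_time = models[reference]
    return {name: (params / ref_params, seconds / ref_time)
            for name, (params, seconds) in models.items()}


for name, (p, s) in relative_to().items():
    print(f"{name}: {p:.1f}x parameters, {s:.1f}x training time per epoch")
```

That is roughly 61× and 117× more parameters for CNN and MobileNetV2 than for the TCN, and roughly 9× and 41× more training time per epoch.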
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Zhang, W.; Luktarhan, N.; Ding, C.; Lu, B. Android Malware Detection Using TCN with Bytecode Image. Symmetry 2021, 13, 1107. https://doi.org/10.3390/sym13071107