Android Mobile Malware Detection Using Machine Learning: A Systematic Review
Abstract
1. Introduction
2. Background
2.1. Android Architecture
Built-In Security
2.2. Threats to Android
2.2.1. Malware Attacks on Android
2.2.2. Users and App Developers’ Mistakes
2.3. Machine Learning Process
3. Methodology
3.1. Research Questions
- RQ1: What existing reviews have been conducted on ML/DL-based models for detecting Android malware and source code vulnerabilities?
- RQ2: What code/APK analysis methods can be used in malware analysis?
- RQ3: What ML/DL-based methods can be used to detect malware in Android?
- RQ4: What are the accuracy, strengths, and limitations of the proposed models for Android malware detection?
- RQ5: Which techniques can be used to analyse Android source code to detect vulnerabilities?
3.2. Search Strategy
3.3. Study Selection Criteria
3.4. Data Extraction and Synthesis
3.5. Threats to Validity of the Review
4. Related Work
5. Machine Learning to Detect Android Malware
5.1. Static, Dynamic, and Hybrid Analysis
5.2. Static Analysis with Machine Learning
5.2.1. Manifest Based Static Analysis with ML
5.2.2. Code Based Static Analysis with ML
5.2.3. Both Manifest and Code Based Static Analysis with ML
5.3. Dynamic Analysis with Machine Learning
5.4. Hybrid Analysis with Machine Learning
5.5. Use of Deep Learning Based Methods
6. Machine Learning Methods to Detect Code Vulnerabilities
6.1. Static, Dynamic, and Hybrid Source Code Analysis
6.2. Applying ML to Detect Source Code Vulnerabilities
ML-Based Vulnerability Detection Specifically for Android
7. Results and Discussion
8. Conclusions and Future Work
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
References
- Number of Mobile Phone Users Worldwide from 2016 to 2023 (In Billions). Available online: https://www.statista.com/statistics/330695/number-of-smartphone-users-worldwide/ (accessed on 19 May 2021).
- Mobile Operating System Market Share Worldwide. Available online: https://gs.statcounter.com/os-market-share/mobile/worldwide/ (accessed on 19 May 2021).
- Number of Android Applications on the Google Play Store. Available online: https://www.appbrain.com/stats/number-of-android-apps/ (accessed on 19 May 2021).
- Gibert, D.; Mateu, C.; Planes, J. The rise of machine learning for detection and classification of malware: Research developments, trends and challenges. J. Netw. Comput. Appl. 2020, 153, 102526.
- Khan, J.; Shahzad, S. Android Architecture and Related Security Risks. Asian J. Technol. Manag. Res. [ISSN: 2249–0892] 2015, 5, 14–18. Available online: http://www.ajtmr.com/papers/Vol5Issue2/Vol5Iss2_P4.pdf (accessed on 19 May 2021).
- Platform Architecture. Available online: https://developer.android.com/guide/platform (accessed on 19 May 2021).
- Android Runtime (ART) and Dalvik. Available online: https://source.android.com/devices/tech/dalvik (accessed on 19 May 2021).
- Cai, H.; Ryder, B.G. Understanding Android application programming and security: A dynamic study. In Proceedings of the 2017 IEEE International Conference on Software Maintenance and Evolution (ICSME), Shanghai, China, 17–22 September 2017; pp. 364–375.
- Liu, K.; Xu, S.; Xu, G.; Zhang, M.; Sun, D.; Liu, H. A Review of Android Malware Detection Approaches Based on Machine Learning. IEEE Access 2020, 8, 124579–124607.
- Gilski, P.; Stefanski, J. Android os: A review. Tem J. 2015, 4, 116. Available online: https://www.temjournal.com/content/41/14/temjournal4114.pdf (accessed on 19 May 2021).
- Privacy in Android 11 | Android Developers. Available online: https://developer.android.com/about/versions/11/privacy (accessed on 19 May 2021).
- Garg, S.; Baliyan, N. Comparative analysis of Android and iOS from security viewpoint. Comput. Sci. Rev. 2021, 40, 100372.
- Odusami, M.; Abayomi-Alli, O.; Misra, S.; Shobayo, O.; Damasevicius, R.; Maskeliunas, R. Android malware detection: A survey. In International Conference on Applied Informatics; Springer: Cham, Switzerland, 2018; pp. 255–266.
- Bhat, P.; Dutta, K. A survey on various threats and current state of security in android platform. ACM Comput. Surv. (CSUR) 2019, 52, 1–35.
- Tam, K.; Feizollah, A.; Anuar, N.B.; Salleh, R.; Cavallaro, L. The evolution of android malware and android analysis techniques. ACM Comput. Surv. (CSUR) 2017, 49, 1–41.
- Li, L.; Li, D.; Bissyandé, T.F.; Klein, J.; Le Traon, Y.; Lo, D.; Cavallaro, L. Understanding android app piggybacking: A systematic study of malicious code grafting. IEEE Trans. Inf. Forensics Secur. 2017, 12, 1269–1284.
- Ashawa, M.A.; Morris, S. Analysis of Android malware detection techniques: A systematic review. Int. J. Cyber-Secur. Digit. Forensics 2019, 8, 177–187.
- Suarez-Tangil, G.; Tapiador, J.E.; Peris-Lopez, P.; Ribagorda, A. Evolution, detection and analysis of malware for smart devices. IEEE Commun. Surv. Tutor. 2013, 16, 961–987.
- Mos, A.; Chowdhury, M.M. Mobile Security: A Look into Android. In Proceedings of the 2020 IEEE International Conference on Electro Information Technology (EIT), Chicago, IL, USA, 31 July–1 August 2020; pp. 638–642.
- Faruki, P.; Bharmal, A.; Laxmi, V.; Ganmoor, V.; Gaur, M.S.; Conti, M.; Rajarajan, M. Android security: A survey of issues, malware penetration, and defenses. IEEE Commun. Surv. Tutor. 2014, 17, 998–1022.
- Android Security & Privacy 2018 Year in Review. Available online: https://source.android.com/security/reports/Google_Android_Security_2018_Report_Final.pdf (accessed on 19 May 2021).
- Kalutarage, H.K.; Nguyen, H.N.; Shaikh, S.A. Towards a threat assessment framework for apps collusion. Telecommun. Syst. 2017, 66, 417–430.
- Asavoae, I.M.; Blasco, J.; Chen, T.M.; Kalutarage, H.K.; Muttik, I.; Nguyen, H.N.; Roggenbach, M.; Shaikh, S.A. Towards automated android app collusion detection. arXiv 2016, arXiv:1603.02308.
- Asăvoae, I.M.; Blasco, J.; Chen, T.M.; Kalutarage, H.K.; Muttik, I.; Nguyen, H.N.; Roggenbach, M.; Shaikh, S.A. Detecting malicious collusion between mobile software applications: The Android case. In Data Analytics and Decision Support for Cybersecurity; Springer: Cham, Switzerland, 2017; pp. 55–97.
- Malik, J. Making sense of human threats and errors. Comput. Fraud Secur. 2020, 2020, 6–10.
- Calciati, P.; Kuznetsov, K.; Gorla, A.; Zeller, A. Automatically Granted Permissions in Android apps: An Empirical Study on their Prevalence and on the Potential Threats for Privacy. In Proceedings of the 17th International Conference on Mining Software Repositories, Seoul, Korea, 29–30 June 2020; pp. 114–124.
- Nguyen, D.C.; Wermke, D.; Acar, Y.; Backes, M.; Weir, C.; Fahl, S. A stitch in time: Supporting android developers in writing secure code. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, Dallas, TX, USA, 30 October–3 November 2017; pp. 1065–1077.
- Garg, S.; Baliyan, N. Android Security Assessment: A Review, Taxonomy and Research Gap Study. Comput. Secur. 2020, 100, 102087.
- Van Engelen, J.E.; Hoos, H.H. A survey on semi-supervised learning. Mach. Learn. 2020, 109, 373–440.
- Alauthman, M.; Aslam, N.; Al-Kasassbeh, M.; Khan, S.; Al-Qerem, A.; Choo, K.K.R. An efficient reinforcement learning-based Botnet detection approach. J. Netw. Comput. Appl. 2020, 150, 102479.
- Shrestha, A.; Mahmood, A. Review of deep learning algorithms and architectures. IEEE Access 2019, 7, 53040–53065.
- Page, M.; McKenzie, J.; Bossuyt, P.; Boutron, I.; Hoffmann, T.; Mulrow, C.; Shamseer, L.; Tetzlaff, J.M.; Akl, E.A.; Brennan, S.E.; et al. The PRISMA 2020 statement: An updated guideline for reporting systematic reviews. BMJ 2020, 372.
- Wohlin, C. Guidelines for snowballing in systematic literature studies and a replication in software engineering. In Proceedings of the 18th International Conference on Evaluation and Assessment in Software Engineering, London, UK, 13–14 May 2014; pp. 1–10.
- Li, L.; Bissyandé, T.F.; Papadakis, M.; Rasthofer, S.; Bartel, A.; Octeau, D.; Klein, J.; Le Traon, Y. Static analysis of android apps: A systematic literature review. Inf. Softw. Technol. 2017, 88, 67–95.
- Pan, Y.; Ge, X.; Fang, C.; Fan, Y. A Systematic Literature Review of Android Malware Detection Using Static Analysis. IEEE Access 2020, 8, 116363–116379.
- Sharma, T.; Rattan, D. Malicious application detection in android—A systematic literature review. Comput. Sci. Rev. 2021, 40, 100373.
- Liu, Y.; Tantithamthavorn, C.; Li, L.; Liu, Y. Deep Learning for Android Malware Defenses: A Systematic Literature Review. arXiv 2021, arXiv:2103.05292.
- Ghaffarian, S.M.; Shahriari, H.R. Software vulnerability analysis and discovery using machine-learning and data-mining techniques: A survey. ACM Comput. Surv. (CSUR) 2017, 50, 1–36.
- Chen, T.; Mao, Q.; Yang, Y.; Lv, M.; Zhu, J. TinyDroid: A lightweight and efficient model for Android malware detection and classification. Mob. Inf. Syst. 2018, 2018.
- Nisa, M.; Shah, J.H.; Kanwal, S.; Raza, M.; Khan, M.A.; Damaševičius, R.; Blažauskas, T. Hybrid malware classification method using segmentation-based fractal texture analysis and deep convolution neural network features. Appl. Sci. 2020, 10, 4966.
- Amin, M.; Shah, B.; Sharif, A.; Ali, T.; Kim, K.I.; Anwar, S. Android malware detection through generative adversarial networks. Trans. Emerg. Telecommun. Technol. 2019, e3675.
- Arp, D.; Spreitzenbarth, M.; Hubner, M.; Gascon, H.; Rieck, K. Drebin: Effective and explainable detection of android malware in your pocket. In Proceedings of the 2014 Network and Distributed System Security Symposium, San Diego, CA, USA, 23–26 February 2014.
- Google Play. Available online: https://play.google.com/ (accessed on 19 May 2021).
- AndroZoo. Available online: https://androzoo.uni.lu/ (accessed on 19 May 2021).
- AppChina. Available online: https://tracxn.com/d/companies/appchina.com (accessed on 19 May 2021).
- Tencent. Available online: https://www.pcmgr-global.com/ (accessed on 19 May 2021).
- YingYongBao. Available online: https://android.myapp.com/ (accessed on 19 May 2021).
- Contagio. Available online: https://www.impactcybertrust.org/dataset_view?idDataset=1273/ (accessed on 19 May 2021).
- Zhou, Y.; Jiang, X. Dissecting android malware: Characterization and evolution. In Proceedings of the 2012 IEEE Symposium on Security and Privacy, San Francisco, CA, USA, 20–23 May 2012; pp. 95–109.
- VirusShare. Available online: https://virusshare.com/ (accessed on 19 May 2021).
- Intel Security/McAfee. Available online: https://steppa.ca/portfolio-view/malware-threat-intel-datasets/ (accessed on 19 May 2021).
- Chen, K.; Wang, P.; Lee, Y.; Wang, X.; Zhang, N.; Huang, H.; Zou, W.; Liu, P. Finding unknown malice in 10 s: Mass vetting for new threats at the google-play scale. In Proceedings of the 24th USENIX Security Symposium (USENIX Security 15), Redmond, WA, USA, 7–8 May 2015; pp. 659–674.
- Android Malware Dataset. Available online: http://amd.arguslab.org/ (accessed on 19 May 2021).
- APKPure. Available online: https://m.apkpure.com/ (accessed on 19 May 2021).
- Android Permission Dataset. Available online: https://data.mendeley.com/datasets/b4mxg7ydb7/3 (accessed on 19 May 2021).
- Maggi, F.; Valdi, A.; Zanero, S. Andrototal: A flexible, scalable toolbox and service for testing mobile malware detectors. In Proceedings of the Third ACM Workshop on Security and Privacy in Smartphones & Mobile Devices, Berlin, Germany, 8 November 2013; pp. 49–54.
- Wandoujia App Market. Available online: https://www.wandoujia.com/apps (accessed on 19 May 2021).
- Google Playstore Apps in Kaggle. Available online: https://www.kaggle.com/gauthamp10/google-playstore-apps (accessed on 19 May 2021).
- CICMaldroid Dataset. Available online: https://www.unb.ca/cic/datasets/maldroid-2020.html (accessed on 19 May 2021).
- AZ Dataset. Available online: https://www.azsecure-data.org/other-data.html/ (accessed on 19 May 2021).
- Github Malware Dataset. Available online: https://github.com/topics/malware-dataset (accessed on 19 May 2021).
- Alqahtani, E.J.; Zagrouba, R.; Almuhaideb, A. A Survey on Android Malware Detection Techniques Using Machine Learning Algorithms. In Proceedings of the 2019 Sixth International Conference on Software Defined Systems (SDS), Rome, Italy, 10–13 June 2019; pp. 110–117.
- Lopes, J.; Serrão, C.; Nunes, L.; Almeida, A.; Oliveira, J. Overview of machine learning methods for Android malware identification. In Proceedings of the 2019 7th International Symposium on Digital Forensics and Security (ISDFS), Barcelos, Portugal, 10–12 June 2019; pp. 1–6.
- Choudhary, M.; Kishore, B. HAAMD: Hybrid analysis for Android malware detection. In Proceedings of the 2018 International Conference on Computer Communication and Informatics (ICCCI), Coimbatore, India, 4–6 January 2018; pp. 1–4.
- Kouliaridis, V.; Kambourakis, G. A Comprehensive Survey on Machine Learning Techniques for Android Malware Detection. Information 2021, 12, 185.
- Chen, L.; Hou, S.; Ye, Y.; Chen, L. An adversarial machine learning model against android malware evasion attacks. In Asia-Pacific Web (APWeb) and Web-Age Information Management (WAIM) Joint Conference on Web and Big Data; Springer: Cham, Switzerland, 2017; pp. 43–55.
- Lubuva, H.; Huang, Q.; Msonde, G.C. A review of static malware detection for Android apps permission based on deep learning. Int. J. Comput. Netw. Appl. 2019, 6, 80–91.
- Li, J.; Sun, L.; Yan, Q.; Li, Z.; Srisa-An, W.; Ye, H. Significant permission identification for machine-learning-based android malware detection. IEEE Trans. Ind. Inform. 2018, 14, 3216–3225.
- McDonald, J.; Herron, N.; Glisson, W.; Benton, R. Machine Learning-Based Android Malware Detection Using Manifest Permissions. In Proceedings of the 54th Hawaii International Conference on System Sciences, Maui, HI, USA, 5–8 January 2021; p. 6976.
- Şahin, D.Ö.; Kural, O.E.; Akleylek, S.; Kılıç, E. A novel permission-based Android malware detection system using feature selection based on linear regression. Neural Comput. Appl. 2021, 1–16.
- Nawaz, A. Feature Engineering based on Hybrid Features for Malware Detection over Android Framework. Turk. J. Comput. Math. Educ. (TURCOMAT) 2021, 12, 2856–2864.
- Cai, L.; Li, Y.; Xiong, Z. JOWMDroid: Android malware detection based on feature weighting with joint optimization of weight-mapping and classifier parameters. Comput. Secur. 2021, 100, 102086.
- Zhang, P.; Cheng, S.; Lou, S.; Jiang, F. A novel Android malware detection approach using operand sequences. In Proceedings of the 2018 Third International Conference on Security of Smart Cities, Industrial Control System and Communications (SSIC), Shanghai, China, 18–19 October 2018; pp. 1–5.
- Wei, L.; Luo, W.; Weng, J.; Zhong, Y.; Zhang, X.; Yan, Z. Machine learning-based malicious application detection of android. IEEE Access 2017, 5, 25591–25601.
- Onwuzurike, L.; Mariconti, E.; Andriotis, P.; Cristofaro, E.D.; Ross, G.; Stringhini, G. MaMaDroid: Detecting Android malware by building Markov chains of behavioral models (extended version). ACM Trans. Priv. Secur. (TOPS) 2019, 22, 1–34.
- Zhang, H.; Luo, S.; Zhang, Y.; Pan, L. An efficient Android malware detection system based on method-level behavioral semantic analysis. IEEE Access 2019, 7, 69246–69256.
- Meng, G.; Xue, Y.; Xu, Z.; Liu, Y.; Zhang, J.; Narayanan, A. Semantic modelling of android malware for effective malware comprehension, detection, and classification. In Proceedings of the 25th International Symposium on Software Testing and Analysis, Saarbrücken, Germany, 18–20 July 2016; pp. 306–317.
- Wang, Z.; Li, C.; Yuan, Z.; Guan, Y.; Xue, Y. DroidChain: A novel Android malware detection method based on behavior chains. Pervasive Mob. Comput. 2016, 32, 3–14.
- Androguard. Available online: https://pypi.org/project/androguard/ (accessed on 19 May 2021).
- Damodaran, A.; Di Troia, F.; Visaggio, C.A.; Austin, T.H.; Stamp, M. A comparison of static, dynamic, and hybrid analysis for malware detection. J. Comput. Virol. Hacking Tech. 2017, 13, 1–12.
- Sun, Y.; Xie, Y.; Qiu, Z.; Pan, Y.; Weng, J.; Guo, S. Detecting Android malware based on extreme learning machine. In Proceedings of the 2017 IEEE 15th International Conference on Dependable, Autonomic and Secure Computing, 15th International Conference on Pervasive Intelligence and Computing, 3rd International Conference on Big Data Intelligence and Computing and Cyber Science and Technology Congress (DASC/PiCom/DataCom/CyberSciTech), Orlando, FL, USA, 6–10 November 2017; pp. 47–53.
- Tian, K.; Yao, D.; Ryder, B.G.; Tan, G.; Peng, G. Detection of repackaged android malware with code-heterogeneity features. IEEE Trans. Dependable Secur. Comput. 2017, 17, 64–77.
- Kabakus, A.T. What static analysis can utmost offer for Android malware detection. Inf. Technol. Control 2019, 48, 235–249.
- Koli, J. RanDroid: Android malware detection using random machine learning classifiers. In Proceedings of the 2018 Technologies for Smart-City Energy Security and Power (ICSESP), Bhubaneswar, India, 28–30 March 2018; pp. 1–6.
- Lou, S.; Cheng, S.; Huang, J.; Jiang, F. TFDroid: Android malware detection by topics and sensitive data flows using machine learning techniques. In Proceedings of the 2019 IEEE 2nd International Conference on Information and Computer Technologies (ICICT), Kahului, HI, USA, 14–17 March 2019; pp. 30–36.
- Wang, W.; Gao, Z.; Zhao, M.; Li, Y.; Liu, J.; Zhang, X. DroidEnsemble: Detecting Android malicious applications with ensemble of string and structural static features. IEEE Access 2018, 6, 31798–31807.
- Garg, S.; Peddoju, S.K.; Sarje, A.K. Network-based detection of Android malicious apps. Int. J. Inf. Secur. 2017, 16, 385–400.
- Sikder, A.K.; Aksu, H.; Uluagac, A.S. 6thsense: A context-aware sensor-based attack detector for smart devices. In Proceedings of the 26th USENIX Security Symposium (USENIX Security 17), Vancouver, BC, Canada, 16–18 August 2017; pp. 397–414.
- Mahindru, A.; Singh, P. Dynamic permissions based android malware detection using machine learning techniques. In Proceedings of the 10th Innovations in Software Engineering Conference, Jaipur, India, 5–7 February 2017; pp. 202–210.
- Salehi, M.; Amini, M.; Crispo, B. Detecting malicious applications using system services request behavior. In Proceedings of the 16th EAI International Conference on Mobile and Ubiquitous Systems: Computing, Networking and Services, Houston, TX, USA, 12–14 November 2019; pp. 200–209.
- Thangaveloo, R.; Jing, W.W.; Leng, C.K.; Abdullah, J. DATDroid: Dynamic Analysis Technique in Android Malware Detection. Int. J. Adv. Sci. Eng. Inf. Technol. 2020, 10, 536–541.
- Hasan, H.; Ladani, B.T.; Zamani, B. MEGDroid: A model-driven event generation framework for dynamic android malware analysis. Inf. Softw. Technol. 2021, 135, 106569.
- Raphael, R.; Mathiyalagan, P. An Exploration of Changes Addressed in the Android Malware Detection Walkways. In Proceedings of the International Conference on Computational Intelligence, Cyber Security, and Computational Models, Coimbatore, India, 19–21 December 2019; Springer: Singapore, 2019; pp. 61–84.
- Jannat, U.S.; Hasnayeen, S.M.; Shuhan, M.K.B.; Ferdous, M.S. Analysis and detection of malware in Android applications using machine learning. In Proceedings of the 2019 International Conference on Electrical, Computer and Communication Engineering (ECCE), Cox’s Bazar, Bangladesh, 7–9 February 2019; pp. 1–7.
- Kapratwar, A.; Di Troia, F.; Stamp, M. Static and Dynamic Analysis of Android Malware; ICISSP: Porto, Portugal, 2017; pp. 653–662.
- Leeds, M.; Keffeler, M.; Atkison, T. A comparison of features for android malware detection. In Proceedings of the SouthEast Conference, Kennesaw, GA, USA, 13–15 April 2017; pp. 63–68.
- Hadiprakoso, R.B.; Kabetta, H.; Buana, I.K.S. Hybrid-Based Malware Analysis for Effective and Efficiency Android Malware Detection. In Proceedings of the 2020 International Conference on Informatics, Multimedia, Cyber and Information System (ICIMCIS), Jakarta, Indonesia, 19–20 November 2020; pp. 8–12.
- Surendran, R.; Thomas, T.; Emmanuel, S. A TAN based hybrid model for android malware detection. J. Inf. Secur. Appl. 2020, 54, 102483.
- Martín, A.; Menéndez, H.D.; Camacho, D. MOCDroid: Multi-objective evolutionary classifier for Android malware detection. Soft Comput. 2017, 21, 7405–7415.
- Qaisar, Z.H.; Li, R. Multimodal information fusion for android malware detection using lazy learning. Multimed. Tools Appl. 2021, 1–15.
- Mahindru, A.; Sangal, A. MLDroid—Framework for Android malware detection using machine learning techniques. Neural Comput. Appl. 2021, 33, 5183–5240.
- Xu, K.; Li, Y.; Deng, R.H.; Chen, K. Deeprefiner: Multi-layer android malware detection system applying deep neural networks. In Proceedings of the 2018 IEEE European Symposium on Security and Privacy (EuroS&P), London, UK, 24–26 April 2018; pp. 473–487.
- JADX. Available online: https://github.com/skylot/jadx/ (accessed on 19 May 2021).
- McLaughlin, N.; Martinez del Rincon, J.; Kang, B.; Yerima, S.; Miller, P.; Sezer, S.; Safaei, Y.; Trickel, E.; Zhao, Z.; Doupé, A.; et al. Deep android malware detection. In Proceedings of the Seventh ACM on Conference on Data and Application Security and Privacy, Scottsdale, AZ, USA, 22–24 March 2017; pp. 301–308.
- Amin, M.; Tanveer, T.A.; Tehseen, M.; Khan, M.; Khan, F.A.; Anwar, S. Static malware detection and attribution in android byte-code through an end-to-end deep system. Future Gener. Comput. Syst. 2020, 102, 112–126.
- Alzaylaee, M.K.; Yerima, S.Y.; Sezer, S. DL-Droid: Deep learning based android malware detection using real devices. Comput. Secur. 2020, 89, 101663.
- Vu, L.N.; Jung, S. AdMat: A CNN-on-Matrix Approach to Android Malware Detection and Classification. IEEE Access 2021, 9, 39680–39694.
- Millar, S.; McLaughlin, N.; del Rincon, J.M.; Miller, P. Multi-view deep learning for zero-day Android malware detection. J. Inf. Secur. Appl. 2021, 58, 102718.
- Acar, Y.; Stransky, C.; Wermke, D.; Weir, C.; Mazurek, M.L.; Fahl, S. Developers need support, too: A survey of security advice for software developers. In Proceedings of the 2017 IEEE Cybersecurity Development (SecDev), Cambridge, MA, USA, 24–26 September 2017; pp. 22–26.
- Mohammed, N.M.; Niazi, M.; Alshayeb, M.; Mahmood, S. Exploring software security approaches in software development lifecycle: A systematic mapping study. Comput. Stand. Interfaces 2017, 50, 107–115.
- Weir, C.; Becker, I.; Noble, J.; Blair, L.; Sasse, M.A.; Rashid, A. Interventions for long-term software security: Creating a lightweight program of assurance techniques for developers. Softw. Pract. Exp. 2020, 50, 275–298.
- Alenezi, M.; Almomani, I. Empirical analysis of static code metrics for predicting risk scores in android applications. In Proceedings of the 5th International Symposium on Data Mining Applications, Cham, Switzerland, 29 March 2018; Springer: Cham, Switzerland, 2018; pp. 84–94.
- Palomba, F.; Di Nucci, D.; Panichella, A.; Zaidman, A.; De Lucia, A. Lightweight detection of android-specific code smells: The adoctor project. In Proceedings of the 2017 IEEE 24th International Conference on Software Analysis, Evolution and Reengineering (SANER), Klagenfurt, Austria, 20–24 February 2017; pp. 487–491.
- Pustogarov, I.; Wu, Q.; Lie, D. Ex-vivo dynamic analysis framework for Android device drivers. In Proceedings of the 2020 IEEE Symposium on Security and Privacy (SP), San Francisco, CA, USA, 18–21 May 2020; pp. 1088–1105.
- Amin, A.; Eldessouki, A.; Magdy, M.T.; Abdeen, N.; Hindy, H.; Hegazy, I. AndroShield: Automated android applications vulnerability detection, a hybrid static and dynamic analysis approach. Information 2019, 10, 326.
- Tahaei, M.; Vaniea, K.; Beznosov, K.; Wolters, M.K. Security Notifications in Static Analysis Tools: Developers’ Attitudes, Comprehension, and Ability to Act on Them. In Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems, Yokohama, Japan, 8–13 May 2021; pp. 1–17.
- Goaër, O.L. Enforcing green code with Android lint. In Proceedings of the 35th IEEE/ACM International Conference on Automated Software Engineering Workshops, Melbourne, VIC, Australia, 21–25 September 2020; pp. 85–90.
- Habchi, S.; Blanc, X.; Rouvoy, R. On adopting linters to deal with performance concerns in android apps. In Proceedings of the 2018 33rd IEEE/ACM International Conference on Automated Software Engineering (ASE), Montpellier, France, 3–7 September 2018; pp. 6–16.
- Wei, L.; Liu, Y.; Cheung, S.C. OASIS: Prioritizing static analysis warnings for Android apps based on app user reviews. In Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering, Paderborn, Germany, 4–8 September 2017; pp. 672–682.
- Luo, L.; Dolby, J.; Bodden, E. MagpieBridge: A General Approach to Integrating Static Analyses into IDEs and Editors (Tool Insights Paper). In Proceedings of the 33rd European Conference on Object-Oriented Programming (ECOOP 2019). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, Dagstuhl, Germany, 15–19 July 2019.
- Wang, Y.; Xu, G.; Liu, X.; Mao, W.; Si, C.; Pedrycz, W.; Wang, W. Identifying vulnerabilities of SSL/TLS certificate verification in Android apps with static and dynamic analysis. J. Syst. Softw. 2020, 167, 110609.
- Gupta, A.; Suri, B.; Kumar, V.; Jain, P. Extracting rules for vulnerabilities detection with static metrics using machine learning. Int. J. Syst. Assur. Eng. Manag. 2021, 12, 65–76.
- Kim, S.; Yeom, S.; Oh, H.; Shin, D.; Shin, D. Automatic Malicious Code Classification System through Static Analysis Using Machine Learning. Symmetry 2021, 13, 35.
- Bilgin, Z.; Ersoy, M.A.; Soykan, E.U.; Tomur, E.; Çomak, P.; Karaçay, L. Vulnerability Prediction From Source Code Using Machine Learning. IEEE Access 2020, 8, 150672–150684.
- Russell, R.; Kim, L.; Hamilton, L.; Lazovich, T.; Harer, J.; Ozdemir, O.; Ellingwood, P.; McConley, M. Automated vulnerability detection in source code using deep representation learning. In Proceedings of the 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA), Orlando, FL, USA, 17–20 December 2018; pp. 757–762.
- Chernis, B.; Verma, R. Machine learning methods for software vulnerability detection. In Proceedings of the Fourth ACM International Workshop on Security and Privacy Analytics, Tempe, AZ, USA, 21 March 2018; pp. 31–39.
- Wu, F.; Wang, J.; Liu, J.; Wang, W. Vulnerability detection with deep learning. In Proceedings of the 2017 3rd IEEE International Conference on Computer and Communications (ICCC), Chengdu, China, 13–16 December 2017; pp. 1298–1302.
- Pang, Y.; Xue, X.; Wang, H. Predicting vulnerable software components through deep neural network. In Proceedings of the 2017 International Conference on Deep Learning Technologies, Chengdu, China, 2–4 June 2017; pp. 6–10.
- Garg, S.; Baliyan, N. A novel parallel classifier scheme for vulnerability detection in android. Comput. Electr. Eng. 2019, 77, 12–26.
- Ponta, S.E.; Plate, H.; Sabetta, A.; Bezzi, M.; Dangremont, C. A manually-curated dataset of fixes to vulnerabilities of open-source software. In Proceedings of the 2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR), Montreal, QC, Canada, 26–27 May 2019; pp. 383–387.
- Namrud, Z.; Kpodjedo, S.; Talhi, C. AndroVul: A repository for Android security vulnerabilities. In Proceedings of the 29th Annual International Conference on Computer Science and Software Engineering, Toronto, ON, Canada, 4–6 November 2019; pp. 64–71.
- Cui, J.; Wang, L.; Zhao, X.; Zhang, H. Towards predictive analysis of android vulnerability using statistical codes and machine learning for IoT applications. Comput. Commun. 2020, 155, 125–131.
- Zhuo, L.; Zhimin, G.; Cen, C. Research on Android intent security detection based on machine learning. In Proceedings of the 2017 4th International Conference on Information Science and Control Engineering (ICISCE), Changsha, China, 21–23 July 2017; pp. 569–574.








| Year | Study | Detection Approach | Feature Extraction Method | Used Datasets | ML Algorithms/Models | Selected ML Algorithms/Models | Model Accuracy | Strengths | Limitations/Drawbacks | 
|---|---|---|---|---|---|---|---|---|---|
| 2018 | [68] | Developing a 3-level data pruning method and applying ML models with SigPID | Manifest Analysis for Permissions | Google Play | NB, DT, SVM | SVM | 90% | High effectiveness and accuracy | Considered only permission analysis, which may omit other important analysis aspects |
| 2021 | [69] | Analysing permissions and training the model with the identified ML algorithm | Manifest Analysis for Permissions | Google Play, AndroZoo, AppChina | RF, SVM, Gaussian NB, K-Means | RF | 81.5% | The model was trained with comparatively different datasets | Did not consider other static analysis features such as OpCode, API calls, etc. |
| 2021 | [70] | Generating a reduced-dimension feature vector and performing malware detection on it using ML models | Manifest Analysis for Permissions | AMD, APKPure | MLP, NB, Linear Regression, KNN, C4.5, RF, SMO | MLP | 96% | Efficiency, applicability and understandability are ensured | Hyper-parameter selection was not performed |
| 2021 | [71] | Selecting features using dimensionality reduction algorithms and the Info Gain method | Manifest Analysis for Permissions and Intents | Drebin, Google Play | RF, NB, GB, AB | RF, NB, AB | RF-98%, NB-92%, AB-97% | Analysed the features as individual components and not as a whole | Did not consider other features such as API calls, Opcode, etc. |
| 2021 | [72] | Feature weighting with joint optimisation of weight mapping using the proposed JOWMDroid framework | Manifest Analysis for Permissions, Intents, Activities and Services | Drebin, AMD, Google Play, APKPure | RF, SVM, LR, KNN | JOWM-IO method with SVM and LR | 96% | Improved accuracy and efficiency | Correlation between features was not considered |
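
Several of the manifest-based studies above rank permissions before training a classifier; [71], for example, is listed as using the Info Gain method. The following is a minimal sketch of that idea in Python, not the implementation used in any of the reviewed studies. The toy apps, their labels (1 for malware, 0 for benign), and the helper names `information_gain`, `rank_permissions`, and `to_binary_vector` are assumptions made purely for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(apps, labels, permission):
    """Entropy reduction obtained by splitting the apps on one permission."""
    with_perm = [y for perms, y in zip(apps, labels) if permission in perms]
    without_perm = [y for perms, y in zip(apps, labels) if permission not in perms]
    remainder = 0.0
    for subset in (with_perm, without_perm):
        if subset:  # skip empty branches to avoid dividing by zero
            remainder += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - remainder


def rank_permissions(apps, labels):
    """Return (permission, gain) pairs sorted from most to least informative."""
    vocabulary = sorted(set().union(*apps))
    scores = {p: information_gain(apps, labels, p) for p in vocabulary}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def to_binary_vector(perms, selected):
    """Encode one app as a 0/1 vector over the selected permissions."""
    return [1 if p in perms else 0 for p in selected]


# Hypothetical toy data: permission sets requested by four apps.
APPS = [
    {"SEND_SMS", "INTERNET"},
    {"SEND_SMS", "READ_CONTACTS", "INTERNET"},
    {"INTERNET"},
    {"INTERNET", "CAMERA"},
]
LABELS = [1, 1, 0, 0]  # 1 = malware, 0 = benign (illustrative only)

if __name__ == "__main__":
    for permission, gain in rank_permissions(APPS, LABELS):
        print(f"{permission}: {gain:.3f}")
```

In this toy set every app requests `INTERNET`, so its gain is zero, while `SEND_SMS` separates the two classes perfectly. The resulting binary vectors could then be passed to any of the classifiers listed in the table.
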
| Year | Study | Detection Approach | Feature Extraction Method | Used Datasets | ML Algorithms/Models | Selected ML Algorithms/Models | Model Accuracy | Strengths | Limitations/Drawbacks | 
|---|---|---|---|---|---|---|---|---|---|
| 2016 | [78] | Transforming the malware detection problem into a matrix model using the Wxshall algorithm, extracting Smali code, and generating the API call graph using Androguard | Code analysis for API Calls and code instrumentation for network traffic | MalGenome | Custom-built ML-based Wxshall algorithm, Wxshall extended algorithm | Wxshall extended algorithm | 87.75% | Few false alarms | Needs to expand the behaviour model and improve efficiency |
| 2017 | [74] | Using combinations of system functions to describe application behaviours, constructing eigenvectors, and then using Androidetect | Code analysis for API calls and Opcodes | Google Play | NB, J48 DT, Application functions decision algorithm | Application functions decision algorithm | 90% | Can identify instantaneous attacks. Can judge the source of the detected abnormal behaviour. High performance in model execution | Did not consider some important static analysis features such as OpCode, API calls, etc. |
| 2018 | [39] | Using the TinyDroid framework with n-gram methods on the Opcode sequence obtained from .smali files after decompiling .dex | Code Analysis for Opcode | Drebin | NLP, SVM, KNN, NB, RF, AP | RF and AP with TinyDroid | 87.6% | Lightweight static detection system. High performance in classification and detection | Malware samples were taken only from a few research studies and organisations, and lack metamorphic malware samples |
| 2018 | [73] | Analysing package-level information extracted from API calls using decompiled Smali files | Code Analysis for API calls and Information flow | Drebin, Contagio, Google Play | DT, RF, KNN, NB | RF | 86.89% | Model performs well even when the length of the sequence is short | Other information contained in operands was not considered, which affects the overall model |
| 2016 | [77] | Using Deterministic Symbolic Automaton and Semantic Modelling of Android Attack | Code Analysis for Opcode/Byte code | Drebin | AB, C4.5, NB, LinearSVM, RF | RF | 97% | Uses a combined approach of ML and DSA | Unable to detect new malware patterns since it does not perform complete static analysis |
| 2017 | [80] | Training Hidden Markov Models and comparing detection rates for models based on static data, dynamic data, and hybrid approaches | Code analysis for API calls and Opcode in static analysis and System call analysis | Harebot, Security Shield, Smart HDD, Winwebsec, Zbot, ZeroAccess | HMM | HMM | 90.51% | Compares the different approaches available to detect malware | Did not consider other ML algorithms or other important features |
| 2019 | [75] | Modelling the app's call graph as a Markov chain, then obtaining API call sequences and using ML models with MaMaDroid | Code Analysis for API calls | Drebin, oldbenign | RF, KNN, SVM | RF | 94% | The system is trained on older samples and evaluated over newer ones | Requires high memory to perform classification |
| 2019 | [76] | Calculating confidence of association rules between abstracted API calls, which provides the behavioural semantics of the app | Code Analysis for API calls | Drebin, AMD | SVM, KNN, RF | RF | 96% | Efficient feature extraction process. Better stability of the system | Did not address cases such as dynamic loading, native code, encryption, etc. |
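
The row for [75] describes treating an app's API calls as a Markov chain and using the result as input to ML models. As a rough sketch of that general idea (it is not MaMaDroid's actual implementation), the Python below abstracts each call to a coarse state, counts transitions between consecutive states, and normalises each row into probabilities. The abstraction rule (keeping the first two package components), the call sequence, and the function names are assumptions chosen only for illustration.

```python
from collections import defaultdict


def abstract_call(api_call):
    """Abstract a fully qualified API call to a coarse state.

    Keeping the first two dotted components is a hypothetical rule used
    only for this sketch.
    """
    parts = api_call.split(".")
    return ".".join(parts[:2])


def transition_probabilities(call_sequence):
    """Estimate first-order Markov transition probabilities between states."""
    states = [abstract_call(c) for c in call_sequence]
    counts = defaultdict(lambda: defaultdict(int))
    for src, dst in zip(states, states[1:]):
        counts[src][dst] += 1
    probs = {}
    for src, dsts in counts.items():
        total = sum(dsts.values())
        probs[src] = {dst: n / total for dst, n in dsts.items()}
    return probs


def to_feature_vector(probs, states):
    """Flatten the transition matrix row by row over a fixed state order."""
    return [probs.get(s, {}).get(d, 0.0) for s in states for d in states]


# Hypothetical call sequence extracted from one app.
CALLS = [
    "android.telephony.TelephonyManager.getDeviceId",
    "java.net.URL.openConnection",
    "android.telephony.SmsManager.sendTextMessage",
    "java.net.URL.openConnection",
    "java.io.OutputStream.write",
]

if __name__ == "__main__":
    matrix = transition_probabilities(CALLS)
    order = ["android.telephony", "java.io", "java.net"]
    print(to_feature_vector(matrix, order))
```

The flattened vector has one entry per ordered pair of states, so its length grows quadratically with the number of states, which is consistent with the memory limitation noted for [75].
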
| Year | Study | Detection Approach | Feature Extraction Method | Used Datasets | ML Algorithms/Models | Selected ML Algorithms/Models | Model Accuracy | Strengths | Limitations/Drawbacks | 
|---|---|---|---|---|---|---|---|---|---|
| 2017 | [81] | Using a customised method named Waffle Director | Manifest Analysis for Sensitive permissions and API calls | Tencent, YingYongBao, Contagio | DT, Neural Network, SVM, NB, ELM | ELM | 97.06% | Fast learning speed and minimal human intervention | Combination of permissions and API calls is not refined |
| 2017 | [82] | Using a code-heterogeneity-analysis framework to classify Android repackaged malware by Smali code intermediate representation | Manifest Analysis for Intents, Permissions and API calls | Genome, Virus-Share, Benign App | RF, KNN, DT, SVM | RF with custom model proposed | FNR-0.35%, FPR-2.96% | Provides in-depth and fine-grained behavioural analysis and classification of programs | Detection issues can occur when the malware uses coding techniques such as reflection, and the method cannot handle encryption techniques used in DEX |
| 2018 | [84] | Extracting features, transforming them into binary vectors, and training using ML with the RanDroid framework | Manifest Analysis for Permissions; Code Analysis for API calls, opcode and native calls | Drebin | SVM, DT, RF, NB | DT | 97.7% | Highly accurate in analysing permissions, API calls, opcode and native calls for malware detection | Broadcast receivers, filtered intents, Control Flow Graph analysis, and deep native code analysis were not considered |
| 2018 | [86] | Creating the binary vector, applying ML models, and evaluating the performance of the features and their ensemble using DroidEnsemble | Manifest analysis for permissions, code analysis for API calls and system calls analysis | Google Play, AnZhi, LenovoMM, Wandoujia | SVM, KNN, RF | SVM | 98.4% | Characterises the static behaviours of apps with an ensemble of string and structural features. | The mechanism will fail if the malware contains encryption, anti-disassembly, or kernel-level features to evade detection |
| 2019 | [83] | Extracting application features from the manifest while decompiling classes.dex into a jar file and applying ML models | Manifest Analysis for permissions, activities and Code Analysis for Opcode | Drebin, playstore, Genome | KNN, SVM, BayesNet, NB, LR, J48, RT, RF, AB | RF with 1000 decision trees | 98.7% | High efficiency, lightweight analysis and fully automated approach | Did not consider API calls and other important features when analysing the DEX. |
| 2019 | [85] | Using FlowDroid for static analysis and proposing the TFDroid framework to detect malware using sensitive data flow analysis | Manifest Analysis for permission and Code Analysis for information flow | Drebin, Google Play | SVM | SVM | 93.7% | Analysed the functions of applications by their descriptions to check the data flow. | Did not consider improving clustering techniques or the applicability of other ML models |
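
The Model Accuracy column in these tables mostly reports accuracy, but the entry for [82] reports a false negative rate (FNR) and a false positive rate (FPR) instead. All three follow from the same confusion matrix using the standard definitions, as the short Python sketch below shows. The label vectors are made-up values for illustration and do not reproduce any result from the reviewed studies; 1 denotes malware and 0 denotes a benign app.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives, treating 1 as malware."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def detection_metrics(y_true, y_pred):
    """Accuracy, FPR and FNR computed from the confusion matrix."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        # FPR: share of benign apps wrongly flagged as malware.
        "fpr": fp / (fp + tn) if (fp + tn) else 0.0,
        # FNR: share of malware samples that were missed.
        "fnr": fn / (fn + tp) if (fn + tp) else 0.0,
    }


# Hypothetical ground truth and predictions for ten apps.
Y_TRUE = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
Y_PRED = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

if __name__ == "__main__":
    print(detection_metrics(Y_TRUE, Y_PRED))
```

Because accuracy depends on the class balance of the evaluation set, figures reported on different datasets are not directly comparable; FPR and FNR make the two kinds of error visible separately.
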
| Year | Study | Detection Approach | Feature Extraction Method | Used Datasets | ML Algorithms/Models | Selected ML Algorithms/Models | Model Accuracy | Strengths | Limitations/Drawbacks | 
|---|---|---|---|---|---|---|---|---|---|
| 2017 | [87] | Extracting the DNS, HTTP, TCP, and origin-based features of the network used by apps | Network traffic analysis for network protocols | Genome | DT, LR, KNN, Bayes Network, RF | RF | 98.7% | Works with different OS versions; detects unknown malware and infected apps | If the malware apps use encryption, malware cannot be detected properly |
| 2017 | [88] | Using a Markov Chain-based detection technique to compute the state transitions and build the transition matrix with 6thSense | System resources analysis for process reports and sensors | Google Play | Markov Chain, NB, LMT | LMT | 95% | Highly effective and efficient at detecting sensor-based attacks while yielding minimal overhead | Tradeoffs such as frequency accuracy and battery frequency, which can affect the malware detection accuracy, are not discussed |
| 2017 | [89] | Using dynamic permission analysis at run-time, detecting malware using ML, and calculating the accuracy | Code instrumentation analysis for Java classes and dynamic permissions | Pvsingh, Android Botnet, DroidKin | NB, RF, Simple Logistic, DT, K-Star | Simple Logistic | 99.7% | High accuracy | Needs to address the app crashing issue in the selected emulators in dynamic analysis |
| 2019 | [90] | Dynamically tracking execution behaviours of applications using the ServiceMonitor framework | System call analysis | AndroZoo, Drebin and Malware Genome | RF, KNN, SVM | RF | 96.7% | High accuracy and high efficiency | Does not detect differences in some system calls of malware and benign apps since signature-based verification was not applied |
| 2020 | [91] | Extracting the features and permissions from the Android app, performing feature selection, and proceeding to classification with DATDroid | System call analysis, Code instrumentation for network traffic analysis and System resources analysis | APKPure, Genome | RF, SVM | RF | 91.7% | High efficiency | Impact of features such as HTTP, DNS, and TCP/IP patterns is not considered |
| 2021 | [92] | Using decompilation, model discovery, integration and transformation, analysis and transformation, event production | Code instrumentation for java classes, intents | AMD | ML algorithms used in MEGDroid, Monkey, Droidbot | MEGDroid | 91.6% | Considerably increases the number of triggered malicious payloads and execution code coverage | System calls are not monitored | 
| Year | Study | Detection Approach | Feature Extraction Method | Used Datasets | ML Algorithms/Models | Selected ML Algorithms/Models | Model Accuracy | Strengths | Limitations/Drawbacks |
|---|---|---|---|---|---|---|---|---|---|
| 2017 | [96] | Using a set of Python and Bash scripts which automated the analysis of the Android data. | Manifest analysis for permissions and System call analysis for dynamic analysis | Andrototal | NB, DT | DT | 80% | Model execution is efficient | Considers system call appearance rather than frequency; a low number of samples was used for training |
| 2018 | [95] | Binary feature vector and permission vector datasets were created using the analysis techniques and used with the ML algorithms | Manifest analysis for permissions and system call analysis | Drebin | RF, J48, NB, Simple Logistic, BayesNet TAN, BayesNet K2, SMO PolyKernel, IBK, SMO NPolyKernel | RF | Static-96%, Dynamic-88% | Compared with several ML algorithms | Accuracy depends on the 3rd party tool (Monkey runner) used to collect features. |
| 2019 | [94] | Preparing a JSON file after reverse engineering, decompiling, and analysing the APK by running it in a sandbox environment, then extracting the key features and applying ML | Manifest analysis for permissions, code analysis for API calls and System call analysis | MalGenome, Kaggle, Androguard [79] | SVM, LR, KNN, RF | LR for static analysis and RF for dynamic analysis | Static-81.03%, Dynamic-93% | The dynamic analysis performed better than the static analysis approach in terms of detection accuracy | Did not perform a proper hybrid analysis approach to increase the overall accuracy |
| 2017 | [99] | Using import term extraction, clustering and applying a genetic algorithm with MOCODroid | Code analysis for API calls and information flow and system call analysis | VirusTotal, Google Play | Genetic algorithm, Multiobjective evolutionary algorithm | Multiobjective evolutionary classifier | 95.15% | Possible to avoid the effects of concealment strategies | Did not consider other clustering methods. |
| 2020 | [97] | Extracted 261 combined features from the hybrid analysis with the support of datasets and applied the ML/DL models | Manifest analysis for permissions and system call analysis | MalGenome, Drebin, CICMalDroid | SVM, KNN, RF, DT, NB, MLP, GB | GB | 99.36% | Hybrid analysis has higher accuracy compared with static analysis and dynamic analysis individually | Runtime environment and configuration are not considered |
| 2020 | [98] | Using conditional dependencies among relevant static and dynamic features, then training ridge-regularised LR classifiers and modelling their output relationships as a TAN | Manifest analysis for permissions, code analysis for API calls and system call analysis | Drebin, AMD, AZ, Github, GP | TAN | TAN | 97% | Highly accurate | Possibility of some malware remaining undetected |
| 2021 | [100] | Exploiting static, dynamic, and visual features of apps to predict malicious apps using information fusion and applying Case Based Reasoning (CBR) | Manifest analysis for permissions and System call analysis | Drebin | CBR, SVM, DT | CBR | 95% | Requires limited memory and processing capabilities | Needs to present the knowledge representation to address some limitations |
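
Most of the hybrid studies above combine manifest permissions (static) with system calls observed at run-time (dynamic), and the limitation noted for [96] is that it considers whether a system call appears rather than how often. The Python sketch below illustrates both encodings and how the static and dynamic parts might be concatenated into one feature vector. It is an illustrative assumption, not the pipeline of any reviewed study; the vocabularies, the trace, and the function names are hypothetical.

```python
from collections import Counter


def syscall_features(trace, vocabulary, use_frequency=True):
    """Encode a system call trace as counts or as 0/1 presence flags."""
    counts = Counter(trace)
    if use_frequency:
        return [counts[name] for name in vocabulary]
    return [1 if counts[name] else 0 for name in vocabulary]


def hybrid_vector(permissions, perm_vocab, trace, syscall_vocab, use_frequency=True):
    """Concatenate static permission bits with dynamic system call features."""
    static_part = [1 if p in permissions else 0 for p in perm_vocab]
    dynamic_part = syscall_features(trace, syscall_vocab, use_frequency)
    return static_part + dynamic_part


# Hypothetical inputs for a single app.
PERM_VOCAB = ["INTERNET", "SEND_SMS"]
SYSCALL_VOCAB = ["open", "read", "sendto", "write"]
PERMISSIONS = {"SEND_SMS"}
TRACE = ["open", "read", "read", "sendto"]

if __name__ == "__main__":
    print(hybrid_vector(PERMISSIONS, PERM_VOCAB, TRACE, SYSCALL_VOCAB))
    print(hybrid_vector(PERMISSIONS, PERM_VOCAB, TRACE, SYSCALL_VOCAB, use_frequency=False))
```

With `use_frequency=False` the repeated `read` call collapses to a single flag, which is the loss of information that the limitation for [96] points to.
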
| Year | Study | Detection Approach | Feature Extraction Method | Used Datasets | ML/DL Algorithms/Models | Selected DL Algorithms/Models | Model Accuracy | Strengths | Limitations/Drawbacks | 
|---|---|---|---|---|---|---|---|---|---|
| 2017 | [104] | Using n-gram methods on the Opcode sequence obtained from .smali files after disassembling the .apk | Code Analysis for Opcodes | Genome, IntelSecurity, McAfee, Google Play | CNN, NLP | Deep CNN | 87% | Automatically learns the features indicative of malware without hand engineering | Assumes that all APKs in the Google Play dataset are benign and all APKs in the malware dataset are malicious |
| 2021 | [108] | Using a DL-based method with a Convolutional Neural Network to analyse features | Code Analysis for API calls, Opcode and Manifest Analysis for Permission | Drebin, AMD | CNN | CNN | 91% and 81% on two datasets | Reduces overfitting and can be trained to detect new malware just by collecting more sample apps | Did not compare with other ML/DL methods |
| 2018 | [102] | Applying LSTM on semantic structure of bytecode with 2 layers of detection and validating with DeepRefiner | Code Analysis for Opcode/bytecode | Google Play, VirusShare, MassVet | RNN, LSTM | LSTM | 97.4% | High efficiency with average of 0.22 s to the 1st layer and 2.42 s to the 2nd layer detection | Need to train the model regularly to update the training model on new malware | 
| 2020 | [105] | Detecting malware attributes from vectorised opcodes extracted from the bytecode of the APKs with one-hot encoding before applying DL techniques | Code Analysis for Opcode | Drebin, AMD, VirusShare | BiLSTM, RNN, LSTM, Neural Networks, Deep ConvNets, Diabolo Network model | BiLSTMs | 99.9% | Very high accuracy. Able to detect zero-day malware families without the overhead of previous training | Did not analyse complete byte code |
| 2020 | [106] | Using DynaLog to select and extract features from log files and using DL-Droid to perform feature ranking and apply DL | Code instrumentation analysis for Java classes, intents, and system calls | Intel Security | NB, SL, SVM, J48, PART, RF, DL | DL | 99.6% | Experiments were performed on real devices. High accuracy | Could also have implemented the intrusion detection part to make it a more comprehensive malware detection tool |
| 2021 | [101] | Selecting features gained by feature selection approaches. Applying ML/DL models to detect malware | Code instrumentation for java classes, permissions, and API calls at the runtime | Android Permissions Dataset, Computer and security dataset | farthest first clustering, Y-MLP, nonlinear ensemble decision tree forest, DL | DL with methods in MLDroid | 98.8% | High accuracy and easy to retrain the model to identify new malware | Human interaction would be required in some cases. Can contain issues in the datasets | 
| 2021 | [107] | Characterising apps and treating them as images, then constructing the adjacency matrix and applying CNN to identify malware with the AdMat framework | Code Analysis for API calls, Information flow, and Opcode | Drebin, AMD | CNN | CNN | 98.2% | High accuracy and efficiency | Performance depends on the number of features used |
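
The entry for [105] mentions one-hot encoding of opcodes before the sequences are passed to a DL model. A minimal Python sketch of that preprocessing step follows. The opcode sequences are invented, and the fixed length `max_len` with zero-padding and truncation is an assumption of this sketch (a common convention for sequence models), not a detail taken from the study.

```python
def build_vocabulary(sequences):
    """Map every distinct opcode to a column index (sorted for determinism)."""
    opcodes = sorted({op for seq in sequences for op in seq})
    return {op: index for index, op in enumerate(opcodes)}


def one_hot_encode(sequence, vocab, max_len):
    """Encode one opcode sequence as a max_len x len(vocab) 0/1 matrix.

    Sequences longer than max_len are truncated; shorter ones are padded
    with all-zero rows. Opcodes missing from the vocabulary also become
    all-zero rows.
    """
    width = len(vocab)
    rows = []
    for op in sequence[:max_len]:
        row = [0] * width
        index = vocab.get(op)
        if index is not None:
            row[index] = 1
        rows.append(row)
    while len(rows) < max_len:
        rows.append([0] * width)
    return rows


# Hypothetical opcode sequences taken from two disassembled methods.
SEQUENCES = [
    ["const-string", "invoke-virtual", "move-result", "return-void"],
    ["invoke-static", "return-void"],
]

if __name__ == "__main__":
    vocab = build_vocabulary(SEQUENCES)
    for row in one_hot_encode(SEQUENCES[1], vocab, max_len=3):
        print(row)
```

Each app then becomes a fixed-size matrix that a recurrent or convolutional model can consume; real opcode vocabularies are larger, so the matrices are correspondingly wider.
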
| Year | Study | Code Analysis Method | Approach | Used ML/DL Methods/Frameworks | Accuracy of the Model | 
|---|---|---|---|---|---|
| 2017 | [127] | Dynamic Analysis | Collected 9872 sequences of function calls as features. Performed dynamic analysis with DL methods | CNN-LSTM | 83.6% | 
| 2017 | [133] | Hybrid Analysis | Decompiled the apk file. Performed static analysis of the manifest file to obtain the components/permissions. Dynamic analysis and fuzzy testing were conducted and obtained system status. | AB and DT | 77% | 
| 2019 | [115] | Hybrid Analysis | Reverse engineered the APK, Decoded the manifest files & codes and extracted meta data from it. Performed dynamic analysis to identify intent crashing and insecure network connections for API calls. Generated the report. | AndroShield | 84% | 
| 2020 | [124] | Hybrid Analysis | Performed intelligent analysis of the generated AST. Checked whether ML can differentiate vulnerable and non-vulnerable code. | MLP and a customised model | 70.1% |
| 2017 | [113] | Static Analysis | Generated the AST, navigated it, and computed detection rules. Identified smells when training with manually created dataset. | ADOCTOR framework | 98% | 
| 2017 | [128] | Static Analysis | Combined N-gram analysis and statistical feature selection for constructing features. Evaluated the performance of the proposed technique based on a number of Java Android programs. | Deep Neural Network | 92.87% | 
| 2019 | [129] | Hybrid Analysis | Decompiled the APK and selected the features, then executed the APK and generated log files with system calls. Generated the vector space and trained with ML algorithms as parallel classifiers. | MLP, SVM, PART, RIDOR, MaxProb, ProdProb | 98.37% |
| 2020 | [121] | Hybrid Analysis | In static analysis, vulnerabilities of SSL/TLS certification were identified. Results from static analysis about user interfaces were analysed to confirm SSL/TLS misuse in dynamic analysis. | DCDroid | 99.39% | 
| 2021 | [122] | Static Analysis | 32 supervised ML algorithms were considered for 3 common vulnerabilities: LawOfDemeter, BeanMembersShouldSerialize, and LocalVariableCouldBeFinal | J48 | 96% |
| 2021 | [123] | Static Analysis | Classified malicious code using a PE structure | CNN | 98.77% |
| Algorithm | Advantages | Disadvantages |
|---|---|---|
| DT | | |
| NB | | |
| Regression Models | | |
| KNN | | |
| SVM | | |
| K-Means | | |
| RF | | |
| Neural Networks | | |
| LSTM | | |
| CNN | | |
| Ensemble Learning | | |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Senanayake, J.; Kalutarage, H.; Al-Kadri, M.O. Android Mobile Malware Detection Using Machine Learning: A Systematic Review. Electronics 2021, 10, 1606. https://doi.org/10.3390/electronics10131606