Detecting Malicious .NET Executables Using Extracted Methods Names
Abstract
1. Introduction
- Develop a framework for detecting malicious .NET executables using extracted method names.
- Create a dataset by collecting malware and benign .NET executables from online sources and extracting the .NET method names from these samples.
- Evaluate how accurately .NET malware can be detected using only .NET method names.
- Compare the performance of different machine learning models and identify the most accurate model for .NET malware detection.
2. Related Work
3. Preliminaries
3.1. Malware Analysis Techniques
- A. Static analysis
- B. Dynamic analysis
3.2. .NET vs. C/C++ Executables
- A. .NET Executables
- B. C/C++ Executables
3.3. Decompiling .NET Executables for Malware Analysis
4. Proposed Framework
4.1. Portable Executable Samples Collection
4.2. Features Extraction: .NET Methods Extraction
Algorithm 1: .NET features extraction
Input: Directory path of .NET executable samples
Output: Features of each sample written in a text file
Begin
  source_dir ← Directory path of .NET executable samples
  result_dir ← Directory path of the samples' features
  for each file in source_dir do
    load the assembly file and iterate through modules, types, and methods
    for each method do
      if method is a .NET standard method then
        preprocess the method name
        add the cleaned method name to a HashSet of .NET method names
      end if
    end for
    if the total number of unique .NET method names is >= 10 then
      create a new text file in result_dir
      write the method names from the HashSet to the text file
    else
      skip to the next assembly
    end if
  end for
End
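The filtering and clean-up logic of Algorithm 1 can be illustrated with a minimal Python sketch. It does not load assemblies; it assumes the method references have already been read from an assembly as strings of the form `Namespace.Type::Method(args)`. The function names, the `STANDARD_PREFIXES` rule for deciding what counts as a .NET standard method, and the exact clean-up (lowercasing and dropping the argument list, chosen to match the style of the method names in the frequency tables) are assumptions made for illustration, not the authors' implementation.

```python
import re
from pathlib import Path

MIN_UNIQUE_METHODS = 10  # threshold stated in Algorithm 1
# Assumption: a method counts as a ".NET standard method" if its namespace
# starts with one of these prefixes. The source does not define the rule.
STANDARD_PREFIXES = ("system.", "microsoft.")


def clean_method_name(raw: str) -> str:
    """Lowercase a 'Namespace.Type::Method(args)' reference and drop its arguments."""
    name = raw.strip().lower()
    name = re.sub(r"\(.*\)$", "()", name)  # collapse the parameter list to ()
    if not name.endswith("()"):
        name += "()"
    return name


def extract_features(method_refs):
    """Return sorted unique standard method names, or None if fewer than 10 exist."""
    unique = set()  # plays the role of the HashSet in Algorithm 1
    for raw in method_refs:
        name = clean_method_name(raw)
        if name.startswith(STANDARD_PREFIXES):
            unique.add(name)
    if len(unique) < MIN_UNIQUE_METHODS:
        return None  # the sample is skipped
    return sorted(unique)


def write_features(sample_name, method_refs, result_dir):
    """Write one method name per line to result_dir/<sample_name>.txt."""
    features = extract_features(method_refs)
    if features is None:
        return None
    out_path = Path(result_dir) / f"{sample_name}.txt"
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text("\n".join(features) + "\n", encoding="utf-8")
    return out_path
```

Using a set mirrors the HashSet in the algorithm: each method name is recorded once per sample, regardless of how many times it is called.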
4.3. Dataset Creation
- Count the frequency of method names within both the malware class and the benign class.
- Filter out method names with a frequency below a predefined threshold (50 in our case) from both classes.
- Identify the top 30 most frequent method names in each class separately.
- Determine the set of common method names between malware and benign classes by intersecting the two sets.
- Merge the feature sets of both classes by taking their union (the OR operation between the sets).
- Remove any common method names from the merged feature set.
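The steps above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the function name `select_features` is hypothetical, each sample is assumed to be a set of cleaned method names, and "frequency" is assumed to mean the number of samples in a class that contain the name. Ties in frequency are broken alphabetically, which is also an assumption.

```python
from collections import Counter


def select_features(malware_samples, benign_samples, min_freq=50, top_k=30):
    """Return the method names that are frequent in exactly one class."""

    def top_names(samples):
        # Count in how many samples of the class each method name appears.
        counts = Counter(name for sample in samples for name in set(sample))
        # Drop names below the frequency threshold.
        kept = {name: c for name, c in counts.items() if c >= min_freq}
        # Rank by descending frequency, then alphabetically for stable ties.
        ranked = sorted(kept, key=lambda name: (-kept[name], name))
        return set(ranked[:top_k])

    top_malware = top_names(malware_samples)
    top_benign = top_names(benign_samples)

    common = top_malware & top_benign   # names shared by both classes
    merged = top_malware | top_benign   # union (OR) of the two sets
    return sorted(merged - common)      # remove the shared names


# Toy example with a lowered threshold and top_k:
malware = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}]
benign = [{"a", "d"}, {"a", "d"}, {"e"}]
features = select_features(malware, benign, min_freq=2, top_k=2)
```

Removing the shared names leaves only features that are among the most frequent in one class but not in the other, which is what makes them discriminative.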
- A. Machine Learning Training
5. Evaluation and Results
5.1. Experimental Results
5.2. Feature Importance
5.3. Impact of Feature Length on Classification Performance
6. Limitations and Future Work
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Method Name | Frequency in Malware Class | Frequency in Benign Class |
---|---|---|
system.string::concat() | 1377 | 2864 |
system.string::get_length() | 1091 | 1984 |
system.string::replace() | 923 | 1325 |
system.diagnostics.process::start() | 757 | 1368 |
system.string::op_equality() | 810 | 2196 |
system.threading.thread::sleep() | 795 | 801 |
system.string::get_chars() | 841 | 920 |
system.io.stream::close() | 705 | 944 |
system.io.file::exists() | 727 | 1571 |
system.string::format() | 735 | 1524 |
Rank | Malware Class—Method Name | Count | Benign Class—Method Name | Count |
---|---|---|---|---|
1 | system.text.encoding::getstring() | 860 | system.string::op_inequality() | 1705 |
2 | system.text.encoding::get_utf8() | 855 | system.string::trim() | 1135 |
3 | system.text.encoding::getbytes() | 850 | system.string::indexof() | 1077 |
4 | system.convert::frombase64string() | 807 | system.io.textwriter::writeline() | 981 |
5 | system.io.stream::read() | 724 | system.io.directory::exists() | 927 |
6 | system.io.stream::write() | 720 | system.string::startswith() | 923 |
7 | system.runtime.compilerservices.runtimehelpers::initializearray() | 704 | system.reflection.assembly::getname() | 887 |
8 | system.io.memorystream::toarray() | 674 | system.text.stringbuilder::append() | 872 |
9 | system.reflection.assembly::load() | 617 | system.io.textwriter::close() | 836 |
10 | system.io.stream::get_length() | 616 | system.string::equals() | 796 |
Class | Before Dataset Creation Process | After Dataset Creation Process | Reduction Percentage |
---|---|---|---|
Malware | 8759 | 1598 | 81% |
Benign | 5248 | 2435 | 37% |
Class | Before Dataset Balancing | After Dataset Balancing | Reduction Percentage |
---|---|---|---|
Malware | 1598 | 1500 | 6% |
Benign | 2435 | 1500 | 38% |
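The balancing table shows both classes reduced to 1500 samples but does not say how samples were removed. The sketch below assumes random undersampling, which is one common way to do this; the function names and the fixed seed are hypothetical. The helper `reduction_percentage` reproduces the rounded percentages in the balancing table.

```python
import random


def undersample(samples, target_size, seed=0):
    """Randomly keep target_size samples (assumed balancing strategy)."""
    if target_size > len(samples):
        raise ValueError("target_size exceeds the number of available samples")
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    return rng.sample(samples, target_size)


def reduction_percentage(before, after):
    """Percentage of samples removed when going from `before` to `after`."""
    return 100.0 * (before - after) / before


# Placeholder sample identifiers standing in for the real feature files.
malware_ids = list(range(1598))
benign_ids = list(range(2435))

balanced_malware = undersample(malware_ids, 1500)
balanced_benign = undersample(benign_ids, 1500)

malware_reduction = round(reduction_percentage(1598, 1500))
benign_reduction = round(reduction_percentage(2435, 1500))
```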
Model | Accuracy | Precision | Recall | F1-Score |
---|---|---|---|---|
XGBoost | 96.16% | 96.17% | 96.14% | 96.15% |
Random forest | 95.36% | 96.94% | 93.96% | 95.28% |
KNN | 90.73% | 90.25% | 90.51% | 90.71% |
SVM | 95.16% | 96.41% | 93.79% | 95.08% |
Logistic regression | 95.30% | 95.60% | 94.92% | 95.27% |
Naïve Bayes | 88.66% | 91.79% | 84.96% | 88.24% |
Model | Feature Length | Accuracy | Precision | Recall | F1-Score |
---|---|---|---|---|---|
XGBoost | 20 | 88.00% | 93.85% | 78.67% | 85.60% |
XGBoost | 40 | 94.33% | 95.07% | 92.27% | 93.65% |
XGBoost | 60 | 94.83% | 94.79% | 93.75% | 94.26% |
XGBoost | 80 | 95.50% | 95.53% | 94.48% | 95.00% |
XGBoost | 100 | 96.00% | 95.25% | 95.95% | 95.60% |
Random forest | 20 | 91.00% | 96.58% | 83.08% | 89.32% |
Random forest | 40 | 94.00% | 97.20% | 89.33% | 93.10% |
Random forest | 60 | 94.33% | 96.12% | 91.17% | 93.58% |
Random forest | 80 | 95.00% | 96.53% | 92.27% | 94.36% |
Random forest | 100 | 95.50% | 96.57% | 93.38% | 94.95% |
KNN | 20 | 83.16% | 82.75% | 79.41% | 81.05% |
KNN | 40 | 87.50% | 87.45% | 84.55% | 85.98% |
KNN | 60 | 89.50% | 89.13% | 87.50% | 88.31% |
KNN | 80 | 89.83% | 89.51% | 87.86% | 88.68% |
KNN | 100 | 90.50% | 89.96% | 88.97% | 89.46% |
SVM | 20 | 88.33% | 95.08% | 78.30% | 85.88% |
SVM | 40 | 92.50% | 95.21% | 87.86% | 91.39% |
SVM | 60 | 93.33% | 94.61% | 90.44% | 92.48% |
SVM | 80 | 94.33% | 95.07% | 92.27% | 93.65% |
SVM | 100 | 94.66% | 94.77% | 93.38% | 94.07% |
Logistic regression | 20 | 87.04% | 93.04% | 78.67% | 85.25% |
Logistic regression | 40 | 93.00% | 94.57% | 89.70% | 92.07% |
Logistic regression | 60 | 92.66% | 93.84% | 89.70% | 91.72% |
Logistic regression | 80 | 93.50% | 94.63% | 90.80% | 92.68% |
Logistic regression | 100 | 94.50% | 94.75% | 93.01% | 93.87% |
Naïve Bayes | 20 | 88.83% | 90.51% | 84.19% | 87.23% |
Naïve Bayes | 40 | 89.33% | 91.60% | 84.19% | 87.73% |
Naïve Bayes | 60 | 89.50% | 91.63% | 84.55% | 87.95% |
Naïve Bayes | 80 | 89.66% | 92.00% | 84.55% | 88.12% |
Naïve Bayes | 100 | 89.66% | 92.00% | 84.55% | 88.12% |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Thabit, H.; Ahmad, R.; Abdullah, A.; Abualkishik, A.Z.; Alwan, A.A. Detecting Malicious .NET Executables Using Extracted Methods Names. AI 2025, 6, 20. https://doi.org/10.3390/ai6020020