Detecting and Analyzing Botnet Nodes via Advanced Graph Representation Learning Tools
Abstract
1. Introduction
2. Case Study: Detection of Bot Nodes Within Unknown Networks in Cloud Environments
- Input Layer: This layer is the entry point of the system. It collects data from both network traffic and node activities in the Cloud. The collected data may include logs, network traffic measurements, and other signals, which help reveal the features and characteristics of the targeted botnet.
- Graph Learning Layer: This layer generates a graph that captures the interactions among nodes. Graph learning techniques model the relationships between all nodes in the graph, with the goal of discovering groups of nodes that exhibit unusual or suspicious activity. Detecting these nodes, in turn, reveals the key operational behaviors of active botnets. The graph learning model is trained using GNNs. The Graph Learning Layer integrates the following components:
  - Original Graph: This component is the primary graph, which provides the data used to train and test the models. Graph nodes are represented by circles and the connections between them by lines. Nodes belong to different labels or classes, indicated by red and blue highlighting.
  - Training set and Test set: The original graph is split into a training set and a test set, used respectively to train the model and to test its output. This separation ensures that the generalization capability of the model is verified on data it has not seen.
  - K-Fold Split: This step uses the K-Fold Cross-Validation technique to divide the training set into K subsets, known as folds. The model is thus trained and validated on several different partitions of the data, which gives a more robust estimate of its accuracy.
  - Base Model: The base model consists of several GNN instances, each trained on a different subset of data obtained from the K-Fold Split. Each GNNi (where i ranges from 1 to S, with S the number of folds) is a version of the model that learns the graph features and the relationships between nodes in a specific data split.
  - Training and Test Result: Each GNN produces test results for its own fold, which yields a series of evaluations of model performance. These results can include metrics such as precision and recall.
  - Average of Results: The results of the different GNNs are averaged (indicated with a summation symbol and a final average) to obtain an overall estimate of model performance. Averaging reduces the variance of the results and makes the performance measurement more stable.
  - Secondary Training Set and Secondary Test Set: The process ends with secondary training and test sets. An iterative training process or a fine-tuning phase is suggested to refine the model on a new subset of data.
- Big Data Analytics Layer: This layer applies advanced big data analytics to handle the large quantities of data coming from the input layer. Here, behavioral triangulation is carried out across multiple nodes, and higher-level analysis is used to look for suspicious patterns. Machine learning (ML) and statistical inference techniques are also employed for traffic pattern identification or fraud detection.
- Result Layer: This is the final layer, where the outputs of the analysis are presented. It reveals which nodes are part of a botnet and which nodes are clean. The findings can inform the security system or trigger automatic countermeasures against the infected nodes so as to contain the intrusion.
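The K-Fold Split, per-fold evaluation, and averaging steps described for the Graph Learning Layer can be illustrated with a minimal sketch. It is not the authors' implementation: `train_threshold` is a hypothetical stand-in for training one GNN instance, the one-dimensional samples are toy data, and all function names are assumptions made for illustration.

```python
from statistics import mean


def k_fold_indices(n_items, k):
    """Split indices 0..n_items-1 into k contiguous folds of near-equal size."""
    base, extra = divmod(n_items, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def precision_recall(y_true, y_pred, positive=1):
    """Precision and recall for the positive (bot) class; 0.0 when undefined."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def cross_validate(samples, labels, k, train_fn):
    """Train one model per fold, score it on the held-out fold, average the scores."""
    scores = []
    for held_out in k_fold_indices(len(samples), k):
        held = set(held_out)
        train_idx = [i for i in range(len(samples)) if i not in held]
        predict = train_fn([samples[i] for i in train_idx],
                           [labels[i] for i in train_idx])
        y_true = [labels[i] for i in held_out]
        y_pred = [predict(samples[i]) for i in held_out]
        scores.append(precision_recall(y_true, y_pred))
    return mean(p for p, _ in scores), mean(r for _, r in scores)


def train_threshold(train_x, train_y):
    """Hypothetical stand-in for training GNN_i: learn a threshold on one feature."""
    pos = [x for x, y in zip(train_x, train_y) if y == 1]
    neg = [x for x, y in zip(train_x, train_y) if y == 0]
    threshold = (mean(pos) + mean(neg)) / 2
    return lambda x: 1 if x >= threshold else 0


# Toy data: one feature per node, label 1 = bot, label 0 = clean.
samples = [1, 2, 9, 10, 1, 9, 2, 10, 1, 9]
labels = [0, 0, 1, 1, 0, 1, 0, 1, 0, 1]
avg_precision, avg_recall = cross_validate(samples, labels, k=2, train_fn=train_threshold)
print(avg_precision, avg_recall)
```

Replacing `train_threshold` with a routine that fits a GNN on the training folds would keep the rest of the loop unchanged, since only the returned `predict` callable is used for scoring.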
3. Related Work
4. Analysis of State-of-the-Art Approaches
4.1. The Detection of Botnets
4.2. Graph Representation Learning
- For each node, perform random walks starting at that node.
- Treat each walk as a sequence of node-id strings.
- Train a Word2Vec model using the Skip-Gram algorithm on the string sequences obtained previously.
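The three steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the walks are uniform (each next hop is chosen uniformly among neighbors), the toy `adjacency` graph and all function names are hypothetical, and the last step is shown only up to the (center, context) pairs that a Skip-Gram model consumes. Training itself would be delegated to a Word2Vec implementation fed with `walks`.

```python
import random


def random_walks(adjacency, walks_per_node, walk_length, seed=0):
    """Run uniform random walks from every node; return walks as node-id strings."""
    rng = random.Random(seed)
    walks = []
    for start in adjacency:
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < walk_length:
                neighbors = adjacency[walk[-1]]
                if not neighbors:  # dead end: stop this walk early
                    break
                walk.append(rng.choice(neighbors))
            # Each walk becomes a "sentence" whose "words" are node ids.
            walks.append([str(node) for node in walk])
    return walks


def skip_gram_pairs(walks, window):
    """Enumerate (center, context) pairs within a sliding window over each walk."""
    pairs = []
    for walk in walks:
        for i, center in enumerate(walk):
            lo, hi = max(0, i - window), min(len(walk), i + window + 1)
            pairs.extend((center, walk[j]) for j in range(lo, hi) if j != i)
    return pairs


# Toy undirected graph given as adjacency lists (hypothetical data).
adjacency = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
walks = random_walks(adjacency, walks_per_node=2, walk_length=5)
pairs = skip_gram_pairs(walks, window=2)
print(len(walks), len(pairs))
```

Nodes that co-occur often within the window end up with similar embeddings, which is why this family of methods captures proximity in the graph rather than structural role.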
4.3. Cybersecurity Standards
5. Description of Datasets
6. The Proposed Methodology: Inferential SIR-GN
Algorithm 1. SIR-GN [45]
Algorithm 2. Inferential SIR-GN [46]
- Providing robustness against evolving botnet structures: the model captures both local and global topological patterns, which enables reliable detection in unknown network topologies.
- Improving generalization capability by allowing the model to perform well on previously unseen datasets with minimal retraining requirements.
- Reducing the risk of overfitting, as the iterative learning process ensures that structural representations remain meaningful across diverse network environments.
- Enhancing classification accuracy: a neural network classifier trained on structurally representative node embeddings outperforms existing GNN-based approaches.
- Structural Representation Learning for Robustness: Unlike conventional machine learning models that rely on static feature sets, SIR-GN captures both local and global structural properties of network nodes. This characteristic mitigates the risk of adversarial perturbations that attempt to mask botnet behavior.
- Generalization across Diverse Topologies: By leveraging iterative representation learning, our model is designed to generalize well across different network topologies. This ability to adapt to unseen network structures ensures that our detection approach is not restricted to predefined attack patterns.
- Adversarial Training and Robust Classification: To further enhance resistance to adversarial attacks, we incorporate adversarial training strategies in our learning process. This involves exposing our model to manipulated botnet samples during training, thereby improving its ability to recognize and neutralize adversarial bot behaviors.
- Detection of Covert Communication Patterns: Many botnets use encrypted communication channels to avoid detection. SIR-GN identifies botnets based on their structural communication patterns rather than relying exclusively on packet content analysis.
- Resilience to Poisoning and Evasion Attacks: Data poisoning attacks, where attackers inject malicious samples to mislead the learning model, pose a significant threat to machine learning-based detection systems. Our approach mitigates this risk by leveraging an iterative representation learning process that aggregates information from multi-hop neighborhoods, thereby reducing the impact of poisoned samples on individual nodes.
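The idea of an iterative structural representation that aggregates multi-hop neighborhood information can be illustrated with a simplified sketch. It is not a reproduction of Algorithm 1 or Algorithm 2: the degree-based initialization, the hard k-means assignments, the tiny `kmeans` helper, and every name below are assumptions chosen to keep the example self-contained. Each iteration clusters the current node descriptions and then describes every node by how many of its neighbors fall in each cluster, so after `depth` iterations a description reflects the node's `depth`-hop neighborhood.

```python
def kmeans(points, n_clusters, iterations=10):
    """Tiny deterministic k-means over tuples; returns one cluster label per point."""
    # Initial centroids: the first distinct points in sorted order, so the
    # result does not depend on the order in which nodes are listed.
    centroids = [list(p) for p in sorted(set(points))[:n_clusters]]

    def assign(point):
        return min(range(len(centroids)),
                   key=lambda c: sum((a - b) ** 2 for a, b in zip(point, centroids[c])))

    for _ in range(iterations):
        labels = [assign(p) for p in points]
        for c in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return [assign(p) for p in points]


def structural_representations(adjacency, n_clusters, depth):
    """Describe each node by the cluster counts of its neighbors, repeated depth times."""
    nodes = list(adjacency)
    # Assumption: start from the node degree as the simplest structural descriptor.
    reps = {v: (float(len(adjacency[v])),) for v in nodes}
    for _ in range(depth):
        labels = dict(zip(nodes, kmeans([reps[v] for v in nodes], n_clusters)))
        new_reps = {}
        for v in nodes:
            counts = [0.0] * n_clusters
            for u in adjacency[v]:
                counts[labels[u]] += 1.0
            new_reps[v] = tuple(counts)
        reps = new_reps
    return reps


# Path graph 0-1-2-3-4: nodes 0 and 4 play the same structural role, as do 1 and 3.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
reps = structural_representations(path, n_clusters=2, depth=2)
print(reps)
```

Because the description depends only on structure, nodes with the same role receive the same vector even when they are far apart in the graph, which is the property a downstream classifier relies on when it is applied to a topology it was not trained on.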
7. Experimental Assessment and Analysis
7.1. Setup
- First, we compare the inferential SIR-GN plus a neural network classifier, trained on 50 graphs from one dataset (botnet topology) and used to classify 96 graphs from that same dataset, against ABD-GN, trained on 80% (768) of the dataset graphs and used to classify a test set of 20% (the same 96 graphs) from the same topology.
- Next, we compare ABD-GN, trained on 768 graphs from one topology and used to classify the test set, against the inferential SIR-GN plus classifier, trained on 50 graphs from a single topology and used to classify the test set of 96 graphs from each of the other topology datasets and from real P2P attack data.
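The second comparison amounts to a cross-topology evaluation: train on one topology, then score on the test set of every topology. The loop can be sketched as follows. This is only an illustration of the protocol, under the assumption that each dataset is a list of labeled items; `train_threshold`, the toy datasets `A` and `B`, and all other names are hypothetical and stand in for the real classifiers and botnet graphs.

```python
def accuracy(y_true, y_pred):
    """Percentage of items whose predicted label matches the true label."""
    return 100.0 * sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cross_topology_matrix(datasets, train_fn, n_train):
    """Train on the first n_train items of each topology, test on every topology."""
    results = {}
    for source, data in datasets.items():
        predict = train_fn(data["train"][:n_train])
        for target, other in datasets.items():
            y_true = [y for _, y in other["test"]]
            y_pred = [predict(x) for x, _ in other["test"]]
            results[(source, target)] = accuracy(y_true, y_pred)
    return results


def train_threshold(pairs):
    """Hypothetical stand-in classifier: threshold halfway between class means."""
    pos = [x for x, y in pairs if y == 1]
    neg = [x for x, y in pairs if y == 0]
    threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: 1 if x >= threshold else 0


# Toy stand-ins for two topologies; each item is (feature, label).
datasets = {
    "A": {"train": [(1, 0), (9, 1)], "test": [(2, 0), (8, 1)]},
    "B": {"train": [(4, 0), (6, 1)], "test": [(3, 0), (7, 1), (5.5, 0), (6, 1)]},
}
matrix = cross_topology_matrix(datasets, train_threshold, n_train=2)
for (source, target), acc in sorted(matrix.items()):
    print(f"trained on {source}, tested on {target}: {acc:.1f}")
```

The off-diagonal entries of the resulting matrix are the ones that measure generalization to an unseen topology, while the diagonal entries correspond to the first, same-topology comparison.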
7.2. Experimental Results
7.3. Discussion and Remarks
- Evolving Botnet Architectures. Botnets are constantly evolving, making it difficult to detect and analyze them. Traditional centralized botnets are being replaced by more resilient Peer-to-Peer (P2P) botnets (e.g., [70]), which are harder to detect and take down.
- Real-Time Detection. Detecting botnet activities in real-time is crucial to prevent damage. However, real-time detection requires efficient algorithms and significant computational resources (e.g., [73]).
- Encrypted Traffic. Many botnets use encryption to hide their communication (e.g., [74]), making it difficult to detect malicious activities. Developing methods to analyze encrypted traffic without compromising privacy is an ongoing challenge.
- Scalability. As the number of devices connected to the internet grows, scalable solutions for botnet detection and analysis are needed (e.g., [75]). This includes handling large volumes of data and detecting botnets in diverse environments.
8. Conclusions and Future Work
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Priyadarshini, I. Anomaly Detection of IoT Cyberattacks in Smart Cities Using Federated Learning and Split Learning. Big Data Cogn. Comput. 2024, 8, 21.
- Hasani, Z.; Krrabaj, S.; Krasniqi, M. Proposed Model for Real-Time Anomaly Detection in Big IoT Sensor Data for Smart City. Int. J. Interact. Mob. Technol. 2024, 18, 32–44.
- Alanhdi, A.; Toka, L. A Survey on Integrating Edge Computing with AI and Blockchain in Maritime Domain, Aerial Systems, IoT, and Industry 4.0. IEEE Access 2024, 12, 28684–28709.
- Saheb, T.; Izadi, L. Paradigm of IoT Big Data Analytics in the Healthcare Industry: A Review of Scientific Literature and Mapping of Research Trends. Telemat. Inform. 2019, 41, 70–85.
- Ould Rabah, M.A.; Drid, H.; Medjadba, Y.; Rahouti, M. Detection and Mitigation of Distributed Denial of Service Attacks Using Ensemble Learning and Honeypots in a Novel SDN-UAV Network Architecture. IEEE Access 2024, 12, 128929–128940.
- Ming, L.; Leau, Y.; Xie, Y. Distributed Denial of Service Attack in HTTP/2: Review on Security Issues and Future Challenges. IEEE Access 2024, 12, 33296–33308.
- Musa, N.S.; Mirza, N.M.; Rafique, S.H.; Abdallah, A.M.; Murugan, T. Machine Learning and Deep Learning Techniques for Distributed Denial of Service Anomaly Detection in Software Defined Networks—Current Research Solutions. IEEE Access 2024, 12, 17982–18011.
- Asad, H.; Adhikari, S.; Gashi, I. A Perspective-Retrospective Analysis of Diversity in Signature-Based Open-Source Network Intrusion Detection Systems. Int. J. Inf. Secur. 2024, 23, 1331–1346.
- Hussain, A.; Marín-Tordera, E.; Masip-Bruin, X.; Leligou, H.C. Rule-Based with Machine Learning IDS for DDoS Attack Detection in Cyber-Physical Production Systems (CPPS). IEEE Access 2024, 12, 114894–114911.
- Wu, T.; Tian, S.; Tang, S. Transmission Scheduling of P2P Real-Time Communication Based on Restless Multi-Armed Bandit. Telecommun. Syst. 2024, 86, 281–293.
- Joshi, H.P.; Dutta, R. GADFly: A Fast and Robust Algorithm to Detect P2P Botnets in Communication Graphs. In Proceedings of the GLOBECOM ’18—2018 IEEE Global Communications Conference, Abu Dhabi, United Arab Emirates, 9–13 December 2018; pp. 1–6.
- Karuppayah, S.; Böck, L.; Grube, T.; Manickam, S.; Mühlhäuser, M.; Fischer, M. SensorBuster: On Identifying Sensor Nodes in P2P Botnets. In ARES ’17, Proceedings of the 12th ACM International Conference on Availability, Reliability and Security, Reggio Calabria, Italy, 29 August–1 September 2017; Association for Computing Machinery: New York, NY, USA, 2017; pp. 1–6.
- Zhen, Z.; Zhao, X.; Zhang, J.; Wang, Y.; Chen, H. DA-GNN: A Smart Contract Vulnerability Detection Method Based on Dual Attention Graph Neural Network. Comput. Netw. 2024, 242, 110238.
- Liu, Y.; Wang, X.; Meng, T.; Ai, W.; Li, K. LG-GNN: Local and Global Information-Aware Graph Neural Network for Default Detection. Comput. Oper. Res. 2024, 169, 106738.
- Esmaeili, B.; Azmoodeh, A.; Dehghantanha, A.; Srivastava, G.; Karimipour, H.; Lin, J.C. A GNN-Based Adversarial Internet of Things Malware Detection Framework for Critical Infrastructure: Studying Gafgyt, Mirai, and Tsunami Campaigns. IEEE Internet Things J. 2024, 11, 26826–26836.
- Ben Yahia, N. Enhancing Social and Collaborative Learning Using a Stacked GNN-Based Community Detection. Soc. Netw. Anal. Min. 2024, 14, 205.
- Carpenter, J.; Layne, J.; Serra, E.; Cuzzocrea, A.; Gallo, C. Structural Node Representation Learning for Detecting Botnet Nodes. In Computational Science and Its Applications—ICCSA 2023, Proceedings of the 23rd International Conference on Computational Science and Its Applications, Athens, Greece, 3–6 July 2023; Springer: Cham, Switzerland, 2023; pp. 731–743.
- Zhou, L.; Pan, S.; Wang, J.; Vasilakos, A.V. Machine Learning on Big Data: Opportunities and Challenges. Neurocomputing 2017, 237, 350–361.
- Nazir, A.; He, J.; Zhu, N.; Wajahat, A.; Ma, X.; Ullah, F.; Qureshi, S.; Pathan, M.S. Advancing IoT Security: A Systematic Review of Machine Learning Approaches for the Detection of IoT Botnets. J. King Saud Univ.—Comput. Inf. Sci. 2023, 35, 101820.
- Lagraa, S.; Husák, M.; Seba, H.; Vuppala, S.; State, R.; Ouedraogo, M. A Review on Graph-Based Approaches for Network Security Monitoring and Botnet Detection. Int. J. Inf. Secur. 2024, 23, 119–140.
- Beigi, E.B.; Jazi, H.H.; Stakhanova, N.; Ghorbani, A.A. Towards Effective Feature Selection in Machine Learning-Based Botnet Detection Approaches. In Proceedings of the 2014 IEEE Conference on Communications and Network Security, San Francisco, CA, USA, 29–31 October 2014; pp. 247–255.
- Salih, Y.T.; Fenjan, A.; Ahmed, S.R.; Ali, H.; Abdulwahab, E.N.; Algruri, S.; Kurdi, N.A.; Al-Sarem, M.; Tawfeq, J.F. Machine Learning Approaches for Botnet Detection in Network Traffic. In AICCONF ’24, Proceedings of the 2024 ACM Cognitive Models and Artificial Intelligence Conference, Istanbul, Turkey, 25–26 May 2024; Association for Computing Machinery: New York, NY, USA, 2024; pp. 310–315.
- McDermott, C.D.; Majdani, F.; Petrovski, A. Botnet Detection in the Internet of Things Using Deep Learning Approaches. In Proceedings of the 2018 IEEE International Joint Conference on Neural Networks, Rio de Janeiro, Brazil, 8–13 July 2018; pp. 1–8.
- Lefoane, M.; Ghafir, I.; Kabir, S.; Awan, I.U.; El Hindi, K.M.; Mahendran, A. Latent Semantic Analysis and Graph Theory for Alert Correlation: A Proposed Approach for IoT Botnet Detection. IEEE Open J. Commun. Soc. 2024, 5, 3904–3919.
- Ceci, M.; Cuzzocrea, A.; Malerba, D. Supporting Roll-Up and Drill-Down Operations over OLAP Data Cubes with Continuous Dimensions via Density-Based Hierarchical Clustering. In Proceedings of the 19th Italian Symposium on Advanced Database Systems, Maratea, Italy, 26–29 June 2011; pp. 57–65.
- Serra, E.; Joaristi, M.; Cuzzocrea, A. Large-Scale Sparse Structural Node Representation. In Proceedings of the 2020 IEEE International Conference on Big Data, Atlanta, GA, USA, 10–13 December 2020; pp. 5247–5253.
- Braun, P.; Cuzzocrea, A.; Keding, T.D.; Leung, C.K.; Padzor, A.G.M.; Sayson, D. Game Data Mining: Clustering and Visualization of Online Game Data in Cyber-Physical Worlds. In Proceedings of the 21st International Conference on Knowledge-Based and Intelligent Information & Engineering Systems, Marseille, France, 6–8 September 2017; pp. 2259–2268.
- Ali, M.; Shahroz, M.; Mushtaq, M.F.; Alfarhood, S.; Safran, M.S.; Ashraf, I. Hybrid Machine Learning Model for Efficient Botnet Attack Detection in IoT Environment. IEEE Access 2024, 12, 40682–40699.
- Morris, K.J.; Egan, S.D.; Linsangan, J.L.; Leung, C.K.; Cuzzocrea, A.; Hoi, C.S.H. Token-Based Adaptive Time-Series Prediction by Ensembling Linear and Non-Linear Estimators: A Machine Learning Approach for Predictive Analytics on Big Stock Data. In Proceedings of the 17th IEEE International Conference on Machine Learning and Applications, Orlando, FL, USA, 17–20 December 2018; pp. 1486–1491.
- Serra, E.; Subrahmanian, V.S. A Survey of Quantitative Models of Terror Group Behavior and an Analysis of Strategic Disclosure of Behavioral Models. IEEE Trans. Comput. Soc. Syst. 2014, 1, 66–88.
- Cuzzocrea, A.; Saccà, D.; Serafino, P. Semantics-Aware Advanced OLAP Visualization of Multidimensional Data Cubes. Int. J. Data Warehous. Min. 2007, 3, 1–30.
- Korzh, O.; Joaristi, M.; Serra, E. Convolutional Neural Network Ensemble Fine-Tuning for Extended Transfer Learning. In Big Data—BigData 2018, Proceedings of the 7th International Congress on Big Data, Seattle, WA, USA, 25–30 June 2018; Springer: Cham, Switzerland, 2018; pp. 110–123.
- Cuzzocrea, A. Improving Range-SUM Query Evaluation on Data Cubes via Polynomial Approximation. Data Knowl. Eng. 2006, 56, 85–121.
- Serra, E.; Sharma, A.; Joaristi, M.; Korzh, O. Unknown Landscape Identification with CNN Transfer Learning. In Proceedings of the 2018 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, Barcelona, Spain, 28–31 August 2018; pp. 813–820.
- Serra, E.; Shrestha, A.; Spezzano, F.; Squicciarini, A.C. DeepTrust: An Automatic Framework to Detect Trustworthy Users in Opinion-Based Systems. In CODASPY ’20, Proceedings of the 10th ACM Conference on Data and Application Security and Privacy, New Orleans, LA, USA, 16–18 March 2020; Association for Computing Machinery: New York, NY, USA, 2020; pp. 29–38.
- Joaristi, M.; Serra, E.; Spezzano, F. Inferring Bad Entities through the Panama Papers Network. In Proceedings of the 2018 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, Barcelona, Spain, 28–31 August 2018; pp. 767–773.
- Joaristi, M.; Serra, E.; Spezzano, F. Detecting Suspicious Entities in Offshore Leaks Networks. Soc. Netw. Anal. Min. 2019, 9, 62.
- Möller, D.P.F. NIST Cybersecurity Framework and MITRE Cybersecurity Criteria. In Guide to Cybersecurity in Digital Transformation: Trends, Methods, Technologies, Applications and Best Practices; Springer: Cham, Switzerland, 2023; Volume 103, pp. 231–271.
- Pleshakova, E.; Osipov, A.; Gataullin, S.; Gataullin, T.; Vasilakos, A. Next Gen Cybersecurity Paradigm Towards Artificial General Intelligence: Russian Market Challenges and Future Global Technological Trends. J. Comput. Virol. Hacking Tech. 2024, 20, 429–440.
- CAIDA. The CAIDA UCSD Anonymized Internet Traces. 2018. Available online: https://www.caida.org/catalog/datasets/passive_dataset/ (accessed on 10 December 2024).
- Kaashoek, M.F.; Karger, D.R. Koorde: A Simple Degree-Optimal Distributed Hash Table. In Peer-to-Peer Systems II, Proceedings of the 2nd International Workshop on Peer-To-Peer Systems, Berkeley, CA, USA, 21–22 February 2003; Springer: Berlin/Heidelberg, Germany, 2003; pp. 98–107.
- Maymounkov, P.; Mazieres, D. Kademlia: A Peer-To-Peer Information System Based on the XOR Metric. In Peer-to-Peer Systems, Proceedings of the 1st International Workshop on Peer-To-Peer Systems, Cambridge, MA, USA, 7–8 March 2002; Springer: Berlin/Heidelberg, Germany, 2002; pp. 53–65.
- Stoica, I.; Morris, R.; Karger, D.R.; Kaashoek, M.F.; Balakrishnan, H. Chord: A Scalable Peer-To-Peer Lookup Service for Internet Applications. In SIGCOMM ’01, Proceedings of the 2001 ACM Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication, San Diego, CA, USA, 27–31 August 2001; Association for Computing Machinery: New York, NY, USA, 2001; pp. 149–160.
- Jelasity, M.; Bilicki, V. Towards Automated Detection of Peer-To-Peer Botnets: On the Limits of Local Approaches. In LEET’09, Proceedings of the 2nd USENIX Workshop on Large-Scale Exploits and Emergent Threats, Boston, MA, USA, 22–24 April 2009; USENIX Association: Berkeley, CA, USA, 2009.
- Layne, J.; Serra, E. INFSIR-GN: Inferential Labeled Node and Graph Representation Learning. arXiv 2021, arXiv:1918.10503.
- Joaristi, M.; Serra, E. SIR-GN: A Fast Structural Iterative Representation Learning Approach for Graph Nodes. ACM Trans. Knowl. Discov. Data 2021, 15, 100.
- Yumlembam, R.; Issac, B.; Jacob, S.M.; Yang, L. Comprehensive Botnet Detection by Mitigating Adversarial Attacks, Navigating the Subtleties of Perturbation Distances and Fortifying Predictions with Conformal Layers. Inf. Fusion 2024, 111, 102529.
- Krishnan, D.; Shrinath, P. Robust IoT Botnet Detection Framework Resilient to Gradient Based Adversarial Attacks. SN Comput. Sci. 2024, 5, 870.
- Zhou, J.; Xu, Z.; Rush, A.M.; Yu, M. Automating Botnet Detection with Graph Neural Networks. arXiv 2020, arXiv:2003.06344.
- Al-Mashhadi, S.; Anbar, M.; Hasbullah, I.H.; Alamiedy, T.A. Hybrid Rule-Based Botnet Detection Approach Using Machine Learning for Analysing DNS Traffic. PeerJ Comput. Sci. 2021, 7, e640.
- Almuqren, L.; Alqahtani, H.; Aljameel, S.S.; Salama, A.S.; Yaseen, I.; Alneil, A.A. Hybrid Metaheuristics With Machine Learning Based Botnet Detection in Cloud Assisted Internet of Things Environment. IEEE Access 2023, 11, 115668–115676.
- Guerra-Manzanares, A.; Bahsi, H.; Nomm, S. Hybrid Feature Selection Models for Machine Learning Based Botnet Detection in IoT Networks. In Proceedings of the 2019 International Conference on Cyberworlds, Kyoto, Japan, 2–4 October 2019; pp. 324–327.
- May Raju, P.; Gupta, G.P. Intrusion Detection Framework using an Improved Deep Reinforcement Learning Technique for IoT Network. In Soft Computing for Security Applications, Proceedings of the 2021 International Conference on Soft Computing for Security Applications, Salem, India, 10–11 June 2021; Springer: Singapore, 2021; pp. 765–779.
- Manimurugan, S. IoT-Fog-Cloud Model for Anomaly Detection using Improved Naïve Bayes and Principal Component Analysis. J. Ambient. Intell. Humaniz. Comput. 2021, 1–10.
- Liu, J.; Liu, S.; Zhang, S. Detection of IoT Botnet Based on Deep Learning. In Proceedings of the 2019 IEEE Chinese Control Conference, Guangzhou, China, 27–30 July 2019; pp. 8381–8385.
- Bahsi, H.; Nomm, S.; La Torre, F.B. Dimensionality Reduction for Machine Learning Based IoT Botnet Detection. In Proceedings of the 15th IEEE International Conference on Control, Automation, Robotics and Vision, Singapore, 18–21 November 2018; pp. 1857–1862.
- Yin, L.; Luo, X.; Zhu, C.; Wang, L.; Xu, Z.; Lu, H. ConnSpoiler: Disrupting C&C Communication of IoT-Based Botnet Through Fast Detection of Anomalous Domain Queries. IEEE Trans. Ind. Inform. 2020, 16, 1373–1384.
- Koroniotis, N.; Moustafa, N.; Sitnikova, E.; Slay, J. Towards Developing Network Forensic Mechanism for Botnet Activities in the IoT Based on Machine Learning Techniques. In Mobile Networks and Management, Proceedings of the 9th International Conference on Mobile Networks and Management, Melbourne, Australia, 13–15 December 2017; Springer: Cham, Switzerland, 2017; pp. 30–44.
- Al Shorman, A.R.; Faris, H.; Aljarah, I. Unsupervised Intelligent System Based on One Class Support Vector Machine and Grey Wolf Optimization for IoT Botnet Detection. J. Ambient. Intell. Humaniz. Comput. 2020, 11, 2809–2825.
- Guerra-Manzanares, A.; Medina-Galindo, J.; Bahsi, H.; Nõmm, S. MedBIoT: Generation of an IoT Botnet Dataset in a Medium-Sized IoT Network. In Proceedings of the 6th International Conference on Information Systems Security and Privacy, Valletta, Malta, 25–27 February 2020; pp. 207–218.
- Gandhi, R.; Li, Y. Comparing Machine Learning and Deep Learning for IoT Botnet Detection. In Proceedings of the 2021 IEEE International Conference on Smart Computing, Irvine, CA, USA, 23–27 August 2021; pp. 234–239.
- Nguyen, H.T.; Ngo, Q.D.; Le, V.H. IoT Botnet Detection Approach Based on PSI Graph and DGCNN Classifier. In Proceedings of the 2018 IEEE International Conference on Information Communication and Signal Processing, Singapore, 28–30 September 2018; pp. 118–122.
- Cunningham, P.; Delany, S.J. K-Nearest Neighbour Classifiers—A Tutorial. ACM Comput. Surv. 2022, 54, 128.
- Patel, H.H.; Prajapati, P. Study and Analysis of Decision Tree Based Classification Algorithms. Int. J. Comput. Sci. Eng. 2018, 6, 74–78.
- Resende, P.A.A.; Drummond, A.C. A Survey of Random Forest Based Methods for Intrusion Detection Systems. ACM Comput. Surv. 2018, 51, 48.
- Wazzan, M.; Alghazzawi, D.M.; Albeshri, A.; Hasan, S.H.; Rabie, O.B.J.; Asghar, M.Z. Cross Deep Learning Method for Effectively Detecting the Propagation of IoT Botnet. Sensors 2022, 22, 3895.
- Yu, B.; Cuzzocrea, A.; Jeong, D.H.; Maydebura, S. On Managing Very Large Sensor-Network Data Using Bigtable. In Proceedings of the 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, Ottawa, ON, Canada, 13–16 May 2012; pp. 918–922.
- Setiawan, Y.; Maulidevi, N.U.; Surendro, K. The Optimization of n-Gram Feature Extraction Based on Term Occurrence for Cyberbullying Classification. Data Sci. J. 2024, 23, 31.
- Singh, N.M.; Sharma, S.K. An Efficient Automated Multi-Modal Cyberbullying Detection using Decision Fusion Classifier on Social Media Platforms. Multimed. Tools Appl. 2024, 83, 20507–20535.
- Dehkordi, M.J.; Sadeghiyan, B. Reconstruction of C&C Channel for P2P Botnet. IET Commun. 2020, 14, 1318–1326.
- Vajrobol, V.; Gupta, B.B.; Gaurav, A.; Chuang, H.M. Adversarial Learning for Mirai Botnet Detection based on Long Short-Term Memory and XGBoost. Int. J. Cogn. Comput. Eng. 2024, 5, 153–160.
- Hoang, X.D.; Vu, X.H. An Improved Model for Detecting DGA Botnets using Random Forest Algorithm. Inf. Secur. J. Glob. Perspect. 2022, 31, 441–450.
- Masoudi-Sobhanzadeh, Y.; Emami-Moghaddam, S. A Real-Time IoT-based Botnet Detection Method using a Novel Two-Step Feature Selection Technique and the Support Vector Machine Classifier. Comput. Netw. 2022, 217, 109365.
- Zhang, H.; Papadopoulos, C.; Massey, D. Detecting Encrypted Botnet Traffic. In Proceedings of the 2013 IEEE INFOCOM, Turin, Italy, 14–19 April 2013; pp. 3453–3458.
- Mousavi, S.H.; Khansari, M.; Rahmani, R. A Fully Scalable Big Data Framework for Botnet Detection based on Network Traffic Analysis. Inf. Sci. 2020, 512, 629–640.
- Lin, Y.D.; Chan, W.H.; Lai, Y.C.; Yu, C.M.; Wu, Y.S.; Lee, W.B. Enhancing CAN Security with ML-based IDS: Strategies and Efficacies Against Adversarial Attacks. Comput. Secur. 2025, 151, 104322.
- Gómez, A.L.P.; Maimó, L.F.; Celdrán, A.H.; Clemente, F.J.G. Detection of Adversarial Attacks Using Deep Learning and Features Extracted from Interpretability Methods in Industrial Scenarios. IEEE Access 2025, 13, 2705–2722.
- Coronato, A.; Cuzzocrea, A. An Innovative Risk Assessment Methodology for Medical Information Systems. IEEE Trans. Knowl. Data Eng. 2020, 34, 3095–3110.
- Leung, C.K.; Cuzzocrea, A.; Mai, J.J.; Deng, D.; Jiang, F. Personalized DeepInf: Enhanced Social Influence Prediction with Deep Learning and Transfer Learning. In Proceedings of the 2019 IEEE International Conference on Big Data, Los Angeles, CA, USA, 9–12 December 2019; pp. 2871–2880.
- Leung, C.K.; Braun, P.; Hoi, C.S.H.; Souza, J.; Cuzzocrea, A. Urban Analytics of Big Transportation Data for Supporting Smart Cities. In Big Data Analytics and Knowledge Discovery, Proceedings of the 21st International Conference on Big Data Analytics and Knowledge Discovery, Linz, Austria, 26–29 August 2019; Springer: Cham, Switzerland, 2019; pp. 24–33.
- Leung, C.K.; Chen, Y.; Hoi, C.S.H.; Shang, S.; Wen, Y.; Cuzzocrea, A. Big Data Visualization and Visual Analytics of COVID-19 Data. In Proceedings of the 24th IEEE International Conference on Information Visualisation, Melbourne, Australia, 7–11 September 2020; pp. 415–420.
- Leung, C.K.; Chen, Y.; Hoi, C.S.H.; Shang, S.; Cuzzocrea, A. Machine Learning and OLAP on Big COVID-19 Data. In Proceedings of the 2020 IEEE International Conference on Big Data, Atlanta, GA, USA, 10–13 December 2020; pp. 5118–5127.
- Barkwell, K.E.; Cuzzocrea, A.; Leung, C.K.; Ocran, A.A.; Sanderson, J.M.; Stewart, J.A.; Wodi, B.H. Big Data Visualisation and Visual Analytics for Music Data Mining. In Proceedings of the 22nd IEEE International Conference Information Visualisation, Fisciano, Italy, 10–13 July 2018; pp. 235–240.
- Camara, R.C.; Cuzzocrea, A.; Grasso, G.M.; Leung, C.K.; Powell, S.B.; Souza, J.; Tang, B. Fuzzy Logic-Based Data Analytics on Predicting the Effect of Hurricanes on the Stock Market. In Proceedings of the 2018 IEEE International Conference on Fuzzy Systems, Rio de Janeiro, Brazil, 8–13 July 2018; pp. 1–8.
- Shi, M.; Tang, Y.; Zhu, X.; Liu, J. Multi-Label Graph Convolutional Network Representation Learning. IEEE Trans. Big Data 2022, 8, 1169–1181.
| Notation | Description |
|---|---|
| | The number of clusters chosen for node representation |
| | The number of clusters chosen for graph representation |
| k | The depth of exploration, equal to a node’s k-hop neighborhood |
| Trained on | Classifier | Test: Chord | Test: Kadem | Test: Debru | Test: Leet | Test: P2P |
|---|---|---|---|---|---|---|
| Chord | ABD-GN | 99.0 | 97.5 | 99.6 | 99.4 | 0.0 |
| Chord | SIR-GN | 99.4 | 93.0 | 100.0 | 99.0 | 97.0 |
| Debru | ABD-GN | 10.0 | 2.5 | 100.0 | 0.0 | 2.5 |
| Debru | SIR-GN | 93.0 | 94.0 | 99.5 | 92.5 | 97.0 |
| Kadem | ABD-GN | 97.0 | 98.0 | 99.5 | 99.0 | 2.5 |
| Kadem | SIR-GN | 99.0 | 99.0 | 99.0 | 99.0 | 97.5 |
| Leet | ABD-GN | 73.0 | 95.0 | 100.0 | 100.0 | 2.0 |
| Leet | SIR-GN | 99.0 | 99.2 | 94.5 | 100.0 | 98.0 |
| P2P | ABD-GN | 15.0 | 22.5 | 16.0 | 17.5 | 99.5 |
| P2P | SIR-GN | 93.0 | 93.0 | 93.0 | 93.0 | 97.5 |
| Test dataset | Classifier | Trained on Chord, 1 Graph | Trained on Chord, 50 Graphs | Trained on Debru, 1 Graph | Trained on Debru, 50 Graphs |
|---|---|---|---|---|---|
| Chord | SIR-GN | 99.5 | 99.5 | 92.5 | 92.5 |
| Chord | ABD-GN | 98.7 | 98.7 | 90.3 | 90.5 |
| Debru | SIR-GN | 99.9 | 100.0 | 99.8 | 99.8 |
| Debru | ABD-GN | 98.0 | 99.0 | 98.9 | 99.1 |
| Kadem | SIR-GN | 94.0 | 96.0 | 94.0 | 93.0 |
| Kadem | ABD-GN | 92.5 | 94.0 | 92.0 | 92.0 |
| Leet | SIR-GN | 92.5 | 99.0 | 92.25 | 92.25 |
| Leet | ABD-GN | 88.0 | 90.0 | 90.5 | 90.5 |
| P2P | SIR-GN | 98.0 | 98.0 | 97.0 | 97.0 |
| P2P | ABD-GN | 95.0 | 95.0 | 93.0 | 90.0 |

| Test dataset | Classifier | Trained on Kadem, 1 Graph | Trained on Kadem, 50 Graphs | Trained on Leet, 1 Graph | Trained on Leet, 50 Graphs |
|---|---|---|---|---|---|
| Chord | SIR-GN | 99.0 | 99.0 | 99.0 | 99.0 |
| Chord | ABD-GN | 98.0 | 98.0 | 98.0 | 98.0 |
| Debru | SIR-GN | 98.5 | 99.0 | 93.0 | 95.0 |
| Debru | ABD-GN | 95.5 | 95.5 | 89.5 | 90.0 |
| Kadem | SIR-GN | 99.25 | 99.0 | 99.0 | 99.25 |
| Kadem | ABD-GN | 98.0 | 98.5 | 98.5 | 98.0 |
| Leet | SIR-GN | 99.25 | 99.0 | 99.5 | 100.0 |
| Leet | ABD-GN | 98.5 | 98.5 | 98.0 | 98.0 |
| P2P | SIR-GN | 98.0 | 97.0 | 98.0 | 98.0 |
| P2P | ABD-GN | 95.0 | 95.5 | 95.0 | 95.5 |

| Test dataset | Classifier | Trained on P2P, 1 Graph | Trained on P2P, 50 Graphs |
|---|---|---|---|
| Chord | SIR-GN | 93.0 | 93.0 |
| Chord | ABD-GN | 89.0 | 90.0 |
| Debru | SIR-GN | 93.0 | 93.0 |
| Debru | ABD-GN | 90.0 | 90.0 |
| Kadem | SIR-GN | 93.0 | 93.0 |
| Kadem | ABD-GN | 89.0 | 90.0 |
| Leet | SIR-GN | 93.0 | 93.0 |
| Leet | ABD-GN | 89.0 | 90.0 |
| P2P | SIR-GN | 98.0 | 98.0 |
| P2P | ABD-GN | 95.0 | 96.0 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Cuzzocrea, A.; Hafsaoui, A.; Gallo, C. Detecting and Analyzing Botnet Nodes via Advanced Graph Representation Learning Tools. Algorithms 2025, 18, 253. https://doi.org/10.3390/a18050253