Towards Developing a Robust Intrusion Detection Model Using Hadoop–Spark and Data Augmentation for IoT Networks
Abstract
1. Introduction
1. Of the PySpark MLlib classifiers considered in this work, only Random Forest, Decision Trees, Naive Bayes, and Logistic Regression natively support multi-class classification. For this reason, we propose using the One vs. Rest (OVR) strategy to evaluate the accuracy and performance of other PySpark algorithms, such as Gradient Boosted Tree and SVM Linear. We evaluate all the algorithms on the entire BoT-IoT dataset and identify the best algorithm in terms of accuracy and performance.
2. Because the BoT-IoT dataset is extremely imbalanced, we propose using CTGAN, a recent tabular data generator, to increase the number of data samples in the minority classes, and we obtain high F1-scores.
3. We compare the CTGAN oversampling method with traditional methods such as the Synthetic Minority Over-sampling Technique (SMOTE) and Adaptive Synthetic (ADASYN) sampling, demonstrating its accuracy in generating data samples.
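The second contribution reports results in terms of F1-score rather than plain accuracy. As a minimal illustration of why, the sketch below scores a hypothetical classifier that always predicts the majority class on an imbalanced label set. The labels and class counts are invented for the example and are not BoT-IoT statistics, and macro F1 is taken here as the unweighted mean of the per-class F1-scores, which is one common convention.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_f1(y_true, y_pred):
    """F1-score of every label, each computed as that label vs. all others."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = {}
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores[label] = 2 * precision * recall / denom if denom else 0.0
    return scores


# Hypothetical, heavily imbalanced labels (not real BoT-IoT counts).
y_true = ["DDoS"] * 950 + ["Normal"] * 40 + ["Theft"] * 10
# A degenerate model that always predicts the majority class.
y_pred = ["DDoS"] * len(y_true)

f1 = per_class_f1(y_true, y_pred)
macro_f1 = sum(f1.values()) / len(f1)
print(f"accuracy = {accuracy(y_true, y_pred):.3f}")  # accuracy = 0.950
print(f"macro F1 = {macro_f1:.3f}")  # macro F1 = 0.325
```

Accuracy stays at 95% while both minority classes get an F1 of zero, which is the failure mode that oversampling the minority classes is meant to address.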
2. Related Work
3. Methodology
3.1. Methodologies to Train and Test Multi-Class Classification Using Multi-Class and Binary Algorithms
- Multi-class classification algorithms: Logistic Regression, Naive Bayes, Decision Trees, and Random Forest.
- Binary classification algorithms: Decision Trees, Logistic Regression, Gradient Boosted Tree, SVM Linear, Naive Bayes, and Random Forest.
3.1.1. Methodology 1: Multi-Class Classification Using Spark Multi-Class Classifiers
3.1.2. Methodology 2: Multi-Class Classification Using Binary Classification Algorithms Available in Spark
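The One vs. Rest strategy behind this methodology can be sketched in a few lines: train one binary classifier per class (that class versus all the others) and predict the class whose classifier returns the highest score. The plain-Python version below is only a sketch of the mechanics, not Spark's implementation; `SimpleOneVsRest` and the nearest-centroid-style binary model `CentroidScorer` are hypothetical names, and the binary model is a stand-in chosen to keep the example dependency-free.

```python
import math


class CentroidScorer:
    """Hypothetical toy binary model: a positive score means the point is closer
    to the positive-class centroid than to the negative-class centroid."""

    def fit(self, X, y):
        pos = [x for x, label in zip(X, y) if label == 1]
        neg = [x for x, label in zip(X, y) if label == 0]
        self.pos_centroid = tuple(sum(col) / len(pos) for col in zip(*pos))
        self.neg_centroid = tuple(sum(col) / len(neg) for col in zip(*neg))
        return self

    def score(self, x):
        return math.dist(x, self.neg_centroid) - math.dist(x, self.pos_centroid)


class SimpleOneVsRest:
    """Fit one binary model per class (class vs. all others) and predict the
    class whose model returns the highest score."""

    def __init__(self, make_binary_model):
        self.make_binary_model = make_binary_model

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.models = {}
        for c in self.classes:
            # Relabel the data: 1 for the current class, 0 for every other class.
            y_binary = [1 if label == c else 0 for label in y]
            self.models[c] = self.make_binary_model().fit(X, y_binary)
        return self

    def predict(self, x):
        return max(self.classes, key=lambda c: self.models[c].score(x))


# Hypothetical 2-D feature vectors for three traffic classes.
X = [(0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (5.2, 4.9), (0.1, 5.0), (0.0, 5.2)]
y = ["Normal", "Normal", "DDoS", "DDoS", "Theft", "Theft"]

ovr = SimpleOneVsRest(CentroidScorer).fit(X, y)
print(ovr.predict((5.1, 5.0)))  # DDoS
print(ovr.predict((0.1, 4.9)))  # Theft
```

In PySpark the same strategy is available as `pyspark.ml.classification.OneVsRest`, for example `OneVsRest(classifier=LinearSVC(regParam=0.1))`, which fits one copy of the binary estimator per label and predicts the label with the largest confidence.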
3.1.3. Pipeline 1
3.1.4. Pipeline 2
3.2. Oversampling with CTGAN
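CTGAN, as described by Xu et al. [14], copes with imbalanced discrete columns through a conditional vector and a "training-by-sampling" scheme: during training it picks a discrete column uniformly at random, samples one of that column's categories with probability proportional to the logarithm of its frequency, and encodes the choice as a one-hot conditional vector for the generator. The plain-Python sketch below reproduces only that sampling step; it is not the API of any CTGAN library, the helper names are hypothetical, the `+1` inside the logarithm is an assumption that keeps single-occurrence categories sampleable, and the category counts are invented.

```python
import math
import random


def log_frequency_pmf(counts):
    """Probability of each category proportional to log(count + 1).
    The +1 is an assumption so that single-occurrence categories stay sampleable."""
    weights = {cat: math.log(n + 1) for cat, n in counts.items()}
    total = sum(weights.values())
    return {cat: w / total for cat, w in weights.items()}


def sample_condition(discrete_columns, rng):
    """Training-by-sampling step: choose a discrete column uniformly, then one of
    its categories by log-frequency, and build the one-hot conditional vector."""
    names = sorted(discrete_columns)
    column = rng.choice(names)
    pmf = log_frequency_pmf(discrete_columns[column])
    categories = list(pmf)
    category = rng.choices(categories, weights=[pmf[c] for c in categories])[0]
    # One slot per category of every discrete column; only the chosen one is 1.
    cond = [
        1 if (name == column and cat == category) else 0
        for name in names
        for cat in discrete_columns[name]
    ]
    return column, category, cond


# Hypothetical category counts for an imbalanced label column.
columns = {"category": {"DDoS": 10_000, "DoS": 8_000, "Normal": 50, "Theft": 5}}

pmf = log_frequency_pmf(columns["category"])
print({c: round(p, 3) for c, p in pmf.items()})
# {'DDoS': 0.385, 'DoS': 0.376, 'Normal': 0.164, 'Theft': 0.075}
column, category, cond = sample_condition(columns, random.Random(0))
```

Theft makes up about 0.03% of these hypothetical rows but is chosen as the condition about 7.5% of the time, so the generator is regularly conditioned on the minority class during training.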
3.2.1. Synthetic Minority Over-Sampling Technique (SMOTE)
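SMOTE creates a synthetic minority sample by taking a minority-class point, choosing one of its k nearest minority-class neighbors, and placing a new point at a random position on the line segment between the two. In practice this is usually done with a library such as imbalanced-learn (`imblearn.over_sampling.SMOTE`); the plain-Python sketch below, with a hypothetical helper `smote` and made-up feature vectors, only shows the mechanics, and it picks base points at random rather than cycling through every minority sample as the original formulation does.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Hypothetical helper: create n_synthetic points by interpolating between a
    randomly chosen minority sample and one of its k nearest minority neighbors."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest minority-class neighbors of x (Euclidean), excluding x itself.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbors = sorted(others, key=lambda p: math.dist(x, p))[:k]
        nn = rng.choice(neighbors)
        gap = rng.random()  # uniform in [0, 1): position along the segment x -> nn
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nn)))
    return synthetic


# Hypothetical minority-class feature vectors (e.g. a rare attack category).
theft = [(1.0, 2.0), (1.2, 1.9), (0.9, 2.2), (1.1, 2.1), (1.3, 2.0)]
new_points = smote(theft, n_synthetic=10, k=3)
print(len(new_points))  # 10
```

Because every synthetic point lies between two existing minority points, SMOTE densifies the minority region but cannot produce values outside it.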
3.2.2. Adaptive Synthetic Sampling (ADASYN)
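ADASYN reuses SMOTE-style interpolation but decides adaptively how many synthetic samples each minority point receives. In its usual formulation, the total number to generate is G = (m_maj - m_min) * beta, and each minority point x_i receives a share proportional to r_i = Delta_i / K, where Delta_i is the number of majority-class samples among its K nearest neighbors in the whole training set; points surrounded by other classes are harder to learn and therefore get more. The sketch below computes only this allocation. The helper `adasyn_allocation`, the toy data, and the choice to treat every non-minority sample as "majority" in the multi-class case are assumptions for illustration; libraries such as imbalanced-learn (`imblearn.over_sampling.ADASYN`) implement the full method.

```python
import math


def adasyn_allocation(X, y, minority_label, k=3, beta=1.0):
    """Hypothetical helper: {index of minority sample: synthetic samples to create}."""
    minority_idx = [i for i, label in enumerate(y) if label == minority_label]
    m_min = len(minority_idx)
    m_maj = len(y) - m_min  # assumption: all non-minority samples count as majority
    total = round((m_maj - m_min) * beta)  # G in the ADASYN formulation

    ratios = {}
    for i in minority_idx:
        # K nearest neighbors of X[i] over the whole dataset, excluding itself.
        neighbors = sorted((j for j in range(len(X)) if j != i),
                           key=lambda j: math.dist(X[i], X[j]))[:k]
        # r_i: fraction of those neighbors that are not minority samples.
        ratios[i] = sum(y[j] != minority_label for j in neighbors) / k

    norm = sum(ratios.values())
    if norm == 0:  # this sketch returns zeros; ADASYN itself is undefined here
        return {i: 0 for i in minority_idx}
    # Rounding means the shares may not sum exactly to G in general.
    return {i: round(r / norm * total) for i, r in ratios.items()}


# Hypothetical data: 8 majority points near the origin, one minority point
# embedded among them (index 8) and three minority points in their own cluster.
X = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 0), (0, 2), (2, 2), (2, 1),
     (0.5, 0.5), (5, 5), (5, 6), (6, 5)]
y = ["Normal"] * 8 + ["Theft"] * 4

print(adasyn_allocation(X, y, "Theft", k=2))  # {8: 4, 9: 0, 10: 0, 11: 0}
```

All G = 4 synthetic samples go to the minority point surrounded by majority samples, while the well-separated minority cluster receives none; SMOTE, by contrast, would spread new samples evenly across all minority points.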
4. Experiments and Results
4.1. BoT-IoT Dataset and Previous Contributions
4.2. Train and Test Multi-Class Classification Using Multi-Class and Binary Algorithms Available in Spark
4.3. Oversampling with CTGAN
5. Conclusions and Future Work
Author Contributions
Funding
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
Abbreviation | Definition |
---|---|
IoT | Internet of Things |
CTGAN | Conditional Tabular Generative Adversarial Network |
DDoS | Distributed Denial of Service |
DoS | Denial of Service |
SMOTE | Synthetic Minority Over-sampling Technique |
ADASYN | Adaptive Synthetic Sampling |
GAN | Generative Adversarial Network |
SVM | Support Vector Machine |
OVR | One vs. Rest |
CNN | Convolutional Neural Network |
KNN | K-Nearest Neighbors |
MLP | Multi-layer Perceptron |
DT | Decision Trees |
ANN | Artificial Neural Network |
TP | True Positives |
GPU | Graphics Processing Unit |
References
1. Cisco. Cisco Annual Internet Report (2018–2023); White Paper; Cisco: San Francisco, CA, USA, 2020.
2. Hung, M. Leading the IoT. Technical Report; Gartner Research. 2017. Available online: https://www.gartner.com/imagesrv/books/iot/iotEbook_digital.pdf (accessed on 14 September 2022).
3. Soe, Y.; Feng, Y.; Santosa, P.; Hartanto, R.; Sakurai, K. Rule Generation for Signature Based Detection Systems of Cyber Attacks in IoT Environments. Bull. Netw. Comput. Syst. Softw. 2019, 8, 93–97.
4. Filus, K.; Domańska, J.; Gelenbe, E. Random neural network for lightweight attack detection in the IoT. In Proceedings of the Symposium on Modelling, Analysis, and Simulation of Computer and Telecommunication Systems, Nice, France, 17–19 November 2020; Springer: Berlin, Germany, 2020; pp. 79–91.
5. Kumar, P.; Gupta, G.P.; Tripathi, R. Toward design of an intelligent cyber attack detection system using hybrid feature reduced approach for IoT networks. Arab. J. Sci. Eng. 2021, 46, 3749–3778.
6. Shafiq, M.; Tian, Z.; Sun, Y.; Du, X.; Guizani, M. Selection of effective machine learning algorithm and Bot-IoT attacks traffic identification for internet of things in smart city. Future Gener. Comput. Syst. 2020, 107, 433–442.
7. Khraisat, A.; Gondal, I.; Vamplew, P.; Kamruzzaman, J.; Alazab, A. A Novel Ensemble of Hybrid Intrusion Detection System for Detecting Internet of Things Attacks. Electronics 2019, 8, 1210.
8. Shyam, R.; HB, B.G.; Kumar, S.; Poornachandran, P.; Soman, K. Apache Spark a big data analytics platform for smart grid. Procedia Technol. 2015, 21, 171–178.
9. Koroniotis, N.; Moustafa, N.; Sitnikova, E.; Turnbull, B. Towards the development of realistic botnet dataset in the Internet of Things for network forensic analytics: Bot-IoT dataset. Future Gener. Comput. Syst. 2019, 100, 779–796.
10. Ibitoye, O.; Shafiq, O.; Matrawy, A. Analyzing Adversarial Attacks against Deep Learning for Intrusion Detection in IoT Networks. In Proceedings of the 2019 IEEE Global Communications Conference (GLOBECOM), Waikoloa, HI, USA, 9–13 December 2019; pp. 1–6.
11. Alsamiri, J.; Alsubhi, K. Internet of things cyber attacks detection using machine learning. Int. J. Adv. Comput. Sci. Appl. 2019, 10.
12. Ferrag, M.A.; Maglaras, L. DeepCoin: A Novel Deep Learning and Blockchain-Based Energy Exchange Framework for Smart Grids. IEEE Trans. Eng. Manag. 2020, 67, 1285–1297.
13. Manzano Sanchez, R.; Goel, N.; Zaman, M.; Joshi, R.; Naik, K. Design of a Machine Learning Based Intrusion Detection Framework and Methodology for IoT Networks. In Proceedings of the 2022 IEEE 12th Annual Computing and Communication Workshop and Conference (CCWC), Virtual, 26–29 January 2022; pp. 0191–0198.
14. Xu, L.; Skoularidou, M.; Cuesta-Infante, A.; Veeramachaneni, K. Modeling Tabular data using Conditional GAN. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 8–14 December 2019; Wallach, H., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E., Garnett, R., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2019; Volume 32.
15. Soe, Y.N.; Feng, Y.; Santosa, P.I.; Hartanto, R.; Sakurai, K. Towards a lightweight detection system for cyber attacks in the IoT environment using corresponding features. Electronics 2020, 9, 144.
16. Bagui, S.; Li, K. Resampling imbalanced data for network intrusion detection datasets. J. Big Data 2021, 8, 1–41.
17. Fatani, A.; Dahou, A.; Al-Qaness, M.A.; Lu, S.; Elaziz, M.A. Advanced feature extraction and selection approach using deep learning and Aquila optimizer for IoT intrusion detection system. Sensors 2021, 22, 140.
18. Zixu, T.; Liyanage, K.S.K.; Gurusamy, M. Generative adversarial network and auto encoder based anomaly detection in distributed IoT networks. In Proceedings of the GLOBECOM 2020—2020 IEEE Global Communications Conference, Taipei, Taiwan, 7–11 December 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 1–7.
19. Ferrag, M.A.; Maglaras, L.; Ahmim, A.; Derdour, M.; Janicke, H. RDTIDS: Rules and decision tree-based intrusion detection system for internet-of-things networks. Future Internet 2020, 12, 44.
20. Prabakaran, P.; Mohana, R.; Kalaiselvi, S. Enhancing the Cyber Security Intrusion Detection based on Generative Adversarial Network. Elem. Educ. Online 2021, 20, 7401.
21. Ullah, I.; Mahmoud, Q.H. A Framework for Anomaly Detection in IoT Networks Using Conditional Generative Adversarial Networks. IEEE Access 2021, 9, 165907–165931.
22. Belouch, M.; El Hadaj, S.; Idhammad, M. Performance evaluation of intrusion detection based on machine learning using Apache Spark. Procedia Comput. Sci. 2018, 127, 1–6.
23. Haggag, M.; Tantawy, M.M.; El-Soudani, M.M. Implementing a deep learning model for intrusion detection on Apache Spark platform. IEEE Access 2020, 8, 163660–163672.
24. Morfino, V.; Rampone, S. Towards near-real-time intrusion detection for IoT devices using supervised learning and Apache Spark. Electronics 2020, 9, 444.
25. Abushwereb, M. An accurate IoT intrusion detection framework using Apache Spark. Ph.D. Thesis, Princess Sumaya University for Technology, Amman, Jordan, 2020.
26. Rish, I. An empirical study of the naive Bayes classifier. In Proceedings of the IJCAI 2001 Workshop on Empirical Methods in Artificial Intelligence, Seattle, WA, USA, 4–6 August 2001; Volume 3, pp. 41–46.
27. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
28. Xu, L.; Skoularidou, M.; Cuesta-Infante, A.; Veeramachaneni, K. Modeling tabular data using conditional GAN. Adv. Neural Inf. Process. Syst. 2019, 32, 7335–7345.
29. Brandt, J.; Lanzén, E. A Comparative Review of SMOTE and ADASYN in Imbalanced Data Classification. 2021. Available online: https://www.diva-portal.org/smash/get/diva2:1519153/FULLTEXT01.pdf (accessed on 14 September 2022).
Algorithm | Hyper-Parameters |
---|---|
Random Forest | numTrees = 30, maxDepth = 30, Impurity = Gini |
Decision Tree | maxDepth = 5, Impurity = Gini |
Gradient Boosted Tree | maxDepth = 5, stepSize (learning rate) = 0.1, Impurity = variance |
SVM Linear | regParam = 0.1, kernel = linear, loss = hinge |
Naive Bayes | N/A |
Logistic Regression | elasticNetParam = 0.8, penalty = Elasticnet |
Reference | Normal [%] | DDoS [%] | DoS [%] | Reconnaissance [%] | Theft [%] | Algorithm | Dataset |
---|---|---|---|---|---|---|---|
Shafiq et al. [6] | 75 | 98 | 100 | 81 | 93 | NB | Short version BoT-IoT |
Soe et al. [15] | - | 99.9 | - | 99.9 | 98.18 | Random Forest | Short version BoT-IoT |
Kumar et al. [5] | 100 | 100 | 100 | 100 | 93 | XGBoost | Short version BoT-IoT |
Fatani et al. [17] | 60.7 | 99 | 99 | 99 | 85.7 | Aquila optimizer (AQU) | Short version BoT-IoT |
Abushwereb [25] | 71.8 | 99.9 | 99.13 | 88.83 | 23.2 | MLlib (RF) | Large version BoT-IoT |
Our approach | 98 | 94 | 93 | 99.86 | 99 | Random Forest | Entire BoT-IoT dataset |
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Manzano Sanchez, R.A.; Zaman, M.; Goel, N.; Naik, K.; Joshi, R. Towards Developing a Robust Intrusion Detection Model Using Hadoop–Spark and Data Augmentation for IoT Networks. Sensors 2022, 22, 7726. https://doi.org/10.3390/s22207726