Real-Time Large-Scale Intrusion Detection and Prevention System (IDPS) CICIoT Dataset Traffic Assessment Based on Deep Learning

Samuel Kofi Erskine

doi:10.3390/asi8020052

¹ Department of Computer Information Science, Florida Agricultural and Mechanical University, Tallahassee, FL 32310, USA

² College of Science and Engineering, University of Bridgeport, Bridgeport, CT 06604, USA

Appl. Syst. Innov. 2025, 8(2), 52; https://doi.org/10.3390/asi8020052

This article belongs to the Special Issue Advancements in Deep Learning and Its Applications


Abstract

This research uses machine learning (ML), and especially deep learning (DL), techniques for the efficient feature extraction of intrusion attacks. We use DL for better learning and use the machine learning multilayer perceptron (MLP) as an intrusion detection system (IDS) and intrusion prevention system (IPS), that is, an IDPS method. We deploy DL and MLP together as DLMLP. DLMLP improves the detection of intrusion attack features in the Internet of Things (IoT) device dataset known as CICIoT2023, which we obtained from the Canadian Institute for Cybersecurity (CIC). Our proposed method, the deep learning multilayer perceptron intrusion detection and prevention system model (DLMIDPSM), provides IDPST (intrusion detection and prevention system topology) capability. We use our proposed IDPST to capture, analyze, and prevent all intrusion attacks in the dataset. Moreover, our proposed DLMIDPSM employs a combination of artificial neural networks (ANNs), convolutional neural networks (CNNs), and recurrent neural networks (RNNs). The aim of this project is to develop a robust real-time intrusion detection and prevention system model. DLMIDPSM can predict, detect, and prevent intrusion attacks in the CICIoT2023 IoT dataset with an accuracy above 85% and a precision of 99%. By comparison, deep learning and machine learning (ML) models in the literature that used decision trees (DTs) and support vector machines (SVMs) achieved a detection and prevention rate of 81% accuracy with only 72% precision. Furthermore, this research breaks new ground by combining machine learning and deep learning models with IDPS capability, which we call ML and DLMIDPSMs. We train, validate, and test the ML and DLMIDPSMs on the CICIoT2023 dataset, which helps them achieve higher accuracy and precision than the other models discussed above.
Thus, based on the high attack detection and prevention rates in the confusion matrix, our combined ML and DLMIDPSMs achieve higher intrusion detection and prevention.

Keywords:

DL; IDS; IPS; IDPS; CICIoT2023 dataset; AI; DLMLP; MLP; ML; CIC; DLMIDPSMs; IDPST

1. Introduction

Internet of Things (IoT) devices are used extensively in industrial applications that play a crucial role in society. IoT has opened new possibilities for researchers to investigate intrusion attacks in diverse industrial IoT applications [1,2,3]. Many IoT projects, including healthcare, intelligent transportation systems (ITS), smart agriculture, and smart city designs, use IoT applications and have experienced tremendous growth. Because of this growth, IoT devices and their networks are constantly under attack, yet no robust method for detecting and preventing intrusion attacks is currently known. In addition, intrusion attacks may manifest as many different anomalies that must be detected [4,5,6], which worsens the situation.

Many healthcare applications use IoT network devices. Some of these applications improve patients’ conditions through regular monitoring and help health workers, including doctors and nurses, keep track of patients’ health conditions in a timely manner, using ICT (information and communication technology) with IoT network capability [7,8]. Moreover, intelligent transportation systems (ITS) are another area that uses IoT network devices, in this case for real-time information communication with drivers. ITS gives drivers more opportunities to identify and prevent road accidents [9,10,11].

Smart agriculture is another critical area where IoT devices are applicable [12]. Farmers and agriculturists use IoT network device applications on a large scale in smart agriculture, which plays a vital role in large-scale crop harvesting and in producing agricultural products in larger quantities. IoT applications have also proven practical in smart city development [12,13,14], where they provide innovative solutions that transform cities into smart cities. Furthermore, the Industrial Internet of Things (IIoT) has brought many capabilities, including highly reliable, low-latency automated monitoring and control [15].

IoT and IIoT connections are expected to grow over the next few years with diverse IoT and IIoT applications [16]. IoT and IIoT devices rely on network topologies in their applications. However, IoT device topologies face diverse challenges, including efficient operation, security, and the interoperability of standards in IoT and IIoT network device applications [17,18,19].

For example, the development of new applications for IoT devices is always accompanied by new security requirements. Some critical security challenges involve new intrusion attacks that may affect IoT and IIoT network devices or systems [20,21]. Moreover, intelligent transportation systems (ITS) applications, such as the Internet of Vehicles (IoV), require reliable information delivery within strict response times to help drivers make urgent decisions on the road. Drivers rely on these response times to judge road conditions properly, to give priority to pedestrians, and to avoid road casualties. Consequently, IoV road applications depend on real-time data more than typical IoT applications do.

Furthermore, IoT device adoption is growing exponentially, producing vast amounts of data for processing and analysis. At the same time, IoT devices are vulnerable to many security threats and other challenges. IoT devices in sensitive environments, such as e-health [22] and smart homes, face many challenges that hinder IoT advancement. Accordingly, we review some security threats, requirements, and challenges [23].

The lack of standardization in IoT technologies and intensified vulnerabilities give rise to intrusion attacks. Intrusion attacks increase security incidents in IoT systems, including those captured in the CICIoT2023 dataset. Other security threats and challenges target specific IoT system layers. The physical/perception layer is one IoT system layer that is exposed to significant security threats. Therefore, the following essential security threats are worth discussing.

Eavesdropping can occur at the physical layer: malicious intruder devices in IoT systems, including end nodes, passively sniff traffic to obtain valuable information.

Furthermore, hardware can malfunction in IoT system devices, affecting intelligent transportation systems (ITS), e-health, smart homes/cities, and the smart grid. Hardware malfunctions can lead to production faults that result in cyberattacks. The impact can be significant for the system, and users’ lives can also be threatened [24].

In addition, many of these IoT smart devices can be used to generate datasets. These datasets represent many IoT applications but are prone to intrusions or cyberattacks [25]. Moreover, malicious data injection can affect IoT systems. Attackers can also counterfeit IoT devices and inject them into IoT systems, sniff wireless traffic, and introduce intrusion attacks. Intrusion attacks generally cause the channel or system to malfunction and produce fake messages, making the system unavailable to users.

Furthermore, node cloning is another IoT security challenge. Because there is no standardization, devices can be forged and duplicated in IoT ecosystems [26]. Insider attackers can therefore swap legitimate devices with illegitimate ones, either during IoT system production or while the system is operational. Node cloning attacks can lead to the removal of security parameters and the overwriting of firmware. Additionally, because of wireless vulnerabilities, unauthorized access to IoT devices can occur quickly. Default passwords are another major vulnerability, since an intrusion attacker can exploit the built-in credentials.

These IoT security vulnerabilities make it challenging to design practical large-scale datasets with IoT security solutions and IDPS capability. To our knowledge, no such secure IoT solution with IDPS capability has been investigated for the CICIoT2023 dataset in the literature.

To address these IoT security challenges, researchers have proposed machine learning (ML) methods. For instance, researchers [27,28,29,30] have proposed IoT security approaches for distributed IoT devices without any robust security mechanism. They only used intrusion detection in IoT devices, without intrusion prevention mechanisms. In addition, they did not include any clear network topology for investigating intrusion attacks in large-scale datasets.

Other researchers, including [31,32], have also proposed IoT security solutions against intrusion attacks. The researchers in [32] used large-scale datasets without IDPS capability, whilst the researchers in [31] used a deep learning (DL) multilayer perceptron (MLP) but did not consider an extensive large-scale IoT device topology. Consequently, owing to the limitations of their methods, these researchers did not succeed in fully investigating and preventing intrusion attacks.

To support the development of security analytical solutions for IDPS in real-time secure data provision scenarios, this research addresses the gap left by previous methods by requiring that the data produced by IoT devices should (i) include an IDPS IoT network dataset with robust intrusion detection and prevention mechanisms in place, able to detect and prevent threats and thereby mitigate any new form of intrusion attack that may harm IoT operations; (ii) be obtained from a large-scale, real-time IDPS IoT network device topology, enabling the detection and prevention of new or different intrusions; and (iii) support the detection and prevention of any new intrusion attacks that maliciously attempt to pervade IoT network devices.

Consequently, the main objective of this study is to propose a novel intrusion detection and prevention system (IDPS) model that investigates intrusion attacks in large-scale datasets. Our proposed DLMIDPS model investigates intrusion attacks across a full large-scale IoT dataset. The proposed method develops, designs, and implements a new IoT security analytical application solution and provides real-time IDPS operations for IoT network devices. On this basis, we implemented 33 intrusion attacks on IoT network devices. Our DLMIDPS model implementation uses the IDPS topology, which comprises about 105 devices subjected to intrusion attacks.

We classify the traffic into six categories: benign, DDoS (distributed denial of service), DoS (denial of service), MITM (man-in-the-middle), Mirai, and Recon. Malicious IoT network devices also generate web-based, brute-force, and spoofing attack categories; data processing screens these categories out, so the intrusion detection and prevention system excludes them from the training dataset. The processed data are prepared and ready for use in several formats. We apply feature extraction in our evaluation and engineer new dataset features.
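The screening step described above can be illustrated with a small label-mapping sketch. This is not the authors' code: the label strings and prefix rules are assumptions modeled on the naming style of the public CICIoT2023 CSV files (for example, "DDoS-ICMP_Flood" or "Mirai-greeth_flood"). Any label that does not map to one of the six retained classes, such as a web-based or brute-force attack, is dropped before training.

```python
from collections import Counter
from typing import Iterable, Optional

# Assumed fine-grained label prefixes, in the style of the public CICIoT2023
# CSV labels; this mapping is illustrative, not the authors' exact rule set.
CATEGORY_PREFIXES = {
    "DDoS-": "DDoS",
    "DoS-": "DoS",
    "Mirai-": "Mirai",
    "Recon-": "Recon",
    "MITM-": "MITM",
}
BENIGN_LABEL = "BenignTraffic"  # assumed benign label string


def to_category(label: str) -> Optional[str]:
    """Map a fine-grained label to one of the six retained classes, or None."""
    if label == BENIGN_LABEL:
        return "Benign"
    for prefix, category in CATEGORY_PREFIXES.items():
        if label.startswith(prefix):
            return category
    # Web-based, brute-force, and other spoofing labels are screened out.
    return None


def screen(labels: Iterable[str]) -> list[str]:
    """Return the six-class label of every row that survives screening."""
    return [c for c in (to_category(label) for label in labels) if c is not None]


if __name__ == "__main__":
    raw = ["DDoS-ICMP_Flood", "DoS-UDP_Flood", "Mirai-greeth_flood", "Recon-PortScan",
           "MITM-ArpSpoofing", "BenignTraffic", "SqlInjection", "DictionaryBruteForce"]
    print(Counter(screen(raw)))
```

Prefix matching keeps the mapping short; an explicit dictionary of all fine-grained labels would be stricter if the real label set differs from the assumed one.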

The research contributions involve the following:

We develop, design, and implement the DLMIDPSM, which evaluates intrusion attacks across IoT devices arranged in the topology proposed in this research, IDPST.
We develop, design, and implement the intrusion detection and prevention system topology (IDPST). IDPST analyzes intrusion detection systems (IDS) and intrusion prevention systems (IPS).
We propose a large-scale real-time CICIoT2023 dataset with IDPS capability.
We implement, document, obtain, and analyze data based on 33 intrusion attacks categorized into six classes against the IoT devices dataset.
We develop, design, implement, and evaluate the deep learning multilayer perceptron intrusion detection and prevention system model (DLMIDPSM) together with machine learning (ML). We call the combined model ML and DLMIDPSM and use it to classify, detect, and prevent IoT network device traffic that represents benign traffic or malicious attacks.
We use performance metrics, namely precision, accuracy, F1-score, and the confusion matrix, to assess the performance of the proposed solution.
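The evaluation metrics listed above can all be derived from a confusion matrix. The sketch below shows one way to compute them in plain Python; it is illustrative, not the authors' evaluation code. Because the text does not say how per-class scores are averaged over the six classes, macro averaging (an unweighted mean over classes) is an assumption here, and the two-class toy data are hypothetical.

```python
from typing import Sequence


def confusion_matrix(y_true: Sequence[str], y_pred: Sequence[str],
                     classes: Sequence[str]) -> list[list[int]]:
    """Rows index the true class, columns the predicted class."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for true_label, pred_label in zip(y_true, y_pred):
        matrix[index[true_label]][index[pred_label]] += 1
    return matrix


def macro_metrics(matrix: list[list[int]]) -> dict[str, float]:
    """Accuracy plus macro-averaged precision, recall, and F1-score."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    accuracy = sum(matrix[k][k] for k in range(n)) / total if total else 0.0
    precisions, recalls, f1s = [], [], []
    for k in range(n):
        tp = matrix[k][k]
        predicted = sum(matrix[i][k] for i in range(n))  # column sum: predicted as k
        actual = sum(matrix[k])                          # row sum: truly k
        p = tp / predicted if predicted else 0.0
        r = tp / actual if actual else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(2 * p * r / (p + r) if p + r else 0.0)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


if __name__ == "__main__":
    classes = ["Benign", "DDoS"]
    y_true = ["Benign", "Benign", "DDoS", "DDoS"]
    y_pred = ["Benign", "DDoS", "DDoS", "DDoS"]
    cm = confusion_matrix(y_true, y_pred, classes)
    print(cm)  # [[1, 1], [0, 2]]
    print(macro_metrics(cm))
```

Macro averaging weights every class equally, so rare attack classes count as much as the dominant ones; a support-weighted average would instead track the class distribution of the test set.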

The research is organized as follows: Section 2 is the literature review; Section 3 is the proposed solution and the methodology; Section 4 is the training, validating/testing of the dataset utilizing the DLMIDPSM model; Section 5 is the experimental simulation setup and analysis; and Section 6 is the conclusion.

2. Literature Review

2.1. Internet of Things Machine Learning-Based IDPS

In recent years, diverse contributions to IoT security have appeared. They pursue diverse objectives and produce data using diverse methods and resources. By reviewing these proposals from the existing literature, we can identify the features of existing datasets and compare them with the features of the IDPS-capable dataset proposed in this research.

In [33], the authors proposed an artificial intelligence (AI)-based intrusion detection and prevention system (IDPS) that stops intrusions on IoT networks in real time. The method used ensemble feature selection algorithms to eliminate the dataset’s irrelevant and redundant features, improving the model’s performance. The study used the NBaIoT dataset and achieved high intrusion detection accuracy. However, the NBaIoT dataset was not deployed on large-scale IoT devices, and the approach would still require a deep learning multilayer perceptron (DLMLP). In addition, the model was trained only on network data processed at the IoT gateway to predict real-time attacks.

In [34], the authors proposed a conjugate gradient-based improved GAN (CG-IGAN) anomaly detection system for IoT. CG-IGAN learned reliable and practical features under unstable conditions. It uses a Botnet dataset whose features are extracted using Independent Component Analysis (ICA). The CG-IGAN classifier used these attributes to improve the precision of identifying malicious data, and it identified malicious data more accurately than other advanced IoT-based anomaly detection methods. However, CG-IGAN would require a large-scale Botnet dataset, IDPS capability with a deep learning multilayer perceptron (DLMLP), and a network topology to capture and assess intrusion attacks.

The authors in [35] proposed an intelligent system based on a particle swarm optimization algorithm to detect attacks at an early stage. The results show that the proposed algorithm can detect the attack source. The method consumes more bandwidth than traditional methods, and it also supports intrusion prevention system functionality in next-generation firewalls. The proposed attack detection method could detect all modern IoT device attacks. It could be extended with DLMLP and IDPS capability to investigate intrusion attacks.

The authors in [36] proposed an IoT security method that combined machine learning (ML) and deep learning (DL) techniques to solve IoT security problems. The method also investigated how to enhance the security of IoT devices and applications to safeguard user data from unauthorized access and misuse. The method could be deployed on a large-scale IoT dataset and could use an IDPS network topology to evaluate intrusions.

In addition, the authors in [37] examined Snort’s role as a robust network intrusion detection tool and proposed ways to prevent intrusions by thoroughly investigating security challenges in IoT network devices. The study focused on detecting specific vulnerabilities and emphasized resource constraints, diverse communication protocols, and the need to tailor security measures to them. The paper thoroughly examined the Snort intrusion detection architecture and studied its use in developing custom rules for detecting intrusion attacks. The method could use machine learning (ML) or deep learning techniques, including DLMLP and IDPS capability, to investigate intrusion attacks, and could include a large-scale IoT dataset deployment and a network topology.

The authors in [38] evaluated the specificity and complexity of IoT security protection. They found that artificial intelligence (AI) approaches such as machine learning (ML), including ensemble classifiers, offer strong capabilities and satisfy IoT security demands. The improvement came from ensemble machine learning methods that combine a variety of learning processes with different capacities; by combining these techniques, the authors improved predictability and decreased the likelihood of classification errors. The resulting architecture enhanced the effectiveness of anomaly detection and intrusion prevention systems with high accuracy. However, the methodology could have used large-scale IoT dataset deployments, and it did not use deep learning with IDPS capabilities for intrusion detection and prevention.

The authors in [39] presented a survey that compared and reviewed the latest deep learning-based approaches to intrusion detection, including a comparative analysis of compatibility, challenges, feasibility, and real-time issues. The work aimed to detect zero-day attacks. It could use a precise intrusion detection and prevention methodology and could be deployed on large-scale datasets using deep learning intrusion detection and prevention capability.

The authors in [40] analyzed the efficiency of deep learning techniques such as CNN and LSTM, alongside XGBoost, a gradient-boosted tree method. Their research used a dual CNN approach, an LSTM with three stacked layers, and XGBoost. The method also used cross-validation on the IOBOTNET 2020 dataset to reduce overfitting. In the results, LSTM and XGBoost performed well and achieved higher fit scores. However, the method does not rely on a large-scale IoT dataset deployment or IoT devices, and it could also use a network topology. The deep learning method could also be made more precise.

Furthermore, the authors in [41] proposed an intrusion detection system using the ROUT-4-2023 dataset, which encompasses Black Hole, Flooding, DODAG Version Number, and Decreased Rank attacks. The method used statistical information graphs to investigate the network traffic features of all four attacks. The experiments tested machine learning models and deep learning architectures, comparing confusion matrix outcomes and computational efficiency, and the Random Forest classifier achieved high accuracy. The dataset could be extended with large-scale IoT device deployment and an IDPS network topology, and IDPS model capability could be added through an intrusion prevention system (IPS) method.

The authors in [42] proposed a hybrid intrusion detection system that combined machine learning, file integrity monitoring, anomaly-based and signature-based rootkit detection, and signature-and-anomaly-based detection. The system used a distributed architecture to share real-time threat intelligence among many nodes and offered proactive capabilities such as automatic incident response, endpoint quarantine, and improved detection. Although the hybrid system combined rule-based and machine learning methods to detect and prevent attacks, it did not use large-scale IoT datasets, nor did it include network topology deployment with intrusion detection, prevention, and deep learning model capability.

Finally, the authors in [43] proposed deep learning methods that analyze large amounts of sensor data in IoT devices to predict and detect patterns and to improve IoT system efficiency. The work also explored existing anomaly detection techniques for IoT networks, studying machine learning and deep learning methods on diverse datasets to detect specific anomalies and identify the most appropriate solution for implementation. However, it did not deploy a large-scale IoT dataset with a transparent network topology to investigate intrusion attacks.

2.2. Deep Learning-Based Intrusion Detection and Prevention System (IDPS)

Researchers have applied broad concepts to intrusion detection and prevention systems (IDPSs) based on deep learning (DL) and machine learning (ML) approaches. The following review shows why deep learning-based intrusion detection is worth investigating for intrusion detection and prevention.

The authors in [44] proposed an innovative intrusion detection system with superior network performance that detected unknown attack packages using a deep learning neural network. They used binary and multiclass classification to detect intrusion attacks. The method could be applied to large-scale IoT security datasets and could use IDPST. The research achieved high accuracy and could also use the DL intrusion detection method, but it lacked any prevention method.

The authors in [45] explained how neural networks prevent and detect intrusion attacks to secure data. The study reviewed various IDS and IPS (intrusion prevention system) methods and focused on using neural networks to solve related problems. However, it did not demonstrate intrusion detection or prevention for IoT networks with IDPST, nor did it use any large-scale IoT dataset to investigate intrusion attacks.

The authors of [46] only discussed the differences between IDS and IPS without detailing how to achieve them, examining the advantages and disadvantages of each. The study also explained current developments in machine learning and how they improve IDPSs. It did not use a large-scale IoT dataset or an IDPS topology with intrusion detection and prevention systems deployed together.

The authors of [47] used a hybrid IDS method that covered both intrusion detection and prevention, combining J48DT (the J48 decision tree) and SVM (support vector machine). However, the method did not use a large-scale IoT dataset deployment or an IoT network with IDPST.

This literature review shows the importance of designing a deep learning-based multilayer perceptron intrusion detection and prevention system model, DLMIDPSM. Our proposed research method provides this urgently needed design to investigate intrusion detection and prevention. We also investigate a large-scale IoT dataset in our proposed IoT intrusion detection and prevention topology (IDPST) for capturing and assessing intrusion attacks. Our DLMIDPSM evaluates performance metrics such as precision, accuracy, F1-score, and the confusion matrix on the new dataset proposed in this research, namely the CICIoT2023 dataset with IDPS capability.

3. Proposed Solution and Methodology

3.1. Preliminary Problem Resolution Objectives

Based on the objectives of this research, we propose an optimized IDPS solution for a large-scale IoT dataset, CICIoT2023, using DLMLP. This solution builds on the method in [31], which investigated an intrusion detection system (IDS) and intrusion prevention system (IPS) using a deep learning (DL) multilayer perceptron (DLMLP). We consider this objective important because the number of networked digital devices is increasing, leading to a corresponding increase in intrusion attacks. Intrusion attacks on IoT (Internet of Things) network devices continue to increase and affect many IoT device applications. We therefore adopted the method from [31], which proposed an intrusion detection and prevention system (IDPS) for investigating, detecting, and preventing these intrusion attacks. However, the IDPS method in [31] did not investigate intrusion attacks in real-time, large-scale IoT device scenarios. Our investigation also found that the method in [31] did not use any IoT device network topology for data collection and processing when deploying the IDPS. Instead, it focused on deep learning (DL) with an IDPS that immediately detected and prevented intrusion attacks.

Furthermore, our investigation found that the method in [31] used only DoS, Probe, R2L, and U2R as the deployed intrusion attacks, which do not truly reflect the attributes of a large-scale IoT device dataset. In addition, the dataset used did not validate any IoT security solution. Our investigation also found that the dataset used in [32], although a large-scale IoT device dataset, lacked the IDPS and DLMLP intrusion detection and prevention system model (DLMIDPSM) capability used in our proposed method. Consequently, this research uses the DLMIDPSM method as our proposed optimized model for an IoT security solution.

Then, we designed our proposed DLMIDPSM, as depicted in Figure 1 below:

Figure 1. Proposed deep learning multilayer perceptron intrusion detection and prevention system model (DLMIDPSM).

3.2. Proposed Methodology

We created the intrusion detection phase using a deep learning (DL) model to detect all possible intrusion threats in the network. Based on our proposed DLMIDPSM, we used a sequence of steps to accomplish the intrusion detection and prevention phase tasks, ensuring high accuracy and reduced loss. In contrast, the DL method in [31] used only knowledge-based datasets such as KDDCup99, which are not large-scale IoT datasets; therefore, its accuracy and loss results did not truly reflect IoT security solutions.

Therefore, to solve this security challenge, as shown in Figure 1, we used a large-scale IoT dataset, the CICIoT2023 dataset [32], building on the knowledge created in the intrusion detection phase to ensure high accuracy and minimize loss. Before training, we preprocessed the dataset using one-hot (one-of-K) encoding to transform all categorical features into binary features. We then applied feature scaling, standardizing the independent features so that each has a distribution centered on 0 with a standard deviation of 1.
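The two preprocessing steps, one-hot encoding and z-score standardization, can be sketched in plain Python as follows. This is a minimal illustration, not the authors' pipeline; the `protocols` and `packet_rates` example columns are hypothetical. The sketch divides by the population standard deviation, as scikit-learn's `StandardScaler` does.

```python
import math
from typing import Sequence


def one_hot(values: Sequence[str]) -> tuple[list[str], list[list[int]]]:
    """One-of-K encoding: each distinct category becomes its own 0/1 column."""
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    encoded = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1
        encoded.append(row)
    return categories, encoded


def standardize(column: Sequence[float]) -> list[float]:
    """Z-score scaling: subtract the mean and divide by the population standard deviation."""
    n = len(column)
    mean = sum(column) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in column) / n)
    if std == 0.0:
        # A constant feature would divide by zero; map it to all zeros instead.
        return [0.0] * n
    return [(x - mean) / std for x in column]


if __name__ == "__main__":
    # Hypothetical categorical and numeric feature columns, for illustration only.
    protocols = ["TCP", "UDP", "TCP", "ICMP"]
    print(one_hot(protocols))
    packet_rates = [2.0, 4.0, 6.0]
    print(standardize(packet_rates))
```

In a real train/validation/test workflow, the mean and standard deviation would be computed on the training split only and then reused to scale the validation and test data, so that no information leaks from the held-out sets.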

As Figure 1 shows, we prepared the input data for the model. We used a multilayer perceptron consisting of two dense layers with ReLU and softmax activation functions. A key design limitation of the method in [31] was that it tested the dataset with Wireshark, without any IoT network devices or IDPS topology. The following section discusses our proposed IDPST, which resolves this design limitation.
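The forward pass of such a two-dense-layer MLP can be sketched in plain Python. This is an illustrative sketch, not the authors' implementation, which the text does not show. We assume that ReLU is applied to the hidden dense layer and softmax to the output layer. The layer sizes (4 input features, 8 hidden units, 6 output classes matching the six traffic categories) and the weight initialization are hypothetical.

```python
import math
import random
from typing import Sequence

Matrix = list[list[float]]


def dense(x: Sequence[float], weights: Matrix, bias: Sequence[float]) -> list[float]:
    """Fully connected layer: out[j] = sum_i x[i] * weights[i][j] + bias[j]."""
    return [
        sum(x_i * weights[i][j] for i, x_i in enumerate(x)) + bias[j]
        for j in range(len(bias))
    ]


def relu(z: Sequence[float]) -> list[float]:
    return [max(0.0, v) for v in z]


def softmax(z: Sequence[float]) -> list[float]:
    shift = max(z)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(v - shift) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(n_in: int, n_out: int, rng: random.Random) -> tuple[Matrix, list[float]]:
    """Gaussian weights scaled by 1/sqrt(n_in); zero biases."""
    weights = [[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_out)]
               for _ in range(n_in)]
    return weights, [0.0] * n_out


def mlp_forward(x: Sequence[float], w1: Matrix, b1: Sequence[float],
                w2: Matrix, b2: Sequence[float]) -> list[float]:
    """Dense + ReLU hidden layer, then dense + softmax output: one probability per class."""
    hidden = relu(dense(x, w1, b1))
    return softmax(dense(hidden, w2, b2))


if __name__ == "__main__":
    rng = random.Random(0)
    w1, b1 = init_layer(4, 8, rng)  # 4 input features -> 8 hidden units (hypothetical)
    w2, b2 = init_layer(8, 6, rng)  # 8 hidden units -> 6 output classes
    probs = mlp_forward([0.1, -1.2, 0.5, 2.0], w1, b1, w2, b2)
    print(probs, sum(probs))
```

A deep learning framework would express the same structure as two dense layers with the corresponding activations. The plain-Python version makes each step explicit, at the cost of speed.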

3.3. Intrusion Detection System (IDS) Phase Used in Proposed DLMIDPS Model

Based on our proposed DLMIDPSM in Figure 1, we use deep learning, a machine learning technique, for better feature extraction, better understanding, and machine perception. We use the CICIoT2023 IoT dataset, which has IDPS capability in our proposed method. As shown in Algorithm 1 below, we use multiple consecutive deep learning (DL) layers, in which each layer receives the output of the previous layer as its input. This layered design makes efficient use of the algorithm and gives DL its advantage in hierarchical feature extraction.

Figure 1 also shows the intrusion prevention (IP) phase, which we discuss in detail in a subsequent section. In the IP phase, a generated script runs with full Canadian Institute for Cybersecurity (CIC) administrator privileges. The script can prevent intrusions and all malicious activity requests, including DoS intrusion attacks: it terminates the offending connections and informs the CIC administrator about all potential intrusions and malicious activities in the network.
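The behavior of the IP-phase script can be sketched as a small handler that acts on the detection phase's prediction for each flow. Everything here is hypothetical: the `Flow` record, the `PreventionScript` class, and the two injected callbacks `terminate` and `notify_admin`. The source does not specify how connections are actually terminated or how the administrator is notified, so a real deployment would wire these callbacks to its own firewall and alerting tools.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Flow:
    """A network flow together with the class predicted by the detection phase."""
    src_ip: str
    dst_ip: str
    predicted_class: str


@dataclass
class PreventionScript:
    """Hypothetical IP-phase handler: terminates non-benign flows and alerts the admin.

    `terminate` and `notify_admin` are injected callbacks standing in for the
    real firewall and alerting mechanisms, which the source does not specify.
    """
    terminate: Callable[[Flow], None]
    notify_admin: Callable[[str], None]
    blocked_sources: set[str] = field(default_factory=set)

    def handle(self, flow: Flow) -> bool:
        """Return True if the flow is allowed, False if it was terminated."""
        if flow.predicted_class == "Benign":
            return True
        self.terminate(flow)
        self.blocked_sources.add(flow.src_ip)
        self.notify_admin(
            f"{flow.predicted_class} traffic from {flow.src_ip} to {flow.dst_ip} terminated"
        )
        return False


if __name__ == "__main__":
    terminated: list[Flow] = []
    alerts: list[str] = []
    script = PreventionScript(terminate=terminated.append, notify_admin=alerts.append)
    print(script.handle(Flow("10.0.0.7", "10.0.0.1", "Benign")))
    print(script.handle(Flow("10.0.0.5", "10.0.0.1", "DDoS")))
    print(alerts)
```

Injecting the callbacks keeps the decision logic (allow benign traffic, block and report everything else) separate from the privileged system actions, which also makes the logic testable without administrator rights.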

We deploy both the intrusion detection (ID) and intrusion prevention (IP) phases and integrate them into a single software component in Algorithm 1, which we present below. The attacks detected in [31] did not come from IoT device datasets, and their data representation was based only on detecting manually engineered features with deep learning [48]. To address this, we investigate intrusion attacks, including MITM, Recon, DoS, and DDoS, as well as benign traffic, typical of the large-scale CICIoT2023 dataset, using artificial neural network architectures, namely the multilayer perceptron, the convolutional neural network, and the recurrent neural network.

The proposed DLMIDPSM uses a multilayer perceptron (MLP), a type of neural network consisting of one or more layers of neurons. Data are fed to the input layer (sometimes called the visible layer) and pass through one or more hidden layers that provide levels of abstraction, and predictions are made at the output layer. The MLP is a class of feed-forward artificial neural network.

An MLP generally consists of three types of layers: the input, hidden, and output layers. Each node, except those in the input layer, is a neuron that uses a nonlinear activation function. The MLP is trained with a supervised learning technique called backpropagation. It differs from the linear perceptron [49] in its multiple layers and nonlinear activations, which allow it to separate data that are not linearly separable.
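The XOR problem is the standard example of data that a linear perceptron cannot separate. The sketch below, a minimal NumPy illustration rather than the DLMIDPSM itself, trains a one-hidden-layer MLP on XOR with backpropagation; the hidden width of 8, the tanh activation, the learning rate, and the epoch count are assumptions chosen for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce(p, t):
    # Binary cross-entropy, clipped to avoid log(0)
    p = np.clip(p, 1e-9, 1 - 1e-9)
    return float(-np.mean(t * np.log(p) + (1 - t) * np.log(1 - p)))

# XOR: no single linear perceptron can separate these labels
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(42)
W1 = rng.normal(0.0, 1.0, (2, 8))   # input -> hidden weights
b1 = np.zeros(8)
W2 = rng.normal(0.0, 1.0, (8, 1))   # hidden -> output weights
b2 = np.zeros(1)
lr = 0.5

losses = []
for epoch in range(5000):
    # Forward pass: nonlinear (tanh) hidden layer, sigmoid output
    h = np.tanh(X @ W1 + b1)
    p = sigmoid(h @ W2 + b2)
    losses.append(bce(p, y))

    # Backward pass: propagate the error from the output back to W1
    d_out = (p - y) / len(X)                  # dLoss/dz2 for sigmoid + BCE
    dW2, db2 = h.T @ d_out, d_out.sum(axis=0)
    d_hidden = (d_out @ W2.T) * (1 - h ** 2)  # chain rule through tanh
    dW1, db1 = X.T @ d_hidden, d_hidden.sum(axis=0)

    # Gradient-descent weight updates
    W2 -= lr * dW2
    b2 -= lr * db2
    W1 -= lr * dW1
    b1 -= lr * db1

preds = (sigmoid(np.tanh(X @ W1 + b1) @ W2 + b2) > 0.5).astype(int).ravel()
print(losses[0], losses[-1], preds)
```

With a different seed, training can occasionally stall before fitting all four points; widening the hidden layer makes this less likely.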

Algorithm 1: Framework Algorithm for Deep Learning Multilayer Perceptron Intrusion Detection and Prevention System (DLMIDPS) Model

Notations: $D_{0}$: initial file captured from the CICIoT2023 dataset; DL: deep learning; MLP: multilayer perceptron; $a_{i}$: subsequent intrusion attacks; IDPSM model: the combined output model in Figure 1

Begin
Input: $D_{0}$: Industrial Internet of Things (IIoT) office-based device data from the IDPST
Output: IDPS: action of the proposed intrusion detection and prevention system model
Procedure: DLMIDPSM($D_{0}$): deep learning multilayer perceptron intrusion detection and prevention system model
Industrial Office Environment:
{AI, IoT, IIoT, cloud computing, Sensor Devices}
IoT Sensor Devices = {Real-time intrusion attack detection and prevention from the large-scale CICIoT2023 dataset, Training, and Validation/Testing}
IoT intrusion-based attacks and benign traffic (IIBABT)
IIBABT = {Mirai, DDoS, DoS, MITM, Recon, and Benign traffic}
$\text{IntrusionAttacks}_{\text{Sorted}} = \text{AI}(a_{i})$
Dataset Collection & Preprocessing (S):
$S \leftarrow D_{\text{Collected \& Preprocessed}} = \{\text{CICIoT2023 dataset source, normalization, feature scaling, and engineering}\}$
AI DL-based Intrusion Detection & Prevention and Classification (AIDPLC):
while (Intrusion (I) and IoT Sensor Devices (IS)) do
AIDPLC = {DL, MLP, Evaluation Metrics, S}
end while
Intelligent Intrusion Attacks Prioritization and Prevention:
if (Intrusion Attacks Detected (IAD)) then
Prioritize intrusion attacks based on the Industrial IoT Sensor Devices (ISD)
$\text{Impact}(a_{i}) = \text{Category}(a_{i}, \text{Industrial IoT Sensor Devices (Application Dataset)})$
while (IAD, ISD) do
Prevention actions take place
end while
end if
return IDPS
end
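The control flow of Algorithm 1 (detect, then prioritize by impact, then prevent) can be sketched in Python as below. This is a minimal sketch under stated assumptions: the severity and device-criticality scores, the `Detection` record, and the product used for Impact($a_i$) are hypothetical, and the classifier's prediction is taken as given.

```python
from dataclasses import dataclass

# Hypothetical severity per predicted category (not values from the paper)
SEVERITY = {"DDoS": 5, "Mirai": 5, "DoS": 4, "MITM": 4, "Recon": 2, "Benign": 0}
# Hypothetical criticality of the industrial IoT device category under attack
DEVICE_CRITICALITY = {"camera": 3, "sensor": 2, "smart_speaker": 1}

@dataclass
class Detection:
    src_ip: str
    device: str
    category: str  # output of the DL/MLP classifier

def impact(d: Detection) -> int:
    # Impact(a_i) = Category(a_i, device): modeled here as a product of two scores
    return SEVERITY[d.category] * DEVICE_CRITICALITY.get(d.device, 1)

def idps_step(detections, prevent):
    """One pass of Algorithm 1's detect -> prioritize -> prevent stage."""
    attacks = [d for d in detections if d.category != "Benign"]
    if attacks:  # Intrusion Attacks Detected (IAD)
        for d in sorted(attacks, key=impact, reverse=True):
            prevent(d)  # prevention action, highest impact first
    return attacks

actions = []
batch = [
    Detection("10.0.0.5", "sensor", "Recon"),
    Detection("10.0.0.9", "camera", "DDoS"),
    Detection("10.0.0.7", "sensor", "Benign"),
]
idps_step(batch, lambda d: actions.append(d.src_ip))
print(actions)  # ['10.0.0.9', '10.0.0.5']
```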

3.4. Proposed DLMIDPSM Predictive Framework for Intrusion Detection and Prevention

Algorithm 1 shows the procedure of the proposed DLMIDPSM. The algorithm uses several components to detect and prevent cyberattacks on the IoT devices in our proposed IDPST. A typical deployment environment is an industrial office with cloud computing and AI/DL capability, as shown in our proposed IDPST in the next section. We apply the algorithm to the real-time large-scale dataset for learning, training, and validation in the CIC lab setting. The intrusion attacks that our proposed DLMIDPSM is designed to detect and prevent are Mirai, MITM, Recon, DDoS, and DoS, alongside benign traffic scenarios and other cyberattacks that threaten the IoT devices in our network. Following the system model in Figure 1, we perform dataset collection and preprocessing, including cleaning, normalization, feature scaling, and feature engineering. We then use the labeled dataset, together with evaluation criteria and AI/DL-based detection, prevention, and classification models, including decision tree (DT), support vector machine (SVM), multilayer perceptron, and autoencoder models, to detect, prevent, and classify intrusion attacks. Finally, an intelligent prioritization and prevention step ranks intrusion attacks by their impact on the industrial office equipment and applies customized preventive strategies to limit the damage. The framework therefore combines proactive preventive measures, prioritization strategies, and AI/DL-driven detection and prevention.

3.5. Investigating IDPS Capability and Producing the Dataset

The CICIoT2023 dataset proposed in [32] lacks intrusion detection capability. Adding this capability first requires an in-depth description of the steps and resources used to generate the dataset. Our proposed DLMIDPSM provides IDPS capability through the training and validation/testing process shown in Figure 1. It is also necessary to describe the CIC IoT lab, as shown in Figure 1 and in the procedures of Algorithm 1. We also focus on the IoT topology, namely the IDPST proposed in this research (shown in Figure 1), which we discuss later; for clarity, the IDPST description lists all of the IoT devices in the topology. We then present the intrusion attacks captured in the dataset, covering both malicious traffic and benign scenarios.

3.6. CIC IoT Lab with IDPS Capability

IoT security data support real applications, but producing such data is challenging for several reasons. A key design challenge is that the network must be extensive and include many real IoT devices, which makes IDPST capability essential. Many datasets in the literature used only simulated or small IoT device topologies because of the cost of devices and network equipment, such as routers, switches, and network taps, and these datasets, including CICIoT2023, lack IDPS capability. Such topologies also require personnel to maintain the lab infrastructure.

The Canadian Institute for Cybersecurity (CIC) has distinguished capabilities in the cybersecurity ecosystem and has contributed datasets to industry and academia, including a dataset used to develop new cybersecurity applications. CIC has formed several partnerships with industry and academia to improve cybersecurity practices when developing new security solutions, and it has established an IoT lab with a dedicated network to foster the development of IoT security solutions. We aim to share the data collected from the extensive IDPST network proposed in this research to provide intrusion detection capability for the CICIoT2023 dataset produced in the CIC lab setup. In doing so, we intend to advance IoT security research and support initiatives across different aspects of IoT security solutions.

Figure 2 depicts the CIC IoT lab and its devices. The IoT devices are distributed across the lab on tables, the floor, and walls, and each location is exposed to intrusion scenarios that our proposed IDPST must monitor. Our proposed IDPST uses power plugs, racks, and storage rooms to organize the IoT and network device setup, as shown in Figure 3.

Figure 2. CIC IoT lab setting with IDPS capability [32].

Figure 3. Proposed IoT intrusion detection and prevention system topology (IDPST) experiment for a smart industrial office [32].

3.7. Proposed Intrusion Detection and Prevention System Topology (IDPST)

Figure 3 depicts the implementation of our proposed intrusion detection and prevention system topology (IDPST). The IDPST is designed to investigate and re-create the CICIoT2023 dataset produced in the CIC lab setup (Figure 2) with IDPS capability, using our proposed DLMIDPSM. It comprises over 106 IoT devices, of which over 68 are directly involved in intrusion attacks; 39 Zigbee devices connect to six hubs.

Our proposed IoT IDPST emulates the CIC IoT lab setting and adds an IDPS capability based on the DLMIDPSM algorithm (Figure 3). It simulates real-time IoT products and services in smart industrial office environments with a high incidence of intrusion attack scenarios. The devices in the topology, including microcontrollers, sensors, and cameras, are located in smart industrial offices. We configure and connect the devices so that we can execute many intrusion attacks and capture the resulting attack traffic as well as benign traffic. To do this, we deploy the IDPST with the DLMIDPSM software and the functionality of Algorithm 1, which lets us execute several intrusion attacks and capture benign and malicious traffic together.

Our proposed IDPST deployment consists of two portions. The first portion includes industrial and office routers that connect to the Internet of Things (IoT) network, together with Internet-connected Windows 11 desktop computers. Cisco switches interconnect these computers and wireless access points, to which more than seven Raspberry Pi devices are connected. These Raspberry Pi devices execute the intrusion attacks and malicious activities through our proposed IDPST, which runs our DLMIDPSM software and Algorithm 1. This portion demonstrates the ability of the DLMIDPSM to detect and prevent malicious agents and intrusion attacks through the IDPST, an integrated IDPS capability that, to our knowledge, distinguishes it from the methods in the literature.

In the second portion of our proposed IDPST, the Cisco switches connect to a Gigabit network tap. IoT traffic from all devices in the topology passes through the tap, which sends a copy of the traffic to two network monitors that record it with Wireshark [50]. The Gigabit network tap is a hardware device that enables monitoring and analysis of intrusion attack and malicious agent traffic: it copies the traffic on a network link to other monitoring and security tools. Because the tap is passive, it does not affect normal operations; it provides full-duplex, non-intrusive access to the network traffic without adding latency or degrading network performance. The IDPST arrangement therefore places the tap, with its two monitoring ports, between the attackers and the legitimate IoT devices: one side connects to the malicious attackers and the other to the victims' network, and the monitor ports capture the intrusion and malicious traffic from the IoT devices in our topology.

3.8. Benign and Malicious Dataset Collection Scenario from Our IDPST

Generating Intrusion/Malicious Data from the Dataset

Section 3.7 describes the Gigabit network tap in our proposed IDPST, which monitors the network traffic of the IoT devices. We use the tap to capture network packets in our proposed IDPST on separate computers: its two interfaces are associated with two network monitoring ports, which forward the copied traffic packets to the computers connected to the IDPST. We record the traffic with Wireshark in pcap format. Mergecap [51] then merges the two pcap streams stored in each experiment into a single file.
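Given several input files, mergecap by default interleaves their packets in timestamp order (its `-a` option concatenates them instead), for example `mergecap -w merged.pcap port_a.pcap port_b.pcap`. The chronological merge itself can be sketched with Python's `heapq.merge`; the two in-memory streams of `(timestamp, bytes)` pairs below are hypothetical stand-ins for the two monitor-port captures.

```python
import heapq

# Hypothetical packet streams from the two tap monitor ports,
# each already sorted by capture timestamp (seconds).
port_a = [(0.001, b"A1"), (0.004, b"A2"), (0.009, b"A3")]  # attacker side
port_b = [(0.002, b"B1"), (0.003, b"B2"), (0.010, b"B3")]  # victim side

def merge_chronological(*streams):
    """Interleave timestamp-sorted packet streams into one ordered stream."""
    return list(heapq.merge(*streams, key=lambda pkt: pkt[0]))

merged = merge_chronological(port_a, port_b)
print([payload for _, payload in merged])
# [b'A1', b'B1', b'B2', b'A2', b'A3', b'B3']
```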

We perform separate experiments for each intrusion attack, targeting all applicable IoT devices in our IDPST. Each attack targets the devices that are vulnerable to it: for instance, every IoT device may suffer DDoS attacks, whereas web-based intrusion attacks target only devices that support web applications. Figure 4, Figure 5 and Figure 6 depict each attack category scenario and the number of instances of each intrusion detected in the CICIoT2023 dataset. Figure 4 shows the number of rows for each attack scenario. Figure 5 shows the row counts for each attack category, including the DDoS, DoS, MITM, Recon, and Mirai attacks and the benign network traffic. Figure 6 shows the valid categorical intrusion data from the dataset.
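Counting rows per category, as in Figure 5, requires mapping each fine-grained attack label to its family. A minimal pure-Python sketch follows; the prefix rule and the explicit mappings (for example, grouping web-based attacks as `Web` and `DNS_Spoofing` as `Spoofing`, while keeping `MITM` as its own family as this paper does) are assumptions for illustration, not the dataset's official grouping.

```python
from collections import Counter

# Assumed grouping for labels whose prefix does not name the family
EXPLICIT = {
    "BenignTraffic": "Benign",
    "VulnerabilityScan": "Recon",
    "DNS_Spoofing": "Spoofing",
    "DictionaryBruteForce": "BruteForce",
}
FAMILIES = {"DDoS", "DoS", "Mirai", "Recon", "MITM"}

def to_category(label: str) -> str:
    """Map a fine-grained CICIoT2023 label to a coarse attack category."""
    if label in EXPLICIT:
        return EXPLICIT[label]
    prefix = label.split("-", 1)[0]  # e.g. "DDoS-SYN_Flood" -> "DDoS"
    if prefix in FAMILIES:
        return prefix
    return "Web"  # remaining labels: XSS, SqlInjection, CommandInjection, ...

labels = ["DDoS-SYN_Flood", "DoS-UDP_Flood", "Mirai-greeth_flood", "Recon-PortScan",
          "MITM-ArpSpoofing", "BenignTraffic", "DDoS-ICMP_Flood", "XSS"]
counts = Counter(to_category(label) for label in labels)
print(dict(counts))
```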

Figure 4. Number of rows and counts for each attack scenario.

Figure 5. Number of rows and counts for each intrusion attack category.

Figure 6. Valid categorical intrusion data.

3.9. Generating Benign Data from the CICIoT2023 Dataset

We generate benign data from our proposed IDPST by capturing and analyzing the benign traffic in the CICIoT2023 dataset. Benign data are essential because they represent legitimate network traffic captured with the proposed IDPST, so capturing them is one of the main objectives of the IDPST. The capture relies on collecting IoT traffic both under idle conditions and during human interaction (e.g., using sensors, viewing video from smart cameras, and sending echo requests).

We design the physical-layer hardware of our proposed IDPST to capture the various intrusion attacks in the dataset. The hardware centers on the Gigabit network tap implemented in our proposed IDPST, combined with the two network monitors. We use Wireshark (version 4.4.5) to capture all of the IoT network traffic. For the benign capture, we configure all IoT devices with default parameters and run no intrusion or malicious attack scripts, so the traffic captured during this period contains no attacks. The benign capture spans 20 h.

3.9.1. DDoS and DoS Intrusion Attacks

DDoS (distributed denial-of-service) and DoS (denial-of-service) attacks are flooding threats that compromise the availability of IoT device operations. A DoS attack uses a single Raspberry Pi to flood the target IoT devices, whereas a DDoS attack uses multiple Raspberry Pis coordinated through an SSH-based master–client configuration in our proposed IDPST. The various DDoS and DoS attack features are detected and classified as shown in Figure 4 and Figure 5.

3.9.2. IoT Mirai Intrusion Threats

Mirai threats are IoT intrusion attacks of special interest: they are botnet attacks capable of compromising thousands of IoT devices. We therefore give high priority to investigating the many Mirai intrusion attacks in the CICIoT2023 dataset, using our proposed IDPST with the DLMIDPSM to detect and prevent them as part of a complete large-scale detection and prevention study of the dataset. For this purpose, our proposed IDPST employs five Raspberry Pi devices, as illustrated in Figure 7. Deploying these Raspberry Pi devices in the IDPST enables connections with intrusion detection capability at different points of the network, making IDPS (intrusion detection and prevention system) operation possible in different IoT layers to detect and prevent the intrusion attacks in the CICIoT2023 dataset.

Figure 7. Basic intrusion attack framework of the CICIoT2023 dataset [32].

In addition, the IDPST connects to the Internet through a gateway that uses Windows 11 instances to provide and monitor Internet access. An unmanaged Netgear switch provides this Internet access and links the attack traffic to the general IoT devices. We use tools including a dedicated Mirai configuration, accessible through the network interface of our proposed IDPST. We also use an online IoT supervisor that coordinates the operation of several IoT devices in our proposed IDPST (including cameras, sensors, and smart speakers). To our knowledge, previous works, including the dataset of method [31], do not investigate Mirai intrusion detection alongside the other intrusion attacks. This research focuses on the many intrusion attacks executed against the CICIoT2023 dataset devices; investigating and managing new IoT intrusion attacks remains important future work.

3.10. Preprocessing and Feature Scaling of the Categorical CICIoT2023 Dataset Based on the DLMIDPSM

Since our main aim is to add IDPS capability to the CICIoT2023 dataset, we preprocess and transform the categorical CICIoT2023 data through our proposed IDPST and then apply the DLMIDPSM. During preprocessing, we transform all categorical features into numerical values using one-hot (one-of-K) encoding, as shown in the histograms in Figure 8. The encoder takes as input a matrix of integers denoting the values of the categorical (discrete) features [52], and its output is a sparse matrix in which each column corresponds to one possible value of one feature. This encoding assumes that each input feature takes values in the range [0, n_values).
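The one-of-K mapping described above can be sketched with NumPy. The dense array below has the same column layout as the sparse matrix the text describes; only the storage differs (scikit-learn's `OneHotEncoder`, for instance, returns a SciPy sparse matrix by default). The `protocol` column and its integer codes are hypothetical.

```python
import numpy as np

def one_hot(codes, n_values):
    """One-of-K encoding of integer category codes in the range [0, n_values)."""
    codes = np.asarray(codes)
    if codes.min() < 0 or codes.max() >= n_values:
        raise ValueError("codes must lie in [0, n_values)")
    out = np.zeros((codes.size, n_values), dtype=np.int8)
    out[np.arange(codes.size), codes] = 1  # exactly one 1 per row, in its value's column
    return out

# Hypothetical categorical column (e.g. a protocol type) already mapped to integers
protocol = [0, 2, 1, 2]
print(one_hot(protocol, n_values=3))
```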

Figure 8. Twenty-one valid categorical feature columns trained in our CICIoT2023 dataset with IDPS capability.

We then perform feature scaling, standardizing the independent features to a common range. Using a standard scaler (StandardScaler), each feature is transformed to zero mean and unit variance, so that all features contribute on a comparable scale. Figure 9 shows, as histograms, the numerical feature columns used to train on the CICIoT2023 dataset.
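Standardization replaces each value $x$ with $z = (x - \mu)/\sigma$, where $\mu$ and $\sigma$ are the feature's mean and standard deviation. A minimal NumPy sketch follows; estimating $\mu$ and $\sigma$ on the training data only, and leaving zero-variance features unscaled (as scikit-learn's `StandardScaler` does), are choices assumed here rather than taken from the paper. The feature values are hypothetical.

```python
import numpy as np

def fit_standardizer(X_train):
    # Per-feature mean and (population) standard deviation from training data only
    mu = X_train.mean(axis=0)
    sigma = X_train.std(axis=0)
    sigma[sigma == 0] = 1.0  # constant features: avoid division by zero
    return mu, sigma

def standardize(X, mu, sigma):
    # z = (x - mean) / std, applied column-wise
    return (X - mu) / sigma

# Hypothetical features: packet rate, mean packet size, and a constant flag
X_train = np.array([[1.0, 200.0, 5.0],
                    [2.0, 400.0, 5.0],
                    [3.0, 600.0, 5.0]])
mu, sigma = fit_standardizer(X_train)
Z = standardize(X_train, mu, sigma)
print(Z.round(3))
```

The same `mu` and `sigma` would then be reused to transform the validation/testing split, so no statistics leak from it into training.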

Figure 9. Ten numerical features of the trained dataset for detecting and preventing intrusion.

4. Training and Validating/Testing the Dataset

4.1. Training and Validating the CICIoT2023 Dataset Using Our Proposed DLMIDPSM

After preprocessing the dataset obtained from our proposed IDPST, we used it to train a model. Training used our proposed DLMIDPSM sequential model, which employed more than 10 dense layers in addition to the input and output layers. The model used the ReLU and softmax activation functions (softmax producing the class probabilities at the output) to classify traffic from the CIC IoT devices in the CICIoT2023 dataset.
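Softmax turns the output layer's raw scores (logits) into class probabilities that sum to one. A minimal NumPy sketch of a numerically stable version follows; the three-class logits are hypothetical.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis.

    Subtracting the row maximum leaves the result unchanged (softmax is
    shift-invariant) but prevents overflow in exp for large logits.
    """
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical output-layer logits: 2 records x 3 classes (e.g. Benign, DDoS, Mirai)
logits = np.array([[2.0, 1.0, 0.1],
                   [1000.0, 1000.0, 999.0]])
probs = softmax(logits)
print(probs.round(4))
print(probs.argmax(axis=1))  # predicted class index per record
```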

The intrusion attacks aimed to infect the CICIoT2023 dataset devices and form botnets capable of flooding target victim devices. Left unmitigated, these attacks would disrupt the IoT devices and cause them to malfunction, preventing them from serving their intended applications.

4.2. Discussion on Validating the CICIoT2023 Dataset Utilizing the DLMIDPSM

We validated/tested the trained model with our proposed DLMIDPSM for 80 epochs. The complete model reached an accuracy of 85% for both training and validation after 80 epochs, as shown in Figure 10, with a validation loss of only 1% after 80 epochs, as shown in Figure 11. The pre-trained model, covering the large-scale IoT devices, was then finalized and optimized on the benchmark dataset, balancing feasible computational cost against the maximum accuracy obtained with our proposed DLMIDPSM model. This completed the treatment of the intrusion attacks initially present in the CICIoT2023 dataset.

Figure 10. Complete model accuracy in 80 epochs.

Figure 11. Complete model accuracy loss vs. number of epochs.

We further interpreted the accuracy plots of the complete model, the most simple model, and the no-regularization model in Figure 12, Figure 13, Figure 14 and Figure 15. Figure 12 and Figure 13 show the most simple model: its training and validation accuracy dropped to 78% at 80 epochs, with a training and validation loss of 5% at 80 epochs. Figure 14 and Figure 15 show the no-regularization model, which reached a training and validation accuracy of 82% with a loss of 1% at 80 epochs. These figures and values show the progress and capability of our proposed deep neural network DLMIDPS. Based on Table 1, the learning curves indicate well-fitted models: the training loss decreases to a point of stability, and the validation loss also decreases to a point of stability with a small gap to the training loss. The code is available at https://www.kaggle.com/code/samerskine/seguridad-informatica-iot (a Kaggle account is required to access the link).
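The best-fit criterion stated above (both losses flatten out and the validation loss stays close to the training loss) can be expressed as a simple check on the loss histories. The sketch below is a heuristic with assumed thresholds (`window`, `plateau_tol`, `gap_tol`), not a procedure from the paper, and the loss values are hypothetical.

```python
def diagnose_fit(train_loss, val_loss, window=5, plateau_tol=1e-3, gap_tol=0.05):
    """Heuristic reading of learning curves (all thresholds are assumptions).

    'overfit'         : final validation loss well above final training loss.
    'good fit'        : both curves have plateaued and the final gap is small.
    'still improving' : otherwise (at least one curve is still moving).
    """
    def plateaued(curve):
        recent = curve[-window:]
        return max(recent) - min(recent) < plateau_tol

    gap = val_loss[-1] - train_loss[-1]
    if gap > gap_tol:
        return "overfit"
    if plateaued(train_loss) and plateaued(val_loss):
        return "good fit"
    return "still improving"

train = [0.5, 0.3, 0.2, 0.15, 0.12, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]
val_good = [0.55, 0.35, 0.25, 0.2, 0.16, 0.13, 0.13, 0.13, 0.13, 0.13, 0.13]
val_over = [0.55, 0.35, 0.25, 0.22, 0.24, 0.27, 0.29, 0.30, 0.30, 0.30, 0.30]
print(diagnose_fit(train, val_good), "|", diagnose_fit(train, val_over))
```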

Table 1. Showing the training and validation accuracies and their corresponding losses of the DLMIDPS models.

MOST_SIMPLE_MODEL

Figure 12. Most simple model accuracy vs. number of epochs.

Figure 13. Most simple model accuracy loss vs. number of epochs.

NO_REGULARIZATION_MODEL

Figure 14. No regularization model accuracy vs. number of epochs.

Figure 15. No regularization model accuracy loss vs. number of epochs.

4.3. Intrusion Prevention System (IPS) Deep Learning Approach

We use our proposed DLMIDPSM to evaluate the intrusion prevention system (IPS) side of the method, building on the training and validation results and on the data captured for analysis through our proposed IDPST. The IDPST performs intrusion prevention with Linux command operations such as iptables: depending on the prevention method required for a specific intrusion attack, the DLMIDPSM can block data packets with iptables (as observed in our Algorithm 2). The IPS method employs our proposed DLMIDPS model and is designed to prevent new intrusion attacks, including DDoS, DoS, MITM, Mirai, and Recon, as investigated in this research [53] and in the analysis of the CICIoT2023 dataset.

Our proposed DLMIDPSM passes each input through its classification stage to identify the intrusion attack and select the corresponding IPS operation. For example, for a DDoS or DoS attack, the detector identifies the sender's IP address, and all packets from that attacker IP are dropped or blocked (as shown previously in Algorithm 2). For MITM and Recon attacks, our DLMIDPSM IPS identifies the port number at which the attack occurs and blocks all packets to that port. If the detection stage outputs benign or normal traffic, no action is taken [54,55].
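The class-to-action policy described above can be sketched as a function that builds an iptables command for each predicted class. This is a minimal sketch: the `INPUT` chain, TCP as the default protocol, and treating Mirai floods like DDoS (the text names actions only for DoS/DDoS and MITM/Recon) are assumptions, and the commands are built here but not executed.

```python
import shlex

IP_BLOCK = {"DoS", "DDoS", "Mirai"}  # Mirai grouped with floods: an assumption
PORT_BLOCK = {"MITM", "Recon"}

def prevention_rule(attack, src_ip=None, dst_port=None, proto="tcp"):
    """Build the iptables argument list for one predicted attack class.

    DoS/DDoS (and, by assumption, Mirai): drop every packet from the attacker IP.
    MITM/Recon: drop packets addressed to the targeted port.
    Benign: no rule (returns None).
    """
    if attack == "Benign":
        return None
    if attack in IP_BLOCK:
        return ["iptables", "-A", "INPUT", "-s", src_ip, "-j", "DROP"]
    if attack in PORT_BLOCK:
        return ["iptables", "-A", "INPUT", "-p", proto,
                "--dport", str(dst_port), "-j", "DROP"]
    raise ValueError(f"no prevention policy for class {attack!r}")

for case in [("DDoS", "192.0.2.10", None), ("Recon", None, 22), ("Benign", None, None)]:
    rule = prevention_rule(*case)
    print(shlex.join(rule) if rule else "no action")
```

Applying a rule would mean passing the argument list to `subprocess.run(rule, check=True)` with root privileges; keeping rule construction separate from execution makes the policy easy to test.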

4.4. Deep Learning Multilayer Perceptron (DLMLP) Role Using Our Proposed DLMIDPSM Algorithm

The deep learning multilayer perceptron (DLMLP) employed in our proposed DLMIDPSM is a neural network with multiple layers of interconnected neurons [56]. We used this multilayer model to perform intrusion detection and prevention on the CICIoT2023 dataset. Our focus was on using the large-scale CICIoT2023 dataset with the DLMIDPSM to efficiently detect and prevent all forms of intrusion attacks in it, including DDoS, DoS, MITM, Mirai, and Recon, and to recognize benign traffic.

The DLMIDPSM learns nonlinear relationships in the CICIoT2023 dataset through backpropagation. This makes it flexible: a multilayer perceptron with nonlinear activations can approximate any continuous function on a closed, bounded input domain, given enough hidden units. However, this flexibility required careful training, validation, and parameter tuning to prevent overfitting. We therefore implemented our proposed DLMIDPSM algorithm (Algorithm 2 below), which uses feed-forward and backpropagation steps to update the weights so as to minimize the error between the predicted and actual outputs. Because the data were highly complex, we also used the DLMIDPSM to perform feature engineering on the CICIoT2023 dataset.

As shown in Algorithm 2 below, we created two classifiers, trained them on 80% of the CICIoT2023 dataset, and validated them on the remaining 20% of samples to evaluate their performance in identifying the features of the intrusion attacks [57,58,59].

Algorithm 2: Proposed DLMIDPSM Training, Classification, and Validation Algorithm

Input: X_train, X_validation (CICIoT2023 dataset selected/extracted features)
Output: Performance metrics and run time

Initialize CICIoT2023 dataset: (X_train, X_validation, with features_reduced [], and y_train, y_validation).
Initialize a list of deep learning and ML models: ([models in method [31] and method [32]])
Initialize classification functions: [binary, multiclass]
For the CICIoT2023 dataset:
For each model method:
For each classification function:
Start the timer for model training and validation.
Train the model on the training set.
Predict on the validation set.
Stop the timer and determine the runtime.
Determine the performance metrics: accuracy, precision, recall, and F1-score.
Store the performance metrics and runtime.
end for (classification function)
end for (model methods)
end for (CICIoT2023 IDPS dataset)
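The binary branch of Algorithm 2 can be sketched in Python. Because decision trees and SVMs would require an external library, two simple NumPy stand-ins (a majority-class baseline and a nearest-centroid classifier) take their place here; these models, the synthetic data, and treating label 1 as the malicious (positive) class are assumptions for illustration. A multiclass run would loop over a second label set and average per-class precision and recall.

```python
import time
import numpy as np

class MajorityClass:
    """Stand-in baseline: always predicts the most frequent training label."""
    def fit(self, X, y):
        values, counts = np.unique(y, return_counts=True)
        self.label_ = values[counts.argmax()]
        return self

    def predict(self, X):
        return np.full(len(X), self.label_)

class NearestCentroid:
    """Stand-in model: assigns each sample to the class with the closest mean."""
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.stack([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        d = ((X[:, None, :] - self.centroids_[None, :, :]) ** 2).sum(axis=2)
        return self.classes_[d.argmin(axis=1)]

def binary_metrics(y_true, y_pred, positive=1):
    # Accuracy, precision, recall and F1 with `positive` as the malicious class
    tp = np.sum((y_pred == positive) & (y_true == positive))
    fp = np.sum((y_pred == positive) & (y_true != positive))
    fn = np.sum((y_pred != positive) & (y_true == positive))
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return {"accuracy": float(np.mean(y_pred == y_true)), "precision": float(prec),
            "recall": float(rec), "f1": float(f1)}

def run_algorithm2(X_train, y_train, X_val, y_val, models):
    results = {}
    for name, model in models.items():
        start = time.perf_counter()            # start the timer
        model.fit(X_train, y_train)            # train on the training set
        y_pred = model.predict(X_val)          # predict on the validation set
        runtime = time.perf_counter() - start  # stop the timer
        results[name] = {**binary_metrics(y_val, y_pred), "runtime_s": runtime}
    return results

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (200, 4)), rng.normal(3, 1, (200, 4))])
y = np.array([0] * 200 + [1] * 200)  # synthetic: 0 = benign, 1 = malicious
idx = rng.permutation(len(y))
cut = int(0.8 * len(y))              # 80/20 split
tr, va = idx[:cut], idx[cut:]
results = run_algorithm2(X[tr], y[tr], X[va], y[va],
                         {"majority": MajorityClass(), "centroid": NearestCentroid()})
for name, r in results.items():
    print(name, {k: round(v, 3) for k, v in r.items()})
```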

4.5. Feature Extraction and Data Processing Description Using ML and DLMIDPSM Approach

To further assess the performance of our proposed DLMIDPSM, we also employed a machine learning (ML) approach using decision tree (DT) and support vector machine (SVM) methods. Previous researchers used the DT and SVM methods alongside deep learning in [31,32]; here, we focus on the DT and SVM evaluation metrics. We captured the CICIoT2023 dataset in two forms: CSV and pcap files. We used the CSV files to load the data into our proposed IDPST and generated the dataset from the different captured scenarios. The CSV files contain features extracted from the pcap files over a fixed-size packet window, that is, over a sequence of packets exchanged between hosts.

Figure 16a depicts the method used to produce the dataset. First, we executed intrusion attacks against the IoT devices; we then captured the generated data and extracted and labeled it. Finally, we processed the data so that other researchers can access and regenerate it easily. We then carried out the deep learning multilayer perceptron and ML evaluation (classification, training, and validation) with Algorithm 2 on the generated dataset.

Figure 16. (a) ML method to produce intrusion attacks in the CICIoT2023 dataset, (b) ML method to produce the intrusion attacks in the CICIoT2023 dataset.

Figure 16b shows how we generated, extracted, and labeled each intrusion attack scenario, including benign traffic scenarios, for the deep learning and other machine learning methods used in the analysis. We used different tools to execute intrusion attacks against the IoT devices in our proposed IDPST, captured the network traffic in pcap format using Wireshark, and labeled each capture according to the intrusion attack executed.

Figure 17 depicts the data processing steps. The network traffic data included all captured intrusion attacks and benign traffic. Because the traffic was very large (above 520 GB), we split it into small chunks of more than 8 MB each for parallel conversion. We used tcpdump [60] on our proposed IDPST to capture the intrusion packets from the IoT devices and analyze them for intrusion attacks in the CICIoT2023 dataset.
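Splitting a large capture into size-bounded chunks and converting them in parallel can be sketched as follows. In practice, tcpdump's `-C` option starts a new output file once the current one exceeds a size given in millions of bytes; the sketch instead works on an in-memory list of packets, and `convert_chunk` is a hypothetical placeholder for per-chunk pcap-to-CSV feature extraction. Threads keep the example portable; CPU-bound extraction would normally use `ProcessPoolExecutor`.

```python
from concurrent.futures import ThreadPoolExecutor

CHUNK_BYTES = 8_000_000  # 8 MB budget per chunk (tcpdump -C also counts 1,000,000-byte units)

def split_into_chunks(packets, chunk_bytes=CHUNK_BYTES):
    """Group consecutive packets into chunks whose total size stays within chunk_bytes.

    A single packet larger than the budget gets a chunk of its own.
    """
    chunks, current, size = [], [], 0
    for pkt in packets:
        if current and size + len(pkt) > chunk_bytes:
            chunks.append(current)
            current, size = [], 0
        current.append(pkt)
        size += len(pkt)
    if current:
        chunks.append(current)
    return chunks

def convert_chunk(chunk):
    # Hypothetical placeholder for pcap -> csv feature extraction on one chunk
    return {"packets": len(chunk), "bytes": sum(len(p) for p in chunk)}

packets = [b"x" * 1500] * 20_000  # synthetic: 20,000 full-size frames (about 30 MB)
chunks = split_into_chunks(packets)
with ThreadPoolExecutor(max_workers=4) as pool:
    rows = list(pool.map(convert_chunk, chunks))
print(len(chunks), sum(r["packets"] for r in rows))  # 4 20000
```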

Figure 17. Data processing and converting from pcap to csv.

Consequently, we executed a parallel procedure to extract all dataset features using the DPKT package [61] and stored them in independent csv files. Examples of these features are shown in Table 2 below.

Table 2. Intrusion attack features extracted in the IDPST network traffic.

We extracted these IoT network device features using state-of-the-art tools [62], and previous researchers have worked to validate these features. Further features can be extracted and engineered from the raw network traffic (the pcap files) using the scripts developed in this research.

We then grouped the values and counts of the intrusion attack features captured with a window size above 8 for each attack label (Uploading_Attack, Recon-PingSweep, BrowserHijacking, XSS, Backdoor_Malware, SqlInjection, CommandInjection, DictionaryBruteForce, DDoS-SlowLoris, DDoS-HTTP_Flood, VulnerabilityScan, DoS-HTTP_Flood, Recon-PortScan, Recon-OSScan, Recon-HostDiscovery, DNS_Spoofing, DDoS-ACK_Fragmentation, DDoS-UDP_Fragmentation, MITM-ArpSpoofing, DDoS-ICMP_Fragmentation, Mirai-greip_flood, Mirai-greeth_flood, BenignTraffic, DoS-SYN_Flood, DoS-TCP_Flood, DoS-UDP_Flood, DDoS-SynonymousIP_Flood, DDoS-RSTFINFlood, DDoS-PSHACK_Flood, DDoS-SYN_Flood, DDoS-TCP_Flood, DDoS-UDP_Flood, DDoS-ICMP_Flood), checking the packet counts against the data size and for inconsistencies (e.g., in the DDoS, DoS, and CommandInjection classes). We then computed statistical features over each window, including the mean (AVG), standard deviation (std), variance, covariance, weight, maximum (max), and minimum (min), using Pandas [63] and NumPy [64]. These detected intrusion attack features are shown as histograms in Figure 18. Finally, we combined all of the files into CSV format using Pandas, so that the final CSV datasets combine the features of every packet chunk.
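The per-window statistics named above can be sketched with NumPy over a sequence of packet lengths. Non-overlapping windows of 10 packets, packet length as the input signal, and dropping an incomplete final window are assumptions for illustration; the paper's exact windowing is not reproduced here.

```python
import numpy as np

def window_features(lengths, window=10):
    """Aggregate packet lengths over consecutive, non-overlapping windows.

    Returns one row per complete window: [mean, std, max, min, total].
    Trailing packets that do not fill a whole window are dropped.
    """
    lengths = np.asarray(lengths, dtype=float)
    n = len(lengths) // window
    w = lengths[: n * window].reshape(n, window)
    return np.column_stack([w.mean(axis=1), w.std(axis=1), w.max(axis=1),
                            w.min(axis=1), w.sum(axis=1)])

# Hypothetical trace: 10 small packets, 10 growing packets, 3 leftover packets
lengths = [60] * 10 + list(range(100, 1100, 100)) + [40, 40, 40]
feats = window_features(lengths)
print(feats.round(1))
```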

Figure 18. Intrusion attack categorical features extracted using IDPST network traffic.

4.6. Machine Learning (ML) and Deep Learning Multilayer Perceptron Evaluation (DLMLP)

We now describe how we train on the CICIoT2023 dataset with machine learning (ML) and DLMIDPSMs, evaluating intrusion attacks with detection, prevention, and classification methods. Figure 19 shows the ML and DLMIDPSM evaluation pipeline adopted in our data processing architecture, which converts pcap files to CSV. The pipeline combines and shuffles all malicious (intrusion) and benign traffic into one blended dataset. After this integration, we evaluate ML and DLMIDPSM performance on (i) multiclass classification with more than six attack groups (for example, DDoS, DoS, MITM, Mirai, and Recon intrusion attacks and vulnerability scans) and (ii) binary classification (malicious versus benign traffic), which is the main focus of this research. We divide the dataset into a training set (80%) and a validation/testing set (20%) and normalize the features with the StandardScaler method [65] before further processing.

Figure 19. The ML and DLMIDPSMs pipeline used in this research.

5. Experimental and Simulation Setup Analysis

We ran the simulations on high-performance hardware and software and observed the results to predict intrusion attacks. The experimental setup used an Intel Xeon W-1370 processor (Intel Corporation, Santa Clara, CA, USA) and an RTX3072 GPU (NVIDIA Corporation, Santa Clara, CA, USA) on the Windows 11 platform (Microsoft Corporation, Redmond, WA, USA) with 16 GB RAM and an SSD. We used Python (version 3.10) and Jupyter Notebook (version 7.3.3) to conduct the experiment. Our proposed method used a batch size of 256, the Adam optimizer, and the softmax and ReLU activation functions, and we ran between 25 and 80 epochs. For binary classification, we used the cross-entropy loss.
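The mini-batch Adam update used in training can be sketched on a small problem. The example below fits a logistic-regression classifier with binary cross-entropy, using batches of 256 and the standard Adam moment estimates and bias corrections; the model, the synthetic data, and the learning rate of 0.05 are assumptions chosen to keep the sketch short, not the DLMIDPSM configuration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logreg_adam(X, y, batch_size=256, epochs=20, lr=0.05,
                      beta1=0.9, beta2=0.999, eps=1e-8, seed=0):
    """Logistic regression trained with mini-batch Adam on binary cross-entropy."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    m = np.zeros_like(w)  # first-moment (mean) estimate of the gradient
    v = np.zeros_like(w)  # second-moment (uncentered variance) estimate
    t = 0
    for _ in range(epochs):
        order = rng.permutation(len(X))  # reshuffle every epoch
        for start in range(0, len(X), batch_size):
            idx = order[start:start + batch_size]
            xb, yb = X[idx], y[idx]
            p = sigmoid(xb @ w)
            grad = xb.T @ (p - yb) / len(idx)  # gradient of mean BCE
            t += 1
            m = beta1 * m + (1 - beta1) * grad
            v = beta2 * v + (1 - beta2) * grad ** 2
            m_hat = m / (1 - beta1 ** t)  # bias corrections
            v_hat = v / (1 - beta2 ** t)
            w -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return w

rng = np.random.default_rng(1)
X = np.column_stack([rng.normal(size=(1000, 2)), np.ones(1000)])  # 2 features + bias
y = (X[:, 0] + X[:, 1] > 0).astype(float)                          # synthetic labels
w = train_logreg_adam(X, y)
acc = float(np.mean((sigmoid(X @ w) > 0.5) == y))
print(round(acc, 3))
```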

The values and parameters used in the experimental setup are listed in Table 3. We present an extensive experiment examining the performance of our proposed DLMIDPSM algorithm and of the ML and DLMIDPSMs. The evaluation covered both deep learning-based and other machine learning-based classification models. The evaluation criteria were accuracy, precision, recall, and F1-score; the results obtained with these hyperparameter settings are shown in Table 4.

Table 3. Experimental simulation setup hyperparameters.

Table 4. Simulation figure values about precision/accuracy and f1-score results.

Furthermore, we present our proposed machine learning (ML) model, the deep learning multilayer perceptron (DLMLP) intrusion detection and prevention system (IDPS) models, and the combined machine learning and deep learning models, the ML and DLMIDPSMs. The ML and DLMIDPSMs performed strongly on the major attack classes in terms of precision, recall, and F1-score; Table 4 lists these values together with the support of each class. We also present the confusion matrix of the combined ML and DLMIDPSMs in Figure 20. Based on the confusion matrix, the models were most effective at detecting DDoS, DoS, and Mirai intrusion attacks, which represent real botnet attacks for intrusion detection.

Figure 20. Confusion matrix of proposed models’ binary classification between methods [31,32] that result from the DLMLP.
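To make the binary confusion matrix concrete, the following hedged sketch collapses the CICIoT2023 class labels used in Table 4 into benign and malicious and tallies TP, TN, FP, and FN, treating malicious traffic as the positive class. The helper names, the label mapping, and the positive-class choice are assumptions for illustration. The matrix uses the same layout as `sklearn.metrics.confusion_matrix`: rows are actual classes and columns are predicted classes.

```python
from collections import Counter


def to_binary(label):
    """Collapse a CICIoT2023 class label into the binary task (assumed mapping)."""
    return "Benign" if label == "BenignTraffic" else "Malicious"


def confusion_matrix(y_true, y_pred, labels):
    """Rows = actual class, columns = predicted class."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(actual, pred)] for pred in labels] for actual in labels]


def binary_counts(y_true, y_pred, positive="Malicious"):
    """Return (TP, TN, FP, FN), treating `positive` as the attack class."""
    tp = tn = fp = fn = 0
    for actual, pred in zip(y_true, y_pred):
        if pred == positive:
            if actual == positive:
                tp += 1
            else:
                fp += 1
        elif actual == positive:
            fn += 1
        else:
            tn += 1
    return tp, tn, fp, fn


actual = [to_binary(c) for c in ["BenignTraffic", "DDoS", "Mirai", "BenignTraffic", "DoS"]]
predicted = ["Benign", "Malicious", "Benign", "Malicious", "Malicious"]
print(confusion_matrix(actual, predicted, ["Benign", "Malicious"]))  # [[1, 1], [1, 2]]
print(binary_counts(actual, predicted))  # (2, 1, 1, 1)
```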

5.1. Performance Metric Evaluation Measurements Used

We compare the configurations of the ML and DLMIDPSMs using accuracy, precision, recall, and F1-score. In the definitions below, TP denotes true positives, TN true negatives, FP false positives, and FN false negatives.

Accuracy: the proportion of correct predictions in the given dataset, $\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$.
Precision: the ratio of correctly identified positive labels to all positive classifications, $\text{Precision} = \frac{TP}{TP + FP}$.
Recall: the ratio of correctly identified positive labels to all actual occurrences of those labels, $\text{Recall} = \frac{TP}{TP + FN}$.
F1-score: the harmonic mean of precision and recall, $F_1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$.
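The four definitions above translate directly into code. This is a minimal sketch; the function name and the choice to return 0.0 when a denominator is zero (mirroring scikit-learn's `zero_division=0` option, which is how a class with no correct predictions shows 0.00 in a classification report) are assumptions.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1-score from binary confusion counts.

    Zero denominators yield 0.0 (same convention as zero_division=0).
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Example: 90 TP, 50 TN, 10 FP, 30 FN gives precision 0.9, recall 0.75, F1 = 9/11.
print(classification_metrics(tp=90, tn=50, fp=10, fn=30))
```

Because F1 is a harmonic mean, it is pulled toward the smaller of precision and recall, which is why a class with high precision but low recall (such as DoS in Table 4) receives a modest F1-score.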

5.2. Binary Classification Result Analysis

For binary classification, we evaluate the best-trained model's precision, accuracy, and F1-score. We demonstrate this with our proposed combined machine learning and deep learning models with intrusion detection capabilities, the ML and DLMIDPSMs, which are applied and evaluated against the decision tree (DT) and support vector machine (SVM) methods [26]. The DT and SVM models are first tested and validated alongside the combined ML and DLMIDPSMs. The DT reaches a precision/accuracy of 74% at 100 epochs. Table 4 shows the simulation results for the performance metrics used.

We then modify the ML and DLMIDPSMs to investigate the SVM with a linear kernel function (state 1). For detecting botnet attacks such as Mirai, we validate this SVM model at 100 epochs and obtain a precision/accuracy of 83%. To improve on the precision and accuracy of the DT and SVM models, we develop the combined ML and DLMIDPSMs based on the methods of [31,32]. As Table 5 shows, the combined model reaches a precision of 99% in detecting Mirai attacks, compared with 74.63% for the DT and 83.06% for the SVM of [31], and a precision of 88% for DoS attacks, compared with 74.57% and 82.84%, respectively; for DDoS (82%) and Recon (71%), its precision is below that of the SVM baseline. These results suggest that the proposed model could serve as the basis of an IoT security solution for many Internet of Things applications, including intelligent transportation systems (ITS) for the Internet of Vehicles (IoV), smart health applications, and smart city development.
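The SVM configuration above is only partly specified (a linear kernel, validated at 100 epochs). As a hedged illustration of what a linear-kernel SVM computes on standardized flow features, the sketch below fits a soft-margin linear SVM with the Pegasos stochastic sub-gradient algorithm. Pegasos is a swapped-in solver, chosen here because it has a natural notion of epochs; the regularization strength `lam`, the epoch count, and the helper names are assumptions. In practice, scikit-learn's `SVC(kernel="linear")` or `LinearSVC` would normally be used instead.

```python
import numpy as np


def train_linear_svm(X, y, lam=0.01, epochs=20, seed=0):
    """Linear soft-margin SVM fitted with Pegasos (stochastic sub-gradient descent).

    X: (n, d) standardized features; y: 0 = benign, 1 = malicious.
    Returns (w, b); a flow is predicted malicious when X @ w + b > 0.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    Xb = np.hstack([X, np.ones((n, 1))])  # constant column acts as a (regularized) bias
    ys = np.where(y == 1, 1.0, -1.0)      # hinge loss uses labels in {-1, +1}
    w = np.zeros(d + 1)
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            t += 1
            eta = 1.0 / (lam * t)               # Pegasos step size
            violates = ys[i] * (Xb[i] @ w) < 1  # point inside the margin?
            w *= 1.0 - eta * lam                # shrink: L2 regularization term
            if violates:
                w += eta * ys[i] * Xb[i]        # hinge-loss sub-gradient step
    return w[:-1], w[-1]


def predict(w, b, X):
    return (X @ w + b > 0).astype(int)
```

The learned weight vector is directly interpretable: after standardization, features with large absolute weights contribute most to separating malicious from benign flows.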

Table 5. Values of precision of intrusion attack detection based on the combined model method used.

6. Conclusions

With the proliferation of IoT devices and of IoT datasets such as CICIoT2023, security solutions for protecting IoT devices are urgently needed, and such solutions call for redesigning IoT device datasets. We use the CICIoT2023 dataset and give it an IDPS capability by combining an intrusion detection system (IDS) and an intrusion prevention system (IPS) in the model proposed in this research, the DLMIDPSM, which is designed to detect and mitigate the intrusion attacks that IoT devices face today. In our method, we take the deep learning multilayer perceptron (DLMLP) [31], which provides IDPS capability, and deploy it on the large-scale CICIoT2023 dataset, optimized for IDPS; the resulting DLMIDPSM offers an efficient intrusion detection and prevention system (IDPS). Furthermore, our proposed IDPST model can use machine learning (ML) and DLMLP models, combined as the ML and DLMIDPSMs, to efficiently capture new intrusion attacks that emerge in the IoT devices of the CICIoT2023 dataset.

In addition, the ML and DLMIDPSMs are used in our proposed method to redesign IoT device datasets such as CICIoT2023 with IDPS capability, which we investigate in our proposed IDPST; they detect and prevent new intrusions in such datasets. Our proposed DLMIDPSM predicts intrusion attacks with an accuracy of over 85 percent, compared with 81 percent for the method in [31]. The combined ML and DLMIDPSMs reach a precision of 99% in detecting Mirai attacks, compared with 74.63% and 83.06% for the decision tree (DT) and support vector machine (SVM) models of method [31], respectively.

In future work, we plan to extend the proposed methods with ANN, CNN, RNN, and other DLMLP-based models to investigate security solutions for IoT device datasets. We will analyze performance metrics such as accuracy, precision, and F1-score to further validate the proposed IoT security method.

Funding

This research received no external funding at this time.

Data Availability Statement

Data are available upon request.

Acknowledgments

The author acknowledges that he is the sole contributor to this research.

Conflicts of Interest

The author declares no conflicts of interest.

References

1. Nataraj, B.; Duraisamy, P. An Investigation on Attacks in Application Layer Protocols and Ransomeware Threats in Internet of Things. In Proceedings of the 2023 9th International Conference on Advanced Computing and Communication Systems (ICACCS), Coimbatore, India, 17–18 March 2023; pp. 668–672.
2. Choudhary, V.; Tanwar, S.; Rana, A. Demystifying Security and Applications of Internet of Things. In Proceedings of the 2021 9th International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions) (ICRITO), Noida, India, 3–4 September 2021; pp. 1–5.
3. Wang, J.; Liu, Y.; Su, W.; Feng, H. A DDoS attack detection based on deep learning in software-defined Internet of things. In Proceedings of the 2020 IEEE 92nd Vehicular Technology Conference (VTC2020-Fall), Victoria, BC, Canada, 18 November–16 December 2020; pp. 1–5.
4. Villanueva-Miranda, I.; Nazeran, H.; Martinek, R. A Semantic Interoperability Approach to Heterogeneous Internet of Medical Things (IoMT) Platforms. In Proceedings of the 2018 IEEE 20th International Conference on e-Health Networking, Applications and Services (Healthcom), Ostrava, Czech Republic, 17–20 September 2018; pp. 1–5.
5. Mishra, A.R.; Vishwakarma, N.K.; Shukla, R.; Mishra, R. Internet of Things Application: E-health data acquisition system and Smart agriculture. In Proceedings of the 2022 10th International Conference on Emerging Trends in Engineering and Technology—Signal and Information Processing (ICETET-SIP-22), Nagpur, India, 29–30 April 2022; pp. 1–5.
6. Kumar, B.J.S.; Sinha, S. An Intrusion Detection and Prevention System against DOS Attacks for Internet-Integrated WSN. In Proceedings of the 2022 7th International Conference on Communication and Electronics Systems (ICCES), Coimbatore, India, 22–24 June 2022; pp. 793–797.
7. Selvaraj, S.; Sundaravaradhan, S. Challenges and opportunities in IoT healthcare systems: A systematic review. SN Appl. Sci. 2019, 2, 139.
8. Akkaş, M.; Sokullu, R.; Çetin, H.E. Healthcare and patient monitoring using IoT. Internet Things 2020, 11, 100173.
9. Zantalis, F.; Koulouras, G.; Karabetsos, S.; Kandris, D. A review of machine learning and IoT in intelligent transportation. Future Internet 2019, 11, 94.
10. Uma, S.; Eswari, R. Accident prevention and safety assistance using IOT and machine learning. J. Reliab. Intell. Environ. 2021, 8, 79–103.
11. Celesti, A.; Galletta, A.; Carnevale, L.; Fazio, M.; Lay-Ekuakille, A.; Villari, M. An IoT Cloud System for Traffic Monitoring and Vehicular Accidents Prevention Based on Mobile Sensor Data Processing. IEEE Sensors J. 2017, 18, 4795–4802.
12. Hassan, R.; Sagar, A.K.; Banda, L. Future Internet of Things: A Framework for Next Generation Smart Cities. In Proceedings of the 2021 IEEE 6th International Conference on Computing, Communication and Automation (ICCCA), Arad, Romania, 17–19 December 2021; pp. 106–112.
13. He, H. Research on the Application of Electronic Technology of Internet of Things in Smart City. In Proceedings of the 2020 International Conference on Intelligent Transportation, Big Data & Smart City (ICITBS), Vientiane, Laos, 11–12 January 2020; pp. 454–457.
14. Kumar, R.; Sharma, B. Comparative Analysis of Smart Cities based Architecture, Applications, Technologies, & Challenges in Internet of Things. In Proceedings of the 2023 6th International Conference on Information Systems and Computer Networks (ISCON), Mathura, India, 3–4 March 2023; pp. 1–5.
15. Al-Emran, M.; Malik, S.I.; Al-Kabi, M.N. A Internet of Things (IoT) survey in education: Opportunities and challenges. In Toward Social Internet of Things (IoT): Enabling Technologies, Architectures, and Applications; Springer: Berlin/Heidelberg, Germany, 2020; pp. 197–209.
16. Nayak, S.; Das, S.; Chakraborty, B.; Chakraborty, T.; Roy, K. Internet of Things (IoT) Based Continuous Growth Rate Monitoring System of Plant Stem. In Proceedings of the 2022 IEEE VLSI Device Circuit and System (VLSI DCS), Kolkata, India, 26–27 February 2022; pp. 275–279.
17. Shafique, K.; Khawaja, B.A.; Sabir, F.; Qazi, S.; Mustaqim, M. Internet of things (IoT) for next-generation innovative systems: A review of current challenges, future trends and prospects for emerging 5G-IoT scenarios. IEEE Access 2020, 8, 23022–23040.
18. Neto, E.C.P.; Dadkhah, S.; Ghorbani, A.A. Collaborative DDoS Detection in Distributed Multi-Tenant IoT using Federated Learning. In Proceedings of the 2022 19th Annual International Conference on Privacy, Security & Trust (PST), Fredericton, NB, Canada, 22–24 August 2022.
19. Kaur, B.; Dadkhah, S.; Xiong, P.; Iqbal, S.; Ray, S.; Ghorbani, A.A. Verification-based scheme to restrict iot attacks. In Proceedings of the 2021 IEEE/ACM 8th International Conference on Big Data Computing, Applications and Technologies (BDCAT’21), Leicester, UK, 6–9 December 2021; pp. 63–68.
20. Sharma, S.; Kaushik, B. A survey on internet of vehicles: Applications, security issues & solutions. Veh. Commun. 2019, 20, 100182.
21. Guerra, J.L.; Catania, C.; Veas, E. Datasets are insufficient: Challenges in labeling network traffic. Comput. Secur. 2022, 120, 102810.
22. Kalra, H. An E-Healthcare System Enhancement Via a Dynamic Cloud-Computing Platform. In Proceedings of the 2023 2nd International Conference on Futuristic Technologies (INCOFT), Karnataka, India, 24–26 November 2023; pp. 1–6.
23. Iqbal, W.; Abbas, H.; Daneshmand, M.; Rauf, B.; Bangash, Y.A. An In-Depth Analysis of IoT Security Requirements, Challenges, and Their Countermeasures via Software-Defined Security. IEEE Internet Things J. 2020, 7, 10250–10276.
24. Wurm, J.; Hoang, K.; Arias, O.; Sadeghi, A.-R.; Jin, Y. Security analysis on consumer and industrial IoT devices. In Proceedings of the 2016 21st Asia and South Pacific Design Automation Conference (ASP-DAC), Macao, China, 25–28 January 2016; pp. 519–524.
25. Nithya, R.; Sundari, J.A.; Kanna, B.R.; Balamurugan, M.S.; Sindhuja, R.; Srivastava, A. Multimodal Sensor Data Fusion Based Cyberattack Detection in Industrial Internet of Things Environment. In Proceedings of the 2023 7th International Conference on Electronics, Communication and Aerospace Technology (ICECA), Coimbatore, India, 22–24 November 2023; pp. 1656–1661.
26. Balamurugan, B.; Biswas, D. Security in the network layer of IoT: Possible measures to preclude. In Security Breaches and Threat Prevention in the Internet of Things; IGI Global: Hoboken, NJ, USA, 2017; pp. 46–75.
27. Safi, M.; Dadkhah, S.; Shoeleh, F.; Mahdikhani, H.; Molyneaux, H.; Ghorbani, A.A. A Survey on IoT Profiling, Fingerprinting, and Identification. ACM Trans. Internet Things 2022, 3, 26.
28. Elghalhoud, O.; Naik, K.; Zaman, M.; Goel, N. Data balancing and hyper-parameter optimization for machine learning algorithms for secure iot networks. In Proceedings of the 18th ACM International Symposium on QoS and Security for Wireless and Mobile Networks, Montreal, QC, Canada, 24–28 October 2022; pp. 71–78.
29. Abrishami, M.; Dadkhah, S.; Neto, E.C.P.; Xiong, P.; Iqbal, S.; Ray, S.; Ghorbani, A.A. Label Noise Detection in IoT Security based on Decision Tree and Active Learning. In Proceedings of the 2022 IEEE 19th International Conference on Smart Communities: Improving Quality of Life Using ICT, IoT and AI (HONET), Marietta, GA, USA, 19–21 December 2022; pp. 46–53.
30. Erfani, M.; Shoeleh, F.; Dadkhah, S.; Kaur, B.; Xiong, P.; Iqbal, S.; Ray, S.; Ghorbani, A.A. A feature exploration approach for IoT attack type classification. In Proceedings of the 2021 IEEE Intl Conf on Dependable, Autonomic and Secure Computing, Intl Conf on Pervasive Intelligence and Computing, Intl Conf on Cloud and Big Data Computing, Intl Conf on Cyber Science and Technology Congress (DASC/PiCom/CBDCom/CyberSciTech), Calgary, AB, Canada, 25–28 October 2021; pp. 582–588.
31. Krishna, A.; Lal, A.; Mathewkutty, A.J.; Jacob, D.S.; Hari, M. Intrusion Detection and Prevention System Using Deep Learning. In Proceedings of the 2020 International Conference on Electronics and Sustainable Communication Systems (ICESC), Coimbatore, India, 2–4 July 2020; pp. 273–278.
32. Neto, E.C.P.; Dadkhah, S.; Ferreira, R.; Zohourian, A.; Lu, R.; Ghorbani, A.A. CICIoT2023: A Real-Time Dataset and Benchmark for Large-Scale Attacks in IoT Environment. Sensors 2023, 23, 5941.
33. Adebayo, P.O.; Abdulahi, M.J.; Lawrence, O.M.; Ibrahim, Y.A.; Faki, S.A.; Hassan, B.A. An Artificial Intelligence-based Ensemble Technique for Intrusion Detection and Prevention in IoT Systems. In Proceedings of the 2024 International Conference on Science, Engineering and Business for Driving Sustainable Development Goals (SEB4SDG), Omu-Aran, Nigeria, 2–4 April 2024; pp. 1–6.
34. Manivannan, R. Improving IoT Security with AI-Powered Anomaly Detection and Intrusion Prevention. In Proceedings of the 2023 International Conference on Innovative Computing, Intelligent Communication and Smart Electrical Systems (ICSES), Chennai, India, 14–15 December 2023; pp. 1–5.
35. Adrian, R.; Okke, A.J.; Somardani, M.A.R.; Widiasari, T. Determination of Attack Points on IoT Devices based on Particle Swarm Optimization to Support Intrusion Prevention System. In Proceedings of the 2022 5th International Seminar on Research of Information Technology and Intelligent Systems (ISRITI), Yogyakarta, Indonesia, 8–9 December 2022; pp. 47–50.
36. Kauhsik, B.; Nandanwar, H.; Katarya, R. IoT Security: A Deep Learning-Based Approach for Intrusion Detection and Prevention. In Proceedings of the 2023 International Conference on Evolutionary Algorithms and Soft Computing Techniques (EASCT), Bengaluru, India, 20–21 October 2023; pp. 1–7.
37. Geetha, K.; Sreedevi, A.G.; Chadha, A.R. Unraveling IoT Network Security with Snort for Robust Intrusion Detection and Prevention. In Proceedings of the 2024 IEEE International Conference on Contemporary Computing and Communications (InC4), Bangalore, India, 15–16 March 2024; pp. 1–6.
38. Ramaiah, N.S.; Andrews, S.K.; Shenbagharaman, A.; Gowtham, M.S.; Bhaskar, B.; Tiwari, M. Enhancing IoT Security Through AI-Based Anomaly Detection and Intrusion Prevention. In Proceedings of the 2023 6th International Conference on Contemporary Computing and Informatics (IC3I), Gautam Buddha Nagar, India, 14–16 September 2023; pp. 1786–1790.
39. Ashish, K.; Manoj, K. Classification of Deep Learning methods in Intrusion Detection for IoT Devices. In Proceedings of the 2024 International Conference on Data Science and Network Security (ICDSNS), Tiptur, India, 26–27 July 2024; pp. 1–6.
40. Sushant, C.G.; Ajay, V.L.; Sahay, R. A Comparative Analysis of Deep Learning Algorithms for Intrusion Detection in IoT. In Proceedings of the 2024 International Conference on Emerging Techniques in Computational Intelligence (ICETCI), Hyderabad, India, 22–24 August 2024; pp. 402–407.
41. Devi, V.A.; Bhuvaneswari, E.; Tummala, R.K. Decentralized Hybrid Intrusion Detection System for Cyber Attack Identification using Machine Learning. In Proceedings of the 2023 International Conference on Data Science, Agents & Artificial Intelligence (ICDSAAI), Chennai, India, 21–23 December 2023; pp. 1–5.
42. Saba, T.; Sadad, T.; Rehman, A.; Mehmood, Z.; Javaid, Q. Intrusion Detection System Through Advance Machine Learning for the Internet of Things Networks. IT Prof. 2021, 23, 58–64.
43. Alghaithi, H.R.O.; Alshehhi, M.M.A.M.; Murugan, T. IoT Network Anomaly Detection Using Machine Learning and Deep Learning Techniques—Research Study. In Proceedings of the 2024 IEEE Students Conference on Engineering and Systems (SCES), Prayagraj, India, 21–23 June 2024; pp. 1–6.
44. Maithem, M.; Al-Sultany, G.A. Network intrusion detection system using deep neural networks. J. Phys. Conf. Ser. 2021, 1804, 012138.
45. Bhatia, V.; Choudhary, S.; Ramkumar, K.R. A Comparative Study on Various Intrusion Detection Techniques Using Machine Learning and Neural Network. In Proceedings of the 2020 8th International Conference on Reliability Infocom Technologies and Optimization (Trends and Future Directions) (ICRITO), Noida, India, 4–5 June 2020.
46. Keturahlee, C. An overview of Intrusion Detection and Prevention Systems. arXiv 2020, arXiv:2004.08967.
47. Kumari, A.; Mehta, A.K. A Hybrid Intrusion Detection System Based on Decision Tree and Support Vector Machine. In Proceedings of the 2020 IEEE 5th International Conference on Computing Communication and Automation (ICCCA), Greater Noida, India, 30–31 October 2020.
48. Ge, M.; Fu, X.; Syed, N.; Baig, Z.; Teo, G.; Robles-Kelly, A. Deep Learning-Based Intrusion Detection for IoT Networks. In Proceedings of the 2019 IEEE 24th Pacific Rim International Symposium on Dependable Computing (PRDC), Kyoto, Japan, 1–3 December 2019; pp. 256–25609.
49. Zachos, G.; Mantas, G.; Essop, I.; Porfyrakis, K.; Ribeiro, J.C.; Rodriguez, J. Prototyping an Anomaly-Based Intrusion Detection System for Internet of Medical Things Networks. In Proceedings of the 2022 IEEE 27th International Workshop on Computer Aided Modeling and Design of Communication Links and Networks (CAMAD), Paris, France, 2–3 November 2022; pp. 179–183.
50. Bock, L. Learn Wireshark: A Definitive Guide to Expertly Analyzing Protocols and Troubleshooting Networks Using Wireshark; Packt Publishing: Birmingham, UK, 2022.
51. Bhat, C.; Mane, S.B.; Bhatt, C.; Verma, G.; Naser, S.J.; Jweeg, M. Enhancement of Level of Security using Wireshark Through Continuous Monitoring and Detection System. In Proceedings of the 2024 4th International Conference on Advance Computing and Innovative Technologies in Engineering (ICACITE), Greater Noida, India, 14–15 May 2024; pp. 342–344.
52. Stančin, I.; Jović, A. An overview and comparison of free Python libraries for data mining and big data analysis. In Proceedings of the 2019 42nd International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO), Opatija, Croatia, 20–24 May 2019; pp. 977–982.
53. Oktivasari, P.; Zain, A.R.; Agustin, M.; Kurniawan, A.; Murad, F.A.; Anshor, M.F. Analysis of Effectiveness of Iptables on Web Server from Slowloris Attack. In Proceedings of the 2022 5th International Conference of Computer and Informatics Engineering (IC2IE), Jakarta, Indonesia, 13–14 September 2022; pp. 215–219.
54. Pande, P.; Mathur, H.; Gupta, L.K. Machine Learning-based Intrusion Detection System using Wireless Sensor Networks. In Proceedings of the 2024 Fourth International Conference on Advances in Electrical, Computing, Communication and Sustainable Technologies (ICAECT), Bhilai, India, 11–12 January 2024; pp. 1–10.
55. Li, J.; Othman, M.S.; Chen, H.; Yusuf, L.M. Optimizing IoT intrusion detection system: Feature selection versus feature extraction in machine learning. J. Big Data 2024, 11, 1–44.
56. Amato, F.; Mazzocca, N.; Moscato, F.; Vivenzio, E. Multilayer perceptron: An intelligent model for classification and intrusion detection. In Proceedings of the 2017 31st International Conference on Advanced Information Networking and Applications Workshops (WAINA), Taipei, Taiwan, 27–29 March 2017; pp. 686–691.
57. Sajid, M.; Malik, K.R.; Almogren, A.; Malik, T.S.; Khan, A.H.; Tanveer, J.; Rehman, A.U. Enhancing intrusion detection: A hybrid machine and deep learning approach. J. Cloud Comput. 2024, 13, 1–24.
58. Shi, G.; Hao, H.; Lei, J.; Zhu, Y. Application security system design of Internet of Things based on blockchain technology. In Proceedings of the 2021 International Conference on Computer, Internet of Things and Control Engineering (CITCE), Guangzhou, China, 12–14 November 2021; pp. 134–137.
59. Sharafaldin, I.; Lashkari, A.H.; Ali, A. Toward Generating a New Intrusion Detection Dataset and Intrusion Traffic Characterization. In Proceedings of the 4th International Conference on Information Systems Security and Privacy (ICISSP 2018), Madeira, Portugal, 22–24 January 2018.
60. Kadam, G.; Parekh, S.; Agnihotri, P.; Ambawade, D.; Bhavathankar, P. An Approach to Reduce Uncertainty Problem in Network Intrusion Detection Systems. In Proceedings of the 2020 IEEE 15th International Conference on Industrial and Information Systems (ICIIS), Rupnagur, India, 26–28 November 2020; pp. 586–590.
61. DPKT. Dpkt Documentation. 2022. Available online: https://dpkt.readthedocs.io/en/latest/ (accessed on 19 June 2023).
62. Ferrag, M.A.; Friha, O.; Hamouda, D.; Maglaras, L.; Janicke, H. Edge-IIoTset: A new comprehensive, realistic cyber security dataset of IoT and IoT applications for centralized and federated learning. IEEE Access 2022, 10, 40281–40306.
63. Bilal, M.; Ali, G.; Iqbal, M.W.; Anwar, M.; Malik, M.S.A.; Kadir, R.A. Auto-Prep: Efficient and Automated Data Preprocessing Pipeline. IEEE Access 2022, 10, 107764–107784.
64. Vadlamani, A.; Kalicheti, R.; Chimalakonda, S. APIScanner—Towards Automated Detection of Deprecated APIs in Python Libraries. In Proceedings of the 2021 IEEE/ACM 43rd International Conference on Software Engineering: Companion Proceedings (ICSE-Companion), Madrid, Spain, 25–28 May 2021; IEEE: Piscataway, NJ, USA, 2021.
65. Hakim, M.I.N.; Siswanto, J.; Qalban, A.A.; Nuryono, A.A. Increasing the Accuracy of Classification Models with a Scaler for Bus Rapid Transit (BRT) Reliability Values. In Proceedings of the 2024 International Conference on Electrical Engineering and Computer Science (ICECOS), Palembang, Indonesia, 25–26 September 2024; pp. 211–216.

Figure 1. Proposed deep learning multilayer perceptron intrusion detection and prevention system model (DLMIDPSM).

Figure 2. CIC IoT lab setting with IDPS capability [32].

Figure 3. Proposed IoT intrusion detection and prevention system topology (IDPST) experiment for a smart industrial office [32].

Figure 4. Number of rows and counts for each attack scenario.

Figure 5. Number of rows and counts for each intrusion attack category.

Figure 6. Valid categorical intrusion data.

Figure 7. Basic intrusion attack framework of the CICIoT2023 dataset [32].

Figure 8. Twenty-one valid categorical feature columns trained in our CICIoT2023 dataset with IDPS capability.

Figure 9. Ten numerical features of the trained dataset for detecting and preventing intrusion.

Figure 10. Complete model accuracy in 80 epochs.

Figure 11. Complete model accuracy loss vs. number of epochs.

Figure 16. (a) ML method to produce intrusion attacks in the CICIoT2023 dataset; (b) ML method to produce the intrusion attacks in the CICIoT2023 dataset.

Figure 17. Data processing and converting from pcap to csv.

Figure 18. Intrusion attack categorical features extracted using IDPST network traffic.

Table 1. Training and validation accuracies and corresponding losses of the DLMIDPS models.

DLMIDPS Models (Proposed)	Accuracy for Training/Validation (%)	Loss for Training/Validation (%)
Complete_model	85	2
No_regularization_model	80	1
Most_simple_model	78	0.5

Table 2. Intrusion attack features extracted in the IDPST network traffic.

Number	Feature	Description
1	Fin flag number	FIN (finish) flag value
2	Syn flag number	SYN (synchronize) flag value
3	Psh flag number	PSH (push) flag value
4	Ack flag number	ACK (acknowledgment) flag value
5	Ece flag number	ECE (explicit congestion notification echo) flag value
6	Cwr flag number	CWR (congestion window reduced) flag value
7	HTTP	Indicates if the application layer protocol is HTTP
8	HTTPS	Indicates if the application layer protocol is HTTPS
9	DNS	Indicates if the application layer protocol is DNS
10	Telnet	Indicates if the application layer protocol is Telnet
11	SMTP	Indicates if the application layer protocol is SMTP
12	SSH	Indicates if the application layer protocol is SSH
13	IRC	Indicates if the application layer protocol is IRC
14	TCP	Indicates if the transport layer protocol is TCP
15	UDP	Indicates if the transport layer protocol is UDP
16	DHCP	Indicates if the application layer protocol is DHCP
17	ARP	Indicates if the link layer protocol is ARP
18	ICMP	Indicates if the network layer protocol is ICMP
19	IPv	Indicates if the network layer protocol is IP
20	Flow duration	Duration of the packet flow
21	Header length	Length of the protocol header
22	Protocol Type
23	Duration	Time-to-Live (TTL)
24	Rate	Packet transmission rate in flow
25	Srate	Outbound packet transmission rate in flow
26	Drate	Inbound packet transmission rate in flow
27	Ack count	Number of packets with the ACK flag set in the same flow
28	Syn count	Number of packets with the SYN flag set in the same flow
29	Fin count	Number of packets with the FIN flag set in the same flow
30	Urg count	Number of packets with the URG flag set in the same flow
31	Rst count	Number of packets with the RST flag set in the same flow
32	Tot sum	Sum of packet lengths in flow
33	Min	Minimum packet length in flow
34	Max	Maximum packet length in flow
35	AVG	Average packet length in flow
36	Std	Standard deviation of packet length in flow
37	Tot size	Packet length summation in flow
38	IAT	Time difference from the previous packet
39	Magnitude	Average length of incoming packets in flow + average length of outgoing packets in flow
40	Radius	Variance of incoming packet lengths in flow + variance of outgoing packet lengths in flow
41	Covariance	Covariance of the lengths of incoming and outgoing packets
42	Variance	Variance of incoming packet lengths in flow / variance of outgoing packet lengths in flow
43	Weight	Number of incoming packets × number of outgoing packets
44	Number	Number of packets in flow
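Several statistical features in Table 2 can be computed directly from the packet lengths of a single flow. The sketch below follows the Table 2 descriptions literally; it is not the CICIoT2023 feature extractor. The function name, the use of population (rather than sample) variance, and returning 0.0 for the Variance ratio when outgoing packet lengths are constant are assumptions.

```python
from statistics import mean, pstdev, pvariance


def flow_features(incoming, outgoing):
    """Compute a subset of the Table 2 flow statistics from packet lengths.

    incoming / outgoing: non-empty lists of packet lengths (bytes) for one flow.
    Magnitude, Radius, Variance, and Weight follow the Table 2 descriptions
    literally; a real extractor may apply further transforms.
    """
    lengths = incoming + outgoing
    var_in, var_out = pvariance(incoming), pvariance(outgoing)
    return {
        "Tot sum": sum(lengths),
        "Min": min(lengths),
        "Max": max(lengths),
        "AVG": mean(lengths),
        "Std": pstdev(lengths),
        "Number": len(lengths),
        "Magnitude": mean(incoming) + mean(outgoing),
        "Radius": var_in + var_out,
        "Variance": var_in / var_out if var_out else 0.0,  # assumed guard
        "Weight": len(incoming) * len(outgoing),
    }


print(flow_features([100, 300], [150, 250]))
```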

Table 3. Experimental simulation setup hyperparameters.

Parameter	Value
Learning Rate	$10^{-3}$
Batch Size	256
Optimizer	Adam
Activation Functions	Softmax and ReLU
Number of Epochs	Max 100
Loss (Binary Classification)	Cross-entropy
GPU	RTX3072
Processor	Intel Xeon W-1370
Platform	Windows 11, 16 GB RAM, SSD
Language/Environment	Python, Jupyter Notebook
Classification Method	Binary
Libraries	pandas, scikit-learn, TensorFlow, Keras
Dataset Used	CICIoT2023 dataset

Table 4. Simulation figure values about precision/accuracy and f1-score results.

Class	Precision	Recall	F1-Score	Support
BenignTraffic	0.00	0.00	0.00	2,489
DDoS	0.82	0.99	0.90	76,498
DoS	0.88	0.39	0.54	18,053
MITM	0.00	0.00	0.00	706
Mirai	0.99	0.62	0.77	5,838
Recon	0.71	0.25	0.37	703
Accuracy			0.85	104,287
Macro avg	0.57	0.38	0.43	104,287
Weighted avg	0.81	0.83	0.80	104,287
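The average rows of Table 4 follow the conventions of scikit-learn's `classification_report`: the macro average is the unweighted mean over the six classes, and the weighted average weights each class by its support. The short check below recomputes both from the rounded per-class values copied from Table 4 (the helper name `averages` is illustrative) and reproduces the reported rows. The gap between the macro and weighted rows reflects class imbalance: BenignTraffic and MITM score 0.00 but account for little of the support.

```python
# Per-class (precision, recall, f1, support), copied from Table 4.
REPORT = {
    "BenignTraffic": (0.00, 0.00, 0.00, 2489),
    "DDoS": (0.82, 0.99, 0.90, 76498),
    "DoS": (0.88, 0.39, 0.54, 18053),
    "MITM": (0.00, 0.00, 0.00, 706),
    "Mirai": (0.99, 0.62, 0.77, 5838),
    "Recon": (0.71, 0.25, 0.37, 703),
}


def averages(report):
    """Return (macro, weighted, total_support) for precision, recall, F1."""
    rows = list(report.values())
    total = sum(r[3] for r in rows)
    macro = [sum(r[k] for r in rows) / len(rows) for k in range(3)]
    weighted = [sum(r[k] * r[3] for r in rows) / total for k in range(3)]
    return macro, weighted, total


macro, weighted, total = averages(REPORT)
print("macro   ", [round(x, 3) for x in macro])
print("weighted", [round(x, 3) for x in weighted])
print("support ", total)
```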

Table 5. Values of precision of intrusion attack detection based on the combined model method used.

Attack Class	Decision Tree [31] MLP Precision (%)	SVM [31] MLP Precision (%)	ML and DLMIDPSMs, Including [31,32] Precision (%)
Mirai	74.63	83.06	99.0
DoS	74.57	82.84	88.0
DDoS	73.90	83.31	82.0
Recon	74.66	83.56	71.0

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the author. Published by MDPI on behalf of the International Institute of Knowledge Innovation and Invention. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
