1. Introduction
During the last few decades, technological advancements have given rise to several innovative concepts, such as the Internet of Things (IoT), which has played an increasingly important role in a variety of areas, including smart cities, healthcare, and education. The IoT is a new technology paradigm envisioned as an ecosystem of interconnected “things” that aims to bring every physical device into the digital network [
1]. Connecting billions of devices through sensors, actuators, and other components, IoT is anticipated to have 75 billion machine-to-machine connections by 2025 and is expected to generate 79.4 zettabytes of data [
2]. With the tremendous amount of data flowing between IoT devices and across networks, protecting these networks against IoT breaches becomes a high priority. The number of IoT cyberattacks is rising sharply: according to Kaspersky, a leading anti-virus company, it increased by 100% in 2021. The limited memory and low cost of IoT devices make them even more challenging to secure [
3]. Among recent alarming incidents of IoT attacks is the Verkada breach [
4] of March 2021, in which a group of hackers managed to access and control thousands of Verkada surveillance cameras. The attackers could also access video recordings of more than 24,000 clients stored in the cloud and use the cameras to carry out future attacks. Another serious incident occurred in July 2021 [
5], when the storage of My Book Live and My Book Live Duo devices was completely erased. During this cyberattack against Western Digital, a critical security vulnerability allowed hackers to remotely perform a factory reset without a password. In such incidents, IoT devices have been targeted by a variety of malicious attacks, such as distributed denial of service (DDoS) attacks [
6], man-in-the-middle attacks [
7], spoofing attacks [
8], targeted code injection [
9], and other previously unseen types of attacks. Consequently, there is growing interest in making current Network Intrusion Detection Systems (NIDS) more effective at detecting new attacks [
10].
Network Intrusion Detection Systems (NIDS) [
11] are security tools that monitor network traffic flow to detect IoT attacks. They are designed to enhance the security of information and communication systems. NIDS can be categorized into signature-based [
12] and anomaly-based [
13] systems. A signature-based NIDS compares incoming traffic against a database of known attack signatures, which lets it detect known intrusions efficiently. An anomaly-based NIDS instead builds a profile of the network’s normal behavior and treats any deviation from that profile as an attack. Because it relies on machine learning to identify threats, it is well suited to spotting unknown attacks; however, as with any system that relies on ML, its actual accuracy may vary. The strengths of one approach are the weaknesses of the other, so some IDS combine the signature-based and anomaly-based approaches to benefit from both.
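The contrast between the two detection styles can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the signature entries, the use of a single byte-count feature, and the z-score rule that stands in for a trained machine learning model are not taken from any NIDS discussed in this paper.

```python
from statistics import mean, stdev

# Hypothetical signature database: (destination port, payload marker) pairs.
KNOWN_SIGNATURES = {(23, "mirai-login"), (80, "sql-union-select")}


def signature_alert(dst_port, payload_marker):
    """Flag a flow only if it matches a known attack signature."""
    return (dst_port, payload_marker) in KNOWN_SIGNATURES


def build_profile(normal_byte_counts):
    """Summarize benign traffic as (mean, standard deviation)."""
    return mean(normal_byte_counts), stdev(normal_byte_counts)


def anomaly_alert(byte_count, profile, z_threshold=3.0):
    """Flag a flow whose byte count deviates strongly from the normal profile."""
    mu, sigma = profile
    if sigma == 0:
        return byte_count != mu
    return abs(byte_count - mu) / sigma > z_threshold


# Hypothetical benign byte counts used to build the normal profile.
profile = build_profile([480, 500, 510, 495, 505, 520, 490])

print(signature_alert(23, "mirai-login"))  # known signature
print(signature_alert(23, "new-variant"))  # unseen attack slips past signatures
print(anomaly_alert(5000, profile))        # large deviation from the profile
```

The signature check misses anything absent from its database, whereas the profile-based check can flag unseen behavior at the cost of possible false alarms, which is the trade-off described above.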
Security threats are continuously evolving. Thus, NIDS detection models that rely on old datasets have clear limitations. Our paper utilizes a recently published (2021) large dataset with around 17 million data rows known as NF-ToN-IoT-v2 [
14]. While the dataset’s full feature set provides outstanding accuracy, we are interested in conducting binary- and multi-class classification using only a subset of the dataset’s 43 features. Reducing the number of features used can shorten prediction time, reduce the required storage, and enhance networks’ operational functionality. However, it can also reduce prediction accuracy. Thus, our focus was on finding the right balance. Our results were obtained using the entire dataset without any sampling.
As mentioned earlier, our dataset of choice, NF-ToN-IoT-v2, is an IoT dataset consisting of 43 features with a total of 16,940,496 data rows. Each data row is classified as an attack or benign. The attacks make up 63.99% of the dataset, while the benign samples represent 36.01%. Furthermore, the dataset contains nine different types of attacks, namely Backdoor, Denial of Service (DoS), Distributed Denial of Service (DDoS), Injection, Man in the Middle (MITM), Password, Ransomware, Scanning, and Cross-site Scripting (XSS) [
14]. Such attacks compromise the IoT system’s security by violating one or more of its CIA principles [
15] (Confidentiality, Integrity, and Availability).
Table 1 below shows the distribution of the 16,940,496 NF-ToN-IoT-v2 data rows over the nine attacks and the benign flow (ten classes in total) [
14]. The dataset consists of 43 features, as shown in
Table 2 [
16]. More details on the origins of NF-ToN-IoT-v2, its NetFlow features, its previous versions, and its uses are presented in
Section 2.
In this paper, we used this dataset to efficiently classify network flows as benign or attack (binary-classification) and to predict the exact type of attack (multi-classification). Our research goal was to achieve these tasks with high accuracy while using fewer features. In our assessment, we applied four shallow machine learning classifiers, namely Decision Tree (DT), Random Forest (RF), XGBoost (XGB), and Naïve Bayes (NB).
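The relationship between the two tasks can be sketched as a label mapping: the ten-class label serves multi-classification, and collapsing the nine attack types into a single class yields the binary label. The label strings below are assumptions for illustration and may not match the exact spelling used in the dataset files.

```python
# Assumed label spellings; the dataset's own strings may differ.
BENIGN = "Benign"
ATTACK_CLASSES = (
    "Backdoor", "DoS", "DDoS", "Injection", "MITM",
    "Password", "Ransomware", "Scanning", "XSS",
)


def to_binary_label(attack_type):
    """Map a ten-class label to 0 (benign) or 1 (any attack)."""
    if attack_type == BENIGN:
        return 0
    if attack_type in ATTACK_CLASSES:
        return 1
    raise ValueError(f"unknown class label: {attack_type!r}")


multi_class_labels = ["Benign", "DDoS", "XSS", "Benign", "Scanning"]
binary_labels = [to_binary_label(label) for label in multi_class_labels]
print(binary_labels)
```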
The paper contributes to the literature by utilizing a fairly large, recently published (2021) dataset in full, without any sampling. While other research has achieved high accuracy using an extensive set of features, our aim was to reduce the number of features while still achieving a highly accurate detection system. Fewer features result in a shorter prediction time and lower storage requirements, which translates into a better operational system.
The rest of the paper is divided as follows:
Section 2 reviews the literature;
Section 3 presents the methodology;
Section 4 explains how the features were chosen;
Section 5 covers the results and their analysis;
Section 6 concludes the paper.
2. Related Work
Our work is based on a dataset created by Sarhan et al. [
17]. Thus, we start this section by summarizing their highly influential work. In their intrusion detection work, Sarhan et al. [
14,
18,
19] have created and utilized several datasets of network traffic data, with a particular interest in using NetFlow features to standardize NIDS datasets [
14,
18]. NetFlow is a Cisco standard that collects traffic data as it flows across the network [
18]. The features extracted from NetFlow have proven valuable in detecting network attacks [
14]. In their early work, Sarhan et al. used pcap files of existing datasets to generate several datasets with a small number of NetFlow features (up to eight) [
18]. For example, NF-ToN-IoT was created using the publicly available ToN-IoT dataset [
20]. The binary-classification results of NF-ToN-IoT showed a slight improvement over ToN-IoT with an F1-score of 1.00 compared to 0.99 in the original dataset [
18]. However, the weighted-average F1-score for multi-classification dropped from 0.87 to 0.60. Both datasets contain 10 classes. An Extra Trees ensemble classifier was used to assess both datasets. In later work, Sarhan et al. expanded the number of NetFlow features used to 43 and generated several datasets, including NF-ToN-IoT-v2 [
14]. Using 43 features yielded a significant improvement over the two previous versions. For example, the second version achieved an F1-score of 1.00 in binary-classification and a weighted-average F1-score of 0.98 in multi-classification.
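For readers unfamiliar with the weighted-average F1-score quoted above, the following Python sketch shows the standard definition: the per-class F1-scores are averaged with each class weighted by its number of true samples (its support). The labels and predictions are toy values chosen for illustration and are not drawn from any of the cited datasets.

```python
from collections import Counter


def f1_per_class(y_true, y_pred):
    """Return {label: F1} using F1 = 2*TP / (2*TP + FP + FN)."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def weighted_f1(y_true, y_pred):
    """Average the per-class F1-scores, weighted by class support."""
    support = Counter(y_true)
    scores = f1_per_class(y_true, y_pred)
    return sum(scores[label] * n for label, n in support.items()) / len(y_true)


# Toy example: four benign flows and two DoS flows, one benign flow misclassified.
y_true = ["Benign", "Benign", "Benign", "Benign", "DoS", "DoS"]
y_pred = ["Benign", "Benign", "Benign", "DoS", "DoS", "DoS"]
print(round(weighted_f1(y_true, y_pred), 4))
```

Because the average is weighted by support, a weak score on a rare class lowers the weighted average less than it would lower an unweighted (macro) average, which matters for imbalanced ten-class data.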
Beyond the work of the dataset creators, Le et al. proposed a binary and multi-class detection model using 20 features from NF-ToN-IoT-v2 [
21]. The authors reported an F1-score of 1.00 and an area under the curve (AUC) value of 93%. A higher AUC means a better distinction between the classes. Our approach achieved similar results using fewer features (13 for binary classification and 17 for multi-classification). Furthermore, our proposed method resulted in an AUC of 97.7%.
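The statement that a higher AUC means a better distinction between the classes follows from the rank interpretation of the metric: the AUC equals the probability that a randomly chosen attack flow receives a higher score than a randomly chosen benign flow. The sketch below computes it that way on toy scores. The values are illustrative only, and the quadratic pairwise loop is for clarity and would be too slow for a dataset of millions of rows.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (attack, benign) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy labels (1 = attack, 0 = benign) and classifier scores.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores))
```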
In another paper, Sarhan et al. [
22] compared original datasets with their NetFlow feature-based (43 features) and CICFlowMeter feature-based (83 features) counterparts. CICFlowMeter is also a network traffic flow generator. The study concluded that the NetFlow-based feature set achieved higher accuracy with a lower prediction time.
Furthermore, in another recent study, Sarhan et al. analyzed the classification performance of three datasets using several ML models and feature extraction algorithms and concluded that no combination of ML model and feature extraction algorithm works best across all datasets [
19]. This conclusion motivated our work and our interest in the trade-off between accuracy and performance.
Dias et al. [
23] utilized Netflow data to detect attacks. In their work, they used 12 fixed Netflow features in addition to a set of dynamically defined and extracted features from the network data. They experimented with 52 to 412 features using CIC-IDS-2018 and a confidential military dataset. They achieved an F1-score of 0.97.
In another work, Liu et al. [
24] converted a 2013 one-dimensional NetFlow dataset into two-dimensional images and fed them to a Convolutional Neural Network (CNN). The authors reported an accuracy of 95.86%. However, they noted that the additional computational time needed to convert the data into images made the approach impractical in real-life scenarios.
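One simple way to turn a one-dimensional flow record into a two-dimensional image is sketched below. It is an illustrative assumption and not the procedure used by Liu et al.: the min-max scaling to the 0-255 range, the zero padding, and the function name `to_square_image` are all hypothetical choices.

```python
import math


def to_square_image(features, pad_value=0):
    """Scale a feature vector to 0-255 and reshape it into a square grid of rows."""
    if not features:
        raise ValueError("feature vector must not be empty")
    lo, hi = min(features), max(features)
    span = hi - lo
    # Min-max scaling; a constant vector maps to all zeros.
    scaled = [0 if span == 0 else round(255 * (x - lo) / span) for x in features]
    # Smallest side length whose square can hold every feature.
    side = math.isqrt(len(scaled) - 1) + 1
    padded = scaled + [pad_value] * (side * side - len(scaled))
    return [padded[i * side:(i + 1) * side] for i in range(side)]


# A 43-feature record (hypothetical values) becomes a 7x7 grid with 6 padded cells.
image = to_square_image(list(range(43)))
print(len(image), len(image[0]))
```

Even this small transformation adds a per-record step before classification, which is consistent with the overhead the authors reported.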
Other works that utilized Netflow in intrusion detection include [
25,
26]. In [
25], Krishnamurthy et al. proposed a framework that applies machine learning to NetFlow logs to predict attacks. The purpose of their framework is to provide analysts with a readable explanation of the logic behind the classification. In [
26], Haghighat et al. applied deep learning using 92 features extracted from NetFlow. They utilized a 2011 dataset called CTU-13 and reported an accuracy above 99%.
Further research in the domain is summarized below with the purpose of highlighting the methods adopted. Direct comparison of accuracy and speed is difficult, however, because these models were tested on different datasets using machines with different specifications.
In [
27], Abu Al-Haija and Al-Badawi implemented several ML models (ensemble, neural, and kernel) and assessed their ability to detect intrusions on IoT networks through anomaly detection. The designed NIDS was tested on two datasets, namely distilled-Kitsune-2018 and Network Security Laboratory-Knowledge Discovery Databases (NSL-KDD). Each dataset consisted of around 150,000 network traffic records. The authors concluded that ensemble methods achieved the highest accuracy, while neural network methods had the highest prediction speed. In an earlier work, Abu Al-Haija and Saleh achieved an accuracy of 98.2% in classifying IoT cyber-attacks using a Convolutional Neural Network model on the NSL-KDD dataset [
28].
In [
29], Verma and Ranga studied a system dedicated to detecting DoS attacks. The authors assessed the accuracy and performance of several models using the CIDDS-001, UNSW-NB15, and NSL-KDD datasets. The study concluded that Classification and Regression Trees (CART) and XGBoost are the most practical classifiers in terms of performance and prediction speed.
In [
30], the authors analyzed NIDS classification performance on a benchmark dataset, ISCX 2012, using the SVM classification algorithm. They also examined two major feature selection methods: Recursive Feature Elimination (RFE) and Recursive Feature Addition (RFA). The authors observed that RFE performs better with independent features, while RFA works better with interdependent features.
In [
31], the authors used several feature selection algorithms, such as the genetic algorithm, particle swarm optimization, firefly optimization, and gray wolf optimization, to analyze the performance of NIDS. Their model was evaluated using support vector machines (SVMs) and other machine learning classifiers, with the UNSW-NB15 dataset as input. The authors concluded that a system with fewer features would be more accurate.
6. Conclusions
Anomaly-based detection models are trained to learn normal behavior and flag suspicious behavior. Network traffic offers many features that can be used to train an IDS. For example, NetFlow, a networking industry-standard protocol, provides a rich set of fields (over 70 in version 9) that can be used for that purpose [
39]. A recent study proposed a standard set of 43 NetFlow features that resulted in high accuracy [
14]. NetFlow is a Cisco standard that provides a large amount of network data, which has proven highly effective for detecting malicious attacks. The 2021 study included the creation of a massive NetFlow-based dataset (NF-ToN-IoT-v2) with over 16 million records. Our goal was different: we wanted to utilize this new dataset and further reduce the number of features while maintaining high accuracy. Fewer features mean a shorter processing time, which is essential when deploying an anomaly-based detection system.
During our work, we examined several sets of features. We concluded that NetFlow features with importance values above the mean importance of all features are sufficient to predict malicious behavior. Indeed, 13 features were enough to perform binary classification with an F1-score of 1.00, and 17 features were adequate to achieve an F1-score of 0.98 in the case of multi-classification. These results are identical to those obtained using the 43 features [
14]. Furthermore, our approach reduced the prediction time by 38% and 40% in binary and multi-classification, respectively.
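The mean-importance threshold described above can be sketched in a few lines of Python. The feature names and importance values are hypothetical placeholders rather than values measured on NF-ToN-IoT-v2. In practice the importances would come from a fitted tree-based model, and scikit-learn's `SelectFromModel` with `threshold="mean"` offers a comparable built-in rule.

```python
from statistics import mean


def select_above_mean(importances):
    """Keep features whose importance exceeds the mean importance of all features."""
    threshold = mean(importances.values())
    ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)
    selected = [name for name, value in ranked if value > threshold]
    return selected, threshold


# Hypothetical importance values for five placeholder features.
importances = {
    "feat_a": 0.40,
    "feat_b": 0.30,
    "feat_c": 0.15,
    "feat_d": 0.10,
    "feat_e": 0.05,
}
selected, threshold = select_above_mean(importances)
print(selected, round(threshold, 2))
```

Because the threshold is derived from the importance values themselves, the number of retained features adapts to each task, which is consistent with the different counts reported for binary and multi-classification.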
Our study tested the performance using DT, RF, XGB, and NB. The results achieved by DT and RF were the highest. In our future work, we plan to assess the suitability of this threshold on other NetFlow-based datasets.