Article

IoT Multi-Vector Cyberattack Detection Based on Machine Learning Algorithms: Traffic Features Analysis, Experiments, and Efficiency

1 Computer Engineering and Information Systems Department, Khmelnytskyi National University, 29016 Khmelnytskyi, Ukraine
2 Department of Computer Systems, Networks and Cybersecurity, National Aerospace University “KhAI”, 61001 Kharkiv, Ukraine
* Authors to whom correspondence should be addressed.
Algorithms 2022, 15(7), 239; https://doi.org/10.3390/a15070239
Submission received: 19 June 2022 / Revised: 7 July 2022 / Accepted: 9 July 2022 / Published: 12 July 2022

Abstract

Cybersecurity is a major challenge for the Internet of Things. The lack of security in IoT devices has led to a great number of devices being compromised, with threats from both inside and outside the IoT infrastructure. Attacks on the IoT infrastructure result in device hacking, data theft, financial loss, instability, or even physical damage to devices. This requires the development of new approaches to ensure high security levels in IoT infrastructure. To solve this problem, we propose a new approach for IoT cyberattack detection based on machine learning algorithms. The core of the method is the analysis of the network traffic that IoT devices generate during communication. The proposed approach relies on a set of network traffic features that may indicate the presence of cyberattacks in the IoT infrastructure and of compromised IoT devices. Based on the obtained features, a feature vector is formed for each IoT device, and machine learning algorithms are employed to decide whether an attack is present. We assessed the complexity and time of machine learning algorithm implementation considering multi-vector cyberattacks on IoT infrastructure. Experiments were conducted to confirm the method’s efficiency. The results demonstrated that the network traffic feature-based approach allows the detection of multi-vector cyberattacks with high efficiency.

1. Introduction

1.1. Motivation

The Internet of Things is a concept that brings together many technologies and physical objects, namely devices that exchange data and interact over the internet, as well as the big data these devices generate. IoT devices vary in purpose and complexity, from wearables to intelligent devices in smart homes and critical infrastructure. The Internet of Things was designed to make many areas of human life more comfortable and safer. However, it brings not only increased comfort but also new cybersecurity challenges and problems [1,2].
Security issues in the Internet of Things infrastructure are determined by the specific features of its environment. One such feature is that an IoT infrastructure is often built from groups of devices with identical or similar technical characteristics. If one such device has a vulnerability, this homogeneity multiplies its impact [3,4,5].
Important issues include security weaknesses in the protocols used in the internet infrastructure, the use of unsafe network services such as Telnet and SSH, and vulnerabilities in routers and open ports. Because IoT devices can monitor and collect data, even specialized compromised devices with limited resources can be used to attack critical infrastructure systems, such as database servers. A vulnerability in an IoT device communication protocol can spread to other devices in the IoT infrastructure that use the same protocol [6].
Thus, vulnerabilities in the protocols used in an IoT network can have devastating effects on the entire IoT infrastructure. The severity of these effects depends on the environments in which the compromised IoT devices operate.
Moreover, in some cases, the deployment conditions of IoT devices make it difficult or impossible to reconfigure or upgrade them. Often, IoT devices cannot be upgraded because the manufacturer has discontinued support. This exposes the devices to new vulnerabilities and threats in the future, as the security mechanisms in place at deployment may become outdated. Technical support and management of smart IoT devices are therefore important long-term cybersecurity issues. Another specific problem is that the internal operation of a smart device and the data streams it generates may be unknown to the user. The situation is complicated by the constant availability of IoT devices on the network and by users’ lack of awareness of potential cybersecurity risks. This may lead to insecure default settings being kept on IoT devices, direct connections of devices to the internet, the use of obsolete or unreliable devices, and weak passwords.
One important IoT cybersecurity risk is that the functionality of smart devices can be changed by the manufacturer, through firmware updates, without the user’s consent or knowledge. This creates a new vulnerability that can allow the smart device to partially change its functionality or to perform undesirable actions, such as collecting sensitive user data without the user’s knowledge.
However, the risks are not limited to data confidentiality. Attacks on IoT infrastructure can not only target compromised devices to steal sensitive data or cause financial losses but also disrupt or damage IoT devices physically. Compromised IoT devices can even lead to the injuries or deaths of people who depend on these devices or work with them.
Thus, non-compliance with basic security requirements, by both manufacturers and users of smart devices, is the main cause of IoT cybersecurity problems. Common manufacturer-side causes of security breaches in IoT infrastructure are vulnerabilities in IoT device software, lack of support for automatic updates, lack of firmware updates, and insecure update mechanisms. This situation is often caused by manufacturers rushing to bring new smart devices to market. Vulnerabilities in software and web applications can lead to the theft of sensitive information or the spread of malicious firmware updates. Another common problem is the unsafe authentication methods provided by device manufacturers. These weaknesses, together with the heterogeneity of the IoT environment, make IoT devices more vulnerable than computers and servers on conventional networks. Vulnerable IoT components include the devices themselves, device software, and the communication channels of the IoT infrastructure. The main threats in IoT infrastructure are distributed denial of service (DDoS), disclosure of confidential information, falsification, spoofing, and elevation of privilege. Cybercriminals commonly use these threats as entry points for further criminal activities: infecting devices with malicious software, stealing sensitive data, or blocking network connections.
These factors contribute to a high probability of IoT devices being compromised, the spread of malicious software, and various multi-vector cyberattacks on IoT infrastructure (MVIA). Moreover, compromised IoT devices can be used as sources of attacks both inside and outside the IoT infrastructure.
Section 2 briefly analyzes modern ideas and methods for IoT cyberattack and malware detection, including their advantages and disadvantages.

1.2. Objectives and Contribution

The main objective of this work was to study whether multi-vector cyberattacks can be detected in the IoT infrastructure using flow analysis together with a deeper traffic analysis that takes IoT protocol features into account. The research aims to improve detection efficiency by using various machine learning algorithms. The proposed approach relies on a set of network traffic features that may indicate the presence of cyberattacks in the IoT infrastructure and of compromised IoT devices.
The novelty of this work is an approach to IoT multi-vector cyberattack detection that first analyzes flow-based features, which reduces detection time and makes the approach scalable. If the flow-based analysis cannot determine whether an attack is present, a deep analysis of the network traffic based on MQTT, DNS, and HTTP features is performed.
This paper is organized as follows. Section 2 presents the state-of-the-art. Section 3 describes the machine learning algorithms for cyberattack detection. Section 4 discusses the stages of the proposed IoT multi-vector cyberattack detection technique based on machine learning algorithms with traffic feature analysis. Section 5 describes the experiments and evaluates the efficiency of the proposed approach. Finally, we present our conclusions and directions for future research.

2. The State-of-the-Art

The scientific community is increasingly focused on cybersecurity problems. Solutions for detecting cyberattacks against Internet of Things infrastructure are widely presented in the literature [7,8]. Arguably, the most promising approaches for IoT cyberattack detection are based on machine learning algorithms (MLAs) [9,10,11,12,13].
To solve the cyberattack detection problem, the authors of [14] proposed an approach that analyzes IoT malware traffic. It is based on multilevel artificial intelligence and combines neural networks with binary visualization. In addition, the approach improves its efficiency by learning from misclassifications. It includes three main stages: collecting network traffic; applying binary visualization, which stores the collected traffic in ASCII and converts it into 2D images; and processing and analyzing the resulting binary images. The binary images are analyzed with TensorFlow, an end-to-end open-source machine learning platform that can find and classify patterns automatically. The main advantages of the tool are its support for retraining the system and for image recognition. The approach visualizes the collected traffic characteristics as an image (in the form of tiles, using the Binvis tool), and TensorFlow makes predictions from these images. Using graphic tiles makes it possible to determine the tile combination on which an image is based, so the needed objects can be detected regardless of their location within the image. The method protects IoT devices at the gateway level, circumventing the constraints of the IoT environment.
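The binary-visualization step of [14] can be illustrated with a short Python sketch that lays the bytes of a captured payload out as a square grayscale image, one byte per pixel in row-major order. This is a simplification under our own assumptions: Binvis offers other layouts and color schemes, and the function name `bytes_to_image` is hypothetical.

```python
import math


def bytes_to_image(data: bytes) -> list[list[int]]:
    """Lay bytes out row by row in the smallest square grid that holds them.

    Each byte value (0-255) becomes one grayscale pixel; unused cells at the
    end are padded with 0 (black). Row-major layout is an assumption.
    """
    n = len(data)
    side = math.isqrt(n)
    if side * side < n:
        side += 1
    side = max(side, 1)  # an empty input still yields a 1x1 image
    padded = data + bytes(side * side - n)
    return [list(padded[r * side:(r + 1) * side]) for r in range(side)]
```

For example, the 16 bytes of `b"GET / HTTP/1.1\r\n"` become a 4x4 image whose first row holds the byte values of `"GET "`.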
The authors of [15] presented a survey of experimental studies with a detailed analysis of a set of machine learning algorithms, including comparative data on how efficiently the algorithms detect anomalous behavior in IoT networks. The experimental results showed that the random forest algorithm is the most efficient on the datasets used. Nevertheless, the detection efficiency of all investigated algorithms was very close to that of random forest, and the choice of an appropriate algorithm sometimes depends on the nature of the analyzed data.
Article [16] is devoted to machine learning classifiers for botnet traffic analysis in the IoT environment. The dataset was built using nine IoT devices and covers several botnet attack types. To evaluate the efficiency of the approach, true positives, true negatives, false positives, false negatives, F1-score, accuracy, precision, and recall were used. The experimental results showed that the random forest algorithm produced the best results, while the support vector machine produced the worst. The main disadvantage of the approach is that it requires analyzing all features in the processed datasets.
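The evaluation measures listed for [16] are all derived from the four confusion-matrix counts. A minimal Python sketch of the standard definitions follows; returning 0.0 when a denominator is zero is our own convention, and the function name is hypothetical.

```python
def detection_metrics(tp: int, tn: int, fp: int, fn: int) -> dict[str, float]:
    """Standard binary-classification metrics from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    # Precision: share of flagged samples that are real attacks.
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall (detection rate): share of real attacks that were flagged.
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1: harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```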
An IoT cyberattack detection approach based on intelligent technologies is presented in [17]. The resulting intelligent system operates on a set of network features and reduces their number by ranking them with the correlation coefficient, the random forest algorithm, and the gain ratio. The three resulting feature sets are then combined by the proposed algorithm into an optimized feature set. For classification, the authors used the K-nearest neighbor, random forest, and XGBoost algorithms. All experiments were based on the NSL-KDD, BoT-IoT, and DS2OS datasets, and the detection efficiency of the proposed system was evaluated in terms of accuracy, detection rate, F1-score, and precision.
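One of the ranking criteria in [17] is the gain ratio. As a reminder of what it computes for a discrete feature (information gain about the label divided by the feature's own entropy), here is a minimal Python sketch. Continuous traffic features would first need discretization, the combination of the three rankings used in [17] is not reproduced, and the function names are hypothetical.

```python
import math
from collections import Counter


def entropy(values) -> float:
    """Shannon entropy (bits) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def gain_ratio(feature, labels) -> float:
    """Information gain of `feature` about `labels`, normalized by split information."""
    n = len(labels)
    # Conditional entropy H(label | feature).
    conditional = 0.0
    for value, count in Counter(feature).items():
        subset = [y for x, y in zip(feature, labels) if x == value]
        conditional += (count / n) * entropy(subset)
    info_gain = entropy(labels) - conditional
    split_info = entropy(feature)  # entropy of the feature itself
    return info_gain / split_info if split_info > 0 else 0.0
```

Features can then be ranked with `sorted(columns, key=lambda name: gain_ratio(columns[name], labels), reverse=True)`, assuming `columns` is a dictionary mapping feature names to value lists.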
An approach for IoT attack detection based on cloud technologies and software-defined networks (SDNs) is presented in [18]. It employs a decentralized two-layer SDN and can mitigate attacks in the wireless IoT infrastructure. Network traffic in each subnet domain is controlled by that domain’s predefined local controller. The core of the approach is a special controller, placed in the cloud environment, that is connected to the local controllers. Some local controllers also collect traffic from the investigated domains and extract features in order to determine whether a DDoS attack is present in the domain. Attack detection is based on the analysis of 155 features collected via the SPAN function of a Cisco switch; the obtained feature values are evaluated by detection modules placed in all local controllers. An extreme learning machine (ELM), a feed-forward neural network trained here with semi-supervised learning, serves as the decision-maker. The main advantage of the ELM is its reduced training time, since its initial parameters are selected randomly; as a result, the ELM also decreases the detection time. Each local controller additionally includes an attack mitigation module, and the local controllers can exchange data with each other and with the universal controller. The proposed mitigation technique includes a set of attack mitigation scenarios that can be applied in the wireless internet environment to different fixed devices.
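To show why an ELM trains quickly, here is a minimal NumPy sketch of a basic supervised ELM: the hidden-layer weights are random and fixed, and only the output weights are computed in closed form with a pseudoinverse. It does not reproduce the semi-supervised variant or the 155-feature setup of [18]; the class name, the tanh activation, and the ±1 label encoding (attack vs. normal) are our assumptions.

```python
import numpy as np


class ExtremeLearningMachine:
    """Single-hidden-layer feed-forward network trained in the ELM style (supervised)."""

    def __init__(self, n_hidden: int, seed: int = 0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X: np.ndarray) -> np.ndarray:
        # Fixed random projection followed by a tanh nonlinearity.
        return np.tanh(X @ self.W + self.b)

    def fit(self, X: np.ndarray, y: np.ndarray) -> "ExtremeLearningMachine":
        # Input weights and biases are drawn at random and are never trained.
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        # Only the output weights are learned, in closed form, as the
        # least-squares solution given by the Moore-Penrose pseudoinverse.
        self.beta = np.linalg.pinv(self._hidden(X)) @ y
        return self

    def predict(self, X: np.ndarray) -> np.ndarray:
        return self._hidden(X) @ self.beta
```

With ±1 targets, a predicted value above zero would be read as "attack" and below zero as "normal".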
The authors of [19] propose a deep learning-based intrusion detection system (DL-IDS) for IoT infrastructures. The approach comprises the following stages:
  • Network traffic analysis;
  • Data normalization, to reduce uncertainty in the obtained dataset;
  • Data similarity evaluation using the Minkowski distance, to account for missing values and to remove redundant and duplicate records from the dataset;
  • Replacement of missing feature values using the K-nearest neighbors in Euclidean distance, so that imputed values are averages over the nearest neighbors and classification is not biased toward more frequent entries;
  • Traffic feature selection with the spider monkey optimization algorithm, yielding the set of features that can indicate an intrusion into the IoT infrastructure;
  • Intrusion detection with a stacked-deep polynomial network that classifies incoming data as normal or abnormal.
The proposed approach can detect intrusions in the IoT environment, such as remote-to-local, DDoS, probing, and user-to-root attacks.
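Two of the preprocessing steps in [19], the Minkowski distance and nearest-neighbor imputation of missing values, can be illustrated with a short Python sketch. We assume missing values are marked as `None` and are imputed with the mean of the k nearest complete rows; with p = 2 the Minkowski distance reduces to the Euclidean distance used for imputation. The function names are hypothetical, and the sketch does not reproduce the full DL-IDS pipeline.

```python
def minkowski(a, b, p=2.0):
    """Minkowski distance of order p; p = 2 gives the Euclidean distance."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def knn_impute(rows, k=2, p=2.0):
    """Replace None entries with the column mean over the k nearest complete rows.

    Distances are computed only on the columns observed in the incomplete row.
    """
    complete = [r for r in rows if None not in r]
    if not complete:
        raise ValueError("need at least one row without missing values")
    result = []
    for row in rows:
        if None not in row:
            result.append(list(row))
            continue
        observed = [i for i, v in enumerate(row) if v is not None]
        nearest = sorted(
            complete,
            key=lambda c: minkowski([row[i] for i in observed], [c[i] for i in observed], p),
        )[:k]
        result.append([
            v if v is not None else sum(c[i] for c in nearest) / len(nearest)
            for i, v in enumerate(row)
        ])
    return result
```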
The study [20] investigates the use of machine learning algorithms for anomaly detection in Internet of Things infrastructures. The authors examined the effectiveness and main aspects of using several single algorithms, or combinations of them, for detection. Anomaly detection efficiency was measured with performance metrics such as false positives, false negatives, specificity, sensitivity, and overall accuracy. The experimental part of the study relies on the Friedman and Nemenyi tests, which made it possible to analyze the differences between the classifiers statistically. Another aspect of the research was the evaluation of the classifiers’ response time, for which a specific IoT infrastructure (as part of the implemented IDS) was employed. Based on the conducted experiments, the authors concluded that classification trees, regression trees, and extreme gradient boosting provided the most acceptable classification accuracy and response time.
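The Friedman test used in [20] ranks the k classifiers on each of N datasets and compares their mean ranks R_j via the statistic chi²_F = 12N / (k(k + 1)) · (Σ_j R_j² − k(k + 1)² / 4), which is compared against a chi-squared distribution with k − 1 degrees of freedom. A minimal pure-Python sketch of the statistic follows (rank 1 = best, ties receive the average rank); the Nemenyi post hoc test is not shown, and the function names are hypothetical.

```python
def average_ranks(scores):
    """Rank classifiers on one dataset (1 = highest score); ties share the mean rank."""
    order = sorted(range(len(scores)), key=lambda j: -scores[j])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are tied
        for t in range(i, j + 1):
            ranks[order[t]] = mean_rank
        i = j + 1
    return ranks


def friedman_statistic(table):
    """table[d][j] is the score of classifier j on dataset d."""
    n = len(table)     # number of datasets
    k = len(table[0])  # number of classifiers
    rank_rows = [average_ranks(row) for row in table]
    mean_ranks = [sum(r[j] for r in rank_rows) / n for j in range(k)]
    return 12 * n / (k * (k + 1)) * (sum(r * r for r in mean_ranks) - k * (k + 1) ** 2 / 4)
```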
An approach for cyberattack detection, the AD-IoT system, is presented in [21]. The proposed system is designed for smart city infrastructure and is based on the random forest machine learning algorithm. It aims to detect compromised IoT devices at distributed fog nodes. Normal and malicious behaviors of IoT devices are distinguished by monitoring and analyzing the network traffic of the fog nodes. This analysis determines whether attacks are detected at the fog level and informs the cloud security services of the results. The presented approach demonstrates sufficient detection efficiency and is applicable to smart city infrastructure.
An approach for DDoS attack detection is presented in [22]. It is based on a hybrid of the metaheuristic lion and firefly optimization algorithms (ML-F). The approach collects data and preprocesses it to remove noise and fill in missing values. Features are selected using recursive feature elimination (RFE). An important property of the technique is its ability to detect low-rate attacks using the hybrid ML-F optimization algorithm. Attacks are classified with a random forest classifier.
The article [23] introduces an IDS based on an ensemble voting classifier. The approach uses multiple classifiers as base learners, and the final prediction is obtained by voting over their predictions. To evaluate the efficiency of the approach, a set of IoT devices with different sensors (garage door, light motion, GPS sensor, fridge sensor, thermostat, Modbus, and weather) was employed. Multi-class attacks, such as XSS, ransomware, scanning, injection, DDoS, and backdoor, were used to verify the technique. The method was compared with a set of recent intrusion detection approaches in terms of accuracy, precision, recall, and F-score, as well as with machine learning algorithms such as decision tree, naive Bayes, random forest, and K-nearest neighbors. The experimental results demonstrated that the proposed approach achieves high detection efficiency.
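The approach in [23] forms its final prediction by voting over base classifiers. The voting rule is not specified here, so the sketch below assumes simple hard (majority) voting over class labels, with ties resolved in favor of the label first predicted in model order; `majority_vote` is a hypothetical name.

```python
from collections import Counter


def majority_vote(predictions):
    """Hard voting: predictions[m][i] is base model m's label for sample i."""
    n_samples = len(predictions[0])
    combined = []
    for i in range(n_samples):
        votes = Counter(model_preds[i] for model_preds in predictions)
        # most_common orders equal counts by first occurrence, i.e. model order.
        combined.append(votes.most_common(1)[0][0])
    return combined
```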
The authors of [24] propose a detection method for DoS/DDoS attacks against the IoT using machine learning. The approach aims to detect and apply the mitigation scenarios in the situation of DoS/DDoS attacks. To do this, the approach employs a multiclass classifier (“Looking back”). In addition, the ability of the technique to detect “malicious” packets makes it possible to apply mitigation measures against attacks that employ specific packet types.
The approach in [25] provides a botnet detection system for IoT devices. It relies on the local–global best bat algorithm, which tunes the hyperparameters and optimizes the weights of a neural network that processes botnet feature sets to distinguish malicious from benign network traffic. In the experimental part of the study, the Mirai and Gafgyt botnets were used to infect several commercial IoT devices, and the proposed algorithm was used to classify 10 botnet classes. The authors compared the efficiency of the algorithm with other approaches: the proposed botnet detection approach reached an accuracy of up to 90%, compared with 85.5% for BA-NN and 85.2% for PSO-NN.
The authors of [26] proposed a taxonomy of intrusion detection systems (IDSs) that uses data objects as its dimensions to summarize and classify machine learning- and deep learning-based IDSs. The survey clarifies the concept of IDSs and introduces the machine learning algorithms, metrics, and benchmark datasets frequently used in IDSs. IDSs applied to various data sources, i.e., logs, sessions, packets, and flows, were analyzed. The proposed taxonomy was presented as a baseline, together with key issues of IDSs that use machine learning and deep learning algorithms. Future developments and challenges of IDSs were also discussed.
The authors of [27] introduced a probabilistic-driven ensemble (PDE) approach. It combines several classification algorithms whose effectiveness is improved by applying a probabilistic criterion. The approach thus maximizes the likelihood of detecting intrusion events, regardless of the operational scenario, using several evaluation models, which makes it possible to distinguish ordinary events from events related to all classes of attacks. Experiments on real-world data show that the proposed ensemble approach detects intrusion events better than known solutions.
The authors of [28] presented a machine learning-based IDS whose feature reduction approach has two components: (1) an auto-encoder, as a deep learning instance, for dimensionality reduction; and (2) principal component analysis (PCA). The low-dimensional feature sets produced by both approaches were used to build different classifiers for the IDS, i.e., Bayesian network, random forest, linear discriminant analysis, and quadratic discriminant analysis. The experimental findings show better performance in terms of detection rate, false alarm rate, accuracy, and F-measure for both binary and multi-class classification. The approach reduces the feature dimensions of the CICIDS2017 dataset from 81 to 10 while maintaining high accuracy in both multi-class and binary classification.
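The PCA half of the feature-reduction scheme in [28] can be sketched in a few lines of NumPy: center the data, take the singular value decomposition, and project onto the leading principal directions. The 81-to-10 reduction in the test mirrors the numbers quoted above, but the random input matrix is purely illustrative, the function name is hypothetical, and the auto-encoder branch is not shown.

```python
import numpy as np


def pca_reduce(X: np.ndarray, n_components: int) -> np.ndarray:
    """Project samples (rows of X) onto the top principal components."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, sorted by decreasing singular value.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:n_components].T
```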
The objective of [29] was to apply various approaches for handling imbalanced datasets to design an effective IDS based on the CIDDS-001 dataset. The effectiveness of sampling methods on CIDDS-001 was studied and experimentally evaluated with random forest, deep neural network, variational autoencoder, voting, and stacking classifiers. The developed system detects attacks with high accuracy on an unbalanced class distribution while using a smaller number of samples. This makes the proposed system applicable to data classification problems that require merging data in real time.
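The sampling methods compared in [29] are not named here. As an illustration of the general idea only, the sketch below implements random oversampling, one of the simplest such methods: minority-class samples are duplicated at random until every class matches the majority count. The function name and the fixed seed are assumptions.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate random minority-class samples until all classes are equally frequent."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(samples), list(labels)
    for cls, count in counts.items():
        members = [x for x, y in zip(samples, labels) if y == cls]
        for _ in range(target - count):
            out_x.append(rng.choice(members))
            out_y.append(cls)
    return out_x, out_y
```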
The authors of [30] addressed cybersecurity problems such as the difficulty of distinguishing illegitimate activities from legitimate ones, which are highly heterogeneous and have similar characteristics. To solve this problem, they proposed a local feature engineering approach based on a data preprocessing strategy that reduces the number of network event patterns while improving their characterization. The main distinguishing feature of the approach is that it operates locally in the feature space of each single network event, which allows new features to be introduced and their values to be discretized. The experimental results showed that the proposed approach improves on the performance of known solutions.
The results of the machine learning algorithm efficiency analysis for detecting cyberattacks in the Internet of Things infrastructure are presented in Table 1.
The analysis of related works shows that most studies achieve good detection accuracy; nevertheless, their main disadvantage is that they do not cover most of the features that may indicate the presence of an attack.
Although the known approaches for detecting IoT cyberattacks demonstrate high efficiency, they have limitations: the inability to detect and respond to unknown (zero-day) attacks, low efficiency in detecting multi-vector attacks, a high level of false positives, a response time that is unacceptable for real-time systems, and the need for significant computing resources. Another important aspect is the need to select a minimal yet sufficient set of informative network traffic features that can indicate the presence of cyberattacks in the IoT infrastructure.
To summarize, there is a strong need for new methods of cyberattack detection in the IoT infrastructure that overcome the drawbacks of existing techniques and increase the efficiency of detecting both known and unknown cyberattacks.

3. Machine Learning Algorithms for Cyberattack Detection

This study uses five MLAs for IoT multi-vector cyberattack detection. They were chosen because they are the most widely used in recent research on efficient classification [15,16,17,20,22,30] and because of our own experience in using MLAs for cyberattack detection [11]:
  • Decision tree (DT) [31,32];
  • Random forest (RF) [33,34,35,36,37,38];
  • K-Nearest Neighbor (KNN) [39];
  • Extreme Gradient Boosting (XGBoost) [40];
  • Support Vector Machine (SVM) [41,42,43].
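Of the five algorithms, KNN is the simplest to show in a few lines. The pure-Python sketch below classifies a query vector by majority vote among its k nearest training vectors (Euclidean distance). The toy two-dimensional vectors and labels are made up for illustration, the function name is hypothetical, and in practice library implementations of all five algorithms would normally be used.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training vectors."""
    nearest = sorted(zip(train_x, train_y), key=lambda pair: math.dist(pair[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]
```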

4. IoT Multi-Vector Cyberattack Detection Based on Machine Learning Algorithms

4.1. Detection Steps

The approach for IoT cyberattack detection includes the following steps (Figure 1):
  • Traffic obtaining;
  • Grouping packets by type, source device, and time: the packets from each device are grouped by type, and the last N records are taken according to the connection time;
  • Feature extraction;
  • Feature classification based on the machine learning algorithm;
  • Producing the result.
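The grouping and feature-extraction steps above can be sketched in Python as follows. The `Packet` record, the choice of N, and the four window features (packet count, mean packet size, duration, packet rate) are hypothetical illustrations and not the features of Tables 2 to 5.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Packet:  # hypothetical minimal packet record
    timestamp: float  # seconds
    src_device: str
    proto: str        # packet type, e.g. "TCP", "UDP", "DNS", "MQTT", "HTTP"
    size: int         # bytes


def group_last_n(packets, n):
    """Group packets by (source device, type) and keep the n most recent in each group."""
    groups = defaultdict(list)
    for pkt in packets:
        groups[(pkt.src_device, pkt.proto)].append(pkt)
    return {key: sorted(pkts, key=lambda p: p.timestamp)[-n:] for key, pkts in groups.items()}


def window_features(window):
    """Hypothetical feature vector: count, mean size, duration, packet rate."""
    count = len(window)
    mean_size = sum(p.size for p in window) / count
    duration = window[-1].timestamp - window[0].timestamp
    # A single-timestamp window has no defined rate; 0.0 is our assumption.
    rate = count / duration if duration > 0 else 0.0
    return [count, mean_size, duration, rate]
```

Each resulting vector would then be passed to the classifier in the next step.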

4.2. Features Description

An important task is to speed up the detection of attack traffic. Early detection of attack traffic provides an opportunity to increase the security of the Internet of Things infrastructure, as it prevents the further spread of malicious software compromising not yet infected devices in the IoT infrastructure. Therefore, to speed up the detection of cyberattacks in the infrastructure, four types of features are involved:
  • Flow-based features;
  • MQTT-based features;
  • DNS-based features;
  • HTTP-based features.
Using only flow-based features (Table 2) speeds up attack detection, because such features can be extracted from flows and analyzed faster. When suspicious traffic behavior cannot be unambiguously classified as an attack, an in-depth traffic analysis is applied using MQTT-based (Table 3), DNS-based (Table 4), and HTTP-based (Table 5) features.
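As an illustration of what a DNS-based feature can look like, the sketch below computes the Shannon entropy of the query-name prefix in front of a tunnel's base domain. DNS tunneling tools such as iodine encode data in subdomain labels, which tends to raise this entropy. Whether Table 4 uses this exact feature is not stated here, so treat it as a hypothetical example; the function name and the base domain in the test are placeholders.

```python
import math
from collections import Counter


def label_entropy(qname: str, base_domain: str) -> float:
    """Shannon entropy (bits per character) of the part of qname before base_domain."""
    name = qname.rstrip(".").lower()
    base = base_domain.rstrip(".").lower()
    if name == base:
        prefix = ""
    elif name.endswith("." + base):
        prefix = name[: -(len(base) + 1)]
    else:
        prefix = name  # not under the base domain: use the whole name
    chars = prefix.replace(".", "")
    if not chars:
        return 0.0
    n = len(chars)
    return -sum(c / n * math.log2(c / n) for c in Counter(chars).values())
```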
This section has presented the four feature types used for multi-vector cyberattack detection in the IoT infrastructure. Flow-based features speed up attack detection through faster analysis and make the detection algorithm scalable, allowing high-bandwidth IoT traffic to be analyzed. Features based on deep packet analysis, in turn, improve detection accuracy in cases where flow-based features do not provide an unambiguous answer about the presence of a cyberattack, and they also make it possible to detect multi-vector attacks.
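The two-stage logic described above (fast flow-based classification, with a deep MQTT/DNS/HTTP analysis only for ambiguous cases) can be sketched as follows. The paper does not state how an "unambiguous" result is decided, so the probability band [low, high] and its default values are our assumptions, and `deep_proba` stands in for a hypothetical deep-analysis classifier that is called only when needed.

```python
def two_stage_detect(flow_proba, deep_proba, low=0.2, high=0.8):
    """Return (verdict, stage).

    flow_proba: attack probability from the flow-based classifier.
    deep_proba: zero-argument callable returning the deep-analysis attack
                probability; it is invoked only for ambiguous flow results.
    """
    if flow_proba >= high:
        return "attack", "flow"
    if flow_proba <= low:
        return "normal", "flow"
    # Ambiguous flow result: fall back to the slower deep packet analysis.
    return ("attack" if deep_proba() >= 0.5 else "normal"), "deep"
```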

5. Experiments

5.1. Evaluation Setting

To conduct the experiments, a Wi-Fi network of IoT devices was created. A Raspberry Pi 3 was configured as a middlebox acting as a Wi-Fi access point. A computer running Kali Linux in a virtual machine served as the source of malicious traffic for simulating DoS attacks, and a Raspberry Pi 2 with an Apache web server installed served as the victim. All devices were connected to the Wi-Fi access point.
Three IoT devices (router, thermostat, camcorder) were also connected to the Wi-Fi network. Normal traffic was obtained by simulating user interactions with the devices of the IoT network, such as streaming video from the camera and installing software updates on the connected IoT devices. Malicious traffic was obtained by simulating the most common classes of DoS attacks.
An HTTP GET flood attack was simulated with the GoldenEye tool [44]; TCP SYN and UDP floods were simulated with the hping3 utility in Kali Linux [45]. The iodine utility was used to perform DNS tunneling attacks [46].
Malicious and benign traffic was collected at the Wi-Fi access point using the Zeek tool [47], which powers network intrusion detection systems (IDSs) and security operations centers (SOCs). Zeek was used as a network traffic analyzer with a built-in classification engine.
In the collected DoS traffic samples, the source IP and MAC addresses were replaced with the IP and MAC addresses of the devices of the created IoT network. The sending times of the malicious packets were adjusted so that the overall collected IoT traffic replicated the combined activity of attacking and normally behaving devices.
Thus, the execution of DoS attacks of different types by each IoT device was simulated.
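The address substitution and re-timing described above can be sketched on already-parsed packet records. The `Record` type and the choice to preserve the original inter-packet gaps while shifting the trace to a new start time are our assumptions; the paper does not state exactly how the timing was modified, and rewriting real .pcap files would require a dedicated tool. The addresses in the test are documentation placeholders.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Record:  # hypothetical flattened packet record
    timestamp: float
    src_ip: str
    src_mac: str


def relabel_attack_trace(records, device_ip, device_mac, start_time):
    """Make an attack trace appear to come from an IoT device, starting at start_time.

    Inter-packet gaps of the original trace are preserved.
    """
    if not records:
        return []
    t0 = min(rec.timestamp for rec in records)
    return [
        replace(rec, src_ip=device_ip, src_mac=device_mac,
                timestamp=start_time + (rec.timestamp - t0))
        for rec in sorted(records, key=lambda rec: rec.timestamp)
    ]
```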

5.2. Dataset Description

For the experiments, traffic generated by the Mirai, Gafgyt, and Dark Nexus botnets was used, together with data from the UCI Machine Learning Repository and the DS2OS, Bot-IoT, N-BaIoT, CIDDS, UNSW-NB15, and NSL-KDD traffic datasets [48,49,50,51,52,53,54].
The DS2OS dataset contains traces gathered from the application layer of the IoT environment from devices such as movement sensors, light controllers, thermometers, batteries, thermostats, smart doors, etc. This dataset can be used to assess anomaly-based attack detection algorithms.
The UNSW-NB15 dataset contains data on nine types of attacks, such as Fuzzers, Analysis, Backdoors, DoS, Exploits, Generic, Reconnaissance, Shellcode, and Worms. A total of 49 features were extracted to describe these types of attacks.
The N-BaIoT dataset offers real-world IoT traffic data collected from nine IoT devices infected by Mirai and BASHLITE. The malicious data are divided into 10 attack classes, in addition to benign data, with 115 different features.
The Kitsune Network Attack Dataset contains nine network capture datasets in total, covering different types of attack traffic against the IoT infrastructure.
The BoT-IoT dataset was created by deploying a realistic IoT infrastructure network environment and it includes legitimate IoT network traffic as well as various types of attacks. The BoT-IoT includes DDoS and DoS for different protocols, OS scan, service scan, data exfiltration, and keylogging attacks.
The CIDDS and NSL-KDD datasets are built on network intrusion data describing “bad” connections, called intrusions (or attacks), and “good” (legitimate) connections. These datasets cover a wide range of intrusions and take user behavior scenarios into account.
The experiments used the set of traffic features present in the above-mentioned datasets for three IoT devices (a router, a thermostat, and a camcorder) infected by the Mirai, Gafgyt, and Dark Nexus botnets. This feature set corresponds to four types of attacks: TCP, UDP, HTTP GET, and DNS tunneling.
Because each dataset contains different samples and features, preprocessing and feature selection were performed by analyzing each file type and parsing it into the representation required for subsequent processing. The processed file types included .csv, .pcap, Argus, Zeek, and .txt files.
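A minimal sketch of parsing two of these file types into one common representation is shown below, assuming the target schema is the five flow identifiers f1 to f5 from Table 2. The helper names, the column mappings, and the UNSW-NB15-style CSV headers are illustrative assumptions; only the Zeek ASCII log conventions used here (a `#fields` header line, tab-separated values, and `-` for unset fields) reflect Zeek's default log output.

```python
import csv
import io

# Mappings onto a common flow schema (Table 2, f1-f5); both are assumptions.
ZEEK_CONN_MAP = {"proto": "proto", "id.orig_h": "src_ip", "id.resp_h": "dst_ip",
                 "id.orig_p": "src_port", "id.resp_p": "dst_port"}
# Column names modeled on UNSW-NB15-style CSV files (illustrative only).
CSV_MAP = {"proto": "proto", "srcip": "src_ip", "dstip": "dst_ip",
           "sport": "src_port", "dsport": "dst_port"}


def parse_zeek_tsv(text):
    """Parse a Zeek ASCII log: '#fields' names the columns, other '#' lines are metadata."""
    fields, rows = None, []
    for line in text.splitlines():
        if line.startswith("#fields"):
            fields = line.split("\t")[1:]
        elif line.startswith("#") or not line.strip():
            continue
        elif fields is None:
            raise ValueError("data line before '#fields' header")
        else:
            # Zeek writes '-' for unset fields; map it to None.
            rows.append({k: (None if v == "-" else v)
                         for k, v in zip(fields, line.split("\t"))})
    return rows


def parse_csv(text):
    """Parse a CSV file with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def normalize(rows, mapping):
    """Keep only mapped columns, rename them, and cast ports to int."""
    out = []
    for row in rows:
        rec = {mapping[k]: v for k, v in row.items() if k in mapping}
        for key in ("src_port", "dst_port"):
            if rec.get(key) is not None:
                rec[key] = int(rec[key])
        out.append(rec)
    return out


zeek_log = (
    "#separator \\x09\n"
    "#fields\tts\tuid\tid.orig_h\tid.orig_p\tid.resp_h\tid.resp_p\tproto\n"
    "#types\ttime\tstring\taddr\tport\taddr\tport\tenum\n"
    "1650000000.1\tC1a2b3\t192.168.1.20\t51000\t192.168.1.1\t53\tudp\n"
)
csv_text = "srcip,sport,dstip,dsport,proto\n192.168.1.30,40000,10.0.0.5,80,tcp\n"

zeek_rows = normalize(parse_zeek_tsv(zeek_log), ZEEK_CONN_MAP)
csv_rows = normalize(parse_csv(csv_text), CSV_MAP)
```

Once every source is reduced to the same keys, the later feature extraction steps do not need to know which dataset a record came from.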
Mirai is well-known malware that infects IoT devices and turns them into bots in a remotely controlled network, a botnet. Its main negative impact is the ability to launch massive DDoS attacks, as well as to scan the internet for IoT smart devices based on the ARC processor. Such devices run a stripped-down version of Linux, and if their default credentials have not been changed, Mirai can log into them and execute malicious actions. In addition, Mirai uses its large number of hijacked IoT devices to spread further, and it is especially dangerous because it keeps mutating [55].
Gafgyt is a botnet that exploits the vulnerabilities of IoT devices and uses the infected devices to execute large-scale DDoS attacks. Moreover, Gafgyt uses known vulnerabilities (e.g., CVE-2017-17215 and CVE-2018-10561) to download next-stage payloads to compromised devices. New versions of the Gafgyt botnet include Mirai-based components for DDoS attacks: HTTP flooding, which sends a large number of HTTP requests to target servers to overwhelm them; UDP flooding, which sends specially crafted UDP packets to victim servers to exhaust them; TCP flooding; and STD attacks, which send a random string to a specified IP address [56].
Dark Nexus is an IoT botnet that launches DDoS attacks. It was designed to launch credential stuffing attacks against various kinds of IoT devices, such as video recorders, routers (D-Link, Dasan Zhone, ASUS), and thermal cameras [57].

5.3. Training and Testing

The proposed approach uses five ML algorithms (decision tree, random forest, K-nearest neighbor, extreme gradient boosting, and support vector machine) in order to compare their detection capabilities. All algorithms were trained and tested on the dataset split into 75% training data and 25% testing data.
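The 75%/25% protocol can be sketched with scikit-learn as below. The synthetic data from `make_classification` stand in for the real feature vectors, `GradientBoostingClassifier` stands in for extreme gradient boosting (reference [62] of this paper points to that scikit-learn class), and default hyperparameters are used rather than the tuned values of Tables 6 to 10. Stratified splitting and feature scaling for KNN and SVM are assumptions added here, not steps stated by the authors.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for per-device traffic feature vectors (an assumption).
X, y = make_classification(n_samples=1000, n_features=20, n_informative=8,
                           random_state=0)

# 75% training / 25% testing; stratify keeps the class ratio in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

models = {
    "DT": DecisionTreeClassifier(random_state=0),
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
    # KNN and SVM are distance/margin based, so features are standardized first.
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    # scikit-learn gradient boosting used here in place of XGBoost.
    "GB": GradientBoostingClassifier(random_state=0),
    "SVM": make_pipeline(StandardScaler(), SVC()),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

for name, acc in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: accuracy = {acc:.3f}")
```

The scores printed by this sketch reflect only the synthetic data and say nothing about the results reported in Tables 11 to 19.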
The BotGRABBER framework uses scikit-learn, an open-source machine learning library for Python [58]. Each MLA is configured through its own set of algorithm parameters; the optimal parameter values used are presented in Table 6, Table 7, Table 8, Table 9 and Table 10 [59,60,61,62,63].

5.4. Implementation Platform

The BotGRABBER framework was employed to perform feature extraction, feature classification based on machine learning algorithms, and result generation. It is a multi-vector protection system that can analyze both network and host activity. BotGRABBER is a tool not only for botnet detection but also for producing the security scenario of network reconfiguration required by the type of cyberattack performed by the detected botnet [11,13,43]. The tool includes units for traffic collection, packet processing, feature extraction, feature classification based on machine learning algorithms, and result generation. The feature classification unit of the framework is built on scikit-learn, a free machine learning library for the Python programming language [58].

5.5. Results

Experimental results are presented in Table 11, Table 12, Table 13, Table 14, Table 15, Table 16, Table 17, Table 18 and Table 19.
As examples, comparisons of the efficiencies of the different MLAs for Router/Mirai, Router/Gafgyt, and Router/Dark Nexus botnet detection (TCP attack, UDP attack, HTTP GET attack, and DNS tunneling) are presented in Figure 2, Figure 3 and Figure 4, respectively.
In this study, the random forest algorithm showed the highest detection level. The type of IoT device that was the source of the attack traffic did not affect the attack detection level.
Combining the proposed flow-based features with a deeper traffic analysis that takes IoT protocol features into account provided good detection levels for multi-vector attacks performed on the IoT infrastructure by different types of botnets.

6. Conclusions and Future Work

Flow-based traffic analysis makes it possible to detect malicious behavior without in-depth packet analysis. Meanwhile, packet content analysis makes it possible to decide whether intercepted traffic is attack traffic or normal traffic in cases where flow-based analysis does not give an unambiguous result. Attempting to cover as many features as possible that indicate the presence of attacks in the Internet of Things infrastructure has its weaknesses: such an approach requires time for in-depth analysis, and it scales poorly.
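The flow-first, content-second decision described above can be sketched as a two-stage classifier. This is a hypothetical illustration: `two_stage_verdict`, the ambiguity thresholds `low` and `high`, and the toy models are assumptions; in practice each stage would be a trained classifier such as those compared in Section 5.

```python
from typing import Callable, Sequence

FlowModel = Callable[[Sequence[float]], float]    # returns P(attack) from flow features
PacketModel = Callable[[Sequence[bytes]], float]  # returns P(attack) from payloads


def two_stage_verdict(flow_features: Sequence[float],
                      payloads: Sequence[bytes],
                      flow_model: FlowModel,
                      packet_model: PacketModel,
                      low: float = 0.2,
                      high: float = 0.8) -> str:
    """Flow analysis first; packet content analysis only for ambiguous flows."""
    p_flow = flow_model(flow_features)
    if p_flow >= high:
        return "attack"
    if p_flow <= low:
        return "normal"
    # Ambiguous band (low, high): fall back to the costlier content inspection.
    return "attack" if packet_model(payloads) >= 0.5 else "normal"


# Toy stand-in for a trained packet-content model (hypothetical rule).
packet_calls = []


def toy_packet_model(payloads: Sequence[bytes]) -> float:
    packet_calls.append(len(payloads))
    return 1.0 if any(len(p) > 200 for p in payloads) else 0.0


clear = two_stage_verdict([0.0], [], lambda f: 0.95, toy_packet_model)
ambiguous = two_stage_verdict([0.0], [b"x" * 300], lambda f: 0.5, toy_packet_model)
benign = two_stage_verdict([0.0], [b"x" * 300], lambda f: 0.05, toy_packet_model)
```

Only the ambiguous flow reaches the packet model, which is the point of the design: the expensive content analysis is reserved for cases the flow features cannot settle.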
Regarding the MLAs involved, the main experimental results showed that SVM demonstrated the worst results, while the RF algorithm demonstrated the best results.
In addition, using both flow-based features of IoT multi-vector cyberattacks and features based on the most commonly used IoT protocols allowed TCP, UDP, HTTP GET, and DNS tunneling attacks to be detected at approximately the same level.
In this paper, we reviewed the known approaches to detect attacks on the Internet of Things infrastructure based on machine learning and investigated their effectiveness. We investigated the possibility of detecting traffic attacks on the Internet of Things infrastructure based on flow analysis and the most commonly used IoT protocols, such as HTTP, MQTT, and DNS.
As malicious traffic, we used traffic from well-known botnets, such as Mirai, Dark Nexus, and Gafgyt, taken from well-known datasets that represent common attacks on Internet of Things infrastructures, such as TCP, UDP, HTTP GET, and DNS tunneling attacks.
In addition, attack traffic was generated using known utilities, and benign IoT traffic was collected from devices such as a router, a thermostat, and a camcorder.
The features presented in this work were extracted from the received traffic and classified using various machine learning methods.
The detection levels of multi-vector attacks on the Internet of Things infrastructure largely depend on the composition of the training and test samples and on the settings of the machine learning algorithms. This important aspect is the subject of further research.
Therefore, future work will focus on the following issues:
  • Using different Internet of Things protocols [64] to extract traffic features, which will improve the accuracy of attack detection in cases where flow-based analysis is insufficient;
  • Efficient ways to reduce the number of traffic features sufficient to detect attacks;
  • Development of ML-based methods for dependability assurance of IoT systems by combining attack and intrusion detection, redundancy, and recovery procedures [65].

Author Contributions

Data curation K.B. and V.K.; formal analysis S.L.; investigation K.B. and O.S.; methodology K.B. and S.L.; project administration V.K.; Software K.B.; supervision V.K.; validation K.B. and O.S.; visualization K.B. and S.L.; writing—original draft K.B. and S.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The datasets used in this study are publicly available [48,49,50,51,52,53,54].

Acknowledgments

This work was supported by the ECHO project, which has received funding from the European Union’s Horizon 2020 research and innovation program under grant agreement No. 830943. The authors appreciate the scientific society of the consortium for creative analysis and discussion during the preparation of this paper.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Nozomi Networks Labs. New OT/IoT Security Report: Trends and Countermeasures for Critical Infrastructure Attacks. Available online: https://www.nozominetworks.com/blog/new-ot-iot-security-report-trends-and-countermeasures-for-critical-infrastructure-attacks/ (accessed on 3 February 2022).
  2. Global Cyber Alliance. GCA Internet Integrity Papers: IoT Policy and Attack Report. Available online: https://www.globalcyberalliance.org/wp-content/uploads/IoT-Policy-and-Attack-Report_FINAL.pdf (accessed on 5 December 2021).
  3. Shaaban, A.M.; Chlup, S.; El-Araby, N.; Schmittner, C. Towards Optimized Security Attributes for IoT Devices in Smart Agriculture Based on the IEC 62443 Security Standard. Appl. Sci. 2022, 12, 5653.
  4. Seo, S.; Kim, D. IoDM: A Study on a IoT-Based Organizational Deception Modeling with Adaptive General-Sum Game Competition. Electronics 2022, 11, 1623.
  5. Makarichev, V.; Lukin, V.; Illiashenko, O.; Kharchenko, V. Digital Image Representation by Atomic Functions: The Compression and Protection of Data for Edge Computing in IoT Systems. Sensors 2022, 22, 3751.
  6. Bliss, D.; Garbos, R.; Kane, P.; Kharchenko, V.; Kochanski, T.; Rucinski, A. Homo Digitus: Its Dependable and Resilient Smart Ecosystem. Smart Cities 2021, 4, 514–531.
  7. Deorankar, A.V.; Thakare, S.S. Survey on Anomaly Detection of (IoT)- Internet of Things Cyberattacks Using Machine Learning. In Proceedings of the 2020 Fourth International Conference on Computing Methodologies and Communication (ICCMC), Erode, India, 11–13 March 2020; pp. 115–117.
  8. Hristov, A.; Trifonov, R.A. Model for Identification of Compromised Devices as a Result of Cyberattack on IoT Devices. In Proceedings of the 2021 International Conference on Information Technologies (InfoTech), Varna, Bulgaria, 16–17 September 2021; pp. 1–4.
  9. Lysenko, S.; Bobrovnikova, K.; Shchuka, R.; Savenko, O. A Cyberattacks Detection Technique Based on Evolutionary Algorithms. In Proceedings of the 2020 IEEE 11th International Conference on Dependable Systems, Services and Technologies (DESSERT), Kyiv, Ukraine, 14–18 May 2020; pp. 127–132.
  10. Lysenko, S.; Pomorova, O.; Savenko, O.; Kryshchuk, A.; Bobrovnikova, K. DNS-based Anti-evasion Technique for Botnets Detection. In Proceedings of the 8th IEEE International Conference on Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications, Warsaw, Poland, 24–26 September 2015; pp. 453–458.
  11. Savenko, B.; Lysenko, S.; Bobrovnikova, K.; Savenko, O.; Markowsky, G. Detection DNS Tunneling Botnets. In Proceedings of the 2021 IEEE 11th International Conference on Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications, Cracow, Poland, 22–25 September 2021; Volume 1, pp. 64–69.
  12. Lysenko, S.; Savenko, O.; Bobrovnikova, K. DDoS Botnet Detection Technique Based on the Use of the Semi-Supervised Fuzzy c-Means Clustering. CEUR-WS 2018, 2104, 688–695.
  13. Lysenko, S.; Bobrovnikova, K.; Matiukh, S.; Hurman, I.; Savenko, O. Detection of the botnets’ low-rate DDoS attacks based on self-similarity. Int. J. Electr. Comput. Eng. 2020, 10, 3651–3659.
  14. Shire, R.; Shiaeles, S.; Bendiab, K.; Ghita, B.; Kolokotronis, N. Malware Squid: A Novel IoT Malware Traffic Analysis Framework Using Convolutional Neural Network and Binary Visualisation. In Internet of Things, Smart Spaces, and Next Generation Networks and Systems; Springer: Cham, Switzerland, 2019; pp. 65–76.
  15. Elmrabit, N.; Zhou, F.; Li, F.; Zhou, H. Evaluation of machine learning algorithms for anomaly detection. In Proceedings of the 2020 International Conference on Cyber Security and Protection of Digital Services (Cyber Security), Dublin, Ireland, 15–19 June 2020; pp. 1–8.
  16. Bagui, S.; Wang, X.; Bagui, S. Machine Learning Based Intrusion Detection for IoT Botnet. Int. J. Mach. Learn. Comput. 2021, 11, 399–406.
  17. Kumar, P.; Gupta, G.P.; Tripathi, R. Toward design of an intelligent cyberattack detection system using hybrid feature reduced approach for IoT networks. Arab. J. Sci. Eng. 2021, 46, 3749–3778.
  18. Ravi, N.; Shalinie, S.M. Learning-driven detection and mitigation of DDoS attack in IoT via SDN-cloud architecture. IEEE Internet Things J. 2020, 7, 3559–3570.
  19. Otoum, Y.; Liu, D.; Nayak, A. DL-IDS: A deep learning-based intrusion detection framework for securing IoT. Trans. Emerg. Telecommun. Technol. 2019, 33, e3803.
  20. Verma, A.; Ranga, V. Machine learning based intrusion detection systems for IoT applications. Wirel. Pers. Commun. 2020, 111, 2287–2310.
  21. Alrashdi, I.; Alqazzaz, A.; Aloufi, E.; Alharthi, R.; Zohdy, M.; Ming, H. Ad-IoT: Anomaly Detection of IoT Cyberattacks in smart City Using Machine Learning. In Proceedings of the 2019 IEEE 9th Annual Computing and Communication Workshop and Conference (CCWC), Las Vegas, NV, USA, 7–9 January 2019; pp. 305–310.
  22. Krishna, E.S.; Thangavelu, A. Attack detection in IoT devices using hybrid metaheuristic lion optimization algorithm and firefly optimization algorithm. Int. J. Syst. Assur. Eng. Manag. 2021, 1–14.
  23. Mihoub, A.; Fredj, O.B.; Cheikhrouhou, O.; Derhab, A.; Krichen, M. Denial of service attack detection and mitigation for internet of things using looking-back-enabled machine learning techniques. Comput. Electr. Eng. 2022, 98, 107716.
  24. Khan, M.A.; Khan Khattk, M.A.; Latif, S.; Shah, A.A.; Ur Rehman, M.; Boulila, W.; Ahmad, J. Voting classifier-based intrusion detection for IoT networks. In Advances on Smart and Soft Computing; Springer: Singapore, 2022; pp. 313–328.
  25. Alharbi, A.; Alosaimi, W.; Alyami, H.; Rauf, H.T.; Damaševičius, R. Botnet attack detection using local global best bat algorithm for industrial internet of things. Electronics 2021, 10, 1341.
  26. Liu, H.; Lang, B. Machine learning and deep learning methods for intrusion detection systems: A survey. Appl. Sci. 2019, 9, 4396.
  27. Saia, R.; Carta, S.; Recupero, D.R. A Probabilistic-driven Ensemble Approach to Perform Event Classification in Intrusion Detection System. In Proceedings of the 10th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management, Seville, Spain, 18–20 September 2018; pp. 141–148.
  28. Abdulhammed, R.; Musafer, H.; Alessa, A.; Faezipour, M.; Abuzneid, A. Features dimensionality reduction approaches for machine learning based network intrusion detection. Electronics 2019, 8, 322.
  29. Abdulhammed, R.; Faezipour, M.; Abuzneid, A.; AbuMallouh, A. Deep and machine learning approaches for anomaly-based intrusion detection of imbalanced network traffic. IEEE Sens. Lett. 2018, 3, 1–4.
  30. Carta, S.; Podda, A.S.; Recupero, D.R.; Saia, R. A local feature engineering strategy to improve network anomaly detection. Future Internet 2020, 12, 177.
  31. Rokach, L.; Maimon, O. Data Mining with Decision Trees: Theory and Applications; World Scientific: Singapore, 2014; p. 81.
  32. Flow of Decision Tree Algorithm. Available online: https://www.analyticsvidhya.com/blog/2022/04/complete-flow-of-decision-tree-algorithm/ (accessed on 10 December 2021).
  33. Kotu, V.; Deshpande, B. Data Science: Concepts and Practice; Morgan Kaufmann: San Francisco, CA, USA, 2019; pp. 65–163.
  34. Polamuri, S. How the Random Forest Algorithm Works in Machine Learning. Available online: https://dataaspirant.com/2017/05/22/random-forest-algorithm-machine-learing (accessed on 10 December 2021).
  35. Biau, G.; Scornet, E. A Random Forest Guided Tour. Test 2016, 25, 197–227.
  36. Scornet, E.; Biau, G.; Vert, J.-P. Consistency of random forests. Ann. Statist. 2015, 43, 1716–1741.
  37. Athey, S.; Tibshirani, J.; Wager, S. Generalized random forests. Ann. Statist. 2019, 47, 1148–1178.
  38. Ronaghan, S. The Mathematics of Decision Trees, Random Forest and Feature Importance in Scikit-Learn and Spark. Available online: https://towardsdatascience.com/the-mathematics-of-decision-trees-random-forest-and-feature-importance-in-scikit-learn-and-spark-f2861df67e3 (accessed on 10 December 2021).
  39. Campos, G.O.; Zimek, A.; Sander, J.; Campello, R.J.; Micenková, B.; Schubert, E.; Assent, I.; Houle, M.E. On the evaluation of unsupervised outlier detection: Measures, datasets, and an empirical study. Data Min. Knowl. Discov. 2016, 30, 891–927.
  40. Chen, T.; He, T.; Benesty, M.; Khotilovich, V.; Tang, Y.; Cho, H.; Chen, K. Xgboost: Extreme gradient boosting. R Package Version 0.4-2 2015, 1, 1–4.
  41. Weston, J.; Mukherjee, S.; Chapelle, O.; Pontil, M.; Poggio, T.; Vapnik, V. Feature selection for SVMs. Advances in neural information processing systems 2001, 13, 668–674.
  42. Chapelle, O.; Vapnik, V.; Bousquet, O.; Mukherjee, S. Choosing multiple parameters for support vector machines. Mach. Learn. 2002, 46, 131–159.
  43. Lysenko, S.; Bobrovnikova, K.; Savenko, O.; Kryshchuk, A. BotGRABBER: SVM-Based Self-Adaptive System for the Network Resilience Against the Botnets’ Cyberattacks. In International Conference on Computer Networks; Springer: Cham, Switzerland, 2019; pp. 127–143.
  44. GoldenEye Is a HTTP DoS Test Tool. Available online: https://www.kali.org/tools/goldeneye/ (accessed on 11 December 2021).
  45. hping3 Network Tool. Available online: https://github.com/antirez/hping (accessed on 11 December 2021).
  46. DNS Tunneling Tool. Available online: https://github.com/yarrick/iodine (accessed on 11 December 2021).
  47. Zeek. An Open Source Network Security Monitoring Tool. Available online: https://zeek.org/ (accessed on 11 May 2022).
  48. UCI Machine Learning Repository. Available online: https://archive.ics.uci.edu/ml/index.php (accessed on 11 December 2021).
  49. Kaggle. DS2OS Traffic Traces. Available online: https://www.kaggle.com/datasets/francoisxa/ds2ostraffictraces (accessed on 11 December 2021).
  50. IEEEDataPort. The Bot-IoT Dataset. Available online: https://ieee-dataport.org/documents/bot-iot-dataset (accessed on 11 December 2021).
  51. Kaggle. N-BaIoT Dataset to Detect IoT Botnet Attacks. Available online: https://www.kaggle.com/datasets/mkashifn/nbaiot-dataset (accessed on 11 December 2021).
  52. Hochschule Coburg. CIDDS-Coburg Intrusion Detection Data Sets. Available online: https://www.hs-coburg.de/forschung/forschungsprojekte-oeffentlich/informationstechnologie/cidds-coburg-intrusion-detection-data-sets.html (accessed on 11 December 2021).
  53. UNSW Sydney. The UNSW-NB15 Dataset. Available online: https://research.unsw.edu.au/projects/unsw-nb15-dataset (accessed on 11 December 2021).
  54. UNB. University of New Brunswick. NSL-KDD Dataset. Available online: https://www.unb.ca/cic/datasets/nsl.html (accessed on 11 December 2021).
  55. What Is the Mirai Botnet? Available online: https://www.cloudflare.com/learning/ddos/glossary/mirai-botnet/ (accessed on 11 May 2022).
  56. Gafgyt Botnet Lifts DDoS Tricks from Mirai. Available online: https://threatpost.com/gafgyt-botnet-ddos-mirai/165424/ (accessed on 11 May 2022).
  57. Dark Nexus, the Latest IoT Botnet Targets a Wide Range of Devices. Available online: https://crazygreek.co.uk/dark-nexus-iot-botnet-targets-devices/ (accessed on 11 May 2022).
  58. Scikit-Learn. Machine Learning in Python. Available online: https://scikit-learn.org/stable/index.html (accessed on 11 May 2022).
  59. Sklearn.Tree.DecisionTreeClassifier—Scikit-Learn 1.0.2 Documentation. Available online: https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html (accessed on 11 May 2022).
  60. Sklearn.Ensemble.RandomForestClassifier—Scikit-Learn 1.0.2 Documentation. Available online: https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html (accessed on 15 May 2022).
  61. Sklearn.Neighbors.KNeighborsClassifier—Scikit-Learn 1.0.2 Documentation. Available online: https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html (accessed on 15 May 2022).
  62. Sklearn.Ensemble.GradientBoostingClassifier—Scikit-Learn 1.0.2 Documentation. Available online: https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingClassifier.html (accessed on 11 May 2022).
  63. Sklearn.Svm.SVC—Scikit-Learn 1.0.2 Documentation. Available online: https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html (accessed on 15 May 2022).
  64. Kolisnyk, M. Vulnerability analysis and method of selection of communication protocols for information transfer in Internet of Things systems. Radioelectron. Comput. Syst. 2021, 1, 133–149.
  65. Illiashenko, O.; Kolisnyk, M.; Strielkina, A.; Kotsiuba, I.; Kharchenko, V. Conception and application of dependable Internet of Things based systems. Radio Electron. Comput. Sci. Control 2020, 4, 139–150.
Figure 1. IoT cyberattack detection scheme.
Figure 2. Comparison of different MLA efficiencies (decision tree—DT, random forest—RF, K-nearest neighbor—KNN, extreme gradient boosting—XGBoost, support vector machine—SVM) for Router/Mirai botnet detection: (a) TCP attack; (b) UDP attack; (c) HTTP GET attack; (d) DNS tunneling.
Figure 3. Comparison of different MLA efficiencies (decision tree—DT, random forest—RF, K-nearest neighbor—KNN, extreme gradient boosting—XGBoost, support vector machine—SVM) for Router/Gafgyt botnet detection: (a) TCP attack; (b) UDP attack; (c) HTTP GET attack; (d) DNS tunneling.
Figure 4. Comparison of different MLA efficiencies (decision tree (DT), random forest (RF), K-nearest neighbor (KNN), extreme gradient boosting (XGBoost), support vector machine (SVM)) for Router/Dark Nexus botnet detection: (a) TCP attack; (b) UDP attack; (c) HTTP GET attack; (d) DNS tunneling.
Table 1. Machine learning algorithm (MLA) efficiency for cyberattack detection in the Internet of Things infrastructure.
Authors | Goal | MLA | Data Set | Result
Shire, R.; Shiaeles, S.; Bendiab, K.; Ghita, B.; Kolokotronis, N. [14] | Malware detection, zero-day malware classification | Convolutional neural network and binary visualization | Real network environments | Accuracy of 91.32%, precision of 91.67%, recall of 91.03%
Elmrabit, N.; Zhou, F.; Li, F.; Zhou, H. [15] | Anomaly detection, attack detection | Logistic regression, decision tree, adaptive boosting, KNN, random forest, naive Bayes, gated recurrent units, simple recurrent neural network, convolutional neural network and long short-term memory, convolutional neural network, long short-term memory, deep neural network | UNSW-NB15, CICIDS-2017, ICS Cyberattack | Performance of about 99.9% using random forest (CICIDS-2017)
Bagui, S.; Wang, X.; Bagui, S. [16] | Intrusion detection | Logistic regression, SVM, random forest | UCI Machine Learning Repository | Accuracy of about 99%
Kumar, P.; Gupta, G.P.; Tripathi, R. [17] | Cyberattack detection against IoT networks | K-nearest neighbor, random forest, XGBoost | DS2OS, NSL-KDD, BoT-IoT | Accuracy up to 99%, detection 90–100%
Ravi, N.; Shalinie, S.M. [18] | DDoS attack detection and attack mitigation | ELM, semi-supervised extreme learning machines | UNB-ISCX | Accuracy of about 96.28%
Otoum, Y.; Liu, D.; Nayak, A. [19] | DoS, user-to-root (U2R), remote-to-local (R2L), probe, and intrusion detection | Stacked-deep polynomial network | NSL-KDD | Accuracy up to 99.02%, precision up to 99.4%, recall up to 98.3%, F1-score up to 98.8%
Verma, A.; Ranga, V. [20] | Survey of machine learning algorithms for DoS attack detection | AdaBoost, extremely randomized trees, multilayer perceptron, classification and regression trees, random forest, gradient boosted machine, extreme gradient boosting | UNSW-NB15, NSL-KDD, CIDDS-001 | Regression trees, classification trees, and extreme gradient boosting show the best results: accuracy up to 96.7%, specificity up to 96.2%, sensitivity up to 97.3%
Alrashdi, I.; Alqazzaz, A.; Aloufi, E.; Alharthi, R.; Zohdy, M.; Ming, H. [21] | Detection of DDoS attacks | Bat algorithm | N-BaIoT | Accuracy up to 90%
Krishna, E.S.; Thangavelu, A. [22] | Detection of DDoS attacks | Random forest | NSL-KDD, N-BaIoT | Accuracy up to 99.98%, precision up to 99.87%, recall up to 100%, and F-score up to 99.73%
Mihoub, A.; Fredj, O.B.; Cheikhrouhou, O.; Derhab, A.; Krichen, M. [23] | Investigation of DoS/DDoS attack detection for IoT based on ML algorithms | Looking-back-enabled random forest | IoT-Bot | Accuracy up to 99.81%
Khan, M.A.; Khan Khattk, M.A.; Latif, S.; Shah, A.A.; Ur Rehman, M.; Boulila, W.; Ahmad, J. [24] | Intrusion detection | Decision tree, naive Bayes, random forest, and K-nearest neighbors combined using a voting-based technique | TON IoT | Accuracy up to 88%, precision up to 90%, recall up to 88%, and F-score of 88% for DT-RF-NB based on binary classification with a combined IoT dataset
Alharbi, A.; Alosaimi, W.; Alyami, H.; Rauf, H.T. [25] | Detection of DDoS attacks | Bat algorithm | N-BaIoT | Accuracy up to 90%
Saia, R.; Carta, S.; Recupero, D.R. [27] | Intrusion event detection | Multilayer perceptron, decision tree, adaptive boosting, gradient boosting, random forests | NSL-KDD | Better specificity than single classifiers without significant degradation in other aspects: the mean F-score degrades only slightly, while the mean AUC improves compared with competing approaches, demonstrating the effectiveness of the approach
Abdulhammed, R.; Musafer, H.; Alessa, A.; Faezipour, M.; Abuzneid, A. [28] | Developing feature dimensionality reduction approaches for machine learning-based IDS | Bayesian network, random forest, linear discriminant analysis, quadratic discriminant analysis | CICIDS2017 | Feature dimensions reduced from 81 to 10 with a high accuracy of 99.6% in both multi-class and binary classification
Abdulhammed, R.; Faezipour, M.; Abuzneid, A.; AbuMallouh, A. [29] | Applying various approaches for handling imbalanced datasets to design effective IDS | Random forest, deep neural networks, variational autoencoder, voting, stacking | CIDDS-001 | Attack detection with up to 99.99% accuracy
Carta, S.; Podda, A.S.; Recupero, D.R.; Saia, R. [30] | Solving cybersecurity problems such as the difficulty of distinguishing illegitimate activities from legitimate ones | Random forests, decision tree, gradient boosting, adaptive boosting, multilayer perceptron | NSL-KDD, CICIDS2017, UNSW-NB15 | Improved performance over state-of-the-art canonical solutions
Table 2. Flow-based features.
# | Feature designation | Value description
1 | f1 | Protocol type
2 | f2 | Source IP address
3 | f3 | Destination IP address
4 | f4 | Source port
5 | f5 | Destination port
6 | f6 | Last connection time
7 | f7 | Transaction bytes from f2 to f3
8 | f8 | Transaction bytes from f3 to f2
9 | f9 | Mean packet size transmitted by f2
10 | f10 | Mean packet size transmitted by f3
11 | f11 | Source bits per second
12 | f12 | TTL value, f2 to f3
13 | f13 | TTL value, f3 to f2
14 | f14 | Interpacket interval
15 | f15 | Bandwidth
16 | f16 | Packet jitter
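Several Table 2 features can be computed directly from the packets of one flow, as in the sketch below. The input format, the definition of source bits per second (bits sent by f2 divided by the flow duration), and the jitter definition (mean absolute change between consecutive interpacket intervals) are assumptions; the paper does not give these formulas, and features such as f12, f13, and f15 are omitted.

```python
from statistics import mean


def flow_features(packets):
    """Compute a subset of Table 2 features for one flow.

    packets: list of (timestamp_s, direction, size_bytes); direction is "fwd"
    for f2 -> f3 and "bwd" for f3 -> f2 (an assumed input format).
    """
    pkts = sorted(packets, key=lambda p: p[0])
    times = [t for t, _, _ in pkts]
    fwd = [size for _, d, size in pkts if d == "fwd"]
    bwd = [size for _, d, size in pkts if d == "bwd"]
    duration = times[-1] - times[0] if times else 0.0
    gaps = [b - a for a, b in zip(times, times[1:])]
    # Jitter taken as the mean absolute change between consecutive gaps (assumption).
    gap_changes = [abs(b - a) for a, b in zip(gaps, gaps[1:])]
    return {
        "f7": sum(fwd),                                            # bytes f2 -> f3
        "f8": sum(bwd),                                            # bytes f3 -> f2
        "f9": mean(fwd) if fwd else 0.0,                           # mean size from f2
        "f10": mean(bwd) if bwd else 0.0,                          # mean size from f3
        "f11": 8 * sum(fwd) / duration if duration > 0 else 0.0,   # source bits/s
        "f14": mean(gaps) if gaps else 0.0,                        # interpacket interval
        "f16": mean(gap_changes) if gap_changes else 0.0,          # packet jitter
    }


feats = flow_features([(0.0, "fwd", 100), (0.1, "bwd", 60),
                       (0.3, "fwd", 200), (0.6, "bwd", 40)])
```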
Table 3. MQTT-based features.
# | Feature designation | Value description
1 | f18 | Number of connections to f3 in N gathered records according to f6
2 | f19 | Number of connections from f2 in N gathered records according to f6
3 | f20 | Number of connections with the same f2 and f5 in N gathered records according to f6
4 | f21 | Number of connections with the same f3 and f4 in N gathered records according to f6
5 | f22 | Number of connections with the same f2 and f3 in N gathered records according to f6
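The windowed counts in Table 3 can be sketched as follows, reading “in N gathered records according to f6” as the N most recent records ordered by last connection time. That interpretation, the exclusion of the current record from its own window, and the helper name `connection_counts` are assumptions.

```python
from collections import deque


def connection_counts(records, n=100):
    """Table 3 counts (f18-f22) for each record over the previous n records.

    records: dicts keyed by Table 2 designations f2 (source IP), f3 (destination
    IP), f4 (source port), f5 (destination port), and f6 (last connection time).
    The window excludes the current record (an assumption).
    """
    window = deque(maxlen=n)  # oldest records drop out automatically
    out = []
    for r in sorted(records, key=lambda rec: rec["f6"]):
        out.append({
            "f18": sum(w["f3"] == r["f3"] for w in window),
            "f19": sum(w["f2"] == r["f2"] for w in window),
            "f20": sum(w["f2"] == r["f2"] and w["f5"] == r["f5"] for w in window),
            "f21": sum(w["f3"] == r["f3"] and w["f4"] == r["f4"] for w in window),
            "f22": sum(w["f2"] == r["f2"] and w["f3"] == r["f3"] for w in window),
        })
        window.append(r)
    return out


records = [
    {"f2": "192.168.1.20", "f3": "192.168.1.1", "f4": 1000, "f5": 1883, "f6": 1.0},
    {"f2": "192.168.1.20", "f3": "192.168.1.1", "f4": 1001, "f5": 1883, "f6": 2.0},
    {"f2": "192.168.1.30", "f3": "192.168.1.1", "f4": 1000, "f5": 80, "f6": 3.0},
]
counts = connection_counts(records)
```

A bounded `deque` keeps the per-record cost proportional to N, which suits streaming traffic where only the most recent connections matter.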
Table 4. HTTP-based features.
# | Feature designation | Value description
1 | f48 | HTTP request method (GET, POST, HEAD)
2 | f49 | HOST header value
3 | f50 | Length of the HOST header value
4 | f51 | URL in the request
5 | f52 | Length of the URL
6 | f53 | HTTP pipelining depth
7 | f54 | Uncompressed size of the data transferred from the client
8 | f55 | Uncompressed size of the data transferred from the server
9 | f56 | Percentage of f48 with the same f49 in N records according to f6
10 | f57 | Percentage of f49 with the same f51 in N records according to f6
11 | f58 | Percentage of f48 with the same f51 in N records according to f6
Table 5. DNS-based features.
# | Feature designation | Value description
1 | f23 | Requested domain name
2 | f24 | Value specifying the request type
3 | f25 | Length of f23
4 | f26 | Number of unique characters in f23
5 | f27 | Entropy of f23
6 | f28 | TTL period, mode (the value that appears most often in a set of data), in N records according to f6
7 | f29 | TTL period, median (the value separating the higher half of a data sample from the lower half), in N records according to f6
8 | f30 | TTL period, mean value, in N records according to f6
9 | f31 | Number of A-records corresponding to f23 in the incoming DNS messages (the feature is used if f31 > 1), in N records according to f6
10 | f32 | Number of IP addresses associated with f23 (the feature is used if f31 = 1), in N records according to f6
11 | f33 | Average distance between the IP addresses associated with f23 (the feature is used if f31 = 1), in N records according to f6
12 | f34 | Average distance between the IP addresses in the set of A-records for f23 in the incoming DNS message (the feature is used if f31 > 1), in N records according to f6
13 | f35 | Number of unique IP addresses in the sets of A-records corresponding to f23 in the DNS messages (the feature is used if f31 > 1), in N records according to f6
14 | f36 | Average distance between the unique IP addresses in the sets of A-records corresponding to f23 in the DNS messages (the feature is used if f31 > 1), in N records according to f6
15 | f37 | Number of domain names that share IP addresses corresponding to f23, in N records according to f6
16 | f38 | Flag indicating the use of uncommon DNS record types, i.e., types not commonly used by a typical client (e.g., TXT, which is most often used for tunneling (excluding mail servers), KEY, or NULL)
17 | f39 | Entropy of the DNS records contained in the DNS messages (CNAME, TXT, NS, MX, KEY, NULL, etc.)
18 | f40 | Maximum size of the DNS messages about f23, in N records according to f6
19 | f41 | Flag indicating DNS query success (f41 = 0 if the DNS query failed; f41 = 1 if it succeeded)
20 | f42 | Answer length
21 | f43 | Mean class value in N records according to f6
22 | f44 | Mean type value in N records according to f6
23 | f45 | Number of f2 and f23 in N records according to f6
24 | f46 | Number of f23 to the same f2 in N records according to f6
25 | f47 | Percentage of the domain in N records according to f6
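Several DNS features in Table 5 are simple statistics over the requested name or over the TTL values in the last N records. A minimal sketch of f25 to f30 follows. The function names are hypothetical, and the use of base-2 Shannon entropy over characters for f27 is an assumption, since the table does not fix the entropy definition.

```python
import math
import statistics
from collections import Counter


def domain_name_features(domain: str) -> dict:
    """f25-f27 for a requested domain name (f23)."""
    counts = Counter(domain)
    n = len(domain)
    # Shannon entropy over characters; base 2 is an assumption, the table gives no base.
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values()) if n else 0.0
    return {"f25": n, "f26": len(counts), "f27": entropy}


def ttl_features(ttls: list) -> dict:
    """f28-f30: mode, median and mean of the TTL values seen in the last N records."""
    return {
        "f28": statistics.mode(ttls),    # Python 3.8+: returns the first mode on ties
        "f29": statistics.median(ttls),
        "f30": statistics.mean(ttls),
    }


print(domain_name_features("abab.example"))
print(ttl_features([300, 300, 60, 120]))
```

Randomly generated or encoded names (as in DNS tunneling) tend to push f26 and f27 upward compared with human-chosen names, which is why such string statistics are commonly paired with the TTL features.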
Table 6. Decision tree algorithm parameters [59].
Parameter | Value | Description
criterion | gini | The function to measure the quality of a split.
splitter | best | The strategy used to choose the split at each node.
max_depth | None | The maximum depth of the tree.
min_samples_split | 3 | The minimum number of samples required to split an internal node.
min_samples_leaf | 1 | The minimum number of samples required to be at a leaf node.
min_weight_fraction_leaf | 0.0 | The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node.
max_features | auto | The number of features to consider when looking for the best split.
random_state | RandomState instance | Controls the randomness of the estimator.
class_weight | balanced | Weights associated with classes.
ccp_alpha | 0.0 | Complexity parameter used for minimal cost complexity pruning.
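The parameter names in Table 6 match the constructor of scikit-learn's `DecisionTreeClassifier`, so a minimal sketch of the configuration, assuming that library, might look as follows. The seed, the synthetic feature matrix, and the labels are illustrative assumptions, not the authors' data. `max_features="auto"` is replaced by `"sqrt"` because, for classifiers, the `"auto"` alias meant `sqrt(n_features)` and has been removed from recent scikit-learn releases.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Table 6 settings expressed as scikit-learn keyword arguments.
dt = DecisionTreeClassifier(
    criterion="gini",
    splitter="best",
    max_depth=None,
    min_samples_split=3,
    min_samples_leaf=1,
    min_weight_fraction_leaf=0.0,
    # Table 6 lists "auto"; for classifiers that alias meant sqrt(n_features).
    max_features="sqrt",
    random_state=np.random.RandomState(0),  # seed 0 is an arbitrary choice
    class_weight="balanced",
    ccp_alpha=0.0,
)

# Toy stand-in for per-device feature vectors (synthetic, illustration only).
rng = np.random.RandomState(42)
X = rng.rand(60, 8)
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)  # hypothetical labels: 1 = attack, 0 = benign
dt.fit(X, y)
print(dt.get_depth(), dt.predict(X[:5]))
```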
Table 7. Random forest algorithm parameters [60].
Parameter | Value | Description
n_estimators | 100 | The number of trees in the forest.
criterion | gini | The function to measure the quality of a split.
max_depth | None | The maximum depth of the tree.
min_samples_split | 2 | The minimum number of samples required to split an internal node.
min_samples_leaf | 1 | The minimum number of samples required to be at a leaf node.
min_weight_fraction_leaf | 0.0 | The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node.
max_features | log2 | The number of features to consider when looking for the best split.
class_weight | balanced | Weights associated with classes.
ccp_alpha | 0.0 | Complexity parameter used for minimal cost complexity pruning.
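Assuming, as the parameter names suggest, that Table 7 describes scikit-learn's `RandomForestClassifier`, the configuration can be sketched as below. The `random_state` value, the synthetic data, and the hold-out split are assumptions added only to make the example reproducible and runnable.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rf = RandomForestClassifier(
    n_estimators=100,
    criterion="gini",
    max_depth=None,
    min_samples_split=2,
    min_samples_leaf=1,
    min_weight_fraction_leaf=0.0,
    max_features="log2",   # log2(n_features) candidate features per split
    class_weight="balanced",
    ccp_alpha=0.0,
    random_state=0,        # not listed in Table 7; fixed here only for reproducibility
)

# Synthetic stand-in for the feature vectors (illustration only).
rng = np.random.RandomState(7)
X = rng.rand(200, 16)
y = (X[:, 3] > 0.5).astype(int)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)
rf.fit(X_train, y_train)
print("held-out accuracy:", rf.score(X_test, y_test))
```

`class_weight="balanced"` reweights classes inversely to their frequencies, which matters when attack and benign records are present in unequal numbers.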
Table 8. K-Nearest Neighbor algorithm parameters [61].
Parameter | Value | Description
n_neighbors | 5 | Number of neighbors.
weights | distance | Weight function used in prediction.
algorithm | kd_tree | The algorithm used to compute the nearest neighbors.
leaf_size | 30 | Leaf size passed to KDTree.
p | 2 | Power parameter for the Minkowski metric.
metric | str | The distance metric to use for the tree.
metric_params | dict | Additional keyword arguments for the metric function.
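A minimal sketch of the Table 8 configuration, assuming scikit-learn's `KNeighborsClassifier`. Table 8 gives only the type of `metric` (str), so the sketch uses scikit-learn's default `"minkowski"`, which with `p=2` is the Euclidean distance. The `StandardScaler` step and the synthetic data are added assumptions: distance-based classifiers are sensitive to feature ranges, and traffic features such as sizes and counts can differ by orders of magnitude.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

knn = KNeighborsClassifier(
    n_neighbors=5,
    weights="distance",   # closer neighbors get larger votes
    algorithm="kd_tree",
    leaf_size=30,
    p=2,
    metric="minkowski",   # assumed; Table 8 lists only the type (str)
    metric_params=None,
)
# Scaling is an added assumption, not part of Table 8.
model = make_pipeline(StandardScaler(), knn)

rng = np.random.RandomState(3)
X = rng.rand(100, 6) * np.array([1, 10, 100, 1, 1, 1000])  # features on very different scales
y = (X[:, 0] > 0.5).astype(int)
model.fit(X, y)
print(model.predict(X[:5]))
```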
Table 9. Extreme gradient boosting algorithm parameters [62].
Parameter | Value | Description
loss | exponential | The loss function to be optimized.
learning_rate | 0.1 | Learning rate; shrinks the contribution of each tree.
n_estimators | 100 | The number of boosting stages to perform.
subsample | 1.0 | The fraction of samples to be used for fitting the individual base learners.
criterion | squared_error | The function to measure the quality of a split.
min_samples_split | 2 | The minimum number of samples required to split an internal node.
min_weight_fraction_leaf | 0.0 | The minimum weighted fraction of the sum total of weights (of all the input samples) required to be at a leaf node.
max_depth | 3 | The maximum depth of the individual regression estimators.
random_state | RandomState instance | Controls the random seed given to each tree estimator at each boosting iteration.
max_features | None | The number of features to consider when looking for the best split.
max_leaf_nodes | None | Grow trees with max_leaf_nodes in best-first fashion.
validation_fraction | 0.1 | The proportion of training data to set aside as a validation set for early stopping.
n_iter_no_change | None | Decides whether early stopping is used to terminate training when the validation score is not improving (None disables early stopping).
tol | 1 × 10⁻³ | Tolerance for early stopping.
ccp_alpha | 0.0 | Complexity parameter used for minimal cost complexity pruning.
Table 10. Support vector machine parameters [63].
Parameter | Value | Description
C | 1.0 | Regularization parameter.
kernel | rbf | Specifies the kernel type to be used in the algorithm.
gamma | auto | Kernel coefficient.
tol | 1 × 10⁻³ | Tolerance for the stopping criterion.
cache_size | 100 | Size of the kernel cache (in MB).
max_iter | −1 | Hard limit on iterations (−1 means no limit).
random_state | RandomState instance | Controls the pseudo-random number generation for shuffling the data for probability estimates.
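Assuming Table 10 refers to scikit-learn's `SVC`, the configuration can be sketched as follows. With `gamma="auto"` the RBF kernel coefficient is `1 / n_features`, and `random_state` only takes effect when `probability=True`. The scaling step, the seed, and the synthetic non-linear labels are added assumptions for illustration.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

svm = SVC(
    C=1.0,
    kernel="rbf",
    gamma="auto",        # 1 / n_features
    tol=1e-3,
    cache_size=100,      # kernel cache size in MB
    max_iter=-1,         # no iteration limit
    random_state=np.random.RandomState(0),  # used only when probability=True
)
model = make_pipeline(StandardScaler(), svm)  # scaling is an added assumption

rng = np.random.RandomState(11)
X = rng.rand(120, 5)
y = (X[:, 0] * X[:, 1] > 0.25).astype(int)  # non-linear synthetic boundary
model.fit(X, y)
print(model.score(X, y))
```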
Table 11. Classification results (router—Mirai).
Device/Botnet | Attack | Algorithm | Accuracy | TP | FP | FN | TN | Precision | Recall | F1 Score | AUC
Router/Mirai
TCPRF0.99947936202420240.99947480.9998960.9995720.999615
DT0.99858436123520300.999170.9986180.9988940.998994
kNN0.99946936031220440.9997230.9994450.9995840.999692
XGBoost0.99893835625120820.9985980.9997190.9991580.999573
SVM0.996991354461120890.998310.9969060.9976070.997881
UDPRF0.99976775315220120.9999370.9998350.9999350.999841
DT0.99926775154320280.9994680.9996010.9995340.99975
kNN0.99947674702320750.9997320.9995990.9996650.999821
XGBoost0.99968674651220820.9998660.9997320.9997990.999827
SVM0.9985347455101720680.9986780.9986780.9986780.999174
HTTP GETRF0.99969464343320600.9998340.9997340.9997340.999839
DT0.99941264191420760.9998440.9993770.9996110.999793
kNN0.99941263871421080.9998430.9993740.9996090.999458
XGBoost0.99952963402221560.9996850.9996850.9996850.999671
SVM0.997412638151421000.9986360.996370.9975020.999051
DNS tunnelingRF0.99962459783420050.9997980.9997310.9996150.999944
DT0.99924959352420490.9996630.9993260.9994950.999928
kNN0.99937459203220650.9994930.9996620.9995780.999632
XGBoost0.99899959035320790.9991540.9994920.9993230.999186
SVM0.997247589951420720.9986490.995420.9970320.997547
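The Accuracy, Precision, Recall, and F1 Score columns in Tables 11 to 19 follow from the TP, FP, FN, and TN counts through the standard definitions. A minimal sketch, treating attack traffic as the positive class; the function name and the example counts are hypothetical and are not taken from the tables.

```python
def confusion_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy, precision, recall and F1 from binary confusion-matrix counts
    (positive class = attack traffic)."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts, not taken from Tables 11-19.
print(confusion_metrics(tp=8, fp=2, fn=0, tn=10))
```

The AUC column cannot be recomputed from these four counts alone, because it depends on how the classifier ranks its scores across all decision thresholds.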
Table 12. Classification results (router—Gafgyt).
Device/Botnet | Attack | Algorithm | Accuracy | TP | FP | FN | TN | Precision | Recall | F1 Score | AUC
Router/Gafgyt
TCPRF0.999714119842220020.9998330.9998330.9998330.999835
DT0.999571119632420210.9998330.9996660.9997490.999757
kNN0.999357119174520640.9996640.9995810.9996230.999792
XGBoost0.999643118813221040.9997480.9998320.999790.999734
SVM0.9987131188871120840.9994120.9990760.9992440.999523
UDPRF0.99973844982119990.9996560.9998780.9996670.999882
DT0.99907744534220410.9991030.9995510.9993270.99947
kNN0.99938544303120660.9993230.9997740.9995490.999648
XGBOOST0.99907743915121030.9988630.9997720.9993170.999712
SVM0.99830844336944330.9990560.9988670.9989610.998861
HTTP GETRF0.999784210822320130.9999050.9998580.9998810.999913
DT0.99987210341220630.9999520.9999050.9999290.999912
kNN0.999697209972520960.9999050.9997620.9998330.999971
XGBoost0.999827209901321060.9999520.9998570.9999050.999845
SVM0.99896120986617186840.9984090.9961440.9972750.999221
DNS tunnelingRF0.99984631912420030.9996740.9997480.9995610.999783
DT0.99826931535420380.9984170.9987330.9985750.999548
kNN0.99865431152520780.9993580.9983970.9988780.999539
XGBoost0.99961530741121240.9996750.9996750.9996750.999882
SVM0.9961543121101114850.9989190.9956880.9973010.997861
Table 13. Classification results (router—Dark Nexus).
Device/Botnet | Attack | Algorithm | Accuracy | TP | FP | FN | TN | Precision | Recall | F1 Score | AUC
Router/Dark Nexus
TCPRF0.99933354904120050.9992720.9998180.9995450.999691
DT0.999254725120220.9990870.9998170.9994520.999982
kNN0.99893354553520370.999450.9990840.9992670.999836
XGBOOST0.999254172420770.9996310.9992620.9994460.999285
SVM0.997653949920880.9983340.9983340.9983340.999444
UDPRF0.999344101965319960.999510.9997060.9996080.999488
DT0.999672101711320250.9999020.9997050.9998030.999932
kNN0.999426101463420470.9997040.9996060.9996550.999835
XGBOOST0.999426101202520730.9998020.9995060.9996540.999844
SVM0.9982791013779101370.9983010.9977360.9980190.998421
HTTP GETRF0.999771197672320180.9998990.9998480.9998740.999853
DT0.999725197463320380.9998480.9998480.9998480.999995
kNN0.999679197162520670.9998990.9997460.9998230.999931
XGBOOST0.999771196661421190.9999490.9997970.9998730.999794
SVM0.9989919665418196650.999090.9959180.9975020.999452
DNS tunnelingRF0.99929893513520410.9996790.9994660.9995720.999457
DT0.99947493014220930.999570.9997850.9996780.999974
kNN0.99973792852121120.9997850.9998920.9998380.999859
XGBOOST0.99938692435221500.9994590.9997840.9996210.999482
SVM0.997895930271093020.9981090.995420.9967630.998561
Table 14. Classification results (thermostat—Mirai).
Device/Botnet | Attack | Algorithm | Accuracy | TP | FP | FN | TN | Precision | Recall | F1 Score | AUC
Thermostat/Mirai
TCPRF0.99993836231520210.9997240.9996220.9997730.999913
DT0.99893836181520260.9997240.998620.9991710.999446
kNN0.99893835695120750.9986010.999720.999160.999017
XGBOOST0.99964635281121200.9997170.9997170.9997170.999923
SVM0.996106353551720930.9985880.9952140.9968980.999678
UDPRF0.99998674952120520.9999330.9998970.99990.999865
DT0.99979174511120970.9998660.9998660.9998660.999834
kNN0.99937274461520980.9998660.9993290.9995970.999701
XGBOOST0.99947674071421380.9998650.999460.9996630.999991
SVM0.997906744631074460.9990560.9971720.9981130.999816
HTTP GETRF0.99985964384420540.9998790.9997790.9997790.999861
DT0.99952963912221050.9996870.9996870.9996870.999722
kNN0.99941263694121260.9993720.9998430.9996080.999893
XGBOOST0.99929463435121510.9992120.9998420.9995270.999791
SVM0.997176640291064020.9984090.9961440.9972750.999465
DNS tunnelingRF0.99964959761520080.9998330.9998640.9994980.999692
DT0.99887459674520140.999330.9991630.9992460.999617
kNN0.99924959253320590.9994940.9994940.9994940.999828
XGBOOST0.99937458764121090.999320.999830.9995750.999422
SVM0.9969965890101558900.9983790.9951520.9967630.998059
Table 15. Classification results (thermostat—Gafgyt).
Device/Botnet | Attack | Algorithm | Accuracy | TP | FP | FN | TN | Precision | Recall | F1 Score | AUC
Thermostat/Gafgyt
TCPRF0.999943119733220120.9998490.9998330.9998910.999954
DT0.999714119422220440.9998330.9998330.9998330.999876
kNN0.999571119192420650.9998320.9996650.9997480.999744
XGBOOST0.999786119031220840.9999160.9998320.9998740.999962
SVM0.998571188571320850.9994110.9989070.9991590.998787
UDPRF0.99981544944519970.9998110.9998890.9999930.999972
DT0.99892344592520340.9995520.998880.9992160.998947
kNN0.99953844352120620.9995490.9997750.9996620.999642
XGBOOST0.99892344003420930.9993190.9990920.9992050.999741
SVM0.996769442061244200.998490.9975480.9980190.998866
HTTP GETRF0.999784210873220080.9998580.9999050.9998810.999862
DT0.99961210424520490.999810.9997620.9997860.999649
kNN0.99974210251520690.9999520.9997620.9998570.999824
XGBOOST0.99974209831521110.9999520.9997620.9998570.999743
SVM0.99935120992810209920.9984090.9981820.9982950.999371
DNS tunnelingRF0.99993131873120090.999760.9998860.9997730.999842
DT0.99923131702220260.9993690.9993690.9993690.999636
kNN0.99923131251320710.999680.9990410.999360.999325
XGBOOST0.99865430842521090.9993520.9983810.9988660.998948
SVM0.99596231377931370.9986490.9956880.9971660.998563
Table 16. Classification results (thermostat—Dark Nexus).
Device/Botnet | Attack | Algorithm | Accuracy | TP | FP | FN | TN | Precision | Recall | F1 Score | AUC
Thermostat/Dark Nexus
TCPRF0.99906754845220090.9998890.9997350.9998620.999807
DT0.99933354574120380.9992680.9998170.9995420.999866
kNN0.99906754402520530.9996320.9990820.9993570.999787
XGBOOST0.998854094520820.9992610.9990760.9991690.999392
SVM0.9972538691220930.9983320.9977770.9980540.998168
UDPRF0.99988101885520020.999880.999880.999880.99988
DT0.999344101463520460.9997040.9995070.9996060.999544
kNN0.999262101405420510.9995070.9996060.9995560.999830
XGBOOST0.999344101305320620.9995070.9997040.9996050.999510
SVM0.99803310073316100730.998490.9969840.9977360.998590
HTTP GETRF0.999633197653520170.9998480.9997470.9997980.999937
DT0.999541197305520500.9997470.9997470.9997470.999730
kNN0.999725197162420680.9998990.9997970.9998480.999950
XGBOOST0.999679196803421030.9998480.9997970.9998220.999980
SVM0.99908219644413196440.9981820.9972750.9977280.999110
DNS tunnelingRF0.99964993813120150.999680.9998930.9997870.999683
DT0.99973793451220520.9998930.9997860.999840.999960
kNN0.99921193354520560.9995720.9994650.9995180.999830
XGBOOST0.99956193053220900.9996780.9997850.9997310.999730
SVM0.998421933941320440.9983790.9967630.997570.999860
Table 17. Classification results (camcorder—Mirai).
Device/Botnet | Attack | Algorithm | Accuracy | TP | FP | FN | TN | Precision | Recall | F1 Score | AUC
Camcorder/Mirai
TCPRF0.99929236392220070.9994510.9994510.9994510.99907
DT0.99858436234420190.9988970.9988970.9988970.999863
kNN0.99964635981120500.9997220.9997220.9997220.999781
XGBOOST0.99893835923320520.9991660.9991660.9991660.999396
SVM0.996106356081420680.9977580.9960830.996920.998166
UDPRF0.99987275452419990.9998350.999870.9996030.99983
DT0.99926774975220460.9993340.9997330.9995330.999701
kNN0.99895374515520890.9993290.9993290.9993290.999833
XGBOOST0.99958174441321020.9998660.9995970.9997310.999515
SVM0.99759274126974120.9988670.9967960.997830.998597
HTTP GETRF0.99952964792220170.9996910.9996910.9996910.99993
DT0.99941264612320340.9996910.9995360.9996130.999737
kNN0.99941264373220580.9995340.9996890.9996120.999959
XGBOOST0.99941264212320740.9996890.9995330.9996110.999982
SVM0.998353640571064050.9984090.9984090.9984090.999113
DNS tunnelingRF0.99924959784220060.9993310.9996660.9994980.99968
DT0.99899959594420230.9993290.9993290.9993290.999963
kNN0.99912459424320410.9993270.9994950.9994110.999832
XGBOOST0.99887459145420670.9991550.9993240.999240.999737
SVM0.99712159145859140.997570.9962250.9968970.999861
Table 18. Classification results (camcorder—Gafgyt).
Device/Botnet | Attack | Algorithm | Accuracy | TP | FP | FN | TN | Precision | Recall | F1 Score | AUC
Camcorder/Gafgyt
TCPRF0.999971119815120030.9999830.9999170.999850.999889
DT0.999643119411420440.9999160.9996650.9997910.999486
kNN0.999571119245120600.9995810.9999160.9997480.999991
XGBOOST0.999571119163320680.9997480.9997480.9997480.999364
SVM0.9987131192010820520.9991620.9993290.9992460.999484
UDPRF0.99992344964319970.9998110.9998330.9998220.999913
DT0.99861544655420260.9988810.9991050.9989930.999888
kNN0.99907744301520640.9997740.9988730.9993230.999442
XGBOOST0.99923143872321080.9995440.9993170.999430.999591
SVM0.997231436581111930.9983010.9983010.9983010.997732
HTTP GETRF0.99974210563320380.9998580.9998580.9998580.999965
DT0.999827210491320470.9999520.9998570.9999050.999425
kNN0.999784210064120890.999810.9999520.9998810.999628
XGBOOST0.99974209585121360.9997610.9999520.9998570.999901
SVM0.99909121005820210050.9986360.9965960.9976150.999821
DNS tunnelingRF0.99903831824120130.9997450.9996860.9992150.999491
DT0.99807731765520140.9984280.9984280.9984280.999020
kNN0.99884631652420290.9993680.9987380.9990530.999290
XGBOOST0.99846231604420320.9987360.9987360.9987360.999390
SVM0.99692331235931230.9983790.9973010.997840.997460
Table 19. Classification results (camcorder—Dark Nexus).
Device/Botnet | Attack | Algorithm | Accuracy | TP | FP | FN | TN | Precision | Recall | F1 Score | AUC
Camcorder/Dark Nexus
TCPRF0.99972254235120710.9997790.9998160.9997470.99986
DT0.99893354033520890.9994450.9990750.999260.999869
kNN0.998853784521130.9992570.9990710.9991640.999911
XGBoost0.999253591521350.9998130.9990680.9994410.999932
SVM0.997867533610621480.9981290.9988770.9985030.999542
UDPRF0.999918101875520030.9998090.9998390.9998510.999861
DT0.99959101394120560.9996060.9999010.9997530.998747
kNN0.999426101324320610.9996050.9997040.9996550.999406
XGBoost0.999262101244520670.9996050.9995060.9995560.999904
SVM0.9981151010571620720.9993080.9984190.9988630.999489
HTTP GETRF0.999633197693520130.9998480.9997470.9997980.999851
DT0.999587197335420480.9997470.9997970.9997720.999996
kNN0.999862197261220610.9999490.9998990.9999240.999952
XGBoost0.999633197094420730.9997970.9997970.9997970.999766
SVM0.9990361970481320650.9995940.9993410.9994670.999123
DNS tunnelingRF0.99947493854220090.9995740.9997870.999680.999921
DT0.99938693445220490.9994650.9997860.9996260.998696
kNN0.99964993183120780.9996780.9998930.9997850.999282
XGBoost0.99938693052520880.9997850.9994630.9996240.999645
SVM0.998421931781020650.9991420.9989280.9990350.998664
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Lysenko, S.; Bobrovnikova, K.; Kharchenko, V.; Savenko, O. IoT Multi-Vector Cyberattack Detection Based on Machine Learning Algorithms: Traffic Features Analysis, Experiments, and Efficiency. Algorithms 2022, 15, 239. https://doi.org/10.3390/a15070239

AMA Style

Lysenko S, Bobrovnikova K, Kharchenko V, Savenko O. IoT Multi-Vector Cyberattack Detection Based on Machine Learning Algorithms: Traffic Features Analysis, Experiments, and Efficiency. Algorithms. 2022; 15(7):239. https://doi.org/10.3390/a15070239

Chicago/Turabian Style

Lysenko, Sergii, Kira Bobrovnikova, Vyacheslav Kharchenko, and Oleg Savenko. 2022. "IoT Multi-Vector Cyberattack Detection Based on Machine Learning Algorithms: Traffic Features Analysis, Experiments, and Efficiency" Algorithms 15, no. 7: 239. https://doi.org/10.3390/a15070239

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
