1. Introduction
IoT devices are now widely employed in intelligent applications, including smart cities, healthcare [
1], and transportation. All of these IoT-enabled applications share two core functions: “monitoring” (regularly checking the sensors’ state) and “actuating” (acting on the data gathered during monitoring). Additionally, IoT is a networked system, built on recognized standards, whose components exchange information. Because of the many application domains, numerous communication standards, tools, and protocols have been developed. As a result, the Internet of Things (IoT) is frequently referred to as the Internet of People (IoP), because practically everyone, from individuals to institutions, uses it regularly. Moreover, IoT enables measurement collection from small, affordable, intelligent end nodes dispersed over a vast physical region at low implementation and operating cost [
2]. However, these advantages come at a cost in terms of finite resources, particularly the end nodes’ battery life.
Some of the data collected by IoT devices are unexpected. Such surprising data may arise from environmental changes, deliberate action, faulty operation, or mere coincidence; these observations are referred to as anomalies [
3]. Because the sensors employed at the edge are inexpensive and imperfect, anomalies are expected to occur. Moreover, intensive on-node processing drains the end nodes’ batteries. Due to these restrictions, the network may be more susceptible to errors and malicious attacks [
4].
IoT devices are vulnerable to attack because they are connected to the Internet and often lack proper security measures. An attacker can quickly compromise IoT devices by taking over smart gadgets, which can then be used maliciously to exploit other IoT-connected devices [
5,
6]. Therefore, it is crucial to recognize improper actions to ensure the network operates reliably and securely. Additionally, by spotting interesting or uncommon events, IoT networks can avoid broadcasting useless or inaccurate measurements. As a result, the network’s dependability can increase while energy consumption decreases [
7].
Anomaly detection entails the identification of noteworthy or unexpected occurrences in the network [
8]. To identify anomalies in a dataset, it is essential to first build a model of the vast majority of normal data. Anomalies can then be identified as those data vectors that depart considerably from the normal model. Detecting abnormalities in the network [
9] while minimizing overhead and obtaining high detection accuracy is a major challenge.
In the Internet of Things, there are two categories of anomaly detection mechanisms: statistical and machine learning [
10]. Only regular IoT traffic is used in statistical methods to create trained models [
11]. In contrast, machine learning techniques use both legitimate and malicious traffic to train their models. Based on the learning process, these methods are divided into supervised, unsupervised, and semi-supervised categories [
12]. In supervised learning, the traffic features are mapped to a traffic class, such as normal or attack; this learning procedure uses only labeled datasets. Unsupervised learning learns the traffic features without knowledge of the traffic class by finding interesting structures in the data. Semi-supervised learning combines the two: unsupervised learning groups comparable data, and the labeled data is then used to categorize the unlabeled data.
Current anomaly detection methods depend primarily on a centralized cloud [
13], which cannot address IoT requirements such as resource allocation and scalability. With IoT, operations are carried out across many devices, and large amounts of data are generated exponentially [
14]. The cloud is essential to the Internet of Things (IoT) because it enables users to access Internet-based services. However, because of its centralized architecture, it cannot manage IoT devices effectively, even though it performs expensive calculations. The great distance between an IoT device and a centralized anomaly detection system also results in a high detection time. Since the centralized cloud environment cannot accommodate the service requirements of IoT, anomaly detection in IoT differs from currently used methodologies [
15]. A new distributed intelligence technique, fog computing, is used to reduce this gap. The fog exchanges information by processing data near the data sources, i.e., the IoT devices. As depicted in
Figure 1, security measures can be put into place at the fog layer, where fog nodes perform distributed processing [
16]. Expensive computations and storage can be offloaded from IoT devices to implement distributed security mechanisms [
17].
In this study, we introduce a framework model and a hybrid algorithm for effective ML algorithm selection, in order to discover, from among many ML algorithms, workable methods for detecting anomalies and intrusions in IoT network traffic in a fog environment.
Significant contributions of the current work include:
The majority of Intrusion Detection System (IDS) related works are based on the outdated KDD Cup99 or NSL-KDD [
3,
4,
11,
18,
19,
20,
21] datasets, which do not include the majority of contemporary attacks. In contrast to the KDD Cup99 dataset, this study employs a more recent dataset (UNSW-NB15) that covers the latest attacks.
A comparative study of the performance of traditional ML models in anomaly detection.
A modified Tab Transformer model is proposed. To our knowledge, this is the first time this technique has been used to detect anomalies at fog nodes.
The rest of the paper is organized as follows.
Section 2 discusses the literature review of past work. The proposed strategy, including the dataset, model construction, and performance evaluation, is discussed in
Section 3. In
Section 4, we use experiments to evaluate the proposed methodologies quantitatively and discuss the methods and results. Finally, we conclude the paper in
Section 5.
2. Literature Review
This section presents relevant research and comprehensive background information on machine learning (ML) algorithm selection for detecting anomalies and intrusions in IoT network traffic [
22].
Anomaly detection in IoT data using deep learning was proposed by [
15], and it was shown to be more effective than a conventional IDS at identifying coordinated IoT fog attacks. The NSL-KDD intrusion dataset was used. In binary classification, deep learning achieved a recall of 99.27%, compared to the standard model’s 97.50%. In multi-classification, machine learning achieved an average recall of 93.66%, whereas deep learning scored an average recall of 96.5%.
The authors of [
23] suggested cognitive fog computing for IDS in an IoT network. The suggested methodology could detect malicious behavior in nearby fog nodes as opposed to employing a centralized cloud-based infrastructure. The cloud stores a list of all fog nodes for future research. The proposed model is assessed using the NSL-KDD dataset, and detection is accomplished using the online sequential extreme learning machine (OSELM) method. Their model has a 0.37% FAR and a 97.36% accuracy rate.
The authors of [
24] suggested an adaptive IDS for IoT that can recognize DoS threats. In this work, a fresh dataset was gathered using Wireshark over the course of four consecutive days on an IoT testbed. However, their suggested model only outperforms the Naive Bayes classifier.
The authors of [
25] proposed an IDS based on neural networks and locust swarm optimization. For this experiment, which makes use of the NSL-KDD and UNSW-NB15 datasets, the accuracy and FAR are 94.04% and 2.21%, respectively.
Li et al. suggested a K-means clustering technique combined with PCA in a fog computing design for anomaly detection. An ELM-based Semi-supervised Fuzzy C-Means (ESFCM) technique was proposed by [
12]. The NSL-KDD dataset was utilized. The suggested system outperformed a centralized attack detection framework, reporting an accuracy rate of 86.53% and a reduced detection time of 11 milliseconds.
To put in place an adaptive Intrusion Detection System (IDS) that can recognize when a fog node has been compromised and then take the appropriate action to ensure communication availability [
26], authors in [
18] developed an Anomaly Behavior Analysis Methodology based on Artificial Neural Networks [
27] and ensemble approach [
21]. The training dataset was produced using an IoT testbed. The approach achieved an accuracy rate of 97.51%.
Similarly, the authors in [
22] suggested a variational long short-term memory (VLSTM) learning model based on reconstructed feature representation for intelligent anomaly identification. Experiments using the publicly available UNSW-NB15 IBD dataset demonstrate that the proposed VLSTM model can successfully address the imbalance and high-dimensionality issues, and that it can significantly improve accuracy and decrease false alarm rates in anomaly detection.
By dividing the Intrusion Detection System functions across the fog nodes and the cloud, the authors of [
19] achieve low resource overheads. As a result, an accuracy of up to 98.8% was achieved. In addition, compared to installing a neural network on the fog node, a 10% decrease in the energy usage of the fog node is observed.
Another novel work develops intrusion detection for IoT traffic using SDN and deep learning. SDN enables intelligent network management by separating the control and data planes. In this IDS, deep learning-based classifiers outperform traditional classifiers. The suggested model detects any infiltration in networking systems, in particular IoT networks [
28]. Existing work related to anomaly detection is listed in
Table 1.
Encryption is necessary to safeguard sensitive data transmitted over the Internet and other networks and to prevent errors in that transmission. To strengthen the safety of sensitive data, the author of one study created an improved variant of the Caesar cipher and developed a technique in which modular arithmetic is used to transform plaintext into ciphertext. The author also created a decryption method that is entirely independent of the encryption by incorporating divisibility tests and modulo arithmetic.
The conventional approach to situational awareness prediction in network security is comparatively simple: typically, only one algorithm is utilized for perception and prediction, and its prediction accuracy is constrained. One study optimizes a radial basis function (RBF) neural network using the simulated annealing (SA) algorithm and the hybrid hierarchy genetic algorithm (HHGA). It then constructs an RBF neural network prediction model based on the HHGA optimization and performs experiments to investigate the application impact of intelligent learning algorithms. The results show that the predicted situation value of the enhanced RBF neural network is close to the actual situation value in 15 instances. The neural network has significant predictive power and can assist with network security maintenance [
29].
Despite numerous research proposals on identification models for accurately detecting malicious IoT traffic, to the best of our knowledge no study demonstrates which ML algorithm is the most efficient for this task. Most academics conduct experiments to evaluate the performance of ML algorithms and, based on the results, choose the most efficient method. It is therefore crucial to identify the most efficient machine learning method for detecting anomalies and intrusions in IoT network traffic by reviewing the frequently cited and most studied literature.
3. Materials and Methods
3.1. Dataset
The current analysis employed the UNSW-NB15 dataset as a benchmark [
30,
31,
32]. Previous datasets, including NSL-KDD [
33], KDD98, KDDCUP 99 [
34], CIDDS-001, DARPA, and ADFA, were already accessible for Network Intrusion Detection System (NIDS) research [
35]. These datasets, most of which date back more than 20 years, have several limitations that make them unreliable and out-of-date. Such datasets are no longer thought to provide a complete or accurate representation of contemporary attack environments, and algorithms trained on them will not exhibit realistic performance. These databases distort regular traffic and exclude modern attack types, making it simple for stealthy/spy attacks to pass as normal activity.
The following dataset-specific issues also exist: an uneven number of records from various types of traffic, an excessive number of attacks, incomplete training sets that do not accurately reflect all attacks found in the testing set, a dearth of validation work, questionable data generation techniques, low data rates, etc. [
36,
37].
The Australian Centre for Cyber Security (ACCS) produced a more recent dataset in collaboration with several specialists worldwide to solve the problems presented by earlier datasets in the field. It has been publicly available for NIDS research since 2015. As indicated in
Table 2, the dataset has 45 network attributes in total, including flow- and network-based properties. The features are further grouped into flow, basic, content, time, and additional generated features. Approximately 2.5 million CSV-formatted records in total, including 175,341 training records and 82,331 testing records, constitute the dataset. The training and testing sets contain no duplicate records, which guarantees the dependability of the NIDS evaluation. The dataset initially carries two traffic labels (attack and normal). The attack records in
Table 2 are further classified into nine more class types according to the attack type.
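For concreteness, the following is a minimal sketch of loading the two partitions with pandas; the file names follow the public UNSW-NB15 release and are assumptions here:

```python
import pandas as pd

# Assumed file names, following the public UNSW-NB15 release.
train_df = pd.read_csv("UNSW_NB15_training-set.csv")
test_df = pd.read_csv("UNSW_NB15_testing-set.csv")

print(train_df.shape)                    # 45 attributes per record
print(train_df["attack_cat"].unique())   # attack categories plus "Normal"
print(train_df["label"].value_counts())  # binary labels: 0 = normal, 1 = attack
```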
3.2. Data Preprocessing
In machine learning, more data generally results in more accurate models. However, real-world data is inconsistent, noisy, and incomplete, and contains missing values, as it is compiled using data mining and storage. Therefore, it is crucial to pre-process the raw data into a processed form. Data preparation enhances data quality so that valuable insights can be extracted, which benefits model development and training. The approaches used to pre-process the UNSW-NB15 dataset are described below.
3.3. Data Cleaning
We counted the missing values in the dataset for each feature. The feature “service” had 94,168 missing values in the train set and 47,153 in the test set. After removing the records with missing features, the number of records in each class of the total dataset was reduced.
Figure 2 shows the modified distribution of categories in the total dataset.
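A minimal sketch of this cleaning step, continuing from the loading sketch above and assuming the "-" placeholder in the raw CSVs marks an unknown service:

```python
import numpy as np

# Treat the "-" placeholder as a missing value (an assumption about the raw
# CSV encoding), then count the gaps per feature.
for df in (train_df, test_df):
    df.replace("-", np.nan, inplace=True)

print(train_df.isna().sum())  # "service": 94,168 missing in the train set
print(test_df.isna().sum())   # "service": 47,153 missing in the test set

# Drop every record with a missing feature, as described above.
train_df = train_df.dropna().reset_index(drop=True)
test_df = test_df.dropna().reset_index(drop=True)
```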
3.4. Data Transformation
The features “proto”, “service”, “state”, and “attack cat” contained categorical information that could not be fed directly into the ML models. We used one-hot encoding to convert the categorical values into binary format, except for “attack cat”, which was the target multiclass attack label that the model had to predict. The original columns of the three one-hot encoded features were removed, bringing the total number of columns to 61.
The range of the numerical characteristics in the dataset is varied. Therefore, it was essential to normalize the values. Except for the “id” and “label” columns, the numerical feature columns have been normalized using the “MinMaxScaler.”
For binary categorization of the traffic into “normal” and “abnormal”, the “label” column was encoded using LabelEncoder() as “0” for the normal class and “1” for the abnormal class. The binary dataset again contains 61 columns.
For multiclass classification, the “attack cat” attribute’s nine categories were label encoded as 0 (‘Analysis’), 1 (‘Backdoor’), 2 (‘DoS’), 3 (‘Exploits’), 4 (‘Fuzzers’), 5 (‘Generic’), 6 (‘Normal’), 7 (‘Reconnaissance’), and 8 (‘Worms’). Consequently, the total number of attributes in the multiclass classification dataset increased to 69.
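A sketch of these transformations with sklearn and pandas; the exact column handling is an assumption based on the description above:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, MinMaxScaler

# One-hot encode the three categorical predictors; "attack_cat" remains
# the multiclass target and is label encoded instead.
cat_cols = ["proto", "service", "state"]
train_df = pd.get_dummies(train_df, columns=cat_cols)
test_df = pd.get_dummies(test_df, columns=cat_cols)
test_df = test_df.reindex(columns=train_df.columns, fill_value=0)

# Min-max scale the numerical features, leaving "id" and "label" untouched.
num_cols = [c for c in train_df.select_dtypes("number").columns
            if c not in ("id", "label")]
scaler = MinMaxScaler().fit(train_df[num_cols])
train_df[num_cols] = scaler.transform(train_df[num_cols])
test_df[num_cols] = scaler.transform(test_df[num_cols])

# Label encode the multiclass target: the nine categories map
# alphabetically to 0 ('Analysis') .. 8 ('Worms').
enc = LabelEncoder()
train_df["attack_cat"] = enc.fit_transform(train_df["attack_cat"])
test_df["attack_cat"] = enc.transform(test_df["attack_cat"])
```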
3.5. Feature Selection
Feature selection is essential for the efficient training of machine-learning models [
38]. This is because selecting the features that contribute the most to accomplishing a task eliminates unneeded or redundant attributes [
39]; otherwise, the model can learn from noise and collect insignificant patterns. Consequently, feature selection enhances processing and prediction reliability [
38]. In this paper, correlation-based feature selection is used.
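As an illustration, one common correlation-based filter drops one feature from every highly correlated pair; the 0.95 cutoff below is an assumed value, since the paper does not state its exact threshold:

```python
import numpy as np

# Absolute pairwise correlations between the numerical features.
corr = train_df[num_cols].corr().abs()

# Keep the upper triangle only, so each pair is inspected once.
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))

# Drop one feature from every pair whose correlation exceeds the cutoff.
to_drop = [col for col in upper.columns if (upper[col] > 0.95).any()]
train_df = train_df.drop(columns=to_drop)
test_df = test_df.drop(columns=to_drop)
```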
3.6. Model Development
3.6.1. Tab Transformer
The widely used Transformer design by the authors in [
40] served as an inspiration for the TabTransformer architecture that was developed by the authors in [
41]. A column embedding layer, a stack of N Transformer layers, and a multilayer perceptron are the components of the suggested design [
42]. As described by [
43], each Transformer layer comprises a multi-head self-attention layer followed by a position-wise feed-forward layer. In the present study, we utilize a variant of the modified Tab Transformer model that was proposed by the authors in [
44]. The proposed model is illustrated in
Figure 3. The revised version utilizes only the Tab Transformer’s capability to handle continuous input features. It removes the categorical features, along with the normalization layer and concatenation layer related to those features. In other words, it uses the Tab Transformer to handle only the continuous features of the input.
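To make the continuous-only variant concrete, here is a minimal PyTorch sketch of the idea: each scaled continuous feature is embedded as a token, passed through a stack of Transformer encoder layers, and classified by an MLP head. The class name and all hyperparameters are illustrative assumptions, not the authors’ exact configuration:

```python
import torch
import torch.nn as nn

class ContinuousTabTransformer(nn.Module):
    """Sketch of the modified Tab Transformer: the categorical pathway is
    removed and only continuous features are tokenized. Hyperparameters
    (d_model=32, n_layers=6, n_heads=8) are illustrative assumptions."""

    def __init__(self, n_features: int, n_classes: int,
                 d_model: int = 32, n_layers: int = 6, n_heads: int = 8):
        super().__init__()
        # Per-feature affine embedding: one scalar -> one d_model token.
        self.weight = nn.Parameter(torch.randn(n_features, d_model))
        self.bias = nn.Parameter(torch.zeros(n_features, d_model))
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads,
            dim_feedforward=4 * d_model, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        # MLP head over the flattened encoded tokens.
        self.head = nn.Sequential(
            nn.Linear(n_features * d_model, 64), nn.ReLU(),
            nn.Linear(64, n_classes))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_features) of min-max scaled continuous values.
        tokens = x.unsqueeze(-1) * self.weight + self.bias
        encoded = self.encoder(tokens)        # (batch, n_features, d_model)
        return self.head(encoded.flatten(1))  # class logits
```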
The detailed methodology for detecting anomalies in the fog node is depicted in
Figure 4.
3.6.2. Model Training Pipeline
Following the data cleaning process, 141,321 data samples remained. In total, 80% of those samples were designated for training, while the remaining 20% were used for testing. The sklearn and keras libraries were utilized in developing the machine learning models, and pytorch-widedeep was used to implement the Tab Transformer. The Tab Transformer model was trained for ten epochs; on an NVidia T4 GPU with 40 GB of RAM, each epoch took 15 s.
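The following sketch mirrors this pipeline, reusing the ContinuousTabTransformer class from the sketch above rather than the pytorch-widedeep implementation; the batch size and learning rate are assumed values:

```python
import torch
from sklearn.model_selection import train_test_split

# 80/20 split of the cleaned, preprocessed samples
# (the paper reports 141,321 samples after cleaning).
X = train_df.drop(columns=["id", "label", "attack_cat"]).to_numpy("float32")
y = train_df["attack_cat"].to_numpy("int64")
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

model = ContinuousTabTransformer(n_features=X.shape[1], n_classes=9)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # assumed lr
loss_fn = torch.nn.CrossEntropyLoss()
loader = torch.utils.data.DataLoader(
    torch.utils.data.TensorDataset(torch.from_numpy(X_train),
                                   torch.from_numpy(y_train)),
    batch_size=256, shuffle=True)  # assumed batch size

for epoch in range(10):  # ten epochs, as stated above
    for xb, yb in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(xb), yb)
        loss.backward()
        optimizer.step()
```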
3.6.3. Performance Evaluation
Accuracy: The ratio of the number of correct predictions to the total number of predictions, representing how often the classifier makes accurate predictions, as shown in Equation (1):
Accuracy = (TP + TN)/(TP + TN + FP + FN)  (1)
Recall: The fraction of true positives successfully identified, as shown in Equation (2):
Recall = TP/(TP + FN)  (2)
Precision: The proportion of anticipated positives that are truly positive, as shown in Equation (3):
Precision = TP/(TP + FP)  (3)
F1 score: The harmonic mean of recall and precision, as shown in Equation (4):
F1 = 2 × (Precision × Recall)/(Precision + Recall)  (4)
where TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives, respectively.
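These four metrics can be computed directly with sklearn; the weighted averaging below is one reasonable choice for the multiclass task, since the paper does not state which averaging it uses:

```python
import torch
from sklearn.metrics import (accuracy_score, f1_score,
                             precision_score, recall_score)

# Predictions on the held-out 20% split from the training sketch above.
with torch.no_grad():
    y_pred = model(torch.from_numpy(X_test)).argmax(dim=1).numpy()

print("Accuracy :", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred, average="weighted"))
print("Recall   :", recall_score(y_test, y_pred, average="weighted"))
print("F1 score :", f1_score(y_test, y_pred, average="weighted"))
```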
5. Conclusions
The current work proposed a fog-based anomaly detection system for IoT networks. The implementation of anomaly detection indicated that fog nodes can be utilized effectively to decentralize a cloud-based IoT network architecture. The suggested model was developed on the UNSW-NB15 dataset and employed to identify aberrant traffic in IoT networks. The proposed detection technique reduced the number of features in the multiclass and binary datasets using correlation-based feature selection. The test dataset, however, remains unbalanced. Even so, both the conventional ML models and the suggested Tab Transformer demonstrated satisfactory performance. Our Tab Transformer design outperforms the conventional ML models, obtaining 98.35% accuracy on binary classification (normal vs. abnormal traffic) and 97.22% accuracy on the multiclass detection task.
Furthermore, by comparing the performance of the proposed model to that of previously created models on the same dataset, we have demonstrated the significance of the correlation-based feature selection method. As IoT devices have varying memory capacities, network bandwidth, and battery life, a lightweight anomaly detection model can be constructed by utilizing an optimal collection of attributes. In the future, we intend to test the performance of the proposed model on additional, balanced IoT-based datasets and to study its computational complexity and running time.
Limitations
Although the proposed methodology for detecting anomalies performs better than others, it nevertheless has some limitations.
Applying new data augmentation techniques would further increase the computational complexity. Moreover, the number of parameters could be reduced by applying customized models, and a few more features could be added to enhance the accuracy further.
The techniques used by authors in [
19] are lightweight and inspired by the human immune system, whereas we applied the Tab Transformer technique. In our work, we offer a novel intrusion detection model capable of deployment at fog nodes to detect undesired traffic towards IoT devices by leveraging features from the UNSW-NB15 dataset. A further limitation of the study in [
19] is that it did not provide a comparison with other works. Furthermore, no technical details are clearly given about the features extracted or how the processing was done.