1. Introduction
Internet of Things (IoT) refers to any device that can automatically gather and transmit data over a network without human involvement. It is steadily growing in importance and is now pervasive throughout our daily lives [1]. Furthermore, it is a revolution that brings together a wide range of diverse elements such as smart systems, automatic devices, actuators, and sensors. However, growing worries about security, such as software bugs and cyberattacks, can deter many people from using IoT devices. Such IoT security issues are especially significant for businesses operating in industries such as healthcare, financial services, industrial production, transportation, and commerce, which have already begun adopting IoT systems. Another reason to prioritize security when developing IoT systems is to protect their data: smart devices collect massive amounts of sensitive data, including personal information, which must be safeguarded against intrusion. In addition, a password that is too easy to guess is another problem that users need to be mindful of.
Figure 1 displays the top ten passwords that are both easily guessed and set as the default on IoT devices. Deep Learning (DL) can safeguard the IoT environment by automating the process of scanning and managing IoT devices across the entire network. A DL approach can scan every device connected to the network, automatically preventing and terminating any attacks it finds. It enables IoT security to make predictions based on historical behavior. When it detects a known vulnerability or attack, such as a distributed denial of service (DDoS) attack (Figure 2 shows how rapidly such attacks are growing over time), it analyzes the current behavior of the network and compares it to the behavior patterns found in instances of attacks. Learning from traffic patterns and providing heuristic solutions to large-scale IoT attacks are two examples of how DL can deliver considerable value, as well as stimulate the construction and training of models [2,3]. The passage of time and the persistent analysis of data help DL algorithms to produce smarter decisions and increased forecast accuracy [4].
A wide range of built-in and customized DL models were used in several research works to determine the optimum solution for large-scale attacks generated by IoT devices. The following is a list of models identified through data analysis that are frequently used by researchers and produce strong results in identifying malicious network traffic.
Convolutional neural network (CNN)—CNNs are a subset of neural networks that are designed to categorize input data, most commonly images, into a variety of distinct categories. These networks are composed of one or more convolutional layers [7]. The convolutional layer, pooling layer, and fully connected layer make up the foundational architecture of a CNN; these layers are stacked one on top of the other. CNNs are renowned for their ability to produce high levels of accuracy on difficult problems. However, IoT devices are resource-constrained, while a CNN requires substantial compute power to deliver high accuracy [8]. It is therefore essential to develop a balanced CNN model that works effectively and gives the best accuracy while using the least amount of processing power. This necessitates reducing the size of the CNN model so that it can be deployed on an IoT device, which takes both prior knowledge and extensive experimentation. In [9], the author built twelve different CNN architectures, each with a unique set of features, with the goal of automatically detecting the directionality of textures, whereas in [10], the authors developed a universal pipeline to boost the accuracy of CNN-based approaches through the use of computer graphics and validated it on data from many different disciplines and datasets.
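As a minimal illustration of such a compact model (not the architecture used in this paper), the following sketch builds a small 1D CNN over tabular traffic records with Keras; the feature count, class count, and layer sizes are assumptions chosen only to keep the parameter budget small.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_compact_cnn(n_features=64, n_classes=5):
    """A small 1D CNN sketch for resource-constrained traffic classification."""
    model = tf.keras.Sequential([
        layers.Input(shape=(n_features, 1)),           # each record treated as a 1D signal
        layers.Conv1D(16, kernel_size=3, activation="relu"),
        layers.MaxPooling1D(pool_size=2),
        layers.Conv1D(32, kernel_size=3, activation="relu"),
        layers.GlobalAveragePooling1D(),                # avoids a large dense layer
        layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```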
Deep belief network (DBN)—DBNs were developed as a means of resolving issues that arise when training traditional neural networks in deeply layered architectures. These issues include slow learning, becoming stuck in local minima as a result of poor parameter selection, and the requirement for a large number of training datasets. A DBN is made up of multiple stacked layers of neural networks, also referred to as "Boltzmann machines" [11]. Hidden neurons, visible neurons, and the layers that form the output are all components of the DBN framework. Once sufficiently trained, the DBN is able to recognize the presence of potential threats within the network based on the extracted features.
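Since a DBN is typically built by stacking restricted Boltzmann machines and pre-training them layer by layer, a minimal NumPy sketch of one such layer trained with one-step contrastive divergence (CD-1) is shown below; the learning rate and initialization are illustrative assumptions, not the configuration used in this paper.

```python
import numpy as np

class RBM:
    """One restricted Boltzmann machine layer trained with CD-1."""
    def __init__(self, n_visible, n_hidden, lr=0.01, seed=0):
        self.rng = np.random.default_rng(seed)
        self.W = 0.01 * self.rng.standard_normal((n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)   # visible bias
        self.b_h = np.zeros(n_hidden)    # hidden bias
        self.lr = lr

    @staticmethod
    def _sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def contrastive_divergence(self, v0):
        # Positive phase: hidden activations given the data.
        p_h0 = self._sigmoid(v0 @ self.W + self.b_h)
        h0 = (self.rng.random(p_h0.shape) < p_h0).astype(float)
        # Negative phase: one Gibbs step to reconstruct the visibles.
        p_v1 = self._sigmoid(h0 @ self.W.T + self.b_v)
        p_h1 = self._sigmoid(p_v1 @ self.W + self.b_h)
        # Parameter update from the difference of correlations.
        self.W += self.lr * (v0.T @ p_h0 - p_v1.T @ p_h1) / len(v0)
        self.b_v += self.lr * (v0 - p_v1).mean(axis=0)
        self.b_h += self.lr * (p_h0 - p_h1).mean(axis=0)
```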
Recurrent neural network (RNN)—RNNs are a specific kind of artificial neural network that processes data in sequential order. These deep learning methods are frequently utilized for ordinal or temporal problems such as translation, natural language processing (NLP), voice recognition, and image captioning. RNNs, in the same manner as feedforward neural networks, make use of training data to learn. They are differentiated by their "memory", which allows them to use information gathered from previous inputs to modify the output at the current time step. In contrast to typical deep neural networks, which operate under the assumption that inputs and outputs are independent of one another, recurrent neural networks produce outputs that depend on the elements that came before them in the sequence [12]. Because of this, the RNN is a great option for evaluating the traffic on IoT networks for anomaly identification [13].
Figure 3 depicts the general architecture of an RNN.
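In its simplest single-layer form, the recurrence that gives the RNN its memory can be written as:

```latex
h_t = \tanh\left(W_{xh}\, x_t + W_{hh}\, h_{t-1} + b_h\right), \qquad
y_t = W_{hy}\, h_t + b_y
```

where the hidden state $h_{t-1}$ carries information from previous inputs into the computation at time $t$.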
Long short-term memory (LSTM)—This well-known RNN design was initially proposed by Sepp Hochreiter and Juergen Schmidhuber as a solution to the issue of vanishing gradients. An LSTM cell contains three components, each referred to as a "gate": the first is the Forget gate, the second is the Input gate, and the third is the Output gate. The Forget gate is responsible for deciding what information should be discarded and what should be preserved. LSTMs were developed with the express goal of avoiding the long-term dependency problem [15].
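In the commonly used formulation (with $\odot$ denoting element-wise multiplication), the three gates and the cell-state update are:

```latex
f_t = \sigma\!\left(W_f [h_{t-1}, x_t] + b_f\right), \quad
i_t = \sigma\!\left(W_i [h_{t-1}, x_t] + b_i\right), \quad
o_t = \sigma\!\left(W_o [h_{t-1}, x_t] + b_o\right) \\
\tilde{C}_t = \tanh\!\left(W_C [h_{t-1}, x_t] + b_C\right), \qquad
C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t, \qquad
h_t = o_t \odot \tanh(C_t)
```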
Gated recurrent unit (GRU)—The GRU is a type of RNN that, in some circumstances, outperforms long short-term memory (LSTM). The GRU is more efficient and requires less memory than an LSTM. Additionally, GRUs address the vanishing-gradient problem that traditional recurrent neural networks face while updating the weights of the network. The update gate and the reset gate are the two gates that the GRU uses to overcome this challenge. These gates determine what information is permitted to pass through to the output and can be trained to retain information from further back in the sequence [16].
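For comparison, the standard GRU update with its two gates (bias terms omitted for brevity) is:

```latex
z_t = \sigma\!\left(W_z [h_{t-1}, x_t]\right) \;\text{(update gate)}, \qquad
r_t = \sigma\!\left(W_r [h_{t-1}, x_t]\right) \;\text{(reset gate)} \\
\tilde{h}_t = \tanh\!\left(W_h [r_t \odot h_{t-1}, x_t]\right), \qquad
h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t
```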
During the process of training a DL model, we need to adjust the weights at each epoch and strive to achieve the lowest possible loss. An optimization algorithm modifies model parameters such as weights to reduce loss and enhance accuracy. Stochastic gradient descent is one such effective DL training approach. For a scenario with a large number of training examples, it is both simple and quick to apply. However, it is challenging to parallelize this procedure on a graphics processing unit (GPU), since it requires a number of different manual tuning schemes to find the correct parameters, and the process itself is primarily sequential. Thus, metaheuristic optimization algorithms can be used to find ideal weights for DL models. Metaheuristic algorithms draw their inspiration from nature and can address optimization issues by emulating natural or physical processes. In the fields of research, technology, engineering, and business, these optimization methods have been used to solve a wide variety of optimization problems [17]. These algorithms are growing in popularity since they rely on straightforward ideas and are easy to put into practice [18]; additionally, for large-scale, complicated, nonlinear search and optimization problems, traditional optimization techniques may be insufficient and inappropriate [19]. General-purpose metaheuristic approaches are commonly grouped into nine distinct categories: "biology-based, physics-based, swarm-based, social-based, music-based, chemistry-based, sport-based, mathematics-based, and hybrid" [20]. In swarm-based techniques, the population is initially generated at random and then evolved over successive generations. One of the strengths of these approaches is that the best individuals from one generation are always mixed with those from the following generation to create the next generation, which makes it possible to improve the population over several generations. These techniques attempt to imitate the social behavior of groups of animals. The particle swarm optimization (PSO) algorithm, developed by Eberhart [21], is currently the most widely used. The various swarm-based optimization algorithms that we came across in our literature review are summarized in Table 1 below.
Bird flocking, animal herding, bacterial growth, and schools of fish are all inspirations for swarm intelligence algorithms.
Figure 4 displays the overall framework that all metaheuristic algorithms adhere to.
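To make that general framework concrete, the following minimal NumPy sketch implements the basic global-best PSO loop: random initialization of the population, fitness evaluation, and iterative velocity and position updates. The inertia and acceleration coefficients are common textbook defaults, not values used in this paper.

```python
import numpy as np

def pso_minimize(fitness, dim, n_particles=30, iters=100,
                 w=0.7, c1=1.5, c2=1.5, bounds=(-1.0, 1.0), seed=0):
    """Minimize `fitness` over a `dim`-dimensional box with a basic particle swarm."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    pos = rng.uniform(lo, hi, size=(n_particles, dim))   # random initial population
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_val = np.array([fitness(p) for p in pos])
    gbest = pbest[pbest_val.argmin()].copy()

    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, dim))
        # Velocity update: inertia + cognitive pull (personal best) + social pull (global best).
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lo, hi)
        vals = np.array([fitness(p) for p in pos])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
        gbest = pbest[pbest_val.argmin()].copy()
    return gbest, pbest_val.min()
```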
However, a few old or standard algorithms are still being utilized despite the fact that they are not very accurate. Thus, two novel metaheuristic optimization algorithms are proposed in order to resolve the issue of low accuracy. The first proposed optimization algorithm is termed the seagull adapted elephant herding optimization algorithm (SAEHO) [31]. This method takes the logic of two independent metaheuristic optimization algorithms, namely, elephant herding optimization (EHO) [32] and the seagull optimization algorithm (SOA) [33], and combines them into a single solution. The second algorithm is referred to as self-upgraded cat and mouse optimization (SU-CMO) [34], and it is an improvement on the existing optimization technique known as the cat- and mouse-based optimizer (CMBO) [30]. Both algorithms are put to use in hybrid deep learning classifier models in order to achieve optimal weighting. The two approaches implemented in the current paper are as follows:
In the first approach, two different types of deep learning models, namely, CNN and DBN, are combined to create a hybrid classifier. This classifier is then used to identify intrusions into the IoT system. To perfect the model, the proposed SAEHO is utilized for its training.
In the second approach, SU-CMO is put to use in order to train the hybrid deep learning classifier model that is made up of GRU and Bi-LSTM.
Following that, both of the suggested frameworks are contrasted with conventional and state-of-the-art techniques. Although many distinct intrusion detection systems are now being developed, there are still issues that can be improved. One of the issues that has to be properly managed is a higher rate of false results, i.e., false positives and false negatives. The objective of the current research work is to develop metaheuristic optimization algorithms for improving DL models for IoT attack classification.
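How a metaheuristic can stand in for gradient-based weight training is sketched below under simplifying assumptions: the candidate solution is a flat weight vector for a single linear layer, the fitness is the misclassification rate, and the `pso_minimize` sketch above plays the role of the optimizer (SAEHO and SU-CMO themselves are described in Section 3). The names `X_train`, `y_train`, and `n_classes` are hypothetical.

```python
import numpy as np

def decode(theta, n_features, n_classes):
    """Reshape a flat candidate solution into a weight matrix and bias vector."""
    W = theta[: n_features * n_classes].reshape(n_features, n_classes)
    b = theta[n_features * n_classes:]
    return W, b

def fitness(theta, X, y, n_classes):
    """Fitness = misclassification rate of the decoded model (lower is better)."""
    W, b = decode(theta, X.shape[1], n_classes)
    preds = (X @ W + b).argmax(axis=1)
    return float((preds != y).mean())

# Usage sketch (assumes X_train, y_train are already preprocessed arrays):
# dim = X_train.shape[1] * n_classes + n_classes
# best_theta, best_err = pso_minimize(
#     lambda t: fitness(t, X_train, y_train, n_classes), dim)
```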
Section 2 of this paper provides a literature review of the related work.
Section 3 briefly describes two novel metaheuristic optimization algorithms. The IoT security attack classification framework is described in
Section 4. The results are depicted in
Section 5, and the conclusion and future work are provided in
Section 6.
2. Related Work
A number of researchers have contributed to the categorization of IoT security threats and have also proposed optimization techniques [
35,
36].
A nature-inspired metaheuristic optimization technique termed the whale optimization algorithm (WOA) was proposed by Mirjalili et al. [18] in 2016. The algorithm imitates whales' predatory behavior to solve the target problem. It comprises three operators that imitate the behavior of humpback whales, i.e., searching for prey, encircling prey, and using bubble nets to catch their food. The WOA was evaluated on a total of 35 problems, 29 of which are mathematical optimization challenges and 6 of which are structural design issues. The findings showed that the WOA method has the potential to be highly successful in addressing real situations that involve unknown search spaces. However, WOA can exhibit slow convergence.
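In the commonly cited form of the WOA update rules, the encircling-prey and bubble-net (spiral) position updates are:

```latex
\vec{D} = \left| \vec{C} \cdot \vec{X}^{*}(t) - \vec{X}(t) \right|, \qquad
\vec{X}(t+1) = \vec{X}^{*}(t) - \vec{A} \cdot \vec{D}, \qquad
\vec{A} = 2\vec{a} \cdot \vec{r} - \vec{a}, \qquad \vec{C} = 2\vec{r} \\
\vec{X}(t+1) = \vec{D}' \cdot e^{bl} \cdot \cos(2\pi l) + \vec{X}^{*}(t)
```

where $\vec{X}^{*}$ is the best solution found so far, $\vec{a}$ decreases linearly from 2 to 0, $\vec{r}$ is a random vector in $[0,1]$, $\vec{D}' = |\vec{X}^{*}(t) - \vec{X}(t)|$, $b$ is the spiral constant, and $l$ is random in $[-1,1]$.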
Mirjalili et al. [27] introduced a new metaheuristic optimization algorithm termed grey wolf optimization (GWO), which was inspired by a pack of grey wolves. It simulates the social structure and hunting process of grey wolves found in nature. For the purpose of mimicking the hierarchy of leadership, four distinct varieties of grey wolves, designated "alpha, beta, delta, and omega", are used. In the mathematical model, the alpha wolf is considered to be the optimal solution. The beta wolf, delta wolf, and omega wolves are, respectively, regarded as the second best, third best, and the remaining candidate solutions. However, this algorithm is susceptible to low precision, poor local searching, and slow convergence.
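In the usual formulation, a candidate solution is updated by averaging the guidance of the three best wolves:

```latex
\vec{X}_1 = \vec{X}_{\alpha} - \vec{A}_1 \cdot \left|\vec{C}_1 \cdot \vec{X}_{\alpha} - \vec{X}\right|, \quad
\vec{X}_2 = \vec{X}_{\beta} - \vec{A}_2 \cdot \left|\vec{C}_2 \cdot \vec{X}_{\beta} - \vec{X}\right|, \quad
\vec{X}_3 = \vec{X}_{\delta} - \vec{A}_3 \cdot \left|\vec{C}_3 \cdot \vec{X}_{\delta} - \vec{X}\right| \\
\vec{X}(t+1) = \frac{\vec{X}_1 + \vec{X}_2 + \vec{X}_3}{3}
```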
Raja et al. [28] proposed a nature-inspired metaheuristic optimization method, named the sea lion optimization (SLnO) algorithm, that mimics the hunting techniques used by sea lions in their natural environment. This work was carried out on twenty-three different mathematical optimization problems in order to investigate the exploration phase, the exploitation phase, and the proposed method's convergence behavior. The algorithm avoids local optima and converges quickly over iterations.
Mirjalili [37] presented the ant lion optimizer (ALO), a relatively new metaheuristic algorithm that mathematically describes the natural interaction of ants and antlions. The method was designed to address optimization problems by modeling the random walks of ants, building traps, entrapping ants in traps, catching prey, and re-building traps. The ants' random walks help the algorithm avoid becoming stuck in local optima; moreover, its few tunable parameters make it easy to use.
In 2021, Abualigah et al. [38] introduced the Aquila Optimizer (AO), a population-based optimization approach inspired by the behavior of aquilas (a bird of prey) in the wild as they catch their prey. The algorithm is modeled after four different procedures, i.e., choosing the search space, exploring within a divergent search space, exploiting within a convergent search space, and walking to catch the prey. Optimization begins by producing a random set of potential solutions (the population). Through repetition, the AO search strategies, i.e., "expanded exploration, narrowed exploration, expanded exploitation, and narrowed exploitation", find near-optimal or best-obtained solutions. In complicated optimization situations, AO may converge slowly or become trapped in sub-optimal regions.
In 2018, Arora et al. [29] made an effort to solve the global optimization problem by proposing a novel optimization algorithm named the butterfly optimization algorithm (BOA). The algorithm simulates the butterfly's foraging and mating behavior. The performance of the method was analyzed and compared with that of several other metaheuristic algorithms after it was tested and verified on a set of thirty benchmark test functions. While BOA has been applied to various fields of optimization due to its performance, it also has some drawbacks, such as a small population size and an increased chance of becoming stuck in a local optimum.
In [39], Read et al. provided an overview study that analyzes false data injection (FDI) attacks in the smart grid according to attack model, attack impact, and attack target. A number of important assessment criteria, related to the requirements of power systems and smart grid security, were utilized to estimate the efficacy of the multiple cyberattack models found in the surveyed literature and to determine the challenges associated with these models. In addition, several future research topics for FDI attacks are suggested as a means of enhancing the smart grid cybersecurity architecture.
In [40], Khosravani et al. outlined the process of developing a knowledge-based system to enhance the conventional injection molding procedure through the incorporation of relevant information. The proposed system is intended to assist industries involved in injection molding by using cutting-edge technologies to enhance production and raise levels of efficiency. The study concentrates on "simulation and generative design" for the optimization of production, "additive manufacturing", and "virtual reality". Further research and experimentation have the potential to develop and enrich the cost-effective and appropriate solutions produced by the proposed knowledge-based system.
4. IoT Security Attacks Detection Framework
Both of the above-mentioned proposed algorithms, SAEHO and SU-CMO, are utilized in the process of detecting security threats in an IoT environment. Each algorithm has a separate IoT framework designed for it; thus, two IoT security threat detection frameworks, based on deep learning models, were developed. Each is termed a hybrid deep learning classifier since it utilizes two distinct deep learning models. Framework-I (Figure 7) combines two DL models for classifying attacks, specifically a CNN and a DBN, whereas framework-II (Figure 8) uses the GRU and Bi-LSTM DL models. Framework-I optimizes the CNN and DBN deep learning models using the proposed SAEHO optimization algorithm. In contrast, framework-II uses the proposed SU-CMO optimization method to optimize the Bi-LSTM deep learning model. Preprocessing, feature extraction, hybrid deep learning classification optimized by the proposed optimization method, and finally attack classification are the standard operating procedures for both IoT frameworks. In the preprocessing phase, data are cleaned and prepared for analysis by removing outliers or null values and removing redundant or extraneous information (a minimal cleaning sketch is given below). Following that, statistical and higher-order statistical features are extracted as part of the feature extraction process.
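The following pandas sketch illustrates this cleaning step under simple assumptions (records already loaded into a DataFrame, a plain z-score rule for outliers); the actual cleaning rules used by the frameworks are not prescribed here.

```python
import pandas as pd

def preprocess(df: pd.DataFrame, z_thresh: float = 3.0) -> pd.DataFrame:
    """Drop null and duplicate records, then remove numeric outliers by z-score."""
    df = df.dropna().drop_duplicates()
    numeric = df.select_dtypes(include="number")
    z = (numeric - numeric.mean()) / numeric.std(ddof=0)
    return df[(z.abs() <= z_thresh).all(axis=1)]
```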
Statistical feature extraction: In this step, statistical features, namely the standard deviation (SD), mean, harmonic mean (HM), median, mode, pitch, peak, pitch angle, root mean square (RMS), and amplitude, are calculated (a computation sketch follows these definitions).
Mean: The mean is obtained by dividing the sum of all values by the total number of values.
Median: The median is the value that falls in the center of a dataset after the values have been sorted. If two values fall in the center of the dataset, the median is the mean of those two values.
SD: The standard deviation is a measure of the dispersion, or degree of variation, of the values in a set. A smaller standard deviation suggests that the values tend to be close to the mean, whereas a larger SD shows that the values span a wider range.
Mode: The mode is the value that appears most frequently in the dataset. It is one of the most common central tendency measures and is suitable for nominal data, whose categories are entirely arbitrary.
HM: The harmonic mean is one type of numerical average. It is calculated by dividing the total number of observations by the sum of the reciprocals of the numbers in the series.
RMS: Also known as the quadratic mean, the RMS is used in both mathematics and statistics. It is the square root of the mean of the squares of the observations.
Peak amplitude: The peak amplitude of a sinusoidal waveform is the magnitude of the wave's largest positive or negative deviation from the zero reference level.
Pitch angle: This term denotes the angle formed between the longitudinal axis and the horizontal plane.
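A possible NumPy/SciPy computation of the features defined above is sketched below (pitch and pitch angle are omitted because they depend on the specific signal representation); the small epsilon added for the harmonic mean is an assumption to keep it defined for non-positive values.

```python
import numpy as np
from scipy import stats

def statistical_features(x: np.ndarray) -> dict:
    """Per-record statistical features from a 1D array of values."""
    return {
        "mean": float(np.mean(x)),
        "median": float(np.median(x)),
        "std": float(np.std(x)),
        "mode": float(stats.mode(x, keepdims=False).mode),
        "harmonic_mean": float(stats.hmean(np.abs(x) + 1e-12)),  # hmean needs positive values
        "rms": float(np.sqrt(np.mean(np.square(x)))),
        "peak_amplitude": float(np.max(np.abs(x))),
    }
```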
Higher-order statistical feature extraction: Higher-order statistical features include the measures listed below, such as kurtosis, skewness, entropy, percentile, energy, mean, frequency, etc. (a computation sketch follows these definitions).
Skewness: It is a measure of the asymmetry of the distribution of a random variable, i.e., how much the distribution departs from the symmetric shape of the normal distribution.
Kurtosis: The term “Kurtosis” refers to a statistical metric that determines the degree to which the tails of a distribution deviate from the tails of a normal distribution.
Entropy: According to information theory, entropy is the average degree of information, surprise, or uncertainty inherent in the possible outcomes of a variable.
Percentile: In statistics, a percentile is a score at or below which a given percentage of the scores in a frequency distribution falls.
Variance: It is defined as the mean squared deviation of each data point from the center (mean) of the distribution of the population as a whole.
Moment: In probability theory and statistics, a moment is a quantitative measure of the shape of a random variable's probability distribution. The k-th central moment is the expected value of the k-th integer power of the variable's deviation from its mean.
MI Feature: Mutual information (MI) quantifies the amount of information shared between two collections of random variables, i.e., how much knowing one reduces uncertainty about the other.
Symmetric Uncertainty: It selects features based on the symmetric uncertainty correlation measure evaluated between each feature and the class.
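The higher-order features above could be computed as in the following sketch; the 32-bin histogram used for the entropy estimate and the 90th percentile are illustrative choices, and frequency-domain features are omitted. Mutual information and symmetric uncertainty additionally require the class labels; MI can, for example, be estimated with sklearn.feature_selection.mutual_info_classif.

```python
import numpy as np
from scipy import stats

def higher_order_features(x: np.ndarray) -> dict:
    """Higher-order statistics for a 1D array; entropy uses a histogram estimate."""
    hist, _ = np.histogram(x, bins=32)
    probs = hist / max(hist.sum(), 1)
    return {
        "skewness": float(stats.skew(x)),
        "kurtosis": float(stats.kurtosis(x)),
        "entropy": float(stats.entropy(probs[probs > 0])),
        "percentile_90": float(np.percentile(x, 90)),
        "variance": float(np.var(x)),
        "energy": float(np.sum(np.square(x))),
        "third_central_moment": float(stats.moment(x, moment=3)),
    }
```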
6. Conclusions and Future Works
The optimization approach is an essential part of the training process for machine learning models. Even though there are a number of effective search optimization techniques and algorithms, there is as yet no optimal algorithm that is certain to work for every situation. Consequently, new optimization algorithms are regularly presented, or efficient adjustments to existing algorithms are made. In this paper, two different metaheuristic optimization algorithms, namely, SAEHO and SU-CMO, were presented in order to train the DL models used to categorize several types of security threats that may occur in an IoT environment. SAEHO is a hybrid algorithm that combines two distinct algorithms, namely, EHO and SOA, while SU-CMO is an improved version of the existing CMBO optimization algorithm. The CNN and DBN DL models in framework-I are trained with the SAEHO optimization algorithm, while the Bi-LSTM model in framework-II is trained with the SU-CMO optimization method. Both frameworks are validated using two separate datasets, i.e., dataset 1 and dataset 2. Four distinct performance metrics, i.e., accuracy, rand index, F-measure, and MCC, are utilized to demonstrate the validity of both frameworks. The proposed frameworks provide better outcomes when contrasted with either conventional or state-of-the-art approaches. The proposed optimization strategies have the potential to improve the accuracy of a wide variety of DL models when employed in the appropriate context. However, there is no assurance that the best option is among the solutions considered; as a result, a metaheuristic solution to an optimization problem must be viewed as good rather than optimal. A significant portion of this paper focuses on maintaining the safety of the IoT network, so, as future work, the suggested optimization algorithms may also be assessed in terms of time complexity and search space.