1. Introduction
The Internet of Things (IoT), as a new communication/control platform, connects everything and everybody to the Internet, so that the behavior of connected nodes can be monitored, or their operation controlled, by a server or group of servers [1,2,3]. IoT-connected nodes range from simple sensors in various environments to critical components in different applications, and they communicate with each other over a predefined (Internet-based) network [4]. The global activation of the IoT will fundamentally change various aspects of human life, including industry, culture, education, trade, and transportation. The IoT is therefore one of the most important technologies being developed in the world today [5,6,7,8].
In general, the IoT faces several challenges before it can be practically implemented and move from the research and development stage to the productivity stage [9,10,11,12,13]. These challenges include but are not limited to: (a) large scale: in any communication network, different parameters may decrease the network performance, e.g., scalability, device heterogeneity, variety of network interactions, and network mobility rate [14]; (b) lack of infrastructure: in the IoT, the connected devices need to discover each other through a certain infrastructure [15]; and (c) commercialization: the International Telecommunication Union (ITU) has described the IoT-commercialization process as an important challenge and reported it as follows: “Many centers such as standard development organizations, research centers, service providers, network operators need to work together and each change many of its own rules and regulations” [16].
However, the most important and critical challenge that the IoT is facing, and will always face, is security. Security can be considered from different aspects, such as the kind of security requirements and threat models, the network layer under study, and the type of cryptographic primitives that can be used [17,18,19,20,21,22,23,24,25]. Like most consumer technologies, the IoT was not designed with security in mind in the first place, which has made security an important obstacle to the adoption of different networks and services.
Among all security mechanisms, intrusion detection [26,27,28] is one of the most important, and it can be studied in all four IoT architecture layers, as depicted in Figure 1 [29]. The network intrusion detection system (NIDS) is known as a promising solution for detecting malicious behavior in IoT networks. The NIDS mainly operates at the network layer of the IoT, which acts as a backbone connecting various IoT devices. The adversarial threats in the network layer can be classified into four main categories: probing, denial of service (DoS), user to root (U2R), and remote to local (R2L) [30,31,32,33,34,35,36]. Another categorization of the NIDS is based on how the scheme detects intrusions, which divides it into two main categories: signature-based intrusion detection and anomaly-based intrusion detection [37,38,39]. A more general classification distinguishes host intrusion detection (HID) from network intrusion detection (NID) [40,41,42], each of which has its own advantages and drawbacks.
1.1. Related Works
In recent years, many schemes have been introduced for NID to better identify different attacks/threats within normal network traffic. The traditional detection schemes have often employed statistical approaches, for example, distance measuring [43], the Hidden Markov Model (HMM) [44], Bayes theory [45], cluster analysis [46], and signal processing [47]; however, these methods have gradually given way to machine learning-based approaches. Thaseen et al. [48] introduced an approach using the support vector machine (SVM) and principal component analysis (PCA). They improved the accuracy and training-time cost for some attacks in the network, e.g., U2R and R2L, by automatically tuning the optimization parameters and optimizing the SVM’s kernels and parameters.
Other well-known machine learning-based methods for detecting attacks in IoT networks include the multi-layer perceptron neural network (MLP NN), Random Forest (RF), and Naive Bayes (NB) [49,50,51,52,53]. However, it has been shown that the performance of MLP, RF, NB, and other traditional machine learning-based approaches is not sufficient, especially when the volume of traffic data is large, mostly because of their shallow learning nature. With the growing use of deep learning in a wide range of applications, many efforts have also been made to propose an efficient and accurate NIDS based on deep learning.
Yin et al. [54] introduced a NIDS using a recurrent neural network (RNN). In comparison with earlier machine learning-based approaches, their scheme obtained better classification accuracy and a higher detection rate. He et al. [55] introduced a NIDS using long short-term memory (LSTM) and a multimodal deep auto-encoder to obtain better accuracy. Garg et al. [56] introduced an IoT NIDS based on the grey wolf optimizer (GWO) and the deep convolutional neural network (DCNN), and showed that their model could achieve a higher detection rate with minimized features on three network intrusion datasets. Xu et al. [57] proposed employing a log-cosh conditional variational auto-encoder (CVAE) to capture the complicated distribution of the observed data and generate new data with pre-specified classes, providing a more efficient way to produce intrusion data for imbalanced classes.
Deep learning-based approaches have improved the accuracy of the NIDS, though some important aspects still need improvement, including achieving a higher detection rate and decreasing the computational cost. One important step in this direction, which has rarely been considered in the literature, is to optimally train the fully connected neural network in the deep architecture [58,59,60,61,62,63]. Because better training of the fully connected neural network leads to better classification accuracy, the classifier can be designed in a more lightweight manner (for an equal detection rate), and thus less data will be required to train the network.
1.2. Paper Contributions
In view of the drawbacks of the NID models mentioned above, the most important contributions of this paper are summarized as follows:
We propose a novel meta-heuristic algorithm named NSBPSO, in which new concepts such as employed bees, onlooker bees, and the multi-parent crossover of bees are introduced to improve the exploitation and exploration abilities of the PSO algorithm.
We improve the performance of the DCNN used as our NIDS by tuning its optimization parameters with the NSBPSO algorithm.
We evaluate the performance of the proposed evolutionary deep learning-based IDS by comparing it with other IoT intrusion detectors in the literature using the UNSW-NB15 [64] and Bot-IoT [65] datasets.
1.3. Paper Organization
The rest of this paper is organized as follows:
Section 2 elaborates the proposed NSBPSO algorithm.
Section 3 explains the proposed NIDS for the IoT, including the datasets used and the way the intrusion detector (DCNN) is trained by the proposed NSBPSO algorithm.
Section 4 evaluates the performance of the proposed evolutionary deep learning-based IDS by comparing it with other IoT intrusion detectors in the literature using the UNSW-NB15 [64] and Bot-IoT [65] datasets.
Section 5 concludes the paper.
2. The Proposed NSBPSO Algorithm
Particle Swarm Optimization (PSO) is one of the most important meta-heuristic algorithms and was introduced by Kennedy and Eberhart in 1995. The algorithm was inspired by the social behavior of animals such as fish and birds. PSO is suitable for both discrete and continuous problems and has performed very well in various engineering optimization problems.
In the PSO algorithm, solutions are mapped to particles, and each particle is assigned an initial velocity. The fitness function is used to calculate the next velocity of the particles in the search space. Particle velocity consists of three main movements: (a) a fraction of the previous movement, (b) the motion toward the best personal experience, and (c) the motion toward the best experience of other particles.
Figure 2 indicates an overview of particle velocity motions in the PSO algorithm. Equations (1) and (2) represent the velocity and position of the particles, respectively.
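$$v_{id}^{t+1} = w\,v_{id}^{t} + r_1\left(p_{id} - x_{id}^{t}\right) + r_2\left(g_{d} - x_{id}^{t}\right) \tag{1}$$
$$x_{id}^{t+1} = x_{id}^{t} + v_{id}^{t+1} \tag{2}$$
where $v_{id}^{t}$ = the current velocity of particle $i$ in dimension $d$, $v_{id}^{t+1}$ = the new velocity of particle $i$ in dimension $d$, $x_{id}^{t}$ = the current position of particle $i$ in dimension $d$, $x_{id}^{t+1}$ = the new position of particle $i$ in dimension $d$; $r_1$ = a random number between zero and a fixed upper bound, $r_2$ = a random number between zero and a fixed upper bound, $w$ = the inertial coefficient, $p_{id}$ = the best personal experience of particle $i$ in dimension $d$, and $g_{d}$ = the best global experience of particles in dimension $d$.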
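As a minimal sketch of the velocity and position updates in Equations (1) and (2), the following Python function applies one PSO step to a whole swarm. The function name `pso_step`, the parameter names `w`, `c1`, and `c2`, their default values, and the uniform sampling of each random factor are illustrative assumptions; this is not the MATLAB implementation used in this paper.

```python
import random


def pso_step(positions, velocities, pbest, gbest, w=0.7, c1=1.5, c2=1.5, rng=random):
    """Apply one PSO velocity-and-position update to every particle, in place.

    positions, velocities, pbest: lists of per-particle lists (one value per dimension).
    gbest: list with the best global experience per dimension.
    w: inertial coefficient; c1, c2: hypothetical upper bounds of the random factors.
    """
    for i, (x, v) in enumerate(zip(positions, velocities)):
        for d in range(len(x)):
            r1 = rng.uniform(0.0, c1)  # weight of the motion toward the personal best
            r2 = rng.uniform(0.0, c2)  # weight of the motion toward the global best
            v[d] = w * v[d] + r1 * (pbest[i][d] - x[d]) + r2 * (gbest[d] - x[d])
            x[d] += v[d]  # new position = current position + new velocity
    return positions, velocities
```

A particle that already sits on both its personal best and the global best with zero velocity does not move, which illustrates why standard PSO has no built-in mechanism for sudden changes.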
Standard PSO has two main drawbacks: (I) insufficient ability to explore and exploit solutions, and (II) a tendency to get stuck in local minima. PSO has no operator that makes sudden changes, which is why it gets stuck in local minima. The PSO algorithm improves its position by considering only the best personal and global experience, so if the initial population is far from the best solution, PSO rarely converges to it. Another weakness of PSO is that it is highly dependent on the distribution of the initial particles in the search space. If a considerable number of particles are trapped in local minima, PSO can do little to free them, whereas it would converge faster if the particles could change suddenly. In this paper, to improve the PSO algorithm, employed bees, onlooker bees, and the multi-parent crossover of bees are used to strengthen exploitation and exploration. The proposed algorithm is called neighborhood search-based particle swarm optimization (NSBPSO).
In the proposed NSBPSO algorithm, several particles are treated as employed bees (global bests), so that different parts of the search space can be examined simultaneously. This helps the algorithm avoid being trapped in local minima. In the artificial bee colony (ABC) algorithm, the onlooker bees are obtained by a neighborhood search around the employed bees. If an onlooker bee is more efficient than its employed bee, it replaces that employed bee, and the employed bees are updated accordingly. In the proposed NSBPSO algorithm, after the employed bees are selected, a number of onlooker bees are sent to search around them. The updated employed bees are then compared to the global best, and the global best is updated. In NSBPSO, onlooker bees play the role of exploiting good solutions.
Figure 3 shows an example of the production of onlooker bees (a neighborhood search around employed bees).
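The neighborhood search described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `neighborhood_search`, the number of onlookers `n_onlookers`, the search `radius`, and the uniform perturbation of every dimension are hypothetical choices, and the fitness is assumed to be minimized.

```python
import random


def neighborhood_search(employed, fitness, n_onlookers=5, radius=0.1, rng=random):
    """Send onlooker bees around each employed bee and keep the better solution.

    employed: list of candidate solutions (lists of floats).
    fitness: callable returning a value to minimize (assumption).
    Returns a new list; each employed bee is replaced by its best onlooker
    only if that onlooker has a strictly lower fitness.
    """
    updated = []
    for bee in employed:
        best, best_fit = list(bee), fitness(bee)
        for _ in range(n_onlookers):
            # An onlooker is a small random perturbation of the employed bee.
            onlooker = [x + rng.uniform(-radius, radius) for x in bee]
            f = fitness(onlooker)
            if f < best_fit:
                best, best_fit = onlooker, f
        updated.append(best)
    return updated
```

Because a replacement happens only when an onlooker improves on its employed bee, the fitness of every employed bee is non-increasing from one call to the next, which is the exploitation role attributed to the onlooker bees.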
In standard PSO, the particle diversity gradually decreases as the particles move towards the personal best and global best. In this paper, because of the exploratory nature of the crossover operator, a multi-parent crossover is proposed to achieve highly varied solutions. In this operator, instead of using only two employed bees, all employed bees participate in the crossover to create new solutions. When several best particles (the employed bees) are used to produce a new solution, the resulting child bears less similarity to any single parent, so the solutions are more diverse in the search space. The multi-parent crossover operator therefore improves the algorithm’s exploration.
Figure 4 shows an example of the multi-parent crossover operator of the NSBPSO algorithm.
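One possible form of a multi-parent crossover is sketched below. The text does not state how each dimension of the child is chosen from the parents, so the uniform, dimension-wise selection used here is an assumption, as is the function name `multi_parent_crossover`.

```python
import random


def multi_parent_crossover(parents, rng=random):
    """Build one child by copying each dimension from a randomly chosen parent.

    parents: list of employed bees (equal-length lists); all of them take part.
    The uniform per-dimension choice is an illustrative assumption.
    """
    dims = len(parents[0])
    return [rng.choice(parents)[d] for d in range(dims)]
```

With more than two parents, the expected share of dimensions that a child inherits from any single parent shrinks, which is one way to read the claim that the child bears less similarity to its parents.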
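Therefore, Equation (1) is updated as follows, and two new vectors are added to improve the PSO performance. Motion towards the best onlooker bee (from the neighborhood search operator) improves the algorithm’s exploitation. Motion towards the best employed bee (from the multi-parent crossover operator) improves the algorithm’s exploration.
$$v_{id}^{t+1} = w\,v_{id}^{t} + r_1\left(p_{id} - x_{id}^{t}\right) + r_2\left(g_{d} - x_{id}^{t}\right) + r_3\left(o_{d} - x_{id}^{t}\right) + r_4\left(e_{d} - x_{id}^{t}\right) \tag{3}$$
where $r_3$ = a random number between zero and a fixed upper bound, $r_4$ = a random number between zero and a fixed upper bound, $o_{d}$ = the best onlooker bee from the neighborhood search operator in dimension $d$, and $e_{d}$ = the best employed bee from the multi-parent crossover operator in dimension $d$; the remaining symbols are as defined for Equation (1).
Figure 5 shows the flowchart of the proposed NSBPSO algorithm.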
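The updated velocity rule, with its two additional motion terms, might be sketched for a single particle as follows. The function name `nsbpso_velocity`, the argument names `obest` and `ebest`, the default inertial coefficient, and the upper bounds in `bounds` are hypothetical and chosen only for illustration.

```python
import random


def nsbpso_velocity(v, x, pbest, gbest, obest, ebest,
                    w=0.7, bounds=(1.5, 1.5, 1.5, 1.5), rng=random):
    """Return the new velocity of one particle under the four-guide update.

    pbest: best personal experience; gbest: best global experience;
    obest: best onlooker bee (neighborhood search, exploitation);
    ebest: best employed bee (multi-parent crossover, exploration).
    bounds: assumed upper limits of the four random factors.
    """
    guides = (pbest, gbest, obest, ebest)
    new_v = []
    for d in range(len(x)):
        # Each guide pulls the particle with its own random weight.
        pull = sum(rng.uniform(0.0, b) * (g[d] - x[d]) for b, g in zip(bounds, guides))
        new_v.append(w * v[d] + pull)
    return new_v
```

When all four guides coincide with the current position, only the inertia term remains, so the sketch reduces to `w * v` in every dimension.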
4. Simulation Results on the NID Datasets
In this section, the results of various hybrid deep architectures for intrusion detection in IoT systems are evaluated. The performance of the proposed NSBPSO algorithm is also evaluated in comparison with some widely used and competitive metaheuristic algorithms, including the particle swarm optimization (PSO) algorithm, the artificial bee colony (ABC) algorithm, the iterated greedy (IG) algorithm [66], the improved crow search algorithm (I-CSA) [67], and the black widow optimization (BWO) algorithm [68]. All algorithms have been coded in MATLAB, and the calibration parameters of the algorithms are listed in Table 2.
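For validation, sensitivity, accuracy, and specificity metrics are used to compare the performance of the deep architectures. These criteria are derived from the confusion matrix (as demonstrated in Figure 7) and can be calculated as in Equations (5)–(7).
$$\text{Sensitivity} = \frac{TP}{TP + FN} \tag{5}$$
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{6}$$
$$\text{Specificity} = \frac{TN}{TN + FP} \tag{7}$$
where $TP$ = true positive, $FN$ = false negative, $TN$ = true negative, and $FP$ = false positive.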
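The three criteria can be computed directly from the four confusion-matrix counts, as in the short sketch below. The function name `confusion_metrics` and the example counts are illustrative and are not taken from the reported experiments; the sketch assumes that each denominator is nonzero.

```python
def confusion_metrics(tp, fn, tn, fp):
    """Return (sensitivity, accuracy, specificity) from confusion-matrix counts.

    Assumes tp + fn > 0 and tn + fp > 0, i.e., both classes are present.
    """
    sensitivity = tp / (tp + fn)                # share of attacks that are detected
    accuracy = (tp + tn) / (tp + fn + tn + fp)  # share of all samples classified correctly
    specificity = tn / (tn + fp)                # share of normal traffic that is accepted
    return sensitivity, accuracy, specificity


# Hypothetical counts, for illustration only.
sens, acc, spec = confusion_metrics(tp=90, fn=10, tn=80, fp=20)
```

For these hypothetical counts the sketch gives a sensitivity of 0.90, an accuracy of 0.85, and a specificity of 0.80.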
Table 3 reports the specificity, accuracy, and sensitivity of the evolutionary deep learning models for intrusion detection in IoT systems. As can be seen, the NSBPSO-DCNN model attains the highest accuracy, sensitivity, and specificity on both the training and testing datasets. NSBPSO-DCNN achieved 99.41% and 98.86% accuracy in the test and train datasets, respectively. NSBPSO-DCNN also achieved 99.86% and 99.03% sensitivity in the test and train datasets, respectively.
Figure 8 and Figure 9 show the comparison of deep architectures in the training and validation datasets, respectively. According to Figure 8 and Figure 9, the architectures rank, from best to worst, as follows: NSBPSO-DCNN, I-CSA-DCNN, IG-DCNN, BWO-DCNN, ABC-DCNN, PSO-DCNN, and Standard DCNN. The results of the hybrid deep architectures in the test dataset show that the proposed architectures are well trained using meta-heuristic algorithms, because the accuracy, specificity, and sensitivity of the different hybrid deep architectures are highly consistent between the test and train datasets.
Table 4 shows the trends of the accuracy and runtime of the proposed architectures in different epochs. According to this table, the NSBPSO-DCNN architecture has achieved the highest accuracy in the shortest runtime. The accuracy of the NSBPSO-DCNN, I-CSA-DCNN, IG-DCNN, BWO-DCNN, ABC-DCNN, PSO-DCNN, and DCNN architectures is 99.41%, 98.52%, 98.09%, 97.43%, 96.74%, 96.50%, and 94.21%, respectively.
Figure 10 compares the total “Runtime” of the architectures. As can be seen, the runtime of NSBPSO-DCNN is lower than that of the other architectures. As mentioned in Section 2, to develop the proposed NSBPSO algorithm, employed bees and onlooker bees are used to improve the exploitation of the PSO algorithm. Multi-parent crossover is also proposed to improve the exploration of the algorithm. Hence, NSBPSO has provided the best results compared to the other algorithms.
Table 5 reports the mean square error (MSE) of the proposed architectures. The proposed NSBPSO-DCNN model has a lower MSE than the other methods. In the proposed NSBPSO, several particles are treated as employed bees (global bests), so different parts of the search space can be examined simultaneously, which helps the algorithm avoid being trapped in local minima. As a result, the proposed NSBPSO-DCNN model has been useful for intrusion detection in IoT systems.
Figure 11 and Figure 12 show the convergence curves of the NSBPSO-DCNN and the other architectures. The NSBPSO-DCNN architecture is close to its lowest MSE at epoch = 80, whereas the other architectures have not yet reached good accuracy at epoch = 80. As the number of epochs increases further, NSBPSO-DCNN shows high stability and a high convergence speed. As shown in Figure 12a, the proposed NSBPSO-DCNN architecture converges faster than the other architectures. The reason for NSBPSO’s superiority is the existence of two new operators: (a) the motion towards the best onlooker bee (from the neighborhood search operator) improves the algorithm’s exploitation, and (b) the motion towards the best employed bee (from the multi-parent crossover operator) improves the algorithm’s exploration.
Figure 12b shows the details of the convergence curves.
The nonparametric Wilcoxon test has been used to assess whether the differences between the models are significant. The Wilcoxon test compares two dependent (paired) samples measured on at least an ordinal scale. Derrac et al. [69] provided the full details of this nonparametric statistical test. Each architecture was run 25 times for intrusion detection in IoT systems. The mean values of the fitness function were normalized, and the Wilcoxon test results were then obtained using SPSS software.
Table 6 shows the R+, R−, and p-value for all NSBPSO-DCNN pairwise comparisons. As shown in Table 6, NSBPSO-DCNN shows an improvement over I-CSA-DCNN, IG-DCNN, and BWO-DCNN at a significance level of α = 0.05, and over ABC-DCNN, PSO-DCNN, and Standard DCNN at a significance level of α = 0.01. According to these results, NSBPSO-DCNN performs strongly compared to the other algorithms.
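To make the R+ and R− quantities concrete, the following sketch computes the two signed-rank sums for a pair of dependent samples. It is a simplified illustration and not the SPSS procedure used for Table 6: the function name `wilcoxon_rank_sums` is hypothetical, zero differences are simply dropped (an assumption; other conventions split their ranks between the two sums), tied absolute differences receive their average rank, and no p-value is computed.

```python
def wilcoxon_rank_sums(a, b):
    """Return (R_plus, R_minus) for paired samples a and b.

    R_plus sums the ranks of the pairs with a > b, R_minus those with a < b.
    Pairs with a == b are dropped (assumption); ties share an average rank.
    """
    diffs = [x - y for x, y in zip(a, b) if x != y]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        # Find the run of equal absolute differences starting at sorted index i.
        j = i
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    r_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    r_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return r_plus, r_minus
```

The two sums always add up to n(n + 1)/2 for the n nonzero differences, so a large imbalance between R+ and R− indicates that one of the two methods is consistently better across runs.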