Enhancing Internet of Things Network Security Using Hybrid CNN and XGBoost Model Tuned via Modified Reptile Search Algorithm

Salb, Mohamed; Jovanovic, Luka; Bacanin, Nebojsa; Antonijevic, Milos; Zivkovic, Miodrag; Budimirovic, Nebojsa; Abualigah, Laith

doi:10.3390/app132312687


by Mohamed Salb ¹,†, Luka Jovanovic ¹,†, Nebojsa Bacanin ¹,*,†, Milos Antonijevic ¹,†, Miodrag Zivkovic ¹,†, Nebojsa Budimirovic ¹,† and Laith Abualigah ²,³,⁴,⁵,⁶,⁷,⁸,†

¹ Department of Informatics and Computing, Singidunum University, Danijelova 32, 11000 Belgrade, Serbia
² Computer Science Department, Al al-Bayt University, Mafraq 25113, Jordan
³ Department of Electrical and Computer Engineering, Lebanese American University, Byblos 13-5053, Lebanon
⁴ Hourani Center for Applied Scientific Research, Al-Ahliyya Amman University, Amman 19328, Jordan
⁵ MEU Research Unit, Middle East University, Amman 11831, Jordan
⁶ Applied Science Research Center, Applied Science Private University, Amman 11931, Jordan
⁷ School of Computer Sciences, Universiti Sains Malaysia, Gelugor 11800, Malaysia
⁸ School of Engineering and Technology, Sunway University Malaysia, Petaling Jaya 27500, Malaysia
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.

Appl. Sci. 2023, 13(23), 12687; https://doi.org/10.3390/app132312687

Submission received: 25 October 2023 / Revised: 14 November 2023 / Accepted: 17 November 2023 / Published: 27 November 2023


Abstract:

This paper addresses the critical security challenges in the internet of things (IoT) landscape by implementing an innovative solution that combines convolutional neural networks (CNNs) for feature extraction and the XGBoost model for intrusion detection. By customizing the reptile search algorithm for hyperparameter optimization, the methodology provides a resilient defense against emerging threats in IoT security. By applying the introduced algorithm to hyperparameter optimization, better-performing models are constructed capable of efficiently handling intrusion detection. Two experiments are carried out to evaluate the introduced technique. The first experiment tackles detection through binary classification. The second experiment handles the task by specifically identifying the type of intrusion through multi-class classification. A publicly accessible real-world dataset has been utilized for experimentation and several contemporary algorithms have been subjected to a comparative analysis. The introduced algorithm constructed models with the best performance in both cases. The outcomes have been meticulously statistically evaluated and the best-performing model has been analyzed using Shapley additive explanations to determine feature importance for model decisions.

Keywords:

internet of things; feature reduction; convolutional neural networks; XGBoost; reptile search algorithm

1. Introduction

The internet of things (IoT) has ushered in a new era of connectivity, transforming various industries by enabling faster sensor and data access. This enhanced networking capability has been pivotal in facilitating real-time monitoring, which is indispensable for further process optimization across sectors. Real-time data acquisition has significantly improved healthcare by enabling timely patient monitoring and emergency notification systems [1]. In healthcare, IoT devices range from blood pressure and heart rate monitors to advanced devices capable of monitoring specialized implants. The internet of medical things (IoMT) has even led to the creation of automated systems used to analyze health statistics [2]. The industrial application of the IoT includes asset management, predictive maintenance, and manufacturing process control. Overall, the IoT’s integration into various domains is revolutionizing the way we live and work, providing more efficient, cost-effective, and intelligent solutions [3].

In the manufacturing sector, the IoT has streamlined operations, ensuring efficient production and quality control. The integration of IoT devices has revolutionized the way machines and systems communicate and interact with each other, forming the backbone of smart factories. Through IoT, manufacturers can monitor and control their production processes in real time, with sensors embedded in machines, equipment, and products collecting data on various parameters such as temperature, pressure, vibration, and performance metrics. This data can be transmitted and analyzed instantly, providing valuable insights into operations. IoT also helps manufacturers achieve greater visibility and transparency across the entire supply chain, optimizing inventory management, reducing stockouts, and improving order fulfillment. Technological advancements continue to benefit manufacturers by opening new revenue streams, improving industrial safety, and reducing operational costs, reshaping the way manufacturers operate in the digital revolution [4].

The widespread adoption of the IoT is not without challenges. Devices often grapple with limited battery lifetimes, the need to function in remote locations, and demanding transceiver operations. Among these challenges, security stands out as the most daunting. While a device running out of battery is an observable setback, a data breach, often clandestine, can wreak havoc. IoT devices must communicate with each other and central systems, requiring complex transceiver operations. This can lead to high energy consumption and potential interference with other devices, complicating the network [5]. Security is indeed one of the most critical challenges in the IoT ecosystem. The interconnected nature of these devices means that a breach in one device can potentially compromise an entire network. Issues such as weak authentication, lack of encryption, and insecure interfaces can lead to unauthorized access and data theft [6]. Radio frequency (RF) attacks have become a prevalent attack vector within the IoT ecosystem. Attackers can exploit vulnerabilities in wireless communication protocols to intercept, modify, or disrupt the RF signals between devices. This can lead to data leakage, device malfunction, or even taking control of the devices [7]. Message Queuing Telemetry Transport (MQTT) is one of the standard application layer protocols emerging in the IoT ecosystem [8]. As an emerging technology, it is a popular target for malicious actors seeking new vulnerabilities [9].

To counter these threats, several network security measures, such as blocklists and firewalls, have been implemented [10,11]. However, artificial intelligence (AI) algorithms aim to address security challenges without the constraints of predefined rules or continuous manual intervention [12]. A pivotal factor influencing AI's performance is the judicious selection of hyperparameters that steer the algorithm. With the burgeoning complexity of emerging algorithms, the conventional trial-and-error approach for hyperparameter tuning is becoming increasingly untenable. This optimization challenge can be equated to NP-hard problems, which are notoriously difficult to resolve using discrete methods [13]. A potential respite from this optimization conundrum lies in metaheuristic algorithms. While it is not always possible to pinpoint the absolute optimal solution, their iterative nature enhances the probability of identifying a near-optimal solution. Often, in practical scenarios, a "good enough" solution is more valuable than an elusive perfect one. Additionally, it is important to explore and address emerging challenges that accompany developments in the field.

In the scope of this research, a refined methodology designed to confront the security challenges inherent in the IoT is introduced. The introduced approach incorporates convolutional neural networks (CNNs) to effectively manage feature sizes within an IoT MQTT dataset and employs extreme gradient boosting (XGBoost) for intrusion identification and detection. A distinctive aspect of the introduced methodology lies in the integration of a modified version of a well-established algorithm, specifically tailored for hyperparameter optimization in the unique context of IoT security [14]. This integration represents a notable combination of concepts and techniques, embodying both typical combination novelty and incremental novelty.

The main scientific contributions of this work are:

Proposing a CNN-centric approach for feature reduction in IoT datasets;
Utilizing XGBoost for the classification of intrusion events;
Introducing a modified algorithm specifically designed for optimization;
Implementing the proposed methodology on real-world data, addressing a pressing real-world challenge.

This paper is organized to lead the reader through a logical development of concepts and advances. After this brief introduction, Section 2 provides an overview of the pertinent literature and of the underlying approaches. The proposed approach is described in Section 3, detailing the methodologies and algorithms that form the foundation of the contribution. The experimental design is described in Section 4, together with the datasets, metaheuristics, parameters, and metrics used in the study. The experimental findings are provided in Section 5, followed by their analysis. Section 6 concludes the paper by summarizing the major findings, considering the wider ramifications of the work, and proposing directions for future work.

2. Related Works

The history of intrusion detection systems (IDSs) and intrusion prevention systems (IPSs) can be traced back to an academic paper written in 1986 [15]. The Stanford Research Institute developed the Intrusion Detection Expert System (IDES) using statistical anomaly detection, signatures, and profiles to detect malicious network behaviors. In the early 2000s, IDSs became a security best practice, with few organizations adopting IPSs due to concerns about blocking harmless traffic. The focus was on detecting exploits rather than vulnerabilities. The latter part of 2005 saw the growth of IPS adoption, with vendors creating signatures for vulnerabilities rather than individual exploits [16]. The capacity of IPSs increased, allowing for more network monitoring.

Next-generation intrusion prevention systems (NGIPSs), which include capabilities like application and user control, were developed during this time period, marking a significant turning point. Sandboxing and emulation features were added to fulfill the requirement for defense against zero-day malware. By 2016, most businesses had deployed next-generation firewalls (NGFWs), which contain IDS/IPS functionality. High-fidelity machine learning is the current focus for tackling threat detection and file analysis [17].

The groundbreaking academic publication “An Intrusion-Detection Model” by Dorothy E. Denning, which inspired the creation of IDES, is one example of earlier studies that addresses intrusion detection in networks. To identify hostile network behaviors, the Stanford Research Institute used statistical anomaly detection, signatures, and profiles. Significant turning points in the development of IPS technology, such as the switch to NGIPSs and NGFWs, have been reached [18].


2.1. Convolutional Neural Networks

CNNs are a specialized subclass of artificial neural networks (ANNs) that are particularly well-suited for analyzing visual data. CNNs are designed to automatically and adaptively learn spatial hierarchies of features. This is particularly beneficial for tasks like image recognition, object detection, and even medical image analysis. The concept of residual learning, as introduced by Kaiming He et al., further enhances the capabilities of CNNs by allowing them to benefit from deeper architectures while mitigating the vanishing-gradient and degradation problems that hamper very deep networks [19].

As opposed to ANNs, CNNs employ local connectivity by linking each neuron to a localized region of the input space. This is in stark contrast to traditional ANNs, where each neuron is connected to all neurons in the preceding and following layers. Yann LeCun’s paper emphasizes that this local connectivity is crucial for the efficient recognition of localized features in images [20]. Furthermore, they use shared parameters across different regions of the input, which significantly reduces the number of trainable parameters. This is in contrast to traditional ANNs, where each weight is unique, leading to a much larger number of parameters and higher computational costs. CNNs are inherently designed to recognize the same feature regardless of its location in the input space. This is a crucial advantage over traditional ANNs, which lack this form of spatial invariance. Notably, they often employ deeper architectures, which are made computationally feasible through techniques like residual learning, as discussed in Kaiming He et al.’s paper.

The Inception architecture, introduced by Christian Szegedy et al., is another example of a deep yet computationally efficient network [21]. CNNs are designed to be computationally efficient, particularly when dealing with high-dimensional data. The architecture leverages local connectivity and parameter sharing to reduce computational requirements. The concept of residual learning, as discussed in the paper by Kaiming He et al., allows CNNs to be trained more efficiently, even when the network is very deep.

Notably, several unique architectural elements are associated with CNNs, and these include filters, kernels, and pooling layers. Filters and kernels use learnable weight matrices that are crucial for feature extraction. They slide or convolve across the input image to produce feature maps. Yann LeCun’s paper highlights the effectiveness of gradient-based learning techniques in training these filters [22]. Pooling layers serve to reduce the spatial dimensions of the input, thereby decreasing computational complexity and increasing the network’s tolerance to variations in the input. They are particularly useful in making the network robust to overfitting.

CNNs can be effectively combined with other types of neural networks, like recurrent neural networks (RNNs), for sequential data processing tasks such as video analysis and natural language processing. Additionally, CNNs can be integrated with traditional machine learning algorithms, like support vector machines (SVMs), for tasks like classification, thereby creating a hybrid model that leverages the strengths of both methodologies. In summary, CNNs offer a robust, adaptable, and computationally efficient approach to a wide range of machine learning tasks. Their unique architecture, as validated by seminal research papers, makes them highly effective for tasks involving spatial hierarchies and structured grid data.

2.2. Extreme Gradient Boosting

XGBoost is an optimized distributed gradient boosting approach designed to be highly efficient and flexible. It has gained immense popularity in machine learning competitions and is widely regarded as the “go-to” algorithm for structured data. XGBoost has been optimized for both computational speed and model performance, making it highly desirable for real-world applications [23]. There are several advantages of decision-tree-based techniques [24].

One of the most significant advantages of decision trees is their ease of interpretation. They can be visualized, and the decision-making process can be easily understood, even by non-experts. Decision trees are computationally inexpensive to build, evaluate, and interpret compared to algorithms like support vector machines (SVMs) [25] or ANNs. Unlike other algorithms that require extensive pre-processing, decision trees can handle missing values without imputation, making them more robust. Decision trees can also capture complex non-linear relationships in the data, which linear models may not capture effectively. Further, this approach can be used for both classification and regression tasks, making it very versatile.

Gini impurity is a metric used to quantify the disorder or impurity of a set of items. It is crucial for the “criterion” parameter in the decision tree algorithm. Lower Gini impurity values indicate more “pure” nodes. The Gini impurity is used to decide the optimal feature to split on at each node in the tree.

\mathrm{Gini}(t) = 1 - \sum_{i=1}^{C} p_{i}^{2}

(1)
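As an illustration of Equation (1), the following minimal sketch computes the Gini impurity of a node from its class labels, taking C as the number of classes present and p_i as the fraction of samples belonging to class i. The helper `split_impurity` and the example labels are hypothetical and only show how a generic decision tree might score a candidate split; this is not the split criterion of any particular library.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of a node: 1 - sum_i p_i^2 over the classes present."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def split_impurity(left, right):
    """Size-weighted Gini impurity of a candidate split (hypothetical helper)."""
    n = len(left) + len(right)
    return (len(left) * gini_impurity(left) + len(right) * gini_impurity(right)) / n


# A node holding a single class is pure; an even two-class mix is maximally impure.
pure = gini_impurity(["dos"] * 4)
mixed = gini_impurity(["dos", "dos", "legitimate", "legitimate"])
```

A split that separates the two classes perfectly drives the weighted impurity to zero, which is why lower values are preferred when choosing a feature to split on.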

A further advantage of XGBoost stems from ensemble learning [26]. Ensemble methods, particularly boosting algorithms like XGBoost, are less susceptible to overfitting than single estimators because each new model is fitted to the errors of those before it. By combining several models, ensemble methods can average out biases and reduce the variance, thus minimizing the risk of overfitting. Ensemble methods often achieve higher predictive accuracy than individual models. XGBoost, in particular, has been shown to outperform deep learning models on certain types of datasets, especially when the data are tabular.

The objective function optimized by XGBoost includes both a loss term and a regularization term, making it adaptable to different problems:

\mathrm{Obj}(\theta) = \sum_{i=1}^{n} l(y_{i}, \hat{y}_{i}) + \sum_{k=1}^{K} \Omega(f_{k})

(2)
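To make the two terms of Equation (2) concrete, the sketch below evaluates the objective for a toy model. Two choices are assumptions not stated in the text: the loss l is taken to be the binary log loss, and the regularization term uses the commonly cited form Ω(f) = γT + ½λΣw², where T is the number of leaves and w the leaf weights. Each tree is represented simply as a list of leaf weights.

```python
import math


def log_loss(y, p, eps=1e-12):
    """Binary log loss for one sample (assumed choice for l)."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def tree_penalty(leaf_weights, gamma=0.0, lam=1.0):
    """Omega(f) = gamma * T + 0.5 * lam * sum(w^2) (assumed form)."""
    return gamma * len(leaf_weights) + 0.5 * lam * sum(w * w for w in leaf_weights)


def objective(y_true, y_pred, trees, gamma=0.0, lam=1.0):
    """Training loss over all samples plus the penalty of every tree."""
    loss = sum(log_loss(y, p) for y, p in zip(y_true, y_pred))
    reg = sum(tree_penalty(t, gamma, lam) for t in trees)
    return loss + reg


# Two samples predicted at 0.5, one tree with two leaves of weight +1 and -1.
value = objective([1, 0], [0.5, 0.5], [[1.0, -1.0]], gamma=0.5, lam=1.0)
```

Raising `gamma` or `lam` in this sketch increases the penalty for additional leaves or large leaf weights, which is how the regularization term discourages overly complex trees.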

2.3. Metaheuristic Optimization

Metaheuristic optimization algorithms have gained significant attention in the field of computational intelligence for their ability to solve complex optimization problems that are often NP-hard. Traditional optimization algorithms, such as gradient-based methods, often get stuck in local optima and are not well suited for solving problems with large, complex search spaces. In contrast, metaheuristics offer several advantages [27].

Additionally, addressing the challenges of multi-objective optimization problems has been a focal point for many works, leading to the development of various multi-objective evolutionary algorithms [28]. However, a common hurdle in these algorithms is the delicate balance required between diversity and convergence. This balance critically impacts the quality of solutions derived from the algorithms [29].

Designed to explore the entire solution space, metaheuristics often find a near-optimal solution within reasonable time periods. They are problem-independent, meaning they can be applied to a wide range of optimization problems without requiring problem-specific modifications. Metaheuristics are highly scalable and can handle problems with a large number of variables and constraints. They are less sensitive to the initial conditions and can adapt to changes in the problem environment. Metaheuristics can find near-optimal solutions to NP-hard problems in polynomial time, which is a significant advantage over traditional methods that often fail to find feasible solutions within a reasonable time frame.

Metaheuristic algorithms often draw inspiration from various natural phenomena, social behaviors, and physical processes. Some notable examples include genetic algorithms (GAs) [30], inspired by the process of natural selection and genetics; particle swarm optimization (PSO) [31], based on the social behavior of birds flocking or fish schooling; the ant colony optimization (ACO) [32] algorithm, which mimics the foraging behavior of ants in finding the shortest path; and the firefly algorithm (FA) [33], which draws inspiration from courting rituals of fireflies. Additional recent examples include the salp swarm optimizer (SSA) [34], the whale optimization algorithm (WOA) [35], and the COLSHADE [36] optimization algorithm.

Metaheuristics are a popular approach among researchers for improving hyperparameter selection. Many examples exist in the literature, with some interesting examples originating from medical applications [37]. Further applications include time-series forecasting [38,39] and computer security [40,41,42]. Hybridization techniques have also shown great promise when applied to metaheuristic algorithms, often producing algorithms that demonstrate performance improvements on given tasks [43].

3. Methods

The Methods section serves as the backbone of this research, offering a comprehensive and rigorous examination of the algorithms under study. Specifically, this section delves into the original reptile search algorithm (RSA) [44] and a proposed modified version. The objective is to elucidate the mathematical foundations, operational mechanics, and strategies that underpin these algorithms. Moreover, a critical evaluation of their strengths, weaknesses, and potential for further development is presented. The section aims to provide the reader with a deep understanding of the algorithms, thereby setting the stage for the experimental results and discussions that follow.

3.1. Original RSA

The RSA, like many optimization metaheuristics, employs global as well as local search to effectively locate promising areas within the search space. This algorithm draws inspiration from nature, mathematically modeling the hunting strategies of crocodiles. As it is a gradient-free population-based method, it can effectively tackle complex challenges.

During the initialization process of the RSA, a population of agents is generated based on stochastic techniques. The population is then evaluated and the best solution is considered near-optimal. The population P can be represented as:

P = \begin{bmatrix} x_{1,1} & \dots & x_{1,n} \\ \vdots & \ddots & \vdots \\ x_{N,1} & \dots & x_{N,n} \end{bmatrix}

(3)

where N denotes the size of P, n the dimensionality of the given problem, and $x_{i,j}$ the j-th component of the i-th candidate solution. The initial population is generated in accordance with:

x_{i,j} = rand \cdot (B_{upper} - B_{lower}) + B_{lower}, \quad j = 1, 2, \dots, n

(4)

where $B_{lower}$ and $B_{upper}$ define the lower and upper bounds of the search space, and $rand$ is a random value.
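The initialization step can be sketched as follows, with each component drawn uniformly between its lower and upper bound (the span is taken as upper minus lower so that samples stay inside the search space). Using a separate bound per dimension and the function name `init_population` are assumptions made for illustration; the text only speaks of a lower and an upper bound.

```python
import random


def init_population(n_agents, lower, upper, rng=None):
    """Return n_agents candidate solutions sampled uniformly within the bounds.

    lower and upper are per-dimension bound lists (an assumption of this sketch).
    """
    rng = rng or random.Random()
    return [
        [lo + rng.random() * (hi - lo) for lo, hi in zip(lower, upper)]
        for _ in range(n_agents)
    ]


# Five agents in a three-dimensional search space, seeded for repeatability.
population = init_population(
    5, lower=[0.1, 1.0, 0.0], upper=[0.9, 10.0, 1.0], rng=random.Random(0)
)
```

Each row of the returned list corresponds to one row of the population matrix P in Equation (3).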

Once a population is established, the algorithm can proceed with optimization. The utilized strategy is highly dependent on the number of remaining iterations of the optimization. For the exploration mechanism, two behaviors are distinctly simulated. The first one simulates crocodile high walking. The second strategy simulates belly walking. These are mathematically modeled as:

x_{i,j} = \begin{cases} B_{j}(t) \cdot (-\eta_{i,j}(t)) \cdot \beta - R_{i,j}(t) \cdot rand, & t \leq \frac{T}{4} \\ B_{j}(t) \cdot x_{r_{1},j} \cdot ES(t) \cdot rand, & \frac{T}{4} < t \leq \frac{2T}{4} \end{cases}

(5)

where, in this context, $B_{j}(t)$ symbolizes the j-th component of the best candidate found so far. Randomness is introduced using the $rand$ value selected from $[0, 1]$. The iterations are tracked using t and T, which denote the current and maximum iterations, and sensitivity is defined by $\beta$. Specialized values $\eta$, R, and $ES$ are defined in accordance with the following:

\eta_{i,j} = B_{j}(t) - PD_{i,j}

(6)

R_{i,j} = \frac{B_{j}(t) - x_{r_{2},j}}{B_{j}(t) + \epsilon}

(7)

ES(t) = 2 \cdot r_{3} \cdot \left(1 - \frac{1}{T}\right)

(8)

where the parameter $\eta$ defines the hunting operator. The role of R is to reduce the search space, while $ES$ defines the evolutionary sense. Random values are denoted as $r_{2}$ and $r_{3}$, and $PD$ describes the percentage difference between the current and best solution. A small value $\epsilon$ is also added to avoid division by zero.
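A minimal sketch of the reduce term R from Equation (7) and the evolutionary sense ES from Equation (8) is given below. Two details are assumptions: $r_{3}$ is drawn uniformly from [-1, 1], which the text does not specify, and the constant `EPS` standing in for $\epsilon$ is an arbitrary small number. The function names are hypothetical.

```python
import random

EPS = 1e-10  # assumed small constant standing in for epsilon


def evolutionary_sense(T, rng):
    """ES = 2 * r3 * (1 - 1/T), with r3 assumed uniform in [-1, 1]."""
    r3 = rng.uniform(-1.0, 1.0)
    return 2.0 * r3 * (1.0 - 1.0 / T)


def reduce_term(best_j, x_r2_j):
    """R = (B_j - x_{r2,j}) / (B_j + eps) for one dimension j.

    best_j is the j-th component of the best solution and x_r2_j the same
    component of a randomly chosen agent.
    """
    return (best_j - x_r2_j) / (best_j + EPS)


rng = random.Random(1)
es = evolutionary_sense(100, rng)
r = reduce_term(2.0, 1.0)
```

Because `EPS` only appears in the denominator, it leaves R essentially unchanged unless the best component is close to zero, where it prevents a division error.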

Exploitation similarly employs two distinct hunting strategies, hunting cooperation and coordination. Which technique is utilized is highly dependent on the number of remaining iterations in the optimization.

x_{i,j} = \begin{cases} B_{j}(t) \cdot PD_{i,j}(t) \cdot rand, & \frac{2T}{4} < t \leq \frac{3T}{4} \\ B_{j}(t) \cdot \eta_{i,j} \cdot \epsilon - R_{i,j} \cdot rand, & \frac{3T}{4} < t \leq T \end{cases}

(9)

where the symbols carry the same meaning as in the exploration phase: $B_{j}(t)$ is the j-th component of the best candidate, $rand$ is a random value selected from $[0, 1]$, t and T denote the current and maximum iterations, and $\eta$, R, and $ES$ are as previously defined.
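The iteration budget is therefore divided into four equal quarters, each tied to one update rule. The sketch below only encodes that schedule; the returned labels are descriptive placeholders, and the two exploitation strategies are identified by their row in Equation (9) because the text does not state which of them is applied first.

```python
def rsa_phase(t, T):
    """Return which RSA update rule applies at iteration t out of T."""
    if t <= T / 4:
        return "exploration: high walking"
    if t <= 2 * T / 4:
        return "exploration: belly walking"
    if t <= 3 * T / 4:
        return "exploitation: first strategy of Equation (9)"
    return "exploitation: second strategy of Equation (9)"


# Schedule for a run of 8 iterations, indexed from 1.
schedule = [rsa_phase(t, 8) for t in range(1, 9)]
```

With T = 8, iterations 1 and 2 use high walking, 3 and 4 belly walking, and the remaining four are split evenly between the two exploitation rules.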

3.2. Modified RSA

While the original RSA demonstrates admirable performance, certain deficiencies can be observed in evaluations on standard CEC [45] benchmark functions. As a relatively novel algorithm, the RSA has significant potential for improvement through hybridization. This work attempts to address some of the observed issues associated with the original algorithm through hybridization. Mechanisms inspired by the genetic algorithm (GA) [30] are introduced to formulate the genetically inspired RSA (GIRSA).

The introduced mechanism is activated following each iteration. An arbitrary agent is selected and combined with the best attained solution through uniform crossover, producing a new combined solution. The crossover is governed by the control parameter $pc$. Empirically, the optimal value for this parameter has been determined to be $pc = 0.1$.

An additional modification incorporates parameter mutation. Once a mutation is triggered, an arbitrary value is selected from within the constraints of a given parameter. One-half of the selected value is then added to or subtracted from said parameter. Whether addition or subtraction is used is determined by the mutation direction parameter $md$. Once again, the value is empirically determined as $md = 0.1$.

Once a new solution is generated, the worst-performing solution is removed from the population and replaced by the new agent. The evaluation of the new solution is not conducted until the subsequent iteration of the algorithm. Taking this approach maintains the computational complexity of the original algorithm. Finally, the pseudocode for the described algorithm is presented in Algorithm 1.

Algorithm 1 Pseudocode of the introduced GIRSA.

Formulate a population P
while t < T do
    Evaluate P based on objective function
    Update $ES$ value
    for Solution X in P do
        Update agent locations applying RSA search
        Generate new solution $NS$ utilizing genetically inspired mechanism
        Mutate $NS$ parameters
        Replace worst solution with $NS$
    end for
end while
return Best attained solution in P

It should be emphasized that once a new agent is generated and mutated, it is not evaluated until the next iteration of the optimization. This way the complexity of the metaheuristic remains consistent with the original version of the algorithm.
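One possible reading of the genetically inspired step is sketched below; it is not the authors' implementation. Several details are assumptions: $pc$ is interpreted as the per-parameter probability of inheriting from the randomly chosen agent rather than from the best one, $md$ as the probability that the mutation adds rather than subtracts, the hypothetical parameter `pm` controls how often a parameter is mutated (the text gives no trigger rate), mutated values are clipped to the bounds, and fitness is maximized so the worst agent has the lowest score.

```python
import random


def girsa_step(population, fitness, lower, upper, pc=0.1, md=0.1, pm=0.1, rng=None):
    """Create one new agent by crossover and mutation and replace the worst agent.

    Returns the index of the replaced agent so the caller can mark it as
    unevaluated until the next iteration.
    """
    rng = rng or random.Random()
    indices = range(len(population))
    best = population[max(indices, key=fitness.__getitem__)]
    worst_idx = min(indices, key=fitness.__getitem__)
    donor = rng.choice(population)

    # Uniform crossover between the best solution and the random agent.
    child = [d if rng.random() < pc else b for b, d in zip(best, donor)]

    # Mutation: shift by half of a value sampled from the parameter's range.
    for j, (lo, hi) in enumerate(zip(lower, upper)):
        if rng.random() < pm:
            step = 0.5 * rng.uniform(lo, hi)
            child[j] += step if rng.random() < md else -step
            child[j] = min(max(child[j], lo), hi)  # keep within bounds (assumed)

    population[worst_idx] = child
    return worst_idx


pop = [[0.2, 2.0], [0.5, 5.0], [0.8, 8.0]]
fit = [0.5, 0.9, 0.2]
replaced = girsa_step(pop, fit, lower=[0.1, 1.0], upper=[0.9, 10.0], rng=random.Random(42))
```

Since the new agent simply takes the slot of the worst one and is scored during the next regular evaluation pass, the step adds no extra objective function calls per iteration.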

4. Experimental Setup

To evaluate the potential of the introduced approach for both detecting and identifying intrusions within IoT networks, a public real-world IoT MQTT dataset [46] (https://www.kaggle.com/datasets/cnrieiit/mqttset accessed on 24 October 2023) is utilized. However, due to the extensive computational demands of model optimization, the reduced version of this dataset is used. The dataset comprises a total of 6 classes: legitimate traffic, SlowITe, Bruteforce, Malformed data, Flooding, and DoS attacks. A total of 34 features are present in the dataset. Further details can be found in the original work that introduced the dataset [46]. An additional dataset is derived from the existing dataset for the needs of threat detection, with classes separated into legitimate data and anomalous activities.
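The derivation of the binary dataset can be sketched as a simple relabeling in which every attack class collapses into a single anomalous class. The label strings used here are assumptions for illustration and may not match the exact values stored in the dataset files.

```python
def to_binary(labels, legitimate="legitimate"):
    """Map multi-class labels to 0 (legitimate) or 1 (anomalous)."""
    return [0 if label == legitimate else 1 for label in labels]


multi_class = ["legitimate", "dos", "slowite", "legitimate", "bruteforce"]
binary = to_binary(multi_class)
```

The feature columns stay untouched; only the target column changes between the two experiments.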

Due to the large number of features present in the dataset, which can result in important features blending into noise, a CNN is utilized for reduction. The reduction network consists of a convolution layer with a total of 128 neurons and a kernel size of 2, and utilizes a rectified linear unit (ReLU) activation function. This is followed by an output layer of 6 fully connected neurons. The network is trained for 10 epochs to an acceptable level of accuracy (82.7%). Following training, the network output is used as an input for the XGBoost algorithm to further improve accuracy. A graphical representation of the introduced framework for feature reduction and classification can be seen in Figure 1.
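To show how 34 input features shrink to 6 values, the sketch below runs a forward pass through a network of the described shape using untrained random weights and only the Python standard library. It is a shape-level illustration, not the trained model: treating the 128 neurons as 128 convolution filters, flattening before the dense layer, and applying a softmax to the output are all assumptions.

```python
import math
import random

N_FEATURES, N_FILTERS, KERNEL, N_OUT = 34, 128, 2, 6


def make_weights(rng):
    """Random, untrained weights for one conv layer and one dense layer."""
    conv_w = [[rng.gauss(0, 0.1) for _ in range(KERNEL)] for _ in range(N_FILTERS)]
    conv_b = [0.0] * N_FILTERS
    flat = (N_FEATURES - KERNEL + 1) * N_FILTERS
    dense_w = [[rng.gauss(0, 0.01) for _ in range(flat)] for _ in range(N_OUT)]
    dense_b = [0.0] * N_OUT
    return conv_w, conv_b, dense_w, dense_b


def forward(x, weights):
    """Conv1D (valid padding) + ReLU, flatten, dense layer, softmax."""
    conv_w, conv_b, dense_w, dense_b = weights
    hidden = [
        max(0.0, sum(w * x[i + k] for k, w in enumerate(conv_w[f])) + conv_b[f])
        for i in range(len(x) - KERNEL + 1)
        for f in range(N_FILTERS)
    ]
    logits = [sum(w * h for w, h in zip(row, hidden)) + b
              for row, b in zip(dense_w, dense_b)]
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
weights = make_weights(rng)
sample = [rng.random() for _ in range(N_FEATURES)]
reduced = forward(sample, weights)  # six values that would feed XGBoost
```

In the described framework, the six outputs produced for every record form the intermediate dataset on which the XGBoost classifier is then trained.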

Once the features have been reduced and an intermediate dataset formulated, XGBoost hyperparameters are subjected to optimization with the use of metaheuristic algorithms. XGBoost parameters chosen for optimization and their associated constraints include learning rate $[0.1, 0.9]$, minimum child weight $[1, 10]$, subsample $[0, 1]$, colsample $[1, 10]$, max depth $[1, 10]$, and gamma $[0.0, 0.8]$. These parameters were chosen due to their high influence on XGBoost model performance, and their respective constraints have been empirically determined.
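A metaheuristic agent is a vector of real numbers, so it has to be translated into a hyperparameter set before a model can be built. The sketch below shows one way to do this under stated assumptions: the dictionary keys are illustrative names rather than guaranteed XGBoost argument names, the ranges are copied from the text as printed (including the colsample range), out-of-range values are clipped, and max depth is rounded to an integer.

```python
# Search ranges as (lower, upper), copied from the text as printed.
BOUNDS = {
    "learning_rate": (0.1, 0.9),
    "min_child_weight": (1.0, 10.0),
    "subsample": (0.0, 1.0),
    "colsample": (1.0, 10.0),
    "max_depth": (1.0, 10.0),
    "gamma": (0.0, 0.8),
}
INTEGER_PARAMS = {"max_depth"}  # assumed to require integer values


def decode(agent):
    """Clip each component of an agent to its range and name it."""
    params = {}
    for value, (name, (lo, hi)) in zip(agent, BOUNDS.items()):
        clipped = min(max(value, lo), hi)
        params[name] = int(round(clipped)) if name in INTEGER_PARAMS else clipped
    return params


params = decode([2.0, 5.5, 0.5, 3.0, 6.6, -1.0])
```

The objective function of the optimizer would then train a model with the decoded values and return its accuracy, which is the quantity the metaheuristics maximize in this study.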

Several contemporary optimization metaheuristics have been included in a comparative analysis in order to determine an optimal approach. These include the original RSA [44] and GA [30] as the source of inspiration for the introduced GIRSA. Alongside these, established well-performing optimizers are also included, initialized with the parameters suggested in the works that originally proposed them. These are the SSA [34], FA [33], PSO [31], WOA [35], and the COLSHADE [36] algorithm.

To provide a comprehensive assessment of the constructed models in comparison to those constructed by other contemporary optimizers, a battery of standard classification metrics, including accuracy, precision, recall, and f1-score [47], is utilized, with accuracy being the objective function chosen to guide the optimization. Further metrics include Cohen's kappa [48], described in Equation (10), which gives a more complete assessment in cases when unbalanced datasets are utilized.

κ = \frac{p_{o} - p_{e}}{1 - p_{e}}

(10)

in which $p_{o}$ and $p_{e}$ represent the observed agreement and the agreement expected by chance, respectively.
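Equation (10) can be evaluated directly from a confusion matrix, as in the following minimal sketch. Here $p_{o}$ is the fraction of samples on the diagonal and $p_{e}$ is obtained from the row and column totals; the function name and the example matrix are illustrative only.

```python
def cohens_kappa(confusion):
    """Cohen's kappa from a square confusion matrix (rows: true, cols: predicted)."""
    k = len(confusion)
    n = sum(sum(row) for row in confusion)
    p_o = sum(confusion[i][i] for i in range(k)) / n
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    p_e = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    return (p_o - p_e) / (1.0 - p_e)


# 100 samples, 85 classified correctly, chance agreement of 0.5.
kappa = cohens_kappa([[45, 5], [10, 40]])
```

In this example the accuracy is 0.85, yet half of that agreement would be expected by chance, so the kappa value is noticeably lower than the raw accuracy.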

It is important to note that the exact execution times of experimentation can vary depending on the specific hardware the experiments are carried out on. This work utilized a PC with an Intel i7 CPU, an Nvidia 4070 GPU, and 32 GB of available RAM. Simulations are carried out using Python 3 with the supporting TensorFlow, Pandas, and NumPy libraries. Visualizations are handled using matplotlib and seaborn. Metaheuristics are independently implemented for the needs of this research.

5. Experimental Outcomes

The following section describes the outcomes of two independent experiments. The first experiment involves optimizing XGBoost models for anomalous traffic detection. The second experiment involves the exact classification of the type of malicious activity. Following the presentation of the results, the outcomes are statistically validated. Finally, the best-performing model is analyzed to determine feature importance, providing an advantageous starting point for future feature reduction research.

5.1. Binary Classification Experiment

Binary classification outcomes based on the objective function evaluations over 30 independent runs are shown in terms of best, worst, mean, and median values in Table 1. Alongside these outcomes, the standard deviation and variance are provided in order to assess algorithm stability.

Additional evaluations using the indicator function outcomes over 30 runs are provided in Table 2 with relative stability indicators.

An observation can already be made from the presented results. Models struggle with binary classification problems. Despite the poor performance shown by all models, the introduced algorithm attained the best outcome in comparison to other optimization algorithms even when limited performance is observed. Algorithm stability can be observed in Figure 2.

An obvious advantage of the introduced algorithm is the overall better performance in comparison to other algorithms. However, a high rate of stability of the original RSA algorithm and FA needs to be noted despite inferior results in both indicator and objective evaluations. Convergence rates for each metaheuristic are tracked and plots can be observed in Figure 3.

Improvements to the convergence rate can be noted, with the introduced algorithm no longer dwelling in less promising areas and displaying better exploration in comparison to the original RSA, as well as competing metaheuristics. A comparison of all constructed models in detail is presented in Table 3.

As can be observed from the attained outcomes, implemented models struggle with the challenging task of handling intrusion detection through binary classification within an MQTT IoT system. However, the introduced optimizer demonstrated the best outcomes when constructing an XGBoost binary classification model with a clear potential for hyperparameter optimization. Differentiating legitimate and malicious data can be challenging with a reduced feature space and number of classes, especially with observed minority classes in the dataset.
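How close the binary models are to chance can be made concrete with a short calculation. The sketch below rebuilds an approximate confusion matrix for CNN-XG-GIRSA from the recall and support values in Table 3 (the counts are reconstructed by rounding and are therefore an assumption) and computes accuracy and Cohen's kappa. The resulting values appear consistent with the best objective and indicator outcomes in Tables 1 and 2, assuming that the objective is the classification error and the indicator is Cohen's kappa.

```python
def cohen_kappa(confusion):
    """Cohen's kappa for a square confusion matrix (rows: true, cols: predicted)."""
    n = sum(sum(row) for row in confusion)
    k = len(confusion)
    observed = sum(confusion[i][i] for i in range(k)) / n
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected by chance from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    return (observed - expected) / (1.0 - expected)


# Counts reconstructed by rounding recall x support from Table 3 (assumption).
benign_support, malicious_support = 49589, 49701
benign_hits = round(0.617113 * benign_support)
malicious_hits = round(0.389107 * malicious_support)
confusion = [
    [benign_hits, benign_support - benign_hits],
    [malicious_support - malicious_hits, malicious_hits],
]

accuracy = (benign_hits + malicious_hits) / (benign_support + malicious_support)
kappa = cohen_kappa(confusion)
print(f"accuracy={accuracy:.6f} error={1 - accuracy:.6f} kappa={kappa:.6f}")
```

A kappa this close to zero indicates agreement only marginally better than random assignment, which matches the description of the binary task as challenging.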

To enforce experiment repeatability, the hyperparameters selected for the respective best-performing model are provided in Table 4.
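For readers who want to reuse the reported configuration, the CNN-XG-GIRSA row of Table 4 could be written as a parameter dictionary along the following lines. The key names follow the XGBoost parameter naming; mapping the "Colsample" column to `colsample_bytree` is an assumption, and the helper `check_ranges` is a hypothetical convenience, not part of any library.

```python
# Best binary-classification configuration reported for CNN-XG-GIRSA in Table 4.
# Mapping the "Colsample" column to colsample_bytree is an assumption.
girsa_binary_params = {
    "learning_rate": 0.546941,
    "min_child_weight": 1,
    "subsample": 1.0,
    "colsample_bytree": 0.179374,
    "max_depth": 10,
    "gamma": 0.180448,
}


def check_ranges(params):
    """Basic sanity checks on the value ranges these parameters accept."""
    return (
        0.0 < params["learning_rate"] <= 1.0
        and 0.0 < params["subsample"] <= 1.0
        and 0.0 < params["colsample_bytree"] <= 1.0
        and params["max_depth"] >= 1
        and params["min_child_weight"] >= 0
        and params["gamma"] >= 0.0
    )


print(check_ranges(girsa_binary_params))
# With the xgboost package installed, the dictionary could be passed as
# xgboost.XGBClassifier(**girsa_binary_params).
```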

5.2. Multi-Class Classification Experiment

Experimentation with multi-class classification is carried out under identical test conditions as the binary classification experiment. However, a total of six classes are present in the dataset. Objective function outcomes are shown in Table 5. Indicator function outcomes are demonstrated in Table 6.

The outcomes show a clear superiority of the introduced algorithm, with the GIRSA attaining the best outcomes in both indicator and objective function metrics. It is also important to note the admirable stability of the WOA, despite this technique not demonstrating the optimal outcomes. Algorithm stability comparisons can also be observed in Figure 4, where all tested metaheuristics are graphically compared in terms of objective and indicator functions.

A significant improvement in stability over the original RSA can be observed for the introduced GIRSA. Further improvements in algorithm convergence can be observed in Figure 5.

As can be observed in the convergence graphs, the introduced algorithm once again demonstrates an improvement in the exploration of the search space, locating a better solution and avoiding local solutions in favor of a global solution demonstrating better outcomes.

Detailed comparisons of the best-performing models generated by each metaheuristic are shown in Table 7.

Various models show differing quality of performance facing different classification challenges. This is to be expected as per the NFL theorem. Notable outcomes are shown by the GIRSA algorithm, which has demonstrated a clear dominance in terms of optimal outcomes. However, notable results are also shown by the FA and the SSA algorithm. The confusion matrix of the best performing model for multi-class classification can be observed in Figure 6.

As can be observed, the algorithm struggles to identify slowITe and flood attacks, often confusing slowITe attacks for legitimate data and flood attacks for denial-of-service attacks. However, flood and denial-of-service attacks are fairly similar in practice, so detection is still within acceptable margins. Additionally, malformed data is often misclassified as DOS attacks. It can be deduced that the introduced approach performs significantly better when tackling the problem of anomalous traffic detection as a multi-class classification challenge, rather than a simple binary challenge. This is likely due to the confusion between flood and slowITe attacks with legitimate data. Nevertheless, the introduced method shows great potential for real-world implementation. Furthermore, the introduced optimization metaheuristic demonstrates improvements over existing techniques as well as the original base algorithm.
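The gap between the macro and weighted averages in Table 7 follows directly from class imbalance, which the short calculation below illustrates using the CNN-XG-GIRSA recall and support values from that table. The variable names are illustrative; only the numbers are taken from the table.

```python
# Per-class recall and support for CNN-XG-GIRSA, copied from Table 7.
classes = ["Benign", "DOS", "Malformed", "Bruteforce", "SlowITe", "Flood"]
recall = [0.950442, 0.853392, 0.154667, 0.638934, 0.551250, 0.163043]
support = [49639, 39077, 3278, 4351, 2761, 184]

# Macro average: every class counts equally, regardless of size.
macro_recall = sum(recall) / len(recall)

# Weighted average: each class is weighted by its number of samples,
# which for recall coincides with overall accuracy.
weighted_recall = sum(r * s for r, s in zip(recall, support)) / sum(support)

# The two classes with the lowest recall.
weakest = [name for _, name in sorted(zip(recall, classes))[:2]]

print(f"macro={macro_recall:.6f} weighted={weighted_recall:.6f} weakest={weakest}")
```

Because the Benign and DOS classes account for most of the samples, the weighted recall stays high even though several minority classes are recalled far less reliably, which is why the macro average is the more informative figure for the rarer attack types.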

To encourage experimental repeatability, hyperparameter selections made by each algorithm are shown in Table 8.

5.3. Outcome Statistical Validation

Modern optimization research demands optimization results be meticulously statistically validated in order to establish the statistical significance of the demonstrated improvements. The preferred approach for validating outcomes is the use of parametric tests. However, the safe utilization of these tests needs to be established first. Three criteria need to be met: independence, normality, and homoscedasticity [49]. The first condition is fulfilled by utilizing an independent random seed for each execution. The normality condition is assessed using the outcomes of the Shapiro–Wilk test shown in Table 9 as well as through the visual observations of objective function outcome distributions shown in Figure 7.

The resultant p-values were all less than 0.05, suggesting that the null hypothesis (H0) may be rejected. As a result, we may infer that the outcomes produced in all three simulations do not follow a normal distribution. These outcomes are further reinforced in Figure 7.
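A normality check of this kind can be sketched with `scipy.stats.shapiro`. The snippet is an illustration under assumptions: SciPy is not listed among the libraries used in this work, the helper name `check_normality` is hypothetical, and the sample values are synthetic and deliberately bimodal rather than taken from the experiments.

```python
from scipy import stats


def check_normality(samples, alpha=0.05):
    """Run the Shapiro-Wilk test; return the p-value and whether H0 is retained."""
    statistic, p_value = stats.shapiro(samples)
    # H0 (the data are normally distributed) is rejected when p < alpha.
    return float(p_value), bool(p_value >= alpha)


# Synthetic, deliberately bimodal scores from 30 runs (not the paper's data).
samples = [0.4970] * 15 + [0.4976] * 15
p_value, looks_normal = check_normality(samples)
print(f"p={p_value:.3g} normality retained: {looks_normal}")
```

When normality is rejected for any of the compared series, a non-parametric test is the safer choice for the subsequent comparison.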

Parametric tests were not applicable since the normality assumption was not fulfilled. As a result, the non-parametric Wilcoxon signed-rank [50] test was used in the following stage. This test can be applied to the same data series consisting of the best values achieved in every run of each metaheuristic.

The introduced algorithm is used as the control algorithm in this test, and the Wilcoxon signed-rank test was run on the specified data series. The calculated p-values in all three observed cases were less than 0.05. Considering the significance level of $\alpha = 0.1$, these findings show that the introduced algorithm outperformed all competing approaches statistically significantly. Table 10 shows the overall results of the Wilcoxon signed-rank test.
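A pairwise comparison against the control algorithm could look roughly like the following, using `scipy.stats.wilcoxon`. As before, this is a sketch under assumptions: SciPy is not named in the paper, `compare_to_control` is a hypothetical helper, the paired values are synthetic, and pairing runs by index is assumed.

```python
from scipy import stats


def compare_to_control(control, competitor, alpha=0.05):
    """Wilcoxon signed-rank test on paired per-run results."""
    statistic, p_value = stats.wilcoxon(control, competitor)
    return float(p_value), bool(p_value < alpha)


# Synthetic paired objective values for 30 runs (not the paper's data).
# The competitor is constructed to be consistently worse (higher error).
control = [0.1402 + 1e-5 * i for i in range(30)]
competitor = [c + 1e-4 * (i + 1) for i, c in enumerate(control)]

p_value, significant = compare_to_control(control, competitor)
print(f"p={p_value:.3g} significant: {significant}")
```

In a full comparison, the same call would be repeated once per competing metaheuristic, which yields one p-value per column as in Table 10.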

5.4. Best Model SHAP Interpretation

The attained best-performing model has been subjected to analysis through the use of Shapley additive explanations (SHAP) [51] to determine feature impacts on model decisions. The feature reduction CNN models as well as the best constructed XGBoost model have been interpreted using the kernel and tree explainer techniques. SHAP utilizes a game-theory-based approach to determine the impact each feature has on model decisions. The analysis outcomes are graphically presented in Figure 8 for the CNN model and in Figure 9 for the XGBoost model.
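To illustrate the game-theoretic idea behind SHAP, the sketch below computes exact Shapley values for a tiny toy model by enumerating every feature coalition. This is not the kernel or tree explainer used in this work, which rely on far more efficient estimation; the function `shapley_values`, the toy model, and the all-zero baseline are assumptions made purely for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, instance, baseline):
    """Exact Shapley values by enumerating every feature coalition."""
    n = len(instance)
    features = range(n)

    def value(coalition):
        # Features in the coalition take the instance value, others the baseline.
        mixed = [instance[i] if i in coalition else baseline[i] for i in features]
        return model(mixed)

    phi = []
    for i in features:
        others = [j for j in features if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_i = value(set(subset) | {i})
                without_i = value(set(subset))
                total += weight * (with_i - without_i)
        phi.append(total)
    return phi


def toy_model(x):
    """Hypothetical model with one interaction term (not the paper's model)."""
    return 2.0 * x[0] + 3.0 * x[1] + x[0] * x[2]


instance = [1.0, 1.0, 2.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, instance, baseline)
print(phi)
```

The attributions sum to the difference between the model output for the instance and for the baseline, and the contribution of the interaction term is split evenly between the two features involved. Exhaustive enumeration grows exponentially with the number of features, which is why practical explainers rely on approximations or model-specific algorithms.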

From the feature importance analysis of the CNN feature reduction model, it can be deduced that tcp.time_delta has the highest influence on model decisions, closely followed by mqtt.conack.flags.reserved. The third-highest importance is shown by the mqtt.msgtype feature. The remaining features that have a significant influence are mqtt.kalive and mqtt.retain. Following these features, a significant reduction in importance can be observed. Accordingly, a set of six features has been maintained as outputs for the CNN and inputs for the XGBoost model. The importance of these synthetic features is shown in Figure 9.

Since the synthetic features have no direct interpretation of the real-world dataset, they are simply assigned a number. Synthetic feature 3 has the highest impact on the model decision, followed by features 1 and 5. Finally, a small impact can be observed in feature 0. Features 4 and 2 do not seem to notably influence model decisions, suggesting that further feature reduction may be performed while maintaining computational complexity.

6. Conclusions and Future Work

This work presents an approach for tackling the increasingly pressing challenge of security in IoT systems relying on the MQTT server. With the rising popularity of IoT networks, this challenge needs to be addressed in an effective and adaptive way. The potential of an approach based on AI is explored for anomalous activity detection as well as attack-type detection. Due to the large feature space of MQTT transactions, a technique based on CNNs is applied to reduce the feature space and prevent important features from fading into feature noise. To improve detection, the outputs of the CNN reduction mechanism are combined with XGBoost for classification. However, due to the considerable reliance of classification performance on hyperparameter selections, metaheuristic algorithms are used to improve model performance through hyperparameter tuning. Additionally, a modified version of the relatively recent RSA is proposed to overcome some of the limitations of the original approach. The proposed algorithm draws inspiration from the GA and is therefore dubbed the GIRSA.

The introduced approach is assessed on a real-world dataset. The feature space is reduced using a CNN. This reduced feature space is used by the XGBoost algorithm to classify MQTT traffic. Two experiments are carried out, one handling simple anomaly detection, and the other identifying the specific type of attack. The introduced approach demonstrates potential. While struggling with simple binary classification, the approach demonstrates decent potential in multi-class classification, with an accuracy rate of 87.94% for the best-performing model. The observed improvements have been meticulously statistically validated, and the best-performing model has been subjected to SHAP analysis in order to determine feature importance.

The utilization of data-driven methods for detection and identification offers several advantages. Data-driven methods are capable of adapting to emerging threats without the need for explicit programming and administrator interaction. Furthermore, reduced maintenance can often offset the costs associated with initial development and integration.

Reflecting on the validity of the proposed approach, it is essential to consider both internal and external validity aspects. Internally, the utilization of a CNN for feature space reduction and the subsequent integration with XGBoost for classification raises questions about the potential impact of hyperparameter choices on the model’s performance. We addressed this concern through the application of metaheuristic algorithms for hyperparameter tuning, enhancing the robustness and reliability of our classification results. Externally, the generalizability of our findings is a crucial consideration. While our experiments were conducted on a real-world dataset, the specific characteristics of the dataset and the nature of MQTT transactions may limit the broader applicability of our approach to diverse IoT environments. Future work should include a broader range of datasets, ensuring that the effectiveness of our algorithm extends to various IoT scenarios.

As with any study, some limitations exist with this work as well. Only a limited set of optimization algorithms is explored. Due to limited computational resources, smaller populations are used for optimizations and only a limited number of optimization iterations is conducted. In future works, we hope to expand population sizes and periods of optimization to attain a better understanding of the full capabilities of each algorithm.

Future work will center on enhancing the proposed methodology by incorporating hybridization techniques and assessing alternative machine learning methodologies, including the exploration of deep CNNs and emerging versions of recurrent networks. Furthermore, there will be an investigation into the applicability of the introduced metaheuristic in addressing diverse and critical optimization challenges across various domains of research, including medicine, computer security, and waste management.

Author Contributions

Conceptualization, M.S., N.B. (Nebojsa Bacanin), and L.J.; methodology, N.B. (Nebojsa Budimirovic), N.B. (Nebojsa Bacanin), and L.A.; software, N.B. (Nebojsa Bacanin) and M.Z.; validation, M.Z. and N.B. (Nebojsa Bacanin); formal analysis, M.Z.; investigation, N.B. (Nebojsa Budimirovic), N.B. (Nebojsa Bacanin), M.Z., and M.S.; resources, N.B. (Nebojsa Budimirovic), M.Z., and M.A.; data curation, M.Z., N.B. (Nebojsa Budimirovic), and N.B. (Nebojsa Bacanin); writing—original draft preparation, M.Z., M.S., N.B. (Nebojsa Budimirovic), and M.A.; writing—review and editing, M.A., M.Z., and N.B. (Nebojsa Budimirovic); visualization, N.B. (Nebojsa Bacanin), M.Z., and M.A.; supervision, N.B. (Nebojsa Budimirovic) and L.A.; project administration, M.Z. and L.A.; funding acquisition, L.A. and N.B. (Nebojsa Bacanin). All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Complete data sources are publicly available at https://www.kaggle.com/datasets/cnrieiit/mqttset (accessed on 1 June 2023).

Conflicts of Interest

The authors declare no conflict of interest.

References

Ghubaish, A.; Salman, T.; Zolanvari, M.; Unal, D.; Al-Ali, A.; Jain, R. Recent Advances in the Internet of Medical Things (IoMT) Systems Security. IEEE Internet Things J. 2021, 8, 8707–8718.
Gupta, D.; Kayode, O.; Bhatt, S.; Gupta, M.; Tosun, A.S. Hierarchical Federated Learning based Anomaly Detection using Digital Twins for Smart Healthcare. In Proceedings of the 2021 IEEE 7th International Conference on Collaboration and Internet Computing (CIC), Atlanta, GA, USA, 13–15 December 2021.
Turcu, C.; Turcu, C. Improving the quality of healthcare through Internet of Things. In Proceedings of the ICT Management for Global Competitiveness and Economic Growth in Emerging Economies (ICTM), Wroclaw, Poland, 21–23 October 2019.
Valtanen, K.; Backman, J.; Yrjölä, S. Blockchain-Powered Value Creation in the 5G and Smart Grid Use Cases. IEEE Access 2019, 7, 25690–25707.
Okuhara, H.; Elnaqib, A.; Dazzi, M.; Palestri, P.; Benatti, S.; Benini, L.; Rossi, D. A Fully-Integrated 5mW, 0.8Gbps Energy-Efficient Chip-to-Chip Data Link for Ultra-Low-Power IoT End-Nodes in 65-nm CMOS. arXiv 2021, arXiv:2109.01961.
Luo, Z.; Wang, W.; Qu, J.; Jiang, T.; Zhang, Q. ShieldScatter: Improving IoT Security with Backscatter Assistance. arXiv 2018, arXiv:1810.07058.
Gupta, P.; Dedeoglu, V.; Najeebullah, K.; Kanhere, S.S.; Jurdak, R. Energy-aware Demand Selection and Allocation for Real-time IoT Data Trading. arXiv 2020, arXiv:2002.02074.
Azzedin, F.; Alhazmi, T. Secure data distribution architecture in IoT using MQTT. Appl. Sci. 2023, 13, 2515.
Hintaw, A.J.; Manickam, S.; Aboalmaaly, M.F.; Karuppayah, S. MQTT vulnerabilities, attack vectors and solutions in the internet of things (IoT). IETE J. Res. 2023, 69, 3368–3397.
Kodys, M.; Lu, Z.; Fok, K.W.; Thing, V.L.L. Intrusion Detection in Internet of Things using Convolutional Neural Networks. In Proceedings of the 2021 18th International Conference on Privacy, Security and Trust (PST), Auckland, New Zealand, 13–15 December 2021; pp. 1–10.
Ayumi, V.; Rere, L.M.R.; Fanany, M.I.; Arymurthy, A.M. Optimization of Convolutional Neural Network using Microcanonical Annealing Algorithm. In Proceedings of the 2016 International Conference on Advanced Computer Science and Information Systems (ICACSIS), Malang, Indonesia, 15–16 October 2016.
Li, J.; Zhao, Z.; Li, R.; Zhang, H. AI-based Two-Stage Intrusion Detection for Software Defined IoT Networks. arXiv 2018, arXiv:1806.02566.
Xiao, X.; Yan, M.; Basodi, S.; Ji, C.; Pan, Y. Efficient Hyperparameter Optimization in Deep Learning Using a Variable Length Genetic Algorithm. arXiv 2020, arXiv:2006.12703.
Guo, Z.; Cao, Y. SA-CNN: Application to text categorization issues using simulated annealing-based convolutional neural network optimization. In Proceedings of the 2022 6th International Conference on Electronic Information Technology and Computer Engineering, Xiamen, China, 21–23 October 2022.
Denning, D.E. An Intrusion-Detection Model. IEEE Trans. Softw. Eng. 1987, 2, 222–232.
Bace, R.; Mell, P. Intrusion Detection Systems; Technical Report, NIST Special Publication; NIST: Gaithersburg, MD, USA, 2001.
Anderson, J.P. Computer Security Threat Monitoring and Surveillance; Technical Report; James P. Anderson Co.: Fort Washington, PA, USA, 1980.
Rajib, N. Cisco Firepower Threat Defense (FTD): Configuration and Troubleshooting Best Practices for the Next-Generation Firewall (NGFW), Next-Generation Intrusion Prevention System (NGIPS), and Advanced Malware Protection (AMP); Cisco Press: Indianapolis, IN, USA, 2017.
He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016.
LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324.
Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015.
LeCun, Y.; Kavukcuoglu, K.; Farabet, C. Convolutional networks and applications in vision. In Proceedings of the 2010 IEEE International Symposium on Circuits and Systems, Paris, France, 30 May–2 June 2010; pp. 253–256.
Bentéjac, C.; Csörgő, A.; Martínez-Muñoz, G. A Comparative Analysis of XGBoost. arXiv 2019, arXiv:1911.01914.
Shwartz-Ziv, R.; Armon, A. Tabular Data: Deep Learning is Not All You Need. arXiv 2021, arXiv:2106.03253.
Noble, W.S. What is a support vector machine? Nat. Biotechnol. 2006, 24, 1565–1567.
Deng, X.; Li, M.; Deng, S.; Wang, L. Hybrid gene selection approach using XGBoost and multi-objective genetic algorithm for cancer classification. arXiv 2021, arXiv:2106.05841.
Liu, D.; Perreault, V.; Hertz, A.; Lodi, A. A machine learning framework for neighbor generation in metaheuristic search. arXiv 2022, arXiv:2212.11451.
Liang, Y.; He, F.; Zeng, X.; Luo, J. An improved loop subdivision to coordinate the smoothness and the number of faces via multi-objective optimization. Integr. Comput. Aided Eng. 2022, 29, 23–41.
Gao, X.; He, F.; Zhang, S.; Luo, J.; Fan, B. A fast nondominated sorting-based MOEA with convergence and diversity adjusted adaptively. J. Supercomput. 2023, 7, 1–38.
Mirjalili, S.; Mirjalili, S. Genetic algorithm. Evol. Algorithms Neural Netw. Theory Appl. 2019, 780, 43–55.
Marini, F.; Walczak, B. Particle swarm optimization (PSO). A tutorial. Chemom. Intell. Lab. Syst. 2015, 149, 153–165.
Dorigo, M.; Birattari, M.; Stutzle, T. Ant colony optimization. IEEE Comput. Intell. Mag. 2006, 1, 28–39.
Yang, X.S. Firefly algorithm, stochastic test functions and design optimisation. Int. J. Bio Inspired Comput. 2010, 2, 78–84.
Mirjalili, S.; Gandomi, A.H.; Mirjalili, S.Z.; Saremi, S.; Faris, H.; Mirjalili, S.M. Salp Swarm Algorithm: A bio-inspired optimizer for engineering design problems. Adv. Eng. Softw. 2017, 114, 163–191.
Mirjalili, S.; Lewis, A. The whale optimization algorithm. Adv. Eng. Softw. 2016, 95, 51–67.
Gurrola-Ramos, J.; Hernàndez-Aguirre, A.; Dalmau-Cedeño, O. COLSHADE for real-world single-objective constrained optimization problems. In Proceedings of the 2020 IEEE Congress on Evolutionary Computation (CEC), Glasgow, UK, 19–24 July 2020; pp. 1–8.
Zivkovic, M.; Jovanovic, L.; Ivanovic, M.; Krdzic, A.; Bacanin, N.; Strumberger, I. Feature selection using modified sine cosine algorithm with COVID-19 dataset. In Evolutionary Computing and Mobile Sustainable Networks: Proceedings of ICECMSN 2021; Springer: Berlin/Heidelberg, Germany, 2022; pp. 15–31.
Bacanin, N.; Jovanovic, L.; Zivkovic, M.; Kandasamy, V.; Antonijevic, M.; Deveci, M.; Strumberger, I. Multivariate energy forecasting via metaheuristic tuned long-short term memory and gated recurrent unit neural networks. Inf. Sci. 2023, 642, 119122.
Jovanovic, L.; Jovanovic, D.; Bacanin, N.; Jovancai Stakic, A.; Antonijevic, M.; Magd, H.; Thirumalaisamy, R.; Zivkovic, M. Multi-step crude oil price prediction based on lstm approach tuned by salp swarm algorithm with disputation operator. Sustainability 2022, 14, 14616.
AlHosni, N.; Jovanovic, L.; Antonijevic, M.; Bukumira, M.; Zivkovic, M.; Strumberger, I.; Mani, J.P.; Bacanin, N. The xgboost model for network intrusion detection boosted by enhanced sine cosine algorithm. In Proceedings of the International Conference on Image Processing and Capsule Networks, Bangkok, Thailand, 20–21 May 2022; pp. 213–228.
Zivkovic, M.; Jovanovic, L.; Ivanovic, M.; Bacanin, N.; Strumberger, I.; Joseph, P.M. Xgboost hyperparameters tuning by fitness-dependent optimizer for network intrusion detection. In Proceedings of the Communication and Intelligent Systems (ICCIS 2021), Delhi, India, 18–19 December 2022; pp. 947–962.
Salb, M.; Jovanovic, L.; Zivkovic, M.; Tuba, E.; Elsadai, A.; Bacanin, N. Training logistic regression model by enhanced moth flame optimizer for spam email classification. In Computer Networks and Inventive Communication Technologies; Springer: Berlin/Heidelberg, Germany, 2022; pp. 753–768.
Jovanovic, L.; Jovanovic, G.; Perisic, M.; Alimpic, F.; Stanisic, S.; Bacanin, N.; Zivkovic, M.; Stojic, A. The explainable potential of coupling metaheuristics-optimized-xgboost and shap in revealing vocs’ environmental fate. Atmosphere 2023, 14, 109.
Abualigah, L.; Abd Elaziz, M.; Sumari, P.; Geem, Z.W.; Gandomi, A.H. Reptile Search Algorithm (RSA): A nature-inspired meta-heuristic optimizer. Expert Syst. Appl. 2022, 191, 116158.
Jiang, S.; Yang, S.; Yao, X.; Tan, K.C.; Kaiser, M.; Krasnogor, N. Benchmark Functions for the cec’2018 Competition on Dynamic Multiobjective Optimization; Technical Report; Newcastle University: Newcastle upon Tyne, UK, 2018.
Vaccari, I.; Chiola, G.; Aiello, M.; Mongelli, M.; Cambiaso, E. MQTTset, a new dataset for machine learning techniques on MQTT. Sensors 2020, 20, 6578.
Hossin, M.; Sulaiman, M.N. A review on evaluation metrics for data classification evaluations. Int. J. Data Min. Knowl. Manag. Process. 2015, 5, 1.
Warrens, M.J. Five ways to look at Cohen’s kappa. J. Psychol. Psychother. 2015, 5, 4.
LaTorre, A.; Molina, D.; Osaba, E.; Poyatos, J.; Del Ser, J.; Herrera, F. A prescription of methodological guidelines for comparing bio-inspired optimization algorithms. Swarm Evol. Comput. 2021, 67, 100973.
Wilcoxon, F. Individual comparisons by ranking methods. In Breakthroughs in Statistics; Springer: Berlin/Heidelberg, Germany, 1992; pp. 196–202.
Lundberg, S.M.; Lee, S.I. A Unified Approach to Interpreting Model Predictions. In Advances in Neural Information Processing Systems 30; Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2017; pp. 4765–4774.

Figure 1. Flowchart of the introduced framework.

Figure 2. Binary classification evaluation outcome distribution.

Figure 3. Objective and indicator function convergence plots.

Figure 4. Objective and indicator function distribution plots for multi-class classification.

Figure 5. Objective and indicator function convergence plots for multi-class classification.

Figure 6. GIRSA XGBoost best performing multi-class classification model.

Figure 7. Objective function KDE plots for binary and multiclass classification.

Figure 8. SHAP analysis outcomes for best CNN model.

Figure 9. SHAP analysis outcomes for best XGBoost model.

Table 1. Objective function overall outcomes.

Method	Best	Worst	Mean	Median	Std	Var
CNN-XG-GIRSA	0.497019	0.497351	0.497172	0.497109	0.000130	$1.69 \times 10^{-8}$
CNN-XG-RSA	0.497301	0.497633	0.497464	0.497472	0.000106	$1.13 \times 10^{-8}$
CNN-XG-GA	0.497190	0.497694	0.497480	0.497563	0.000183	$3.37 \times 10^{-8}$
CNN-XG-PSO	0.497109	0.497573	0.497369	0.497482	0.000184	$3.38 \times 10^{-8}$
CNN-XG-FA	0.497140	0.497663	0.497524	0.497623	0.000195	$3.78 \times 10^{-8}$
CNN-XG-SSA	0.497180	0.497532	0.497412	0.497502	0.000142	$2.02 \times 10^{-8}$
CNN-XG-WOA	0.497170	0.497754	0.497414	0.497402	0.000221	$4.89 \times 10^{-8}$
CNN-XG-COLSHADE	0.497049	0.497563	0.497305	0.497291	0.000195	$3.82 \times 10^{-8}$

Table 2. Indicator function overall outcomes.

Method	Best	Worst	Mean	Median	Std	Var
CNN-XG-GIRSA	0.006218	0.005539	0.005908	0.006036	0.000265	$7.02 \times 10^{-8}$
CNN-XG-RSA	0.005650	0.004983	0.005342	0.005262	0.000244	$5.95 \times 10^{-8}$
CNN-XG-GA	0.005905	0.004858	0.005321	0.005241	0.000368	$1.35 \times 10^{-7}$
CNN-XG-PSO	0.006036	0.005157	0.005576	0.005535	0.000348	$1.21 \times 10^{-7}$
CNN-XG-FA	0.005973	0.004978	0.005245	0.005062	0.000368	$1.36 \times 10^{-7}$
CNN-XG-SSA	0.005892	0.005151	0.005400	0.005199	0.000287	$8.22 \times 10^{-8}$
CNN-XG-WOA	0.005913	0.004775	0.005479	0.005714	0.000464	$2.15 \times 10^{-7}$
CNN-XG-COLSHADE	0.006158	0.005177	0.005632	0.005640	0.000394	$1.55 \times 10^{-7}$

Table 3. Binary classification detailed metrics.

Method	Metric	Benign	Malicious	Accuracy	Macro Avg	Weighted Avg
CNN-XG-GIRSA	precision	0.501968	0.504592	0.502981	0.503280	0.503282
	recall	0.617113	0.389107	0.502981	0.503110	0.502981
	f1-score	0.553617	0.439388	0.502981	0.496502	0.496438
CNN-XG-RSA	precision	0.501744	0.504207	0.502699	0.502975	0.502976
	recall	0.614874	0.390777	0.502699	0.502826	0.502699
	f1-score	0.552578	0.440303	0.502699	0.496441	0.496378
CNN-XG-GA	precision	0.501790	0.504527	0.502810	0.503158	0.503160
	recall	0.630301	0.375606	0.502810	0.502954	0.502810
	f1-score	0.558751	0.430624	0.502810	0.494688	0.494616
CNN-XG-PSO	precision	0.501896	0.504470	0.502891	0.503183	0.503185
	recall	0.616649	0.389389	0.502891	0.503019	0.502891
	f1-score	0.553386	0.439521	0.502891	0.496454	0.496389
CNN-XG-FA	precision	0.501874	0.504419	0.502860	0.503147	0.503148
	recall	0.615580	0.390395	0.502860	0.502987	0.502860
	f1-score	0.552942	0.440142	0.502860	0.496542	0.496478
CNN-XG-SSA	precision	0.501842	0.504365	0.502820	0.503104	0.503105
	recall	0.615318	0.390576	0.502820	0.502947	0.502820
	f1-score	0.552817	0.440236	0.502820	0.496527	0.496463
CNN-XG-WOA	precision	0.501849	0.504383	0.502830	0.503116	0.503118
	recall	0.615802	0.390113	0.502830	0.502957	0.502830
	f1-score	0.553016	0.439949	0.502830	0.496483	0.496419
CNN-XG-COLSHADE	precision	0.501943	0.504556	0.502951	0.503249	0.503251
	recall	0.617294	0.388865	0.502951	0.503080	0.502951
	f1-score	0.553674	0.439220	0.502951	0.496447	0.496383
	support	49,589	49,701

Table 4. Control parameter selections made by each metaheuristic for respective best-performing binary classification models.

Method	Learning Rate	Min Child Weight	Subsample	Colsample	Max Depth	Gamma
CNN-XG-GIRSA	0.546941	1	1.000000	0.179374	10	0.180448
CNN-XG-RSA	0.687255	1	1.000000	0.010368	10	0.363584
CNN-XG-GA	0.520576	2	1.000000	0.238351	10	0.717737
CNN-XG-PSO	0.566877	5	1.000000	0.024852	10	0.800000
CNN-XG-FA	0.645137	1	1.000000	0.322124	10	0.800000
CNN-XG-SSA	0.621581	1	1.000000	0.015040	10	0.000000
CNN-XG-WOA	0.625413	10	1.000000	0.010000	10	0.800000
CNN-XG-COLSHADE	0.558799	4	1.000000	0.226316	10	0.166346

Table 5. Objective function outcome for multi-class classification.

Method	Best	Worst	Mean	Median	Std	Var
CNN-XG-GIRSA	0.140236	0.140880	0.140544	0.140487	0.000226	$5.15 \times 10^{-8}$
CNN-XG-RSA	0.140679	0.142804	0.141478	0.140941	0.000824	$6.78 \times 10^{-7}$
CNN-XG-GA	0.140769	0.141595	0.141164	0.141092	0.000288	$8.27 \times 10^{-8}$
CNN-XG-PSO	0.140649	0.141283	0.140880	0.140830	0.000233	$5.43 \times 10^{-8}$
CNN-XG-FA	0.140477	0.140659	0.140606	0.140659	0.000072	$5.13 \times 10^{-9}$
CNN-XG-SSA	0.140900	0.141112	0.141005	0.141011	0.000074	$5.50 \times 10^{-9}$
CNN-XG-WOA	0.140498	0.140659	0.140624	0.140659	0.000064	$4.04 \times 10^{-9}$
CNN-XG-COLSHADE	0.140649	0.141112	0.140961	0.141051	0.000178	$3.16 \times 10^{-8}$

Table 6. Indicator function outcome for multi-class classification.

Method	Best	Worst	Mean	Median	Std	Var
CNN-XG-GIRSA	0.755896	0.754757	0.755369	0.755456	0.000400	$1.60 \times 10^{-7}$
CNN-XG-RSA	0.755252	0.751125	0.753567	0.754486	0.001533	$2.35 \times 10^{-6}$
CNN-XG-GA	0.754908	0.753264	0.754114	0.754263	0.000577	$3.33 \times 10^{-7}$
CNN-XG-PSO	0.755196	0.753870	0.754698	0.754655	0.000475	$2.26 \times 10^{-7}$
CNN-XG-FA	0.755564	0.754996	0.755164	0.754996	0.000226	$5.10 \times 10^{-8}$
CNN-XG-SSA	0.754713	0.754328	0.754451	0.754439	0.000154	$2.38 \times 10^{-8}$
CNN-XG-WOA	0.755336	0.754996	0.755095	0.754996	0.000135	$1.81 \times 10^{-8}$
CNN-XG-COLSHADE	0.755119	0.754416	0.754548	0.754510	0.000323	$1.04 \times 10^{-7}$

Table 7. Detailed comparison between best-performing models for multi-class classification.

Method	Metric	Benign	DOS	Malformed	Bruteforce	SlowITe	Flood	Accuracy	Macro Avg	Weighted Avg
CNN-XG-GIRSA	precision	0.878533	0.867624	0.728448	0.616408	0.794778	0.967742	0.859764	0.808922	0.855635
	recall	0.950442	0.853392	0.154667	0.638934	0.551250	0.163043	0.859764	0.551955	0.859764
	f1-score	0.913074	0.860449	0.255159	0.627469	0.650984	0.279070	0.859764	0.597701	0.849664
CNN-XG-RSA	precision	0.879449	0.866883	0.662338	0.616214	0.787848	0.967742	0.859321	0.796746	0.853417
	recall	0.949254	0.854416	0.155583	0.630660	0.554147	0.163043	0.859321	0.551184	0.859321
	f1-score	0.913019	0.860604	0.251976	0.623353	0.650649	0.279070	0.859321	0.596445	0.849402
CNN-XG-GA	precision	0.878810	0.866539	0.656366	0.619365	0.795276	0.769231	0.859231	0.764264	0.852741
	recall	0.950281	0.854032	0.161989	0.618938	0.548714	0.163043	0.859231	0.549500	0.859231
	f1-score	0.913149	0.860240	0.259848	0.619152	0.649378	0.269058	0.859231	0.595138	0.849346
CNN-XG-PSO	precision	0.879615	0.865647	0.659119	0.621212	0.789446	0.967742	0.859351	0.797130	0.853170
	recall	0.948972	0.855567	0.159854	0.621926	0.552698	0.163043	0.859351	0.550343	0.859351
	f1-score	0.912978	0.860577	0.257304	0.621569	0.650192	0.279070	0.859351	0.596948	0.849456
CNN-XG-FA	precision	0.879857	0.866169	0.648559	0.622165	0.785898	0.967742	0.859523	0.795065	0.853091
	recall	0.948931	0.855618	0.178462	0.611584	0.553060	0.163043	0.859523	0.551783	0.859523
	f1-score	0.913089	0.860862	0.279904	0.616829	0.649235	0.279070	0.859523	0.599831	0.850136
CNN-XG-SSA	precision	0.879857	0.864857	0.632794	0.618892	0.798845	1.000000	0.859100	0.799208	0.852331
	recall	0.948790	0.856181	0.167175	0.608366	0.550887	0.163043	0.859100	0.549074	0.859100
	f1-score	0.913025	0.860497	0.264479	0.613584	0.652090	0.280374	0.859100	0.597341	0.849390
CNN-XG-WOA	precision	0.879515	0.865361	0.664269	0.623851	0.787410	0.967742	0.859502	0.798024	0.853237
	recall	0.949697	0.855772	0.169005	0.608366	0.552698	0.163043	0.859502	0.549764	0.859502
	f1-score	0.913260	0.860540	0.269455	0.616011	0.649500	0.279070	0.859502	0.597973	0.849721
CNN-XG-COLSHADE	precision	0.879789	0.864590	0.694805	0.619640	0.789802	0.967742	0.859351	0.802728	0.853961
	recall	0.948911	0.856028	0.163209	0.617789	0.549801	0.163043	0.859351	0.549797	0.859351
	f1-score	0.913043	0.860288	0.264328	0.618713	0.648302	0.279070	0.859351	0.597291	0.849429
	support	49,639	39,077	3278	4351	2761	184

Table 8. Control parameter selections made by each metaheuristic for respective best-performing multi-class classification models.

Method	Learning Rate	Min Child Weight	Subsample	Colsample	Max Depth	Gamma
CNN-XG-GIRSA	0.900000	6	0.484839	1.000000	6	0.123762
CNN-XG-RSA	0.894472	10	0.835341	1.000000	6	0.000000
CNN-XG-GA	0.682788	5	0.654867	1.000000	8	0.439790
CNN-XG-PSO	0.293758	10	0.890094	1.000000	10	0.348853
CNN-XG-FA	0.873667	10	1.000000	1.000000	10	0.800000
CNN-XG-SSA	0.900000	10	0.528148	0.761996	6	0.062693
CNN-XG-WOA	0.578212	10	0.863927	0.915171	10	0.800000
CNN-XG-COLSHADE	0.696242	9	0.589664	0.895006	8	0.381628
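As an illustration of how a row of Table 8 could be turned into an XGBoost configuration, the sketch below maps the CNN-XG-GIRSA row onto XGBoost parameter names. It is an assumption that the "Colsample" column corresponds to `colsample_bytree`; the helper `row_to_params` is hypothetical and not part of the original work.

```python
# Table 8 column order expressed with XGBoost parameter names.
# Assumption: "Colsample" is taken to mean colsample_bytree.
TABLE8_COLUMNS = (
    "learning_rate",
    "min_child_weight",
    "subsample",
    "colsample_bytree",
    "max_depth",
    "gamma",
)

# Columns reported as whole numbers in Table 8.
INTEGER_COLUMNS = {"min_child_weight", "max_depth"}


def row_to_params(row):
    """Build a parameter dict from one Table 8 row (hypothetical helper)."""
    if len(row) != len(TABLE8_COLUMNS):
        raise ValueError("expected one value per Table 8 column")
    params = {}
    for name, value in zip(TABLE8_COLUMNS, row):
        params[name] = int(value) if name in INTEGER_COLUMNS else float(value)
    return params


# Values copied from the CNN-XG-GIRSA row.
girsa_params = row_to_params((0.900000, 6, 0.484839, 1.000000, 6, 0.123762))
print(girsa_params)
```

If the `xgboost` package is installed, a dictionary of this shape can be passed as keyword arguments, for example `xgboost.XGBClassifier(**girsa_params)`. Whether this reproduces the reported model also depends on settings not listed in the table.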

Table 9. Shapiro–Wilk normality tests.

Problem	GIRSA	RSA	GA	PSO	FA	SSA	WOA	COLSHADE
XG-Binary	0.032	0.041	0.046	0.018	0.021	0.033	0.035	0.040
XG-Multi	0.032	0.041	0.046	0.018	0.021	0.033	0.035	0.040

Table 10. Wilcoxon signed-rank test p-values for both experiments (GIRSA vs. others).

Problem/p-Values	RSA	GA	PSO	FA	SSA	WOA	COLSHADE
XG-Binary	0.042	0.003	0.018	0.037	0.040	0.027	0.031
XG-Multi	0.042	0.003	0.018	0.037	0.040	0.027	0.031
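The reasoning behind Tables 9 and 10 can be sketched as a simple threshold check. The example assumes that the Table 9 entries are Shapiro–Wilk p-values and that the significance level is 0.05; neither is stated in the tables themselves. Values are copied from the XG-Binary rows, and the helper name `rejected` is illustrative.

```python
ALPHA = 0.05  # assumed significance level

# Table 9, XG-Binary row (read here as Shapiro-Wilk p-values).
shapiro_p = {
    "GIRSA": 0.032, "RSA": 0.041, "GA": 0.046, "PSO": 0.018,
    "FA": 0.021, "SSA": 0.033, "WOA": 0.035, "COLSHADE": 0.040,
}

# Table 10, XG-Binary row (Wilcoxon signed-rank p-values, GIRSA vs. each method).
wilcoxon_p = {
    "RSA": 0.042, "GA": 0.003, "PSO": 0.018, "FA": 0.037,
    "SSA": 0.04, "WOA": 0.027, "COLSHADE": 0.031,
}


def rejected(p_values, alpha=ALPHA):
    """Return, per method, whether the null hypothesis is rejected at alpha."""
    return {name: p < alpha for name, p in p_values.items()}


# Normality rejected for every method, which motivates a non-parametric test.
normality_rejected = rejected(shapiro_p)
# GIRSA differs significantly from every compared method.
girsa_differs = rejected(wilcoxon_p)

print(all(normality_rejected.values()), all(girsa_differs.values()))
```

With these inputs every p-value falls below the assumed threshold, so normality is rejected for all methods and each pairwise Wilcoxon comparison against GIRSA counts as statistically significant.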

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
