Hierarchical Classification of Botnet Using Lightweight CNN

Negera, Worku Gachena; Schwenker, Friedhelm; Feyisa, Degaga Wolde; Debelee, Taye Girma; Melaku, Henock Mulugeta

doi:10.3390/app14103966


by Worku Gachena Negera ¹, Friedhelm Schwenker ², Degaga Wolde Feyisa ³, Taye Girma Debelee ³,⁴,* and Henock Mulugeta Melaku ¹

¹ Addis Ababa Institute of Technology, Addis Ababa University, Addis Ababa 445, Ethiopia
² Institute of Neural Information, University of Ulm, 89069 Ulm, Germany
³ Ethiopian Artificial Intelligence Institute, Addis Ababa 40782, Ethiopia
⁴ Department of Electrical and Computer Engineering, Addis Ababa Science and Technology University, Addis Ababa 16417, Ethiopia
* Author to whom correspondence should be addressed.

Appl. Sci. 2024, 14(10), 3966; https://doi.org/10.3390/app14103966

Submission received: 26 January 2024 / Revised: 19 April 2024 / Accepted: 24 April 2024 / Published: 7 May 2024

(This article belongs to the Special Issue Application of Machine Learning in Intelligent Infrastructures and Smart Cities)


Abstract

This paper addresses the persistent threat of botnet attacks on IoT devices, which continue despite the many conventional and deep learning methodologies developed for intrusion detection. Using the Bot-IoT dataset, we propose a hierarchical CNN (HCNN) approach featuring three levels of classification. The HCNN approach presented in this paper consists of two networks: the non-hierarchical network and the hierarchical network. The hierarchical network works by combining the features obtained at a higher level with those of its ancestor. This combined information is subsequently fed into the following level to extract features for the descendant nodes. The overall network consists of 1790 parameters, with the hierarchical network adding 942 parameters to the existing backbone. The classification levels comprise a binary classification of normal vs. attack in the first level, followed by five classes in the second level and 11 classes in the third level. To assess the effectiveness of our proposed approach, we evaluate performance metrics such as Precision (P), Recall (R), F1 Score (F1), and Accuracy (Acc). Experiments are conducted to compare the performance of the hierarchical model with that of the non-hierarchical models and existing state-of-the-art approaches, providing insight into the efficiency of the proposed hierarchical CNN approach for addressing botnet attacks on IoT devices.

Keywords: botnet; botnet attack; Bot-IoT; classification; CNN; hierarchical; lightweight

1. Introduction

The Internet of Things (IoT) is a dynamic ecosystem growing quickly in scale, connectivity, and variety of applications. This ubiquitous ecosystem is similar to other emerging technologies in that it permeates every aspect of our existence. Unfortunately, despite all the benefits that the Internet of Things brings, the increased attack surface that it creates has become even more significant. Vulnerabilities are made worse by the intrinsic limitations of devices, which frequently lack strong security mechanisms [1]. There has been a noticeable trend recently in which the IoT ecosystem is becoming more compromised by botnet threats [2]. An enormous network of infected IoT devices can have devastating consequences since it can grow quickly and launch powerful attacks. As a result, finding practical ways to strengthen IoT systems has emerged as a crucial and complex field of study. Among these, machine learning-based methods are particularly attractive since they allow for the early identification of possible assaults and the identification of unusual behavior [1].

In the realm of cybersecurity, an IoT botnet integrates itself into a network by executing a series of malicious operations on computing devices. These operations comprise three key steps: first, identifying susceptible devices through scanning; next, installing a compatible bot tailored to the vulnerable device’s architecture for propagation; and finally, launching an attack through a command and control operation, as elucidated by Wazzan [3]. A concrete example is the Mirai botnet, which encompasses attack vectors, a scanning process actively seeking other devices for compromise, and a command and control system governing the compromised devices (bots). This orchestrates further propagation and instigates attacks, exemplifying the sophisticated nature of contemporary botnet strategies [4].

Contribution

The paper’s contribution is the creation and application of a novel method for the hierarchical classification of botnet attacks using Convolutional Neural Networks (CNN) on the Bot-IoT dataset. The persistent threat of botnet attacks on Internet of Things (IoT) devices necessitates strong and efficient detection systems. The hierarchical CNN architecture presented in this paper improves the precision and level of detail in identifying botnet attacks. The contribution of the paper is summarized as follows:

Our suggested method aims to enhance the efficiency of botnet attack classification on the Bot-IoT dataset by leveraging its inherent hierarchical structure.
Additionally, the proposed CNN model is lightweight, demanding less memory and execution time, rendering it suitable for deployment on compact IoT devices.
This approach is crucial in addressing the evolving IoT security landscape, as it introduces advanced hierarchical categorization algorithms. These algorithms facilitate more precise and nuanced identification of botnet activity.

The rest of the paper is organized as follows. The related works are presented in Section 2. In Section 3, the dataset description, proposed approach, and evaluation metrics are covered. In Section 4, the experimental results and discussion followed by a comparison to the existing approach are discussed. Finally, Section 5 presents the conclusion.

2. Related Work

Botnet attack vulnerabilities still exist in Internet of Things (IoT) networks, even with the introduction of SDN (Software-Defined Networking) [5,6]. There are several ways to classify botnet attacks, and particularly in SDN-enabled IoT networks, bots use network-probing attacks and backdoor vulnerabilities to obtain a foothold. A variety of attacks, such as DDoS (Distributed Denial of Service), DoS (Denial of Service), scanning assaults, and information theft, can be carried out using this vulnerable position [7,8,9]. Conventional neural network-based Network Intrusion Detection Systems (NIDS) tend to incur high resource consumption, making them impractical for deployment in Internet gateways, routers, and Internet of Things (IoT) devices [10,11].

Wei et al. [10] introduced a lightweight, two-stage NIDS for IoT networks that exclusively uses packet-length features. The system efficiently detects botnet activities on resource-limited devices. Their approach uses 21 discriminative statistical features to distinguish between malicious and normal traffic flows. In the first stage, an autoencoder-based module filters out a significant portion of normal traffic. Subsequently, a novel mechanism transforms packet length sequences into RGB images for malicious traffic classification using a lightweight CNN. The authors in [12] proposed a feature selection method using Information Gain (IG) and Gain Ratio (GR), keeping the top-ranked 50% of features, for the detection of DoS and DDoS attacks. The proposed approach achieved 99.9993% accuracy with 16 selected features.

Hybridized deep learning-driven botnet malware detection algorithms, as explored by Liaqat [13], have integrated convolutional neural networks (CNNs) to extract features effectively. While CNN’s convolutional and pooling layers excel at capturing spatial features, the challenge lies in their limited grasp of temporal information, hindering the identification of feature inter-dependency in a 2D-CNN. In response to this limitation, Liaqat et al. [13] proposed an innovative solution by incorporating cuDNNLSTM layers after the CNN layers, resulting in a hybrid deep learning architecture. The CNN-BiLSTM method proposed in [14] is well-known for its convolutional and bidirectional long short-term memory architecture. The effectiveness of this hybrid model was demonstrated through training and testing on the Bot-IoT dataset, with reported results showcasing an impressive accuracy of 99.99%, precision of 99.99%, recall of 99.99%, and an F1-score of 99.99%. These findings underscore the superiority of the CNN-BiLSTM architecture in successfully detecting botnet malware within Software-Defined Networking (SDN)-enabled Internet of Things (IoT) networks.

In [15], a novel hierarchical CNN-attention network, CANET, is proposed. Within CANET, local spatiotemporal feature extraction is the primary focus of the CNN-Attention (CA) Block, which is formed by merging CNN with the attention mechanism. The multi-layer CA Block combination can capture the multi-level spatiotemporal aspects of network attack data, which makes it better suited for contemporary large-scale network intrusion detection. Additionally, the authors suggest using Equalisation Loss v2 (EQL v2) to balance learning attention on minority classes and increase the minority class weight, addressing the issue of class imbalance. Numerous tests show that CANET works better in terms of accuracy, detection rate, and false positive rate than the most advanced techniques. It also effectively raises the minority class detection rate.

Dina et al. [16] introduced the concept of focal loss, a specialized loss function, to tackle the data imbalance problem in IoT intrusion detection. By using dynamically scaled-gradient updates to prioritize difficult negatives and automatically reduce the influence of simple examples, this function helps to train highly effective machine learning models. The proposed method was evaluated against the most advanced intrusion-detection models through comprehensive experimental evaluations on three datasets covering various IoT domains. The results showed that their method, which trained deep learning models using the focal loss function, performs better than the conventional cross-entropy loss function in terms of accuracy, precision, F1 score, and Matthews Correlation Coefficient (MCC) score.

Xu et al. [17] employed the binary grey wolf optimizer (BGWO) heuristic algorithm and recursive feature elimination (RFE) to select features for their intrusion detection system. They used the synthetic minority oversampling technique (SMOTE) to oversample the minority classes. Finally, they applied XGBoost as a classifier and achieved a perfect $F_{1}$ score of 1.0 for the five-class classification on the Bot-IoT dataset. Similarly, the study conducted by Alosaimi et al. [18] involved a comparison of decision trees, ensemble bag, K-nearest neighbor, linear discriminant, and support vector machine in the context of 2, 5, and 11 class classification on the dataset. The authors asserted that the ensemble bag exhibited 100% accuracy across all hierarchical levels in the dataset. However, they refrained from disclosing the specific models that were ensembled together and neglected to provide detailed information regarding the computational resource requirements for this approach.

3. Materials and Method

In this study, we use a step-by-step method, depicted in Algorithm 1, to identify and categorize botnet attacks using HCNN. To capture intricate patterns and time-related facets of botnet activity, we propose new lightweight HCNN models. We assessed the effectiveness of the proposed model using commonly used evaluation criteria such as precision, recall, and F1-score. Our objective is to use cutting-edge deep learning techniques to develop a dependable and useful system for identifying and categorizing botnet attacks.

Algorithm 1 Data Preprocessing and Model Training

procedure DataPreprocessing
    Downsampling the dataset
    Remove columns with empty entries
    Drop rows with NaN entries
    Convert categorical values to label encoded data
    Normalize the data using Standard Scaler
end procedure
procedure BuildModel
    Build a CNN model with a small number of parameters
end procedure
procedure TrainModel
    Split the dataset into training, validation, and test datasets
    Define hyperparameters and set up the experiment setup
    Train the model using the training and validation dataset
end procedure
procedure EvaluateModel
    Evaluate the model using learning curves
    Evaluate the model on the testing dataset
end procedure
procedure CheckSatisfaction
    if not satisfied then
        Repeat from Data Preprocessing
    end if
end procedure
procedure ReportResults
    if satisfied then
        Report the results
    end if
end procedure

3.1. Dataset

The Bot-IoT dataset, as discussed in [19], serves as a valuable resource for researchers and practitioners involved in network security within the Internet of Things (IoT) domain. The dataset is designed to address the growing need for realistic and representative datasets that can aid in the analysis of botnets operating within IoT networks.

The paper emphasizes the importance of understanding and countering security threats posed by botnets in the dynamic landscape of IoT. The dataset is crafted to capture various aspects of IoT device behavior, network traffic, and communication patterns, encompassing both normal and malicious activities. It facilitates the development, testing, and evaluation of detection and prevention mechanisms specifically tailored to IoT-based botnets. Researchers can leverage the Bot-IoT dataset to conduct experiments, validate algorithms, and advance the field of IoT security. The ultimate goal is to enhance the capabilities of network forensic analytics, providing insights and solutions to combat the evolving challenges associated with botnets in the IoT ecosystem. The dataset’s significance lies in its contribution to creating a more secure IoT environment through the development and refinement of cutting-edge security measures [19].

The dataset consists of 72 million records and encompasses 42 features, with a total size of 17.87 GB. It has a hierarchical structure, as depicted in Figure 1. The dataset is imbalanced, as is evident from the significant variation among the instance counts for each class, illustrated in Figure 2 and Figure 3. Because the size of the dataset makes it impractical to train the proposed method on all records, we undersampled the classes with more than one million instances. Specifically, we randomly selected 20% of the samples from each of these classes while retaining all samples of the remaining classes.

During the data cleaning phase, columns containing over 20% missing values are eliminated. Subsequently, for the remaining records, any rows with null values are excluded. Categorical columns undergo encoding through label encoding, and the data are then divided into X for input and Y for the target variable. The input data (X) are further normalized using a standard scaler. The dataset is divided into training, validation, and testing sets using random selection, with a split ratio of 80%, 10%, and 10%, respectively.
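The sampling, scaling, and splitting steps described above can be sketched as follows. This is a minimal illustration in plain Python, not the authors' code: the function names (`undersample_indices`, `split_indices`, `standardize`), the fixed seeds, and the toy class counts are assumptions, while the 20% sampling rate, the one-million threshold, and the 80/10/10 split come from the text.

```python
import random
from collections import defaultdict
from statistics import mean, pstdev


def undersample_indices(labels, threshold=1_000_000, fraction=0.2, seed=0):
    """Keep a random `fraction` of every class with more than `threshold`
    rows and keep all rows of the remaining classes."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    kept = []
    for idxs in by_class.values():
        if len(idxs) > threshold:
            idxs = rng.sample(idxs, int(len(idxs) * fraction))
        kept.extend(idxs)
    return sorted(kept)


def standardize(column):
    """Zero-mean, unit-variance scaling of one feature column
    (population standard deviation, as a standard scaler uses)."""
    mu, sigma = mean(column), pstdev(column)
    return [(v - mu) / sigma if sigma else 0.0 for v in column]


def split_indices(n, ratios=(0.8, 0.1, 0.1), seed=0):
    """Random train/validation/test split of the indices 0..n-1."""
    rng = random.Random(seed)
    order = list(range(n))
    rng.shuffle(order)
    n_train = int(n * ratios[0])
    n_val = int(n * ratios[1])
    return order[:n_train], order[n_train:n_train + n_val], order[n_train + n_val:]


# Toy data: the threshold is lowered to 10 so the effect is visible.
labels = ["DDoS"] * 50 + ["Normal"] * 5
kept = undersample_indices(labels, threshold=10, fraction=0.2)
train, val, test = split_indices(len(kept))
print(len(kept), len(train), len(val), len(test))
```

In the toy run the large class shrinks to 20% of its rows while the small class is untouched, which mirrors the per-class rule in the text.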

3.2. Proposed Approach

Hierarchical classification (HC) has demonstrated remarkable effectiveness in handling datasets with inherent hierarchical structures, according to several studies [20,21,22,23,24,25]. Their success can be attributed to their ability to capture information at different levels of granularity within the data. HC achieves this by employing a hierarchical architecture that mimics the hierarchical nature of the data itself. This allows them to learn informative features at each level of the hierarchy, ultimately leading to improved classification performance.

Numerous investigations have highlighted the efficiency of employing CNN in the classification of tabular data [10,11,13]. In the present research, we introduce a lightweight CNN model tailored specifically for this purpose. As presented in Section 3.1, the dataset exhibits a hierarchical arrangement. Consequently, we put forth a novel approach employing a Hierarchical Convolutional Neural Network (HCNN). This architecture uses the common backbone given in Table 1, yet diverges into distinct models corresponding to each level within the hierarchical structure. This strategy enables a comprehensive exploration of the HCNN’s impact by comparing the performance across different hierarchy levels.

The hierarchical feature learning process in CNNs allows the network to generalize well to new, unseen data. The learned hierarchical representations can capture essential characteristics of the input data, leading to better generalization performance on diverse datasets. While the initial level of the HCNN might not yield a substantial alteration in classification performance, its impact becomes notably evident in the final level.

This study introduces two models: a non-hierarchical model, illustrated in Figure 4, and a hierarchical model, illustrated in Figure 5. The non-hierarchical models incorporate the backbone shown in Table 1, along with their respective final dense layer for classification, which employs a softmax activation function. The hierarchical model utilizes the same backbone but adds a hierarchical structure at each level of the hierarchy.

The primary structure of the backbone model includes Conv1D, MaxPooling, Conv1D, and Global Average Pooling (GAP). Using GAP in place of a flatten layer decreases the overall parameter count. The backbone model has 848 parameters in total, reflecting a notably compact size. The three non-hierarchical models have 882 parameters for the first model (Figure 4a), 933 for the second model (Figure 4b), and 1035 for the final model (Figure 4c).
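The reported parameter counts can be checked with simple arithmetic. The sketch below assumes that each non-hierarchical model adds a single dense layer (weights plus biases) on top of the 848-parameter backbone and that the GAP output has 16 features; the value 16 is not stated in the text and is an assumption, chosen because it reproduces all three reported totals. The name `dense_params` is illustrative.

```python
def dense_params(n_inputs, n_outputs):
    """Parameter count of a fully connected layer: weights plus biases."""
    return n_inputs * n_outputs + n_outputs


BACKBONE_PARAMS = 848   # from the text
FEATURE_DIM = 16        # assumption: width of the GAP output

# (number of classes, total reported in the text)
for n_classes, reported in [(2, 882), (5, 933), (11, 1035)]:
    total = BACKBONE_PARAMS + dense_params(FEATURE_DIM, n_classes)
    print(n_classes, total, total == reported)
```

Under this assumption, the classification heads contribute 34, 85, and 187 parameters for 2, 5, and 11 classes, respectively.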

The HCNN model outlined in Figure 5 comprises two components: the backbone, the same as that of the non-hierarchical models, and the hierarchical network. The hierarchical network operates by combining the features acquired at a higher level with those of its ancestor. This combined information is then provided as input to the subsequent layer to acquire features for the descendant nodes. The network comprises 1790 parameters, with the hierarchical network introducing 942 parameters to the backbone. The backbone network produces a root representation $R_{0}$, given by Equation (1) as the global average pooling of the final convolutional layer of the backbone network, where X is the input features and $\theta_{0}$ denotes the parameters of the network.

$R_{0} = \mathrm{GAP}(F(X, \theta_{0}))$

(1)
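As an illustration of the pooling step in Equation (1), the following sketch averages a small feature map over its time axis. The list-of-lists layout (time steps by channels) and the name `global_average_pooling` are assumptions made for the example; the feature map stands in for the output of the final convolutional layer.

```python
def global_average_pooling(feature_map):
    """Average each channel over all time steps.

    feature_map: list of time steps, each a list with one value per channel.
    Returns one value per channel, so the output width does not depend on
    the sequence length.
    """
    n_steps = len(feature_map)
    n_channels = len(feature_map[0])
    return [sum(step[c] for step in feature_map) / n_steps
            for c in range(n_channels)]


# Three time steps, two channels.
feature_map = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0]]
r0 = global_average_pooling(feature_map)
print(r0)  # [3.0, 2.0]
```

Because the pooled vector has one entry per channel, a dense layer placed after it needs far fewer weights than one placed after a flattened feature map.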

For the hierarchical network, we first produce an independent representation for the first level, $R_{1}$, given in Equation (2), where $W_{1}$ is the weight matrix for the first level. The representation for the second level, denoted as $R_{2}$, is formed by a matrix multiplication between $W_{2}$ and the concatenation of $R_{0}$ with $R_{1}$, as presented in Equation (3), where $\oplus$ denotes concatenation.

$R_{1} = W_{1} \times R_{0} = W_{1} \times \mathrm{GAP}(F(X, \theta_{0})) = H(X, \theta_{1})$

(2)

$R_{2} = W_{2} \times (R_{0} \oplus R_{1}) = G(X, \theta_{2})$

(3)

Finally, the representation for the third level, denoted as $R_{3}$, is expressed in Equation (4), where $W_{3}$ denotes the weight matrix operating on the concatenation of the vectors $R_{0}$, $R_{1}$, and $R_{2}$.

$R_{3} = W_{3} \times (R_{0} \oplus R_{1} \oplus R_{2}) = T(X, \theta_{3})$

(4)
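The chain of representations in Equations (2) to (4) can be sketched in plain Python as shown below. This is only an illustration of the concatenate-then-project pattern: the layer widths, the random weights, and the helper names (`matvec`, `random_matrix`, `hierarchical_representations`) are hypothetical, and bias terms and activation functions, which the equations do not show, are omitted.

```python
import random


def matvec(weights, vector):
    """Matrix-vector product; `weights` is a list of rows."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]


def random_matrix(n_rows, n_cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_cols)] for _ in range(n_rows)]


def hierarchical_representations(r0, w1, w2, w3):
    """Each level sees the root features plus all earlier level features."""
    r1 = matvec(w1, r0)              # Eq. (2)
    r2 = matvec(w2, r0 + r1)         # Eq. (3); list `+` is concatenation
    r3 = matvec(w3, r0 + r1 + r2)    # Eq. (4)
    return r1, r2, r3


rng = random.Random(0)
d0, d1, d2, d3 = 16, 4, 8, 8         # hypothetical widths
r0 = [rng.random() for _ in range(d0)]
w1 = random_matrix(d1, d0, rng)
w2 = random_matrix(d2, d0 + d1, rng)
w3 = random_matrix(d3, d0 + d1 + d2, rng)
r1, r2, r3 = hierarchical_representations(r0, w1, w2, w3)
print(len(r1), len(r2), len(r3))
```

Note how the input width of each weight matrix grows with depth, since every level receives the root representation together with all earlier level representations.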

The loss of the network, Equation (5), is calculated by summing the loss of each level, $\mathrm{loss}_{l}$, where the losses are calculated using binary cross entropy, given in Equation (6), for the first level, and categorical cross entropy, presented in Equation (7), for levels 2 and 3.

$\mathrm{loss} = \sum_{l = 1}^{3} \mathrm{loss}_{l}$

(5)

where l denotes the hierarchical level within the model; three hierarchical levels are considered.

$\mathrm{loss}_{1} = - \frac{1}{N} \sum_{i = 1}^{N} [y_{i} \cdot \log(p_{i}) + (1 - y_{i}) \cdot \log(1 - p_{i})]$

(6)

where N is the total number of samples, $y_{i}$ is the true label of the $i^{th}$ sample (either 0 or 1), and $p_{i}$ is the predicted probability of the $i^{th}$ sample belonging to class 1.

$\mathrm{loss}_{2/3} = - \frac{1}{N} \sum_{i = 1}^{N} \sum_{j = 1}^{C} y_{ij} \cdot \log(p_{ij})$

(7)

where N is the total number of samples, C is the number of classes, $y_{ij}$ is the ground-truth indicator, equal to 1 if the $i^{th}$ sample belongs to class j and 0 otherwise, and $p_{ij}$ is the predicted probability that the $i^{th}$ sample belongs to class j.
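The per-level losses of Equations (5) to (7) can be written out directly. The sketch below is a plain-Python illustration rather than the authors' training code; the clipping constant `eps`, which guards against taking the logarithm of zero, and the toy labels and probabilities are assumptions.

```python
import math


def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Eq. (6): mean binary cross entropy over N samples."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)   # assumption: clip for stability
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)


def categorical_cross_entropy(y_true, p_pred, eps=1e-12):
    """Eq. (7): y_true holds one-hot rows, p_pred holds class probabilities."""
    total = 0.0
    for y_row, p_row in zip(y_true, p_pred):
        for y, p in zip(y_row, p_row):
            total += y * math.log(max(p, eps))
    return -total / len(y_true)


# Two toy samples: level 1 is binary, level 2 has three classes here.
loss_1 = binary_cross_entropy([1, 0], [0.9, 0.2])
loss_2 = categorical_cross_entropy([[0, 1, 0], [1, 0, 0]],
                                   [[0.1, 0.8, 0.1], [0.7, 0.2, 0.1]])
loss = loss_1 + loss_2   # Eq. (5), shown for two levels only
print(loss_1, loss_2, loss)
```

Only the probability assigned to the true class contributes to the categorical loss, because every other term is multiplied by a zero indicator.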

To guarantee the acquisition of accurate category structures, a supplementary loss function known as the dependency loss is introduced. The dependency loss, $\mathrm{dloss}_{l}$, is calculated by Equation (8), as discussed in [24]. It is formulated by assessing whether the model’s predictions conform to the expected hierarchical relationships among classes in the classification task. When the predicted classes of consecutive levels deviate from the parent–child relationship within the hierarchy, this loss penalizes the model, with the aim of promoting the acquisition of precise category structures.

$\mathrm{dloss}_{l} = - (\mathrm{loss}_{l - 1})^{D_{l} I_{l - 1}} (\mathrm{loss}_{l})^{D_{l} I_{l}}$

(8)

where $D_{l}$ and $I_{l}$ indicate whether the model output conflicts with the hierarchical structure and with the actual label, respectively.

$D_{l} = \begin{cases} 1 & \text{if } p_{l} \nRightarrow p_{l - 1} \\ 0 & \text{otherwise} \end{cases}$

(9)

$I_{l} = \begin{cases} 1 & \text{if } p_{l} \neq y_{l} \\ 0 & \text{otherwise} \end{cases}$

(10)

where $p_{l}$ and $y_{l}$ denote the prediction and actual label at level l, respectively. If the category prediction at level l aligns with the prediction at level l−1, $D_{l}$ is set to 0; otherwise, it is set to 1. Similarly, if the prediction at level l aligns with the actual value at the same level, $I_{l}$ is set to 0; otherwise, it is set to 1.
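A per-sample sketch of the indicators and of the dependency loss in Equation (8) is given below. It reads the exponent of the first factor as $D_{l} I_{l-1}$ and keeps the sign of the equation as printed; both are assumptions about the intended formula. The class names in `PARENT_OF` and all function names are illustrative only and are not taken from the paper.

```python
# Hypothetical child -> parent map between two consecutive levels.
PARENT_OF = {
    "DDoS-TCP": "DDoS",
    "DDoS-UDP": "DDoS",
    "DoS-TCP": "DoS",
    "Normal": "Normal",
}


def conflict_indicator(pred_child, pred_parent, parent_of):
    """D_l: 1 if the predicted class at level l is not a child of the
    class predicted at level l-1, otherwise 0."""
    return 0 if parent_of[pred_child] == pred_parent else 1


def error_indicator(pred, true):
    """I_l: 1 if the prediction differs from the actual label, otherwise 0."""
    return 0 if pred == true else 1


def dependency_loss(loss_prev, loss_cur, d_l, i_prev, i_cur):
    """Eq. (8) as printed, for one sample and one pair of levels."""
    return -(loss_prev ** (d_l * i_prev)) * (loss_cur ** (d_l * i_cur))


# Level l-1 predicts "DDoS" (correct); level l predicts "DoS-TCP" (wrong).
d = conflict_indicator("DoS-TCP", "DDoS", PARENT_OF)
i_prev = error_indicator("DDoS", "DDoS")
i_cur = error_indicator("DoS-TCP", "DDoS-TCP")
print(d, i_prev, i_cur, dependency_loss(0.5, 2.0, d, i_prev, i_cur))
```

When the predictions respect the hierarchy ($D_{l} = 0$), both exponents vanish and the term reduces to a constant, so only hierarchy-violating predictions make it depend on the level losses.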

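The final loss, Equation (11), is a weighted summation of $\mathrm{loss}_{1}$, $\mathrm{loss}_{2}$, $\mathrm{loss}_{3}$, and the dependency losses, using $\alpha_{l}$ and $\beta_{l}$ as the weights of the two kinds of loss, respectively, where $0 \le \alpha_{l}, \beta_{l} \le 1$.

$L(\theta) = \sum_{l = 1}^{3} \alpha_{l} \, \mathrm{loss}_{l} + \sum_{l = 2}^{3} \beta_{l} \, \mathrm{dloss}_{l}$

(11)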
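The weighted combination in Equation (11) is sketched below, reading the first sum as running over levels 1 to 3 and the second over levels 2 and 3, with one weight per level. That reading, the function name `total_loss`, and the numeric values are assumptions for illustration; the paper's chosen weights are not reproduced here.

```python
def total_loss(level_losses, dependency_losses, alphas, betas):
    """Weighted sum of per-level losses and dependency losses.

    level_losses:      [loss_1, loss_2, loss_3]
    dependency_losses: [dloss_2, dloss_3]
    alphas, betas:     weights in [0, 1], one per corresponding term
    """
    if len(level_losses) != len(alphas) or len(dependency_losses) != len(betas):
        raise ValueError("each loss term needs exactly one weight")
    classification = sum(a * l for a, l in zip(alphas, level_losses))
    dependency = sum(b * d for b, d in zip(betas, dependency_losses))
    return classification + dependency


# Hypothetical values, for illustration only.
value = total_loss([0.2, 0.4, 0.6], [-1.0, -0.5], [1.0, 1.0, 1.0], [0.5, 0.5])
print(round(value, 6))
```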

3.3. Evaluation Parameters

Precision, recall, F1 score, and accuracy are metrics commonly used to evaluate the performance of classification models [17]. These metrics are calculated based on the counts of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). These counts are defined for binary classification, where a model predicts between two classes (positive and negative); in a multi-class setting, they can be computed per class by treating that class as positive and all other classes as negative [17].

Precision: This is also known as a positive predictive value, which measures the accuracy of positive predictions made by the model. It is calculated as the ratio of true positives to the sum of true positives and false positives as presented in Equation (12).

$P = \frac{TP}{TP + FP}$

(12)
Recall: This is also known as sensitivity or true positive rate, which measures the ability of the model to correctly identify all relevant instances in the dataset. It is calculated as the ratio of true positives to the sum of true positives and false negatives as presented in Equation (13).

$R = \frac{TP}{TP + FN}$

(13)
F1 score: This is the harmonic mean of precision and recall, providing a balance between the two metrics as presented in Equation (14). It is particularly useful when there is an uneven class distribution.

$F_{1} = \frac{2 \cdot P \cdot R}{P + R}$

(14)
Accuracy: This measures the overall correctness of the model by considering both true positives and true negatives. It is calculated as the ratio of correctly predicted instances (TP + TN) to the total number of instances as presented in Equation (15).

$Acc = \frac{TP + TN}{TP + TN + FP + FN}$

(15)
Matthews correlation coefficient (MCC) [26]: This considers both true and false positives and negatives, making it a well-balanced measure suitable for scenarios with class imbalance. Ranging from −1 to +1, the MCC essentially represents a correlation coefficient. A value of +1 indicates a flawless prediction, 0 denotes an average random prediction, and −1 signifies an inverse prediction. The formula to calculate MCC is given in Equation (16).

$MCC = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP) (TP + FN) (TN + FP) (TN + FN)}}$

(16)
Cohen’s Kappa (k) [27]: This is a statistical metric for measuring inter-annotator agreement. It quantifies the level of agreement between two annotators in a classification problem. The definition, as presented in Equation (17), involves $p_{o}$, the empirical probability of agreement on the label assigned to any sample (the observed agreement ratio), and $p_{e}$, the expected agreement when annotators assign labels randomly. $p_{e}$ is estimated using a per-annotator empirical prior over the class labels.

$k = \frac{p_{o} - p_{e}}{1 - p_{e}}$

(17)
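Equations (12) to (17) can be evaluated directly from the four confusion-matrix counts in the binary case. The sketch below is illustrative: the function name `binary_metrics` and the example counts are assumptions, the predictions and the actual labels play the role of the two annotators for Cohen's kappa, and no guard against zero denominators is included.

```python
import math


def binary_metrics(tp, tn, fp, fn):
    """Precision, recall, F1, accuracy, MCC, and Cohen's kappa from counts."""
    n = tp + tn + fp + fn
    precision = tp / (tp + fp)                                # Eq. (12)
    recall = tp / (tp + fn)                                   # Eq. (13)
    f1 = 2 * precision * recall / (precision + recall)        # Eq. (14)
    accuracy = (tp + tn) / n                                  # Eq. (15)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))        # Eq. (16)
    p_o = accuracy                                            # observed agreement
    # Chance agreement from the predicted and actual class frequencies.
    p_e = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kappa = (p_o - p_e) / (1 - p_e)                           # Eq. (17)
    return {"precision": precision, "recall": recall, "f1": f1,
            "accuracy": accuracy, "mcc": mcc, "kappa": kappa}


# Hypothetical counts, for illustration only.
m = binary_metrics(tp=40, tn=45, fp=5, fn=10)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

For these counts the accuracy is 0.85 while kappa is 0.70, which shows how kappa discounts the agreement that would be expected by chance.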

4. Experimental Results

In this investigation, we conducted two primary experiments: the first involved training and fine-tuning the hierarchical model, while the second replicated the same procedures for the non-hierarchical model. These experiments aimed to evaluate the performance of the proposed HCNN classification and to assess whether the proposed model outperforms the existing state-of-the-art methods. The optimal hyperparameter values were obtained using grid search, as indicated in Table 2.

Figure 6 and Figure 7 provide a visual representation of the learning curve and confusion matrix associated with the HCNN model. The depicted curve exhibits a notable smoothness, indicating a balanced learning process. Notably, there is an absence of both overfitting and underfitting, crucial aspects in assessing the model’s generalization capabilities. This observation is particularly significant, as it suggests that the HCNN model effectively navigates the trade-off between complexity and simplicity, showcasing its ability to learn intricate patterns within the dataset. An absence of overfitting implies that the model does not excessively tailor itself to the training data, avoiding memorization of specific instances at the expense of broader applicability. On the other hand, the lack of underfitting signifies that the model is not oversimplified to the extent of failing to capture essential patterns within the data. The fact that the HCNN model achieves this equilibrium despite its lightweight design underscores its efficiency and competence in learning and generalizing from the given dataset.

Figure 8 presents the classification performance of the models at the first hierarchy level, which comprises two categories: “attack” and “normal”. The results in Figure 9 show that there is no notable performance improvement at the second level attributable to the hierarchical organization of the model, with both models achieving comparable scores. From these observations, it can be inferred that the impact of the hierarchical structure is not significant at the first two levels.

Further exploring the hierarchical structure of data during model development offers a primary advantage, particularly in enhancing the performance of the higher levels within the hierarchy. This advantage stems from the fact that the features learned at the initial levels of the hierarchy contribute significantly to distinguishing between classes that may be inherently challenging to separate. This is underscored by the substantial performance improvement observed at the third level hierarchy, as evidenced in Figure 10, where the $F_{1}$ score for the HCNN stands at 98%, surpassing the non-hierarchical model’s score of 93%.

The results in Table 3 reveal that the model’s performance remains robust despite the imbalanced dataset, as evidenced by the MCC and k values. Our approach involved utilizing the dataset in its natural form, showcasing the model’s effectiveness under such conditions. Both evaluation metrics consistently yielded scores above 0.99 at each hierarchical level, affirming the proposed model’s robustness and excellent performance, even in the face of an unbalanced dataset. The confusion matrix shown in Figure 7 illustrates the classification performance at each level, revealing minimal misclassifications with the model accurately identifying the majority of instances.

Discussion

The performance of the suggested model closely matches that of the state-of-the-art approach in both two-class and five-class classifications. The classification of a single instance is quick, taking only 9.4 µs, tested on an Apple MacBook Pro M1. Notably, the proposed model stands out for its lightweight design, demanding a mere 6.97 KB in memory. In the case of the third-level 11-class classification within the hierarchy, our model continues to outperform existing works in terms of both performance and computational resource demands.

Our suggested methodology was compared with current state-of-the-art methods to assess its effectiveness; the results are provided in Table 4, Table 5 and Table 6. In particular, our method was tested against prominent techniques [14,17,18]. This comparison sought to clarify the benefits and functional characteristics of our suggested approach relative to these state-of-the-art works.

The CNN architecture proposed in [14] is more intensive than the architecture proposed in this work in terms of learnable parameters. Although the performance observed in [17] is encouraging, it is worth noting that the SMOTE technique, as highlighted by researchers in [28,29], can generate synthetic examples that are very similar to existing minority class instances, leading to overfitting.

Developing a detection model tailored for BotNet attack detection within the context of SDN-enabled IoT architecture, as depicted in Figure 11, entails consideration of four primary layers: device, network, controller, and application layers [11,30]. At the device layer, sensors and actuators detect and interact with the environment. The network layer includes SDN gateways and routers responsible for data forwarding, controlled by the SDN controller. The controller layer encompasses the SDN controller, which developers program to create IoT services. Typically, the controller layer is utilized to enable high-security protocols.

Implementing lightweight machine learning models for botnet attack detection in IoT devices requires careful selection of efficient algorithms and optimization of their architecture and parameters to minimize resource usage while preserving detection accuracy. As shown in Figure 11, the proposed ML model is deployed on the device layer to analyze data streams in real time, so that instant decisions are made without relying on centralized processing. The deployment architecture, coupled with the very small size of the model (6.97 KB), leads to faster response time and improved network performance. Alternatively, the model can be deployed on the data plane, encompassing both network and device layers, as discussed in [31,32,33], allowing for attack detection on devices like gateways and routers.

5. Conclusions

By leveraging the hierarchical arrangement of the data, a model can capitalize on the insights gained at lower levels to better discern intricate patterns or subtle differences at the higher levels. This hierarchical learning process allows the model to develop a better understanding of the data, which in turn improves its ability to differentiate between closely related or inseparable classes. As the model progresses through the hierarchy, it refines its feature representations, thus empowering it to make more informed and accurate predictions, particularly at the levels where class separability is inherently challenging. This holistic approach to hierarchical modeling ultimately contributes to the overall effectiveness and robustness of the machine learning model.

This study introduces HCNN, a hierarchical model utilizing a lightweight CNN, for BotNet attack detection. With a small number of parameters, this model boasts efficient memory usage and requires minimal running time, rendering it well-suited for IoT devices. It achieved commendable results, surpassing 0.99 for both MCC and k values at each level. This indicates the robustness of the model despite the presence of an unbalanced dataset. Notably, the F1 scores exhibited remarkable performance across all levels, with a significant improvement observed at the third level featuring 11 classes, attaining an F1 score of 0.98. This underscores the hierarchical model’s efficacy in enhancing classification, especially at deeper levels, by leveraging features from the more distinguishable lower levels compared to the higher ones.
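For readers who want to reproduce the kind of MCC and k values reported in Table 3, the following is a minimal sketch that computes both from a confusion matrix. It assumes that k denotes Cohen's kappa and uses the standard multiclass generalization of MCC; the function name `mcc_and_kappa` and the example matrix are hypothetical and are not taken from the paper's experiments.

```python
import math

def mcc_and_kappa(cm):
    """Compute multiclass MCC and Cohen's kappa from a square confusion matrix.

    cm[i][j] is the number of samples of true class i predicted as class j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(n))
    true_counts = [sum(cm[i]) for i in range(n)]                       # row sums
    pred_counts = [sum(cm[i][j] for i in range(n)) for j in range(n)]  # column sums
    cross = sum(t * p for t, p in zip(true_counts, pred_counts))

    # Both metrics share the same numerator and differ only in normalization.
    numerator = correct * total - cross
    mcc_den = math.sqrt(
        (total ** 2 - sum(p * p for p in pred_counts))
        * (total ** 2 - sum(t * t for t in true_counts))
    )
    kappa_den = total ** 2 - cross
    mcc = numerator / mcc_den if mcc_den else 0.0
    kappa = numerator / kappa_den if kappa_den else 0.0
    return mcc, kappa

# Hypothetical, imbalanced two-class confusion matrix (rows: true, columns: predicted).
example_cm = [[95, 5],
              [2, 898]]
mcc, kappa = mcc_and_kappa(example_cm)
print(f"MCC = {mcc:.4f}, kappa = {kappa:.4f}")
```

Both metrics account for the class distribution, which is why they are more informative than plain accuracy on an unbalanced dataset.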

The effectiveness of the proposed model has been evaluated on a single dataset only, so the outcomes presented in this paper reflect experiments conducted with that particular dataset. To ensure its performance in diverse scenarios, it is crucial to train and test the model with a variety of datasets. Our forthcoming plans include assessing the model’s robustness by exposing it to different datasets collected under various circumstances. Furthermore, our future efforts involve testing the model under real-world conditions, especially within an SDN-orchestrated IoT environment.

Author Contributions

Conceptualization, W.G.N.; methodology, W.G.N.; validation, F.S., T.G.D., D.W.F. and H.M.M.; writing—original, W.G.N.; writing—review and editing, T.G.D., F.S., D.W.F. and H.M.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Publicly available datasets were analyzed in this study. This data can be found here: [https://research.unsw.edu.au/projects/bot-iot-dataset].

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Apostol, I.; Preda, M.; Nila, C.; Bica, I. IoT Botnet Anomaly Detection Using Unsupervised Deep Learning. Electronics 2021, 10, 1876.
2. Negera, W.G.; Schwenker, F.; Debelee, T.G.; Melaku, H.M.; Feyisa, D.W. Lightweight Model for Botnet Attack Detection in Software Defined Network-Orchestrated IoT. Appl. Sci. 2023, 13, 4699.
3. Wazzan, M.; Algazzawi, D.; Bamasaq, O.; Albeshri, A.; Cheng, L. Internet of Things Botnet Detection Approaches: Analysis and Recommendations for Future Research. Appl. Sci. 2021, 11, 5713.
4. Meidan, Y.; Bohadana, M.; Mathov, Y.; Mirsky, Y.; Shabtai, A.; Breitenbacher, D.; Elovici, Y. N-BaIoT—Network-Based Detection of IoT Botnet Attacks Using Deep Autoencoders. IEEE Pervasive Comput. 2018, 17, 12–22.
5. Sarica, A.K.; Angin, P. Explainable Security in SDN-Based IoT Networks. Sensors 2020, 20, 7326.
6. Wang, S.; Gomez, K.; Sithamparanathan, K.; Asghar, M.R.; Russello, G.; Zanna, P. Mitigating DDoS Attacks in SDN-Based IoT Networks Leveraging Secure Control and Data Plane Algorithm. Appl. Sci. 2021, 11, 929.
7. Shinan, K.; Alsubhi, K.; Alzahrani, A.; Ashraf, M.U. Machine Learning-Based Botnet Detection in Software-Defined Network: A Systematic Review. Symmetry 2021, 13, 866.
8. Wilhelm, T.; Andress, J. Sabotage. In Ninja Hacking; Elsevier: Amsterdam, The Netherlands, 2011; pp. 267–284.
9. Nguyen, T.G.; Phan, T.V.; Nguyen, B.T.; So-In, C.; Baig, Z.A.; Sanguanpong, S. SeArch: A Collaborative and Intelligent NIDS Architecture for SDN-Based Cloud IoT Networks. IEEE Access 2019, 7, 107678–107694.
10. Wei, C.; Xie, G.; Diao, Z. A lightweight deep learning framework for botnet detecting at the IoT edge. Comput. Secur. 2023, 129, 103195.
11. Negera, W.G.; Schwenker, F.; Debelee, T.G.; Melaku, H.M.; Ayano, Y.M. Review of botnet attack detection in SDN-enabled IoT Using machine learning. Sensors 2022, 22, 9837.
12. Nimbalkar, P.; Kshirsagar, D. Feature selection for intrusion detection system in Internet-of-Things (IoT). ICT Express 2021, 7, 177–181.
13. Liaqat, S.; Akhunzada, A.; Shaikh, F.S.; Giannetsos, A.; Jan, M.A. SDN orchestration to combat evolving cyber threats in Internet of Medical Things (IoMT). Comput. Commun. 2020, 160, 697–705.
14. Sinha, J.; Manollas, M. Efficient Deep CNN-BiLSTM Model for Network Intrusion Detection. In Proceedings of the 2020 3rd International Conference on Artificial Intelligence and Pattern Recognition, Xiamen, China, 26–28 June 2020.
15. Ren, K.; Yuan, S.; Zhang, C.; Shi, Y.; Huang, Z. CANET: A hierarchical CNN-Attention model for Network Intrusion Detection. Comput. Commun. 2023, 205, 170–181.
16. Dina, A.S.; Siddique, A.; Manivannan, D. A deep learning approach for intrusion detection in Internet of Things using focal loss function. Internet Things 2023, 22, 100699.
17. Xu, B.; Sun, L.; Mao, X.; Ding, R.; Liu, C. IoT Intrusion Detection System Based on Machine Learning. Electronics 2023, 12, 4289.
18. Alosaimi, S.; Almutairi, S.M. An Intrusion Detection System Using BoT-IoT. Appl. Sci. 2023, 13, 5427.
19. Koroniotis, N.; Moustafa, N.; Sitnikova, E.; Turnbull, B. Towards the development of realistic botnet dataset in the Internet of Things for network forensic analytics: Bot-IoT dataset. Future Gener. Comput. Syst. 2019, 100, 779–796.
20. Shakhovska, N.; Izonin, I.; Melnykova, N. The hierarchical classifier for covid-19 resistance evaluation. Data 2021, 6, 6.
21. Zhou, J.; Ma, C.; Long, D.; Xu, G.; Ding, N.; Zhang, H.; Xie, P.; Liu, G. Hierarchy-aware global model for hierarchical text classification. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020; pp. 1106–1117.
22. Izonin, I.; Tkachenko, R.; Mitoulis, S.A.; Faramarzi, A.; Tsmots, I.; Mashtalir, D. Machine learning for predicting energy efficiency of buildings: A small data approach. Procedia Comput. Sci. 2024, 231, 72–77.
23. Su, W.; Wang, J.; Lochovsky, F. Automatic hierarchical classification of structured deep web databases. In Proceedings of the Web Information Systems—WISE 2006: 7th International Conference on Web Information Systems Engineering, Wuhan, China, 23–26 October 2006; Proceedings 7. Springer: Berlin/Heidelberg, Germany, 2006; pp. 210–221.
24. Gao, D.; Yang, W.; Zhou, H.; Wei, Y.; Hu, Y.; Wang, H. Deep hierarchical classification for category prediction in e-commerce system. arXiv 2020, arXiv:2005.06692.
25. Fontenot, R.; Lazarus, J.; Rudick, P.; Sgambellone, A. Hierarchical Neural Networks (HNN): Using TensorFlow to build HNN. SMU Data Sci. Rev. 2022, 6, 4.
26. Chicco, D.; Jurman, G. The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation. BMC Genom. 2020, 21, 6.
27. Artstein, R.; Poesio, M. Inter-coder agreement for computational linguistics. Comput. Linguist. 2008, 34, 555–596.
28. Ramezankhani, A.; Pournik, O.; Shahrabi, J.; Azizi, F.; Hadaegh, F.; Khalili, D. The impact of oversampling with SMOTE on the performance of 3 classifiers in prediction of type 2 diabetes. Med. Decis. Mak. 2016, 36, 137–144.
29. Fernández, A.; Garcia, S.; Herrera, F.; Chawla, N.V. SMOTE for learning from imbalanced data: Progress and challenges, marking the 15-year anniversary. J. Artif. Intell. Res. 2018, 61, 863–905.
30. Li, Y.; Su, X.; Riekki, J.; Kanter, T.; Rahmani, R. A SDN-based architecture for horizontal Internet of Things services. In Proceedings of the 2016 IEEE International Conference on Communications (ICC), Kuala Lumpur, Malaysia, 22–27 May 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 1–7.
31. Javeed, D.; Gao, T.; Saeed, M.S.; Kumar, P.; Kumar, R.; Jolfaei, A. A softwarized intrusion detection system for iot-enabled smart healthcare system. ACM Trans. Internet Technol. 2023, 1–18.
32. Kumar, P.; Kumar, R.; Aljuhani, A.; Javeed, D.; Jolfaei, A.; Islam, A.N. Digital twin-driven SDN for smart grid: A deep learning integrated blockchain for cybersecurity. Sol. Energy 2023, 263, 111921.
33. Kumar, R.; Aljuhani, A.; Javeed, D.; Kumar, P.; Islam, S.; Islam, A.N. Digital twins-enabled zero touch network: A smart contract and explainable AI integrated cybersecurity framework. Future Gener. Comput. Syst. 2024, 156, 191–205.

Figure 1. The hierarchical structure of the Bot-IoT dataset.

Figure 2. Number of samples for each class in the second-level hierarchy.

Figure 3. Number of samples for each class in the third-level hierarchy.

Figure 4. Developing distinct models for each hierarchical level results in a structure lacking hierarchy. (a) Represents the first-level classification with two classes, (b) represents the second-level classification with five classes, and (c) represents the third-level classification with 11 classes.

Figure 5. Proposed hierarchical classification of botnet.

Figure 6. Training curve of HCNN accuracy across different levels: (a) initial level, (b) second level, (c) third level; (d) represents overall training loss.

Figure 7. Confusion matrices for (a) the level with 2 classes, (b) the level with 5 classes, and (c) the level with 11 classes.

Figure 8. Classification result for the first level in the hierarchy.

Figure 9. Classification result for the second level in the hierarchy.

Figure 10. Classification result for the third level in the hierarchy.

Figure 11. SDN-enabled IoT architecture and ML model deployment on the device layer.

Table 1. Base model architecture [AF = Activation Function].

Algorithm	Layers	Output Shape	Kernel	AF
CNN-GAP Backbone	Conv Layer	(26, 16)	(3, 3)	tanh
	MaxPooling	(13, 16)	(2, 2)	-
	Conv Layer	(13, 16)	(3, 3)	tanh
	GAP Layer	(1, 16)	-	-
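The backbone's parameter count can be cross-checked against the abstract, which reports 1790 parameters in total, 942 of them added by the hierarchical network. The sketch below assumes one-dimensional convolutions with kernel length 3 (the output shapes in Table 1 have the form (length, channels), although the kernel column lists (3, 3)), a single input channel, and one bias per filter; the helper `conv1d_params` is hypothetical. Pooling and global average pooling layers have no learnable parameters.

```python
def conv1d_params(kernel_size, in_channels, filters, bias=True):
    """Learnable parameters of a 1D convolution: weights plus optional biases."""
    return kernel_size * in_channels * filters + (filters if bias else 0)

# Assumed configuration: single-channel input, 16 filters per convolution.
conv1 = conv1d_params(kernel_size=3, in_channels=1, filters=16)
conv2 = conv1d_params(kernel_size=3, in_channels=16, filters=16)
backbone_params = conv1 + conv2

# Backbone size implied by the abstract: total minus the hierarchical part.
implied_backbone = 1790 - 942

print(conv1, conv2, backbone_params, implied_backbone)
```

Under these assumptions the two convolutions account for exactly the number of parameters that the abstract's figures leave for the backbone.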

Table 2. The potential configurations of hyperparameters and their corresponding optimal settings.

Hyper-Parameters	Possible Values	Optimal Values
Activation Function	ReLU, Leaky ReLU, tanh	tanh
Batch size	8, 16, 32, 64, 128	32
Initial learning rate	0.01, 0.001, 0.0001	0.001
Optimizer	SGD, RMSprop, Adam	Adam

Table 3. MCC and k result for each level in the hierarchy.

Metrics	Level-1 (2 Classes)	Level-2 (5 Classes)	Level-3 (11 Classes)
MCC	0.992	1.00	0.998
k	0.992	1.00	0.998

Table 4. Comparing the proposed model performance to the existing techniques (2 classes).

Methods	Accuracy (%)	Precision (%)	Recall (%)	F1 (%)
LSTM [18]	99.74194	99.991036	99.750848	99.8708
Decision Tree [18]	100	100	100	100
Proposed	100	100	100	100

Table 5. Comparing the proposed model performance to the existing techniques (5 classes).

Methods	Accuracy (%)	Precision (%)	Recall (%)	F1 (%)
CNN-BiLSTM [14]	99.9918	99.9896	99.9918	99.9899
Decision Tree [18]	100	-	-	-
XGBoost [17]	100	100	100	100
Proposed	100	100	99	99

Table 6. Comparing the proposed model performance to the existing techniques (11 classes).

Methods	Accuracy (%)	Precision (%)	Recall (%)	F1 (%)
Decision Tree [18]	100	-	-	-
KNN [18]	99.982	-	-	-
SVM [18]	99.967	-	-	-
Proposed	100	99	97	98

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
