Article

CBF-IDS: Addressing Class Imbalance Using CNN-BiLSTM with Focal Loss in Network Intrusion Detection System

1 College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China
2 China Mobile Group Huzhou Co., Ltd., Huzhou 313098, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2023, 13(21), 11629; https://doi.org/10.3390/app132111629
Submission received: 1 September 2023 / Revised: 17 October 2023 / Accepted: 23 October 2023 / Published: 24 October 2023

Abstract

The importance of network security has become increasingly prominent due to the rapid development of network technology. Network intrusion detection systems (NIDSs) play a crucial role in safeguarding networks from malicious attacks and intrusions. However, the issue of class imbalance in the dataset presents a significant challenge to NIDSs. In order to address this concern, this paper proposes a new NIDS called CBF-IDS, which combines convolutional neural networks (CNNs) and bidirectional long short-term memory networks (BiLSTMs) while employing the focal loss function. By utilizing CBF-IDS, spatial and temporal features can be extracted from network traffic. Moreover, during model training, CBF-IDS applies the focal loss function to give more weight to minority class samples, thereby mitigating the impact of class imbalance on model performance. In order to evaluate the effectiveness of CBF-IDS, experiments were conducted on three benchmark datasets: NSL-KDD, UNSW-NB15, and CIC-IDS2017. The experimental results demonstrate that CBF-IDS outperforms other classification models, achieving superior detection performance.

1. Introduction

The rapid advancement of networks has led to a significant improvement in the quality of human life and work efficiency. Nevertheless, the growing dependence of modern society on networks has heightened the risks of network intrusions and malicious attacks, which pose threats to privacy, property, and even personal safety. Therefore, it is crucial to adopt a reliable and efficient network intrusion detection system (NIDS). NIDSs play a critical role in detecting abnormal traffic and potential security threats, such as unauthorized access and denial-of-service (DoS) attacks, by analyzing network traffic and protocol states [1]. The prompt detection of these threats by NIDSs significantly enhances the security and stability of the network.
Intrusion detection systems (IDSs) are typically classified into two primary types: misuse-based IDS (MIDS) and anomaly-based IDS (AIDS) [2]. MIDS, also known as signature-based IDS, detects malicious activities in network traffic by comparing them with known intrusion behavior signatures. If network traffic matches a known intrusion signature, an alert is triggered to indicate the presence of an intrusion. However, MIDS requires frequent updates to its signature database and faces challenges in detecting zero-day attacks or unknown variant attacks. In contrast, AIDS utilizes statistical methods and machine learning (ML) algorithms to establish a baseline of normal network behaviors, enabling it to detect deviations from this baseline in network traffic. Through the continuous monitoring and analysis of network traffic, AIDS can detect traffic patterns that deviate from the norm and generate alerts to indicate potential intrusions. AIDS may exhibit a high false positive rate when intrusion behavior closely resembles normal behavior or when the training data for the model is insufficient. Despite this, it possesses the capability to detect both known and unknown attacks [3].
NIDSs have extensively employed traditional ML methods [4,5]. However, the effectiveness of these approaches depends heavily on manually designed features and the model’s capacity to learn relevant features from the data [6,7]. In comparison, deep learning (DL) models can exploit the inherent properties of neural networks to extract abstract and intricate features from unprocessed network traffic data. DL can overcome the limitations of traditional ML [8]. Common DL algorithms, including recurrent neural networks (RNNs), deep neural networks (DNNs), and convolutional neural networks (CNNs), have demonstrated successful application in the domain of intrusion detection [9,10,11].
However, real-world network environments often experience an imbalance between normal and attack traffic data, as well as an uneven distribution of attack categories [12]. Imbalanced datasets can lead to model bias towards the majority class, thereby affecting the performance of IDS. As a result, research efforts have focused on addressing the issue of class imbalance in IDS. Techniques for handling imbalanced datasets can be implemented at both the data level and the algorithmic level [13,14]. Data-level techniques involve sampling the dataset to balance samples from different categories, such as oversampling and undersampling. However, these techniques have drawbacks, including information loss, overfitting, and increased computational complexity [15,16]. Algorithmic-level techniques address class imbalance by adapting the training algorithm without altering the dataset itself. Therefore, in this paper, we address the issue of imbalanced datasets at the algorithmic level by utilizing focal loss.
This paper introduces CBF-IDS, an IDS that integrates convolutional neural networks (CNNs), bidirectional long short-term memory networks (BiLSTMs), and focal loss. CBF-IDS is designed to extract spatial and temporal features from network traffic data, facilitating comprehensive modeling and analysis. The integration of these components enhances the accuracy and robustness of the IDS, enabling the effective detection of potential intrusion behaviors within the network. In order to tackle the issue of class imbalance, CBF-IDS employs focal loss, which increases the weight of minority classes during the training process, allowing the model to prioritize learning from these classes. The contributions presented in this paper are outlined below:
  • We propose CBF-IDS, an IDS based on the CNN-BiLSTM architecture. By leveraging the strengths of CNN and BiLSTM, CBF-IDS effectively extracts spatial and temporal features from network traffic data, significantly enhancing its classification capability.
  • In order to tackle the challenge posed by class imbalance, we employ the focal loss function, which assigns higher weights to minority classes, thereby enhancing the model’s ability to learn from those specific classes.
  • We evaluate the performance of CBF-IDS on three datasets: NSL-KDD, UNSW-NB15, and CIC-IDS2017. The findings indicate that CBF-IDS surpasses other models in multi-classification, underscoring its capability to generalize and perform effectively in traditional network environments.
The subsequent sections of this paper are organized as follows. In Section 2, we present an overview of pertinent research on machine learning and deep learning for IDS. Section 3 presents an overview of the datasets utilized and outlines the design architecture of CBF-IDS. Section 4 presents a comprehensive analysis of the experimental results obtained from CBF-IDS. Section 5 discusses the research findings of this paper and outlines future work. Section 6 concludes the paper.

2. Related Work

NIDSs extensively employ ML and DL techniques for the analysis of network traffic data and the detection of potential security threats. While traditional ML approaches, like k-nearest neighbor (KNN) [17] and support vector machines (SVM) [18], have been widely used in intrusion detection, their performance is limited when confronted with the challenges posed by large-scale network traffic data and high-dimensional feature spaces [19,20,21]. In contrast, by utilizing neural networks, DL has proven to be effective in handling massive network traffic data and high-dimensional feature spaces, thereby improving the accuracy of IDS [22]. As a result, the research community has shown significant interest in applying DL to intrusion detection in recent years. In Table 1, we have summarized the recent research in network intrusion detection, encompassing the utilization of machine learning, deep learning, and class balancing techniques.
Khan et al. [23] proposed a CNN-based IDS in their study. Their approach involved employing three convolutional layers to effectively capture feature relationships through convolution and pooling operations. The extracted features were subsequently classified using the softmax function. The proposed method underwent evaluation on the KDD99 dataset, achieving an accuracy of 99.23% when tested on 10% of the dataset. Alsyaibani et al. [24] introduced an IDS based on BiLSTM. Their method involved exploring 24 different scenarios with various learning rates, activation functions, and optimizers. In order to address the issue of model overfitting, each scenario incorporated a dropout layer and an L2-regularizer as mitigating techniques. The evaluation was conducted on the CIC-IDS2017 dataset, and the experimental results consistently demonstrated a stable accuracy of over 95% across all scenarios. Specifically, after 100 epochs, the proposed method attained an accuracy of 97.72% for binary classification. Subsequently, the method underwent additional training for 1000 epochs, leading to the best model achieving an accuracy of 98.34%. Arief et al. [25] presented an IDS based on DNN. The DNN architecture employed in their study consisted of five hidden layers, each containing 30 neurons. The DNN model attained an accuracy of 79.26% when evaluated on the KDD99 dataset.
Padmashree et al. [26] introduced a feature selection technique that utilized a recursive feature elimination decision tree based on Pearson correlation. In this study, the model was applied to analyze the BoT-IoT dataset, leading to the identification of nine highly correlated features. Subsequently, these selected features were used in a DNN for attack detection. The experimental results demonstrated that the proposed model achieved an accuracy of 99.2%. Alzaqebah et al. [27] employed the modified GreyWolf optimization algorithm (MGWO) for feature selection and, furthermore, fine-tuned the parameters of the base classifier, the extreme learning machine (ELM), using the MGWO. The proposed approach was experimentally tested on the UNSW-NB15 dataset, achieving an accuracy of 80.93%. Alharbi et al. [28] proposed an effective method for detecting a botnet attack, known as the local-global best bat algorithm for neural networks (LGBA-NNs). This method enhances intrusion detection performance by selecting the best feature subset and hyperparameters. Their proposed method was experimentally evaluated on the N-BaIoT dataset, achieving an accuracy of 90%. Toldinas et al. [29] transformed network features into images and employed the ResNet50 model for classification. The proposed method achieved detection accuracies of 99.8%, 86%, and 67.9% for generic attacks, reconnaissance attacks, and exploit attacks, respectively, on the UNSW-NB15 dataset. Additionally, the proposed method achieved a detection accuracy of 99.7% for DDoS attacks on the BOUN DDoS dataset.
In order to address the negative impact of imbalanced datasets on the performance of IDS, Chen et al. [30] proposed an algorithm that integrates adaptive synthetic sampling (ADASYN) with random forest. ADASYN is an oversampling technique that assigns weights to minority-class samples based on their distribution density in the majority-class sample space [31]. This dynamic weighting helps determine the number of synthetic samples to generate. The researchers evaluated the proposed method on the CIC-IDS2017 dataset, and the experimental results showcased its superiority over traditional ML algorithms and the random forest algorithm using different sampling methods. The proposed method achieved an F1 score of 95.3%. However, ADASYN may introduce noise, potentially affecting the quality of the dataset. Hence, Abdelkhalek et al. [32] proposed a data resampling technique that combines ADASYN with the Tomek links algorithm. Tomek links can remove overlapping samples between adjacent classes to enhance class boundaries and reduce noise. The proposed method was evaluated by employing the NSL-KDD dataset. The experimental results clearly indicate an improvement in the detection rate of the minority class. In the context of multi-classification, the proposed method achieved an accuracy rate of 99.9%. Lee et al. [33] tackled the issue of imbalanced datasets by utilizing a generative adversarial network (GAN). They utilized GAN to augment the samples of the minority class, followed by applying the random forest algorithm to classify the new dataset. The researchers evaluated the proposed method on the CIC-IDS2017 dataset, and the experimental results revealed a substantial improvement in the performance of the random forest classifier when using the GAN-based approach. The method they proposed achieved an accuracy of 99.83%, surpassing the performance of the original dataset.
The studies mentioned above have made substantial contributions to the advancement of the NIDS field and have provided inspiration for this research. This paper addresses the challenge of imbalanced datasets at the algorithmic level. Cost-sensitive learning is one algorithmic technique that adjusts the decision boundary of the classifier by assigning varying weights to samples from different classes while maintaining the distribution of the imbalanced datasets [34,35]. This adjustment raises the cost of misclassifying minority class samples, thus enhancing classification performance. Lin et al. [36] proposed focal loss, which is based on cost-sensitive learning, as a solution to the class imbalance issue in object detection. In this study, we utilize focal loss to tackle the challenge of imbalanced datasets in IDS. In order to assess the effectiveness of CBF-IDS, we selected three widely used imbalanced datasets. These datasets cover various attack scenarios and are suitable for evaluating the performance and generalization capability of CBF-IDS in the presence of class imbalance.
Table 1. Summary of recent research in machine learning, deep learning, and class balancing techniques for network intrusion detection.
| Ref | Year | Dataset | Algorithm | Balancing | Performance |
|---|---|---|---|---|---|
| [23] | 2019 | KDD99 | CNN | ✗ * | Acc = 99.23% |
| [24] | 2021 | CIC-IDS2017 | BiLSTM | ✗ | Acc = 97.72%, F1 = 97.75% |
| [25] | 2022 | KDD99 | DNN | ✗ | Acc = 79.26% |
| [26] | 2022 | BoT-IoT | DNN | ✗ | Acc = 99.2% |
| [27] | 2022 | UNSW-NB15 | MGWO + ELM | ✗ | Acc = 80.93%, F1 = 78.08% |
| [28] | 2021 | N-BaIoT | LGBA-NN | ✗ | Acc = 90% |
| [29] | 2021 | UNSW-NB15, BOUN DDoS | ML.NET | Oversampling | Acc(generic) = 99.8%, Acc(DDoS) = 99.7% |
| [30] | 2021 | CIC-IDS2017 | Random Forest | ADASYN | F1 = 95.3% |
| [32] | 2023 | NSL-KDD | MLP, DNN, CNN, CNN-BiLSTM | ADASYN + Tomek Links | Acc = 99.9% |
| [33] | 2021 | CIC-IDS2017 | Random Forest | GAN | Acc = 99.83%, F1 = 95.04% |
| Our method | | NSL-KDD, UNSW-NB15, CIC-IDS2017 | CNN-BiLSTM | Focal loss | Acc(NSL-KDD) = 99.4%, Acc(UNSW-NB15) = 82.3%, Acc(CIC-IDS2017) = 99.53% |
* ✗ indicates that no class balancing technique was used; the same applies below.

3. Methodology

This paper aims to address the prevalent issue of class imbalance in IDS, which can potentially hinder the performance of IDS. In order to tackle this challenge, we propose an IDS that integrates a CNN, BiLSTM, and the focal loss function named CBF-IDS. The primary goal is to enhance detection performance and mitigate the adverse effects of class imbalance on model performance.
In order to assess the effectiveness of CBF-IDS, we selected three widely used datasets in intrusion detection research: NSL-KDD, UNSW-NB15, and CIC-IDS2017. We conducted a series of data preprocessing steps on these datasets, including data cleaning, one-hot encoding, normalization, two-dimensional representation, and dataset splitting. These steps are crucial for preparing the data for subsequent model training. Subsequently, we developed a hybrid model based on the CNN-BiLSTM architecture. This model aims to comprehensively extract spatiotemporal features from the datasets and perform the classification task. In the CNN, we employed three convolutional layers to enhance the model’s capability to capture spatial features in the data. Furthermore, to better capture temporal features in the data, we introduced a BiLSTM layer, which effectively considers bidirectional temporal dependencies. Additionally, to mitigate the issue of class imbalance in the datasets, we utilized the focal loss function. This loss function increases the weight of minority classes, aiding the model in better learning and focusing on these minority classes, thereby enhancing the model’s performance on imbalanced datasets. The subsequent subsections will provide a detailed explanation of the methods described above. The overall workflow of the proposed method is illustrated in Figure 1.

3.1. Data Description

3.1.1. NSL-KDD

The dataset is an enhanced version of KDD99. It addresses the issue of numerous duplicate records present in the original dataset [37]. By removing these duplicate records, the NSL-KDD dataset prevents the model from developing biases during training, thereby enhancing the accuracy and reliability of the data. NSL-KDD is imbalanced, with the minority class U2R accounting for only 0.8% of the dataset. Table 2 displays the distribution and proportions of each network traffic type in NSL-KDD.

3.1.2. UNSW-NB15

The dataset combines the normal behavior of modern network traffic with various types of attack behavior, providing researchers with a diverse and realistic collection of network-intrusion-detection datasets [38]. UNSW-NB15 encompasses a wider range of attack traffic types. This dataset is imbalanced, with the minority classes Shellcode and Worms accounting for only 0.59% and 0.07%, respectively, of the dataset. Table 3 displays the distribution and proportions of each network traffic type in UNSW-NB15.

3.1.3. CIC-IDS2017

The dataset comprises network traffic that closely resembles real-world scenarios and includes a greater variety of traffic samples from modern attacks [39]. CIC-IDS2017 is imbalanced, with the minority classes Web Attack, Bot, and Infiltration accounting for only 0.88%, 0.79%, and 0.01%, respectively, of the dataset. Due to the extensive size of the original dataset, we have extracted a representative subset of the CIC-IDS2017 dataset exclusively for our experimental analysis. Table 4 presents a detailed description of the selected subset.

3.2. Data Preprocessing

The datasets used in this study contain nominal features that are not directly compatible with CNN and BiLSTM training. Additionally, the dataset includes instances with missing and redundant data, which requires preprocessing steps to prepare the data for constructing the classification model.

3.2.1. Data Cleaning

The datasets used in this study contain redundant and incomplete data, which can potentially impact the subsequent classification results. Therefore, prioritizing data cleaning is crucial. Before performing data cleaning, it is necessary to identify instances of redundant data and missing values. Redundant data refers to instances that contain duplicate information, whereas missing values indicate instances with incomplete or undefined data. Both issues can bias the performance of the classification model. In order to mitigate them, we eliminate duplicate records and fill missing values with the median of the corresponding feature. As a robust measure of central tendency, the median accurately represents the distribution and statistical characteristics of each feature, ensuring data integrity and consistency.
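A minimal pandas sketch of this cleaning step is shown below; the toy DataFrame and its values are purely illustrative.

```python
import pandas as pd
import numpy as np

# Toy records standing in for raw dataset rows (values are illustrative).
df = pd.DataFrame({
    "src_bytes": [181.0, 181.0, np.nan, 0.0],
    "dst_bytes": [5450.0, 5450.0, 1337.0, np.nan],
})
df = df.drop_duplicates()                     # remove redundant records
df = df.fillna(df.median(numeric_only=True))  # fill missing values with the feature median
```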

3.2.2. One-Hot Encoding

NSL-KDD and UNSW-NB15 contain nominal features that are not directly compatible with classification models, such as “protocol_type”, “service”, and “flag”. Therefore, it is imperative to convert these nominal features into numerical features. One common approach to achieve this conversion is through one-hot encoding. In our study, we utilize the “get_dummies” function from the Python pandas library to facilitate the one-hot encoding for the nominal features. By using the “get_dummies” function, binary vectors are generated for each nominal feature, representing each potential feature value with a distinct binary vector.
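The snippet below sketches this encoding step; the toy rows and values are illustrative, while the column names follow the nominal features listed above.

```python
import pandas as pd

# Toy rows standing in for NSL-KDD records (values are illustrative).
df = pd.DataFrame({
    "protocol_type": ["tcp", "udp", "tcp"],
    "service": ["http", "domain_u", "ftp"],
    "flag": ["SF", "SF", "REJ"],
    "src_bytes": [181, 146, 0],
})
# get_dummies replaces each nominal column with one binary column per feature value.
encoded = pd.get_dummies(df, columns=["protocol_type", "service", "flag"])
print(encoded.columns.tolist())
# ['src_bytes', 'protocol_type_tcp', 'protocol_type_udp', 'service_domain_u', ...]
```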

3.2.3. Normalization

The datasets used in this study comprise diverse features with varying scales and units. Features with larger scales have a greater impact on the model, potentially causing it to overlook features with smaller scales. In order to address this issue, it is crucial to normalize the feature values within a defined range. In this paper, the min-max normalization method is utilized to rescale the feature values into the range of [0, 1]. The method for min-max normalization is given as follows:
$$x_n = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \qquad (1)$$
where $x$ represents the original feature value, $x_{\min}$ represents the minimum feature value, $x_{\max}$ represents the maximum feature value, and $x_n$ represents the normalized feature value.
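A minimal sketch of this normalization step, assuming scikit-learn's MinMaxScaler (which implements Equation (1)); the sample matrix is illustrative.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

X = np.array([[181.0, 5450.0],
              [146.0, 1337.0],
              [0.0, 0.0]])  # illustrative feature values
X_norm = MinMaxScaler(feature_range=(0, 1)).fit_transform(X)
# Each column is now rescaled into [0, 1] per Equation (1).
```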

3.2.4. 2D Representation

Network traffic data encompasses numerous features, often containing intricate relationships that may be difficult to capture in a one-dimensional data representation. Consequently, there arises a necessity to convert these one-dimensional features into a two-dimensional matrix. By leveraging 2D CNN, we can capture relationships and spatial dependencies that span across diverse feature combinations [40]. After completing the aforementioned data preprocessing steps, the NSL-KDD dataset consists of 121 numerical features that are reshaped into an 11 × 11 × 1 matrix. Likewise, the UNSW-NB15 dataset contains 196 numerical features that are reshaped into a 14 × 14 × 1 matrix. Lastly, the CIC-IDS2017 dataset encompasses 78 numerical features that are reshaped into a 13 × 6 × 1 matrix.
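The reshaping itself reduces to a single NumPy call per dataset; the sketch below uses random placeholder features with the dimensions stated above.

```python
import numpy as np

X = np.random.rand(1000, 121)    # placeholder for the preprocessed NSL-KDD features
X_2d = X.reshape(-1, 11, 11, 1)  # (samples, height, width, channels)
# UNSW-NB15: X.reshape(-1, 14, 14, 1);  CIC-IDS2017: X.reshape(-1, 13, 6, 1)
```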

3.2.5. Splitting Dataset

In order to evaluate the effectiveness of CBF-IDS, we split the dataset into separate training and testing sets. We utilize the “train_test_split” function from the “sklearn.model_selection” library to perform the splitting. More specifically, we divide 80% of the data for training the model and reserve 20% of the data for testing and evaluating the model.
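A minimal sketch of this split; the placeholder arrays and the random_state are assumptions added for reproducibility.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X_2d = np.random.rand(1000, 11, 11, 1)  # placeholder features
y = np.random.randint(0, 5, size=1000)  # placeholder labels (5 NSL-KDD classes)
X_train, X_test, y_train, y_test = train_test_split(
    X_2d, y, test_size=0.2, random_state=42)  # 80% training, 20% testing
```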

3.3. Model Architecture

CBF-IDS integrates CNN and BiLSTM, enabling the efficient processing of large-scale and high-dimensional network traffic. Furthermore, to tackle the significant issue of class imbalance within the dataset, the model incorporates the focal loss. The CNN component excels at extracting spatial features and capturing significant patterns in the input data. On the other hand, the BiLSTM component excels at capturing temporal dependencies, allowing the model to analyze the sequential nature of time series data.
The initial three layers of this model consist of convolutional layers, with their respective numbers of convolutional kernels set at 64, 128, and 256. These convolutional layers utilize kernels to extract salient features and generate informative feature maps. The feature maps are subjected to dimension reduction by utilizing a max-pooling layer with a size of 2 × 2, aiming to decrease the dimension of the data while preserving crucial features. Subsequently, the model adds a batch normalization (BN) layer, which greatly enhances its generalization capabilities. In order to fulfill the input requirements of the subsequent BiLSTM layer, a reshape layer is integrated after the BN step. This layer allows for adjusting the output dimensions from the preceding layer to align with the expected dimensions of the subsequent BiLSTM layer. The BiLSTM with 64 units is utilized to process reshaped time series features. In order to prevent overfitting, a dropout layer with a parameter of 0.5 is added after the BiLSTM layer. Next, a fully connected layer is added to connect all neurons from the previous layer, facilitating the integration of feature extraction. Finally, to derive the probability distribution of each data class, the softmax function is applied as the activation function. During the training process, we employed the Adam optimizer, with a learning rate set at 0.001. The loss function used was focal loss, which increased the weight of the minority class to address the issue of class imbalance.
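The Keras sketch below assembles the architecture described above. The layer sizes follow the text; the convolution padding, the Reshape target, and the width of the fully connected layer are assumptions, since the paper does not state them, and categorical cross-entropy stands in for the focal loss sketched in Section 3.4.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_cbf_model(input_shape=(11, 11, 1), num_classes=5):
    model = models.Sequential([
        layers.Conv2D(64, (2, 2), activation="relu", input_shape=input_shape),
        layers.Conv2D(128, (2, 2), activation="relu"),
        layers.Conv2D(256, (2, 2), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.BatchNormalization(),
        # Flatten the spatial grid into a sequence of 256-dimensional steps
        # so the BiLSTM can treat it as time series features.
        layers.Reshape((-1, 256)),
        layers.Bidirectional(layers.LSTM(64)),
        layers.Dropout(0.5),
        layers.Dense(128, activation="relu"),  # fully connected layer (width assumed)
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
        loss="categorical_crossentropy",  # the paper substitutes focal loss (Section 3.4)
        metrics=["accuracy"],
    )
    return model
```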

3.3.1. Convolutional Neural Network

The proposed model synergistically combines essential components from the CNN architecture, including three convolutional layers, a pooling layer, and a BN layer. These components work together to capture meaningful patterns and extract significant features from the input data. Figure 2 depicts a representative CNN structure. The initial three layers of the model consist of convolutional layers, with 64, 128, and 256 respective convolutional kernels. Each kernel possesses a size of 2 × 2, and a rectified linear unit (ReLU) is employed as the activation function. Convolutional layers play a crucial role in this model by extracting features through kernels. The kernel can be regarded as a moving matrix that traverses the input feature matrix both horizontally and vertically [41]. During this movement, dot product calculations are performed between the input feature matrix and the corresponding kernel weights. This process continues until the kernel’s movement stops, resulting in a new output feature matrix. Figure 3 provides a depiction of the convolution operation between the input features and the kernel. The presence of three convolutional layers enhances the model’s hierarchical feature extraction capability, enabling it to extract more abstract and higher-level feature representations. Furthermore, the varying numbers of convolutional kernels aid the model in capturing features across multiple scales, thereby further improving the model’s feature representation capacity [42].
After the convolutional layers, we performed downsampling on the input feature maps using a 2 × 2 pooling layer. This process reduces the spatial dimension and computational complexity of the features [43]. Moreover, this downsampling process helps reduce the number of network parameters, which mitigates the risk of overfitting [44]. Max pooling and average pooling are two widely adopted methods. Max pooling divides the input feature matrix into distinct regions and retains the maximum value from each region. In contrast, average pooling calculates the mean value within each region. In this study, we choose to use the max pooling method because it can emphasize outstanding features. Figure 4 presents the pooling operation.
Following three convolutional layers and a pooling layer, the distribution changes in the input data pose challenges such as internal covariate shift, gradient vanishing, and reduced training efficiency. In order to overcome these challenges, BN is employed during the training process to adjust the mean and variance of each data batch [44]. A reshape layer is introduced to meet the input requirements of the subsequent BiLSTM layer by reshaping the output feature tensor. Through the integration of these fundamental components of CNN, the proposed model achieves efficient feature extraction, dimension reduction, normalization, and tensor adjustment. This integration significantly enhances the model’s performance and generalization capability.

3.3.2. Bidirectional Long Short-Term Memory

After extracting spatial features through the CNN component, the model subsequently shifts its focus towards extracting temporal features. RNNs are commonly used to process time series features. However, they face challenges such as gradient vanishing or exploding when dealing with long sequences. These challenges result in a limited memory span and the loss of early input information [45]. In order to address these challenges, long short-term memory (LSTM) is specifically designed to handle long-term dependencies in time series data through three gates and a cell state [46]. Figure 5 illustrates the structure of LSTM.
The input gate selectively incorporates information into the cell state. The forget gate selectively determines the retention or loss of information by considering both the current input and the hidden state from the previous time step. Furthermore, the output gate selectively determines which parts of the cell state and hidden state should be passed on to the next time step. In the training phase, LSTM modifies the cell state by selectively adding or removing information by using these gate structures. This process is mathematically described by Equations (2)–(7).
$$i_t = \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right) \qquad (2)$$
$$f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right) \qquad (3)$$
$$\tilde{C}_t = \tanh\left(W_C \cdot [h_{t-1}, x_t] + b_C\right) \qquad (4)$$
$$C_t = f_t * C_{t-1} + i_t * \tilde{C}_t \qquad (5)$$
$$o_t = \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right) \qquad (6)$$
$$h_t = o_t * \tanh(C_t) \qquad (7)$$
where $i_t$ denotes the input gate, $f_t$ represents the forget gate, and $o_t$ signifies the output gate. $\sigma$ denotes the sigmoid activation function, and $\tanh$ represents the hyperbolic tangent activation function. $\tilde{C}_t$ is the candidate cell state, and $C_t$ is the cell state. $h_{t-1}$ represents the hidden state at the previous time step, $h_t$ signifies the current hidden state, and $x_t$ denotes the current time step's input. $W$ represents the weights, and $b$ signifies the bias parameters. "$*$" denotes element-wise multiplication, equivalent to $\otimes$ in Figure 5, and "$+$" signifies element-wise addition, equivalent to $\oplus$ in Figure 5.
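As a worked illustration, the following NumPy sketch transcribes Equations (2)–(7) as a single LSTM step; the weight layout and toy dimensions are assumptions for demonstration only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, W, b):
    z = np.concatenate([h_prev, x_t])       # [h_{t-1}, x_t]
    i_t = sigmoid(W["i"] @ z + b["i"])      # input gate, Eq. (2)
    f_t = sigmoid(W["f"] @ z + b["f"])      # forget gate, Eq. (3)
    C_tilde = np.tanh(W["C"] @ z + b["C"])  # candidate cell state, Eq. (4)
    C_t = f_t * C_prev + i_t * C_tilde      # cell state update, Eq. (5)
    o_t = sigmoid(W["o"] @ z + b["o"])      # output gate, Eq. (6)
    h_t = o_t * np.tanh(C_t)                # hidden state, Eq. (7)
    return h_t, C_t

# Toy usage: input dimension 4, hidden dimension 3.
rng = np.random.default_rng(0)
W = {k: rng.standard_normal((3, 3 + 4)) for k in "ifCo"}
b = {k: np.zeros(3) for k in "ifCo"}
h_t, C_t = lstm_step(rng.standard_normal(4), np.zeros(3), np.zeros(3), W, b)
```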
However, LSTM can only capture unidirectional dependencies in time series data. In the field of network intrusion detection, temporal dependencies hold significant importance. This is because the features of network activities and attack behaviors evolve over time, leading to the manifestation of temporal dependencies in network traffic data. Additionally, attackers employ various strategies to evade detection, resulting in diverse patterns and temporal distributions in terms of their attack behavior. For instance, some attacks may occur during nonworking hours, or attackers might intentionally carry out attacks during holidays when network traffic is typically more congested, making it easier to conceal their malicious activities [47,48]. Therefore, network intrusion detection needs to consider the temporal dependencies before and after attack events to gain a comprehensive understanding of intrusion behavior patterns.
Nevertheless, traditional models and methods often consider only unidirectional temporal dependencies, making it challenging to effectively capture diverse temporal patterns and intrusion behaviors. Hence, in this study, we employed BiLSTM to extract temporal features from network traffic data. BiLSTM possesses the capability to consider both forward and backward dependencies in the data, effectively capturing patterns and behaviors in different time periods and directions, thus enhancing the detection capacity for various intrusion behaviors [49]. Furthermore, through integration with CNN, BiLSTM becomes a powerful hybrid model capable of comprehensively extracting spatiotemporal features from network traffic, thereby enhancing the model’s capability to handle complex intrusion behaviors.
BiLSTM consists of two LSTM units, where one operates in the forward direction to process the input sequence, whereas the other operates in the reverse direction. As depicted in Figure 6, the forward LSTM analyzes the input sequence from the beginning to the end, whereas the backward LSTM operates in the reverse order. At each time step, the hidden states of the two LSTMs are concatenated to form the final hidden state sequence, which is then passed to the next layer. This process is mathematically described by Equations (8)–(10).
$$\overrightarrow{h}_t = \overrightarrow{LSTM}\left(x_t, \overrightarrow{h}_{t-1}, \overrightarrow{C}_{t-1}\right) \qquad (8)$$
$$\overleftarrow{h}_t = \overleftarrow{LSTM}\left(x_t, \overleftarrow{h}_{t+1}, \overleftarrow{C}_{t+1}\right) \qquad (9)$$
$$h_t = \left[\overrightarrow{h}_t; \overleftarrow{h}_t\right] \qquad (10)$$
where $\overrightarrow{h}_t$ represents the hidden state of the forward LSTM at time $t$, $\overrightarrow{LSTM}$ is the LSTM function for processing the input sequence in the forward direction, and $\overrightarrow{C}_{t-1}$ is the cell state of the forward LSTM at time $t-1$. Similarly, $\overleftarrow{h}_t$ represents the hidden state of the backward LSTM at time $t$, $\overleftarrow{LSTM}$ is the LSTM function for processing the input sequence in the backward direction, and $\overleftarrow{C}_{t+1}$ is the cell state of the backward LSTM at time $t+1$. $x_t$ represents the input at time $t$, and ";" denotes vector concatenation.
In order to effectively capture bidirectional temporal dependencies within network traffic sequences, we introduced a BiLSTM layer comprising 64 units. In order to enhance the model’s generalization performance, we incorporated a dropout layer after the BiLSTM layer, with a dropout rate set to 0.5. The dropout technique randomly selects and discards a portion of neuron outputs with a specified probability, setting the discarded outputs to zero [50]. This dropout operation mitigates the risk of overfitting by reducing interdependencies between neurons in the neural network. The output from the dropout layer is then fed into a fully connected layer, where each neuron in the preceding layer is linked to each neuron in the current layer. This fully connected layer further enhances the model’s feature representation [51]. Finally, by employing the softmax function, the model generates probability distributions for each category, thereby facilitating the accurate classification of different traffic types. The CNN-BiLSTM architecture described above, along with its key parameters, is summarized in Table 5.

3.4. Focal Loss Function

Despite the CNN-BiLSTM architecture’s outstanding capability in extracting spatiotemporal features, the common issue of class imbalance in network traffic data might still potentially have a detrimental impact on model performance. Therefore, to effectively mitigate this challenge, we introduce the focal loss function [36]. Focal loss is an improvement upon the cross entropy (CE) loss. During the training phase, the model computes the loss by comparing the predicted probability distribution with the actual probability distribution. By using the loss value, the model employs the backpropagation algorithm to update the parameters of each layer in the network, thereby minimizing the loss and enhancing prediction accuracy. The CE loss function is shown in Equation (11).
$$CE(p, y) = \begin{cases} -\log(p) & \text{if } y = 1 \\ -\log(1 - p) & \text{otherwise} \end{cases} \qquad (11)$$
where $y \in \{\pm 1\}$ denotes the ground truth, and $p \in [0, 1]$ represents the predicted probability of the model for the class $y = 1$.
In order to simplify Equation (11), we define $p_t$ in terms of $p$ as follows:
$$p_t = \begin{cases} p & \text{if } y = 1 \\ 1 - p & \text{otherwise} \end{cases} \qquad (12)$$
where $p_t$ represents the predicted probability of the true class. By substituting Equation (12), we can simplify Equation (11) to obtain Equation (13):
$$CE(p, y) = CE(p_t) = -\log(p_t) \qquad (13)$$
When handling imbalanced datasets with the CE loss function, models have a tendency to prioritize training samples from the majority class, leading to lower performance on minority classes. Therefore, it becomes necessary to introduce a weighting factor $\alpha \in [0, 1]$ to adjust the weights of positive and negative samples. When $y = 1$, the weight factor is $\alpha$, and when $y = -1$, the weight factor is $1 - \alpha$. Equation (15) represents the $\alpha$-balanced CE loss function:
$$\alpha_t = \begin{cases} \alpha & \text{if } y = 1 \\ 1 - \alpha & \text{otherwise} \end{cases} \qquad (14)$$
$$CE(p_t) = -\alpha_t \log(p_t) \qquad (15)$$
where $\alpha_t$ denotes the class weight for the sample.
The utilization of a weighting factor helps alleviate the issue of imbalanced distribution between positive and negative samples. However, it does not mitigate the impact of easy samples on the model’s performance. Despite assigning a lower weighting factor to easy samples, their loss still contributes significantly to the overall loss when their numbers are high. In order to address this issue, focal loss introduces a modulating factor that diminishes the influence of easy samples in the overall loss function, allowing the model to focus more on hard samples. Focal loss is shown in Equation (16).
$$FL(p_t) = -\alpha_t \left(1 - p_t\right)^{\gamma} \log(p_t) \qquad (16)$$
$(1 - p_t)^{\gamma}$ is the modulating factor, where $\gamma$ represents the focusing parameter. As $p_t$ approaches 1, it indicates that the sample is easy to classify. Consequently, the modulating factor tends towards 0, diminishing the impact of easy samples on the overall loss. As the value of $\gamma$ increases, the loss contributed by easy samples decreases. By integrating the weighting factor $\alpha$ and the modulating factor $(1 - p_t)^{\gamma}$, the model can effectively tackle the challenge of imbalanced distribution between positive and negative samples while alleviating the impact of easy samples on model performance.
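The following Keras-compatible sketch implements a multi-class form of Equation (16). The scalar α (in place of a per-class $\alpha_t$ vector) is a simplifying assumption, and the defaults α = 0.25, γ = 2 are the illustrative values from the original focal loss paper [36], not necessarily our tuned settings.

```python
import tensorflow as tf

def focal_loss(alpha=0.25, gamma=2.0):
    """Multi-class focal loss per Equation (16); expects one-hot y_true and softmax y_pred."""
    def loss_fn(y_true, y_pred):
        y_true = tf.cast(y_true, y_pred.dtype)
        y_pred = tf.clip_by_value(y_pred, 1e-7, 1.0 - 1e-7)  # avoid log(0)
        p_t = tf.reduce_sum(y_true * y_pred, axis=-1)        # probability of the true class
        # FL(p_t) = -alpha * (1 - p_t)^gamma * log(p_t)
        return -alpha * tf.pow(1.0 - p_t, gamma) * tf.math.log(p_t)
    return loss_fn

# Usage: model.compile(optimizer="adam", loss=focal_loss(), metrics=["accuracy"])
```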
Focal loss demonstrates unique advantages when addressing class imbalance issues without the need for introducing additional data operations. It simply involves adjusting the weighting factor and modulating factor within the loss function, thus avoiding potential noise or information loss introduced by additional data operations. Furthermore, it maintains relatively low computational costs. In contrast, oversampling and undersampling typically require the addition or removal of a significant number of data samples, which can lead to alterations in data distribution, the loss of crucial information, and a substantial increase in computational overhead [15,29].
When compared to other cost-sensitive learning methods, focal loss does not necessitate the introduction of an extra cost matrix or a multitude of redundant parameters. This simplifies the model’s training and tuning process. Some cost-sensitive learning techniques, such as cost-sensitive support vector machines (SVMs), often require researchers to predefine a cost matrix before training the model. This matrix includes cost weights between different classes, demanding prior knowledge of different classes and introducing numerous extra parameters for managing the cost matrix and weights [52,53].
The proposed method integrates the aforementioned components, CNN, BiLSTM, and the focal loss function, to construct a comprehensive network intrusion detection model. Within this model, the CNN-BiLSTM structure comprehensively captures the features of network traffic data. Specifically, the CNN is responsible for extracting spatial features, with its primary objective being the capture of local patterns and crucial features within the data. On the other hand, BiLSTM focuses on modeling temporal features, effectively capturing the data’s temporal dependencies. The focal loss function plays a pivotal role in the model, addressing class imbalance issues by adjusting the weighting factor and modulating factor within the loss function, all without introducing additional data operations or a cost matrix. This integration endows the model with a more powerful feature representation capability, effectively addressing the challenges posed by imbalanced classes. As a result, the model can more accurately detect network intrusion behavior, enhancing the performance of network intrusion detection.

3.5. Evaluation Metrics

In order to evaluate the performance of the proposed model, we employ several commonly used evaluation metrics, including accuracy, recall, precision, and F1 score. Accuracy is a reliable metric when dealing with balanced datasets. However, in this study, imbalanced datasets were used. Imbalanced datasets consist of a significantly larger number of samples in the majority class, often resulting in model bias towards the majority class while neglecting the minority class. Therefore, accuracy should be regarded as a supplementary metric to evaluate model performance rather than the sole metric. When handling imbalanced datasets, the F1 score is widely employed as an evaluation metric as it incorporates both precision and recall, offering a more comprehensive assessment of model performance [54]. The evaluation metrics were computed using the values in the confusion matrix, which is presented in Table 6.
Accuracy is the proportion of samples correctly identified out of all the samples. The calculation equation is as follows:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (17)$$
Recall, also called detection rate, refers to the ratio of the number of correctly identified samples belonging to a specific class to the total number of samples in that class. Recall is used to evaluate the ability of an IDS to identify attacks of a specific class. The calculation equation is as follows:
$$\text{Recall} = \frac{TP}{TP + FN} \qquad (18)$$
Precision is the proportion of samples identified as a specific class that are correctly identified, relative to the total number of samples identified as that class. The calculation equation is as follows:
$$\text{Precision} = \frac{TP}{TP + FP} \qquad (19)$$
The F1 score represents the harmonic mean of precision and recall. It effectively assesses the balance between precision and recall in the classifier. The calculation equation is as follows:
$$\text{F1 score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \qquad (20)$$
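In practice, these four metrics can be computed directly from predictions; a sketch with scikit-learn follows, where the toy labels and the macro averaging over classes are assumptions.

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

y_true = [0, 0, 1, 1, 2, 2]  # illustrative ground-truth class labels
y_pred = [0, 1, 1, 1, 2, 0]  # illustrative model predictions

acc = accuracy_score(y_true, y_pred)                   # Equation (17)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0)  # Equations (18)-(20)
```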

4. Experimental Results and Analysis

The experimental platform used in this study consisted of 32 GB of memory, an AMD Ryzen 7 5800H @ 3.2 GHz CPU, an NVIDIA GeForce RTX 3070 (8 GB) Laptop GPU, and Windows 10. The proposed model was implemented using the TensorFlow framework, accelerated by the GPU, together with the DL library Keras. After hyperparameter optimization, the method proposed in this paper employs the hyperparameters shown in Table 7.
We conducted an evaluation of the proposed CBF-IDS using three imbalanced datasets: NSL-KDD, UNSW-NB15, and CIC-IDS2017. In order to assess the effectiveness of CBF-IDS, we compared it with several algorithms. Additionally, we measured the performance of various CNN and BiLSTM structures and their hybrid models in multi-classification. We also evaluated the performance of the focal loss function on minority class samples and compared it with other loss function methods, including CBC-IDS (using the CE loss function), CBA-IDS (using the α-balanced CE loss function), and the proposed CBF-IDS (using the focal loss function). As highlighted in Section 3.5, these datasets are imbalanced, and when evaluating model performance, it is necessary to consider metrics such as recall, precision, and F1 score in addition to accuracy. The F1 score, in particular, is a crucial metric for evaluating the effectiveness of IDS.

4.1. NSL-KDD Results and Analysis

We conducted a comprehensive evaluation and comparison of CBF-IDS with other algorithms on the NSL-KDD dataset. The experimental results indicate that CBF-IDS achieved an F1 score of 99.40%. For a detailed overview of the multi-classification performance of CBF-IDS and other algorithms on the NSL-KDD dataset, Table 8 is presented, highlighting the best metrics in bold. From Table 8, it can be observed that CBF-IDS achieves slightly lower accuracy compared to the work presented in [32]. This difference arises because ref. [32] addressed class imbalance in the NSL-KDD dataset by combining ADASYN with the Tomek links technique. Although ADASYN-Tomek links is an effective method for handling class imbalance, it does come with certain challenges. This method may be affected by the curse of dimensionality, particularly when dealing with large-scale and high-dimensional data. Additionally, both ADASYN and Tomek links sampling techniques require the analysis and computation of each sample to determine which samples need synthetic generation. This can potentially result in longer computation times and increased memory requirements. In contrast, CBF-IDS employs the focal loss function to address class imbalance without the need for additional data operations. It achieves this by adjusting the weighting factor and modulating factor in the loss function.
In Table 8, CBC-IDS, CBA-IDS, and CBF-IDS represent models with the same CNN-BiLSTM structure but with different loss functions integrated within them. For multi-classification on NSL-KDD, CBF-IDS, using the focal loss function, outperforms the other loss functions. Table 9 displays a performance comparison of various model structures for multi-classification on NSL-KDD, highlighting the best metrics in bold. We observed that as the number of convolutional layers increased, CNN models showed improved performance in multi-classification. Furthermore, integrating BiLSTM improved multi-classification performance when compared to pure CNN models with the same number of convolutional layers.
In order to assess the impact of focal loss on the model’s performance in handling the minority class, we conducted an evaluation of the CNN-BiLSTM model’s performance in each class. Table 10 presents the precision, recall, and F1 scores of the model for different classes using three loss functions: CE, α-balanced CE, and focal loss. The best F1 scores are highlighted in bold. The results in Table 10 show that CBF-IDS, when using focal loss, achieves the highest F1 score for all network traffic types. As depicted in Figure 7, CBF-IDS attains a precision of 77.78%, a recall of 58.33%, and an F1 score of 66.67% for the minority class U2R. CBF-IDS outperforms other models in terms of detection performance metrics for the minority class U2R. When compared to CBC-IDS, CBF-IDS demonstrates improvements of 16.66%, 133.32%, and 83.36% in precision, recall, and F1 score, respectively. Similarly, when compared to CBA-IDS, CBF-IDS exhibits improvements of 16.66%, 39.98%, and 30.01% in precision, recall, and F1 score, respectively.

4.2. UNSW-NB15 Results and Analysis

We conducted a comprehensive evaluation of CBF-IDS’s detection capabilities against novel attacks on the UNSW-NB15 dataset, comparing it with other algorithms. The experimental findings demonstrate that CBF-IDS surpasses other algorithms in all metrics, achieving an F1 score of 79.61%. Table 11 presents a detailed overview of the multi-classification performance of CBF-IDS and other algorithms on the UNSW-NB15 dataset, highlighting the best metrics in bold.
Table 11 presents the performance of CBC-IDS, CBA-IDS, and CBF-IDS on UNSW-NB15. These models share the same CNN-BiLSTM structure but are integrated with different loss functions. It can be observed that CBF-IDS, which employs the focal loss function, outperforms the other loss functions. In Table 12, a performance comparison of different model structures in multi-classification on UNSW-NB15 is provided. The best metrics are highlighted in bold. With an increase in the number of convolutional layers, CNN models exhibit improved performance in multi-classification. Additionally, hybrid models integrating BiLSTM outperform pure CNN models with the same number of convolutional layers in multi-classification.
The performance of the CNN-BiLSTM model using three different loss functions (CE, α-balanced CE, and focal loss) on different classes is presented in Table 13. The best F1 scores are highlighted in bold. These results demonstrate that CBF-IDS, when using focal loss, achieves the highest F1 score across all network traffic types.
Figure 8a illustrates that CBF-IDS attains a precision of 59.10%, a recall of 65.56%, and an F1 score of 62.17% on the minority class Shellcode. When compared to CBC-IDS, CBF-IDS exhibits significant improvements in precision, recall, and F1 score, with increases of 1.76%, 72.16%, and 35.15%, respectively. Furthermore, despite having lower precision than CBA-IDS, CBF-IDS shows significant improvements in recall and F1 score for Shellcode, with increases of 69.23% and 28.32%, respectively.
Figure 8b illustrates that CBF-IDS attains a precision of 56.00%, a recall of 40.00%, and an F1 score of 46.67% on the minority class Worms. In Figure 8b, it can be observed that while CBF-IDS shows relatively lower precision on the minority class Worms, it achieves excellent performance in terms of recall and F1 score. When compared to CBC-IDS, CBF-IDS achieves significant improvements of 179.92% and 91.35% in recall and F1 score, respectively. Additionally, in comparison to CBA-IDS, CBF-IDS exhibits noteworthy improvements of 100% and 43.34% in recall and F1 score, respectively.

4.3. CIC-IDS2017 Results and Analysis

We conducted a comprehensive evaluation and comparison of CBF-IDS with other algorithms on the CIC-IDS2017 dataset. The experimental results clearly show that CBF-IDS surpasses other algorithms in all performance metrics. Particularly, CBF-IDS achieves an exceptional F1 score of 99.53% on the CIC-IDS2017 dataset. Table 14 presents a comprehensive overview of the multi-classification performance of CBF-IDS and other algorithms on the CIC-IDS2017 dataset, highlighting the best metrics in bold.
Table 14 illustrates that CBF-IDS with focal loss outperforms other loss functions in multi-classification performance on CIC-IDS2017. We also tested models based on different structures that use focal loss for multi-classification performance on the CIC-IDS2017, as shown in Table 15. The best metrics are highlighted in bold. From Table 15, it is evident that increasing the number of convolutional layers enhances model performance. Furthermore, we observe that, under the same convolutional structure, the CNN-BiLSTM hybrid model is superior to pure CNN models.
We conducted a comprehensive evaluation and comparison of the performance of CBF-IDS on various traffic types for the CIC-IDS2017 dataset. Table 16 displays the precision, recall, and F1 scores of the CNN-BiLSTM model utilizing three distinct loss functions: CE, α-balanced CE, and focal loss. The best F1 scores are highlighted in bold. CBF-IDS exhibits outstanding performance when assessed across all traffic types. Except for Patator, CBF-IDS attains the highest F1 scores in all remaining categories.
Figure 9a shows that CBF-IDS achieves superior performance metrics for the minority class Bot in terms of precision, recall, and F1 score, reaching 91.97%, 96.18%, and 94.03%, respectively. Notably, CBF-IDS outperforms CBC-IDS in all aspects of minority-class Bot detection. Specifically, CBF-IDS demonstrates improvements of 7.03%, 31.70%, and 19.10% for precision, recall, and F1 score, respectively, compared to CBC-IDS. Moreover, as observed from Figure 9a, despite its precision being relatively lower than that of CBA-IDS, CBF-IDS exhibits improvements of 7.99% in recall and 2.90% in F1 score for Bot.
Figure 9b shows that CBF-IDS achieves superior performance metrics for the minority class Infiltration, with precision, recall, and F1 score reaching 100.00%, 85.71%, and 92.31%, respectively. It can also be observed from Figure 9b that the CNN-BiLSTM models using the three different loss functions all achieve a precision of 100% for Infiltration. This can be attributed to the limited amount of Infiltration traffic and its similarity to normal traffic, which allows it to evade IDS [55,56]: the models flag Infiltration only under strong evidence, so some instances of Infiltration are incorrectly classified as normal traffic, lowering recall while keeping precision at 100%. CBF-IDS exhibits outstanding performance in terms of recall and F1 score, achieving remarkable improvements of 200.00% and 107.72% compared to CBC-IDS, and improvements of 19.99% and 10.78% compared to CBA-IDS, respectively.
Figure 9c illustrates the clear advantage of CBF-IDS in terms of Web Attack detection performance metrics, with precision, recall, and F1 score attaining 98.61%, 97.71%, and 98.16%, respectively. Compared to CBC-IDS, CBF-IDS achieves significant improvements of 44.17%, 8.12%, and 26.06% for precision, recall, and F1 score, respectively. Similarly, compared to CBA-IDS, CBF-IDS exhibits improvements of 35.12%, 2.41%, and 18.69% for precision, recall, and F1 score, respectively.

4.4. Runtime Results and Analysis

In the same experimental environment, we conducted a comparison and analysis of the runtime of CBF-IDS with other algorithms. In order to obtain reliable runtime results and analysis, we performed multiple experiments and recorded the training and testing times for each epoch. The final results are based on the averages from these multiple experiments. Table 17 displays the training and testing times for each epoch of CBF-IDS and other algorithms for the NSL-KDD dataset. From the data in Table 17, it is evident that CBF-IDS has the longest training time for a single epoch, which is 71.14 s, and a testing time of 6.52 s. Ref. [24] follows closely with a training time of 69.22 s for one epoch and a testing time of 7.48 s. In contrast, refs. [23,25] and CNN require less training and testing time. Table 18 shows the training and testing times for each epoch of CBF-IDS and other algorithms on the UNSW-NB15 dataset. Through Table 18, we can observe that ref. [24] has the longest training time for one epoch, reaching 163.52 s, with a testing time of 17.14 s. Next is CBF-IDS, with a training time of 127.65 s for one epoch and a testing time of 11.28 s. In comparison, refs. [23,25] and CNN require less training and testing time. Table 19 presents the training and testing times for each epoch of CBF-IDS and other algorithms for the CIC-IDS2017 dataset. From Table 19, it can be seen that CBF-IDS has the longest training and testing times, with a training time of 118.18 s for one epoch and a testing time of 10.24 s. Next is ref. [24], with a training time of 91.00 s for one epoch and a testing time of 8.98 s. In contrast, refs. [23,25] and CNN require relatively shorter training and testing times.
From the results presented in the above tables, we can observe that CBF-IDS and ref. [24] require more runtime. Primarily, this is because CBF-IDS employs a hybrid model, including a BiLSTM layer aimed at capturing temporal features more effectively. While the authors of ref. [24] only used a BiLSTM model, they employed two layers of BiLSTM, leading to more significant time consumption. The BiLSTM model in the table has the same structure and parameter settings as the BiLSTM layer in CBF-IDS, with only one layer of BiLSTM; its time consumption ranks third in the list. As described in Section 3.3.2, BiLSTM consists of forward and backward LSTM structures, typically involving a large number of parameters. Some research results suggest that although BiLSTM models require more training time, they can provide more accurate predictive performance [57,58]. Ref. [23] utilizes the CNN model, whereas ref. [25] employs the DNN model. Both refs. [23,25] and the CNN model in the table exhibit relatively low time consumption. While our proposed method incurs higher time consumption, this investment yields an improvement in model performance.

5. Discussion

The evaluation results of CBF-IDS with different datasets confirm its excellent performance as an IDS. Our experimental results demonstrate that CBF-IDS excels in a range of key performance metrics, including accuracy, precision, recall, and F1 score, outperforming other models. These findings underscore its effectiveness in accurately identifying and classifying network intrusions, providing a promising solution for enhancing network security defenses.
Our research investigation into focal loss is particularly intriguing. The experimental results of CBF-IDS, when using diverse datasets, provide validation for the efficacy of focal loss in improving model performance. Our research findings consistently demonstrate that CBF-IDS achieves the highest recall and F1 score for minority classes across three datasets when employing focal loss. When compared to models using traditional CE or α -balanced CE, this significant performance improvement affirms the efficacy of focal loss in tackling the challenge of class imbalance. This finding holds particular significance in the field of intrusion detection, where accurately detecting rare and subtle attack patterns is crucial for effective threat mitigation.
The three datasets used in this paper, NSL-KDD, UNSW-NB15, and CIC-IDS2017, are widely utilized in traditional intrusion detection research and have gained recognition for their validation and effectiveness in traditional network environments [24,27,32,59]. However, some studies have attempted to employ these datasets for intrusion detection research in the context of the Internet of Things (IoT) to assess the performance of IDS for IoT. It is important to note that IoT environments differ from traditional network environments, and these differences may result in IDSs trained on traditional network intrusion detection datasets being less effective at detecting actual intrusion behaviors in IoT environments [60]. In order to more accurately evaluate the performance of CBF-IDS in IoT environments and enhance its generalization capability across diverse network environments, we plan to utilize datasets specifically designed for IoT environments in our future work to train the model.
The longer runtime of CBF-IDS can be attributed to its hybrid architecture, which demands more computational resources and time in exchange for better detection performance. Notably, recent research has successfully employed hybrid models for network intrusion detection [61,62,63]. Al and Dener [61] proposed an IDS combining CNN and LSTM and validated its effectiveness in a big-data environment. Altunay and Albayrak [62] developed a CNN-LSTM-based IDS for securing the industrial IoT (IIoT). Khan [63] introduced a hybrid convolutional recurrent neural network (CNN-RNN) NIDS and demonstrated its effectiveness for network intrusion detection. These studies show that hybrid models are a viable and effective approach to intrusion detection challenges. As a CNN-BiLSTM-based hybrid model, CBF-IDS excels at handling large-scale, high-dimensional datasets despite its relatively high complexity. In network environments with ample computational capability and hardware resources, such as IIoT [62], CBF-IDS can fully leverage its advantages: its complexity is no longer a limiting factor but an asset that enables the effective handling of complex intrusion behaviors.
However, not all network environments possess the computational capability and hardware resources needed to deploy CBF-IDS; in some scenarios, its complexity becomes a limitation. Training and testing CBF-IDS require considerable computational resources and time, which is unproblematic where computational capacity is plentiful, but in scenarios with limited resources and a need for rapid response to intrusion threats, prolonged training and testing can be harmful, so deployment must be considered carefully. Resource-constrained environments pose a particular challenge, especially IoT devices such as small sensor nodes and smart wearables. These devices are typically compact and structurally simple, unsuitable for hosting high-performance computing and storage hardware, and consequently have limited computational resources and memory [64]. The complexity of CBF-IDS may exceed the performance limits of such devices, making effective deployment difficult.
Therefore, future work will focus on addressing the complexity of CBF-IDS to improve its efficiency and meet the requirements of diverse network environments. Feature engineering is a promising direction, and recent research has used it successfully to improve IDS performance and efficiency. Padmashree et al. [26] applied a recursive-feature-elimination decision tree based on Pearson correlation to eliminate irrelevant features, improving resource utilization and reducing IDS complexity. Alzaqebah et al. [27] used the MGWO to select an optimal feature set, excluding irrelevant and noisy features and thereby enhancing IDS efficiency and performance. In addition, Alharbi et al. [28] proposed the LGBA-NN to determine feature subsets and hyperparameters, effectively improving the detection of botnet attacks. These findings demonstrate that feature engineering is an effective way to improve IDS detection efficiency; in future work, we will draw on and extend these methods to reduce the computational and time costs of CBF-IDS, as sketched below.
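As a concrete illustration of this future direction, the following sketch combines a Pearson-correlation filter with decision-tree-based recursive feature elimination, in the spirit of ref. [26]; the DataFrame input, correlation threshold, and retained feature count are placeholder assumptions, not a reported pipeline.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import RFE
from sklearn.tree import DecisionTreeClassifier

def select_features(X: pd.DataFrame, y, corr_threshold=0.9, n_features=20):
    """Drop one feature of each highly Pearson-correlated pair, then keep
    the n_features ranked best by decision-tree RFE."""
    corr = X.corr().abs()  # pandas computes Pearson correlation by default
    upper = corr.where(np.triu(np.ones(corr.shape), k=1).astype(bool))
    redundant = [c for c in upper.columns if (upper[c] > corr_threshold).any()]
    X_filtered = X.drop(columns=redundant)

    rfe = RFE(DecisionTreeClassifier(random_state=0),
              n_features_to_select=n_features)
    rfe.fit(X_filtered, y)
    return X_filtered.columns[rfe.support_].tolist()
```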

6. Conclusions

In this study, we introduced CBF-IDS, an IDS that effectively addresses class imbalance in intrusion detection. The proposed model integrates a CNN and a BiLSTM, enabling it to capture both spatial and temporal features in network traffic. A notable aspect of CBF-IDS is its use of the focal loss function to handle class imbalance: by assigning higher weights to minority classes, the model prioritizes their correct classification. We assessed the effectiveness and generalization capability of CBF-IDS on three benchmark datasets, and the experimental findings consistently show that it outperforms the other classification algorithms. Notably, focal loss performs better on minority classes than CE and α-balanced CE, further underscoring its effectiveness against class imbalance. Specifically, CBF-IDS achieves 99.40% accuracy and a 99.40% F1 score on the NSL-KDD dataset, 82.30% accuracy and a 79.61% F1 score on the UNSW-NB15 dataset, and 99.53% accuracy and a 99.53% F1 score on the CIC-IDS2017 dataset. These results demonstrate the effectiveness of CBF-IDS in mitigating class imbalance and thereby improving overall IDS performance. The proposed hybrid model achieves this detection performance at the cost of increased model complexity.
Future work will focus on two aspects. First, we plan to train the model on datasets specifically designed for IoT environments to enhance the generalization capability of CBF-IDS across diverse networks. Second, since CBF-IDS is a hybrid model that trades computational resources and time for detection performance, we will use feature engineering to improve its resource utilization and make it suitable for resource-constrained environments. Together, these efforts will further refine CBF-IDS and broaden its applicability to different types of network environments.

Author Contributions

Conceptualization, H.P. and C.W.; methodology, H.P.; software, H.P.; validation, H.P. and Y.X.; formal analysis, H.P. and Y.X.; investigation, H.P. and Y.X.; resources, C.W.; data curation, H.P.; writing—original draft preparation, H.P.; writing—review and editing, C.W. and Y.X.; visualization, H.P.; supervision, C.W.; project administration, H.P. and C.W.; funding acquisition, C.W. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the “Pioneer” and “Leading Goose” R&D Program of Zhejiang (2022C01085).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

This study used the NSL-KDD dataset, UNSW-NB15 dataset, and CIC-IDS2017 dataset, all of which are publicly available datasets. They can be found at https://www.unb.ca/cic/datasets/nsl.html, https://research.unsw.edu.au/projects/unsw-nb15-dataset, and https://www.unb.ca/cic/datasets/ids-2017.html (accessed on 31 August 2023).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Liao, H.J.; Lin, C.H.R.; Lin, Y.C.; Tung, K.Y. Intrusion detection system: A comprehensive review. J. Netw. Comput. Appl. 2013, 36, 16–24. [Google Scholar] [CrossRef]
  2. Sohal, A.S.; Sandhu, R.; Sood, S.K.; Chang, V. A cybersecurity framework to identify malicious edge device in fog computing and cloud-of-things environments. Comput. Secur. 2018, 74, 340–354. [Google Scholar] [CrossRef]
  3. Costante, E.; Fauri, D.; Etalle, S.; Den Hartog, J.; Zannone, N. A hybrid framework for data loss prevention and detection. In Proceedings of the 2016 Security and Privacy Workshops (SPW), San Jose, CA, USA, 22–26 May 2016; pp. 324–333. [Google Scholar]
  4. Suthishni, D.N.P.; Kumar, K.S. A Review on Machine Learning based Security Approaches in Intrusion Detection System. In Proceedings of the 2022 9th International Conference on Computing for Sustainable Global Development (INDIACom), New Delhi, India, 23–25 March 2022; pp. 341–348. [Google Scholar]
  5. Pajouh, H.H.; Dastghaibyfard, G.; Hashemi, S. Two-tier network anomaly detection model: A machine learning approach. J. Intell. Inf. Syst. 2017, 48, 61–74. [Google Scholar] [CrossRef]
  6. Sun, P.; Liu, P.; Li, Q.; Liu, C.; Lu, X.; Hao, R.; Chen, J. DL-IDS: Extracting features using CNN-LSTM hybrid network for intrusion detection system. Secur. Commun. Netw. 2020, 2020, 8890306. [Google Scholar] [CrossRef]
  7. Liu, H.; Lang, B.; Liu, M.; Yan, H. CNN and RNN based payload classification methods for attack detection. Knowl.-Based Syst. 2019, 163, 332–341. [Google Scholar] [CrossRef]
  8. Liu, H.; Lang, B. Machine learning and deep learning methods for intrusion detection systems: A survey. Appl. Sci. 2019, 9, 4396. [Google Scholar] [CrossRef]
  9. Lee, S.W.; Mohammed sidqi, H.; Mohammadi, M.; Rashidi, S.; Rahmani, A.M.; Masdari, M.; Hosseinzadeh, M. Towards secure intrusion detection systems using deep learning techniques: Comprehensive analysis and review. J. Netw. Comput. Appl. 2021, 187, 103111. [Google Scholar] [CrossRef]
  10. Ferrag, M.A.; Maglaras, L.; Moschoyiannis, S.; Janicke, H. Deep learning for cyber security intrusion detection: Approaches, datasets, and comparative study. J. Inf. Secur. Appl. 2020, 50, 102419. [Google Scholar] [CrossRef]
  11. Gamage, S.; Samarabandu, J. Deep learning methods in network intrusion detection: A survey and an objective comparison. J. Netw. Comput. Appl. 2020, 169, 102767. [Google Scholar] [CrossRef]
  12. Chou, D.; Jiang, M. A survey on data-driven network intrusion detection. ACM Comput. Surv. (CSUR) 2021, 54, 1–36. [Google Scholar] [CrossRef]
  13. Spelmen, V.S.; Porkodi, R. A review on handling imbalanced data. In Proceedings of the 2018 International Conference on Current Trends towards Converging Technologies (ICCTCT), Coimbatore, India, 1–3 March 2018; pp. 1–11. [Google Scholar]
  14. Kotsiantis, S.; Kanellopoulos, D.; Pintelas, P. Handling imbalanced datasets: A review. GESTS Int. Trans. Comput. Sci. Eng. 2006, 30, 25–36. [Google Scholar]
  15. Mienye, I.D.; Sun, Y. Performance analysis of cost-sensitive learning methods with application to imbalanced medical data. Informat. Med. Unlocked 2021, 25, 100690. [Google Scholar] [CrossRef]
  16. Telikani, A.; Gandomi, A.H.; Choo, K.K.R.; Shen, J. A cost-sensitive deep learning-based approach for network traffic classification. IEEE Trans. Netw. Service Manag. 2021, 19, 661–670. [Google Scholar] [CrossRef]
  17. Li, W.; Yi, P.; Wu, Y.; Pan, L.; Li, J. A new intrusion detection system based on KNN classification algorithm in wireless sensor network. J. Elect. Comput. Eng. 2014, 2014, 240217. [Google Scholar] [CrossRef]
  18. Tao, P.; Sun, Z.; Sun, Z. An improved intrusion detection algorithm based on GA and SVM. IEEE Access 2018, 6, 13624–13631. [Google Scholar] [CrossRef]
  19. Cui, J.; Zong, L.; Xie, J.; Tang, M. A novel multi-module integrated intrusion detection system for high-dimensional imbalanced data. Appl. Intell. 2023, 53, 272–288. [Google Scholar] [CrossRef]
  20. Ding, Y.; Zhai, Y. Intrusion detection system for NSL-KDD dataset using convolutional neural networks. In Proceedings of the 2nd International Conference on Computer Science and Artificial Intelligence (CSAI), Shenzhen, China, 8–10 December 2018; pp. 81–85. [Google Scholar]
  21. Zhang, B.; Yu, Y.; Li, J. Network intrusion detection based on stacked sparse autoencoder and binary tree ensemble method. In Proceedings of the IEEE International Conference on Communications Workshops (ICC Workshops), Kansas City, MO, USA, 20–24 May 2018; pp. 1–6. [Google Scholar]
  22. Aldweesh, A.; Derhab, A.; Emam, A.Z. Deep learning approaches for anomaly-based intrusion detection systems: A survey, taxonomy, and open issues. Knowl.-Based Syst. 2020, 189, 105124. [Google Scholar] [CrossRef]
  23. Khan, R.U.; Zhang, X.; Alazab, M.; Kumar, R. An improved convolutional neural network model for intrusion detection in networks. In Proceedings of the 2019 Cybersecurity and Cyberforensics Conference (CCC), Melbourne, VIC, Australia, 8–9 May 2019; pp. 74–77. [Google Scholar]
  24. Alsyaibani, O.M.A.; Utami, E.; Hartanto, A.D. An Intrusion Detection System Model Based on Bidirectional LSTM. In Proceedings of the 2021 3rd International Conference on Cybernetics and Intelligent System (ICORIS), Makasar, Indonesia, 25–26 October 2021; pp. 1–6. [Google Scholar]
  25. Arief, M.; Supangkat, S.H. Comparison of CNN and DNN Performance on Intrusion Detection System. In Proceedings of the 9th International Conference on ICT for Smart Society (ICISS), Bandung, Indonesia, 10–11 August 2022; pp. 1–7. [Google Scholar]
  26. Padmashree, A.; Krishnamoorthi, M. Decision Tree with Pearson Correlation-based Recursive Feature Elimination Model for Attack Detection in IoT Environment. Inf. Technol. Control 2022, 51, 771–785. [Google Scholar] [CrossRef]
  27. Alzaqebah, A.; Aljarah, I.; Al-Kadi, O.; Damaševičius, R. A modified grey wolf optimization algorithm for an intrusion detection system. Mathematics 2022, 10, 999. [Google Scholar] [CrossRef]
  28. Alharbi, A.; Alosaimi, W.; Alyami, H.; Rauf, H.T.; Damaševičius, R. Botnet attack detection using local global best bat algorithm for industrial internet of things. Electronics 2021, 10, 1341. [Google Scholar] [CrossRef]
  29. Toldinas, J.; Venčkauskas, A.; Damaševičius, R.; Grigaliūnas, Š.; Morkevičius, N.; Baranauskas, E. A novel approach for network intrusion detection using multistage deep learning image recognition. Electronics 2021, 10, 1854. [Google Scholar] [CrossRef]
  30. Chen, Z.; Zhou, L.; Yu, W. ADASYN-Random Forest Based Intrusion Detection Model. In Proceedings of the 4th International Conference on Signal Processing and Machine Learning, Beijing, China, 18–20 August 2021; pp. 152–159. [Google Scholar]
  31. He, H.; Bai, Y.; Garcia, E.A.; Li, S. ADASYN: Adaptive synthetic sampling approach for imbalanced learning. In Proceedings of the IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence), Hong Kong, 1–8 June 2008; pp. 1322–1328. [Google Scholar]
  32. Abdelkhalek, A.; Mashaly, M. Addressing the class imbalance problem in network intrusion detection systems using data resampling and deep learning. J. Supercomput. 2023, 79, 10611–10644. [Google Scholar] [CrossRef]
  33. Lee, J.; Park, K. GAN-based imbalanced data intrusion detection system. Pers. Ubiquitous Comput. 2021, 25, 121–128. [Google Scholar] [CrossRef]
  34. Liu, X.Y.; Zhou, Z.H. The influence of class imbalance on cost-sensitive learning: An empirical study. In Proceedings of the Sixth International Conference on Data Mining (ICDM’06), Hong Kong, China, 18–22 December 2006; pp. 970–974. [Google Scholar]
  35. Zhang, C.; Tan, K.C.; Li, H.; Hong, G.S. A cost-sensitive deep belief network for imbalanced classification. IEEE Trans. Neural Netw. Learn. Syst. 2018, 30, 109–122. [Google Scholar] [CrossRef]
  36. Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2980–2988. [Google Scholar]
  37. Tavallaee, M.; Bagheri, E.; Lu, W.; Ghorbani, A.A. A detailed analysis of the KDD CUP 99 data set. In Proceedings of the IEEE Symposium on Computational Intelligence for Security and Defense Applications, Ottawa, ON, Canada, 8–10 July 2009; pp. 1–6. [Google Scholar]
  38. Moustafa, N.; Slay, J. UNSW-NB15: A comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set). In Proceedings of the Military Communications and Information Systems Conference (MilCIS), Canberra, ACT, Australia, 10–12 November 2015; pp. 1–6. [Google Scholar]
  39. Sharafaldin, I.; Lashkari, A.H.; Ghorbani, A.A. Toward generating a new intrusion detection dataset and intrusion traffic characterization. In Proceedings of the 4th International Conference on Information Systems Security and Privacy (ICISSP 2018), Funchal, Portugal, 22–24 January 2018; Volume 1, pp. 108–116. [Google Scholar]
  40. Singh, A.; Jang-Jaccard, J. Autoencoder-based Unsupervised Intrusion Detection using Multi-Scale Convolutional Recurrent Networks. arXiv 2022, arXiv:2204.03779. [Google Scholar]
  41. Tran, N.N.; Sarker, R.; Hu, J. An approach for host-based intrusion detection system design using convolutional neural network. In Proceedings of the International Conference, Monami 2017, Melbourne, Australia, 13–15 December 2017; Springer: Berlin/Heidelberg, Germany, 2018; pp. 116–126. [Google Scholar]
  42. Zhou, Y.; Jing, W.; Wang, J.; Chen, G.; Scherer, R.; Damaševičius, R. MSAR-DefogNet: Lightweight cloud removal network for high resolution remote sensing images based on multi scale convolution. IET Image Process 2022, 16, 659–668. [Google Scholar] [CrossRef]
  43. Xiao, Y.; Xing, C.; Zhang, T.; Zhao, Z. An intrusion detection model based on feature reduction and convolutional neural networks. IEEE Access 2019, 7, 42210–42219. [Google Scholar] [CrossRef]
  44. Azizjon, M.; Jumabek, A.; Kim, W. 1D CNN based network intrusion detection with normalization on imbalanced data. In Proceedings of the International Conference on Artificial Intelligence in Information and Communication (ICAIIC), Fukuoka, Japan, 19–21 February 2020; pp. 218–224. [Google Scholar]
  45. Nugaliyadde, A.; Sohel, F.; Wong, K.W.; Xie, H. Language modeling through Long-Term memory network. In Proceedings of the 2019 International Joint Conference on Neural Networks (IJCNN), Budapest, Hungary, 14–19 July 2019; pp. 1–6. [Google Scholar]
  46. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef]
  47. Rodriguez, A.; Okamura, K. Generating real time cyber situational awareness information through social media data mining. In Proceedings of the 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), Milwaukee, WI, USA, 15–19 July 2019; Volume 2, pp. 502–507. [Google Scholar]
  48. Almahmoud, Z.; Yoo, P.D.; Alhussein, O.; Farhat, I.; Damiani, E. A holistic and proactive approach to forecasting cyber threats. Sci. Rep. 2023, 13, 8049. [Google Scholar] [CrossRef]
  49. Islam, N.; Farhin, F.; Sultana, I.; Kaiser, M.S.; Rahman, M.S.; Mahmud, M.; SanwarHosen, A.; Cho, G.H. Towards Machine Learning Based Intrusion Detection in IoT Networks. Comput. Mater. Contin. 2021, 69, 1801–1821. [Google Scholar] [CrossRef]
  50. Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. Dropout: A simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 2014, 15, 1929–1958. [Google Scholar]
  51. Su, T.; Sun, H.; Zhu, J.; Wang, S.; Li, Y. BAT: Deep learning methods on network intrusion detection using NSL-KDD dataset. IEEE Access 2020, 8, 29575–29585. [Google Scholar] [CrossRef]
  52. He, H.; Garcia, E.A. Learning from imbalanced data. IEEE Trans. Knowl. Data Eng. 2009, 21, 1263–1284. [Google Scholar]
  53. Li, Z.; Zhang, J.; Yao, X.; Kou, G. How to identify early defaults in online lending: A cost-sensitive multi-layer learning framework. Knowl.-Based Syst. 2021, 221, 106963. [Google Scholar] [CrossRef]
  54. Ahsan, R.; Shi, W.; Corriveau, J.P. Network intrusion detection using machine learning approaches: Addressing data imbalance. IET Cyber-Phys. Syst. Theory Appl. 2022, 7, 30–39. [Google Scholar] [CrossRef]
  55. Emeç, M.; Özcanhan, M.H. A hybrid deep learning approach for intrusion detection in IoT networks. Adv. Electr. Comput. Eng. 2022, 22, 3–12. [Google Scholar] [CrossRef]
  56. Kaur, G.; Lashkari, A.H.; Rahali, A. Intrusion traffic detection and characterization using deep image learning. In Proceedings of the 2020 IEEE Intl Conf on Dependable, Autonomic and Secure Computing, Intl Conf on Pervasive Intelligence and Computing, Intl Conf on Cloud and Big Data Computing, Intl Conf on Cyber Science and Technology Congress (DASC/PiCom/CBDCom/CyberSciTech), Calgary, AB, Canada, 17–22 August 2020; pp. 55–62. [Google Scholar]
  57. Imrana, Y.; Xiang, Y.; Ali, L.; Abdul-Rauf, Z. A bidirectional LSTM deep learning approach for intrusion detection. Expert Syst. Appl. 2021, 185, 115524. [Google Scholar] [CrossRef]
  58. Siami-Namini, S.; Tavakoli, N.; Namin, A.S. The performance of LSTM and BiLSTM in forecasting time series. In Proceedings of the 2019 IEEE International Conference on Big Data (Big Data), Los Angeles, CA, USA, 9–12 December 2019; pp. 3285–3292. [Google Scholar]
  59. Chen, J.; Wu, D.; Zhao, Y.; Sharma, N.; Blumenstein, M.; Yu, S. Fooling intrusion detection systems using adversarially autoencoder. Digit. Commun. Netw. 2021, 7, 453–460. [Google Scholar] [CrossRef]
  60. Haque, S.; El-Moussa, F.; Komninos, N.; Muttukrishnan, R. A Systematic Review of Data-Driven Attack Detection Trends in IoT. Sensors 2023, 23, 7191. [Google Scholar] [CrossRef]
  61. Al, S.; Dener, M. STL-HDL: A new hybrid network intrusion detection system for imbalanced dataset on big data environment. Comput. Secur. 2021, 110, 102435. [Google Scholar] [CrossRef]
  62. Altunay, H.C.; Albayrak, Z. A hybrid CNN + LSTM-based intrusion detection system for industrial IoT networks. Eng. Sci. Technol. Int. J. 2023, 38, 101322. [Google Scholar]
  63. Khan, M.A. HCRNNIDS: Hybrid convolutional recurrent neural network-based network intrusion detection system. Processes 2021, 9, 834. [Google Scholar] [CrossRef]
  64. Lee, S.J.; Yoo, P.D.; Asyhari, A.T.; Jhi, Y.; Chermak, L.; Yeun, C.Y.; Taha, K. IMPACT: Impersonation attack detection via edge computing using deep autoencoder and feature abstraction. IEEE Access 2020, 8, 65520–65529. [Google Scholar] [CrossRef]
Figure 1. Methodology flowchart.
Figure 2. Typical CNN architecture.
Figure 3. Convolution operation.
Figure 4. Pooling operation.
Figure 5. LSTM architecture.
Figure 6. BiLSTM architecture.
Figure 7. Performance of the CNN-BiLSTM model using different loss functions on the minority class U2R for NSL-KDD.
Figure 8. Performance of the CNN-BiLSTM model using different loss functions on the minority class for UNSW-NB15: (a) performance for the minority class Shellcode; (b) performance for the minority class Worms.
Figure 9. Performance of the CNN-BiLSTM model using different loss functions on the minority class for CIC-IDS2017: (a) performance on the minority class Bot; (b) performance on the minority class Infiltration; (c) performance on the minority class Web Attack.
Table 2. Composition of NSL-KDD.

Class | Size | Ratio (%)
Normal | 77,232 | 52.00
DoS | 53,387 | 35.95
Probe | 14,077 | 9.48
R2L | 3702 | 2.49
U2R | 119 | 0.08
Total | 148,517 | 100.00
Table 3. Composition of UNSW-NB15.

Class | Size | Ratio (%)
Normal | 93,000 | 36.09
Generic | 58,871 | 22.85
Exploits | 44,525 | 17.28
Fuzzers | 24,246 | 9.41
DoS | 16,353 | 6.34
Reconnaissance | 13,987 | 5.43
Analysis | 2677 | 1.04
Backdoor | 2329 | 0.90
Shellcode | 1511 | 0.59
Worms | 174 | 0.07
Total | 257,673 | 100.00
Table 4. Composition of CIC-IDS2017.

Class | Raw Dataset | Subdataset | Ratio (%)
Normal | 2,273,097 | 130,000 | 52.42
DoS | 380,699 | 75,000 | 30.24
PortScan | 158,930 | 25,000 | 10.08
Patator | 13,835 | 13,835 | 5.58
Web Attack | 2180 | 2180 | 0.88
Bot | 1966 | 1966 | 0.79
Infiltration | 36 | 36 | 0.01
Total | 2,830,743 | 248,017 | 100.00
Table 5. CNN-BiLSTM architecture and key parameters.

Layer | Configuration
Input | Two-dimensional feature matrix transformed in Section 3.2.4
Conv2D | Filter = 64, Kernel Size = 2 × 2
Conv2D | Filter = 128, Kernel Size = 2 × 2
Conv2D | Filter = 256, Kernel Size = 2 × 2
MaxPooling2D | Pool Size = 2 × 2
Batch Normalization | Default
Reshape | Adjustment based on input data
BiLSTM | Units = 64
Dropout | Dropout Rate = 0.5
Dense | Number of neurons equal to the number of classes in the dataset
Output | Activation = SoftMax
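To make the stack in Table 5 concrete, the following is a minimal Keras sketch; the input shape, `padding` choice, and Reshape target are placeholder assumptions, since the exact two-dimensional feature matrix depends on the per-dataset transformation in Section 3.2.4.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_cnn_bilstm(input_shape, num_classes):
    """CNN-BiLSTM following Table 5; input_shape, e.g. (11, 11, 1), is a placeholder."""
    return models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(64, (2, 2), activation="relu", padding="same"),
        layers.Conv2D(128, (2, 2), activation="relu", padding="same"),
        layers.Conv2D(256, (2, 2), activation="relu", padding="same"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.BatchNormalization(),
        # Collapse the spatial grid into a sequence of feature vectors for the BiLSTM.
        layers.Reshape((-1, 256)),
        layers.Bidirectional(layers.LSTM(64, activation="tanh")),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="softmax"),
    ])
```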
Table 6. Confusion matrix.

 | Predicted Positive | Predicted Negative
Actual Positive | True Positive (TP) | False Negative (FN)
Actual Negative | False Positive (FP) | True Negative (TN)
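For reference, the metrics reported in the results tables follow the standard definitions derived from Table 6:

```latex
\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad
\mathrm{Precision} = \frac{TP}{TP + FP},
\qquad
\mathrm{Recall} = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}.
```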
Table 7. Hyperparameter setting.

Hyperparameter | Setting
Batch Size | 32
Optimizer | Adam
Learning Rate | 0.001
CNN Activation Function | ReLU
BiLSTM Activation Function | Tanh
Dropout Rate | 0.5
Loss Function | Focal Loss
α | 0.25
γ | 2
Epoch | 50
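Tying Tables 5 and 7 together, a minimal compile-and-train sketch, reusing the hypothetical `build_cnn_bilstm` and `focal_loss` helpers sketched above (data variables and shapes are placeholders):

```python
import tensorflow as tf

# Placeholders: input_shape and num_classes depend on the dataset;
# x_train / y_train are the preprocessed feature matrices and one-hot labels.
model = build_cnn_bilstm(input_shape=(11, 11, 1), num_classes=5)
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),  # Table 7: Adam, learning rate 0.001
    loss=focal_loss(alpha=0.25, gamma=2.0),                   # Table 7: focal loss, alpha = 0.25, gamma = 2
    metrics=["accuracy"],
)
model.fit(x_train, y_train, batch_size=32, epochs=50)         # Table 7: batch size 32, 50 epochs
```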
Table 8. Performance of multi-classification for NSL-KDD.

Algorithm | Accuracy | Precision | Recall | F1 Score
SVM | 96.97 | 96.96 | 96.97 | 96.95
KNN | 99.11 | 99.10 | 99.11 | 99.11
[23] | 99.01 | 98.99 | 99.01 | 98.99
[24] | 97.84 | 97.80 | 97.84 | 97.79
[25] | 98.82 | 98.79 | 98.82 | 98.80
[32] | 99.98 | - | - | -
CBC-IDS | 99.01 | 99.01 | 99.01 | 99.00
CBA-IDS | 99.23 | 99.22 | 99.23 | 99.22
CBF-IDS | 99.40 | 99.40 | 99.40 | 99.40
Table 9. Multi-classification performance of different model structures for NSL-KDD.

Model | Layer | Accuracy | Precision | Recall | F1 Score
CNN | 1 Conv | 97.51 | 97.63 | 97.51 | 97.54
CNN | 2 Conv | 98.54 | 98.51 | 98.54 | 98.49
CNN | 3 Conv | 99.10 | 99.08 | 99.10 | 99.09
BiLSTM | 1 BiLSTM | 99.02 | 99.01 | 99.02 | 99.01
CNN-BiLSTM | 1 Conv + 1 BiLSTM | 98.93 | 98.93 | 98.93 | 98.92
CNN-BiLSTM | 2 Conv + 1 BiLSTM | 99.19 | 99.13 | 99.19 | 99.14
CNN-BiLSTM | 3 Conv + 1 BiLSTM * | 99.40 | 99.40 | 99.40 | 99.40
* i.e., CBF-IDS.
Table 10. Performance of each class for NSL-KDD.

Class | CBC-IDS (Precision / Recall / F1 Score) | CBA-IDS (Precision / Recall / F1 Score) | CBF-IDS (Precision / Recall / F1 Score)
DoS | 99.81 / 99.88 / 99.85 | 99.81 / 99.94 / 99.88 | 99.85 / 99.91 / 99.88
Normal | 99.24 / 98.94 / 99.09 | 99.24 / 99.33 / 99.29 | 99.46 / 99.47 / 99.46
Probe | 98.24 / 99.04 / 98.64 | 99.11 / 98.72 / 98.91 | 99.25 / 99.15 / 99.20
R2L | 86.51 / 90.14 / 88.29 | 91.53 / 90.54 / 91.03 | 92.98 / 93.11 / 93.05
U2R | 66.67 / 25.00 / 36.36 | 66.67 / 41.67 / 51.28 | 77.78 / 58.33 / 66.67
Table 11. Performance of multi-classification for UNSW-NB15.

Algorithm | Accuracy | Precision | Recall | F1 Score
SVM | 76.88 | 74.49 | 76.88 | 72.31
KNN | 76.48 | 76.13 | 76.48 | 76.17
[23] | 79.06 | 79.77 | 79.06 | 78.58
[24] | 79.05 | 78.18 | 79.05 | 75.88
[25] | 80.02 | 79.22 | 80.02 | 78.80
[27] | 80.93 | - | - | 78.08
CBC-IDS | 81.31 | 80.48 | 81.31 | 78.32
CBA-IDS | 81.67 | 82.25 | 81.67 | 78.69
CBF-IDS | 82.30 | 82.76 | 82.30 | 79.61
Table 12. Multi-classification performance of different model structures for UNSW-NB15.

Model | Layer | Accuracy | Precision | Recall | F1 Score
CNN | 1 Conv | 76.31 | 74.43 | 76.31 | 74.30
CNN | 2 Conv | 79.48 | 77.86 | 79.48 | 76.91
CNN | 3 Conv | 80.94 | 80.76 | 80.94 | 78.25
BiLSTM | 1 BiLSTM | 80.29 | 79.79 | 80.29 | 77.12
CNN-BiLSTM | 1 Conv + 1 BiLSTM | 80.52 | 79.02 | 80.52 | 78.33
CNN-BiLSTM | 2 Conv + 1 BiLSTM | 81.36 | 81.61 | 81.37 | 79.00
CNN-BiLSTM | 3 Conv + 1 BiLSTM * | 82.30 | 82.76 | 82.30 | 79.61
* i.e., CBF-IDS.
Table 13. Performance of each class for UNSW-NB15.

Class | CBC-IDS (Precision / Recall / F1 Score) | CBA-IDS (Precision / Recall / F1 Score) | CBF-IDS (Precision / Recall / F1 Score)
Analysis | 81.25 / 4.86 / 9.17 | 96.43 / 5.05 / 9.59 | 79.55 / 6.54 / 12.09
Backdoor | 76.92 / 6.44 / 11.88 | 89.66 / 5.58 / 10.51 | 88.37 / 8.15 / 14.93
DoS | 38.55 / 0.98 / 1.91 | 51.47 / 2.14 / 4.11 | 57.62 / 4.74 / 8.76
Exploits | 59.45 / 93.58 / 72.71 | 58.57 / 95.37 / 72.57 | 60.28 / 93.70 / 73.37
Fuzzers | 66.14 / 52.77 / 58.71 | 71.16 / 48.44 / 57.64 | 71.34 / 51.29 / 59.68
Generic | 99.81 / 97.11 / 98.44 | 99.76 / 97.17 / 98.45 | 99.69 / 97.46 / 98.56
Normal | 88.98 / 93.42 / 91.15 | 89.26 / 94.19 / 91.66 | 89.09 / 94.52 / 91.73
Reconnaissance | 86.35 / 71.19 / 78.04 | 90.55 / 72.98 / 80.82 | 92.70 / 74.48 / 82.60
Shellcode | 58.08 / 38.08 / 46.00 | 64.64 / 38.74 / 48.45 | 59.10 / 65.56 / 62.17
Worms | 83.33 / 14.29 / 24.39 | 87.50 / 20.00 / 32.56 | 56.00 / 40.00 / 46.67
Table 14. Performance of multi-classification for CIC-IDS2017.

Algorithm | Accuracy | Precision | Recall | F1 Score
SVM | 92.66 | 92.16 | 92.66 | 91.91
KNN | 98.91 | 98.92 | 98.91 | 98.91
[23] | 97.72 | 97.73 | 97.72 | 97.68
[24] | 96.86 | 96.97 | 96.86 | 96.83
[25] | 99.40 | 99.39 | 99.40 | 99.38
CBC-IDS | 98.72 | 98.79 | 98.72 | 98.73
CBA-IDS | 98.97 | 99.04 | 98.97 | 98.99
CBF-IDS | 99.53 | 99.54 | 99.53 | 99.53
Table 15. Multi-classification performance of different model structures on CIC-IDS2017.

Model | Layer | Accuracy | Precision | Recall | F1 Score
CNN | 1 Conv | 95.65 | 95.64 | 95.65 | 95.61
CNN | 2 Conv | 97.83 | 97.86 | 97.83 | 97.80
CNN | 3 Conv | 98.72 | 98.77 | 98.72 | 98.73
BiLSTM | 1 BiLSTM | 98.36 | 98.42 | 98.37 | 98.36
CNN-BiLSTM | 1 Conv + 1 BiLSTM | 98.01 | 98.06 | 98.01 | 98.00
CNN-BiLSTM | 2 Conv + 1 BiLSTM | 98.92 | 98.94 | 98.92 | 98.91
CNN-BiLSTM | 3 Conv + 1 BiLSTM * | 99.53 | 99.54 | 99.53 | 99.53
* i.e., CBF-IDS.
Table 16. Performance of each class for CIC-IDS2017.

Class | CBC-IDS (Precision / Recall / F1 Score) | CBA-IDS (Precision / Recall / F1 Score) | CBF-IDS (Precision / Recall / F1 Score)
Normal | 99.24 / 98.39 / 98.82 | 99.33 / 98.77 / 99.05 | 99.62 / 99.55 / 99.58
DoS | 99.11 / 99.74 / 99.42 | 99.09 / 99.89 / 99.49 | 99.52 / 99.53 / 99.53
Patator | 98.39 / 99.64 / 99.01 | 99.82 / 99.71 / 99.76 | 99.78 / 99.57 / 99.67
PortScan | 99.38 / 99.70 / 99.54 | 99.65 / 97.98 / 98.81 | 99.68 / 99.82 / 99.75
Bot | 85.93 / 73.03 / 78.95 | 93.83 / 89.06 / 91.38 | 91.97 / 96.18 / 94.03
Infiltration | 100.00 / 28.57 / 44.44 | 100.00 / 71.43 / 83.33 | 100.00 / 85.71 / 92.31
Web Attack | 68.40 / 90.37 / 77.87 | 72.98 / 95.41 / 82.70 | 98.61 / 97.71 / 98.16
Table 17. Comparison of runtime performance for NSL-KDD.

Algorithm | Training Time (Seconds/Epoch) | Testing Time (Seconds)
[23] | 15.05 | 1.08
[24] | 69.22 | 7.48
[25] | 21.22 | 2.66
CNN | 23.06 | 1.08
BiLSTM | 38.89 | 3.20
CBF-IDS | 71.14 | 6.52
Table 18. Comparison of runtime performance for UNSW-NB15.

Algorithm | Training Time (Seconds/Epoch) | Testing Time (Seconds)
[23] | 25.30 | 1.93
[24] | 163.52 | 17.14
[25] | 38.69 | 4.64
CNN | 41.75 | 1.78
BiLSTM | 89.82 | 7.51
CBF-IDS | 127.65 | 11.28
Table 19. Comparison of runtime performance for CIC-IDS2017.

Algorithm | Training Time (Seconds/Epoch) | Testing Time (Seconds)
[23] | 24.37 | 1.55
[24] | 91.00 | 8.98
[25] | 37.10 | 5.07
CNN | 35.48 | 1.52
BiLSTM | 54.41 | 4.06
CBF-IDS | 118.18 | 10.24