Residual Dense Optimization-Based Multi-Attention Transformer to Detect Network Intrusion against Cyber Attacks

Alsulami, Majid H.

doi:10.3390/app14177763

Applied College, Shaqra University, Shaqra 11961, Saudi Arabia

Appl. Sci. 2024, 14(17), 7763; https://doi.org/10.3390/app14177763

Submission received: 16 July 2024 / Revised: 16 August 2024 / Accepted: 22 August 2024 / Published: 3 September 2024

(This article belongs to the Special Issue New Technology Trends in Smart Sensing)

Abstract

Achieving cybersecurity has become increasingly difficult because of growing internet connectivity and the rapid growth of software applications. Networks therefore need a robust defense system against multiple kinds of cyber-attack, which in turn requires a method for detecting and classifying those attacks. The proposed model comprises three phases: pre-processing, feature selection, and classification. First, min-max normalization is applied to the original data to eliminate the impact of maximum or minimum values on the overall characteristics. Next, the synthetic minority oversampling technique (SMOTE) is applied to balance the classes by generating additional samples of the minority attack classes. The significant features are then selected using a Hybrid Genetic Fire Hawk Optimizer (HGFHO). Finally, an optimized residual dense-assisted multi-attention transformer (Op-ReDMAT) model classifies the selected features. The model was evaluated on the UNSW-NB15 and CICIDS2017 datasets. On UNSW-NB15, it attained a precision, accuracy, F1-score, error rate, and recall of 97.2%, 98.82%, 97.8%, 2.58, and 98.5%, respectively. On CICIDS2017, it achieved a precision, accuracy, F1-score, and recall of 98.6%, 99.12%, 98.8%, and 98.2%, respectively.

Keywords:

intrusion detection systems; synthetic minority oversampling technique; hierarchical clustering algorithm; fire hawks

1. Introduction

Cybersecurity and protection against frequent cyber-attacks have become critical issues in recent times. A leading cause of cyber-attacks is the enormous expansion of computer networks and of the applications that individuals use for private or business purposes. Such attacks cause substantial monetary losses and damage in large-scale networks [1]. Current solutions, which include user authentication, hardware and software firewalls, and data encryption, cannot handle the challenges posed by future requirements, and they cannot protect computer networks from the many cyber threats that now exist. These traditional security mechanisms fail because intrusion techniques expand faster than the defenses can adapt [2,3]. Researchers have therefore been developing intrusion detection techniques that identify cyber-attacks more precisely.

An intrusion detection system (IDS) dynamically monitors an environment, such as system calls, syslog records, or network traffic. Its primary function is to determine whether the observed actions indicate an attack or legitimate use [4,5]. IDSs fall into two categories: host-based (HIDS) and network-based (NIDS) intrusion detection systems. A HIDS tracks system information, while a NIDS operates on feature vectors that summarize network traffic over a fixed time window [6]. However, continuing advances in network topologies, hardware, and software, including the Internet of Things (IoT), make cyber-attacks more challenging to detect. Because cyber-attacks vary so much, anomaly detection is an attractive application for conventional classifiers such as K-nearest neighbors (KNN), support vector machines (SVM), and random forests (RF). Evaluating and comparing anomaly detection techniques, however, requires extensive and accurate data from simulated network environments [7]. The existing literature identifies the need for trustworthy datasets as one of the main issues in intrusion detection research.

Deep learning (DL) has been employed to improve detection. Among DL models, recurrent neural networks (RNN) and convolutional neural networks (CNN) have mainly been used to reduce the complexity of existing classifiers. DL networks are well known for their strong performance, so they can serve as a central component of network intrusion systems [8,9]. However, these models are costly and take a long time to train. Artificial neural networks (ANN) have also been used in IDS [10], and several properties make them well suited to network intrusion detection. First, an ANN models non-linear data well even with a very large number of input features [11]. Second, once thoroughly trained, an ANN detects cyber-attacks quickly. In addition, an ANN trained on a large dataset develops into a generalized solution for a specific task. By contrast, a conventional signature-based IDS relies on manually established rules [12].

Similarly, auto-encoders (AE) have been employed in IDS. An enhanced method that combines generative adversarial nets (GAN) with the AE, the adversarial auto-encoder (AAE), was used to detect cyber-attacks. However, this model suffered from overfitting and a high computational load [13]. Building computational techniques that detect different types of attack requires analyzing varied incident patterns in cybersecurity data before estimating potential threats, an approach known as data-driven intelligent intrusion detection [14,15]. The primary objectives of this research are as follows:

⮚ To normalize the original data with the min-max method, minimizing the effect of maximum or minimum values on the overall features.
⮚ To balance the classes with SMOTE, which helps reduce overfitting.
⮚ To introduce a Hybrid Genetic Fire Hawk Optimizer (HGFHO) for choosing an optimal subset of features.
⮚ To classify the selected features with an optimized residual dense-assisted multi-attention transformer (Op-ReDMAT) model.

The remainder of this paper is organized as follows: Section 2 describes the most recent related research. Section 3 defines the proposed methodology and its fundamental concepts. Section 4 discusses the experimental outcomes, and Section 5 summarizes the research and presents a conclusion.

2. Related Works/Literature Review

Hnamte et al. [16] developed a two-stage DL technique that hybridizes long short-term memory (LSTM) and AE to detect attacks. The CICIDS2017 and CSE-CICIDS2018 datasets were used to evaluate the parameters of the optimum network, and the results showed that the developed model performed better. Kabir et al. [17] combined two ML models with an extra tree (ET) classifier to detect network intrusion, using mutual information gain for feature selection to improve accuracy. Evaluated on the packet-based UNSW-NB15 dataset, the model attained an accuracy of 96.24%, higher than the methods it was compared with.

To identify different classes of network intrusion, Li et al. [18] introduced a model that combines hierarchical clustering with a decision tree twin SVM (HC-DTTWSVM). A decision tree (DT) is first built with a hierarchical clustering algorithm, and a bottom-up merging technique maximizes the separation of the DT's upper nodes. A twin SVM is then embedded in the generated DT. Evaluated on the NSL-KDD and UNSW-NB15 datasets, the model detected different classes of network intrusion and performed better than the other techniques.

Hu et al. [19] combined an enhanced hybrid attention mechanism with an improved algorithm to raise generalization capability and accuracy. An effective channel layer and a curve space layer direct the model toward the essential feature information, so that it concentrates on the features most relevant to classification and becomes more widely usable. Three datasets, CICIDS2017, UNSW-NB15, and CSE-CICIDS2018, were used to verify the model's performance. The model attained accuracies of 100%, 99.7%, and 98.10% for binary classification and 96.37%, 98.12%, and 99.06% for multi-class classification, respectively.

To enhance the accuracy of IDSs, El-Rady et al. [20] presented a customized CNN model and compared its outcomes with random forest (RF), a well-known ML method frequently applied in IDS. Both techniques were evaluated with standard measures such as accuracy and F1-measure on two datasets, UNSW-NB15 and CSE-CICIDS2018. The CNN model outperformed the RF algorithm, attaining 99.17% accuracy on UNSW-NB15 and 99.18% on CSE-CICIDS2018.

Du et al. [21] developed an enhanced NIDS that combines a CNN and LSTM (NIDS-CNNLSTM) to classify intrusions. Using the ability of LSTM networks to learn from time-series data, the model learns and categorizes the features chosen by the CNN, and its applicability is validated in binary and multi-class scenarios. The models were trained on the UNSW-NB15 and KDD-CUP99 datasets. The results showed that the model performed better than other current techniques and attained high classification accuracy and detection rates. Table 1 illustrates the comparisons from the literature survey.

Many researchers have developed automatic IDS methods for cyber-attacks, but each of the approaches above has limitations. The model combining LSTM and AE had overfitting issues. The two ML models combined with an extra tree classifier needed better data generalization. The decision tree twin SVM with hierarchical clustering consumed more time. The hybrid attention mechanism combined with an enhanced algorithm had a high false alarm rate and did not perform well. To overcome these problems, this research focuses on developing an advanced model for cyber-attack detection.

3. Proposed Methodology

With the introduction of software-defined networks, smart homes, and the IoT, today's computer networks and the applications that connect to them have experienced unprecedented growth. The likelihood of network intrusion has risen with them and poses a constant risk to network infrastructure. These attacks aim to destroy availability, confidentiality, authority, and integrity, which are the foundations of computing systems. This work uses an optimized transformer model as the network intrusion detection mechanism. First, min-max normalization eliminates the effect of maximum or minimum values on the main characteristics. Second, the synthetic minority oversampling technique augments the data to address the poor detection of minority attacks that results from imbalanced training data. Third, the Hybrid Genetic Fire Hawk Optimizer (HGFHO) selects the optimal subset of features. In the classification phase, the optimized residual dense-assisted multi-attention transformer (Op-ReDMAT) model identifies network cyber-attacks. The input data come from two publicly available datasets, UNSW-NB15 and CICIDS2017. Figure 1 illustrates the block diagram of the proposed methodology.

3.1. Pre-Processing

Numerous techniques can be used to pre-process the input data for an IDS. Pre-processing prepares the data for further analysis and modeling and improves the effectiveness and precision of the IDS. This stage applies min-max normalization followed by SMOTE.

Min-max Normalization

Min-max normalization is the first pre-processing step. It rescales each feature to the range 0–1 by subtracting the feature's minimum value from each value and dividing the result by the feature's range (maximum minus minimum).
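The rescaling described above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in the study: the function name `min_max_normalize`, the sample `flows` values, and the choice to map a constant column to 0.0 are all assumptions made for the example.

```python
def min_max_normalize(rows):
    """Scale every column of a list of numeric rows into the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    scaled = []
    for row in rows:
        scaled.append([
            # A constant column has zero range; mapping it to 0.0 is an assumption.
            (value - lo) / (hi - lo) if hi > lo else 0.0
            for value, lo, hi in zip(row, lows, highs)
        ])
    return scaled


# Hypothetical traffic records with two features on very different scales.
flows = [[10.0, 200.0], [20.0, 400.0], [30.0, 300.0]]
normalized = min_max_normalize(flows)
```

After scaling, both features span the same 0–1 range, so neither dominates a distance-based step such as the nearest-neighbor search used by SMOTE.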

Synthetic Minority Oversampling Technique

The synthetic minority oversampling technique (SMOTE) is one of the most widely used oversampling techniques for creating synthetic samples of the minority class. It uses the K-nearest neighbors (KNN) approach with the Euclidean distance to form synthetic samples along the line segments that join original minority-class samples. Because it helps manage overfitting and gives a good representation of the minority class, it is popular for handling imbalanced datasets.

Salazar et al. [22] report that SMOTE needs improvement, especially for small training datasets. Their study shows that GANSO, which uses generative adversarial network synthesis for data oversampling, outperforms SMOTE by creating more realistic synthetic data [23]. Because GANSO can create samples closer to the actual data distribution, it is better suited to settings where data are scarce and variation is limited.

The authors show that although SMOTE reduces the risks of data repetition and overfitting, it may fail to capture the varied patterns within limited datasets. GANSO, by contrast, uses the adversarial process to refine the generated samples, which results in better model performance. The study suggests that GANSO could serve as a better substitute for SMOTE when the actual data distribution must be preserved while augmenting small or imbalanced datasets.

SMOTE is a standard oversampling process [24]. It reduces overfitting and avoids exact repetition of data by generating synthetic positive samples in the feature space spanned by minority-class instances and their K-nearest neighbors. Given a dataset with $a$ negative and $b$ positive samples, $a - b$ synthetic samples must be created. First, a sample $y_{i}$ is chosen at random from the $b$ positive samples, and its K-nearest neighbors among the $b$ instances are identified using the Euclidean distance in Equation (1):

d(y_{i}, y_{j}) = \sqrt{\sum_{l} {(y_{i}^{l} - y_{j}^{l})}^{2}}

(1)

Here, $l$ indexes the features, and the $k$-th nearest neighbor of $y_{i}$ is denoted as $y_{ik}$, where $k = 1, 2, 3, \dots, 5$. The synthetic sample is then created with Equation (2):

y_{new} = y_{i} + rand(0, 1) \cdot (y_{ik} - y_{i}),

(2)

where $rand(0, 1)$ is a random number between 0 and 1, so that $y_{new}$ lies on the line segment connecting the original sample $y_{i}$ and one of its neighbors $y_{ik}$. These steps are repeated until enough synthetic minority samples have been generated, which is how SMOTE handles unbalanced data.
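The procedure can be illustrated with a short, self-contained Python sketch. It assumes the usual SMOTE interpolation, in which the new sample moves from the chosen minority sample toward one of its nearest neighbors; the function names, the toy `minority` points, and the fixed random seed are hypothetical choices for the example, not details taken from the study.

```python
import math
import random


def euclidean(u, v):
    """Euclidean distance between two feature vectors, as in Equation (1)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic samples by interpolating toward nearest neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the chosen sample (excluding itself).
        others = [s for s in minority if s is not base]
        neighbours = sorted(others, key=lambda s: euclidean(base, s))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # plays the role of rand(0, 1)
        # Interpolate between the base sample and the selected neighbour.
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical minority-class samples on the corners of the unit square.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_samples = smote(minority, n_new=6, k=2)
```

Because every synthetic point is a convex combination of two existing minority samples, it always stays inside the region those samples already occupy.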

3.2. Feature Selection Using Hybrid Genetic Fire Hawk Optimizer (HGFHO)

Feature selection helps reduce false alarms, simplify data, improve the detection rates of ML and DL approaches, remove redundant data, and simplify computations. This section uses the Hybrid Genetic Fire Hawk Optimizer to select the relevant features.

The FHO algorithm mimics the hunting behavior of fire hawks, which start and spread fires to catch prey [25]. A set of candidate solutions $(Y)$ is formed from the positions of the fire hawks and their prey. A random initialization mechanism creates the starting location of each vector in the search space, as given in Equations (3) and (4):

Y = \begin{bmatrix} Y_{1} \\ Y_{2} \\ \vdots \\ Y_{p} \\ \vdots \\ Y_{M} \end{bmatrix} = \begin{bmatrix} y_{1}^{1} & y_{1}^{2} & \dots & y_{1}^{q} & \dots & y_{1}^{c} \\ y_{2}^{1} & y_{2}^{2} & \dots & y_{2}^{q} & \dots & y_{2}^{c} \\ \vdots & \vdots & & \vdots & \ddots & \vdots \\ y_{p}^{1} & y_{p}^{2} & \dots & y_{p}^{q} & \dots & y_{p}^{c} \\ \vdots & \vdots & & \vdots & \ddots & \vdots \\ y_{M}^{1} & y_{M}^{2} & \dots & y_{M}^{q} & \dots & y_{M}^{c} \end{bmatrix}, \quad \begin{cases} p = 1, 2, 3, \dots, M \\ q = 1, 2, 3, \dots, c \end{cases}

(3)

y_{p}^{q}(0) = y_{p,min}^{q} + rand \cdot (y_{p,max}^{q} - y_{p,min}^{q}), \quad \begin{cases} p = 1, 2, 3, \dots, M \\ q = 1, 2, 3, \dots, c \end{cases}

(4)

Here, $M$ is the number of solution candidates and $Y_{p}$ is the $p$-th candidate solution in the search space. The dimension of the problem under consideration is $c$, and $y_{p}^{q}(0)$ is the initial position of the solution candidate. For the $p$-th solution candidate, $y_{p}^{q}$ is the $q$-th decision variable, $y_{p,min}^{q}$ and $y_{p,max}^{q}$ are its minimum and maximum values, and $rand$ is a uniformly distributed random number in [0, 1]. The fire hawks initially spread fire throughout the search area, and the main fire is taken as the best global solution. The prey and the fire hawks are represented in Equations (5) and (6):

PR = \begin{bmatrix} PR_{1} \\ PR_{2} \\ \vdots \\ PR_{l} \\ \vdots \\ PR_{n} \end{bmatrix}, \quad l = 1, 2, \dots, n

(5)

FH = \begin{bmatrix} FH_{1} \\ FH_{2} \\ \vdots \\ FH_{c} \\ \vdots \\ FH_{m} \end{bmatrix}, \quad c = 1, 2, \dots, m

(6)

Here, $FH_{c}$ denotes the $c$-th fire hawk in the search space, $PR_{l}$ denotes the $l$-th prey, and $PR$ is the set of prey. The distance between a fire hawk and a prey is calculated with Equation (7):

D_{l}^{c} = \sqrt{{(z_{2} - z_{1})}^{2} + {(x_{2} - x_{1})}^{2}}, \quad \begin{cases} c = 1, 2, 3, \dots, p \\ l = 1, 2, 3, \dots, q \end{cases}

(7)

Here, $p$ and $q$ are the total numbers of fire hawks and prey, $D_{l}^{c}$ is the distance between the $c$-th fire hawk and the $l$-th prey, and $(z_{1}, x_{1})$ and $(z_{2}, x_{2})$ are their positions in the search space.

Each fire hawk's territory is determined by the prey nearest to it. The fire hawks then gather hot coals to start a fire in a chosen area. Because some birds use a burning stick taken from another fire hawk's territory, both behaviors are modeled in the main FHO search loop by the position update in Equation (8):

FH_{c}^{new} = FH_{c} + (r_{1} \times GB - r_{2} \times FH_{Near}), \quad c = 1, 2, \dots, n

(8)

Here, $GB$ is the global best solution, $FH_{c}^{new}$ is the new position vector of the $c$-th fire hawk, $r_{1}$ and $r_{2}$ are uniformly distributed random numbers, and $FH_{Near}$ is one of the other fire hawks in the search area. The movement of the prey within each fire hawk's territory, which is treated as a key aspect of animal behavior, provides the information for the algorithm's next step. The resulting position update is expressed in Equation (9):

PR_{w}^{new} = PR_{w} + (r_{3} \times FH_{c} - r_{4} \times SP_{c}), \quad \begin{cases} c = 1, 2, \dots, n \\ w = 1, 2, \dots, r \end{cases}

(9)

where $PR_{w}^{new}$ is the new position vector of the $w$-th prey, $r_{3}$ and $r_{4}$ are uniformly distributed random numbers, and $SP_{c}$ is the safe place within the $c$-th fire hawk's territory.

Moreover, the prey may move toward the territory of another fire hawk, and it may also come close to fire hawks and be caught by the nearby flames. Equation (10) gives the corresponding position update:

PR_{w}^{new} = PR_{w} + (r_{5} \times FH_{Alter} - r_{6} \times SP), \quad \begin{cases} c = 1, 2, 3, \dots, n \\ w = 1, 2, 3, \dots, r \end{cases}

(10)

Here, $FH_{Alter}$ is one of the other fire hawks in the search space, and $SP$ is the safe place outside the $c$-th fire hawk's territory. $SP_{c}$ and $SP$ are given by Equations (11) and (12):

SP_{c} = \frac{\sum_{w = 1}^{r} PR_{w}}{r}, \quad \begin{cases} w = 1, 2, 3, \dots, r \\ c = 1, 2, 3, \dots, n \end{cases}

(11)

SP = \frac{\sum_{l = 1}^{n} PR_{l}}{n}, \quad l = 1, 2, 3, \dots, n

(12)

Within the search space, $PR_{l}$ indicates the $l$-th prey. Algorithm 1 gives the pseudo-code of the FHO algorithm.

Algorithm 1: Pseudo-code for FHO algorithm.

Procedure Fire Hawk Optimizer
  Establish the initial locations of the potential solutions (Y)
  Calculate the fitness values of the initial potential solutions
  Generate the central fire as the global best (GB) solution
  while Iteration < maximum number of iterations
    Generate a random integer (n) as the number of fire hawks
    Determine the prey (PR) and fire hawks (FH) in the search space
    Evaluate the overall distance between the prey and the fire hawks
    Determine each fire hawk's territory by assigning the prey to it
    for c = 1 : n
      Determine the fire hawk's new location with Equation (8)
      for w = 1 : r
        Estimate the safe place within the c-th fire hawk's territory with Equation (11)
        Evaluate the prey's new location with Equation (9)
        Calculate the safe place outside the c-th fire hawk's territory with Equation (12)
        Evaluate the prey's new location with Equation (10)
      end
    end
    Determine the fitness values of the newly generated fire hawks and prey
    Update the central fire as the global best (GB) solution
  end while
  Return GB
end procedure
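As a rough illustration of how the steps of Algorithm 1 fit together, the following Python sketch implements a simplified fire hawk search for a continuous minimization problem. It is a sketch under stated assumptions, not the optimizer used in the study: the function name, the rule that at most a quarter of the population become fire hawks, the clipping to the search bounds, the greedy survivor selection, and the `sphere` test function are all choices made for this example.

```python
import math
import random


def fire_hawk_optimizer(fitness, dim, low, high, pop=20, iters=50, seed=0):
    """Simplified Fire Hawk Optimizer sketch that minimises `fitness`."""
    rng = random.Random(seed)

    def clip(vec):
        return [min(max(x, low), high) for x in vec]

    # Random initialisation: y = y_min + rand * (y_max - y_min), as in Equation (4).
    cands = [[low + rng.random() * (high - low) for _ in range(dim)] for _ in range(pop)]
    best = min(cands, key=fitness)
    for _ in range(iters):
        cands.sort(key=fitness)
        n_hawks = rng.randint(1, max(1, pop // 4))  # random number of fire hawks
        hawks, prey = cands[:n_hawks], cands[n_hawks:]
        # Safe place outside the territories: mean of all prey, as in Equation (12).
        sp_all = [sum(col) / len(prey) for col in zip(*prey)]
        # Assign every prey to its nearest fire hawk (distance as in Equation (7)).
        territory = [[] for _ in hawks]
        for p in prey:
            nearest = min(range(n_hawks), key=lambda c: math.dist(hawks[c], p))
            territory[nearest].append(p)
        new_cands = []
        for c, hawk in enumerate(hawks):
            near = rng.choice(hawks)
            r1, r2 = rng.random(), rng.random()
            # Equation (8): move the hawk using the global best and another hawk.
            new_cands.append(clip([h + (r1 * g - r2 * o) for h, g, o in zip(hawk, best, near)]))
            if not territory[c]:
                continue
            # Safe place inside the territory: mean of its prey, as in Equation (11).
            sp_c = [sum(col) / len(territory[c]) for col in zip(*territory[c])]
            for p in territory[c]:
                r3, r4, r5, r6 = (rng.random() for _ in range(4))
                alter = rng.choice(hawks)
                # Equations (9) and (10): two candidate moves for each prey.
                new_cands.append(clip([x + (r3 * h - r4 * s) for x, h, s in zip(p, hawk, sp_c)]))
                new_cands.append(clip([x + (r5 * a - r6 * s) for x, a, s in zip(p, alter, sp_all)]))
        # Keep the fittest `pop` candidates; the best one acts as the central fire GB.
        cands = sorted(new_cands + [best], key=fitness)[:pop]
        best = cands[0]
    return best


def sphere(v):
    """Toy objective (assumption): sum of squares, minimised at the origin."""
    return sum(x * x for x in v)


best = fire_hawk_optimizer(sphere, dim=3, low=-5.0, high=5.0)
```

For feature selection, the continuous position vector would additionally have to be mapped to a binary feature mask and scored by a classifier; that mapping is not specified in the text above and is left out of the sketch.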

The genetic algorithm (GA) is a population-based optimization algorithm that improves solutions by evaluating many candidates at once [26]. It can optimize a large number of model parameters, lends itself to parallel evaluation, and handles continuous or discrete variables. Here, the GA is employed to determine the optimal hyperparameters for CNN model training. In GA-based hyperparameter optimization (HPO), each hyperparameter is represented by a chromosome, and every chromosome has many genes encoded in binary format. Selection, crossover, and mutation are applied to these chromosomes to estimate the best parameters. Chromosomes with higher fitness values have a higher chance of being selected and passed to the next generation, where they produce individuals that combine features from both parents. Crossover exchanges a proportion of the genes between chromosomes. Mutation creates new chromosomes by randomly changing one or more genes. Together, crossover and mutation give the next generation a wider variety of features and reduce the probability of missing important ones. Algorithm 2 illustrates the pseudo-code for GA-based HPO.

Algorithm 2: Pseudo-code for GA-based HPO.

Hyperparameter Optimization of Genetic Algorithm
Input: Size of population: p
       Maximum number of generations: N
Output: Overall best solution (top hyper-parameters): H_{Top}
Step 1: Begin
Step 2: Generate an initial population of p chromosomes H_{j} (j = 1, 2, 3, \dots, p)
Step 3: Establish the generation counter g = 0
Step 4: while g < N do
Step 5:   Assess and update the CNN model
Step 6:   Form the subsequent generation (keep the fittest individual)
Step 7:   According to fitness, choose chromosome pairs from the population
Step 8:   Apply crossover to the chosen chromosomes
Step 9:   Transform the offspring through mutation
Step 10:  Substitute the old population with the new one
Step 11:  g = g + 1
Step 12: end
Step 13: return H_{Top}
Step 14: end
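The loop in Algorithm 2 can be sketched as a small elitist GA over binary chromosomes. The sketch below is illustrative only: tournament selection, single-point crossover, the rate values, and the toy fitness (the number of ones in the chromosome) are assumptions for the example. In the setting described above, the fitness would instead come from training and validating the CNN with the decoded hyperparameters.

```python
import random


def genetic_algorithm(fitness, n_genes, pop_size=20, generations=30,
                      crossover_rate=0.9, mutation_rate=0.02, seed=0):
    """Maximise `fitness` over binary chromosomes with a basic elitist GA."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        next_gen = [ranked[0][:]]  # elitism: keep the fittest individual
        while len(next_gen) < pop_size:
            # Tournament selection: the fitter of two random individuals wins.
            p1 = max(rng.sample(population, 2), key=fitness)
            p2 = max(rng.sample(population, 2), key=fitness)
            child = p1[:]
            if rng.random() < crossover_rate:
                cut = rng.randint(1, n_genes - 1)  # single-point crossover
                child = p1[:cut] + p2[cut:]
            # Bit-flip mutation: each gene flips with a small probability.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            next_gen.append(child)
        population = next_gen  # substitute the old population with the new one
    return max(population, key=fitness)


# Toy fitness (assumption): count the ones in the chromosome.
best = genetic_algorithm(sum, n_genes=12)
```

Keeping the fittest individual unchanged in every generation guarantees that the best fitness found never decreases from one generation to the next.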

3.3. Classification for Intrusion Detection Using Optimized Residual Dense-Assisted Multi-Attention Transformer

Classification is the process of organizing data into homogeneous groups or classes based on shared features. This study employs an optimized residual dense-assisted multi-attention transformer (Op-ReDMAT) model to classify the selected features.

The proposed transformer encoder layer is used to predict the frequency distributions of different cyber-attacks based on the mel-frequency cepstral coefficient (MFCC) structure of each attack. Because an attack spans an entire frequency distribution rather than a single time step, the transformer's multi-head self-attention layer lets the network attend to various earlier time steps while anticipating the next ones. To reduce the number of trainable parameters of the network, the MFCC input features are mapped into the transformer block. The transformer architecture uses a set of key-value pairs $(K, V)$ whose dimensions equal the length of the input sequence; the keys and values form the hidden state of the encoder. The decoder maps a query $Q$ and the $K$-$V$ pairs, written $(Q, K, V)$, to the output sequence. The decoder's output is a weighted sum of all values in the stored $(K, V)$ representation of the inputs. The weights come from the transformer's self-attention, defined as the scaled dot product of the query with all the keys, which yields the alignment with each hidden state, as illustrated in Equation (13):

Attention(Q, K, V) = softmax\left(\frac{Q K^{T}}{\sqrt{n}}\right) V

(13)
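A small pure-Python sketch makes the computation concrete. It follows the standard scaled dot-product attention, in which the softmax weights are multiplied by the values $V$; the function names and the toy two-token input `X` are assumptions introduced for illustration.

```python
import math


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q, K, V):
    """Compute softmax(Q K^T / sqrt(n)) V for row-major lists of vectors."""
    n = len(K[0])  # key dimension used for scaling
    output = []
    for q in Q:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(n) for k in K]
        weights = softmax(scores)
        # Each output row is a weighted sum of the value vectors.
        output.append([sum(w * v[j] for w, v in zip(weights, V))
                       for j in range(len(V[0]))])
    return output


# Self-attention on a toy sequence of two tokens with two features each.
X = [[1.0, 0.0], [0.0, 1.0]]
context = attention(X, X, X)
```

Each output row is a convex combination of the value rows, so a token's representation becomes a mixture of the tokens it attends to most strongly.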

The dot product for the sequence output at time step $t$ is scaled by the hidden state dimension $n$. Several self-attention operations can be used in parallel: multi-head self-attention (MHSA) lets each head weight the terms differently, depending on a subspace of the input sequence. The outputs of the attention heads are combined and multiplied by a weight matrix, which reduces the dimension of the encoded state. Instead of the single feed-forward layer of a standard transformer encoder, this study applies Conv-1D layers to the stored latent space of the attention heads. Multi-head attention is computed with Equations (14) and (15):

MultiHead(Q, K, V) = [head_{1}; head_{2}; \dots; head_{n}] W^{O}

(14)

head_{i} = Attention(Q W_{i}^{Q}, K W_{i}^{K}, V W_{i}^{V})

(15)

Here, $W_{i}^{Q}$, $W_{i}^{K}$, and $W_{i}^{V}$ are the learnable parameter matrices of the $i$-th head, and $W^{O}$ is the output weight matrix of the multi-head attention layer. The structure of the multi-head attention transformer is shown in Figure 2.
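Equations (14) and (15) can be sketched in pure Python as follows. The example is a minimal self-attention version ($Q = K = V = X$) with hand-picked projection matrices; the helper names, the two-head configuration, and the matrix values are assumptions for illustration, and the attention helper uses the standard form that multiplies the softmax weights by the values.

```python
import math


def matmul(A, B):
    """Multiply two row-major matrices given as lists of lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def attention(Q, K, V):
    """Standard scaled dot-product attention (softmax weights times V)."""
    scale = math.sqrt(len(K[0]))
    out = []
    for q in Q:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in K]
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        out.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return out


def multi_head(X, heads, W_O):
    """heads: list of (W_Q, W_K, W_V) triples; W_O: output projection matrix."""
    # Equation (15): every head attends in its own projected subspace.
    head_outputs = [attention(matmul(X, W_Q), matmul(X, W_K), matmul(X, W_V))
                    for W_Q, W_K, W_V in heads]
    # Equation (14): concatenate the heads feature-wise, then project with W^O.
    concat = [sum((h[t] for h in head_outputs), []) for t in range(len(X))]
    return matmul(concat, W_O)


# Toy input: two tokens, two features, two heads of width one (all hypothetical).
X = [[1.0, 0.0], [0.0, 1.0]]
zero = [[0.0], [0.0]]
heads = [(zero, zero, [[1.0], [0.0]]), (zero, zero, [[0.0], [1.0]])]
W_O = [[1.0, 0.0], [0.0, 1.0]]
encoded = multi_head(X, heads, W_O)
```

With zero query and key projections every head attends uniformly, so each head simply averages its projected values; trained projections would instead give each head a different focus.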

Four identical stacked transformer encoder blocks are used to categorize the different attacks. Each block contains one MHSA layer and a fully connected feed-forward layer. A skip connection and a normalization layer follow the MHSA layer, and another skip connection and normalization layer follow the feed-forward layer. The skip connection adds the output of the MHSA layer to the original embedding, and the norm layer is applied to the combined embeddings after this residual connection. The normalization is similar to batch normalization but is adapted to sequential inputs and, unlike batch normalization, is also applied in the same way during testing.

A residual dense network (RDN) built from 1-dimensional (1D) convolutions classifies segments of 1 s in length, that is, $(2 \times 2048)$-dimensional segments, as positive or negative. The RDN has a depth of 54: it is divided into 27 blocks with two convolutional layers each. Blocks 5, 8, 11, 14, and 17 have residual connections with a stride of two, which halves the dimensionality. The remaining blocks use identity residual connections. Figure 3 illustrates the structure of the RDN.

The output of a residual block is written as

j = f(y) + h(y)

(16)

Here, $f(y)$ denotes the block of two convolutional layers, and $y$ is the input of the first layer. The residual function $h(y)$ is either the identity function $h(y) = y$ or a strided convolutional layer. Each convolutional layer is followed by batch normalization and the ReLU activation function. Finally, two further individual convolutional layers reduce the output to a binary result.
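The identity-shortcut case of Equation (16) can be sketched with a tiny 1D convolution in pure Python. This is a simplified illustration: batch normalization is omitted, the kernels are fixed rather than learned, and the function names and sample values are assumptions for the example.

```python
def conv1d(signal, kernel):
    """'Same'-padded 1D convolution (cross-correlation) with stride 1.

    Assumes an odd kernel length so the output keeps the input length.
    """
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    return [sum(k * padded[i + j] for j, k in enumerate(kernel))
            for i in range(len(signal))]


def relu(values):
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in values]


def residual_block(y, kernel_a, kernel_b):
    """Equation (16): j = f(y) + h(y) with the identity shortcut h(y) = y."""
    # f(y): two convolutional layers with a ReLU in between (no batch norm here).
    f_y = conv1d(relu(conv1d(y, kernel_a)), kernel_b)
    return [f + s for f, s in zip(f_y, y)]


# Hypothetical input segment and smoothing kernels.
segment = [1.0, 2.0, 3.0]
output = residual_block(segment, [0.25, 0.5, 0.25], [0.25, 0.5, 0.25])
```

If the convolutional branch outputs zeros, the block passes its input through unchanged, which is why stacking many such blocks remains trainable at a depth of 54 layers.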

4. Results and Discussion

This study proposes an enhanced DL technique to detect cyber-attacks automatically. The work is implemented in Python and evaluated on two datasets, UNSW-NB15 and CICIDS2017. The system configuration is shown in Table 2.

4.1. Dataset Description

4.1.1. UNSW-NB15

The raw network packets of the UNSW-NB15 dataset were generated with the IXIA PerfectStorm tool in the UNSW Canberra Cyber Range Lab to create a hybrid of real modern normal activities and synthetic contemporary attack behaviors. The tcpdump tool was used to capture 100 GB of raw traffic. The dataset covers nine attack types: fuzzers, analysis, backdoors, denial-of-service, exploits, generic, reconnaissance, shellcode, and worms.

4.1.2. CICIDS2017

The CICIDS2017 dataset contains benign traffic and the most recent common attacks and closely resembles real-world data (PCAPs). It also includes the results of network traffic analysis with CICFlowMeter, with flows labeled according to time stamps, source and destination IP addresses, protocols, and attack types (CSV files).

4.2. Analysis of Performance Metrics

This section details the performance metrics: accuracy, precision, recall, and F1-score.

Accuracy: The number of correctly classified samples divided by the total number of samples, as expressed in Equation (17):

Accuracy (A) = \frac{T_{N} + T_{P}}{T_{N} + T_{P} + F_{N} + F_{P}}

(17)

Here, $T_{N}$, $T_{P}$, $F_{N}$, and $F_{P}$ denote the true negatives, true positives, false negatives, and false positives, respectively.

Precision: The number of correctly predicted attack samples divided by the number of samples classified as positive, as illustrated in Equation (18):

Precision (P) = \frac{T_{P}}{T_{P} + F_{P}}

(18)

Recall: The ratio of the true positive samples to the total number of actual positive samples in the dataset, that is, the sum of the true positives and false negatives, as illustrated in Equation (19):

Recall (R) = \frac{T_{P}}{T_{P} + F_{N}}

(19)

F1-Score: The harmonic mean of precision and recall, which combines the two in a single metric, as expressed in Equation (20):

F1-score = 2 \times \frac{P \times R}{P + R}

(20)
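The four metrics follow directly from the confusion-matrix counts, as the short Python sketch below shows. The function name and the counts in the example are hypothetical and are not taken from the experiments reported in this article.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1-score from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)            # Equation (17)
    precision = tp / (tp + fp)                            # Equation (18)
    recall = tp / (tp + fn)                               # Equation (19)
    f1 = 2 * precision * recall / (precision + recall)    # Equation (20)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts: 90 attacks caught, 20 missed, 10 false alarms, 80 true normals.
metrics = classification_metrics(tp=90, tn=80, fp=10, fn=20)
```

With these counts, precision (0.90) is higher than recall (about 0.82) because the hypothetical detector misses more attacks than it raises false alarms, and the F1-score falls between the two.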

4.3. Performance Evaluation of UNSW-NB15

Figure 4 compares the proposed method with existing methods, namely MOPSO-Levy-KNN, AAFSA with GA-FR-CNN, and GWO-PSO-RF, in terms of precision, accuracy, recall, error, and F1-measure. GWO-PSO-RF attained a low F1-measure of 79.0% and a high error value of 15.54; the model produced low-quality solutions, had high complexity, and failed to detect cyber-attacks. MOPSO-Levy-KNN achieved a higher F1-measure of 87.3% but had storage and memory issues on large datasets. AAFSA with GA-FR-CNN attained a higher recall of 94.5%, but the model affected the network's convergence rate on the error surface.

The proposed method yields higher results than MOPSO-Levy-KNN, AAFSA with GA-FR-CNN, and GWO-PSO-RF in precision, accuracy, recall, error, and F1-measure. Compared to existing models, such as GWO-PSO-RF and MOPSO-Levy-KNN, the proposed method has higher F1-measure and recall values to detect more accurate and effective cyber-attacks. The study’s implications are that the suggested model presents a more effective and efficient solution to protect against cyber threats.

Figure 5a,b shows the accuracy and loss curves. The curves indicate that the proposed technique has a positive effect on convergence. The dataset was divided into two parts, training and testing, with 75% of the data used for training and 25% for testing. The processed training set was used to train the proposed technique for 200 epochs, with the learning rate set to 0.1.
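The 75%/25% partition can be sketched as a seeded shuffle followed by a cut. The study does not state whether the split was shuffled or stratified, so the shuffling, the fixed seed, and the function name `split_train_test` are assumptions made only for illustration.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_train_test(samples: Sequence[T], train_fraction: float = 0.75,
                     seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of the samples and cut it into training and testing parts."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    shuffled = list(samples)               # copy, so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split repeatable
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical stand-in for pre-processed records.
records = list(range(1000))
train, test = split_train_test(records)
print(len(train), len(test))
```

For strongly imbalanced intrusion data, a stratified split that preserves the class proportions in both parts may be preferable; this sketch does not do that.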

Figure 6 illustrates the confusion matrix of the proposed Op-ReDMAT method generated on UNSW-NB15. The proposed model classifies 13,905 samples as normal, 9939 as generic, and 8407 as exploits, among other classes. Thus, the confusion matrix shows that the proposed model classifies the samples correctly.

The ROC curve for detecting cyber-attacks is shown in Figure 7. A receiver operating characteristic (ROC) curve graphically represents the performance of a binary classifier model at different threshold settings. The true positive rate (TPR) and false positive rate (FPR) are calculated at every threshold level to create the ROC curve. The ROC curve of the proposed model reached an area under the curve (AUC) of 97.38%.
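The construction of an ROC curve can be sketched as follows: the threshold is swept from the highest score downward, the (FPR, TPR) pair is recorded at each distinct score, and the area is obtained with the trapezoidal rule. The function names, labels, and scores are hypothetical and serve only to illustrate the procedure; they are not the study's data.

```python
from typing import List, Sequence, Tuple


def roc_points(labels: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (FPR, TPR) points, one per distinct score threshold."""
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative sample")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]  # threshold above every score: nothing flagged
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Lowering the threshold to this score flags every sample tied at it.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical labels (1 = attack) and classifier scores.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(round(auc(roc_points(labels, scores)), 4))
```

In this toy example, eight of the nine attack/normal pairs are ranked correctly, so the area is 8/9.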

The TPR and FPR for the UNSW-NB15 dataset are shown in Figure 8a,b. In Figure 8a, the proposed model attained a TPR of 0.9982. The existing RNN achieved a lower TPR of 0.94 and needs more labeled data and space to train. The existing LR attained a TPR of 0.996 but had a high computational load and was very expensive. In Figure 8b, the proposed model attained an FPR of 0.028, whereas the existing SGD attained 0.073 and had high computational requirements. This analysis shows that the proposed model performs better in terms of both TPR and FPR.

According to the findings displayed in Table 3, the proposed model has the highest accuracy, 98.82%, compared to the other models, including DBN, RNN, LR, and SGD. Although LR and SGD have a perfect recall and a high precision of 98%, the proposed model attains a precision of 97.2% and a recall of 98.5%, which also demonstrates a high ability to detect cyber-attacks.

These results, together with the findings presented in Figure 7 and Figure 8, indicate that the proposed model outperforms existing models, with a true positive rate of 0.9982 and a false positive rate of 0.028. The ROC curve, with an AUC of 97.38%, also shows the model’s effectiveness in differentiating between actual threats and false alarms. Thus, the proposed model detects cyber-attacks with high accuracy and reliability and addresses the drawbacks of the conventional models.

4.4. Performance Analysis for CICIDS2017

Figure 9 compares the existing and proposed methods in terms of precision, accuracy, F1-score, and recall. The DNN, LSTM, and CNN are compared with the proposed technique. The existing DNN attained an accuracy of 94.6% but suffers from interpretability issues. The existing LSTM achieved a low F1-score of 93.5% and needs more training data. Furthermore, the existing CNN had an average value of 96.9% and has high computational requirements. Compared to the existing methods, the proposed technique attained higher values.

Figure 9. Comparison between the existing and proposed methods.

Figure 9 also confirms the improvement of the proposed technique over existing models such as DNN, LSTM, and CNN in accuracy, precision, F1-score, and recall. DNN has high accuracy, but the interpretability of its results could be improved; LSTM needs a large amount of training data; and CNN requires more computational power. The proposed method obtains better values for these metrics and overcomes the above shortcomings, which suggests that it has good application prospects for efficient time series data processing. In conclusion, the proposed technique offers a better solution than the existing methods in terms of improved performance and reduced chances of failure.

Figure 10a,b illustrates the accuracy and loss curve analysis. The accuracy values gradually increase up to 50 epochs, at which point the model attains high accuracy. For the loss function, the testing loss is high, from 0.1 after 50 epochs, and the training loss is below 0. Thus, the proposed model consumes less time for training and testing.

Figure 10. (a,b): Accuracy and loss analysis for the CICIDS2017 dataset.

The results presented in Figure 10 demonstrate the learning effectiveness of the proposed model. The steady rise in accuracy up to 50 epochs shows that the model is learning patterns and reaches its maximum accuracy. The low training loss, together with the testing loss leveling off around the 50th epoch, indicates that the model learns well and does not overfit. Consequently, these findings suggest that the proposed model is promising for practical implementation in terms of accuracy and the time needed for the training and testing phases.

Figure 11 illustrates the confusion matrix of the proposed Op-ReDMAT method generated on CICIDS 2017. The proposed model predicts 97 normal samples and 733,593 cyber-attacks from the total number of samples.

The confusion matrix depicted in Figure 11 confirms the capability of the proposed Op-ReDMAT method to distinguish between normal and cyber-attack instances in the CICIDS 2017 dataset. The model detects 97 normal samples and 3593 cyber-attack samples, which suggests efficiency in detecting malicious activity. The high number of correctly identified cyber-attacks reflects the accuracy of the model in differentiating between regular traffic and malicious behaviors. Overall, these results indicate that the Op-ReDMAT method is effective in detecting cyber-attacks and has potential as a tool for improving the reliability of cybersecurity systems.

Figure 11. Confusion matrix for the CICIDS2017 dataset.

Figure 12 shows the ROC value calculated from the true and false positive rates. The proposed model achieves a high ROC value of 98.5. Table 3 and Table 4 illustrate the performance analysis between the proposed and existing models for UNSW-NB15 [28] and CICIDS2017 [27], respectively.

The ROC value of 98.5 for the proposed model, shown in Figure 12, means that the model differentiates precisely between true positives and false positives. In other words, the model is very effective at identifying cyber-attacks while issuing very few false alarms. The extended performance analysis in Table 4 confirms that the proposed model achieves superior performance over existing models on the CICIDS2017 dataset. Thus, the results show that the proposed model has a higher detection capability and accuracy and can be considered an improvement over current cyber-attack detection methods.

Table 4 shows the performance comparison of the different models on the CICIDS2017 dataset. Compared to other methods, the proposed model yields a higher accuracy of 99.12%, a precision of 98.6%, a recall of 98.2%, and an F1-score of 98.8%. In contrast, the single LSTM layer and bi-directional LSTM models both show 97.7% accuracy, with F1-scores of 96.9% and 97.6%, respectively. The existing CNN-LSTM has a lower accuracy of 93.0% with an F1-score of 81.3%, which signifies relatively low performance.

These findings are valuable because the proposed approach achieves higher accuracy and detection rates. This is evident from the high values of precision, recall, and F1-score, which show that it accurately identifies positive samples as true positives while minimizing the number of false negatives. Finally, the proposed model’s high accuracy and comprehensive performance metrics indicate its higher efficacy against various cyber-attacks in contrast to the existing models.

4.5. Discussion

Different studies have employed multiple techniques for intrusion detection systems [16,17,18,19,20,21]. However, these models have a very high computational load, are slow on large datasets, generalize poorly, and are highly complex. An optimized residual dense-assisted multi-attention transformer (Op-ReDMAT) model was introduced to overcome these issues. On the UNSW-NB15 dataset, the model attained a precision, accuracy, F1-score, error rate, and recall of 97.2%, 98.82%, 97.8%, 2.58, and 98.5%, respectively. On the CICIDS2017 dataset, it attained a precision, accuracy, F1-score, and recall of 98.6%, 99.12%, 98.8%, and 98.2%, respectively. Table 5 shows that the proposed model performed better than the existing methods.

Classifier fusion, the procedure of combining the outputs of multiple classifiers to improve the overall decision, has recently attracted much attention as a way to improve the stability and accuracy of intrusion detection systems (IDS). Salazar et al. [22] showed that fusing different classifiers with methods such as majority vote or weighted average can improve overall accuracy. This approach is particularly suitable for the proposed Op-ReDMAT model, since integrating classifier fusion can improve the detection rate and reduce the variance across a broad spectrum of cyber threats. Adding classifier fusion to Op-ReDMAT could make intrusion detection more robust, particularly with diverse and large sets of samples such as UNSW-NB15 and CICIDS2017. Future work may combine classifier fusion strategies with the proposed model to enhance the reliability of attack identification and to make the IDS more accurate across a variety of attack cases, which would make it more actionable in real-world settings.
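The two fusion rules named above, majority vote and weighted average, can be sketched generically as follows. This is not the specific late-fusion scheme of Salazar et al. [22] and is not part of Op-ReDMAT; the function names, class labels, probabilities, and weights are hypothetical.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def majority_vote(predictions: Sequence[Hashable]) -> Hashable:
    """Fuse hard decisions: one predicted label per classifier."""
    if not predictions:
        raise ValueError("at least one prediction is required")
    # most_common keeps first-seen order among ties, so the earliest classifier wins a tie.
    return Counter(predictions).most_common(1)[0][0]


def weighted_average(prob_vectors: Sequence[Sequence[float]],
                     weights: Sequence[float]) -> List[float]:
    """Fuse soft decisions: one class-probability vector per classifier."""
    if len(prob_vectors) != len(weights) or not prob_vectors:
        raise ValueError("need one weight per classifier")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    fused = [0.0] * len(prob_vectors[0])
    for probs, weight in zip(prob_vectors, weights):
        for k, p in enumerate(probs):
            fused[k] += weight * p / total  # normalised so the result still sums to 1
    return fused


# Three hypothetical classifiers voting on one flow.
print(majority_vote(["attack", "normal", "attack"]))

# Two hypothetical classifiers giving [P(normal), P(attack)], equally weighted.
print(weighted_average([[0.9, 0.1], [0.5, 0.5]], weights=[1.0, 1.0]))
```

In practice, the weights would typically be derived from each classifier's validation performance; equal weights reduce the rule to a plain average.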

A significant limitation of this study is the lack of rigorous statistical analysis of the results: confidence intervals and p-values were not reported. It is, therefore, important to supplement the results with such measures to evaluate their robustness and significance. Further, although the performance of the proposed model is promising, the study requires a more thorough analysis of the computational resources needed to deploy the model on a larger scale. This would give a broader view of its usability in different real-life situations.

Future studies should use statistical measures, such as p-values and confidence intervals, to offer more reliable outcomes. Enlarging the scope of the study to incorporate a wider range of datasets and using real-time data to test the model would further confirm the model’s usability in other settings. Moreover, an analysis of computational resource utilization and an assessment of the model in different network topologies can improve the applicability and efficiency of the proposed approach against cybersecurity issues.
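One simple way to attach a confidence interval to a reported accuracy is the Wilson score interval for a binomial proportion, sketched below. The test-set size of 10,000 is a hypothetical value used only for illustration, since the number of test samples behind each reported accuracy is not given here; the function name is likewise an assumption.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a proportion (z = 1.96 gives roughly 95% coverage)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("require n > 0 and 0 <= successes <= n")
    p_hat = successes / n
    denom = 1.0 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


# Hypothetical: 9882 of 10,000 test samples classified correctly (98.82% accuracy).
low, high = wilson_interval(9882, 10_000)
print(f"95% CI for accuracy: [{low:.4f}, {high:.4f}]")
```

The interval narrows as the number of test samples grows, which is why reporting the test-set size alongside accuracy matters when comparing models whose scores differ by less than a percentage point.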

5. Conclusions

The most critical issue in the present age of network communication is network intrusion. The increasing frequency of cyber-attacks seriously endangers networks. Numerous studies have been performed to identify how to prevent network intrusion and safeguard privacy and network security. In this study, new and enhanced DL techniques were developed to detect cyber-attacks. Min-max normalization was utilized to normalize the data. Because the training data were imbalanced, several attack classes were underrepresented; to address this, the synthetic minority oversampling technique (SMOTE) was applied. Feature selection was performed with the Hybrid Genetic Fire Hawk Optimizer (HGFHO) to select optimal features. Finally, the optimized residual dense-assisted multi-attention transformer (Op-ReDMAT) model was employed to classify the selected features. A performance analysis was carried out to show the efficiency of the proposed model. On the UNSW-NB15 dataset, the model attained a precision, accuracy, F1-score, error rate, and recall of 97.2%, 98.82%, 97.8%, 2.58, and 98.5%, respectively. On the CICIDS2017 dataset, it attained a precision, accuracy, F1-score, and recall of 98.6%, 99.12%, 98.8%, and 98.2%, respectively. In future work, to provide the cyber-security community with automated security services, the researchers aim to expand the cyber-security datasets and create a data-driven IDS.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the author.

Acknowledgments

The author would like to thank the Deanship of Scientific Research at Shaqra University for supporting this work.

Conflicts of Interest

The author declares no conflict of interest.

References

1. Alqahtani, H.; Sarker, I.H.; Kalim, A.; Minhaz Hossain, S.M.; Ikhlaq, S.; Hossain, S. Cyber intrusion detection using machine learning classification techniques. In Proceedings of the Computing Science, Communication and Security: First International Conference, COMS2 2020, Gujarat, India, 26–27 March 2020; Revised Selected Papers 1. Springer: Singapore, 2020; pp. 121–131.
2. Mohammadi, S.; Mirvaziri, H.; Ghazizadeh-Ahsaee, M.; Karimipour, H. Cyber intrusion detection by combined feature selection algorithm. J. Inf. Secur. Appl. 2019, 44, 80–88.
3. Anwar, S.; Mohamad Zain, J.; Zolkipli, M.F.; Inayat, Z.; Khan, S.; Anthony, B.; Chang, V. From intrusion detection to an intrusion response system: Fundamentals, requirements, and future directions. Algorithms 2017, 10, 39.
4. He, H.; Sun, X.; He, H.; Zhao, G.; He, L.; Ren, J. A Novel Multimodal-Sequential Approach Based on Multi-View Features for Network Intrusion Detection. IEEE Access 2019, 7, 183207–183221.
5. Ring, M.; Wunderlich, S.; Gruedl, D.; Landes, D.; Hotho, A. Generation Scripts for the Coburg Intrusion Detection Data Sets (Cidds). 2017. Available online: https://github.com/markusring/CIDDS (accessed on 11 May 2020).
6. Rashid, A.; Siddique, M.J.; Ahmed, S.M. Machine and Deep Learning Based Comparative Analysis Using Hybrid Approaches for Intrusion Detection System. In Proceedings of the 2020 3rd International Conference on Advancements in Computational Sciences (ICACS), Lahore, Pakistan, 17–19 February 2020; pp. 1–9.
7. Małowidzki, M.; Berezinski, P.; Mazur, M. Network Intrusion Detection: Half a Kingdom for a Good Dataset. In Proceedings of the NATO STO SAS-139 Workshop, Lisbon, Portugal, 20–21 April 2015; Available online: https://pdfs.semanticscholar.org/b39e/0f1568d8668d00e4a8bfe1494b5a32a17e17.pdf (accessed on 2 February 2021).
8. Al-Emadi, S.; Al-Mohannadi, A.; Al-Senaid, F. Using deep learning techniques for network intrusion detection. In Proceedings of the 2020 IEEE International Conference on Informatics, IoT, and Enabling Technologies (ICIoT), Doha, Qatar, 2–5 February 2020; pp. 171–176.
9. Vinayakumar, R.; Soman, K.P.; Poornachandran, P. Applying convolutional neural network for network intrusion detection. In Proceedings of the 2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI), Udupi, India, 13–16 September 2017.
10. Ciaburro, G.; Venkateswaran, B. Neural Networks with R: Smart Models Using CNN, RNN, Deep Learning, and Artificial Intelligence Principles; Packt Publishing: Birmingham, UK, 2017.
11. Bottou, L.; Curtis, F.E.; Nocedal, J. Optimization methods for large-scale machine learning. Siam Rev. 2018, 60, 223–311.
12. Jyothsna, V.V.R.P.V.; Prasad, V.V.R.; Prasad, K.M. A review of anomaly-based intrusion detection systems. Int. J. Comput. Appl. 2011, 28, 26–35.
13. Hara, K.; Shiomoto, K. Intrusion detection system using semi-supervised learning with adversarial auto-encoder. In Proceedings of the NOMS 2020-2020 IEEE/IFIP Network Operations and Management Symposium, Budapest, Hungary, 20–24 April 2020; pp. 1–8.
14. Sarker, I.H.; Kayes, A.S.M.; Badsha, S.; Alqahtani, H.; Watters, P.; Ng, A. Cybersecurity data science: An overview from a machine learning perspective. J. Big Data 2020, 7, 41.
15. Sarker, I.H.; Abushark, Y.B.; Alsolami, F.; Khan, A.I. Intrudtree: A machine learning-based cyber security intrusion detection model. Symmetry 2020, 12, 754.
16. Hnamte, V.; Nhung-Nguyen, H.; Hussain, J.; Hwa-Kim, Y. A novel two-stage deep learning model for network intrusion detection: LSTM-AE. IEEE Access 2023, 11, 37131–37148.
17. Kabir, M.H.; Rajib, M.S.; Rahman AS, M.T.; Rahman, M.M.; Dey, S.K. Network intrusion detection using UNSW-NB15 dataset: Stacking machine learning based approach. In Proceedings of the 2022 International Conference on Advancement in Electrical and Electronic Engineering (ICE), Gazipur, Bangladesh, 24–26 February 2022; pp. 1–6.
18. Li, Z.; Luo, X.; Zhang, Y.; Yang, X.; Wang, X. HC-DTTSVM: A network intrusion detection method based on decision tree twin support vector machine and hierarchical clustering. IEEE Access 2023, 11, 21404–21416.
19. Hu, X.; Meng, X.; Liu, S.; Liang, L. An Improved Algorithm for Network Intrusion Detection Based on Deep Residual Networks. IEEE Access 2024, 12, 66432–66441.
20. El-Rady, A.A.; Osama, H.; Sadik, R.; El Badwy, H. Network Intrusion Detection CNN Model for Realistic Network Attacks Based on Network Traffic Classification. In Proceedings of the 2023 40th National Radio Science Conference (NRSC), Giza, Egypt, 30 May–1 June 2023; Volume 1, pp. 167–178.
21. Du, J.; Yang, K.; Hu, Y.; Jiang, L. NIDS-CNNLSTM: Network intrusion detection classification model based on deep learning. IEEE Access 2023, 11, 24808–24821.
22. Salazar, A.; Vargas, N.; Safont, G.; Vergara, L. Late Fusion for Improving Intrusion Detection in a Network Traffic Dataset. In Proceedings of the 2021 International Conference on Computational Science and Computational Intelligence (CSCI), Las Vegas, NV, USA, 15–17 December 2021; pp. 1684–1689.
23. Salazar, A.; Vergara, L.; Safont, G. Generative Adversarial Networks and Markov Random Fields for oversampling very small training sets. Expert Syst. Appl. 2021, 163, 113819.
24. Yi, H.; Jiang, Q.; Yan, X.; Wang, B. Imbalanced classification based on minority clustering smote with wind turbine fault detection application. IEEE Trans. Ind. Inform. 2020, 17, 5867–5875.
25. Shishehgarkhaneh, M.B.; Azizi, M.; Basiri, M.; Moehler, R.C. BIM-based resource tradeoff in project scheduling using fire hawk optimizer (FHO). Buildings 2022, 12, 1472.
26. Latif, S.; Boulila, W.; Koubaa, A.; Zou, Z.; Ahmad, J. DTL-IDS: An optimized Intrusion Detection Framework using Deep Transfer Learning and Genetic Algorithm. J. Netw. Comput. Appl. 2024, 221, 103784.
27. Figueiredo, J.; Serrão, C.; de Almeida, A.M. Deep learning model transposition for network intrusion detection systems. Electronics 2023, 12, 293.
28. Rao, Y.N.; Suresh Babu, K. An imbalanced generative adversarial network-based approach for network intrusion detection in an imbalanced dataset. Sensors 2023, 23, 550.

Figure 1. Proposed methodology for the current study.

Figure 2. Structure of multi-head attention transformer.

Figure 3. Structure and connection illustration of a residual dense network (RDN).

Figure 4. Performance metrics for the UNSW-NB15 dataset.

Figure 5. (a,b): Accuracy and loss analysis for the UNSW-NB15 dataset.

Figure 6. Confusion matrix of the proposed method for the UNSW-NB15 dataset.

Figure 7. Binary classifier model performance.

Figure 8. (a,b): True and false positive rate for the UNSW-NB15 dataset.

Figure 12. Performance analysis between the proposed and existing model.

Table 1. Comparison of Different models with their merits and demerits.

Author and Reference	Model	Merits	Demerits
Hnamte et al. [16]	LSTM and AE	It can handle longer sequences. It is able to train the data.	Previously reliable autoencoders frequently learn completely connected networks, which are prone to overfitting. Typical DNN and CNN models lead to restricted results.
Kabir et al. [17]	Two different ML models combined with extra tree classifier.	It has high stability and accuracy when handling difficult issues.	It leads to poor generalization of data. Network intrusion has been generalized.
Li et al. [18]	Decision tree twin SVM and hierarchical clustering	The model has higher speed. It is very easy to understand and use.	High complexity. Consumes more time. Based on complex calculations.
Hu et al. [19]	Hybrid attention mechanism combined with enhanced algorithm	The model has faster convergence.	It has high false alarm rate. Only three datasets lead to generalizing the results and precision accuracy.
El-Rady et al. [20]	Customized CNN	It automatically extracts intricate patterns and features from complex data.	It requires large amount of training data. It is very expensive. Slower tendency. Large memory footprints are essential.
Du et al. [21]	CNN and LSTM	It can handle sequential data.	The model is very slow to train on large datasets. Complex model to run. Requires very powerful hardware. Computational requirements are very high.

Table 2. System configuration for selected data sets.

Installed RAM	16.0 GB
Pen and Touch	No pen or Touch Input is available.
Type of System	x64-based processor, 64-bit operating system

Table 3. Comparison analysis of the existing and proposed models for the UNSW-NB15 dataset.

Models	Accuracy	Precision	Recall
DBN	92.3%	0.00%	0.00%
RNN	95.3%	0.00%	0.00%
LR	98.17%	98%	100%
SGD	97.9%	98%	100%
Proposed	98.82%	97.2%	98.5%

Table 4. Performance evaluation of proposed and existing techniques in the CICIDS2017 dataset.

Models	Accuracy	Precision	Recall	F1-Score
Single LSTM Layer	97.7%	0.00%	0.00%	96.9%
Bi-directional LSTM	97.7%	0.00%	0.00%	97.6%
CNN-LSTM	93.0%	86.4%	76.8%	81.3%
Proposed	99.12%	98.6%	98.2%	98.8%

Table 5. Comparison of the proposed model with the existing methods.

Methods	Precision	Accuracy	F1-Score	Recall
LSTM and AE [16]	0.00%	99.0%	0.00%	0.00%
Two different ML models combined with an extra tree classifier [17]	0.00%	96.2%	0.00%	0.00%
A hybrid attention mechanism combined with an enhanced algorithm [19]	95.03%	96.3%	95.0%	95.19%
Proposed Model	98.6%	99.12%	98.8%	98.2%

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
