Cyber security has a huge influence on a variety of essential infrastructures. Passive defense, however, is not an efficient tool for safeguarding against modern cyber security risks such as APT and zero-day attacks. Furthermore, as cyber threats grow more prevalent and long-lasting, the cost of mounting them is reduced by the variety of attack access vectors, high-level intrusion methods, and systematic attack tools. To maximize the safety level of key system assets, it is critical to build a new safety protection procedure that manages a wide range of attacks. As illustrated in Figure 1, this study builds intelligent cyber security defenses and protections using HT-RLSTM: the scheme collects historical and current security status data and makes intelligent judgments for adaptive security management and control.
3.4. Handling Imbalanced and Overlapped Data
Learning from imbalanced data reveals a strong connection between class overlap and class imbalance, and handling imbalanced and overlapped data is one of the most critical steps in overcoming major cyber security issues. In malware detection, domain reputation, and network intrusion detection, imbalanced datasets are the norm. On such data, a model that claims "everything is harmless" may achieve high accuracy yet is useless, because the minority attack class is never discovered. The imbalance difficulty is not unique to cyber security, but it is a key component of many cyber security problems. With the given training inputs, the required performance level is not achieved because of the large degree of overlap and imbalance. To remove these detrimental effects, this work has developed a random over-sampling-based DBSCAN to overcome the imbalance and overlap issues. The developed technique first separates the overlapped data points into various clusters and thereafter, based on the class counts, balances the classes. The approach helps to remove noisy data points and handle uncertainty in the ground truth.
Initially, density-based clustering is performed on the data points. The approach depends upon two parameters: the minimum number of points (MinPts) and the radius Eps. Eps defines the radius of the circle drawn around a data point from the dataset, while MinPts is the minimum number of points required within that radius to form a dense cluster or region.
Based on Eps and MinPts, the points are categorized as core points, boundary points, and noise points. A data point is a core point if at least MinPts points lie within its Eps distance. A data point is a boundary point if it is not a core point but is a neighbour of one. Finally, if a point lies near neither a core nor a boundary point, it is a noise point.
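For illustration, the Eps/MinPts rule above can be checked directly on a small dataset; the following Python sketch (the function name and parameters are illustrative, not part of the proposed method) labels every point as core, boundary, or noise using pairwise Euclidean distances.

```python
import numpy as np

def label_points(X, eps, min_pts):
    """Label each point as 'core', 'boundary', or 'noise' under the
    Eps / MinPts rule (illustrative helper, not the proposed DIM variant)."""
    # Pairwise Euclidean distances between all data points.
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    neighbours = dists <= eps                       # points within the Eps circle
    is_core = neighbours.sum(axis=1) >= min_pts     # at least MinPts neighbours (self included)
    labels = []
    for i in range(len(X)):
        if is_core[i]:
            labels.append("core")
        elif np.any(is_core & neighbours[i]):       # neighbour of some core point
            labels.append("boundary")
        else:
            labels.append("noise")
    return labels
```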
The data points for this check are normally chosen at random, which may lead to high computation time along with a high error rate. This work therefore used the Deterministic Initialization Method (DIM) to select optimal data points.
In Equation (6), the optimal cluster point is expressed in terms of an upper bound and a lower bound whose values range between [0, 1]. The core points, boundary points, and noise points are then computed for the optimal data point based on Euclidean distance. Each data point is checked by drawing a circle of radius Eps and verifying that the MinPts condition is satisfied, using the Euclidean distance computed by Equations (7) and (8):
The data clusters are formed from the core points and boundary points, while the noisy data or outliers are discarded; finally, the overlapped classes are separated and a standard dataset is obtained. Next, the count of each class is taken. If the classes are in an imbalanced proportion, random over-sampling (ROS) is performed. ROS improves the balancing rate by picking up samples of records from the dataset: it copies records of the minority class so that its distribution matches that of the majority class. As Equation (9) shows, this yields a high balancing rate at the cost of an increased likelihood of overfitting. Now, the data are balanced and passed on for the selection of relevant features.
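As a minimal sketch of this two-stage idea, assuming off-the-shelf components rather than the exact DIM-based variant developed here, density-based clustering from scikit-learn can be used to discard the noise points, and imbalanced-learn's random over-sampler can then duplicate minority-class records:

```python
import numpy as np
from sklearn.cluster import DBSCAN
from imblearn.over_sampling import RandomOverSampler

def clean_and_balance(X, y, eps=0.5, min_pts=5):
    # Density-based clustering: points labelled -1 by DBSCAN are noise/outliers.
    clustering = DBSCAN(eps=eps, min_samples=min_pts).fit(X)
    keep = clustering.labels_ != -1                 # drop the noisy data points
    X_clean, y_clean = X[keep], y[keep]
    # Random over-sampling: duplicate minority-class records until the classes balance.
    ros = RandomOverSampler(random_state=0)
    return ros.fit_resample(X_clean, y_clean)
```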
3.5. Feature Selection
Feature selection is the method used to select the optimal attributes to develop an effective classification model. It reduces computation time while improving model accuracy. It is therefore important to evaluate the impact of every feature on the final result and to deactivate the features that do not contribute positively to it. In this step, selecting the features with positive outcomes is necessary to detect APT, or any other attack, and to achieve more accurate outputs. Current feature selection methods have a poor relevance rate when it comes to identifying the most significant features that characterize an assault.
In addition, the high computation time for validating the data leads to a high error rate. To overcome these issues, this work has developed a Sine Cosine-based Artificial Jellyfish Search (SC-AJS) optimizer. The feature selection is modeled on the natural behavior of jellyfish in the sea, whose search comes with extraordinary mechanisms: the high level of activity within a swarm of jellyfish, the method of switching from one movement to another, and the blooming approach. These mechanisms help to search for and select the best features for detecting various attacks. However, the plain jellyfish search suffers from high computational complexity, inaccurate multivariate feature selection, and a poor convergence rate, leading to a high error rate in selecting the most relevant features; the SC-AJS optimizer is developed to overcome these shortcomings.
The optimizer achieves the best fitness value, which corresponds to the most relevant features for detecting an attack. Equation (10) represents the best fitness value, which depends upon the objective function given by:
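The text does not spell out the form of Equation (10); a common wrapper-style objective for feature selection, shown below purely as an assumed illustration, weighs the classification error against the fraction of features retained (the trade-off weight alpha and the helper signature are hypothetical):

```python
import numpy as np

def feature_subset_fitness(mask, X, y, classifier, alpha=0.99):
    """Illustrative fitness: lower is better. `mask` is a boolean vector
    marking the selected features (hypothetical encoding)."""
    if not mask.any():
        return 1.0                                  # selecting nothing is the worst case
    error = 1.0 - classifier.fit(X[:, mask], y).score(X[:, mask], y)
    ratio = mask.sum() / mask.size                  # fraction of features retained
    return alpha * error + (1.0 - alpha) * ratio
```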
The selection follows the working mechanism of jellyfish (features) that are attracted toward the ocean current (datasets), which contains abundant nutrition. The current best solution is calculated by taking the average value of all vectors of every jellyfish along the direction of the sea current. The formulation of the ocean current is denoted by Equations (11) and (12) as,
where the terms denote, respectively, the number of jellyfish, the best location B of the jellyfish in the swarm at the present stage, the attraction control factor, the base location of each jellyfish, and the difference between B and this base location.
Equations (13) and (14) depict the probability obtained when the average distance between the locations of the jellyfish in the spatial distribution is considered, together with the distribution's standard deviation.
The latest location of every jellyfish is given by Equation (15), in which a distribution coefficient associated with the length of the ocean current appears.
The motion of the jellyfish is constrained to two groups, type A and type B motion. Type A motion describes the movement of each jellyfish around its own live location, and the corresponding location update of every jellyfish is given by Equation (16):
where the terms represent the upper bound and lower bound of the search space, respectively, and a motion coefficient related to the length of motion around the jellyfish's locations.
In type B motion, a jellyfish is selected at random; based on the amount of food at the selected jellyfish's position and the change in position, the motion is estimated along the direction of the food, and the location is thereafter updated using Equation (17),
A time control technique is introduced to determine the motion type over time. It is used to control the movement of the jellyfish toward the sea current and the level of both type A and type B motions in the swarm. As seen in Equation (18), the time control function is a randomly fluctuating value that varies from 0 to 1 over time.
where the specified time is expressed in terms of the iteration number, the maximum number of iterations bounds it, and an initial constant serves as the starting parameter.
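Equation (18) is not reproduced above; in the original artificial jellyfish search algorithm the time control function is commonly written as c(t) = |(1 − t/t_max)(2r − 1)| with r uniform in [0, 1]. Assuming that form, a sketch of the motion switch is:

```python
import random

def time_control(t, t_max):
    # c(t) = |(1 - t / t_max) * (2 * rand - 1)|: fluctuates between 0 and 1
    # and shrinks on average as the iterations progress.
    return abs((1.0 - t / t_max) * (2.0 * random.random() - 1.0))

def choose_motion(t, t_max, c0=0.5):
    c = time_control(t, t_max)
    if c >= c0:
        return "ocean_current"                      # follow the ocean current (Eqs. (11)-(12))
    # otherwise move inside the swarm: type A (passive) or type B (active) motion
    return "type_A" if random.random() > (1.0 - c) else "type_B"
```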
When the jellyfish population is created at random, convergence is slow and the search can become trapped in local minima. To improve the convergence speed, this work uses a bit-shift map, which produces a more diverse population than random selection and lowers the probability of premature convergence. The bit-shift mapping is defined on values within the range (0, 1), with the initial value chosen in this range, and the iterated function of the mapping is given in Equation (19) as:
If binary notation is used to represent the iterated value, then the next iterated value is obtained by shifting the binary point one bit to the right; if the bit to the left of the new binary point is a one, it is replaced by a zero.
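In other words, the bit-shift (doubling) map iterates x_{k+1} = 2x_k mod 1. A minimal sketch of the chaotic initialization it would produce is given below; the starting value, population size, and bounds are illustrative.

```python
import numpy as np

def bit_shift_map(x0, steps):
    """Iterate x_{k+1} = 2 * x_k mod 1 (the bit-shift / doubling map)."""
    xs = [x0]
    for _ in range(steps - 1):
        xs.append((2.0 * xs[-1]) % 1.0)
    return np.array(xs)

# Chaotic initialization of a population within [lb, ub] (illustrative values).
lb, ub, pop_size = 0.0, 1.0, 10
chaotic = bit_shift_map(x0=0.37, steps=pop_size)    # x0 should avoid 0 and simple dyadic fractions
population = lb + chaotic * (ub - lb)
```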
The jellyfish also obey a boundary condition: when a jellyfish leaves the search space and returns after circulating the entire ocean, its revised placement is given by Equation (20):
where the two terms denote the location of a given jellyfish in a given dimension and its updated location after checking the boundary constraints, respectively.
Thus, after all the conditions are applied, the global best solution of the objective function is eventually obtained. Now, to reach the best features, the position updates of the exploitation and exploration phases are performed using sine and cosine functions, as given in Equations (21) and (22),
where the two coefficients denote random numbers that range between [0, 1].
The best optimal solution is obtained using the SC algorithm, which makes use of both the sine and cosine wave functions. In the existing AJS, the location of the best member is affected by the distance and movement of every feature, which leads to high computational complexity and inaccurate multivariate classification.
As shown in Equation (23), the feature selection technique provides the most relevant features required to classify the attacks, and these features are arranged in a data frame.
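Equations (21) and (22) are not reproduced above; the classical sine cosine algorithm updates each candidate toward the best solution with a sine or cosine step chosen at random. Assuming that standard form, a sketch of how such an update could be applied to the jellyfish (feature) positions is:

```python
import numpy as np

def sine_cosine_update(X, best, t, t_max):
    """Classical sine-cosine position update toward the best solution (illustrative)."""
    r1 = 2.0 * (1.0 - t / t_max)                    # amplitude shrinks from 2 to 0
    r2 = 2.0 * np.pi * np.random.rand(*X.shape)     # random angle
    r3 = 2.0 * np.random.rand(*X.shape)             # random weight on the best position
    r4 = np.random.rand(*X.shape)                   # sine/cosine switch
    step = np.abs(r3 * best - X)
    return np.where(r4 < 0.5,
                    X + r1 * np.sin(r2) * step,
                    X + r1 * np.cos(r2) * step)
```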
3.6. Dataset Split
During the training of the detection model, the entire dataset is split into a training set and a test set, and the model is trained and tested on these sets, respectively. The difficulty with a plain split is that changing the random state during the train-test split yields a different accuracy for each random state, making it impossible to pinpoint the model's accuracy precisely. Furthermore, random sampling prevents thorough training and testing on all characteristics, introducing bias and variance that are hard to control. To solve this problem, stratified K-fold cross-validation is employed.
Ordinary K-fold cross-validation is extended to stratified K-fold cross-validation to overcome these issues during classification: the ratio between the target classes of the total dataset remains constant in every fold, rather than the splits being fully random.
As an example, for a dataset of 100 samples containing 80 negative (class 0) and 20 positive (class 1) samples, the stratified sampling method places 64 negative samples (80% of 80) and 16 positive samples (80% of 20) in the training set, i.e., 64 {0} + 16 {1} = 80 samples, which represents the original dataset in equal proportion; the test set consists of 16 negative samples (20% of 80) and 4 positive samples (20% of 20), so it also preserves the proportion of the total dataset. The accuracy estimated with this train-test split is therefore reliable.
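A minimal sketch of this split, using scikit-learn's StratifiedKFold on a toy dataset that mirrors the 80/20 example above, is:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Toy dataset matching the example: 80 negative (class 0) and 20 positive (class 1) samples.
X = np.arange(100).reshape(-1, 1)
y = np.array([0] * 80 + [1] * 20)

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for train_idx, test_idx in skf.split(X, y):
    # Every fold keeps the 4:1 ratio: 64/16 in the training set, 16/4 in the test set.
    print(np.bincount(y[train_idx]), np.bincount(y[test_idx]))
```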
3.7. Detection
After splitting the dataset, the training and testing of the model are carried out in order to detect attacks. Detection is a crucial step that acquires knowledge from the features and predicts or detects the attack, thereby providing intelligent cyber security. This work has developed Hyperparameter Tuning based Regularized Long Short-Term Memory (HT-RLSTM). The existing LSTM suffers from vanishing and exploding gradient problems due to poor initialization of weights. In addition, problems caused by high bias with low variance, low bias with high variance, and improper selection of the regularization parameters lead to a high error rate as well as increased computation time and cost.
As shown in Figure 2, this work tunes (initializes) the weights using a confidence interval (CI) and performs Average Deviation-based Square-Root Elastic Net Regularization (AD-SREnetReg) in the LSTM to tackle these issues. An added advantage of the detection technique is that it avoids catastrophic forgetting, which aids in the detection of adversarial backdoor poisoning attacks.
Fundamentally, the LSTM improves on the RNN by adding a memory unit to store data, extending the original hidden-layer neural nodes of the RNN. An input gate, an output gate, and a forget gate are added to the LSTM to assess whether past information should be discarded, so the LSTM cell design is more complicated than that of the RNN. The LSTM network thus comprises an input gate, an output gate, a forget gate, and a cell state: the input gate controls the introduction of new data, the output gate controls the output data, the forget gate controls the stored information, and the cell state stores the valuable information.
Initially, the weight initialization is carried out using a confidence interval (Equation (24)), whose terms denote the number of training and testing samples, the coefficient for 95% confidence, the standard deviation, and the total dataset size. Based on the confidence interval, there is 95% certainty that each weight lies within this range, which keeps the weights at moderate values and avoids the exploding or vanishing gradient problem.
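Equation (24) itself is not reproduced above; one plausible reading of the description is that the initial weights are drawn uniformly from a 95% confidence interval whose half-width is z·σ/√N. A sketch under that assumption (shapes and values are illustrative):

```python
import numpy as np

def ci_weight_init(shape, sigma, n, z=1.96):
    """Draw initial weights uniformly from [-z*sigma/sqrt(n), +z*sigma/sqrt(n)]
    (an assumed reading of the confidence-interval initialization)."""
    half_width = z * sigma / np.sqrt(n)
    return np.random.uniform(-half_width, half_width, size=shape)

W_example = ci_weight_init(shape=(128, 64), sigma=1.0, n=10_000)   # illustrative sizes
```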
Thereafter, based on the weight values, the forward propagation of the LSTM begins. The input gate layer takes data from the preceding hidden layer as well as the current input, as can be observed in Equation (25). The information is then computed to obtain the following output:
where the input gate activation has a value range of (0, 1), and the remaining terms are the weight of the input gate, the bias of the input gate, the weight of the candidate input gate, and the bias of the candidate input gate, respectively.
The output of the forget gate is computed with a formula similar to that of the input gate but with its own weights and bias, as shown in Equation (26). Its activation also lies in the range (0, 1), and the formula involves the weight of the forget gate, the bias of the forget gate, the input value at the current time step, and the output value of the previous moment.
The step of updating from the previous cell state to the current cell state is shown in Equation (27); here, the value range of the cell state is considered as (0, 1).
As per Equation (28), the outcome is governed by the current input, the memory cell, and the output of the last hidden layer.
The output gate and cell state results are utilized for estimating the LSTM’s output value, which is derived in the following equation:
where the output gate activation ranges between (0, 1), and the output gate weight and output gate bias are used.
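Equations (25) through (28) are not reproduced in the text; the description above follows the standard LSTM gate formulation, which for reference reads (with σ the logistic function, ⊙ element-wise multiplication, and the usual symbols rather than necessarily the paper's own notation):

```latex
\begin{aligned}
i_t &= \sigma\left(W_i\,[h_{t-1}, x_t] + b_i\right), &\quad
\tilde{C}_t &= \tanh\left(W_c\,[h_{t-1}, x_t] + b_c\right),\\
f_t &= \sigma\left(W_f\,[h_{t-1}, x_t] + b_f\right), &\quad
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t,\\
o_t &= \sigma\left(W_o\,[h_{t-1}, x_t] + b_o\right), &\quad
h_t &= o_t \odot \tanh(C_t).
\end{aligned}
```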
Based on the output, the loss function is evaluated for minimizing error. The loss function evaluation is carried out using Average Deviation-Based Square-Root Elastic Net Regularization (AD-SREnetReg), i.e.,
where the terms denote the predicted value, the mean value of the test samples, the method (which might be mean, median, or mode) applied to the respective feature-set test values, the penalty γ, and the learning rate.
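The exact form of the AD-SREnetReg loss is not reproduced above; the sketch below is only one plausible combination of the pieces named in the description, namely a square-root error term, an average-deviation term against a central tendency of the test samples, and an elastic-net penalty scaled by γ, and the actual equation may differ:

```python
import numpy as np

def ad_srenet_loss(y_true, y_pred, weights, gamma=1e-3, l1_ratio=0.5):
    """Hypothetical reading of AD-SREnetReg; the paper's exact formula may differ."""
    # Square-root (RMSE-style) error between predictions and observed test values.
    sqrt_error = np.sqrt(np.mean((y_pred - y_true) ** 2))
    # Average deviation of the predictions from a central tendency (mean here;
    # median or mode could be substituted) of the test samples.
    avg_dev = np.mean(np.abs(y_pred - np.mean(y_true)))
    # Elastic-net (L1 + L2) penalty on the model weights, scaled by gamma.
    penalty = l1_ratio * np.sum(np.abs(weights)) + (1.0 - l1_ratio) * np.sum(weights ** 2)
    return sqrt_error + avg_dev + gamma * penalty
```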
Finally, by minimizing the loss function, the model can be trained effectively while avoiding the problems of overfitting and underfitting. Hence, based on the HT-RLSTM model, the attacks are identified and validated. The outline of the proposed HT-RLSTM is illustrated in pseudo-code form in Figure 3.