Cyber security has a huge influence on a variety of essential infrastructures. Passive defense, however, is not an efficient tool for safeguarding against modern cyber security risks such as APT and zero-day attacks. Furthermore, as cyber threats grow more prevalent and long-lasting, the cost of mounting them is reduced by the variety of attack access vectors, high-level intrusion methods, and systematic attack tools. To maximize the safety level of key system assets, it is critical to build a new safety protection procedure that manages a wide range of attacks. As illustrated in Figure 1, this study builds intelligent cyber security defenses and protections using HT-RLSTM: the scheme collects historical and current security status data and makes intelligent judgments for adaptive security management and control.
3.4. Handling Imbalanced and Overlapped Data
Learning from imbalanced data reveals a strong connection between class overlap and class imbalance, and handling imbalanced and overlapped data is one of the most critical steps in overcoming major cyber security issues. In malware detection, domain reputation, and network intrusion detection, imbalanced datasets are the norm. On such data, a model that claims "everything is harmless" may achieve high accuracy yet is useless, because the minority attack class is never discovered. The imbalance difficulty is not unique to cyber security, but it is a key component of many cyber security problems. With the given training inputs, the required performance level is not achieved because of the large degree of overlap and imbalance. To remove these detrimental effects, this work has developed a random over-sampling-based DBSCAN to overcome the imbalance and overlap issues. The developed technique first separates the overlapped data points into various clusters and thereafter, based on the class counts, balances the classes. The approach helps to remove noisy data points and handle uncertainty in the ground truth.
Initially, density-based clustering is performed on the data points. The approach depends upon two parameters: the minimum number of points (MinPts) and the radius Eps. Eps defines the radius of the circle drawn around a data point from the dataset, while MinPts is the minimum number of points required within that radius to form a dense cluster or region.
Based on Eps and MinPts, the points are categorized as core points, boundary points, and noise points. A data point is a core point if at least MinPts points lie within its Eps distance. A data point is a boundary point if it is not a core point but is a neighbour of one. Finally, if a point lies near neither a core nor a boundary point, it is a noise point.
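For illustration, the Eps/MinPts rule above can be checked directly on a small dataset; the following Python sketch (the function name and parameters are illustrative, not part of the proposed method) labels every point as core, boundary, or noise using pairwise Euclidean distances.

```python
import numpy as np

def label_points(X, eps, min_pts):
    """Label each point as 'core', 'boundary', or 'noise' under the
    Eps / MinPts rule (illustrative helper, not the proposed DIM variant)."""
    # Pairwise Euclidean distances between all data points.
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    neighbours = dists <= eps                       # points within the Eps circle
    is_core = neighbours.sum(axis=1) >= min_pts     # at least MinPts neighbours (self included)
    labels = []
    for i in range(len(X)):
        if is_core[i]:
            labels.append("core")
        elif np.any(is_core & neighbours[i]):       # neighbour of some core point
            labels.append("boundary")
        else:
            labels.append("noise")
    return labels
```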
The data points for this check are normally chosen at random, which may lead to high computation time along with a high error rate. This work therefore used the Deterministic Initialization Method (DIM) to select optimal data points.
In Equation (6), the optimal cluster point is expressed in terms of an upper bound and a lower bound whose values range between [0, 1]. The core points, boundary points, and noise points are then computed for the optimal data point based on Euclidean distance. Each data point is checked by drawing a circle of radius Eps and verifying that the MinPts condition is satisfied, using the Euclidean distance computed by Equations (7) and (8):
The data clusters are formed from the core points and boundary points, while the noisy data or outliers are discarded; finally, the overlapped classes are separated and a standard dataset is obtained. Next, the count of each class is taken. If the classes are in an imbalanced proportion, random over-sampling (ROS) is performed. ROS improves the balancing rate by picking up samples of records from the dataset: it copies records of the minority class so that its distribution matches that of the majority class. As Equation (9) shows, this yields a high balancing rate at the cost of an increased likelihood of overfitting. Now, the data are balanced and passed on for the selection of relevant features.
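As a minimal sketch of this two-stage idea, assuming off-the-shelf components rather than the exact DIM-based variant developed here, density-based clustering from scikit-learn can be used to discard the noise points, and imbalanced-learn's random over-sampler can then duplicate minority-class records:

```python
import numpy as np
from sklearn.cluster import DBSCAN
from imblearn.over_sampling import RandomOverSampler

def clean_and_balance(X, y, eps=0.5, min_pts=5):
    # Density-based clustering: points labelled -1 by DBSCAN are noise/outliers.
    clustering = DBSCAN(eps=eps, min_samples=min_pts).fit(X)
    keep = clustering.labels_ != -1                 # drop the noisy data points
    X_clean, y_clean = X[keep], y[keep]
    # Random over-sampling: duplicate minority-class records until the classes balance.
    ros = RandomOverSampler(random_state=0)
    return ros.fit_resample(X_clean, y_clean)
```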
3.5. Feature Selection
Feature selection is the method used to select the optimal attributes to develop an effective classification model. It reduces computation time while improving model accuracy. It is therefore important to evaluate the impact of every feature on the final result and to deactivate the features that do not contribute positively to it. In this step, selecting the features with positive outcomes is necessary to detect APT, or any other attack, and to achieve more accurate outputs. Current feature selection methods have a poor relevance rate when it comes to identifying the most significant features that characterize an assault.
In addition, the high computation time for validating the data leads to a high error rate. To overcome these issues, this work has developed a Sine Cosine-based Artificial Jellyfish Search (SC-AJS) optimizer. The feature selection is modeled on the natural behavior of jellyfish in the sea, whose search comes with extraordinary mechanisms: the high level of activity within a swarm of jellyfish, the method of switching from one movement to another, and the blooming approach. These mechanisms help to search for and select the best features for detecting various attacks. However, the plain jellyfish search suffers from high computational complexity, inaccurate multivariate feature selection, and a poor convergence rate, leading to a high error rate in selecting the most relevant features; the SC-AJS optimizer is developed to overcome these shortcomings.
The optimizer achieves the best fitness value, which corresponds to the most relevant features for detecting an attack. Equation (10) represents the best fitness value, which depends upon the objective function given by:
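The text does not spell out the form of Equation (10); a common wrapper-style objective for feature selection, shown below purely as an assumed illustration, weighs the classification error against the fraction of features retained (the trade-off weight alpha and the helper signature are hypothetical):

```python
import numpy as np

def feature_subset_fitness(mask, X, y, classifier, alpha=0.99):
    """Illustrative fitness: lower is better. `mask` is a boolean vector
    marking the selected features (hypothetical encoding)."""
    if not mask.any():
        return 1.0                                  # selecting nothing is the worst case
    error = 1.0 - classifier.fit(X[:, mask], y).score(X[:, mask], y)
    ratio = mask.sum() / mask.size                  # fraction of features retained
    return alpha * error + (1.0 - alpha) * ratio
```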
The selection follows the working mechanism of jellyfish (features) that are attracted toward the ocean current (datasets), which contains abundant nutrition. The current best solution is calculated by taking the average value of all vectors of every jellyfish along the direction of the sea current. The formulation of the ocean current is denoted by Equations (11) and (12) as,
where the terms denote, respectively, the number of jellyfish, the best location B of the jellyfish in the swarm at the present stage, the attraction control factor, the base location of each jellyfish, and the difference between B and this base location.
Equations (13) and (14) depict the probability obtained when the average distance between the locations of the jellyfish in the spatial distribution is considered, together with the distribution's standard deviation.
The latest location of every jellyfish is given by Equation (15), in which a distribution coefficient associated with the length of the ocean current appears.
The motion of the jellyfish is constrained to two groups, type A and type B motion. Type A motion describes the movement of each jellyfish around its own live location, and the corresponding location update of every jellyfish is given by Equation (16):
where the terms represent the upper bound and lower bound of the search space, respectively, and a motion coefficient related to the length of motion around the jellyfish's locations.
In type B motion, a jellyfish is selected at random; based on the amount of food at the selected jellyfish's position and the change in position, the motion is estimated along the direction of the food, and the location is thereafter updated using Equation (17),
A time control technique is introduced to determine the motion type over time. It is used to control the movement of the jellyfish toward the sea current and the level of both type A and type B motions in the swarm. As seen in Equation (18), the time control function is a randomly fluctuating value that varies from 0 to 1 over time.
where the specified time is expressed in terms of the iteration number, the maximum number of iterations bounds it, and an initial constant serves as the starting parameter.
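Equation (18) is not reproduced above; in the original artificial jellyfish search algorithm the time control function is commonly written as c(t) = |(1 − t/t_max)(2r − 1)| with r uniform in [0, 1]. Assuming that form, a sketch of the motion switch is:

```python
import random

def time_control(t, t_max):
    # c(t) = |(1 - t / t_max) * (2 * rand - 1)|: fluctuates between 0 and 1
    # and shrinks on average as the iterations progress.
    return abs((1.0 - t / t_max) * (2.0 * random.random() - 1.0))

def choose_motion(t, t_max, c0=0.5):
    c = time_control(t, t_max)
    if c >= c0:
        return "ocean_current"                      # follow the ocean current (Eqs. (11)-(12))
    # otherwise move inside the swarm: type A (passive) or type B (active) motion
    return "type_A" if random.random() > (1.0 - c) else "type_B"
```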
When the jellyfish population is created at random, convergence is slow and the search can become trapped in local minima. To improve the convergence speed, this work uses a bit-shift map, which produces a more diverse population than random selection and lowers the probability of premature convergence. The bit-shift mapping is defined on values within the range (0, 1), with the initial value chosen in this range, and the iterated function of the mapping is given in Equation (19) as:
If binary notation is used to represent the iterated value, then the next iterated value is obtained by shifting the binary point one bit to the right; if the bit to the left of the new binary point is a one, it is replaced by a zero.
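In other words, the bit-shift (doubling) map iterates x_{k+1} = 2x_k mod 1. A minimal sketch of the chaotic initialization it would produce is given below; the starting value, population size, and bounds are illustrative.

```python
import numpy as np

def bit_shift_map(x0, steps):
    """Iterate x_{k+1} = 2 * x_k mod 1 (the bit-shift / doubling map)."""
    xs = [x0]
    for _ in range(steps - 1):
        xs.append((2.0 * xs[-1]) % 1.0)
    return np.array(xs)

# Chaotic initialization of a population within [lb, ub] (illustrative values).
lb, ub, pop_size = 0.0, 1.0, 10
chaotic = bit_shift_map(x0=0.37, steps=pop_size)    # x0 should avoid 0 and simple dyadic fractions
population = lb + chaotic * (ub - lb)
```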
The jellyfish also obey a boundary condition: when a jellyfish leaves the search space and returns after circulating the entire ocean, its revised placement is given by Equation (20):
where the two terms denote the location of a given jellyfish in a given dimension and its updated location after checking the boundary constraints, respectively.
Thus, after all the conditions are applied, the global best solution of the objective function is eventually obtained. Now, to reach the best features, the position updates of the exploitation and exploration phases are performed using sine and cosine functions, as given in Equations (21) and (22),
where the two coefficients denote random numbers that range between [0, 1].
The best optimal solution is obtained using the SC algorithm, which makes use of both the sine and cosine wave functions. In the existing AJS, the location of the best member is affected by the distance and movement of every feature, which leads to high computational complexity and inaccurate multivariate classification.
As shown in Equation (23), the feature selection technique provides the most relevant features required to classify the attacks, and these features are arranged in a data frame.
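Equations (21) and (22) are not reproduced above; the classical sine cosine algorithm updates each candidate toward the best solution with a sine or cosine step chosen at random. Assuming that standard form, a sketch of how such an update could be applied to the jellyfish (feature) positions is:

```python
import numpy as np

def sine_cosine_update(X, best, t, t_max):
    """Classical sine-cosine position update toward the best solution (illustrative)."""
    r1 = 2.0 * (1.0 - t / t_max)                    # amplitude shrinks from 2 to 0
    r2 = 2.0 * np.pi * np.random.rand(*X.shape)     # random angle
    r3 = 2.0 * np.random.rand(*X.shape)             # random weight on the best position
    r4 = np.random.rand(*X.shape)                   # sine/cosine switch
    step = np.abs(r3 * best - X)
    return np.where(r4 < 0.5,
                    X + r1 * np.sin(r2) * step,
                    X + r1 * np.cos(r2) * step)
```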
3.6. Dataset Split
During the training of the detection model, the entire dataset is split into a training set and a test set, and the model is trained and tested on these sets, respectively. The difficulty with a plain split is that changing the random state during the train-test split yields a different accuracy for each random state, making it impossible to pinpoint the model's accuracy precisely. Furthermore, random sampling prevents thorough training and testing on all characteristics, introducing bias and variance that are hard to control. To solve this problem, stratified K-fold cross-validation is employed.
Ordinary K-fold cross-validation is extended to stratified K-fold cross-validation to overcome these issues during classification: the ratio between the target classes of the total dataset remains constant in every fold, rather than the splits being fully random.
As an example, for a dataset of 100 samples containing 80 negative (class 0) and 20 positive (class 1) samples, the stratified sampling method places 64 negative samples (80% of 80) and 16 positive samples (80% of 20) in the training set, i.e., 64 {0} + 16 {1} = 80 samples, which represents the original dataset in equal proportion; the test set consists of 16 negative samples (20% of 80) and 4 positive samples (20% of 20), so it also preserves the proportion of the total dataset. The accuracy estimated with this train-test split is therefore reliable.
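A minimal sketch of this split, using scikit-learn's StratifiedKFold on a toy dataset that mirrors the 80/20 example above, is:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Toy dataset matching the example: 80 negative (class 0) and 20 positive (class 1) samples.
X = np.arange(100).reshape(-1, 1)
y = np.array([0] * 80 + [1] * 20)

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for train_idx, test_idx in skf.split(X, y):
    # Every fold keeps the 4:1 ratio: 64/16 in the training set, 16/4 in the test set.
    print(np.bincount(y[train_idx]), np.bincount(y[test_idx]))
```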
3.7. Detection
After splitting the dataset, the training and testing of the model are carried out in order to detect attacks. Detection is a crucial step that acquires knowledge from the features and predicts or detects the attack, thereby providing intelligent cyber security. This work has developed Hyperparameter Tuning based Regularized Long Short-Term Memory (HT-RLSTM). The existing LSTM suffers from vanishing and exploding gradient problems due to poor initialization of weights. In addition, problems caused by high bias with low variance, low bias with high variance, and improper selection of the regularization parameters lead to a high error rate as well as increased computation time and cost.
As shown in Figure 2, this work tunes (initializes) the weights using a confidence interval (CI) and performs Average Deviation-based Square-Root Elastic Net Regularization (AD-SREnetReg) in the LSTM to tackle these issues. An added advantage of the detection technique is that it avoids catastrophic forgetting, which aids in the detection of adversarial backdoor poisoning attacks.
Fundamentally, the LSTM improves on the RNN by adding a memory unit to store data, extending the original hidden-layer neural nodes of the RNN. An input gate, an output gate, and a forget gate are added to the LSTM to assess whether past information should be discarded, so the LSTM cell design is more complicated than that of the RNN. The LSTM network thus comprises an input gate, an output gate, a forget gate, and a cell state: the input gate controls the introduction of new data, the output gate controls the output data, the forget gate controls the stored information, and the cell state stores the valuable information.
Initially, the weight initialization is carried out using a confidence interval (Equation (24)), whose terms denote the number of training and testing samples, the coefficient for 95% confidence, the standard deviation, and the total dataset size. Based on the confidence interval, there is 95% certainty that each weight lies within this range, which keeps the weights at moderate values and avoids the exploding or vanishing gradient problem.
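Equation (24) itself is not reproduced above; one plausible reading of the description is that the initial weights are drawn uniformly from a 95% confidence interval whose half-width is z·σ/√N. A sketch under that assumption (shapes and values are illustrative):

```python
import numpy as np

def ci_weight_init(shape, sigma, n, z=1.96):
    """Draw initial weights uniformly from [-z*sigma/sqrt(n), +z*sigma/sqrt(n)]
    (an assumed reading of the confidence-interval initialization)."""
    half_width = z * sigma / np.sqrt(n)
    return np.random.uniform(-half_width, half_width, size=shape)

W_example = ci_weight_init(shape=(128, 64), sigma=1.0, n=10_000)   # illustrative sizes
```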
Thereafter, based on the weight values, the forward propagation of the LSTM begins. The input gate layer takes data from the preceding hidden layer as well as the current input, as can be observed in Equation (25). The information is then computed to obtain the following output:
where the input gate activation has a value range of (0, 1), and the remaining terms are the weight of the input gate, the bias of the input gate, the weight of the candidate input gate, and the bias of the candidate input gate, respectively.
The output of the forget gate is computed with a formula similar to that of the input gate but with its own weights and bias, as shown in Equation (26). Its activation also lies in the range (0, 1), and the formula involves the weight of the forget gate, the bias of the forget gate, the input value at the current time step, and the output value of the previous moment.
The step of updating from the previous cell state to the current cell state is shown in Equation (27); here, the value range of the cell state is considered as (0, 1).
As per Equation (28), the outcome is governed by the current input, the memory cell, and the output of the last hidden layer.
The output gate and cell state results are utilized for estimating the LSTM’s output value, which is derived in the following equation:
where the output gate activation ranges between (0, 1), and the output gate weight and output gate bias are used.
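Equations (25) through (28) are not reproduced in the text; the description above follows the standard LSTM gate formulation, which for reference reads (with σ the logistic function, ⊙ element-wise multiplication, and the usual symbols rather than necessarily the paper's own notation):

```latex
\begin{aligned}
i_t &= \sigma\left(W_i\,[h_{t-1}, x_t] + b_i\right), &\quad
\tilde{C}_t &= \tanh\left(W_c\,[h_{t-1}, x_t] + b_c\right),\\
f_t &= \sigma\left(W_f\,[h_{t-1}, x_t] + b_f\right), &\quad
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t,\\
o_t &= \sigma\left(W_o\,[h_{t-1}, x_t] + b_o\right), &\quad
h_t &= o_t \odot \tanh(C_t).
\end{aligned}
```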
Based on the output, the loss function is evaluated for minimizing error. The loss function evaluation is carried out using Average Deviation-Based Square-Root Elastic Net Regularization (AD-SREnetReg), i.e.,
where the terms denote the predicted value, the mean value of the test samples, the method (which might be mean, median, or mode) applied to the respective feature-set test values, the penalty γ, and the learning rate.
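The exact form of the AD-SREnetReg loss is not reproduced above; the sketch below is only one plausible combination of the pieces named in the description, namely a square-root error term, an average-deviation term against a central tendency of the test samples, and an elastic-net penalty scaled by γ, and the actual equation may differ:

```python
import numpy as np

def ad_srenet_loss(y_true, y_pred, weights, gamma=1e-3, l1_ratio=0.5):
    """Hypothetical reading of AD-SREnetReg; the paper's exact formula may differ."""
    # Square-root (RMSE-style) error between predictions and observed test values.
    sqrt_error = np.sqrt(np.mean((y_pred - y_true) ** 2))
    # Average deviation of the predictions from a central tendency (mean here;
    # median or mode could be substituted) of the test samples.
    avg_dev = np.mean(np.abs(y_pred - np.mean(y_true)))
    # Elastic-net (L1 + L2) penalty on the model weights, scaled by gamma.
    penalty = l1_ratio * np.sum(np.abs(weights)) + (1.0 - l1_ratio) * np.sum(weights ** 2)
    return sqrt_error + avg_dev + gamma * penalty
```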
Finally, by minimizing the loss function, the model can be trained effectively while avoiding the problems of overfitting and underfitting. Hence, based on the HT-RLSTM model, the attacks are identified and validated. The outline of the proposed HT-RLSTM is illustrated in pseudo-code form in Figure 3.