1. Introduction
Recently, cloud computing (CC) has developed as one of the most common Internet-based technologies in the information technology (IT) field [1]. Three layers compose CC, i.e., the system layer, the platform layer, and the application layer [2]. Cloud security is considered the main challenge to cloud adoption by most enterprises [3]. The open and entirely dispersed nature of the cloud platform makes it extremely susceptible to vulnerabilities and security attacks [4]. Therefore, intruders have a high potential to carry out threats against the cloud or cloud-linked devices. Moreover, cyberattacks on cloud services have a negative impact on CC platform performance and QoS requirements [5]. Conventional security mechanisms, namely, firewalls and antivirus software, struggle to protect cloud infrastructure from complex cyberattacks [6]. By employing machine learning (ML), enterprises can improve their security posture and decrease the risk of data breaches. The potential of ML to enhance attack detection and response is the major advantage of utilizing it for cloud security. Classical security mechanisms, namely, antivirus software and firewalls, are reactive and only respond to known attacks [7]. Conversely, ML methods can detect patterns in data that may indicate an attack, even when the attack is not yet known. Based on historical data, ML techniques are trained to identify patterns that point to security vulnerabilities [8].
To overcome these problems and to improve the security of cloud services, a deep learning (DL)-based approach offers an adaptable solution [9]. Currently, DL is employed in various organizations due to its excellent predictive power and pattern recognition [10]. As DL utilizes a multi-layer neural network (NN) concept to simulate activities similar to the working model of the brain, it can be deployed and administered in a cloud-based infrastructure [11]. Training DL methods on massive databases in the cloud platform can carry out the overall computing processes highly efficiently with low latency. Major attacks such as trust violations, malware, data privacy breaches, and network intrusions can be monitored in real time using DL techniques [12]. Unlike other standard security enhancers, DL approaches learn and have the intelligent ability to offer disruptive outcomes in detecting attacks and improving cloud security in a constantly growing, competitive landscape [13].
This study introduces an Improved Sine Cosine Algorithm with a Deep Learning-Enabled Security Solution (ISCA-DLESS) technique for the CC environment. In the presented ISCA-DLESS technique, feature selection takes place via the ISCA. Additionally, the chosen features are passed into a multiplicative long short-term memory (MLSTM) model for intrusion detection. To improve the anomaly detection rate of the MLSTM approach, the fruit fly optimization (FFO) algorithm is utilized for the hyperparameter tuning process. The experimental results of the ISCA-DLESS system were tested on a benchmark database. In short, the key contributions of the paper are summarized as follows.
Automated anomaly detection using the ISCA-DLESS technique, comprising ISCA-based FS, MLSTM-based detection, and FFO-based hyperparameter tuning for CC, is presented. To the best of our knowledge, the ISCA-DLESS technique has not previously been presented in the literature.
The ISCA-DLESS employs an ISCA-based FS technique with the integration of the oppositional-based learning (OBL) concept with SCA, which reduces the data dimensionality and enhances the detection performance.
Applying MLSTM-based detection, which has the capability of capturing sequential patterns, makes it appropriate to detect anomalies in time-series data.
Employing the FFO algorithm for hyperparameter tuning of the MLSTM model efficiently searches for optimal hyperparameter configurations.
The rest of the paper is organized as follows. Section 2 provides the related works, and Section 3 offers the proposed model. Then, Section 4 gives the result analysis, and Section 5 concludes the paper.
2. Related Works
Maheswari et al. [14] developed an intrusion detection system (IDS) for web and CC platforms based on a hybrid teacher-learning-aided DRNN and cluster-based feature optimization. After feature extraction, the study used a Modified Manta-ray Foraging Optimization (MMFO) to select optimum features for further detection. A hybrid Teacher-Learning Enabled DRNN (TL-DRNN) was developed for the classification of web-cloud intrusions. In [15], an Effective Optimum Security Solution for IDS (EOS-IDS) in a CC platform using a hybrid DL technique was designed. Pre-processing was performed by the improved heap optimization (IHO) method. Next, the authors offered a chaotic red deer optimizer (CRDO) method for optimal feature selection. Later, a deep Kronecker NN (DKNN) was presented for cloud attack classification and the recognition of intrusions. Toldinas et al. [16] devised an innovative technique for network IDS using multi-stage DL image recognition. The network features were transformed into four-channel (Red, Green, Blue, and Alpha) images. Then, the images were utilized to train and test the pretrained DL mechanism ResNet50.
Srilatha and Thillaiarasu [17] introduced a Network Intrusion Detection and Prevention Scheme (NIDPS) to detect and prevent a large number of network attacks. The effective IDPS was implemented and tested in a network environment using different ML approaches. In this study, an improved ID3 was developed for identifying abnormalities in network activities and classifying them. The authors in [18] developed an IDSGT-DNN architecture to enhance security in cloud IDSs. The study incorporated defender and attacker systems for normal and attack data processing. In the DNN model, this technique was implemented with IWA for the recognition of a better solution. Prabhakaran and Kulandasamy [19] suggested a hybrid semantic DL (HSDL) model by incorporating SVM, LSTM, and CNN frameworks. The semantic data existing in the network traffic were detected utilizing a semantic layer called a Word2Vec embedding layer. The proposed architecture categorized the intrusions existing in the text and their respective attack classes.
Ravi et al. [20] presented a Cauchy GOA with DL for a Cloud-Enabled IDS (CGOA-DLCIDS) method. The proposed approach carried out feature subset selection by CGOA, which improved the recognition speed and decreased the feature subsets. Following this, the method exploited the attention-based LSTM (ALSTM) mechanism for accurate and automatic detection and classification of intrusions. Jisna et al. [21] presented a cloud-based DL LSTM-IDS technique and compared it to a hybrid Stacked Contractive AE (SCAE) along with an SVM-IDS mechanism. The DL techniques, like basic ML, were constructed to simultaneously perform attack detection and classification.
Alghamdi and Bellaiche [22] introduced an edge-cloud deep IDS technique in the Lambda framework for IoT security to overcome these problems. This approach minimized the training time compared with standard ML methods and improved the accuracy of true-positive-identified attacks. Moreover, the multi-layer NN-based DL technique attained higher adaptability and performance compared with the standard ML techniques. Alzubi et al. [23] proposed an Effective Seeker Optimization algorithm along with an ML-assisted IDS (ESOML-IDS) approach for the FC and EC platforms. The ESOML-IDS algorithm mainly developed an innovative ESO-based FS technique for optimally selecting feature subsets to detect the existence of intrusions in the FC and EC platforms. Ali and Zolkipli's study [24] comprised a brief description of the IDS, presented to the reader some basic principles of the IDS task in CC, and further developed a novel Fast Learning Network method for intrusion-detection-dependent functions.
Despite the availability of several anomaly and intrusion detection models, the task remains a challenging problem. As models continuously deepen, the number of parameters in DL models also increases quickly, which results in model overfitting. At the same time, different hyperparameters have a significant impact on the efficiency of the DL model, particularly the learning rate, which must be adjusted to obtain better performance. Therefore, in this study, we employed the FFO technique for the hyperparameter tuning of the MLSTM model.
3. The Proposed Model
In this manuscript, we have presented a novel ISCA-DLESS system for the effectual identification of anomalies and intrusions in the CC environment. The purpose of the ISCA-DLESS technique is to exploit metaheuristic algorithms for the FS and hyperparameter tuning processes. The proposed ISCA-DLESS system involves three main procedures, namely, ISCA-based FS, MLSTM-based classification, and FFO-based hyperparameter tuning.
Figure 1 depicts the entire flow of the ISCA-DLESS method.
3.1. Stage I: Feature Selection Using ISCA
To select an optimal set of features, the ISCA is used. SCA is a recent metaheuristic optimization algorithm for resolving global optimization problems [25]. Using SCA, a group of random candidate solutions with a standard distribution is produced to begin the optimization process. Then, the positions of the candidate solutions are updated by the following expression:

$$x_i^{t+1} = \begin{cases} x_i^t + r_1 \sin(r_2)\left|r_3 P_i^t - x_i^t\right|, & r_4 < 0.5 \\ x_i^t + r_1 \cos(r_2)\left|r_3 P_i^t - x_i^t\right|, & r_4 \geq 0.5 \end{cases}$$

Here, $x_i^t$ and $x_i^{t+1}$ denote the position of the $i$th solution candidate at the $t$th and $(t+1)$th iterations, respectively; $r_2$, $r_3$, and $r_4$ are uniformly distributed random numbers; $P_i^t$ shows the target point's position at the $i$th parameter; and the operator $\left|\cdot\right|$ is utilized to denote the absolute value. The value $r_4$ is uniformly distributed between 0 and 1, and $r_1$ decides whether the solution moves in the region between itself and the better solution or outside it. The vector $r_2$ defines the distance of the candidate solution toward or away from the better solution. The third parameter, $r_3$, applies a random weight to the better solution to define the micro search ($r_3 < 1$) and macro search ($r_3 > 1$) capabilities of this parameter. For this reason, $r_3$ is highly useful to avoid early convergence. The transition between the cosine and sine functions is controlled by the $r_4$ random value. The range of the sine and cosine functions is adaptively adjusted through $r_1$ to achieve a proper balance between exploitation and exploration, as follows:

$$r_1 = a - t\,\frac{a}{T} \quad (4)$$
In Equation (4), $t$ and $T$ represent the present and maximal iteration counts, and $a$ is a constant. The notion of the OBL method relies on the opposite number. Consider that $x \in [lb, ub]$, where $x$ represents a real number. Its opposite number $\bar{x}$ is defined as:

$$\bar{x} = lb + ub - x$$

This definition can also be extended to higher dimensions. The opposite number $\bar{x}_j$ for a point $x = (x_1, x_2, \ldots, x_D)$ in a $D$-dimensional search space is defined as follows:

$$\bar{x}_j = lb_j + ub_j - x_j, \quad j = 1, 2, \ldots, D$$
The concept of OBL is used to improve the micro search capability of the SCA. In the ISCA, the initial population is randomly generated from the uniform distribution, and the fitness of the candidate solutions is evaluated. Then, the better candidate solution is recognized. The OBL method attains a balance between the micro and macro search capabilities by using this candidate solution. The linear adaptive (LA) operator is hybridized with the OBL model. This operator is capable of enhancing the convergence rate by fine-tuning the proper balance between the macro and micro search processes, ensuring good exploration and exploitation as the number of generations grows for problems of varying complexities. OBL hybridized with the LA operator is formulated as shown below:
$$\bar{x}_j^t = lb_j + ub_j - P_j^t \quad (8)$$

In Equation (8), $\bar{x}_j^t$ indicates the opposite solution candidate for the $j$th parameter around the better solution $P^t$ at the existing iteration. The fitness is measured after defining this opposite location around the better solution.
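The SCA update and the OBL step described above can be sketched as one ISCA iteration as follows. This is a minimal sketch: the objective function `obj`, the bounds, and applying OBL on every iteration are illustrative assumptions, since the paper combines OBL with a linear adaptive operator whose exact schedule is not reproduced here.

```python
import numpy as np

def isca_step(X, best, t, T, lb, ub, obj, a=2.0):
    """One ISCA iteration: SCA position update followed by an
    opposition-based learning (OBL) check, keeping the fitter point.
    Minimization is assumed."""
    n, d = X.shape
    r1 = a - t * (a / T)                      # Equation (4): linearly decreasing
    r2 = 2 * np.pi * np.random.rand(n, d)     # random angle
    r3 = 2 * np.random.rand(n, d)             # random weight on the best solution
    r4 = np.random.rand(n, d)                 # switches between sin and cos branches
    step = np.abs(r3 * best - X)
    X_new = np.where(r4 < 0.5,
                     X + r1 * np.sin(r2) * step,
                     X + r1 * np.cos(r2) * step)
    X_new = np.clip(X_new, lb, ub)
    # OBL: evaluate the opposite point and keep whichever is fitter
    X_opp = lb + ub - X_new
    f_new = np.apply_along_axis(obj, 1, X_new)
    f_opp = np.apply_along_axis(obj, 1, X_opp)
    keep = (f_new <= f_opp)[:, None]
    return np.where(keep, X_new, X_opp)
```

In an FS context, the continuous positions would additionally be thresholded into a binary feature mask before evaluating `obj`.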
The fitness function (FF) of the ISCA considers both the classification accuracy and the selected feature count: it minimizes the size of the selected feature subset and maximizes the classification accuracy. The following FF is employed for measuring individual solutions, as written in Equation (9):

$$Fitness = \alpha\,\gamma_R(D) + \beta\,\frac{|R|}{|C|} \quad (9)$$

where $\gamma_R(D)$ indicates the classification error rate employing the selected features, measured as the percentage of incorrect classifications to the count of classifications made and stated as a value between zero and one; $|R|$ denotes the selected feature count; $|C|$ implies the entire attribute count in the original database; and $\alpha$ (with $\beta = 1 - \alpha$) is employed for controlling the impact of classification quality and subset length.
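Equation (9) can be sketched directly; the weighting $\alpha = 0.99$ below is a common default in FS studies and is an assumption, not a value taken from the source.

```python
def fs_fitness(error_rate, n_selected, n_total, alpha=0.99):
    """Feature-selection fitness per Equation (9): weighted sum of the
    classification error rate and the relative subset size. Lower is better."""
    beta = 1.0 - alpha
    return alpha * error_rate + beta * (n_selected / n_total)
```

For example, a 29-of-42 subset with a 10% error rate scores lower (better) than the full 42-feature set at the same error rate, since the subset-size penalty shrinks.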
3.2. Stage II: MLSTM-Based Classification
In this work, the MLSTM-based classification process is employed. A classical ANN is constrained in its capability to capture the sequential dependencies required to handle sequence data in the input [26]. An RNN can be used to extract sequential information from the raw data while making predictions, for example, links among the words in a text. The evaluation of the RNN hidden layer (HL) is given in the following: consider a time stamp $t$, an HL vector $h_t$, an input $x_t$, and an output $y_t$. The HL vector is given by the following equation:

$$h_t = \sigma_h\left(W_{xh}x_t + W_{hh}h_{t-1} + b_h\right)$$

where $\sigma_h$ shows the activation function of the HL, and $W$ refers to the weight matrix.
The major problem of classical RNNs is that the backpropagation (BP) stage attenuates the loss-function gradient, making it ever smaller, so it contributes nothing to learning. This vanishing-gradient problem takes place once the layers receive a gradient too small to update their weights and learning factors. The input, forget, and output gates are the gating mechanisms of the LSTM network. The forget gate forbids or grants information and is estimated as follows:

$$f_t = \sigma\left(W_f x_t + U_f h_{t-1} + b_f\right) \quad (11)$$

In Equation (11), $W_f$ represents the weight vector between the input and the forget gate; $x_t$ shows the existing data; and $U_f$ indicates the weight vector between the forget gate and the HL. When the accumulated variable is passed through the sigmoid activation function, the gate lets the information pass if the resulting value is close to 1; otherwise, it removes the data.
The existing input and the prior outcome are forwarded to the sigmoid function, which allows updating of the cell state memory. At time $t$, the input gate vector is defined by the subsequent formula:

$$i_t = \sigma\left(W_i x_t + U_i h_{t-1} + b_i\right) \quad (12)$$

In Equation (12), $W_i$ denotes the weight vector of the raw information, and $U_i$ shows the weight vector between the current values and the input gate. The cell layer combines the prior state with a candidate value: it multiplies the forget variable with the prior cell layer and adds the input-gated candidate $\tilde{c}_t$:

$$\tilde{c}_t = \tanh\left(W_c x_t + U_c h_{t-1} + b_c\right)$$

$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \quad (13)$$
The second HL output is defined by the output gate. At timestamp $t$, the resultant vector is evaluated as follows:

$$o_t = \sigma\left(W_o x_t + U_o h_{t-1} + b_o\right) \quad (14)$$

Lastly, the hidden state is obtained with the hyperbolic tangent activation function as follows:

$$h_t = o_t \odot \tanh\left(c_t\right) \quad (15)$$
MLSTM differs from typical LSTM frameworks in that it establishes a gating mechanism based on multiplicative connections. It is designed to improve the learning and representation abilities of LSTM networks. It presents a novel gating mechanism named the "update gate", which is utilized for modulating the cell-layer update. The update gate in an MLSTM is determined as the element-by-element (Hadamard) product of the output of the forget gate and the output of the input gate. This implies that the update gate controls how much information from the preceding cell layer (determined by the forget gate) and how much of the novel input (determined by the input gate) is employed for updating the current cell layer. By utilizing element-by-element multiplication, the update gate permits the LSTM to concentrate on particular dimensions of the input and selectively update the cell layer, which is useful for sequence-modelling tasks.
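A minimal NumPy sketch of one MLSTM step follows. The update gate is formed as the Hadamard product of the forget- and input-gate activations as described above; its exact placement in the cell-state recurrence is an assumption, since the source gives no equation for it, and the weight shapes are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mlstm_step(x, h_prev, c_prev, W, U, b):
    """One MLSTM step. W, U, b are dicts keyed by 'f', 'i', 'o', 'c'
    (forget, input, output, and candidate paths)."""
    f = sigmoid(W['f'] @ x + U['f'] @ h_prev + b['f'])      # forget gate, Eq. (11)
    i = sigmoid(W['i'] @ x + U['i'] @ h_prev + b['i'])      # input gate, Eq. (12)
    c_hat = np.tanh(W['c'] @ x + U['c'] @ h_prev + b['c'])  # candidate state
    u = f * i                                               # update gate: Hadamard product
    c = f * c_prev + u * c_hat                              # update-gate-modulated cell state
    o = sigmoid(W['o'] @ x + U['o'] @ h_prev + b['o'])      # output gate, Eq. (14)
    h = o * np.tanh(c)                                      # hidden state, Eq. (15)
    return h, c
```

Because `u = f * i` is small unless both gates open, the cell state is updated only on dimensions where past retention and new input agree, which is the selective behaviour described above.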
3.3. Stage III: Hyperparameter Tuning Using FFO Algorithm
To enhance the results of the MLSTM approach, the FFO system is employed. The FFO algorithm is a recent nature-inspired optimization approach [27]. Due to its simple computational operations, FFO is easy to apply and comprehend compared with other metaheuristic approaches. This technique is a swarm intelligence (SI) approach stimulated by the foraging behavioural patterns of fruit flies (FFs). The FF exceeds other species in the olfaction and vision on which it mainly depends; FFs can sense miscellaneous aerial smells even when the food source is far away. In the scouring stage, the FF scouts and locates food sources near the swarm and evaluates the odour intensity of the food sources. Once a better position with high odour intensity is identified, the swarm navigates toward it.
Undeniably, the procedure of effectual teamwork and communication between individual FFs is vital to accomplish the strategies of resolving optimization problems. The algorithm has four different stages:
Initialization;
Osphresis foraging;
Population evaluation;
Vision.
At first, the parameters are set: the maximal number of iterations and the population size. The solutions, viz. the FFs, are randomly initialized as follows:

$$x_{i,j} = lb_j + r \cdot \left(ub_j - lb_j\right) \quad (16)$$

In Equation (16), $x_i$ denotes the $i$th solution, and $x_{i,j}$ indicates the location of the $j$th element in the $i$th solution; $lb_j$ indicates the lower boundary, $ub_j$ shows the upper boundary, and $r$ denotes a uniformly distributed random number.
Next, the location update of each solution takes place according to the osphresis foraging stage. The solution is randomly displaced from its existing position as follows:

$$x_i^{t+1} = x_i^t + \alpha \cdot r \quad (17)$$

In Equation (17), $x_i^{t+1}$ denotes the new location, $x_i^t$ indicates the existing solution, $r$ is a random value, $\alpha$ is a step scale, and $t$ refers to the iteration count. The smell and distance are calculated following the location update. Next, the odour intensity, i.e., the smell function (FF) value, is calculated for every solution. When the best FF value of a solution is superior to the prior best, the new location of the solution with the better FF value replaces that solution's position; otherwise, the older solution position remains. This procedure signifies the vision foraging stage. The process continues until the ending condition is met and produces better outcomes.
Fitness is a key feature of the FFO system. An encoded solution is deployed to assess the goodness of candidate solutions. Here, the accuracy value is the major criterion deployed to design the FF:

$$Fitness = \max\left(\frac{TP}{TP + FP}\right)$$

in which $TP$ and $FP$ define the true and false positive values.
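The four FFO stages above can be sketched as a simple minimization loop. The Gaussian random step, its scale `alpha`, and the test bounds are illustrative assumptions; in the paper, the FF instead maximizes the detection accuracy of the tuned MLSTM.

```python
import numpy as np

def ffo(obj, lb, ub, dim, n_flies=20, max_iter=50, alpha=0.1, seed=0):
    """Minimal fruit fly optimization sketch: initialize a swarm,
    scatter each fly around its position (osphresis foraging), and
    keep the best location found so far (vision foraging)."""
    rng = np.random.default_rng(seed)
    X = lb + rng.random((n_flies, dim)) * (ub - lb)          # Equation (16)
    fit = np.apply_along_axis(obj, 1, X)
    best = X[np.argmin(fit)].copy()
    best_fit = float(fit.min())
    for _ in range(max_iter):
        X_new = X + alpha * rng.standard_normal((n_flies, dim))  # Equation (17)
        X_new = np.clip(X_new, lb, ub)
        f_new = np.apply_along_axis(obj, 1, X_new)
        improved = f_new < fit                               # greedy smell evaluation
        X[improved], fit[improved] = X_new[improved], f_new[improved]
        if fit.min() < best_fit:                             # vision foraging stage
            best_fit = float(fit.min())
            best = X[np.argmin(fit)].copy()
    return best, best_fit
```

For hyperparameter tuning, each dimension of a fly would encode one MLSTM hyperparameter (e.g., learning rate), and `obj` would return the negative validation accuracy.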
4. Results and Discussion
The proposed model was simulated using the Python 3.6.5 tool (the source code will be made available once the funding project is complete). The proposed model was experimented on a PC with an i5-8600K CPU, a GeForce 1050Ti 4 GB GPU, 16 GB RAM, a 250 GB SSD, and a 1 TB HDD. In this section, the simulation validation of the ISCA-DLESS technique is tested on the NSL-KDD database (available at https://www.kaggle.com/datasets/hassan06/nslkdd (accessed on 13 July 2023)), containing 125,973 instances with five class labels, as represented in Table 1. The ISCA-DLESS technique selected a total of 29 features from the available 42 features. The confusion matrices of the ISCA-DLESS algorithm on the distinct datasets are shown in Figure 2. The simulation values implied that the ISCA-DLESS approach recognized the various classes accurately and proficiently.
In Table 2 and Figure 3, the overall outcome of the ISCA-DLESS system with an 80:20 TR set/TS set split is portrayed. The results suggest that the ISCA-DLESS technique reached enhanced performance in all classes. With 80% of the TR set, the ISCA-DLESS technique attained average accuracy, precision, recall, F-score, and MCC values of 99.69%, 89.64%, 90.60%, 89.99%, and 89.80%, respectively. Also, with 20% of the TS set, the ISCA-DLESS methodology accomplished average accuracy, precision, recall, F-score, and MCC values of 99.67%, 89.97%, 93.14%, 91.41%, and 91.22%, correspondingly.
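For reference, the reported metrics follow the standard confusion-matrix definitions, sketched below for the binary case; the macro-averaging over the five classes used in the tables is an assumption not reproduced here.

```python
import numpy as np

def binary_metrics(tp, fp, fn, tn):
    """Standard detection metrics from binary confusion counts."""
    accuracy  = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall    = tp / (tp + fn)
    f_score   = 2 * precision * recall / (precision + recall)
    mcc = (tp * tn - fp * fn) / np.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return accuracy, precision, recall, f_score, mcc
```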
In Table 3 and Figure 4, the overall outcome of the ISCA-DLESS methodology with a 70:30 TR set/TS set split is portrayed. The results suggest that the ISCA-DLESS system attained greater performance under all classes. With 70% of the TR set, the ISCA-DLESS approach obtained average accuracy, precision, recall, F-score, and MCC values of 99.40%, 71.13%, 75.67%, 73.01%, and 72.75%, correspondingly. Then, with 30% of the TS set, the ISCA-DLESS method gained average accuracy, precision, recall, F-score, and MCC values of 99.45%, 72.34%, 76.13%, 73.98%, and 73.69%, respectively.
Figure 5 demonstrates the training and validation accuracy of the ISCA-DLESS system on the 80:20 TR set/TS set split. The training accuracy was defined by the assessment of the ISCA-DLESS technique on the TR dataset, whereas the validation accuracy was calculated by estimating the performance on a separate testing dataset. The outcomes exhibit that both the training and validation accuracy increased with an upsurge in epochs. As a result, the performance of the ISCA-DLESS system improved on the TR and TS datasets with a rise in the number of epochs.
In Figure 6, the training and validation loss curves of the ISCA-DLESS system on the 80:20 TR set/TS set split are depicted. The training loss defines the error between the predicted outcomes and the original values on the TR data. The validation loss signifies the measure of the solution of the ISCA-DLESS technique on the individual validation data. The results state that the training and validation losses tended to decrease with rising epochs. This depicts the enhanced performance of the ISCA-DLESS technique and its ability to create accurate classifications. The reduced values of the training and validation losses establish the greater performance of the ISCA-DLESS method in capturing patterns and relationships.
A comprehensive precision-recall (PR) analysis of the ISCA-DLESS system on the 80:20 TR set/TS set split is displayed in Figure 7. The simulation values confirmed that the ISCA-DLESS approach resulted in greater PR values. Furthermore, it is clear that the ISCA-DLESS algorithm attained superior PR performance in all five classes.
In Figure 8, an ROC analysis of the ISCA-DLESS algorithm on the 80:20 TR set/TS set split is given. The simulation values determined that the ISCA-DLESS approach led to maximal ROC values. Again, the ISCA-DLESS system achieved greater ROC outcomes in all five classes.
In Table 4, a detailed comparative result of the ISCA-DLESS methodology with recent systems is given [28]. Figure 9 depicts the accuracy and F-score outcomes of the ISCA-DLESS approach with other approaches. The obtained values infer that the LKM-OFLS and PCA-NN models reached poor performance. At the same time, the K-means-OFLS, MLP, and FCM-OFLS models reported moderately improved results. Meanwhile, the IMFL-IDSCS technique attained considerable performance. Finally, the ISCA-DLESS technique showcased better performance, with a maximum accuracy of 99.69% and F-score of 89.99%.
Figure 10 represents the precision and recall analysis of the ISCA-DLESS system with other methods. The simulation values imply that the LKM-OFLS and PCA-NN approaches attained worse outcomes. Then, the K-means-OFLS, MLP, and FCM-OFLS methods reported moderately enhanced performance. In the meantime, the IMFL-IDSCS system attained considerable outcomes. At last, the ISCA-DLESS system demonstrated optimum performance with a maximal precision of 89.64% and recall of 90.60%. Therefore, the ISCA-DLESS technique could be utilized for enhanced cloud security.
5. Conclusions
In this study, we derived a novel ISCA-DLESS algorithm for the effectual identification of anomalies and intrusions in the CC environment. The ISCA-DLESS technique applied the FS process with a hyperparameter-tuned classification model for anomaly detection. In the proposed ISCA-DLESS system, the three main procedures comprised ISCA-based FS, MLSTM-based classification, and FFO-based hyperparameter tuning. The application of the ISCA-based FS helped in reducing the high-dimensionality problem and enhanced the classification performance. Moreover, the use of the FFO algorithm for the hyperparameter tuning of the MLSTM model aided in accomplishing an improved detection rate. The comprehensive analysis demonstrated the enhanced solution of the ISCA-DLESS technique over other recent approaches, with a maximum accuracy of 99.69%. Thus, the ISCA-DLESS technique could be applied for automated anomaly detection in the CC environment. In the future, the proposed model could be extended to address cloud-specific threats, such as misconfigurations, data exposure, and supply chain attacks, in the context of anomaly detection. In addition, the proposed model could be made to operate seamlessly across multiple cloud providers and hybrid cloud environments, including ensuring interoperability and consistent threat monitoring.