To ensure reliable results, each experiment was repeated 20 times, and the reported values are the averages over these runs.
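The repeat-and-average protocol can be sketched as follows. This is only an illustration: `run_experiment` is a hypothetical stand-in that returns a placeholder accuracy, and reporting the standard deviation alongside the mean is an assumption, not something the text states.

```python
import random
import statistics

N_RUNS = 20  # number of repetitions stated in the text


def run_experiment(seed: int) -> float:
    """Hypothetical stand-in for one train/evaluate cycle; returns an accuracy in [0, 1]."""
    rng = random.Random(seed)
    # Placeholder value only: a real run would train and evaluate a model here.
    return 0.95 + rng.uniform(-0.01, 0.01)


def averaged_result(n_runs: int = N_RUNS) -> tuple[float, float]:
    """Run the experiment n_runs times and return (mean, sample standard deviation)."""
    scores = [run_experiment(seed) for seed in range(n_runs)]
    return statistics.mean(scores), statistics.stdev(scores)


mean_acc, std_acc = averaged_result()
print(f"mean accuracy over {N_RUNS} runs: {mean_acc:.4f} (std {std_acc:.4f})")
```

Seeding each run separately keeps the sketch reproducible while still letting the runs differ.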
The evaluation process on the CICMalDroid-2020 dataset is consistent with the ClaMP dataset evaluation process.
5.4.2. Evaluation of LSTM-BO-SVM Model Efficiency
To evaluate the performance of the LSTM-BO-SVM model for malware detection in the SSC, the following models were selected for comparative analysis:
RNN-BO-SVM model: First, use an RNN for pre-classification, and then use the BO algorithm to optimize the SVM model; finally, use the optimized SVM model to complete the classification.
BiLSTM-BO-SVM model: First, use Bidirectional Long Short-Term Memory (BiLSTM) for pre-classification, and then use the BO algorithm to optimize the SVM model; finally, use the optimized SVM model to complete the final classification.
BO-SVM model: First, use the BO algorithm to optimize the SVM model, and then use the optimized SVM model to complete the final classification.
LSTM-SVM model: First, use the LSTM model for pre-classification, and then use the SVM model to complete the classification.
Basic LSTM model: Use the LSTM model alone to complete the final classification.
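The compared variants differ only in which stages they include, which can be written down as a small configuration table. The sketch below is an illustrative encoding, not the authors' code: the `ModelVariant` class and its field names are hypothetical, and the composition of the proposed LSTM-BO-SVM model is inferred from its name and from the descriptions of the other variants.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelVariant:
    """Hypothetical description of one compared pipeline."""
    name: str
    pre_classifier: Optional[str]  # network used for pre-classification, if any
    uses_bo: bool                  # whether BO tunes the SVM hyperparameters
    final_classifier: str          # component that produces the final label


VARIANTS = [
    ModelVariant("LSTM-BO-SVM", "LSTM", True, "SVM"),  # proposed model (inferred)
    ModelVariant("RNN-BO-SVM", "RNN", True, "SVM"),
    ModelVariant("BiLSTM-BO-SVM", "BiLSTM", True, "SVM"),
    ModelVariant("BO-SVM", None, True, "SVM"),
    ModelVariant("LSTM-SVM", "LSTM", False, "SVM"),
    ModelVariant("LSTM", None, False, "LSTM"),
]

for v in VARIANTS:
    stages = [s for s in (v.pre_classifier, "BO" if v.uses_bo else None, v.final_classifier) if s]
    print(f"{v.name:14s} -> {' -> '.join(stages)}")
```

Laid out this way, each baseline removes or swaps exactly one stage of the proposed pipeline, which is what makes the comparison an ablation.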
In this paper, to ensure the consistency and comparability of the experiments, the same values were used for the same parameters in each model.
To present the experimental results more comprehensively, the six models mentioned above were evaluated using the metrics introduced in Section 5.3. In addition, to visualize the detection effect of the LSTM-BO-SVM model and to reveal its correct and incorrect predictions in each category, a heat map of its confusion matrix was drawn.
The confusion matrices for the ClaMP dataset and the CICMalDroid-2020 dataset are shown in Figure 5 and Figure 6, respectively.
It is evident from Figure 5 and Figure 6 that the vast majority of samples are concentrated on the main diagonal of the confusion matrix. Entries on the main diagonal represent correctly classified instances, while off-diagonal entries represent misclassified instances, so a high proportion of samples on the main diagonal is direct evidence that the LSTM-BO-SVM model classifies samples with high accuracy and good discrimination. The high accuracy of the LSTM-BO-SVM model on the two datasets further confirms its effectiveness in the SSC malware detection task.
Overall, the findings of the confusion matrix demonstrate that the LSTM-BO-SVM model achieves a good level of detection effectiveness.
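How a confusion matrix is built, and how the share of samples on its main diagonal is read off, can be sketched as follows. The label names and the toy predictions are assumptions for illustration only and are not the paper's data.

```python
from collections import Counter

LABELS = ("benign", "malware")


def confusion_matrix(y_true, y_pred, labels=LABELS):
    """Return a matrix m where m[i][j] counts samples of true class i predicted as class j."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def diagonal_share(matrix):
    """Fraction of all samples lying on the main diagonal (classified correctly)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


# Toy labels for illustration only; these are not the paper's data.
y_true = ["malware", "malware", "benign", "benign", "malware", "benign"]
y_pred = ["malware", "benign", "benign", "benign", "malware", "benign"]

cm = confusion_matrix(y_true, y_pred)
print(cm)                  # rows: true class, columns: predicted class
print(diagonal_share(cm))  # share of correctly classified samples
```

The diagonal share is the overall accuracy, which is why a matrix dominated by its diagonal corresponds to a high accuracy value.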
- 2. Accuracy Comparison
Figure 7 and Figure 8 show the accuracy comparisons of the six models mentioned above on the ClaMP and CICMalDroid-2020 datasets, respectively.
The experimental results show a clear accuracy advantage for the LSTM-BO-SVM model, which reached 98.2% on the ClaMP dataset and 98.6% on the CICMalDroid-2020 dataset, surpassing the other five models. On the ClaMP dataset, its accuracy was 2.9% to 9.9% higher than that of the other five models, indicating a better detection ability on this dataset. On the CICMalDroid-2020 dataset, the proposed model was likewise more accurate than the other five models, by 2.2% to 14.7%.
The lower accuracy of the LSTM model may be attributed to its difficulty in adequately extracting features when the amount of data is insufficient. In addition, the LSTM-SVM model lacks an optimization algorithm to determine the hyperparameters of the SVM classifier, which may explain its unsatisfactory detection effect. Although the RNN-BO-SVM model uses the BO algorithm for optimization, its accuracy is still lower than that of the LSTM-BO-SVM model. This may be due to the vanishing gradient problem that RNNs face with long sequences, which limits their ability to capture long-distance dependencies in the data. In contrast, there is little difference in accuracy between the BiLSTM-BO-SVM model and the LSTM-BO-SVM model, which indicates that the bidirectional LSTM structure alleviates the vanishing gradient problem of RNNs to some extent, thus improving the performance of the model.
- 3. Precision Comparison
Figure 9 and Figure 10 show the precision comparisons of the six models on the ClaMP and CICMalDroid-2020 datasets, respectively.
As shown in Figure 9, on the ClaMP dataset, the LSTM-BO-SVM model achieves the highest precision for malware at 98.7%, which is 0.2% to 13.5% higher than the other models. For benign software, it achieves 97.7% precision, which is 0.7% lower than the BiLSTM-BO-SVM model but 0.1% to 15.8% higher than the remaining models.
As shown in Figure 10, on the CICMalDroid-2020 dataset, the LSTM-BO-SVM model achieves the highest precision for malware at 98.6%, which is 0.6% to 15.2% higher than the other models. For benign software, its precision is also 98.6%, which is 1.1% and 0.1% lower than the LSTM-SVM and BiLSTM-BO-SVM models, respectively, but 0.1% to 16.8% higher than the remaining models.
The experimental findings show that the LSTM-BO-SVM model classifies both malware and benign software with high precision. The BO-SVM model achieved good detection results on benign software, but its results on malware were worse. The precision of the LSTM model and the LSTM-SVM model varies greatly between benign software and malware. In particular, on the ClaMP dataset, the LSTM-SVM model classifies malware with significantly higher precision than benign software, while on the CICMalDroid-2020 dataset the situation is reversed. The LSTM model behaves similarly to the LSTM-SVM model. Across the repeated experiments, the precision of the LSTM-SVM model fluctuated greatly, which may be partially attributed to the default hyperparameter settings of the SVM model. The RNN-BO-SVM and BiLSTM-BO-SVM models achieved better detection results, with the BiLSTM-BO-SVM model ahead of the RNN-BO-SVM model but overall still not as good as the LSTM-BO-SVM model.
- 4. Recall Rate Comparison
Figure 11 and Figure 12 show the recall rate comparisons of the six models on the ClaMP and CICMalDroid-2020 datasets, respectively.
As shown in Figure 11, on the ClaMP dataset, the LSTM-BO-SVM model has a recall rate of 97.9% for malware, which is 0.2% and 0.6% lower than the BO-SVM and BiLSTM-BO-SVM models, respectively, but 1.3% to 19.3% higher than the remaining models. Its recall rate for benign software is 98.6%, which is 1% lower than that of the LSTM-SVM model but 1.2% to 15.8% higher than the remaining models.
As shown in Figure 12, on the CICMalDroid-2020 dataset, the LSTM-BO-SVM model’s recall rate for malware is 96.9%, which is 0.3% lower than the BiLSTM-BO-SVM model and 0.3% to 0.9% higher than the other models. For benign software, the recall rate of the LSTM-BO-SVM model is 99.4%, which is 0.3% to 8.5% higher than the other models.
According to the experimental findings, the LSTM-BO-SVM model achieves a high recall rate on both datasets, for benign software as well as malware. The BiLSTM-BO-SVM model comes second, while the recall of the LSTM-SVM model is comparatively mediocre. The strong recall of the LSTM-BO-SVM model can be attributed to two key factors: the optimization of hyperparameters and the LSTM structure’s ability to process sequence data efficiently. By contrast, the weaker performance of the LSTM-SVM model may be due to the lack of effective hyperparameter optimization. Although the BiLSTM-BO-SVM model also performs well, it remains slightly behind the LSTM-BO-SVM model, possibly due to the complexity of its structure.
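Per-class precision and recall of the kind compared above can be derived from a confusion matrix as in the sketch below. It assumes the standard definitions (precision as true positives over all samples predicted as the class, recall as true positives over all samples actually in the class). The function name and the toy counts are hypothetical and are not the paper's data.

```python
def per_class_precision_recall(matrix, labels):
    """matrix[i][j] = number of samples of true class i predicted as class j."""
    results = {}
    n = len(labels)
    for k, label in enumerate(labels):
        tp = matrix[k][k]
        predicted_k = sum(matrix[i][k] for i in range(n))  # column sum
        actual_k = sum(matrix[k])                          # row sum
        precision = tp / predicted_k if predicted_k else 0.0
        recall = tp / actual_k if actual_k else 0.0
        results[label] = (precision, recall)
    return results


# Toy counts for illustration only; these are not the paper's data.
labels = ("benign", "malware")
matrix = [[95, 5],    # true benign: 95 predicted benign, 5 predicted malware
          [10, 90]]   # true malware: 10 predicted benign, 90 predicted malware

for label, (p, r) in per_class_precision_recall(matrix, labels).items():
    print(f"{label}: precision={p:.3f}, recall={r:.3f}")
```

Because a malware sample misclassified as benign lowers malware recall and benign precision at the same time, a model can rank first on one class and second on the other, as in the comparisons above.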
- 5. F1-Score Comparison
Figure 13 and Figure 14 show the F1-score comparisons of the six models on the ClaMP and CICMalDroid-2020 datasets, respectively.
As shown in Figure 13, on the ClaMP dataset, the LSTM-BO-SVM model achieves an F1-score of 98.3% for malware, which is 0.2% to 10.4% higher than the other models. For benign software, its F1-score is 98%, which is 0.1% to 10.7% higher than the other models.
As shown in Figure 14, on the CICMalDroid-2020 dataset, the LSTM-BO-SVM model achieves an F1-score of 97.9% for malware, which is 0.3% to 7.2% higher than the other models. For benign software, its F1-score is 99%, which is 0.1% to 3.2% higher than the other models.
According to the experimental findings, the LSTM-BO-SVM model achieves a high F1-score for both benign software and malware. The BiLSTM-BO-SVM model performs slightly worse but still outperforms the remaining models, whereas the LSTM-SVM model performs poorly in terms of the F1-score. In addition, the F1-score for benign software is generally higher than that for malware, which may be because malware uses techniques such as code obfuscation to evade detection and is therefore harder to detect.
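As a consistency check, the F1-score can be recomputed from the precision and recall reported earlier in this section. The sketch assumes the usual definition of F1 as the harmonic mean of precision and recall; the helper name `f1_score` is introduced here for illustration.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given as fractions in [0, 1])."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision (98.7%) and recall (97.9%) for malware on ClaMP, as reported in the text.
f1_malware = f1_score(0.987, 0.979)
print(f"F1 (malware, ClaMP): {f1_malware * 100:.1f}%")
```

For the LSTM-BO-SVM model on the ClaMP dataset, this gives about 98.3% for malware, in line with the reported value. Because the reported figures are averages over repeated runs, small deviations from this identity are to be expected for other entries.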
- 6. Training and Detection Time Comparison
In addition to evaluating the detection performance of each model, this paper records their training and detection times as an indicator of computational efficiency. The comparison reveals the differences in processing speed among the models and gives researchers a reference for trading off time against performance, so that a model can be selected that meets efficiency requirements while maintaining detection accuracy.
Table 5 displays the outcomes of the experiment.
According to the experimental findings, on the ClaMP dataset the LSTM-BO-SVM model has a relatively long training time (136.94 s), which may be related to the higher computational cost of processing data with the LSTM. By comparison, the BO-SVM model trains in 34.68 s, which demonstrates its optimization efficiency. In terms of detection speed, the BO-SVM model is the fastest with a detection time of 0.22 s, while the BiLSTM-BO-SVM model has the longest detection time (2.89 s), which may be related to the complexity of its structure.
On the CICMalDroid-2020 dataset, the training time of all models was extended due to the higher dimensionality of this dataset, but the BO-SVM model still maintained the shortest training time (45.56 s). In terms of detection time, the BO-SVM model remained the fastest at 0.32 s, while the BiLSTM-BO-SVM model had the longest detection time (3.21 s).
In conclusion, the BO-SVM model shows higher efficiency in both training and detection speeds and may be more suitable for environments with higher real-time requirements. On the other hand, the LSTM-BO-SVM model may be more suitable for complex data analysis that requires high accuracy, although it takes longer to train.
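Training and detection times of this kind can be recorded with a wall-clock timer around each phase. The sketch below shows one way to do so; the text does not state how the times were measured, so the use of `time.perf_counter` is an assumption, and `train` and `detect` are hypothetical stand-ins for the real training and detection code.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def train(samples):
    """Hypothetical stand-in for model training; returns a trivial threshold 'model'."""
    threshold = sum(samples) / len(samples)
    return lambda x: int(x > threshold)


def detect(model, samples):
    """Hypothetical stand-in for the detection phase."""
    return [model(x) for x in samples]


# Toy inputs for illustration only.
train_set = [0.1, 0.4, 0.35, 0.8, 0.9]
test_set = [0.2, 0.7]

model, train_time = timed(train, train_set)
predictions, detect_time = timed(detect, model, test_set)
print(f"training: {train_time:.6f} s, detection: {detect_time:.6f} s")
```

Timing the two phases separately matters because a model that is slow to train can still be fast at detection, which is the quantity relevant for real-time use.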
- 7. Comparison with Related Work
Many studies on SSC malware detection have used the ClaMP and CICMalDroid-2020 datasets. Mohamed et al. [35] and Sawadogo et al. [36] analyzed the CICMalDroid-2020 dataset using various machine learning models. Musikawan et al. [37] proposed an effective improved deep neural network. Bhagwat et al. [38] used the XGBoost model for detection. Kattamuri et al. [39] proposed feature selection using the ACO algorithm followed by detection using DTs. Raju et al. [40] used RFs for detection. Masum et al. [2] analyzed SSC attacks using a Quantum SVM (QSVM) and a Quantum Neural Network (QNN) on a quantum platform.
This paper presents the accuracy and computation time of each study, grouped by the dataset used and ranked by accuracy, in Table 6. In the computation time column, “\” indicates that the computation time is not reported in the corresponding article.
According to Table 6, the LSTM-BO-SVM model proposed in this paper achieved the highest accuracy on both datasets. Although the BO-SVM model is slightly less accurate, it has a shorter computation time than the other research methods listed in Table 6. Some methods achieve high accuracy but require a much longer computation time. For example, the method proposed by Kattamuri et al. [39] achieved an accuracy of 97.69% on the ClaMP dataset, with a computation time of 4365 s.