*4.3. Model Performance*

Efficient machine learning and deep learning models are essential for the reliable detection of malicious Android applications. The intrusion detection algorithms were evaluated on two standard mobile malware datasets: the Drebin dataset, which contained 10,525 Android applications, and the CICAndMal2017 dataset, which contained 676 instances of various attack and normal packets.

### 4.3.1. Performance of the Machine Learning Models

In this work, the SVM, KNN, and LDA models were applied to identify malicious Android packets. The SVM algorithm achieved the maximum score (100%) on all the performance measures for the CICAndMal2017 dataset (Table 5). However, it achieved lower accuracy (80.71%) on the Drebin dataset.
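The SVM evaluation pipeline described above can be sketched as follows with scikit-learn. The synthetic data, kernel choice, and split seed are illustrative assumptions standing in for the real CICAndMal2017/Drebin feature matrices, which are not reproduced here.

```python
# Minimal sketch of the SVM evaluation pipeline (assumed setup; the
# paper's actual feature matrices and hyperparameters are not shown).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Synthetic stand-in: 1,000 samples, binary labels (0 = benign, 1 = malware).
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)

clf = SVC(kernel="rbf")          # kernel choice is an assumption
clf.fit(X_tr, y_tr)
y_pred = clf.predict(X_te)

print("accuracy :", accuracy_score(y_te, y_pred))
print("precision:", precision_score(y_te, y_pred))
print("recall   :", recall_score(y_te, y_pred))
print("f1       :", f1_score(y_te, y_pred))
```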



The SVM method performed efficiently on the CICAndMal2017 dataset and yielded satisfactory results on the Drebin dataset. The confusion matrices of the SVM method are presented in Figure 10. In the CICAndMal2017 dataset, 45.81% of the samples were true negatives, classified as normal data, whereas 54.19% were true positives, classified as malware attacks. Furthermore, there were no false positives or false negatives, indicating that the SVM method successfully detected every malicious attack in the CICAndMal2017 dataset. The confusion matrix of the SVM approach applied to the Drebin dataset was as follows: 61.56% of the samples were true positives, classified as abnormal applications, and 19.15% were true negatives, classified as normal applications, whereas the false negatives and false positives were 18.62% and 0.67%, respectively. We conclude that the performance of the SVM method is good, since the false positive rate is low.

**Figure 10.** The confusion matrices of the SVM method using the (**a**) CICAndMal2017 and (**b**) Drebin datasets.
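The percentages quoted above can be derived by normalizing a confusion matrix by the total number of test samples. A small sketch, using illustrative labels rather than the paper's data:

```python
# Sketch: converting confusion-matrix counts into the percentage-of-all-
# samples form used in the text. The labels below are illustrative.
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([0, 0, 1, 1, 1, 0, 1, 1, 0, 1])
y_pred = np.array([0, 0, 1, 1, 1, 0, 1, 0, 1, 1])

cm = confusion_matrix(y_true, y_pred)   # rows: true class, cols: predicted
cm_pct = cm / cm.sum() * 100            # percentage of all test samples
tn, fp, fn, tp = cm.ravel()
print(cm_pct)
print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
```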

Table 6 summarizes the performance of the KNN method in detecting malware attacks in both datasets. We used the KNN method with k = 5. On the CICAndMal2017 dataset, the KNN method achieved high accuracy (90%), compared with 81.57% on the Drebin dataset.


**Table 6.** Results of KNN algorithm.
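A minimal sketch of the k = 5 KNN classifier described above, again on synthetic data standing in for the malware feature vectors:

```python
# Sketch of the KNN baseline with k = 5, as in the experiments;
# the data and split seed are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

knn = KNeighborsClassifier(n_neighbors=5)   # k = 5
knn.fit(X_tr, y_tr)
acc = accuracy_score(y_te, knn.predict(X_te))
print(f"KNN (k=5) accuracy: {acc:.4f}")
```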

Figure 11 shows the confusion matrices for the KNN method. In the CICAndMal2017 dataset, 40.89% of the samples were classified as true negatives (normal applications), 49.26% as malware, and 4.93% as false positives (normal data classified as attacks). In the Drebin dataset, the KNN method classified 61.87% of the samples as true positives (attacks) and 19.71% as true negatives (normal), and the false positives were <0.80%. Overall, the KNN method achieved higher accuracy on the CICAndMal2017 dataset than on the Drebin dataset.

**Figure 11.** The confusion matrices of the KNN method using the (**a**) CICAndMal2017 and (**b**) Drebin datasets.

The results of the LDA method are presented in Table 7. Overall, the results were inadequate owing to the complexity of the network dataset; linear algorithms such as LDA are not well suited to such data. The accuracy of LDA was 45.32% on the CICAndMal2017 dataset, whereas it reached 81% on the Drebin dataset.


**Table 7.** Results of the LDA method.
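For completeness, the LDA baseline can be sketched in the same way. LDA fits a single linear decision boundary, which helps explain why it struggles on the more complex CICAndMal2017 traffic; the data below are again illustrative.

```python
# Sketch of the LDA baseline: one linear weight per feature, so the
# decision boundary is a hyperplane. Data and seed are assumptions.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)

lda = LinearDiscriminantAnalysis()
lda.fit(X_tr, y_tr)
acc = accuracy_score(y_te, lda.predict(X_te))
print(f"LDA accuracy: {acc:.4f}")
```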

The confusion matrices of the LDA method are presented in Figure 12. In the CICAndMal2017 dataset, the percentage of true positives was high (49%), whereas that of true negatives (classified as normal applications) was low (44.83%). The percentage of false positives was high (53.69%), showing that the LDA model is not appropriate for this dataset. In the Drebin dataset, the confusion matrix showed that 19.15% of the samples were true negatives and 1.02% were false positives (normal applications classified as malware). Overall, the LDA method performed well on the Drebin dataset.

**Figure 12.** The confusion matrices of the LDA method for the (**a**) CICAndMal2017 and (**b**) Drebin datasets.

### 4.3.2. Performance of the Deep Learning Models

In this section, the results of the deep learning algorithms, namely LSTM, CNN-LSTM, and AE, are presented. The dataset was divided into 70% training and 30% test data. Table 8 shows the results of the LSTM, CNN-LSTM, and AE models. The CNN-LSTM model achieved higher accuracy (95.07%) than the LSTM and AE models on the CICAndMal2017 dataset.

**Table 8.** Results of the deep learning algorithms in the CICAndMal2017 dataset.
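A CNN-LSTM hybrid of the kind evaluated here combines convolutional layers for local pattern extraction with an LSTM for sequence modelling. The sketch below is only an assumed architecture: the layer sizes, filter counts, and input shape are illustrative, since the paper's exact configuration is not listed in this section.

```python
# Hedged sketch of a CNN-LSTM hybrid (assumed architecture, not the
# paper's exact model). Input shape of 100 features is an assumption.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Conv1D, MaxPooling1D, LSTM, Dense

model = Sequential([
    Input(shape=(100, 1)),                         # 100 features per sample (assumed)
    Conv1D(64, kernel_size=3, activation="relu"),  # local pattern extraction
    MaxPooling1D(pool_size=2),                     # downsample feature maps
    LSTM(64),                                      # sequence modelling over CNN features
    Dense(1, activation="sigmoid"),                # binary output: benign vs. malware
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```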


Figure 13 shows the accuracy of the LSTM, CNN-LSTM, and AE algorithms on the CICAndMal2017 dataset. The performance plots show that the CNN-LSTM model achieved an accuracy of 99.9% in the training phase, while in the validation phase the accuracy rose from an initial 75% to 95.07%. The LSTM model achieved good performance in the training phase (99%) and reached 94.58% in the validation phase.

**Figure 13.** Performance of the deep learning models with the CICAndMal2017 dataset. (**a**) LSTM. (**b**) CNN-LSTM.

The binary cross-entropy loss function was used to measure the loss in the training and testing phases. Figure 14 shows the validation loss of the deep learning models. The loss of the LSTM model in the validation phase decreased from 0.5 to 0.2, while that of the CNN-LSTM model decreased from 0.6 to 0.2.

**Figure 14.** Loss of the deep learning models on the CICAndMal2017 dataset. (**a**) LSTM. (**b**) CNN-LSTM.
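To make the loss curves concrete, the binary cross-entropy can be written out directly in NumPy. This is the standard definition of the loss; Keras computes the same quantity internally via `loss="binary_crossentropy"`.

```python
# Binary cross-entropy written out in NumPy (standard definition,
# shown here only to make the reported loss values concrete).
import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy: -mean(y*log(p) + (1-y)*log(1-p))."""
    y_pred = np.clip(y_pred, eps, 1 - eps)   # avoid log(0)
    return float(-np.mean(y_true * np.log(y_pred)
                          + (1 - y_true) * np.log(1 - y_pred)))

y_true = np.array([1.0, 0.0, 1.0, 0.0])
print(binary_crossentropy(y_true, np.array([0.9, 0.1, 0.8, 0.2])))  # confident predictions: low loss
print(binary_crossentropy(y_true, np.array([0.5, 0.5, 0.5, 0.5])))  # uncertain predictions: higher loss
```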

Table 9 shows the results of the LSTM, CNN-LSTM, and AE models on the Drebin dataset. The LSTM model achieved the highest accuracy (99.40%). The CNN-LSTM model also showed high accuracy (97.20%), and the performance of the AE model was satisfactory.

**Table 9.** Results of the deep learning models using the Drebin dataset.


Figure 15 shows the accuracy of the deep learning models on the Drebin dataset. The validation accuracy of the LSTM model started at 97% and reached 99.40% after 20 epochs. In the training phase, the LSTM model achieved an accuracy of 100%. The accuracy of the CNN-LSTM model was 97.20% in the validation phase.

**Figure 15.** Performance of the deep learning models on the Drebin dataset. (**a**) LSTM. (**b**) CNN-LSTM.

Figure 16 shows the validation loss of the deep learning models on the Drebin dataset. The validation loss of the LSTM model changed from 0.10 to 0.7, whereas that of the CNN-LSTM model decreased from 0.7 to 0.1 over 20 epochs.

**Figure 16.** Loss of the deep learning models on the Drebin dataset. (**a**) LSTM. (**b**) CNN-LSTM.

The accuracy of the AE model on the CICAndMal2017 and Drebin datasets is presented in Figure 17. The performance of the AE was not satisfactory: on the CICAndMal2017 dataset, the accuracy was 79% in the training phase and 75.79% in the validation phase, while on the Drebin dataset, the validation accuracy was 56%. The LSTM and CNN-LSTM models therefore outperformed the AE model in accuracy.

**Figure 17.** Accuracy of the AE model in the (**a**) CICAndMal2017 and (**b**) Drebin datasets.
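Autoencoder-based detectors are commonly used by training the AE to reconstruct benign samples and flagging any sample whose reconstruction error exceeds a threshold. A minimal sketch of that decision rule, with illustrative error values and an assumed threshold rather than the paper's numbers:

```python
# Sketch of the usual autoencoder detection rule: flag a sample as
# malware when its reconstruction error exceeds a threshold.
# The error values and threshold below are illustrative assumptions.
import numpy as np

def detect(reconstruction_errors, threshold):
    """Label 1 (malware) where the error exceeds the threshold, else 0."""
    return (np.asarray(reconstruction_errors) > threshold).astype(int)

errors = np.array([0.02, 0.03, 0.41, 0.05, 0.67])   # per-sample MSE (assumed)
labels = detect(errors, threshold=0.10)
print(labels)   # the two samples above the threshold are flagged
```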

Figure 18 displays the loss of the AE model on both datasets. The loss remained high for the CICAndMal2017 dataset, changing only from 0.70 to 0.55. Furthermore, the validation loss changed from 0.9 to 0.4 for the Drebin dataset. Overall, the validation loss of the AE model was high; therefore, the AE model is not appropriate for the detection of malicious Android attacks.

**Figure 18.** Loss of the autoencoder model on the (**a**) CICAndMal2017 and (**b**) Drebin datasets.
