*4.4. Sensitivity Analysis*

Sensitivity analysis measures how uncertainty in the input variables influences a model's output. Analyzing the input data also helps to extract patterns from the dataset. Pearson's correlation coefficient was applied to find the correlation between the input features and the classes. Some features had significant relationships with the classes (normal and attacks) [70,71].

We selected the features whose correlation with the class was >50%. Figure 19 shows the features that have a significant correlation with the class variable in the CICAndMal2017 dataset. We considered four features with correlation >50%. The correlation coefficient results for the Drebin dataset are presented in Figure 20. The Drebin dataset revealed strong correlations with the classes, while in the CICAndMal2017 dataset, the correlations were <50%.

**Figure 19.** The correlation coefficient results using the CICAndMal2017 dataset.

**Figure 20.** The correlation coefficient for the Drebin dataset.
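The correlation-based selection described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the function names `pearson_with_label` and `select_features` are hypothetical, the class is assumed to be encoded as 0 (normal) and 1 (attack), and the 50% threshold is assumed to apply to the absolute value of the coefficient, which the text does not state explicitly.

```python
import numpy as np

def pearson_with_label(X, y):
    """Pearson's r between each column of X and the label vector y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xc = X - X.mean(axis=0)          # center each feature column
    yc = y - y.mean()                # center the label
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    # Zero-variance columns (or labels) give denom == 0; report r = 0 there
    # instead of NaN.
    safe = np.where(denom > 0, denom, 1.0)
    r = (Xc * yc[:, None]).sum(axis=0) / safe
    return np.where(denom > 0, r, 0.0)

def select_features(X, y, threshold=0.5):
    """Return indices of features with |r| > threshold, plus all r values.

    Using the absolute value is an assumption: it keeps strongly negative
    correlations as well as strongly positive ones.
    """
    r = pearson_with_label(X, y)
    return np.flatnonzero(np.abs(r) > threshold), r
```

Calling `select_features(X, y)` on a feature matrix `X` of shape (samples, features) returns the column indices that pass the threshold together with the full vector of coefficients, which is the quantity a bar chart such as Figures 19 and 20 would display.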

We applied the statistical metrics mean absolute error (MAE), mean squared error (MSE), root mean squared error (RMSE), and R<sup>2</sup> to quantify the prediction error between the target class and the predicted values. The prediction error of the machine learning algorithms is presented in Table 10. The SVM algorithm displayed fewer prediction errors, and the R<sup>2</sup> between the predicted values and the target values was 100% for the CICAndMal2017 dataset. The KNN method showed fewer prediction errors (MSE = 0.1842), and the R<sup>2</sup> between the predicted and target values was 33.35%.


**Table 10.** Statistical analysis of the machine learning algorithms' results using the CICAndMal2017 dataset.
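For reference, the four metrics named above can be computed as in the following sketch. It is an illustration under stated assumptions, not the authors' code: the helper name `regression_errors` is hypothetical, the class labels are assumed to be numeric (for example 0 and 1), and R<sup>2</sup> is taken to be the coefficient of determination, 1 − SS<sub>res</sub>/SS<sub>tot</sub>. The text does not say whether this definition or the squared Pearson correlation was used, and the two can differ.

```python
import numpy as np

def regression_errors(y_true, y_pred):
    """Return MAE, MSE, RMSE, and R^2 for predictions against targets."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residual = y_true - y_pred

    mae = np.mean(np.abs(residual))   # mean absolute error
    mse = np.mean(residual ** 2)      # mean squared error
    rmse = np.sqrt(mse)               # root mean squared error

    # Coefficient of determination (assumed definition of R^2).
    ss_res = np.sum(residual ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    # R^2 is undefined when the targets are constant.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")

    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}
```

Multiplying the returned `R2` by 100 gives the percentage form in which the values are reported in this section.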

Table 11 shows the prediction performance of the SVM, KNN, and LDA methods. The KNN method achieved R<sup>2</sup> = 33.35%, the best correlation between the predicted and target values in the Drebin dataset. Overall, the prediction results of the machine learning algorithms were satisfactory.

**Table 11.** Statistical analysis of the machine learning models using the Drebin dataset.


The prediction errors of the deep learning algorithms are summarized in Table 12. The LSTM model achieved a low prediction error (MSE = 0.0054), and the correlation between the predicted and target values was 88.25% in the CICAndMal2017 dataset. In the Drebin dataset, the LSTM model also showed a low prediction error (MSE = 0.0059) and a high correlation (R<sup>2</sup> = 97.39%). The prediction performance of LSTM was good in both datasets.


**Table 12.** Statistical analysis of the deep learning models.
