### 4.2.1. Logistic Regression

As a baseline for the evaluation, and before applying the nonlinear models, we used the LR classification model, training it on each of the five feature sets (*Base*, *Robust Base*, *TCP*, *BRTCP*, and *BTCP*). Table 4 shows that the different feature sets yielded similar accuracy rates. However, accuracy measures the proportion of correct predictions (i.e., TP + TN) out of all predictions (i.e., TP + TN + FP + FN). Given the imbalanced dataset (75% benign and 25% malicious domains), an accuracy of ~90% is therefore not necessarily sufficient for malware detection: a trivial classifier that labels every domain as benign would already reach 75% accuracy while detecting no malicious domains. For example, the *TCP* feature set achieves high accuracy but a very poor F1-score, because its high precision is offset by poor recall. Since recall is low for all feature sets, accuracy is not a good measure in this domain. Consequently, we focused on the F1-score, the harmonic mean of precision and recall.
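To make the accuracy argument concrete, the following minimal Python sketch computes accuracy, precision, recall, and F1-score from confusion-matrix counts, with "malicious" as the positive class. The class name `ConfusionCounts` and both sets of counts are hypothetical illustrations on 1,000 domains with the stated 75%/25% split; they are not the values behind Table 4. Returning 0.0 for the 0/0 cases is a convention assumed here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Confusion-matrix counts, with 'malicious' as the positive class."""
    tp: int  # malicious domains flagged as malicious
    tn: int  # benign domains classified as benign
    fp: int  # benign domains flagged as malicious
    fn: int  # malicious domains missed (classified as benign)

    def accuracy(self) -> float:
        # Correct predictions over all predictions: (TP + TN) / (TP + TN + FP + FN)
        return (self.tp + self.tn) / (self.tp + self.tn + self.fp + self.fn)

    def precision(self) -> float:
        # Assumed convention: 0.0 when nothing is flagged as malicious (0/0 case)
        flagged = self.tp + self.fp
        return self.tp / flagged if flagged else 0.0

    def recall(self) -> float:
        # Share of truly malicious domains that were detected
        actual = self.tp + self.fn
        return self.tp / actual if actual else 0.0

    def f1(self) -> float:
        # Harmonic mean of precision and recall
        p, r = self.precision(), self.recall()
        return 2 * p * r / (p + r) if p + r else 0.0


# Hypothetical counts for 1,000 domains (750 benign, 250 malicious)
all_benign = ConfusionCounts(tp=0, tn=750, fp=0, fn=250)        # trivial baseline
high_precision = ConfusionCounts(tp=125, tn=750, fp=0, fn=125)  # precise but misses half

for name, c in [("all_benign", all_benign), ("high_precision", high_precision)]:
    print(f"{name}: acc={c.accuracy():.3f} prec={c.precision():.3f} "
          f"rec={c.recall():.3f} f1={c.f1():.3f}")
# all_benign: acc=0.750 prec=0.000 rec=0.000 f1=0.000
# high_precision: acc=0.875 prec=1.000 rec=0.500 f1=0.667
```

The high-precision case looks strong on accuracy (0.875), but its F1-score (0.667) exposes the half of the malicious domains it misses, which is the kind of gap that makes accuracy misleading on this dataset.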

### 4.2.2. Support Vector Machine (SVM)

Compared to the results of the LR model (Table 4), the results of the SVM model (Table 5) show a substantial improvement in the recall and F1-score measures; e.g., for *Base*, both the recall and the F1-score were above 90%. Note that the model trained on the *Base* feature set achieved a higher recall (and F1-score) than the one trained on the *Robust Base* feature set. Nonetheless, the *Robust Base* feature set is robust to adversarial manipulation and uses fewer than half as many features as the *Base* feature set. The same observation applies to the *BRTCP* and *BTCP* feature sets. Another advantage of including the novel features is that the models converge much faster. These results, however, were obtained on a non-manipulated dataset. As stated above, the *Base* feature set includes some non-robust features. Hence, an intelligent adversary can manipulate the values of these features so that malicious instances are misclassified (in the extreme, yielding 0% recall). In contrast, an adversary would need to invest much more effort to evade a model trained on the *Robust Base* or *TCP* features, since these features were specifically chosen to resist such manipulation. To find models that also perform well on the non-manipulated dataset, we examined two more sophisticated models: the ELM model proposed by Shi et al. [23] and the ANN model.
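The manipulation argument can be illustrated with a toy linear decision function, used here purely for simplicity (it is not the SVM configuration evaluated in Table 5). Each sample has two hypothetical features, one robust and one attacker-controlled; the weights, biases, and sample values are invented for illustration. The "base" model relies heavily on the non-robust feature, while the "robust" model ignores it, mirroring a feature set that omits manipulable features.

```python
from typing import List, Sequence, Tuple

# Hypothetical feature vectors: (robust_feature, nonrobust_feature).
# Neither feature name comes from the paper.
Model = Tuple[Sequence[float], float]  # (weights, bias) of a linear decision function


def predict(model: Model, x: Sequence[float]) -> int:
    """Return 1 (malicious) if w.x + b > 0, else 0 (benign)."""
    weights, bias = model
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if score > 0 else 0


def recall(model: Model, malicious: List[Sequence[float]]) -> float:
    """Fraction of malicious samples the model flags as malicious."""
    return sum(predict(model, x) for x in malicious) / len(malicious)


def evade(x: Sequence[float], index: int, value: float) -> List[float]:
    """Adversary overwrites one attacker-controlled feature, leaving the rest intact."""
    manipulated = list(x)
    manipulated[index] = value
    return manipulated


# Illustrative weights: the base model leans on the non-robust feature,
# the robust model gives it zero weight (i.e., does not use it).
base_model: Model = ([1.0, 2.0], -1.5)
robust_model: Model = ([2.0, 0.0], -1.0)

malicious = [[1.0, 1.0], [0.8, 1.2], [0.9, 0.9]]
manipulated = [evade(x, index=1, value=0.0) for x in malicious]

print(recall(base_model, malicious), recall(base_model, manipulated))      # 1.0 0.0
print(recall(robust_model, malicious), recall(robust_model, manipulated))  # 1.0 1.0
```

Both models detect every unmodified malicious sample, but overwriting the single non-robust feature drives the base model's recall to zero while leaving the robust model unaffected. Real adversaries and nonlinear models are more complex; the sketch only shows why dependence on manipulable features is the weak point.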


**Table 4.** Model performance: Logistic Regression.


**Table 5.** Model performance: SVM.
