4.2.4. ANN

The architecture of the neural network was as follows: one input layer, three hidden layers, and one output layer. The activation functions were ReLU in the first hidden layer, LeakyReLU in the second, and Sigmoid in the third. The batch size was 150, with a learning rate of 0.01; the solver was Adam with *β*1 = 0.9 and *β*2 = 0.999. Similar to the ELM results, the ANN results (Table 7) show high performance for all feature sets. For the "basic" feature sets (i.e., *Base* and *Robust Base*), the ELM models achieved higher recall and F1-scores. Nevertheless, our main focus was on the *BTCP* feature set and, more specifically, on the *BRTCP* variant, where the ANN models achieved higher recall and F1-scores.
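The described forward pass can be sketched in NumPy as follows. Note that the layer widths (10, 64, 32, 16, 1) and the sigmoid output unit are illustrative assumptions, as the text does not report them; only the activation order (ReLU, LeakyReLU, Sigmoid), the batch size, and the Adam hyperparameters come from the section above.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Assumed layer widths: input, three hidden layers, one output unit.
sizes = [10, 64, 32, 16, 1]
weights = [rng.normal(0, 0.1, (a, b)) for a, b in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(b) for b in sizes[1:]]
# Hidden activations as stated in the text; a sigmoid output is assumed
# for binary (benign/malicious) classification.
acts = [relu, leaky_relu, sigmoid, sigmoid]

def forward(x):
    for W, b, act in zip(weights, biases, acts):
        x = act(x @ W + b)
    return x

batch = rng.normal(size=(150, 10))  # batch size 150, as in the text
probs = forward(batch)              # class probabilities in (0, 1)
```

In training, the weights would be updated with Adam (learning rate 0.01, *β*1 = 0.9, *β*2 = 0.999) against a binary cross-entropy loss; the sketch above covers only the inference path.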

**Table 7.** Model performance—ANN.


Our analysis concludes with Figure 7, which depicts the F1-scores of the feature sets for all the models.

All the results provided in this article are based on clean data (i.e., with no adversarial manipulation). Naturally, in an adversarial environment where the attacker can manipulate feature values, models based on the *Robust Base* or *TCP* feature sets will dominate models trained on the *Base* dataset. Thus, since the *Robust Base* feature set does not dramatically decrease the classifier's performance on clean data, and since adding the novel feature improves both the model's performance and its robustness, we conclude that malicious domain classifiers should use this feature set for robust malicious domain detection.
