**4. Evaluation**

### *4.1. Evaluation Metrics*

Accuracy, recall (also called detection rate), and precision are the metrics most commonly used to evaluate a classifier. Accuracy, the percentage of correctly classified samples, is a poor metric when the dataset is heavily skewed toward normal samples, since a model can score highly simply by labeling everything normal. In contrast, a high detection rate means the model rarely misses an attack, while a high precision means that few normal samples are misclassified as attacks. We consider detection rate (DR) more important than precision in our setting: if the model flags a normal sample as an attack, the prediction is easy to correct with domain knowledge, since attack samples are few; but if the model misses an attack, it is very difficult to find that attack in such a large dataset. We nonetheless report all three metrics; when comparing our approach with the baseline models, however, we consider only DR, since it is the most significant metric in our situation.
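The distinction above can be illustrated with a small sketch. The example below (not from the paper; the class balance and the degenerate "predict everything as normal" model are hypothetical) computes all three metrics from confusion-matrix counts and shows how accuracy stays high on a skewed dataset even when every attack is missed:

```python
def metrics(y_true, y_pred):
    """Accuracy, detection rate (recall), and precision from labels (1 = attack)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    detection_rate = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return accuracy, detection_rate, precision

# Hypothetical skewed test set: 990 normal samples, 10 attacks.
y_true = [0] * 990 + [1] * 10
# A degenerate model that predicts "normal" for every sample.
y_pred = [0] * 1000

acc, dr, prec = metrics(y_true, y_pred)
print(f"accuracy={acc:.3f}  detection_rate={dr:.3f}  precision={prec:.3f}")
# accuracy = 0.990 although every attack was missed (DR = 0)
```

This is why DR, not accuracy, drives the comparison with the baselines: the 99% accuracy here reflects only the class imbalance, while the detection rate of zero exposes that the model is useless for finding attacks.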
