5.1.3. Metrics

The Receiver Operating Characteristic (ROC) curve is a popular way to validate the performance of a classifier on imbalanced datasets and is widely applied [3,11,12,35]. It evaluates how quickly the True Positive Rate (TPR) grows as the False Positive Rate (FPR) increases. Commonly, the AUC score, the area under the ROC curve, is used as the primary metric. However, [36] argued that the Precision-Recall curve is a better choice than the AUC score. [10] opted for the F1 score to evaluate the performance of their algorithm. To enable comparison with the methods mentioned above, we adopt all of these metrics so as to evaluate the performance of the algorithm comprehensively. These metrics are defined as follows:

$$GeneralAccuracy = \frac{TP + TN}{TP + FP + TN + FN} \tag{18}$$

$$TPR = \frac{TP}{TP + FN} \tag{19}$$

$$FPR = \frac{FP}{TN + FP} \tag{20}$$

$$Precision = \frac{TP}{TP + FP} \tag{21}$$

$$Recall = \frac{TP}{TP + FN} \tag{22}$$

$$F1 = \frac{2 \times Precision \times Recall}{Precision + Recall} \tag{23}$$

where TP (True Positives) is the number of NTL samples correctly detected, FP (False Positives) is the number of normal samples incorrectly classified as NTL, TN (True Negatives) is the number of normal samples correctly classified, and FN (False Negatives) is the number of NTL samples incorrectly classified as normal. It should be noted that the decision threshold used for *GeneralAccuracy*, *Precision*, *Recall*, and *F*1 is 0.5.
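For concreteness, the sketch below shows one way these metrics can be computed with scikit-learn. It is an illustrative assumption, not the evaluation code used in this work; the arrays `y_true` (1 = NTL, 0 = normal) and `y_score` (predicted NTL probabilities) are hypothetical placeholders.

```python
# Minimal sketch: computing Eqs. (18)-(23) plus ROC AUC and the
# Precision-Recall curve summary with scikit-learn.
import numpy as np
from sklearn.metrics import (
    roc_auc_score,
    average_precision_score,
    precision_recall_fscore_support,
    confusion_matrix,
)

# Toy data standing in for real labels and classifier scores.
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_score = np.array([0.9, 0.4, 0.35, 0.8, 0.1, 0.6, 0.7, 0.2])

# Threshold-free metrics computed directly from the raw scores.
auc = roc_auc_score(y_true, y_score)               # area under the ROC curve
pr_auc = average_precision_score(y_true, y_score)  # PR-curve summary, cf. [36]

# Binarize at the 0.5 decision threshold used in Eqs. (18)-(23).
y_pred = (y_score >= 0.5).astype(int)
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

general_accuracy = (tp + tn) / (tp + fp + tn + fn)  # Eq. (18)
tpr = tp / (tp + fn)                                # Eq. (19), equals Recall
fpr = fp / (tn + fp)                                # Eq. (20)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="binary"                # Eqs. (21)-(23)
)
```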
