*4.3. Metrics*

There are many ways to evaluate classification accuracy. The evaluation metrics used in this paper are the accuracy rate, precision rate, recall rate and F1 score.

The above four metrics were calculated based on the confusion matrix shown in Table 3.


**Table 3.** Confusion matrix.

In this paper, our purpose is to detect electricity theft. Therefore, we define electricity theft samples as positive samples and normal samples as negative samples. The counts of true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN) can then be obtained from the confusion matrix. TP and TN indicate that the actual attribute of the sample is the same as the classified one, meaning the classification result is accurate. FP indicates that the sample is actually negative but the classified result is positive. FN indicates that the sample is actually positive but the classified result is negative. The contrast between actual and classified results reflects the inaccuracy of the classification model.
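The four counts above can be tallied directly from paired label lists. A minimal sketch, with made-up labels (not from the paper's dataset) where 1 denotes electricity theft (positive) and 0 denotes normal (negative):

```python
# Actual vs. classified labels: 1 = electricity theft (positive), 0 = normal (negative).
actual     = [1, 1, 0, 0, 1, 0, 0, 1]
classified = [1, 0, 0, 1, 1, 0, 0, 0]

# Tally the four confusion-matrix counts.
TP = sum(1 for a, c in zip(actual, classified) if a == 1 and c == 1)
TN = sum(1 for a, c in zip(actual, classified) if a == 0 and c == 0)
FP = sum(1 for a, c in zip(actual, classified) if a == 0 and c == 1)
FN = sum(1 for a, c in zip(actual, classified) if a == 1 and c == 0)

print(TP, TN, FP, FN)  # 2 3 1 2
```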

Accuracy rate (AR) is the proportion of correctly classified samples in all samples. It is the most intuitive and commonly used criterion to measure the classification effect of the model. The formula is as follows:

$$AR = \frac{TP + TN}{TP + TN + FP + FN} \times 100\% \tag{14}$$

However, most samples in the training set are normal and only a few correspond to electricity theft, which means that actual negative samples far outnumber actual positive samples. Even if the model classifies every sample as negative, its accuracy rate will still be very high. Therefore, using the AR criterion alone is not a comprehensive evaluation.
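This pitfall is easy to demonstrate numerically. Under an assumed (hypothetical) split of 95 normal samples and 5 theft samples, a degenerate model that labels everything as normal still scores 95% accuracy by Equation (14) while detecting no theft at all:

```python
# Hypothetical imbalanced dataset: 95 normal (0) and 5 theft (1) samples.
actual = [0] * 95 + [1] * 5
# A degenerate model that classifies every sample as normal:
classified = [0] * 100

TP = sum(a == 1 and c == 1 for a, c in zip(actual, classified))
TN = sum(a == 0 and c == 0 for a, c in zip(actual, classified))
FP = sum(a == 0 and c == 1 for a, c in zip(actual, classified))
FN = sum(a == 1 and c == 0 for a, c in zip(actual, classified))

AR = (TP + TN) / (TP + TN + FP + FN) * 100  # Eq. (14)
print(AR)  # 95.0 -- high accuracy despite catching zero theft cases
```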

Precision rate (PR) refers to the proportion of actual positive results in the classified positive samples, which indicates the classification accuracy in the classified positive samples. The formula is as follows:

$$PR = \frac{TP}{TP + FP} \times 100\% \tag{15}$$

Recall rate (RR) is defined as the proportion of classified positive results in the actual positive samples, which means the classification accuracy in the actual positive samples. The formula is as follows:

$$RR = \frac{TP}{TP + FN} \times 100\% \tag{16}$$

F\_score is the weighted harmonic mean of the precision rate and the recall rate, so it evaluates accuracy more comprehensively than either alone. The formula is as follows:

$$F\_{score} = \frac{\left(\alpha^2 + 1\right) \times PR \times RR}{\alpha^2 \times PR + RR} \times 100\% \tag{17}$$

where α is a parameter greater than 0 that weights recall relative to precision. In particular, when α equals one, F\_score reduces to F1, the most representative and commonly used criterion. The formula is as follows:

$$F1 = \frac{2 \times PR \times RR}{PR + RR} \times 100\% \tag{18}$$

In summary, we use the confusion matrix and the four indicators AR, PR, RR and F1 to comprehensively assess the accuracy of the classification model. In the next section, we analyze different models on different datasets based on these metrics.

#### **5. Results and Analysis**

In this section, we present the experimental results and analysis. We first compare the performance of the proposed model with that of other methods. Then, we study the influence of the parameters on the results. Finally, we discuss the effectiveness of the proposed data augmentation method.
