#### *4.1. Evaluation Indexes*

After the classification of the unbalanced data, all test samples were divided into four cases: TN (true negative), TP (true positive), FP (false positive), and FN (false negative). These indicators constituted a confusion matrix, as shown in Table 2. The confusion matrix is a way to evaluate model performance, in which each row corresponds to the category to which a sample actually belongs and each column represents the category predicted by the model.


FP corresponds to a Type I error and FN to a Type II error. From the confusion matrix, multiple evaluation indexes can be derived.
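As a minimal illustration, the four cases can be read off a prediction with scikit-learn, whose `confusion_matrix` follows the same row/column convention as Table 2; the labels below are hypothetical placeholders, not data from the paper:

```python
# Minimal sketch: extracting the four cases of Table 2 with scikit-learn.
# Labels are hypothetical placeholders: 1 = electricity theft (PD), 0 = normal (ND).
from sklearn.metrics import confusion_matrix

y_true = [0, 0, 1, 1, 0, 1, 0, 0]   # actual categories (rows)
y_pred = [0, 1, 1, 0, 0, 1, 0, 0]   # predicted categories (columns)

# For binary labels {0, 1}, ravel() returns the counts in this fixed order.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"TN={tn}, FP={fp}, FN={fn}, TP={tp}")   # TN=4, FP=1, FN=1, TP=2
```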

(1) Accuracy (*ACC*): *ACC* is the ratio of the number of correctly classified samples to the total number of samples. The higher the value of *ACC*, the better the performance of the detection algorithm. Mathematically, *ACC* is defined as:

$$\text{ACC} = \frac{TP + TN}{TP + FP + TN + FN}.\tag{12}$$

(2) True Positive Rate (*TPR*): *TPR* describes the sensitivity of the detection model to PD. The higher the value of *TPR*, the better the performance of the detection algorithm. *TPR* is defined as:

$$TPR = \frac{TP}{TP + FN}.\tag{13}$$

(3) False Positive Rate (*FPR*): *FPR* refers to the proportion of samples that actually belong to ND but are wrongly judged as PD by the detection algorithm. *FPR* is defined as:

$$FPR = \frac{FP}{FP + TN}.\tag{14}$$

(4) True Negative Rate (*TNR*): *TNR* describes the sensitivity of the detection model to ND and is defined as:

$$TNR = \frac{TN}{TN + FP}.\tag{15}$$

(5) *G-mean* index: the *G-mean* index is used to evaluate classifier performance [54]. A larger *G-mean* indicates better classification performance. *G-mean* is the square root of the product of the detection accuracies on PD and ND (i.e., *TPR* and *TNR*), so it can reasonably evaluate the overall classification performance on an unbalanced dataset. It can be expressed as:

$$G\text{-}mean = \sqrt{TPR \times TNR}.\tag{16}$$

(6) Receiver operating characteristic (ROC) curve and area under the ROC curve (AUC): the ROC curve was originally created to test the performance of radar [55]. The ROC curve describes the relationship between the relative growth of FPR and TPR in the confusion matrix. For the values output by a binary classification model, the closer the ROC curve is to the point (0, 1), the better the classification performance. The AUC is an index that evaluates the performance of the detection algorithm from the ROC curve; an AUC value of 1 corresponds to an ideal detection algorithm.
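The six indexes above can be reproduced in a few lines. The sketch below, with hypothetical labels `y_true` and classifier scores `y_score` (not data from the paper), implements Equations (12)–(16) and the AUC:

```python
# Sketch of Equations (12)-(16) plus AUC, using hypothetical labels and scores.
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

y_true  = np.array([0, 0, 1, 1, 0, 1, 0, 1])                   # placeholder ground truth
y_score = np.array([0.1, 0.6, 0.8, 0.3, 0.2, 0.9, 0.4, 0.7])   # placeholder scores
y_pred  = (y_score >= 0.5).astype(int)                          # threshold at 0.5

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
acc    = (tp + tn) / (tp + fp + tn + fn)   # Equation (12)
tpr    = tp / (tp + fn)                    # Equation (13)
fpr    = fp / (fp + tn)                    # Equation (14)
tnr    = tn / (tn + fp)                    # Equation (15)
g_mean = np.sqrt(tpr * tnr)                # Equation (16)
auc    = roc_auc_score(y_true, y_score)    # area under the ROC curve

print(f"ACC={acc:.3f} TPR={tpr:.3f} FPR={fpr:.3f} "
      f"TNR={tnr:.3f} G-mean={g_mean:.3f} AUC={auc:.3f}")
```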

#### *4.2. Unbalanced Processing of User-Side Data*

Random oversampling, SMOTE, and K-SMOTE were used to oversample the dataset, and the results are shown in Figure 5, in which the black circles represent the normal users, the red asterisks represent the electricity theft users, and the blue boxes represent the data generated after oversampling.

**Figure 5.** Schematic diagram of samples generated by different oversampling methods. (**a**) Raw data, (**b**) random oversampling, (**c**) synthetic minority oversampling technique, (**d**) improved SMOTE based on K-means. Red: electricity theft users; blue: data generated after oversampling; grey: normal users.

In addition, Table 3 shows the repetition rate between the artificial data generated by the oversampling algorithms and the original data.


It can be observed from Figure 5 that a large amount of duplicated data was included in the result of the random oversampling algorithm, and some data were never selected. From Table 3, the data repetition rate of random oversampling was 95.02%, which indicates that the oversampling effect was not ideal. The data repetition rate of SMOTE was 30.5%. As can be seen from Figure 5, the data generated by SMOTE were scattered among the other data and introduced noise points; the problem of data overlap still existed and could not be ignored. K-SMOTE generates data near the cluster centers and uses representative points to limit the boundaries of the generated data, which avoids introducing noise. The data generated by K-SMOTE generally followed the original distribution, and, as shown in Table 3, its data repetition rate was only 15.84%.
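Purely as a hedged sketch of the idea described above (synthesizing minority samples between each K-means cluster center and representative cluster members), a possible implementation is shown below; the paper's exact K-SMOTE procedure may differ in its details:

```python
# Hedged sketch of a K-means-guided SMOTE variant; the paper's exact K-SMOTE
# may differ. New minority samples are interpolated between each K-means
# cluster center and randomly chosen representative members of that cluster,
# so the synthetic data stay near the centers and inside cluster boundaries.
import numpy as np
from sklearn.cluster import KMeans

def k_smote(X_min, n_new, n_clusters=3, seed=0):
    """Generate n_new synthetic samples from minority-class data X_min."""
    rng = np.random.default_rng(seed)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(X_min)
    synthetic = []
    for _ in range(n_new):
        c = rng.integers(n_clusters)                 # pick a cluster
        members = X_min[km.labels_ == c]
        x = members[rng.integers(len(members))]      # a representative point
        gap = rng.random()                           # interpolation factor in [0, 1)
        synthetic.append(km.cluster_centers_[c] + gap * (x - km.cluster_centers_[c]))
    return np.array(synthetic)

# Toy usage: 40 minority samples in two 2-D clusters, 100 synthetic samples.
rng_demo = np.random.default_rng(1)
X_min = np.vstack([rng_demo.normal(m, 0.3, (20, 2)) for m in (0.0, 2.0)])
X_new = k_smote(X_min, n_new=100)
```

The imbalanced-learn library also ships a related `KMeansSMOTE` oversampler that can serve as an off-the-shelf alternative to a hand-rolled version.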

#### *4.3. Electricity Theft Detection Based on Improved RF*

#### 4.3.1. Determination of the Number of Decision Trees

The number of decision trees, *nTree*, directly affects the accuracy of the RF algorithm. In this paper, 80% of the user data were used to form the training set and the remaining 20% to form the test set. The optimal number of decision trees can be determined by minimizing the OOB error. The relationship between the OOB error and *nTree* is shown in Figure 6.

**Figure 6.** Value of OOB error varies with the number of decision trees.

It can be observed that when the number of decision trees was larger than 368, the OOB error almost converged to its minimum level. If the number of decision trees was too small, the accuracy was low; on the other hand, adding more decision trees did not improve the accuracy further and only increased the computational burden. Therefore, the number of decision trees was set to 368.
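A sketch of the scan behind Figure 6 follows; the training arrays `X_train` and `y_train` (the 80% split described above) are assumed to be already loaded and are not reproduced here. Scikit-learn's RF reports the OOB score directly:

```python
# Sketch of the OOB-error scan behind Figure 6; X_train and y_train are
# assumed to be the 80% training split described above.
from sklearn.ensemble import RandomForestClassifier

def oob_errors(X, y, tree_counts):
    """Return the OOB error (1 - OOB accuracy) for each candidate tree count."""
    errors = []
    for n in tree_counts:
        rf = RandomForestClassifier(n_estimators=n, oob_score=True,
                                    random_state=0).fit(X, y)
        errors.append(1.0 - rf.oob_score_)
    return errors

# e.g. errs = oob_errors(X_train, y_train, range(50, 501, 50))
# then choose the smallest tree count after which the error stops decreasing.
```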

#### 4.3.2. Detection Results of RF

The above-mentioned electricity users' dataset, both processed and not processed by K-SMOTE oversampling, was detected by RF. In order to make the simulation results more convincing and avoid randomness, three independent tests were carried out for each detection. The ACC values of the test data are listed in Table 4, and the ROC curves are shown in Figures 7 and 8, in which the three differently colored curves (red, green, and blue) represent the ROC curves of the three independent tests. According to these results, when K-SMOTE was not used for unbalanced data processing, the mean ACC of RF was 85.53%, while the mean ACC of RF after K-SMOTE was 94.53%. In addition, it can be concluded that the ROC curve detected by RF with K-SMOTE was obviously closer to the point (0, 1) than the ROC curve detected by the RF algorithm without K-SMOTE; that is, the area under the former ROC curve was larger.

Moreover, the AUC index of the former was obviously better than that of the latter, which shows that it is necessary to use K-SMOTE to deal with unbalanced data before detecting electricity theft behaviors. In addition, the detection performance of RF itself was also ideal.
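The comparison behind Figures 7 and 8 can be sketched as below; the splits `X_train`/`y_train` (raw), `X_train_bal`/`y_train_bal` (K-SMOTE-balanced), and `X_test`/`y_test` are assumed to exist already, and 368 trees are used as determined in Section 4.3.1:

```python
# Sketch of the comparison behind Figures 7 and 8: RF on the raw unbalanced
# training set versus RF on a K-SMOTE-balanced one. The named splits are
# assumptions standing in for the paper's dataset.
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_curve, roc_auc_score

variants = {"without K-SMOTE": (X_train, y_train),
            "with K-SMOTE":    (X_train_bal, y_train_bal)}
for name, (Xtr, ytr) in variants.items():
    rf = RandomForestClassifier(n_estimators=368, random_state=0).fit(Xtr, ytr)
    score = rf.predict_proba(X_test)[:, 1]          # probability of the theft class
    fpr, tpr, _ = roc_curve(y_test, score)          # points of the ROC curve
    print(f"{name}: ACC={rf.score(X_test, y_test):.4f}, "
          f"AUC={roc_auc_score(y_test, score):.4f}")
```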


**Table 4.** Accuracy value of random forest.

**Figure 7.** Receiver operating characteristic curve detected by RF with K-SMOTE.

**Figure 8.** ROC curve detected by the RF algorithm without K-SMOTE.

#### 4.3.3. Comparison of Detection Performance of Different Algorithms

The electricity users' data processed by K-SMOTE were also tested by BPN and SVM. Again, in order to make the simulation results more convincing, the same dataset was used and three independent tests were performed. The testing results are shown in Table 5 and Figures 9 and 10, in which the three differently colored curves (red, green, and blue) represent the ROC curves of the three independent tests.


**Table 5.** Accuracy value of three algorithms.

**Figure 9.** ROC curve detected by support vector machines with K-SMOTE.


**Figure 10.** ROC curve detected by back-propagation neural network with K-SMOTE.

It can be concluded from the above test results that:

(1) Without K-SMOTE, the ACC and AUC values of the RF detection method were relatively low. With K-SMOTE, the ACC and AUC values of all three detection methods were obviously improved, increasing by about 10%. This indicates that an unbalanced dataset degrades the accuracy of the detection algorithm, and that K-SMOTE plays an effective role in improving machine learning accuracy.

(2) For the electricity user data processed by K-SMOTE, the mean ACC values of SVM and BPN were 71.26% and 84.87%, respectively, and the mean AUC values of SVM and BPN were 0.7236 and 0.8716, respectively. These indexes were lower than the ACC and AUC of RF, which were 94.53% and 0.9513, respectively. Thus, the performance of RF was superior to that of SVM and BPN.
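A sketch of this three-way comparison follows; the BPN is approximated here by scikit-learn's `MLPClassifier`, and the hidden-layer size and SVM settings are illustrative assumptions rather than the paper's exact configurations:

```python
# Sketch of the three-way comparison of Table 5 on the K-SMOTE-balanced data.
# MLPClassifier stands in for the BPN; hyperparameters are illustrative only.
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import roc_auc_score

models = {
    "RF":  RandomForestClassifier(n_estimators=368, random_state=0),
    "SVM": SVC(probability=True, random_state=0),
    "BPN": MLPClassifier(hidden_layer_sizes=(50,), max_iter=1000, random_state=0),
}
for name, model in models.items():
    model.fit(X_train_bal, y_train_bal)              # K-SMOTE-balanced training set
    score = model.predict_proba(X_test)[:, 1]
    print(f"{name}: ACC={model.score(X_test, y_test):.4f}, "
          f"AUC={roc_auc_score(y_test, score):.4f}")
```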
