*3.3. Data Augmentation*

When a CNN is used for classification, a large amount of training data in every category is required to obtain accurate results. Therefore, multiple methods are used to increase the number of image samples in image classification problems [34,35]. In realistic datasets, since most users do not commit electricity theft, there are fewer electricity theft samples than normal samples. This imbalance can easily affect the classification result, leading to low accuracy or overfitting. Therefore, we propose a data augmentation method to address the imbalance problem.

The data augmentation method is illustrated in Figure 7. Assuming that electricity theft is detected on day *DT*, the window [*DT* − *T*, *DT*] is an electricity theft sample. Because theft behavior is continuous, electricity theft also occurs during [*DT* − *T* − 1, *DT* − 1], so this window is also an electricity theft sample. If the intercepted window slides *AG* times, the data of one electricity user can be transformed into *AG* + 1 samples. In this way, the number of electricity theft samples can be increased effectively. Note that the value of *AG* must be chosen appropriately: if *AG* is too large, the window may extend back to days before the theft began, so that normal data are labeled as theft samples and the classification result is degraded.

**Figure 7.** Diagram of the data augmentation method.
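
The sliding-window augmentation described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used in this work: the function name `augment_theft_windows`, the zero-based day indexing, and the plain-list representation of a user's daily consumption are all assumptions made for the example. The window bounds are treated as inclusive, so each sample covers *T* + 1 days.

```python
from typing import List, Sequence


def augment_theft_windows(
    series: Sequence[float], dt: int, t: int, ag: int
) -> List[List[float]]:
    """Slide the theft window back AG times (hypothetical helper).

    series : daily consumption of one user, indexed by day (assumed 0-based)
    dt     : index of the day on which theft was detected (DT)
    t      : window span, so each sample covers days [DT - T - k, DT - k]
    ag     : number of one-day backward shifts (AG)
    Returns AG + 1 samples, starting with the unshifted window.
    """
    if t < 0 or ag < 0:
        raise ValueError("t and ag must be non-negative")
    if dt >= len(series) or dt - t - ag < 0:
        raise ValueError("window slides outside the recorded series")
    # k = 0 is the original window; k = 1..AG are the shifted copies.
    return [list(series[dt - t - k : dt - k + 1]) for k in range(ag + 1)]


# Toy series whose values equal their day index, to make the windows visible.
samples = augment_theft_windows(list(range(10)), dt=9, t=3, ag=2)
for s in samples:
    print(s)
```

With *AG* = 2, the toy call yields three samples, each shifted one day earlier than the previous one. The sketch only checks that the windows stay inside the recorded series; it cannot tell whether a shifted window reaches back into normal consumption, which is why *AG* still has to be chosen with care.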
