**1. Introduction**

Non-technical losses (NTL) have long been among the most serious problems facing the power grid. Unlike technical losses, which are generally incurred during generation and distribution, NTL are anomalies that include installation errors, faulty meters, and electricity theft. According to World Bank reports, NTL account for a significant part of total power losses in both developing and developed nations [1]. A survey from the Northeast Group LLC shows that more than \$89.3 billion is lost worldwide every year due to NTL [2]. Besides financial losses, NTL also reduce the stability and reliability of the power grid.

Presently, over 80% of the global population has access to electricity [1]. Of total electricity consumption, industrial and large commercial customers account for approximately 55% in Spain [3]. Similarly, in China, the share attributable to industrial customers exceeds 65% [4]. Naturally, detecting NTL among industrial customers is of greater interest to electricity providers than detecting it among residential customers. Hence, this paper aims to detect NTL among industrial customers.

Conventional NTL detection methods rely on in-field inspection, whose cost and efficiency do not satisfy electricity providers. The emergence of the smart grid has brought a great deal of smart meter (SM) data and new opportunities to address NTL. Consequently, many data-oriented methods have been proposed recently, driven by advances in machine learning and its ease of implementation [5]. Researchers combine machine learning with methods from different fields, such as anomaly detection and cybersecurity. Generally, these approaches can be classified as supervised, unsupervised, and ensemble methods. By studying anomalous behavior in electricity consumption, some of them can indeed help to identify NTL [6]. However, they perform better on residential customers than on industrial customers. The primary reasons are as follows:


Therefore, detecting NTL among industrial customers from electricity consumption alone is more difficult. The key challenges are as follows:


Therefore, this paper proposes a deep learning-based Semi-Supervised AutoEncoder (SSAE) model that attempts to solve the above problems and achieve high NTL recognition accuracy. In this work, we focus on three-phase industrial customers with a contracted power higher than 80 kVA. We design a deep semi-supervised neural network to learn advanced features from massive SM data, including voltage, current, and active power. Through knowledge embedding, the extracted features cover both the principles of electricity measurement and consumption behavior. Our model has been trained, validated, and tested using real in-field inspection data. Overall, the main contributions of this paper can be summarized as follows:

1. Based on SM data, this paper designs a domain-knowledge-embedded data model to enhance the linear separability of normal and abnormal samples.


The remainder of the paper is organized as follows. Section 2 presents a brief overview of NTL detection. Section 3 presents the problem analysis and introduces the knowledge-embedded data model. Section 4 presents the deep semi-supervised model. Section 5 describes the experiments and reports the evaluation results. Finally, Section 6 provides conclusions and future work.
