**2. Implementation Principle**

#### *2.1. Principle and Implementation Structure of Non-Intrusive Load Monitoring*

Non-intrusive load monitoring collects a user's power consumption data at the residential power entrance, outside the user's premises. All of the user's power consumption information is concentrated in the resulting mixed signal, so load identification is essential for obtaining the detailed power consumption of each load. The signal of each switched load must first be separated from the collected mixed signal; load identification is then performed with the support of the information in the signature library.

The signature library is the premise of effective identification. In the actual monitoring process, it is difficult to obtain the independent data of a load directly, because the load cannot be made to run on its own without disturbing the user. Furthermore, loads come in various brands and operate in different circuit environments, which may cause a variety of changes in load waveforms, as shown in Figure 1. It is impossible to realize load identification with an a priori signature library that contains all of the varied waveforms of different users. Moreover, the classification of separated loads will seriously affect monitoring efficiency. Therefore, the construction method of the library should be universal for users with different loads. At the same time, complex pre-collection and intervention should be avoided to make the NILM process automatic and effective.

**Figure 1.** Current waveforms of different categories of electrical appliances with the same load.

The identification accuracy of unsupervised algorithms is unsatisfactory, while supervised algorithms depend on labeled training data. In existing NILM studies, prior knowledge of the load (which is used as the training data) is required in advance, so the methods are feasible for a specific user but lack universality when the load data change from user to user. Thus, a method that can adapt to these changes is required.

A waveform-based identification method with high-frequency sampling is recommended because of its high accuracy and processing efficiency. Complete waveform and signature information can be obtained in the high-frequency acquisition mode. However, in actual acquisition processes, the large amount of data is difficult to store, and the noise and harmonics on the grid side affect the load waveform and signature information, which degrades the accuracy of identification based on one-dimensional data. Considering the limitations of communication capability and economic cost, the identification process is suited to online execution and, thus, the identification accuracy depends on real-time processing.

In response to the abovementioned problems in NILM, this paper studies a method that includes two stages. The first stage analyzes the collected mixed signals, classifies and determines unknown loads without supervision, and adaptively builds a load signature library specific to each user during a short-term process. The second stage trains the convolutional neural network model on the data of the user's library constructed in the first stage, forming an identification model suited to each independent user, so as to realize load identification. The implementation structure of this paper is shown in Figure 2.

**Figure 2.** Non-intrusive load identification implementation structure.

#### *2.2. Principle of Electrical Signal Separation*

The collected current signal is the sum of the current signals on each load branch in operation. The collected current data are denoted by *I*(*t*). When *M* loads operate simultaneously, the mixed current in non-intrusive mode is given by Formula (1).

$$I(t) = \sum_{i=1}^{M} I_i(t) + n(t) \tag{1}$$

where *Ii*(*t*) is the current signal when the load *i* operates separately at the time *t*; *n*(*t*) is the noise in the circuit of the user.

Waveform separation is carried out on the electrical data collected at the power entrance of an independent user. The collected data contain the load signals of only that user. Two load switching actions cannot be completed simultaneously, as there is always a certain time difference between them; thus, the current separation model can be established from the current signals at two different moments (i.e., before and after the moment of load switching). At the moment before switching, the load current in the circuit is recorded as *I*(*t*). When load *k* is switched, the mixed current *Inew*(*t*) is the superposition of *I*(*t*) and the current *Ik*(*t*) that load *k* draws when it runs alone, as shown in Formula (2).

$$I_{new}(t) = I(t) + I_k(t) \tag{2}$$

where *Ik*(*t*) represents the current when the latest switched load *k* runs independently. The circuit can be treated as if it contains only two signals at this time: the circuit signal *I*(*t*) before the moment of load switching, and the circuit signal *Inew*(*t*) after the moment of load switching. Rearranging Formula (2), the current of the switched load is the difference of the two, *Ik*(*t*) = *Inew*(*t*) − *I*(*t*).
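
As a minimal sketch of Formula (2), the switched load can be recovered by subtracting one steady-state cycle recorded before the switching event from one recorded after it. The helper names, the 64-sample cycle, and the synthetic waveforms are illustrative assumptions, not part of the original method; the sketch also assumes the two cycles are already phase-aligned (for example, to a voltage zero crossing) and ignores the noise term *n*(*t*).

```python
import math


def sine_cycle(amplitude, harmonic=1, samples=64):
    """One mains cycle of a sinusoidal current component, sampled uniformly."""
    return [amplitude * math.sin(2 * math.pi * harmonic * n / samples)
            for n in range(samples)]


def separate_switched_load(i_before, i_after):
    """Recover I_k(t) = I_new(t) - I(t), i.e. Formula (2) rearranged.

    Assumes both cycles have the same length and are phase-aligned.
    """
    if len(i_before) != len(i_after):
        raise ValueError("cycles must have the same number of samples")
    return [after - before for before, after in zip(i_before, i_after)]


# Hypothetical steady-state cycle before switching, I(t)
i_before = sine_cycle(2.0)
# Hypothetical load k: a fundamental plus a third harmonic
i_k_true = [a + b for a, b in zip(sine_cycle(1.0), sine_cycle(0.3, harmonic=3))]
# Mixed current after switching, I_new(t) = I(t) + I_k(t)
i_after = [a + b for a, b in zip(i_before, i_k_true)]

i_k_est = separate_switched_load(i_before, i_after)
max_err = max(abs(e - t) for e, t in zip(i_k_est, i_k_true))
```

On real measurements the residual would also contain the difference of the noise terms, so averaging several cycles on each side of the event is a common precaution.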

#### *2.3. Construction Principle of Load Signature Library*

After load separation, the independent waveforms *Ik*(*t*) and *Uk*(*t*) can be obtained for load *k*, but the categories of the waveforms are unknown. The information in the library should include the waveform values and the corresponding category label. The category of load *k* must therefore be determined before a dynamic load signature library can be constructed. No prior data are available if the library is constructed automatically without user intervention. Moreover, for loads of the same type, changes in model, brand, and even operating environment cause variation in the load waveform. In theory, the load waveform can take an infinite variety of forms, so attaching a label to a load amounts to classification with infinitely many classes under unsupervised conditions, which is very difficult.

Although the waveforms of appliances are variable, the common appliance categories are enumerable. If loads belong to the same category, their varying waveforms share some common signatures. For an independent user, the physical model and waveform of a given load type are fixed, and the operating environment and usage habits are relatively stable, so the infinite-category classification problem is transformed into a limited-category problem. Constructing a signature library for each independent user is therefore appropriate, and it is the focus of this paper. To ensure that the library construction method is universal across users, the signatures extracted from the unknown load are used as the criteria for load category labeling, so as to realize the classification. The load labeling problem becomes a supervised classification.

The independent load waveforms and signatures of unknown category can be obtained for each user through the method proposed in Section 2.2. The load classifier determines the category of an unknown load using only the separated waveform and the extracted signatures. Thus, without prior knowledge, category classification becomes a posterior inference problem: the load category must be determined given that the load waveform and signatures are known. The Bayesian classification model is a suitable method here. Given known sample characteristics, it quantifies the probability that a sample belongs to each category and then selects the category with the largest posterior probability as the classification result. In addition, because the independent load waveforms and signatures extracted from different users vary widely, generalization is very important for the classification model. Since the load categories and quantities of a single user are limited, the data scale is also limited. The Bayesian model generalizes well and performs well on data of limited scale. Therefore, the Bayesian classification model is established for load classification in [20]. The signatures serve as the observed evidence that updates the prior probability of the load category to a posterior probability. It is assumed that *Fk* is the signature calculated from the signals *Uk* and *Ik*. The probability that load *k* belongs to the category ω*n* can be obtained by Formula (3).

$$P(\omega_n \mid F_k) = \frac{P(F_k \mid \omega_n)\,P(\omega_n)}{P(F_k)}, \quad n = 1, 2, \ldots, N \tag{3}$$

where *N* is the number of load categories of the user. Formula (3) shows that the prior probability *P*(ω*n*) is converted to the posterior probability *P*(ω*n* | *Fk*) by the obtained stable signature vector *Fk*; that is, *P*(ω*n* | *Fk*) is the probability that load *k* belongs to category ω*n* given the known *Fk*. The most probable category is taken as the label of load *k*, as shown in Formula (4).

$$L_k = \arg\max_{n} P(\omega_n \mid F_k) \tag{4}$$

where *Lk* represents the classification result, which is the category label of load *k*. In this way, unknown category loads separated in succession can be labeled. Then, the waveforms, signatures and categories can be recorded in the library to complete the adaptive library construction of an independent user.
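
The labeling rule of Formulas (3) and (4) can be sketched as follows. This section does not specify the form of the likelihood *P*(*Fk* | ω*n*), so the sketch assumes independent Gaussian features (a naive Bayes model); the class names, feature choices, and all numeric parameters are hypothetical and serve only to illustrate the computation.

```python
import math


def gaussian_log_likelihood(x, mean, std):
    """log N(x; mean, std^2) for one scalar feature."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)


def posterior(features, classes):
    """Return P(omega_n | F_k) for every class, following Formula (3).

    `classes` maps a label to (prior, means, stds). Features are assumed
    conditionally independent given the class (an assumption of this sketch).
    """
    log_joint = {}
    for label, (prior, means, stds) in classes.items():
        log_lik = sum(gaussian_log_likelihood(x, m, s)
                      for x, m, s in zip(features, means, stds))
        log_joint[label] = math.log(prior) + log_lik  # log P(F_k|w_n) + log P(w_n)
    # P(F_k) is the sum of the joint over all classes; subtract the peak
    # before exponentiating for numerical stability.
    peak = max(log_joint.values())
    joint = {label: math.exp(v - peak) for label, v in log_joint.items()}
    evidence = sum(joint.values())
    return {label: v / evidence for label, v in joint.items()}


def classify(features, classes):
    """Formula (4): pick the category with the largest posterior."""
    post = posterior(features, classes)
    return max(post, key=post.get), post


# Hypothetical categories; features = (active power in W, third-harmonic ratio)
classes = {
    "kettle": (0.5, (1800.0, 0.02), (100.0, 0.01)),
    "laptop": (0.5, (60.0, 0.60), (15.0, 0.10)),
}
label, post = classify((1750.0, 0.03), classes)
```

Working in log space avoids underflow when the signature vector has many components or when one category is a very poor match.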

#### *2.4. Convolutional Neural Network Identification Model*

After the user's library has been formed, the independent load waveforms of the user are continuously identified based on the data in the library, so as to determine the user's load operation status at the current moment. The *N* kinds of loads stored in the library and their corresponding records are expressed as follows.

$$\begin{cases} \{\hat{U}_1, \ \hat{I}_1, \ L_1\} \\ \{\hat{U}_2, \ \hat{I}_2, \ L_2\} \\ \qquad \vdots \\ \{\hat{U}_N, \ \hat{I}_N, \ L_N\} \end{cases} \tag{5}$$

where the information includes the separated voltage signal *Û*, the separated current signal *Î*, and the label *L*.

At this point the library has been fully constructed, and real-time identification becomes a supervised classification problem. The data in the library are specific to the individual user, and any load to be classified is one of the loads in the library. Thus, used as training data, the information in the library enables supervised training of the classification model, which can greatly reduce the invalid sample data for training, build a useful classification model specifically for the independent user, and cut down the impact of over-fitting on the identification results.

Because of noise and harmonics on the power grid side, the signature information extracted from the one-dimensional load waveform data fluctuates greatly, which affects the identification process. However, because the load circuit contains non-linear components such as diodes, thyristors, transistors and motors, the load itself also distorts the current waveform. The harmonics and distortion of the current caused by these non-linear components can also be regarded as typical signatures for load identification [6–9,20,21]. It is therefore difficult to determine accurately whether a distortion in the current is caused by noise or by harmonics, and direct filtering may destroy the original load signatures that are useful for identification. Considering that, under the high-frequency acquisition mode, the waveform of the same independent load is not only relatively stable in the steady state but also less disturbed by noise and grid-side harmonics, the one-dimensional waveform data of a load current stored in the dynamic library are transformed into two-dimensional image data. The image keeps the basic shape and outline of the original current waveform. The amplitude values of the current waveform are transformed into pixel values in the image. The waveform distortion caused by noise or harmonics alters the position of the pixel points in the original waveform only locally and slightly (i.e., it changes a few local pixel values of the image), rather than the shape and outline of the current waveform. In the image recognition process, the two-dimensional data are recognized mainly on the basis of image features, including contour, shape, contrast and the relative position of the marked features. Therefore, converting the dimension of the waveform for recognition reduces the influence of noise and harmonics on the recognition results.
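
One possible way to carry out such a conversion is to rasterise a single cycle onto a pixel grid, with time along the columns and amplitude along the rows. The section does not give the exact mapping, so the binary encoding, the 32 × 32 grid, and the function name below are assumptions made for illustration only.

```python
import math


def waveform_to_image(samples, height=32, width=32):
    """Rasterise a 1-D waveform into a height x width binary image.

    Column index encodes time, row index encodes amplitude
    (row 0 holds the largest amplitude). This mapping is an assumption
    of the sketch, not the encoding used in the original work.
    """
    lo, hi = min(samples), max(samples)
    span = (hi - lo) or 1.0  # avoid division by zero for a flat signal
    image = [[0] * width for _ in range(height)]
    for n, value in enumerate(samples):
        col = min(n * width // len(samples), width - 1)
        row = round((hi - value) / span * (height - 1))
        image[row][col] = 1
    return image


# Hypothetical input: one cycle of a sinusoidal current with 128 samples
cycle = [math.sin(2 * math.pi * n / 128) for n in range(128)]
image = waveform_to_image(cycle)
```

With this encoding, a small amplitude disturbance moves a pixel by at most a row or two within its column, while the overall outline of the waveform is preserved.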

Since the one-dimensional current data are transformed into two-dimensional image data, load identification becomes the problem of identifying two-dimensional image data. Convolutional neural networks perform outstandingly on two-dimensional image data. Images can be input into the network directly, which avoids the complexity of data reconstruction in the signature extraction and classification processes. Convolutional neural networks automatically extract multiple image signatures through multiple convolution kernels, approximate complex mapping functions through multi-layer non-linear transformations, and then classify current waveform images to realize real-time load identification. In addition, the distortion of the current waveform caused by noise or harmonics only alters the position of pixels in the original waveform, resulting in local translation, rotation and scaling of the waveform. These influences can be weakened by the translation invariance of the convolutional neural network. (Invariance means that when the input data are changed locally and slightly, most of the outputs after the pooling function do not change. This is extremely significant when the concern is whether a feature appears in the two-dimensional data rather than where it appears.) As an important layer of the convolutional neural network, the pooling layer replaces the output of the network at a given location in the image with a summary statistic of the values in the surrounding area of that location. Specifically, in the constructed convolutional neural network, the pooling layer extracts the average value of the surrounding pixels, which weakens the influence of the pixels affected by noise and harmonics in the image. Furthermore, multiple pooling layers in the convolutional neural network gradually reduce the influence caused by noise and harmonics. Moreover, in the convolution operation, parameter sharing means that it is unnecessary to learn a separate set of parameters for each position in the two-dimensional data, which reduces the computational complexity, training time and storage space of the parameters.
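
The smoothing effect of average pooling can be illustrated on a toy image. The 8 × 8 image, the 2 × 2 non-overlapping window, and the single displaced pixel are hypothetical choices for this sketch; the window size used in the original network is not stated in this section.

```python
def average_pool(image, size=2):
    """Non-overlapping size x size average pooling over a 2-D list."""
    rows, cols = len(image) // size, len(image[0]) // size
    return [[sum(image[r * size + i][c * size + j]
                 for i in range(size) for j in range(size)) / size ** 2
             for c in range(cols)]
            for r in range(rows)]


# A horizontal stroke standing in for part of a waveform image
clean = [[0] * 8 for _ in range(8)]
for c in range(8):
    clean[3][c] = 1

# Hypothetical noise effect: one pixel displaced upward by one row
noisy = [row[:] for row in clean]
noisy[3][5] = 0
noisy[2][5] = 1

pooled_clean = average_pool(clean)
pooled_noisy = average_pool(noisy)
```

Here the displaced pixel stays inside the same pooling window, so the pooled outputs are identical. A displacement that crosses a window boundary would change two pooled values, so the invariance is approximate rather than exact.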

In this paper, the one-dimensional data in the labeled library are converted into two-dimensional waveform images to form the training set, and the convolutional neural network model is trained by supervised learning. The test data consist of the two-dimensional waveform images converted from the one-dimensional data of the separated current signal, which is obtained from the mixed signal in real time. Finally, the load can be identified online by the convolutional neural network, whose structure includes convolutional layers, pooling layers, a fully connected layer and a non-linear function, as shown in Figure 3.
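
To make the layer sequence concrete, the following sketch runs a single forward pass through one convolutional layer, a ReLU non-linearity, one average pooling layer, and a fully connected layer with softmax. It is not the network of Figure 3: the layer counts, kernel sizes, the choice of ReLU and softmax, and all weights are untrained, hand-picked assumptions, and training is omitted.

```python
import math


def conv2d(image, kernel):
    """Valid 2-D cross-correlation; the same kernel is reused at every position."""
    kh, kw = len(kernel), len(kernel[0])
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(len(image[0]) - kw + 1)]
            for r in range(len(image) - kh + 1)]


def relu(fmap):
    return [[max(0.0, v) for v in row] for row in fmap]


def average_pool(fmap, size=2):
    rows, cols = len(fmap) // size, len(fmap[0]) // size
    return [[sum(fmap[r * size + i][c * size + j]
                 for i in range(size) for j in range(size)) / size ** 2
             for c in range(cols)]
            for r in range(rows)]


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def forward(image, kernels, fc_weights, fc_bias):
    """conv -> ReLU -> average pool -> fully connected -> softmax."""
    features = []
    for kernel in kernels:
        pooled = average_pool(relu(conv2d(image, kernel)))
        features.extend(v for row in pooled for v in row)
    scores = [sum(w * f for w, f in zip(weights, features)) + b
              for weights, b in zip(fc_weights, fc_bias)]
    return softmax(scores)


# Hypothetical 6x6 waveform image and hand-picked parameters
image = [[1 if r == c else 0 for c in range(6)] for r in range(6)]
kernels = [[[1, 0], [0, 1]], [[0, 1], [1, 0]]]  # two 2x2 kernels
n_features = 2 * 2 * 2                          # 2 kernels x (2x2 pooled map)
fc_weights = [[0.1] * n_features, [-0.1] * n_features, [0.0] * n_features]
fc_bias = [0.0, 0.0, 0.0]
probs = forward(image, kernels, fc_weights, fc_bias)  # one probability per class
```

Each kernel holds only four weights regardless of the image size, which is the parameter sharing discussed above. In practice the weights would be learned from the user's library with a deep learning framework rather than set by hand.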

**Figure 3.** Convolutional neural network model structure.
