#### **3. Methodology**

To realize the above idea, this paper introduces the following two stages. In the first stage, switching events are detected and load signals are separated from the collected aggregate data. The categories of the separated loads are then determined to form a load signature library. In the second stage, a convolutional neural network is trained with the data in the library. After a short period of model training, a classification model suited to the load signature library is formed automatically, so that the separated load signals can be identified in real time.

#### *3.1. Adaptive Construction of a Load Signature Library*

#### 3.1.1. Event Detection and Load Separation

The constructed signature library needs to contain the waveform information of each load. However, in the non-intrusive mode, the collected electrical signal of a user is the sum of the signals of several appliances operating simultaneously, so the mixed signal must be separated. Load switching events can be detected from the current intensity: following [20], if the current intensity of one period differs markedly from that of the previous period, a load switching event is considered to have occurred.
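As a rough illustration of this period-to-period comparison, the Python sketch below flags an event when the RMS current intensity of one mains period deviates from that of the previous one by more than a threshold; the function name and the RMS-based intensity measure are illustrative assumptions, not the exact criterion of [20].

```python
import numpy as np

def detect_switching_events(current, samples_per_period, threshold):
    """Flag a switching event when the RMS current of one mains period
    differs noticeably from that of the previous period.

    `current` is the sampled aggregate current; `threshold` plays the role
    of the event-detection threshold (illustrative, not the paper's symbol).
    """
    n_periods = len(current) // samples_per_period
    periods = current[:n_periods * samples_per_period].reshape(n_periods, samples_per_period)
    rms = np.sqrt(np.mean(periods ** 2, axis=1))           # per-period current intensity
    events = np.flatnonzero(np.abs(np.diff(rms)) > threshold) + 1
    return events                                           # period indices where a switch occurred
```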

Because the phase at which a load is switched into the user circuit varies, directly extracting the signal of a switching load may distort the information of the independent load. The method in [20] therefore extracts the mixed signal at sample points with the same voltage value in the electrical periods before and after the switching event. Using this one-dimensional data processing, signal separation can be realized. The method is simple and effective, so this paper adopts it to separate the load electrical signals quickly.
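A minimal sketch of this idea, assuming the sampled periods are already aligned on the same voltage phase (e.g., by zero-crossing synchronization), is given below; the helper names and the plain period subtraction are illustrative simplifications, not the full procedure of [20].

```python
import numpy as np

def separate_load_waveform(current, samples_per_period, event_period):
    """Subtract a steady-state period recorded before the switching event
    from one recorded after it, with both periods assumed to be aligned on
    the same voltage phase. The result approximates the switched load's
    current waveform over one period.
    """
    n = samples_per_period
    before = current[(event_period - 1) * n : event_period * n]        # period before the event
    after = current[(event_period + 1) * n : (event_period + 2) * n]   # steady period after the event
    return after - before                                              # waveform of the switched load
```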

#### 3.1.2. Category Determination of Unknown Load Waveform

After the separated waveform of load *k* is obtained, the load waveforms require pre-classification. All unknown loads separated from the mixed signal of an individual user are clustered rapidly by their load signatures. This not only avoids recording the same load waveform repeatedly when a user switches an appliance multiple times, but also narrows the range of candidate load categories, reducing the computational burden of the subsequent category determination.

In the initial stage of library construction, the number of load categories and operation modes of each user is unknown, and so is the number of distinct waveform categories. However, for the same user, the waveform of a given load is relatively fixed: the signatures extracted from its waveform differ little between switching events, while the waveforms of different loads differ considerably. Therefore, the unknown loads are clustered quickly by the inherent signatures extracted from the separated waveforms, which greatly reduces duplication of the extracted load waveforms.

After load signature normalization, clustering is achieved with the discriminant function shown in Formula (6).

$$D\_{k,\omega} = \|F\_k^\* - \delta\_{\omega}^\*\| \tag{6}$$

where *F<sub>k</sub>*\* is the normalized signature and δ<sub>ω</sub>\* is the signature of the ω-th load stored in the current signature library. *D<sub>k,ω</sub>* is the discriminant distance between the signature of load *k* and that of category ω. If the minimum value of *D<sub>k,ω</sub>* is less than the threshold δ, the type of load *k* is already stored in the library; if it is greater than δ, a new load is found and its waveform and signatures are recorded in the library.
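A compact sketch of this threshold-based matching, using plain Euclidean distance for Formula (6) and illustrative names, might look as follows.

```python
import numpy as np

def match_or_add(f_k, library, delta):
    """Cluster a new normalized signature F_k* against the library using
    Formula (6). `library` is a list of stored signatures delta_omega*,
    and `delta` is the matching threshold. Returns the matched category
    index, or appends the signature as a new category and returns its index.
    """
    if library:
        distances = [np.linalg.norm(f_k - sig) for sig in library]   # D_{k,omega}
        best = int(np.argmin(distances))
        if distances[best] < delta:
            return best                      # load type already in the library
    library.append(f_k)                      # new load: record its signature
    return len(library) - 1
```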

This procedure ensures that the load information in the library is not duplicated. At this point, the category of each unknown load waveform needs to be determined in order to complete the load information in the signature library. In practice, users own loads of many brands and models, and the various waveforms cause signature values to fluctuate to different degrees. The method in [20] accounts for these fluctuations and calculates the probability of the different load signatures. Using a Bayesian model with multiple signatures, the probability of a load belonging to each load category is obtained. Finally, by Formula (4), the most probable load category is selected, solving the category determination problem for unknown load waveforms.
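The sketch below only illustrates the general Bayesian scoring idea, where the posterior probability of each candidate category is proportional to its prior times the product of per-signature likelihoods; the concrete likelihood models and Formula (4) of [20] are not reproduced here, and all names are hypothetical.

```python
import numpy as np

def category_probabilities(signatures, likelihoods, priors):
    """Illustrative Bayesian scoring over multiple signatures:
    p(category | signatures) is proportional to
    p(category) * prod_i p(signature_i | category).

    `likelihoods[c][i]` evaluates p(signature_i | category c);
    `priors[c]` is the prior probability of category c.
    """
    scores = np.array([
        priors[c] * np.prod([likelihoods[c][i](s) for i, s in enumerate(signatures)])
        for c in range(len(priors))
    ])
    return scores / scores.sum()             # normalized category probabilities
```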

#### *3.2. Load Identification of the Convolutional Neural Network Based on the Signature Library*

After the first stage of short-term adaptive library building, the second stage of sustainable load monitoring begins. Based on the data in the library, this paper transforms the one-dimensional load current signal into a two-dimensional image of the periodic current. The detailed process is divided into the following four steps, where the load current vector *Î<sub>k</sub>* over one steady-state period is taken as the basic data:


Then, the input sample of the convolutional neural network can be obtained. In this way, one-dimensional current waveforms are transformed into two-dimensional image data while the contour and shape of the waveform are preserved completely, so they can be input directly into the constructed convolutional neural network for identification.
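The sketch below illustrates one plausible form of such a conversion: the period current is resampled onto a fixed number of columns and its amplitude is quantized into rows, producing a binary waveform image. The image size and the ±11 A clipping range (taken from Section 4.2) are assumptions; the paper's exact four steps are not reproduced here.

```python
import numpy as np

def current_to_image(i_k, height=48, width=48, i_max=11.0):
    """Convert one steady-state current period (NumPy array `i_k`) into a
    binary 2D waveform image: columns index sampling points, rows index
    quantized amplitude clipped to +/- i_max.
    """
    cols = np.linspace(0, len(i_k) - 1, width).astype(int)      # pick `width` sample points
    amps = np.clip(i_k[cols], -i_max, i_max)
    rows = np.round((i_max - amps) / (2 * i_max) * (height - 1)).astype(int)
    image = np.zeros((height, width), dtype=np.float32)
    image[rows, np.arange(width)] = 1.0                          # mark the waveform contour
    return image
```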

The training process of the convolutional neural network includes forward and backward propagation: forward propagation performs signature extraction and sample classification, and backward propagation performs classification error calculation and weight updating.

In forward propagation, signature extraction is realized by convolution and pooling. The output of the previous layer is taken as the input of the current layer; after convolution with the kernels, the layer output is obtained by applying the non-linear activation function, as shown in Formula (7).

$$X\_j^l = f\Big(\sum\_{i \in \mathcal{M}\_j} X\_i^{l-1} \* \kappa\_{ij}^l + b\_j^l\Big) \tag{7}$$

where *X<sub>i</sub><sup>l−1</sup>* represents the *i*-th signature map of layer *l* − 1, 𝓜<sub>j</sub> is the set of input maps connected to output map *j*, κ<sub>ij</sub><sup>l</sup> represents the convolutional kernel mapping the *i*-th signature map of layer *l* − 1 to the *j*-th map of layer *l*, *f*(·) is the activation function, *b<sub>j</sub><sup>l</sup>* is the bias parameter, and \* denotes convolution. The pooling layer calculation is shown in Formula (8).

$$X\_j^l = f\big(w\_j^l \, \text{sample}(X\_j^{l-1}) + b\_j^l\big) \tag{8}$$

where *X<sub>j</sub><sup>l</sup>* represents the *j*-th signature map of layer *l*, *w<sub>j</sub><sup>l</sup>* and *b<sub>j</sub><sup>l</sup>* are the weight and bias parameters, sample(·) is the pooling function, and *f* is the activation function. The pooling layer maps the signatures to a smaller range and reduces the dimension of the convolutional signature map. The signature information of low-power load signals is susceptible to noise and harmonic interference from the power grid; in this case, maximum pooling would extract only the disturbed signatures while ignoring the actual signature information of the signal, which would degrade identification. Therefore, this effect is weakened by replacing the maximum pooling layer with an average pooling layer.
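The following NumPy sketch mirrors Formulas (7) and (8) with average pooling; the valid-mode convolution, tanh activation, and 2 × 2 pooling window are assumptions made for illustration.

```python
import numpy as np
from scipy.signal import convolve2d

def conv_layer(x_prev, kernels, biases, f=np.tanh):
    """Forward convolution of Formula (7): each output map j sums the
    convolutions of the input maps with kernel k_ij, adds bias b_j and
    applies the activation f. `x_prev` has shape (n_in, H, W) and
    `kernels` has shape (n_in, n_out, kh, kw).
    """
    n_in, n_out = kernels.shape[0], kernels.shape[1]
    outputs = []
    for j in range(n_out):
        s = sum(convolve2d(x_prev[i], kernels[i, j], mode="valid") for i in range(n_in))
        outputs.append(f(s + biases[j]))
    return np.stack(outputs)

def avg_pool_layer(x_prev, w, b, size=2, f=np.tanh):
    """Pooling of Formula (8) with average pooling (the paper's choice over
    max pooling): 'sample' averages each size x size block, the result is
    scaled by w_j, shifted by b_j, and passed through the activation.
    """
    n, h, wd = x_prev.shape
    cropped = x_prev[:, :h - h % size, :wd - wd % size]
    pooled = cropped.reshape(n, cropped.shape[1] // size, size,
                             cropped.shape[2] // size, size).mean(axis=(2, 4))
    return f(w[:, None, None] * pooled + b[:, None, None])
```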

A softmax layer is usually used as the output layer in multi-classification problems, since it outputs the classification result directly as a probability vector. The calculation is shown in Formula (9).

$$S\_j = e^{a\_j} \Big/ \sum\_{k=1}^{N} e^{a\_k} \tag{9}$$

where *S<sub>j</sub>* represents the probability of the sample belonging to class *j*, *a<sub>j</sub>* is the corresponding output activation, and *N* represents the number of load categories.
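A direct implementation of Formula (9) is shown below; the max-subtraction is added only for numerical stability and is not part of the formula.

```python
import numpy as np

def softmax(a):
    """Formula (9): convert the output activations a_j into class probabilities."""
    e = np.exp(a - np.max(a))    # subtracting the max avoids overflow, result is unchanged
    return e / e.sum()
```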

Back propagation depends on the error between the classification result of forward propagation and the given sample label. According to the chain rule, the errors are propagated backwards and the weights in each layer are updated, as shown in Formulas (10) and (11).

$$\partial E\_{\text{total}} / \partial w = (\partial E\_{\text{total}} / \partial \text{out})(\partial \text{out} / \partial \text{net})(\partial \text{net} / \partial w) \tag{10}$$

$$w^{\rm new} = w - \xi(\partial E\_{\rm total} / \partial w) \tag{11}$$

where ∂*E<sub>total</sub>*/∂*w* is the partial derivative of the loss function *E<sub>total</sub>* with respect to the parameter *w*, which is updated in each iteration, and ξ is the learning rate of the convolutional neural network, which determines the magnitude of each adjustment.
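As an illustration of Formulas (10) and (11), the sketch below computes the chain-rule gradient for a single output weight, assuming a sigmoid output unit and a squared-error loss (these specific choices are not stated in the paper), and then applies the gradient-descent update.

```python
def output_weight_gradient(out, target, w_in):
    """One chain-rule term of Formula (10), dE/dw = (dE/dout)(dout/dnet)(dnet/dw),
    assuming out = sigmoid(net) and E = 0.5 * (target - out)**2."""
    dE_dout = out - target            # derivative of the squared-error loss
    dout_dnet = out * (1.0 - out)     # sigmoid derivative
    dnet_dw = w_in                    # input value feeding this weight
    return dE_dout * dout_dnet * dnet_dw

def update_weight(w, grad, lr=0.05):
    """Formula (11): gradient-descent step with learning rate xi (0.05 is the value
    selected later in Section 4.2)."""
    return w - lr * grad
```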

The model is trained iteratively with the labeled data in the library, and the connection weights and parameter matrices of each layer are fully adjusted until the data in the library are exhausted. Once the model is fully trained, load identification can be performed in real time. The separated waveforms of individual loads are acquired successively in real time by the load separation method in Section 3.1.1, converted into two-dimensional images, and identified online by the convolutional neural network as test data.

The implementation process of the steps in this section is shown in Figure 4.

**Figure 4.** Overall algorithm flow chart.

#### **4. Experiment and Analysis**

Actual user data are collected and used for validity verification in this paper. Figure 5 shows the schematic diagram of the experimental system, which is designed for non-intrusive data acquisition from an actual user. The collected data are processed by the method proposed in Section 3 to realize effective load monitoring.

**Figure 5.** Schematic diagram of the experimental system.

The specific experimental parameters are as follows: the access voltage of the acquisition device is 220 V, and the sampling frequency is 10 kHz. The identification objects include a rice cooker (EC); an electric kettle (EK); a water heater (WH); a water dispenser (WD); a laptop computer (LA); a television (TV); air-conditioning systems A (AC-A), B (AC-B) and C (AC-C); a vacuum cleaner (VC); a refrigerator (RE); and a microwave oven (MO). Table 1 lists the detailed values of the thresholds involved in the experiment.

**Table 1.** Threshold parameters.


#### *4.1. Effectiveness Verification of Library Construction*

Figure 6 shows the separated current signals of the loads in the experimental environment, together with the template currents for comparison. Blue lines denote the separated currents, i.e., the periodic current waveforms of loads operating in a stable state, separated by the method in Section 3.1.1. Red lines denote the standard currents, obtained by switching on only a single electrical appliance in the experiment. The high coincidence between the separated signals and the standard currents indicates that the load separation is highly accurate. Since the categories of the separated load waveforms are unknown before load labeling, the letters a–l are used to represent the load waveforms in this paper.

**Figure 6.** (**a**–**l**) Comparison of the separated load currents with the template currents.

After effective load separation and clustering of the independent load waveforms, the load categories are judged and the library of the individual user is constructed. Table 2 shows the categories, numbers and actual pre-classification of the electrical appliances involved in this paper. The load number is used as the category label of the load.


**Table 2.** Electrical appliances classification label and pre-classification.

The probability distribution of several unknown loads obtained from the load separation step is shown in Figure 7. It gives the label probability of each load belonging to every category of electrical appliance in the experiment, quantifying the possibility of the load category by the label probability. The label corresponding to the maximum probability is chosen as the category label of the load. The threshold ν of the unknown class is denoted by the red straight line; if the maximum label probability is lower than this threshold, the load is placed directly in the "unknown load" category.

**Figure 7.** Labeling probability of load.

Figure 7 shows the probability of the possible load labels within each load's pre-classification; the probability outside the pre-classification is 0. In this paper, the label with the highest probability within the pre-classification of an unknown load is taken as its category. The waveforms and label probability results are shown in Figure 8, which compares the assigned labels and their probabilities for the unknown loads with their real labels. It can be seen that the waveforms are labeled correctly.

#### *4.2. Effectiveness Verification of Load Identification Based on the Convolutional Neural Network*

Before real-time load identification, the convolutional neural network model is trained with the category-labeled data from the established library. The one-dimensional current data (as shown in Section 4.1) are first transformed into two-dimensional image data through the dimension conversion method described in Section 3.2. Because the maximum current of most common household sockets is limited to 10 A, the maximum operating current of most electrical appliances is usually less than or close to 10 A. Therefore, the vertical axis range selected in this paper is from −11 A to +11 A, and the horizontal axis covers the sampling points of one current period when the load is in steady-state operation. The same operation is applied to the few electrical appliances whose maximum current exceeds 10 A, and the identification result is not affected.

**Figure 8.** Labeling results of individual appliances.

The labeled appliances in the library are re-numbered to represent the load categories in the convolutional neural network identification process. The re-numbering of the library established in Section 4.1 is shown in Table 3.


**Table 3.** Label number of load image identification.

The numbers of convolutional and pooling layers are extremely important for the classification accuracy of the model. Figure 9 shows the classification accuracy under different numbers of convolutional and pooling layers. When there are too few convolutional and pooling layers, the parameters are insufficient for accurate classification of the samples. As the number of layers increases, the classification effectiveness of the model clearly improves. However, when the layers continue to increase, the additional training parameters raise the difficulty and duration of model training and, limited by current training methods, the deeper model is more likely to fall into a local optimum, leading to over-fitting and other problems. As shown in the figure, when the numbers of convolutional and pooling layers are both set to 3, the model is most effective and the classification accuracy on the test samples reaches 96.73%. Therefore, considering both the accuracy and the training time of the model, the number of convolutional and of pooling layers in the convolutional neural network is set to 3. In addition, under this optimal layer structure, the kernel sizes and numbers of convolutional layers 1, 2 and 3 are set as shown in Table 4.

**Figure 9.** Classification accuracy of the model under different numbers of convolutional and pooling layers.

**Table 4.** Kernel sizes and numbers in this paper.


In general, 3 × 3 is a popular choice of kernel size in convolutional neural networks, determined empirically. Specifically, the kernel size is set larger than 1 × 1 to enlarge the receptive field, and a kernel with an even size cannot keep the feature map sizes of input and output consistent. For the same receptive field, the required parameters and computation grow as the convolutional kernel is enlarged. Thus, a kernel size of 3 × 3 is used for the first convolutional layer. In order to further extract the features of the output image of the previous convolutional layer, the kernel size increases gradually, so the second convolutional layer uses 5 × 5 kernels. The third layer is the last convolutional layer and is followed by the fully connected layer, whose input must be one-dimensional data with the same dimension as the output of the third convolutional layer. The output dimension of a convolutional layer depends on the input data dimension and the kernel size; since the input data dimension of the third layer is 12 × 12, the kernel size of the third convolutional layer is set to 12 × 12, which also reduces the number of parameters of the fully connected layer significantly. As for the kernel numbers, if there are too few kernels in the convolutional layers, the extracted image features are not sufficient for identification and the model struggles to achieve the desired performance. On the contrary, if the kernel number is set too large, the model parameters and training time increase significantly and serious over-fitting occurs. Thus, the kernel numbers are empirical values obtained by repeated experiments.
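To make this structure concrete, the following PyTorch sketch assembles three convolutional layers with the 3 × 3, 5 × 5 and 12 × 12 kernels discussed above, average pooling, and a softmax output. The input resolution (48 × 48), the channel counts and the ReLU activations are assumptions for illustration (Table 4 gives the actual kernel numbers), and the third pooling stage is omitted because the 12 × 12 kernel already reduces each map to 1 × 1.

```python
import torch
from torch import nn

class LoadCNN(nn.Module):
    """Sketch of the 3-convolution / average-pooling structure described in
    the text; sizes and channel counts are illustrative assumptions."""

    def __init__(self, n_classes: int = 12):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1),    # 3x3 kernel, 'same' padding (assumed)
            nn.ReLU(),
            nn.AvgPool2d(2),                              # average pooling, per the paper's choice
            nn.Conv2d(8, 16, kernel_size=5, padding=2),   # 5x5 kernel
            nn.ReLU(),
            nn.AvgPool2d(2),                              # feature maps are now 12x12
            nn.Conv2d(16, 32, kernel_size=12),            # 12x12 kernel -> 1x1 output per map
            nn.ReLU(),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32, n_classes),                     # fully connected layer
            nn.Softmax(dim=1),                            # class probabilities, Formula (9)
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```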

After the model structure is determined, the model parameters become significant factors in the training process. The learning rate and the number of epochs are related to the convergence and training speed of the model. The learning rate ξ represents the magnitude of each weight update. If the learning rate is set too high, the loss function and the model struggle to converge; on the contrary, if it is too small, the weight updates and the change of the model cost in each step are very small, requiring many more epochs. The number of epochs is the number of times all sample data are used for training. Figure 10 shows the cost value during training of the convolutional neural network model under different learning rates. It can be seen that the identification model converges, and converges faster, at a learning rate of 0.05. When the number of epochs is 500 (i.e., epoch = 500), the loss values of the model are all below 0.003.
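A training-loop sketch using the settings selected here (learning rate 0.05, 500 epochs) is shown below; plain SGD and the cross-entropy cost are assumptions, and the data loader yielding (image, label) batches from the library is hypothetical.

```python
import torch
from torch import nn, optim

def train(model, loader, epochs=500, lr=0.05):
    """Iterative training with the labeled library data; the model is assumed
    to output softmax probabilities (Formula (9)), so NLLLoss on their log
    gives the cross-entropy cost."""
    criterion = nn.NLLLoss()
    optimizer = optim.SGD(model.parameters(), lr=lr)
    for epoch in range(epochs):
        for images, labels in loader:
            optimizer.zero_grad()
            probs = model(images)                              # forward propagation
            loss = criterion(torch.log(probs + 1e-9), labels)  # classification cost
            loss.backward()                                    # back propagation, Formula (10)
            optimizer.step()                                   # weight update, Formula (11)
    return model
```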

**Figure 10.** The convergence process of cost under different learning rates.

Figure 11 illustrates the convergence process of the model parameters. Six parameters are selected for display. The parameters *kernel\_c1* and *bias\_c1* are one of the weights and one of the biases in the second convolutional layer, respectively, while *kernel\_f1* and *bias\_f1* are one of the weights and one of the biases in the third convolutional layer, respectively. In addition, *weight\_f1* is one of the weight parameters between the third convolutional layer and the fully connected layer, and *weight\_output* is one of the weights between the fully connected layer and the softmax layer. The detailed values of these six parameters at different epoch times are shown in Table 5.

**Figure 11.** Convergence process of the convolutional neural network model parameter.


**Table 5.** Parameter values under different epoch times.

It can be seen that the parameters show a gradually increasing trend as the number of epochs increases, and there is no significant change in these parameters once the epoch value exceeds 500. It can therefore be considered that the model has converged when the epoch reaches 500. Thus, this paper chooses 500 epochs to ensure the training efficiency of the model.

In order to display the model identification results, 12 separated current waveforms in Figure 6 are identified by the convolutional neural network model. Figure 12 shows the model input data after dimension conversion by the method proposed in Section 3.2.

**Figure 12.** Input image of 12 separated current waveforms.

In the identification process using the convolutional neural network model, the signature maps extracted from the input data after the first convolution operation are shown in Figure 13. It can be seen from the figure that the contour edges and other features of each load current waveform in Figure 12 are strengthened and extracted by the kernels of the convolutional layer.

**Figure 13.** Feature image after the first convolution operation.

After further pooling and activation, the signature maps are as shown in Figure 14. This step reduces the feature dimensions of the images in Figure 13 and applies a non-linear mapping to the feature images, so as to extract higher-level features for identification.

Table 6 gives the classification confusion matrix of the algorithm. The columns of the confusion matrix represent the identified label of each load category, the rows represent the real labels, and the diagonal values represent the accuracy of correct classification for each load. The identification accuracy increases as the background color deepens.

**Figure 14.** Feature image after pooling and non-linear operation.


**Table 6.** Load image classification confusion matrix.

In order to assess the accuracy of the proposed method, the data collected over day 1–day 3 are identified by the proposed method, and the power consumption ratio of the different loads is presented in Figure 15. As a comparison, smart sockets are installed on the monitored appliances to obtain the true power consumption, which is shown in the right part of the figure. It can be seen that the difference between the calculated total consumption and the true one is less than 0.3 kW. The consumption ratio of each load is nearly the same as the true value given by the sockets, and every load carries the correct label.

**Figure 15.** Comparison between the identified and actual power consumption results for day 1–day 3.

In addition, the algorithms proposed by Chao et al. [19], Srinivasan et al. [21] and Ahmadi et al. [22], together with a genetic algorithm, are selected for comparison with the proposed method. These typical algorithms show good performance on NILM, and the feasibility of the proposed method is demonstrated by comparison with them. In [21], Srinivasan et al. also apply neural networks to NILM, using a typical neural network to verify the effectiveness of load identification through model training. Different from our work, their networks are trained to extract harmonic signatures from the current for load identification. Ahmadi et al. [22] propose a graph signal processing (GSP) approach for NILM, in which the graph is also formed from the steady-state signatures of loads. It poses the load disaggregation problem as a single-channel blind source separation problem to perform low-complexity classification for load identification. This shows that loads can be identified by processing the signal of a load graph, although the graph signal transformation differs from that of the proposed method. Similarly to our method, a convolutional neural network is also applied to NILM in Chao's work [19], forming a three-step non-intrusive load monitoring system (TNILM). Owing to differences in the purpose, dimension, structure, and input and output data of the convolutional neural network, the proposed algorithm outperforms that of [19]. In addition, traditional intelligent algorithms are widely used in non-intrusive load identification, and genetic algorithm optimization is a conventional intelligent algorithm. Thus, as a supplement, genetic algorithm optimization is used in place of the convolutional neural network as another load identification method after library construction, for comparison.

Figure 16 shows the performance comparison curves for the above-mentioned methods. The comparison of load identification accuracy is shown in Figure 16a; increasing the number of load categories has less influence on the algorithm proposed in this paper. The operational efficiency curves are shown in Figure 16b. In the actual load identification stage, the proposed method has higher operational efficiency and a stable identification time. Represented by the violet line, the TNILM of Chao's work [19] includes a convolutional neural network and a multi-label classifier, so it has a relatively long operation time. Denoted by the blue line, the running time of the algorithm in Srinivasan's work [21] increases rapidly with the number of loads. Represented by the green and orange lines, respectively, the GSP algorithm and genetic algorithm optimization have more stable operational efficiency, but are overall slower than the algorithm in this paper.

**Figure 16.** Performance comparison of algorithms. (**a**) Load identification accuracy comparison of different algorithms under different category numbers; (**b**) Operational efficiency comparison of different algorithms under different load numbers.
