1. Introduction
Load disaggregation is a key technology in smart grids [1]. Traditional load monitoring uses intrusive methods, which obtain accurate and reliable data with low noise [2] but are difficult for users to accept because of their high implementation costs. Non-intrusive methods can provide residents with detailed information in time and have the advantages of low cost and easy implementation. With this technology, users' power consumption behavior can be analyzed, and users can be guided toward more reasonable electricity use, reducing their power costs. With the continuous development of power demand-side management [3], big data analysis, and related technologies, non-intrusive load disaggregation is attracting increasing attention.
The microgrid is an important manifestation of the smart grid. It has emerged with the development of clean energy sources, such as solar and wind energy, and of energy internet technology. A microgrid is a small power system with distributed power sources that can supply multiple energy sources with high reliability and improve the quality of the power supply [4]. As non-intrusive load monitoring (NILM) technology matures, intelligent dispatching of the microgrid may be realized through automation in the future, improving the effective utilization of power resources, ensuring the stable and economical operation of the power system, and avoiding unnecessary waste of power resources. NILM technology is therefore important.
The concept of non-intrusive load monitoring was first proposed by Hart [5]. It relies mainly on non-intrusive load disaggregation (NILD), in which the total power consumption is disaggregated into the consumption of each individual electrical appliance. Hart proposed the concept of “load characteristics”, which he defined as the change in the electrical power information of an appliance during operation, and used steady-state load characteristics [6] to design a simple NILD system to decompose power. However, the effective features that this algorithm could extract were limited, and it was prone to large disaggregation errors.
At present, combinatorial optimization (CO) methods and pattern recognition algorithms are the main approaches to non-intrusive load disaggregation. NILD based on combinatorial optimization [7] determines the power consumption of each appliance from its load characteristics by comparing the error between the total power and the power of combinations of appliance states. Chang [8] and Lin [9] used the Particle Swarm Optimization (PSO) algorithm to solve the disaggregation problem based on the steady-state current of a few electrical appliances, but the disaggregation error was large. Piga [10] proposed a sparse optimization method to improve disaggregation accuracy. Because the underlying combinatorial optimization problem is NP-hard, efficiency is a challenge. In addition, these optimization approaches can only handle discrete appliance states, so it is difficult for them to model loads with large power fluctuations.
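The exponential cost of CO can be seen in a minimal brute-force sketch for a single time step. It assumes each appliance is modeled by a short list of hypothetical discrete power levels (0 W meaning “off”) and selects the combination whose summed power is closest to the measured total. It illustrates the principle only; it is not the formulation of [7,8,9,10], which use PSO or sparse optimization rather than exhaustive search, and the function name `co_disaggregate` is hypothetical.

```python
import itertools

def co_disaggregate(total_power, appliance_states):
    """Brute-force CO for one time step.

    appliance_states maps an appliance name to its discrete steady-state
    power levels in W (0 means "off"). Returns the state assignment whose
    summed power is closest to total_power, and the residual error.
    """
    names = list(appliance_states)
    best_combo, best_error = None, float("inf")
    # Enumerate every combination of states: the number of candidates is the
    # product of the per-appliance state counts, i.e. exponential in the
    # number of appliances.
    for combo in itertools.product(*(appliance_states[n] for n in names)):
        error = abs(total_power - sum(combo))
        if error < best_error:
            best_combo, best_error = combo, error
    return dict(zip(names, best_combo)), best_error

# Hypothetical appliance models (illustrative power levels, not measured data).
states = {
    "fridge": [0, 120],
    "kettle": [0, 2000],
    "washing_machine": [0, 500, 2100],
}
assignment, err = co_disaggregate(2130, states)
print(assignment, err)  # {'fridge': 120, 'kettle': 2000, 'washing_machine': 0} 10
```

Here the search covers 2 × 2 × 3 = 12 combinations; with m appliances of k states each it covers k^m. The example also shows the ambiguity CO faces: 2100 W (washing machine alone) and 2120 W (kettle plus fridge) are both close to the 2130 W measurement.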
With the development of machine learning, pattern recognition algorithms have been applied to NILD. Batra [11] solved the disaggregation problem of low-power appliances using K-nearest neighbor regression (KNN) [12], but the algorithm could not handle large power differences between appliances. Kolter [13] used sparse coding to learn a power consumption model of each electrical appliance and used these models to predict the power of each appliance. Johnson [14] used unsupervised learning for NILD, and the resulting model trained quickly; however, because it lacked prior knowledge, its ability to identify complex loads with continuous states was limited compared with supervised algorithms. Kim [15] used a factorial hidden Markov model to disaggregate the continuous power of each electrical appliance from the given total power data. Other established machine learning algorithms, such as the support vector machine [16] and the AdaBoost algorithm [17], achieved some progress, but these methods share the same problem: they require a large number of load characteristics for identification, a requirement that is often difficult to meet in practice.
Unlike traditional methods, deep learning methods [18,19] can automatically extract features from raw data [20]. In Kelly's experiments [21], several NILD algorithms were evaluated, including the denoising autoencoder (DAE), the long short-term memory network (LSTM), the gated recurrent unit (GRU), the factorial hidden Markov model (FHMM), and the CO method, and the DAE was shown to give good disaggregation results. Zhang [22] used two convolutional neural network (CNN) methods, sequence-to-sequence and sequence-to-point, for load disaggregation. Compared with Kelly's method, the two CNN methods achieved better performance [23], but the networks had few layers and were hence unable to extract higher-level load characteristics. In the above methods, the CO algorithm, the DAE, and the two CNN methods were all trained on low-frequency data from the REDD dataset after preprocessing with the NILMTK toolkit; the sampling interval of the data was 3 s. With further improvements in model structure, Yang [24] proposed a semi-supervised deep learning framework based on BiLSTM and the temporal convolutional network for multi-label load classification. Akhilesh [25] proposed a multilayer deep neural network based on the sequence-to-sequence methodology that reads the daily load profile of the total power consumption and identifies the states of the appliances from their device-specific power signatures. Because such networks are trained separately for each appliance and their computational cost is high, Anthony [26] proposed UNet-NILM for multi-task appliance state detection and power estimation, which performed well compared with traditional single-task deep learning.
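As a concrete reference for the KNN regression approach mentioned above (and used later as a baseline), the sketch below predicts appliance power for each window of the aggregate signal as the mean appliance power of its k closest training windows under Euclidean distance. The window representation, the value of k, and the function name `knn_regress` are assumptions for illustration and do not reproduce Batra's [11] exact formulation.

```python
import numpy as np

def knn_regress(train_windows, train_targets, query_windows, k=3):
    """K-nearest-neighbor regression for disaggregation (illustrative).

    train_windows: (n_train, w) aggregate-power windows
    train_targets: (n_train,) appliance power associated with each window
    query_windows: (n_query, w) aggregate windows to disaggregate
    Returns an (n_query,) array of predicted appliance power.
    """
    # Squared Euclidean distance between every query and every training window.
    d2 = ((query_windows[:, None, :] - train_windows[None, :, :]) ** 2).sum(axis=2)
    # Indices of the k closest training windows for each query.
    nearest = np.argsort(d2, axis=1)[:, :k]
    # Prediction = average appliance power of those neighbors.
    return train_targets[nearest].mean(axis=1)

# Toy data: two "off" windows (appliance 0 W) and two "on" windows (appliance 5 W).
train_X = np.array([[0, 0, 0], [0, 0, 1], [10, 10, 10], [10, 10, 11]], dtype=float)
train_y = np.array([0, 0, 5, 5], dtype=float)
query = np.array([[0, 0, 0.5], [10, 10, 10.5]])
print(knn_regress(train_X, train_y, query, k=2))  # [0. 5.]
```

Because every prediction is an average of stored targets, the method needs no training step, but its cost at inference grows with the size of the training set.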
The innovations of the proposed algorithm are as follows. A multi-scale structure is used to extract different load information according to the characteristics of load disaggregation. An attention mechanism fuses the load information at different scales to further enhance the feature extraction ability of the network, especially for the electrical features of infrequently used appliances. The overall architecture uses the skip connections of the residual network [27] to improve network performance. Experimental results on two benchmark datasets show that our method outperforms the other methods compared.
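The three ideas named above (multi-scale feature extraction, attention-based fusion across scales, and a residual skip connection) can be illustrated with a small NumPy sketch of a single 1-D block. This is not the MSA-Resnet layer definition: the fixed moving-average kernels, the use of global average pooling as the per-scale descriptor, the per-scale parameters `scale_params`, and the function name `multi_scale_attention_block` are all hypothetical choices; in a real network the kernels and attention parameters are learned and there are many channels.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(z - z.max())
    return e / e.sum()

def multi_scale_attention_block(x, kernels, scale_params):
    """Hypothetical 1-D block: multi-scale convolution -> attention-weighted
    fusion -> residual addition. Requires len(x) >= every kernel length."""
    # Multi-scale branches: one same-length convolution per kernel size.
    branches = np.stack([np.convolve(x, k, mode="same") for k in kernels])  # (S, T)
    # Attention over scales: global average pooling gives one descriptor per
    # branch; a per-scale parameter (learned in practice, given here) turns it
    # into a score, and softmax turns the scores into fusion weights.
    descriptors = branches.mean(axis=1)                 # (S,)
    weights = softmax(descriptors * scale_params)       # (S,), sums to 1
    fused = (weights[:, None] * branches).sum(axis=0)   # (T,)
    # Residual (skip) connection: output = input + fused features.
    return x + fused, weights

x = np.array([0, 0, 5, 5, 5, 0, 0, 0], dtype=float)        # toy window with one "on" event
kernels = [np.ones(3) / 3, np.ones(5) / 5, np.ones(7) / 7]  # short, medium, long receptive fields
y, w = multi_scale_attention_block(x, kernels, scale_params=np.array([1.0, 0.5, 0.1]))
print(y.shape, w)
```

The skip connection means that if the branches contribute nothing (for example, all-zero kernels), the block reduces to the identity, which is what makes stacking many such blocks trainable.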
4. Results
The experiments were implemented with the Keras neural network framework on a computer with an AMD 2600 processor and an NVIDIA GTX 1060 (6 GB) graphics card. The data were standardized, the sliding-window length was set to 200, and the network was trained with the Adam optimizer at a learning rate of 0.001.
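The preprocessing described above can be sketched as follows, assuming that “standardized” means z-score scaling and that windows are taken with a stride of 1. The two target options correspond to the sequence-to-point style (appliance power at the window midpoint) and the sequence-to-sequence style (the whole appliance window); which one MSA-Resnet uses is not stated in this passage, and the function names and synthetic data are hypothetical.

```python
import numpy as np

WINDOW = 200  # sliding-window length reported in the experiments

def standardize(x):
    """Z-score standardization; also returns the statistics so that
    predictions can be mapped back to watts. (Assumed scaling method.)"""
    mean, std = x.mean(), x.std()
    return (x - mean) / std, mean, std

def make_windows(aggregate, appliance, window=WINDOW, target="point"):
    """Slice aligned series into overlapping windows with stride 1.

    target="point": appliance power at each window's midpoint (sequence-to-point style).
    target="seq":   the whole appliance window (sequence-to-sequence style).
    """
    n = len(aggregate) - window + 1
    X = np.stack([aggregate[i:i + window] for i in range(n)])
    if target == "point":
        y = appliance[window // 2: window // 2 + n]
    else:
        y = np.stack([appliance[i:i + window] for i in range(n)])
    return X, y

# Synthetic stand-in data (not WikiEnergy or UK-DALE).
rng = np.random.default_rng(0)
aggregate = rng.uniform(100, 3000, size=1000)
appliance = rng.uniform(0, 500, size=1000)
agg_std, mu, sigma = standardize(aggregate)
X, y_point = make_windows(agg_std, appliance)
_, y_seq = make_windows(agg_std, appliance, target="seq")
print(X.shape, y_point.shape, y_seq.shape)  # (801, 200) (801,) (801, 200)
```

In Keras, the reported optimizer setting corresponds to `keras.optimizers.Adam(learning_rate=0.001)`.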
Kelly's experiments [21] indicate that the DAE algorithm performs well in NILD, and Zhang's work [22] also shows the good performance of CNNs in sequence-to-sequence and sequence-to-point load disaggregation. From the WikiEnergy dataset, we selected the air conditioner, fridge, microwave, washing machine, and dishwasher of Household 25. From the UK-DALE dataset, the kettle, fridge, microwave, washing machine, and dishwasher of Household 5 were selected. To verify the effectiveness and stability of the proposed algorithm, four approaches were compared with the MSA-Resnet: the KNN, the DAE, CNN sequence-to-sequence learning (CNN s-s), and CNN sequence-to-point learning (CNN s-p). The WikiEnergy dataset was tested first.
Figure 7 shows the disaggregation results for the five appliances of Household 25 in WikiEnergy, together with their actual power consumption, and compares the four baseline methods with the MSA-Resnet proposed in this paper.
To verify the effectiveness of the proposed method, two evaluation indexes were selected: the Mean Absolute Error (MAE) and the Signal Aggregate Error (SAE). The MAE measures the average error, at each moment, between the disaggregated and the actual power consumption of an individual appliance. It is expressed as follows:
$$\mathrm{MAE} = \frac{1}{T}\sum_{t=1}^{T}\left|\hat{y}_t - y_t\right|$$
where $y_t$ represents the actual power consumed by an appliance at time $t$, $\hat{y}_t$ represents the disaggregated power of the appliance at time $t$, and $T$ represents the number of time points.
Equation (14) gives the SAE:
$$\mathrm{SAE} = \frac{\left|\hat{e} - e\right|}{e} \tag{14}$$
where $\hat{e}$ and $e$ represent the power consumption predicted by disaggregation and the real power consumption within a period of time, respectively. This index is helpful for daily electricity reports.
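The two indexes can be computed directly from the definitions above. This is a minimal sketch; the function names are hypothetical. Summing power samples taken at a fixed interval is proportional to energy, and the interval cancels in the SAE ratio, so the samples can be summed directly.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error over T time points, in the units of the input (W)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_pred - y_true)))

def sae(y_true, y_pred):
    """Signal aggregate error |e_hat - e| / e over the whole period."""
    e = float(np.sum(y_true))
    e_hat = float(np.sum(y_pred))
    return abs(e_hat - e) / e

y_true = [0, 100, 100, 0]
print(mae(y_true, [10, 90, 120, 0]), sae(y_true, [10, 90, 120, 0]))  # 10.0 0.1
print(mae(y_true, [100, 0, 0, 100]), sae(y_true, [100, 0, 0, 100]))  # 100.0 0.0
```

The second call shows why the two indexes complement each other: a prediction with entirely wrong timing can still have SAE = 0, because over- and under-estimates cancel in the sum, whereas the MAE exposes the error.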
Figure 7 shows the disaggregation results for Household 25 in the WikiEnergy dataset. All of the algorithms achieve basically effective load disaggregation for the air conditioner. For the fridge, the DAE and CNN s-s fluctuate more strongly around the appliance's mean power level than the other algorithms. The KNN performs worst on the last three appliances (the microwave, the washing machine, and the dishwasher) and cannot effectively disaggregate their abrupt change points (mutation points). For these three infrequently used appliances, the CNN s-s and CNN s-p produce more stable disaggregation than the other two baselines, although the CNN s-p fluctuates greatly in regions of low power consumption. In summary, judging from the power consumption curves, the MSA-Resnet performs best on every appliance.
Table 1 compares the MAE and SAE indexes of the Household 25 load disaggregation in the WikiEnergy dataset. The MSA-Resnet has clear advantages in disaggregating the air conditioner, fridge, microwave, washing machine, and dishwasher. On the MAE index, the MSA-Resnet performs better than the other four methods. On the SAE, the MSA-Resnet achieves the lowest values for the fridge, washing machine, and dishwasher, indicating accurate disaggregation of energy over a period of time. Combining Figure 7 and Table 1, it can be inferred that the shallow CNN s-s and CNN s-p have difficulty accurately disaggregating the total power into the infrequently used appliances. Their disaggregation errors are larger than those of the KNN and the MSA-Resnet because the shallow CNN structure cannot extract deeper and more effective load characteristics. The MSA-Resnet performs better for two reasons: first, the residual connections allow the network to be deepened and strengthen its ability to learn from unbalanced samples; second, the multi-scale convolutions handle infrequently used appliances well. Figure 7 also shows that the KNN disaggregates the washing machine poorly overall, yet its errors on both indicators are small. To explain this, selected time intervals are compared in Figure 8, which shows each algorithm's disaggregation of each appliance at a finer scale and reflects the ability of the KNN to detect peak values. Figure 8b,c shows that the KNN cannot accurately disaggregate abrupt change points but handles regions where the power is close to 0 well; since the washing machine spends most of the time near 0 W, this keeps its averaged errors low.
After disaggregation, power thresholds were used to distinguish the on/off states of the electrical appliances, from which the state evaluation indexes were calculated. The thresholds of the air conditioner, fridge, microwave, washing machine, and dishwasher were set to 100 W, 50 W, 200 W, 20 W, and 100 W, respectively. The Recall, Precision, Accuracy, and F1 values [41] were used to further evaluate the performance of the different algorithms in detecting on/off states.
Recall is the proportion of instances with a positive (“on”) label that are predicted correctly:
$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
where True Positive ($TP$) is the number of states disaggregated as “on” whose ground-truth state is “on”, and False Negative ($FN$) is the number of states disaggregated as “off” whose ground-truth state is “on”. A positive instance therefore has two possible outcomes: it is predicted as positive ($TP$) or as negative ($FN$).
Precision is the proportion of samples predicted to be in an “on” state that are indeed in an “on” state:
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
where False Positive ($FP$) is the number of states predicted as “on” whose ground-truth state is “off”.
Accuracy is the ratio of correctly predicted samples to the total number of samples:
$$\mathrm{Accuracy} = \frac{TP + TN}{P + N}$$
where True Negative ($TN$) is the number of states correctly predicted as “off”, $P$ is the number of positive samples, and $N$ is the number of negative samples.
F1 can be expressed as
$$F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
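The state-based evaluation described above can be sketched as follows: appliance power is thresholded into on/off states with the WikiEnergy thresholds listed above, the TP, FP, FN, and TN counts are formed, and the four indexes are computed from them. Whether a reading exactly at the threshold counts as “on” is not specified, so the strict comparison (power > threshold), the zero-division handling, and the function names are assumptions of this sketch.

```python
import numpy as np

# On/off power thresholds (W) for the WikiEnergy Household 25 appliances.
THRESHOLDS = {"air_conditioner": 100, "fridge": 50, "microwave": 200,
              "washing_machine": 20, "dishwasher": 100}

def on_off(power, threshold):
    """Assumed rule: an appliance is 'on' when its power exceeds the threshold."""
    return np.asarray(power, dtype=float) > threshold

def state_metrics(true_power, pred_power, threshold):
    t = on_off(true_power, threshold)
    p = on_off(pred_power, threshold)
    tp = np.sum(p & t)     # predicted on, actually on
    fp = np.sum(p & ~t)    # predicted on, actually off
    fn = np.sum(~p & t)    # predicted off, actually on
    tn = np.sum(~p & ~t)   # predicted off, actually off
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    accuracy = (tp + tn) / t.size  # (TP + TN) / (P + N)
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"recall": recall, "precision": precision,
            "accuracy": accuracy, "f1": f1}

# Toy washing-machine trace (W): one missed "on" sample and one false "on" sample.
true_wm = [0, 0, 30, 500, 400, 0, 0, 0]
pred_wm = [5, 25, 10, 450, 380, 0, 0, 0]
print(state_metrics(true_wm, pred_wm, THRESHOLDS["washing_machine"]))
```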
Table 2 compares the evaluation indexes for judging the “on” and “off” states of the Household 25 electrical appliances. It can be seen from Table 2 that for
and
F1, the MSA-Resnet achieves the best performance on the various electrical appliances. The disaggregation diagrams of the microwave, the washing machine, and the dishwasher in Figure 8 show that the proportion of “on” states in the actual power consumption of these three appliances is significantly lower than that of the first two appliances. With such unbalanced data containing few positive samples, the CNN s-s and CNN s-p cannot effectively predict the “on” states of the washing machine, whereas the MSA-Resnet gives better results.
To verify the effectiveness of the Leaky-ReLU function, comparative experiments using the ReLU function were conducted on WikiEnergy's Household 25 under the same conditions. According to the experimental results in Table 3, the algorithm using the Leaky-ReLU function performs better on both the MAE and the SAE indicators.
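The difference between the two activation functions compared in Table 3 can be shown in a few lines. ReLU outputs zero, and passes zero gradient, for every non-positive input, whereas Leaky-ReLU keeps a small slope alpha there, so units receiving negative inputs (for example, standardized power readings below the mean) still receive a gradient; this is one commonly cited motivation for Leaky-ReLU. The slope alpha = 0.01 is an assumed value for illustration, and the function names are hypothetical.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # alpha (negative-side slope) = 0.01 is an assumed value for illustration.
    return np.where(x > 0, x, alpha * x)

def relu_grad(x):
    # Derivative of ReLU, taking 0 at x = 0 by convention.
    return (x > 0).astype(float)

def leaky_relu_grad(x, alpha=0.01):
    return np.where(x > 0, 1.0, alpha)

# Standardized inputs are negative wherever power is below the mean.
x = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(x), leaky_relu(x))            # negative inputs: 0 vs. alpha * x
print(relu_grad(x), leaky_relu_grad(x))  # gradient for x <= 0: 0 vs. alpha
```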
For further verification, five electrical appliances from Household 5 in the UK-DALE dataset were selected for additional experiments. Figure 9 shows the disaggregation results. All of the above algorithms achieve effective disaggregation for the kettle, a frequently used appliance. For the fridge, the KNN and the DAE perform worse than the CNN s-s, the CNN s-p, and the MSA-Resnet. For the microwave, the washing machine, and the dishwasher, which are infrequently used and have low power consumption, the MSA-Resnet achieves better disaggregation results than the other two deep learning algorithms, mainly because it better detects peaks and state changes.
Table 4 shows the load disaggregation evaluation indexes of Household 5 in the UK-DALE dataset. According to Table 4, the MSA-Resnet achieves better MAE and SAE values than the other methods. For the
, the MSA-Resnet performs better with respect to the kettle, the fridge, the washing machine, and the dishwasher. The MSA-Resnet has smaller
values in the kettle, the fridge, and the washing machine.
Table 5 shows the “on”/“off” state judgement results for Household 5 in the UK-DALE dataset. The thresholds of the kettle, the fridge, the microwave, the washing machine, and the dishwasher were set to 100 W, 50 W, 200 W, 20 W, and 100 W, respectively.
Table 5 shows that the CNN s-s and the CNN s-p have low Recall for the washing machine and the dishwasher: these appliances have few positive samples, and the two methods predict their “on” states poorly. If judging the appliance state is treated as a classification task, appliances with a high utilization rate obtain better classification results.
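The point about positive samples can be illustrated with a toy example: for a rarely used appliance, a model that never predicts “on” still obtains high Accuracy while its Recall is zero, which is why Recall and F1 are more informative than Accuracy for such appliances. The appliance, the sample counts, and the function name are hypothetical.

```python
import numpy as np

def recall_and_accuracy(true_on, pred_on):
    true_on = np.asarray(true_on, dtype=bool)
    pred_on = np.asarray(pred_on, dtype=bool)
    tp = np.sum(true_on & pred_on)    # "on" samples detected
    fn = np.sum(true_on & ~pred_on)   # "on" samples missed
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = np.mean(true_on == pred_on)
    return recall, accuracy

# Hypothetical rarely used appliance: "on" in 20 of 1000 samples.
true_on = np.zeros(1000, dtype=bool)
true_on[100:120] = True
always_off = np.zeros(1000, dtype=bool)  # a model that never predicts "on"
recall, accuracy = recall_and_accuracy(true_on, always_off)
print(recall, accuracy)  # recall 0.0, accuracy 0.98
```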
Figure 10 compares the load disaggregation of the five methods over a period of time. Compared with the other algorithms, the MSA-Resnet disaggregates the appliances better, whereas the KNN and the DAE have the weakest disaggregation ability. For the infrequently used washing machine and dishwasher, the MSA-Resnet still fits the power curve well owing to its network structure: it uses multi-scale convolutions to obtain rich load characteristics and improves network performance through the attention mechanism and the residual structure.
To verify the effectiveness of the Leaky-ReLU function, a comparative experiment with the ReLU function was also conducted on the UK-DALE dataset. Table 6 shows that the Leaky-ReLU function still performs better.