1. Introduction
Residential buildings account for between 25% and 40% of global energy consumption and the corresponding carbon dioxide emissions [1, 2, 3, 4]. The average number of electrical devices used in houses is estimated to rise over the next two decades [4]. In parallel, climate change and urbanization are affecting the energy load of urban buildings, with energy load demand growing two times faster than the expansion of urbanization [5]. Studies have shown that roughly 20% of the energy consumed by households is due to faulty equipment or poor operational strategies [6, 7, 8]. Therefore, to detect faulty device operation and improve operation strategies, optimization techniques for device detection and load scheduling have been developed to find optimal and suboptimal operational strategies [9]. Additionally, significant progress in smart grids, smart systems, and smart devices has been made in the last few decades with respect to optimized energy generation and distribution [9, 10]. Accordingly, energy management and the deployment of Information and Communication Technologies (ICT) in residential buildings have increased as well, in order to reduce households’ energy consumption without decreasing living quality levels or violating consumers’ personality rights and privacy [11, 12]. In general, the amount of information gathered about consumer behavior is progressively increasing. In particular, energy usage is monitored to reduce overall energy consumption and peak loads, while also aiming to improve the well-being of consumers [13].
Studies have shown that smart energy management, smart grids, fine-grained energy monitoring, and load forecasting at household level are indispensable for achieving a significant decrease in energy consumption [14, 15]. However, energy monitoring is nowadays mostly done via an aggregated measure of energy on monthly bills, which does not offer detailed information about energy consumption. Therefore, to measure energy consumption accurately, smart meters are utilized, usually measuring with a sampling frequency of 1 Hz or more. Smart meters are devices that measure the energy consumption of electrical appliances based on voltage and current measurements. The energy consumption is typically calculated once per second or more frequently, e.g., up to 30,000 samples per second [16]. The more frequently energy consumption is calculated, the more detailed the captured information; however, increasing the sampling frequency linearly increases the data to be stored, processed, or transmitted, which in turn increases the hardware cost exponentially [17]. Therefore, most recent studies focus on low sampling frequency data, as the majority of commercial smart meters collect data at 0.1 Hz up to 1 Hz to minimize hardware cost and to address transmission and data-storage capacity limitations [18, 19]. Energy savings can be enhanced at device level by detecting faulty device operation and inefficient operating strategies [7]. Knowledge about the appliances’ consumption can lead to a reduction of total consumption through increased awareness of energy consumption [20]. Recent studies have shown that households are usually bad at estimating individual power consumption (e.g., overrating the consumption of small appliances and underrating the energy used for heating) [21]. This means that the energy consumption must either be measured at device level, which has the disadvantage of increased cost due to wiring and data acquisition [19], or that the aggregated energy (consumed energy measured centrally for each household) must be split to appliance level automatically, which is called energy disaggregation. Energy disaggregation, as defined in [22] and also known as Non-Intrusive Load Monitoring (NILM), determines the energy consumption of each individual appliance in a house by processing measurements of the current and voltage of the household’s overall load. The term non-intrusive distinguishes it from Intrusive Load Monitoring (ILM) methods, which utilize several measurements and smart meters to determine the per-device consumption. In other words, NILM extracts electrical energy consumption at appliance level based on one central measurement, identifying the onsets (switch-on times) and offsets (switch-off times) of appliances from the aggregated energy signal in order to find the corresponding consumption per appliance [23].
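To make the notion of onsets and offsets concrete, the following minimal sketch flags switch-on and switch-off events in an aggregated power signal by thresholding the step between consecutive samples. It is an illustration of the concept only and not the method proposed in this paper; the function name `detect_events`, the 50 W threshold, and the toy signal are assumptions made for the example.

```python
def detect_events(aggregate, threshold=50.0):
    """Return (onsets, offsets) as lists of sample indices.

    An onset is a rise of more than `threshold` watts between two
    consecutive samples; an offset is a drop of more than `threshold`.
    """
    onsets, offsets = [], []
    for i in range(1, len(aggregate)):
        step = aggregate[i] - aggregate[i - 1]
        if step > threshold:
            onsets.append(i)
        elif step < -threshold:
            offsets.append(i)
    return onsets, offsets


# Toy 1 Hz aggregate in watts (assumed values): a 100 W base load and a
# 1000 W load that switches on at sample 3 and off at sample 6.
signal = [100, 100, 100, 1100, 1100, 1100, 100, 100]
onsets, offsets = detect_events(signal)
print("onsets:", onsets, "offsets:", offsets)
```

A fixed step threshold of this kind only separates loads with clearly distinct power levels, which is one reason why appliances with continuous or nonlinear behavior remain difficult.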
Several methods to solve the NILM energy disaggregation challenge can be found in the literature. These methods can be broadly classified into methods using Source Separation (SS) algorithms and approaches that do not use SS algorithms. Common to all NILM approaches is that they use measurements of the aggregated energy consumption of a household with a sampling frequency $f_s$ in the order of one sample per second up to a few tens of kHz [16]. NILM methods may use macroscopic signal parameters (e.g., active/reactive power [24, 25]) or microscopic ones (e.g., transient energy and harmonics [26, 27, 28]), depending on the sampling rate $f_s$, to split the aggregated signal to appliance level [29]. Appliance identification methods not using SS algorithms are mainly based on supervised methods and the extraction of features, which are used either for training a Machine Learning (ML) algorithm (e.g., Support Vector Machines (SVM) [30], Artificial Neural Networks (ANN) [31], Decision Trees (DT) [32], K-Nearest Neighbours (KNN) [33]) or for defining a set of rules or thresholds [28]. Appliance identification methods using SS algorithms are based on single-channel source separation and solve the task with optimization criteria. These approaches extract the characteristic power consumption pattern of every appliance from the aggregated signal using an optimization algorithm with constraints [19, 34, 35]. Commonly reported SS algorithms for the NILM task are Independent Component Analysis (ICA) [36], Non-Negative Matrix Factorization (NMF) [37], and Sparse Component Analysis (SCA) [38]. Source separation-based NILM approaches are unsupervised; however, as only the aggregated signal measurements are used, a priori knowledge is needed, which makes them semi-unsupervised [19], in contrast to the NILM approaches without SS algorithms, which are supervised. Furthermore, advances in machine learning have led to a number of recently proposed deep learning approaches using big datasets, like the Almanac of Minutely Power Dataset (AMPds) [39]. Methodologies using Convolutional Neural Networks (CNNs) [40, 41, 42], Recurrent Neural Networks (RNNs) [43, 44], Long Short-Term Memory (LSTM) architectures [44, 45], denoising autoencoders (dAEs) [46], and Gated Recurrent Units (GRUs) [40] can be found in the literature. In addition, questions regarding consumer privacy and real-time capability arise with high-frequency measurements of energy consumption; these have been discussed in [47, 48] for security-relevant issues and in [17] and [49] for low-cost disaggregation and real-time capability.
There is still no established approach for solving the NILM problem, and the literature reports multiple solutions with and without source separation. Numerous electrical devices have steady-state behavior [22] and are typically modeled as finite state machines [22, 50], while other electrical devices have non-steady behavior with nonlinear and/or continuous characteristics [51, 52]. The identification of such appliances when working in parallel or showing strong time-dependent behavior [53] is still an unsolved problem, especially for nonlinear and continuous devices. In this paper, a two-stage fusion approach is proposed, aiming at representing different device combinations and their time-varying behavior more accurately. The proposed methodology is based on supervised learning and utilizes low-frequency data as well as steady-state features, similar to [54, 55, 56].
The remainder of this article is organized as follows. Section 2 presents the proposed two-stage fusion methodology. In Section 3 and Section 4, the experimental set-up and the experimental results are given, respectively. Finally, conclusions are provided in Section 5.
2. Two-Stage Fusion Methodology
The NILM energy disaggregation task can be described as the problem of estimating the power consumption of each electrical appliance within time windows (frames or epochs), using the measurements acquired from one central smart meter. In detail, given a set of $M-1$ known appliances each consuming power $p_m$, with $1 \le m \le M-1$, the aggregated power $p_{agg}$ measured by the central smart meter will be

$$p_{agg} = f(p_1, \ldots, p_{M-1}, g) = \sum_{m=1}^{M-1} p_m + g = \sum_{m=1}^{M} p_m \qquad (1)$$

where $g = p_M$ is a “ghost” power consumption, which is usually consumed by one or more unknown appliances. In NILM, the aim is to calculate estimations $\hat{p}_m$ of the power consumption of each electrical appliance $m$ using an estimation method $f^{-1}$ with minimal estimation error and $\hat{p}_M = \hat{g}$, i.e.,

$$\hat{P} = \{\hat{p}_1, \ldots, \hat{p}_{M-1}, \hat{g}\} = f^{-1}(p_{agg}) \quad \text{s.t.} \quad \underset{f^{-1}}{\arg\min} \left( p_{agg} - \sum_{m=1}^{M} \hat{p}_m \right)^2 \qquad (2)$$

As Equation (2) is practically impossible to solve analytically, most energy disaggregation methodologies are based on segmentation of the aggregated signal into frames and estimation of the power consumption at device level within each frame using a machine learning based model, which can either be one model per device following the “one vs. all” approach [57] or a multi-class device identification model [58]. The architecture of the baseline one-stage NILM approach based on regression estimators of power consumption is presented in Figure 1.
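The additive signal model behind the aggregated measurement can be illustrated with a small numerical sketch: the aggregate seen by the central meter is the sample-wise sum of the known appliance loads plus the ghost power. The appliance names and wattages are made-up illustrative values, and `aggregate_power` is a hypothetical helper, not part of the proposed method.

```python
def aggregate_power(known_loads, ghost):
    """Sum the known appliance power traces and the ghost power per sample.

    known_loads: dict mapping appliance name -> list of power values (W)
    ghost:       list of ghost power values (W), same length as the traces
    """
    n = len(ghost)
    if any(len(trace) != n for trace in known_loads.values()):
        raise ValueError("all traces must have the same length")
    return [
        sum(trace[t] for trace in known_loads.values()) + ghost[t]
        for t in range(n)
    ]


# Assumed toy traces: two known appliances and a small ghost load.
known = {
    "fridge": [80, 80, 0, 0],
    "tv": [0, 120, 120, 0],
}
ghost = [10, 10, 15, 15]

p_agg = aggregate_power(known, ghost)
print(p_agg)
```

The forward direction is a plain sum, whereas the inverse direction is underdetermined: many different combinations of appliance loads produce the same aggregate value, which is why the estimation is learned from data rather than solved analytically.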
Specifically, the one-stage NILM methodology consists of preprocessing, feature extraction, and a regression model for estimating the appliances’ power consumption $\hat{p}_m$. During preprocessing the aggregated signal is initially filtered in order to remove peaks, as proposed in [59], and frame blocked into time frames $a_t$ of length $L$; a feature vector $v_t$, $v_t \in \mathbb{R}^{N}$, is then calculated for each frame $a_t$, where $1 \le t \le T$ and $T$ is the last frame of the aggregated signal. Finally, a regression model is used to estimate power consumption values $\hat{p}_m$ for each of the $M$ devices. The estimation of each device’s power consumption can be done either using one regression model per device in parallel or using one regression model with $M$ output estimations.
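The one-stage pipeline described above (frame blocking, feature extraction, one regression model per device) can be sketched as follows. This is a minimal illustration under stated assumptions: the four features (mean, standard deviation, minimum, maximum), the frame length, the toy data, and the use of a scalar least-squares line as the regressor are all choices made for the example and are not taken from the paper, which evaluates machine learning models such as RF, SVM, and DNN.

```python
import statistics


def frame_block(signal, frame_len):
    """Split a signal into consecutive non-overlapping frames.

    Trailing samples that do not fill a whole frame are dropped.
    """
    n_frames = len(signal) // frame_len
    return [signal[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]


def extract_features(frame):
    """Assumed steady-state features of one frame: mean, std, min, max."""
    return [
        statistics.fmean(frame),
        statistics.pstdev(frame),
        min(frame),
        max(frame),
    ]


def fit_line(x, y):
    """Ordinary least squares fit y ~ a * x + b for a scalar input x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a = sxy / sxx if sxx else 0.0
    return a, my - a * mx


# Toy aggregate signal (W) with four frames of four samples each.
aggregate = [100] * 4 + [300] * 4 + [100] * 4 + [500] * 4
frames = frame_block(aggregate, frame_len=4)
features = [extract_features(f) for f in frames]

# Assumed ground-truth mean power of one device (a heater) per frame.
heater_power = [0, 200, 0, 400]

# One regression model for this device, using only the frame mean as input.
frame_means = [v[0] for v in features]
a, b = fit_line(frame_means, heater_power)

# Estimate the heater power for a new frame with a mean of 400 W.
estimate = a * 400 + b
print(round(estimate, 1))
```

In the per-device variant, one such model would be fitted for each appliance; the alternative mentioned in the text is a single model with one output per device.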
In this work, the one-stage NILM methodology is extended to two stages. In detail, the first stage consists of classifiers (device detectors) that process the aggregated signal in parallel, each producing a binary device-specific detection score, while the second stage consists of regression fusion models that estimate the power consumption of each appliance using as input the stage I results concatenated with the feature vector. The architecture of the proposed two-stage methodology is presented in Figure 2.
In detail, during stage I the feature vectors are initially processed by a set of $M$ classification models $C_m$, one for each of the $M-1$ known devices and one for the unknown ghost power, according to the “one vs. all” approach. The output before the last layer of stage I, $s_m^t$, is the classification score for each of the $M$ devices:

$$s_m^t = C_m(v_t), \quad 1 \le m \le M \qquad (3)$$

where $C_m$ is the classification model for the $m$th device and $v_t$ is the feature vector as calculated in the feature extraction stage. The predicted class is the one with the highest score $s_m^t$. To get the binary decision at the end of stage I, a threshold $\theta$ is applied to transform the initial classification scores $s_m^t$ to their binary representation, thus labeling whether a device is working (1) or not (0):

$$b_m^t = \begin{cases} 1, & s_m^t \ge \theta \\ 0, & \text{otherwise} \end{cases} \qquad (4)$$

Subsequently, the initial binary estimations $b_m^t$, with $1 \le m \le M$, from stage I are concatenated with the feature vector $v_t$ to a new feature vector $\tilde{v}_t = [v_t, b_1^t, \ldots, b_M^t]$, so as to estimate the power consumptions of the $M$ appliances. Specifically, in the second stage $M$ fusion models $R_m$, with $1 \le m \le M$, receive as input the new feature vector $\tilde{v}_t$, giving a numerical estimation (regression) of the appliance power consumption for each of the $M$ devices:

$$\hat{p}_m^t = R_m(\tilde{v}_t) \quad \text{s.t.} \quad \hat{p}_m^t \le p_{agg}^t \qquad (5)$$
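The data flow of the two stages (detection scores, thresholding to binary flags, concatenation with the feature vector, per-device fusion regression) can be sketched as below. The detectors and regressors are hand-written stand-ins for trained models, and the threshold of 0.5, the feature values, and all function names are assumptions made for illustration only.

```python
def binarize(scores, threshold=0.5):
    """Map stage I detection scores to flags (1 = working, 0 = not working)."""
    return [1 if s >= threshold else 0 for s in scores]


def fuse(features, flags):
    """Concatenate the frame features with the stage I binary flags."""
    return list(features) + list(flags)


def two_stage_predict(features, detectors, regressors, threshold=0.5):
    """Run stage I detectors, then stage II fusion regressors, on one frame."""
    scores = [detect(features) for detect in detectors]  # stage I scores
    flags = binarize(scores, threshold)                  # binary decisions
    fused = fuse(features, flags)                        # new feature vector
    estimates = [regress(fused) for regress in regressors]
    return flags, fused, estimates


# One frame described by two assumed features: mean power and std (W).
features = [300.0, 0.0]

# Stand-in detectors for fridge, TV, and ghost power (one vs. all).
detectors = [
    lambda v: 0.9 if v[0] > 50 else 0.1,  # fridge
    lambda v: 0.2,                        # tv
    lambda v: 0.6,                        # ghost power
]

# Stand-in fusion regressors; indices 2..4 of the fused vector hold the flags.
regressors = [
    lambda x: 80.0 * x[2],                            # fridge
    lambda x: 120.0 * x[3],                           # tv
    lambda x: x[0] - 80.0 * x[2] - 120.0 * x[3],      # ghost power
]

flags, fused, estimates = two_stage_predict(features, detectors, regressors)
print(flags, fused, estimates)
```

The point of the design is visible in the stand-in regressors: each stage II model sees the on/off flags of all devices, so its estimate can depend on which other appliances are likely running in the same frame.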
The initial binary estimates of device operation from the first stage are used by the regression models of the second stage to model any power consumption correlations between the different appliances, i.e., the devices that are likely to work simultaneously within the same time frame. Additionally, the restriction in Equation (5) ensures that the predicted power consumption of each single device at frame instance $t$ cannot exceed the aggregated power consumption within that frame.
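One simple way to enforce such a restriction is to clip each per-device estimate to the aggregated power of its frame, as in the sketch below. The text does not state how the restriction is implemented, so clipping is an assumption, as is the additional lower bound of zero and the helper name `apply_power_constraint`.

```python
def apply_power_constraint(estimates, p_agg):
    """Clip per-device estimates of one frame to the range [0, p_agg].

    The upper bound mirrors the restriction that no single device can
    consume more than the aggregate; the lower bound of zero is an
    extra assumption (power consumption is taken to be non-negative).
    """
    return [min(max(p, 0.0), p_agg) for p in estimates]


# Raw regression outputs for three devices in a frame with 300 W aggregate.
raw = [350.0, -20.0, 120.0]
bounded = apply_power_constraint(raw, p_agg=300.0)
print(bounded)
```

Note that this bounds each device individually; it does not force the estimates to sum to the aggregate.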
The proposed methodology combines binary device estimates from a first classification stage with a second regression fusion stage, so that any complementary information from the first stage is captured and learned by the fusion model. Moreover, since ghost power is modeled in the first stage, the output of the binary classifiers is used as a feature for the detection of unknown devices, which is an advantage of the present methodology in real set-up evaluations, where unknown devices exist quite often.
4. Experimental Results
The NILM methodology described in Section 2 was tested based on the experimental protocol presented in Section 3 using the parameter optimization results of Table 2. To evaluate NILM accuracy at electrical appliance level, Equation (6) was modified by removing the sum across the M appliances, thus resulting in a per-appliance estimation accuracy. The experimental results in terms of EACC (%) for all evaluated datasets, all evaluated classification algorithms, and both the one-stage and the proposed two-stage architecture are tabulated in Table 3. The best performing energy disaggregation scores per dataset are indicated in bold for both one- and two-stage results.
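The relation between the overall and the per-appliance accuracy can be sketched as follows. Since Equation (6) is not reproduced in this section, the sketch assumes the estimation accuracy commonly used in the NILM literature, i.e., one minus the summed absolute estimation error divided by twice the summed true power; the per-appliance variant simply drops the sum over appliances. The function names and the toy values are illustrative assumptions.

```python
def estimation_accuracy(true, pred):
    """Overall accuracy over all appliances and frames (assumed definition).

    true, pred: dicts mapping appliance name -> list of per-frame power (W)
    """
    err = sum(abs(p - t) for m in true for t, p in zip(true[m], pred[m]))
    total = sum(abs(t) for m in true for t in true[m])
    return 1.0 - err / (2.0 * total)


def device_accuracy(true_m, pred_m):
    """Per-appliance accuracy: the same ratio without the sum over appliances."""
    err = sum(abs(p - t) for t, p in zip(true_m, pred_m))
    total = sum(abs(t) for t in true_m)
    return 1.0 - err / (2.0 * total)


# Assumed toy ground truth and estimates for two appliances over four frames.
true = {"fridge": [80, 80, 0, 0], "tv": [0, 120, 120, 0]}
pred = {"fridge": [70, 90, 0, 10], "tv": [0, 100, 120, 10]}

overall = estimation_accuracy(true, pred)
per_device = {m: device_accuracy(true[m], pred[m]) for m in true}
print(overall, per_device)
```

Under this definition an appliance with low total consumption is penalized more strongly by the same absolute error, which is worth keeping in mind when comparing per-device scores.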
As shown in Table 3, the best performing classifier across all tested datasets when using the one-stage architecture is RF, which outperforms all other classifiers except on the iAWE dataset, where the SVM classifier achieves significantly higher energy disaggregation performance. Furthermore, the results in Table 3 show that the two-stage fusion methodology improves the overall EACC performance across all evaluated datasets. In terms of average improvement per dataset, EACC increases by between 0.6% and 4.1%, depending on the dataset and the classifier. The most significant improvement in terms of relative performance was observed when using DNN as the classifier, where performance improved by 4.1% (REDD-2 dataset). The improvement in terms of absolute EACC values, i.e., the average increase in estimation accuracy when considering the best experiment for the first stage as the baseline performance, ranges between 0.6% and 3.4% when using SVM and RF as classifiers, and the results were statistically significant when comparing the frame-level accuracy scores of the one-stage and the two-stage fusion architectures. In detail, RF outperformed SVM in ten out of eleven datasets, the exception being the iAWE database, which is probably due to its significantly higher proportion of continuous appliances; this is in line with results in the literature reporting high accuracies for SVM in the case of appliances with strong time-varying behavior [73, 74]. The evaluation results demonstrate the validity of the proposed method, as it offered improved performance when tested on several highly dissimilar datasets (with respect to the sampling rate $f_s$, the number and the type of devices), as presented in Section 3 and shown in Table 1.
In a next step, we analyzed the energy disaggregation performance at device level for one dataset out of each database. Table 4 tabulates the EACC at device level for the ECO-2, REDD-2, and iAWE datasets. The three datasets were chosen according to the characteristics of the datasets shown in Table 1. Specifically, datasets were chosen that have roughly the same number of appliances (<10) and a similar collection of appliances, and thus appliances of the same type.
As can be seen in Table 4, there is a relation between performance improvement and appliance category: one/multi-state devices without a significant power peak signature show little or no performance improvement, whereas nonlinear and continuous appliances as well as one-state appliances with a significant power peak show significant performance improvement. Depending on the dataset, the performance increase is up to 0.4% for one/multi-state devices without power spikes, up to 7.4% for devices with power spikes, up to 10.1% for nonlinear devices, and up to 4.9% for continuous devices. In detail, the highest performance increase in the three tested datasets was observed for nonlinear appliances, namely the TV (10.1%) and the Entertainment (7.7%) in the ECO-2 dataset. A significant increase in performance was also observed for devices with power spikes (PS) in their signature, like the Fridge, the Freezer, and the A/C, with maximum improvements of 7.4%, 3.9%, and 4.9%, respectively. The lowest or no performance improvement was observed for one-state appliances without power spikes, e.g., resistive lamps or disposal.
In order to directly compare the proposed methodology with other approaches proposed in the literature, we additionally tested our method on five selected loads from the REDD-2 dataset, namely the refrigerator, lighting, dishwasher, microwave, and furnace. These loads were used in [55] because they account for a large percentage of the overall consumed energy, and they have been used in other publications [67, 75]. Furthermore, the disaggregation results were evaluated in both a noisy (with ghost data) and a noiseless (with synthetic data) setup, as in [75], for both the one-stage and the proposed two-stage fusion architecture. The results are tabulated in Table 5.
From Table 5 it can be seen that the presented two-stage fusion model outperforms the baseline one-stage system in both the noisy and the noiseless setup, with 93.4% (2.7% improvement) and 95.7% (2.5% improvement), respectively. Moreover, the largest improvements can be observed for the appliances with significant power spikes and nonlinear behavior, i.e., the fridge and the light with 13.0% (6.7%) and 2.8% (3.7%), respectively. For the purpose of comparison with previously published NILM approaches, the summary of methods using the same databases and the EACC performance metric presented in [76] was used. Furthermore, the summary of results of [76] was updated by incorporating very recent results from the literature utilizing deep learning. However, in the latest published deep learning approaches many researchers started utilizing databases with even lower sampling frequency and longer monitoring duration (e.g., AMPds [39] or UK-DALE [77]), as in [41, 42, 44, 78], or utilizing different accuracy metrics (e.g., normalized RMSE in [45]), making direct comparison impossible. The results are tabulated in Table 6.
From Table 6, it can be seen that the two-stage fusion methodology achieves higher accuracy than all other published methods evaluated on the REDD datasets 1–4 and 6. As regards the experimental setup using five appliances of the REDD-2 dataset (initially proposed in [55]), the proposed fusion architecture performs better than all reported NILM methods except the method of Makonin et al. [75] utilizing HMM sparsity, which achieved 1.4% higher accuracy than our proposed fusion methodology in the noisy set-up; however, the energy data used in [75] were manually modified to time align data acquired from two different smart meter devices, while we have used the original data from the REDD-2 dataset without any modification. Moreover, for the approach presented in [75], the performance on the full REDD dataset with all 18 appliances across all houses (1, 2, 3, 4, and 6) has not been reported in the literature, and thus direct comparison with our approach is possible only using the REDD-2 dataset with five devices. The results presented in [40] are not directly comparable with our approach (which performs 8.8% better), as a modified training/test setup was used. To compare our performance with the one reported in [45], we calculated the normalized RMSE used in [45]. Our proposed methodology has a normalized RMSE equal to 0.24, which is 0.11 better than the score reported in [45].
Considering the results from Table 3 and Table 4, the proposed two-stage fusion methodology demonstrated improvements in both average and per-device performance across all evaluated datasets with all evaluated classifiers, demonstrating the validity of the methodology. As regards the effect of different datasets when using the same classifier, the improvement in terms of EACC varies between 0.6% and 4.1%, as can be seen in Table 3. The main reasons are the different number of devices in each dataset and the distribution of appliance types, i.e., how many appliances of a specific type (e.g., one-state or nonlinear) can be found in each dataset. Considering the results in Table 3 in combination with the database categorization in Table 1, it can be seen that datasets with a small number of appliances (e.g., ECO-1 or REDD-2) have a slightly higher improvement in estimation accuracy, of approximately 1.0–4.1%, while datasets with a larger number of appliances (e.g., REDD-1 or REDD-3) show improvements of up to 1.6%. Moreover, datasets including a significant number of continuous or nonlinear appliances (e.g., ECO-2 or iAWE) benefit more from the two-stage fusion architecture. Continuous or nonlinear devices may be highly correlated with the daily routine of the users/consumers, and they may also have dependencies between them, e.g., the Entertainment appliances, which in the general case are interconnected with the TV. For electrical appliances that depend on other devices or on the residents’ everyday routine, a priori information about devices operating together or following similar everyday routine patterns, e.g., working or not working at the same time most of the time, can boost the estimation of the power consumption of those devices. For such appliances, power consumption estimation can be improved by the proposed two-stage fusion methodology, in which estimates of the operation of other devices (identified at the first stage of the proposed architecture) are utilized. In addition, energy consumption estimation for appliances presenting power spikes, i.e., peaks that appear during the switching on of electrical motors, e.g., in fridges or freezers, was found to be improved by the fusion stage of the proposed NILM architecture, given that the existence of a power spike in a frame changes the total amount of energy to be disaggregated. Therefore, it is beneficial to have an initial estimate of which appliances are likely to be working (calculated in the first stage of the two-stage architecture) in order to discriminate power spikes from appliances with constant high power consumption.
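The kind of dependency between devices discussed above can be made tangible with a small sketch that measures how often two devices share the same on/off state across frames, using binary flags such as those produced by the first stage. This diagnostic is not part of the proposed methodology; the helper name `state_agreement` and the flag sequences are illustrative assumptions.

```python
def state_agreement(flags_a, flags_b):
    """Fraction of frames in which two devices have the same on/off state."""
    if len(flags_a) != len(flags_b) or not flags_a:
        raise ValueError("flag sequences must be non-empty and equally long")
    same = sum(1 for a, b in zip(flags_a, flags_b) if a == b)
    return same / len(flags_a)


# Assumed per-frame on/off flags (1 = working) for three devices.
tv = [1, 1, 0, 0, 1, 0]
entertainment = [1, 1, 0, 0, 0, 0]
fridge = [0, 1, 1, 0, 0, 1]

tv_ent = state_agreement(tv, entertainment)
tv_fridge = state_agreement(tv, fridge)
print(tv_ent, tv_fridge)
```

A high agreement, as between the TV and the Entertainment flags in this toy example, indicates that the state of one device carries information about the other, which is the information the fusion stage can exploit.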
It was shown in Table 3, Table 4, Table 5, and Table 6 that the two-stage fusion methodology improved the estimation accuracy across all datasets. In particular, Table 4 showed that the two-stage fusion methodology yields a higher performance increase for appliances with power spikes as well as for nonlinear and continuous appliances. In Table 5, the results were compared to the state-of-the-art literature for five selected appliances for both the one-stage and the proposed two-stage architecture, while a comparison of average estimation accuracy scores was presented in Table 6, showing the improvement of the method when using the complete dataset.