#### *2.2. CNN Classifier*

In the proposed methodology, the transient response generated when an appliance is turned on is used as the load signature for appliance classification [7]. Whenever a target appliance is turned on, a transient response can be detected in the aggregated active power waveform. Besides enabling appliance classification, this load signature presents two additional advantages. Firstly, for a given appliance, the turn-on transient response pattern is unique and relates only to the operational characteristics of the appliance [43]. Consequently, the identification algorithm's performance is independent of the simultaneous operation of other types of appliances, even when a large number of devices is considered [7,44]. Secondly, the proposed algorithm can successfully distinguish between various types of appliances, even those presenting similar steady-state consumption levels, since classification is based on the unique transient characteristics of each appliance instead of steady-state features.

For multi-state appliances, the same principle can be used to detect specific operational states by identifying the transient responses caused by state transitions. For example, for a washing machine or a dishwasher, the transient response of the water heating process can be used to identify this specific state, which is of primary interest as the most energy-intensive process of an operation cycle.

In order to associate a given transient response, *P*tr, with a specific target appliance behavior, a CNN classifier is utilized. In this sense, for each target appliance, a dedicated CNN classifier is used, identifying *P*tr as positive when related to the target appliance or negative otherwise.

Different types of appliances generate transient responses with distinct characteristics, especially when a high sampling frequency, e.g., 100 Hz, is used. If a user were initially shown an example of such a response for a specific appliance, he/she could later recognize a new response of the same appliance by simple visual inspection. However, implementing such a recognition algorithm is not an easy task.

Inspired by the area of computer vision, where CNN models are used for image recognition and classification [45], a similar approach has been adopted in this paper. Convolutional layers can automatically extract useful features from the input data without user supervision [45]. Thus, there is no need to implement specific algorithms; instead, by training a CNN model, the classification problem can be successfully solved. A block diagram of the proposed CNN architecture is depicted in Figure 2.

Initially, min-max normalization is applied to *P*tr by means of (3); the resulting normalized vector, *P*norm, is forwarded as input to the CNN model.

$$P\_{\text{norm}} = \frac{P\_{\text{tr}} - \min(P\_{\text{tr}})}{\max(P\_{\text{tr}}) - \min(P\_{\text{tr}})} \tag{3}$$
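As an illustration, the normalization in (3) can be sketched in NumPy (the function name is ours, not the paper's):

```python
import numpy as np

def min_max_normalize(p_tr: np.ndarray) -> np.ndarray:
    """Min-max normalization of a transient response, per (3)."""
    p_min, p_max = p_tr.min(), p_tr.max()
    return (p_tr - p_min) / (p_max - p_min)
```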

**Figure 2.** Convolutional neural network (CNN) block diagram.

Since the CNN input is a one-dimensional signal, one-dimensional convolutional layers are used to extract useful features from *P*norm. In particular, three consecutive 1-d convolutional layers are used, each combined with a 1-d max-pooling layer. All convolutional layers have been set to 32 filters, a kernel size of 3, strides of 1, 'same' padding, and the rectified linear unit (ReLU) activation function. The ReLU function is defined as

$$\text{ReLU}(x) = \max(x, 0) \tag{4}$$

for *x* ∈ R. For the max-pooling layers, the pool size was set to 2. Generally, at each 1-d convolutional layer, a number of filters is applied to the corresponding input, **x**conv. Assuming that **x**conv is of size *M*conv × *N*conv and a single filter, **f**, is of size 3 × *N*conv, the output of the convolution between **x**conv and **f** is an *M*conv × 1 vector. The resulting **y**conv is calculated as

$$\mathbf{y}\_{\text{conv}}(m) = \max \left( \sum\_{t=1}^{3} \sum\_{n=1}^{N\_{\text{conv}}} \mathbf{x}\_{\text{conv}}(m+t-2, n) \, \mathbf{f}(t, n), \, 0 \right) \tag{5}$$

for *m* ∈ [1, ..., *M*conv], where **x**conv(0, *n*) and **x**conv(*M*conv + 1, *n*) are considered zero for any *n* ∈ [1, ..., *N*conv] as a result of zero-padding. In our case, where 32 filters are used in each convolutional layer, the resulting vectors **y**conv are stacked as columns, forming an *M*conv × 32 matrix.
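The convolution in (5) can be sketched in NumPy as follows (a single-filter illustration with hypothetical names; a real layer would vectorize over all 32 filters):

```python
import numpy as np

def conv1d_relu(x_conv: np.ndarray, f: np.ndarray) -> np.ndarray:
    """Single-filter 1-d convolution with 'same' zero-padding and ReLU, per (5).

    x_conv: (M_conv, N_conv) input; f: (3, N_conv) filter.
    Returns an (M_conv,) vector.
    """
    m_conv, n_conv = x_conv.shape
    # zero-pad one row above and below (rows 0 and M_conv + 1 in the paper's indexing)
    padded = np.vstack([np.zeros((1, n_conv)), x_conv, np.zeros((1, n_conv))])
    y = np.empty(m_conv)
    for m in range(m_conv):
        # window of 3 consecutive rows, element-wise product with the filter, then sum
        y[m] = np.sum(padded[m:m + 3] * f)
    return np.maximum(y, 0.0)  # ReLU
```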

Each convolutional layer is followed by a max-pooling layer that down-samples the extracted features of the input signal. In this sense, a summarized version of the extracted features (of half the size) is created, maintaining the most important ones, and is further used as input to the next layer. Assuming **x**pool, of size *M*pool × *N*pool, is the max-pooling layer input matrix, the output, **y**pool, has a size of (*M*pool/2) × *N*pool and is calculated as

$$\mathbf{y}\_{\text{pool}}(m,n) = \max\left(\mathbf{x}\_{\text{pool}}(2m-1,n), \,\mathbf{x}\_{\text{pool}}(2m,n)\right) \tag{6}$$

for *m* ∈ [1, ..., *M*pool/2] and *n* ∈ [1, ..., *N*pool].
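The pooling operation in (6) admits a short NumPy sketch (assuming *M*pool is even; the function name is ours):

```python
import numpy as np

def max_pool1d(x_pool: np.ndarray) -> np.ndarray:
    """1-d max-pooling with pool size 2, per (6); halves the number of rows."""
    m_pool, n_pool = x_pool.shape
    # group rows into non-overlapping pairs and keep the per-column maximum
    return x_pool.reshape(m_pool // 2, 2, n_pool).max(axis=1)
```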

Following the three convolutional/pooling pairs, a flattening layer is applied, transforming its input into a single vector by column-wise stacking. Finally, two dense layers with 20 and 1 output nodes, respectively, are used. The first dense layer uses the ReLU activation function; the last layer uses the sigmoid activation function, defined in (7) for *x* ∈ R, to compute the probability that the transient response corresponds to the positive class.

$$S(x) = \frac{1}{1 + e^{-x}} \tag{7}$$

Generally speaking, a dense layer with *M*dense input nodes and *K* output nodes includes two trainable parameters, i.e., a weight matrix, **w**, of size *M*dense × *K*, and a bias vector, **b**, of size *K*. Given an input vector, **x**dense, with *M*dense elements, the output **y**dense of size *K* is calculated as

$$\mathbf{y}\_{\text{dense}}(k) = F\left(\sum\_{m=1}^{M\_{\text{dense}}} \mathbf{x}\_{\text{dense}}(m) \, \mathbf{w}(m,k) + \mathbf{b}(k)\right) \tag{8}$$

for *k* ∈ [1, ..., *K*], where *F* is the corresponding activation function. Before each dense layer, a dropout layer [46] with a rate of 0.2 is used to prevent model over-fitting.
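For illustration, (7) and (8) can be sketched in NumPy (hypothetical helper names; a framework such as Keras would provide these layers out of the box):

```python
import numpy as np

def sigmoid(x):
    """Sigmoid activation, per (7)."""
    return 1.0 / (1.0 + np.exp(-x))

def dense(x_dense: np.ndarray, w: np.ndarray, b: np.ndarray, activation) -> np.ndarray:
    """Fully connected layer, per (8): y(k) = F(sum_m x(m) w(m, k) + b(k))."""
    return activation(x_dense @ w + b)
```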

A standard backpropagation algorithm is used during training to optimize the binary cross-entropy loss between the predicted probabilities and the actual labels. Assuming that the predicted probabilities are *p*1, *p*2, ..., *pB* for *B* samples and the actual labels are *q*1, *q*2, ..., *qB*, the binary cross-entropy loss is

$$L = -\frac{1}{B} \sum\_{b=1}^{B} [q\_b \log\_2 p\_b + (1 - q\_b) \log\_2 (1 - p\_b)].\tag{9}$$
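A NumPy sketch of the loss in (9), keeping the base-2 logarithm as written (the function name is ours):

```python
import numpy as np

def bce_loss(p: np.ndarray, q: np.ndarray) -> float:
    """Binary cross-entropy over B samples, per (9).

    p: predicted probabilities in (0, 1); q: actual labels in {0, 1}.
    """
    return float(-np.mean(q * np.log2(p) + (1 - q) * np.log2(1 - p)))
```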

The CNN classifier is trained for a maximum of 50 iterations. The Adamax optimizer [47] was selected, assuming an initial learning rate of 0.01 and a batch size of 32. In order to avoid over-fitting, early stopping with patience is used: the training process stops once the validation accuracy does not improve for five consecutive iterations.

#### *2.3. Consumed Energy Estimation Algorithm*

The last module is related to the real-power estimation of the target appliance. The implemented algorithm considers appliance end-uses as pulses of constant power; this approximation is well-suited for single-state appliances such as a microwave oven, kettle, or toaster. For appliances with operating cycles comprising multiple pulses, the algorithm considers each pulse as a new appliance end-use and not as a single end-use event of several pulses; examples are an oven turning on and off under thermostat control and a dishwasher, where several water heating pulses may occur depending on the selected program. In this sense, the proposed algorithm's performance may degrade for multi-state appliances, which are characterized by varying power consumption and cannot be approximated by a constant-power pulse. However, such appliances present a predominant energy-intensive process during a full operating cycle, while the remaining operating states are less critical in terms of total energy consumption. For example, washing machine and dishwasher cycles include energy-intensive water heating processes and low energy-consuming processes, e.g., water pumping. Therefore, for multi-state appliances, the proposed power estimation algorithm focuses on estimating the energy-intensive processes, neglecting the effect of the minor-consuming ones.

When the CNN classifies a transient response as positive, the appliance is implied to have been turned on. The calculated power increase, *P*init, is considered equal to the appliance power consumption and is assumed constant during the appliance's total time of operation. When a power decrease between two consecutive seconds of *P*d falls inside the interval [0.8 *P*init, 1.2 *P*init], the appliance is considered to be turned off. The pseudo-code of the energy consumption estimation algorithm is shown in Algorithm 2, having as inputs the time (in seconds), *t*, when the target appliance is turned on, and *P*d.
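The turn-off detection and constant-power energy estimate described above can be sketched as follows (a simplified illustration under our own naming, not the paper's Algorithm 2 verbatim):

```python
import numpy as np

def estimate_energy(p_d: np.ndarray, t_on: int, p_init: float):
    """Sketch of the constant-power energy estimate.

    Scans P_d (1 Hz aggregate power samples) from the turn-on time for a
    one-second power drop within [0.8 * P_init, 1.2 * P_init]; the appliance
    is assumed to draw P_init watts over the whole interval.
    Returns (t_off, energy in joules), or (None, None) if no turn-off is found.
    """
    for t in range(t_on + 1, len(p_d)):
        drop = p_d[t - 1] - p_d[t]
        if 0.8 * p_init <= drop <= 1.2 * p_init:
            return t, p_init * (t - t_on)
    return None, None
```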


## **3. Evaluation Methodology**

#### *3.1. Dataset*

The proposed NILM system is based on the fact that each household appliance presents a transient response pattern with distinct characteristics, becoming more noticeable as the sampling frequency increases. In this paper, the selected sampling frequency is 100 Hz; at this frequency the transient characteristics are captured, in contrast to lower sampling rates where such information may be lost. In Figure 3, turn-on transient responses at 100 Hz and 1 Hz for five appliances are depicted. It is evident that the frequency of 100 Hz reveals unique details that are lost when sampling at 1 Hz. More specifically, Figure 3a presents the turn-on response of a high-power consumption (~1.2 kW) fridge compressor with a duration of less than two seconds. Figure 3b visualizes the water heating process of a washing machine, which corresponds to a steep power step-up. Next, Figure 3c illustrates the transient response of a microwave oven as a high-power spike followed by a smooth power increase. In Figure 3d, a stove turn-on presenting a smooth and convex power increase is shown, and finally, Figure 3e visualizes the transient response of a heat pump dryer, including a high-power spike at motor starting time.

An extensive set of transient responses for each target appliance is required to train the CNN classifier. For this purpose, a private dataset is used that includes transient responses of different household appliances, sampled at 100 Hz from different installations. The type of appliance and the number of samples for each case are summarized in Table 1. Note that the duration of the transient responses contained in the dataset ranges from 12 s to 1 min.

**Figure 3.** Comparison of the turn-on transient response with sampling frequency at 100 Hz and 1 Hz for (**a**) fridge, (**b**) washing machine, (**c**) microwave oven, (**d**) stove and (**e**) heat pump dryer.


**Table 1.** Private dataset: Number of transient responses for the appliances of interest.

In this study, three appliances are selected to test the proposed methodology's performance, i.e., fridge, washing machine, and microwave oven. The end-uses of these appliances can be approximated by pulses without significant error in power estimation. Furthermore, such appliances are considered typical for most households, corresponding to substantial total energy consumption. The selected appliances represent a larger group of appliances, since both single-state and multi-state appliances are considered. Additionally, detailed results regarding the analysis of such appliances can be found in several relevant works [19,20,22,25–27]; thus, a comprehensive comparative analysis can be performed. Finally, low energy-consuming appliances such as game consoles and phone chargers have not been investigated, as they are of trivial importance and hard to identify in terms of NILM algorithm application [19].

For each target appliance, a binary classifier is implemented and trained. During training, the transient responses of the appliance under consideration are labeled positive; the responses corresponding to a different appliance are labeled negative. The positive and negative classes are balanced to prevent bias towards the class with the most samples; the number of negative responses is made equal to the number of positive ones. To avoid over-fitting, a training/validation/testing split with a ratio of 60%/20%/20% is applied to each class separately.
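The balancing and per-class 60%/20%/20% split can be sketched as follows (a hypothetical helper; the paper does not prescribe an implementation):

```python
import numpy as np

def balanced_split(pos, neg, seed=0):
    """Undersample the negative class to the positive class size, then split
    each class 60/20/20 into train/validation/test sets."""
    rng = np.random.default_rng(seed)
    neg = rng.permutation(neg)[:len(pos)]  # class balancing
    sets = {"train": [], "val": [], "test": []}
    for cls in (pos, neg):
        cls = rng.permutation(cls)
        n = len(cls)
        i, j = int(0.6 * n), int(0.8 * n)
        sets["train"].append(cls[:i])
        sets["val"].append(cls[i:j])
        sets["test"].append(cls[j:])
    return {k: np.concatenate(v) for k, v in sets.items()}
```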

However, because the number of samples per appliance is small, augmentation techniques are used. These techniques increase both the number and the diversity of the training samples by artificially introducing variations into existing transient responses. Specifically, for each transient response, 15 samples with the required length of 6 s are created. Assuming that the time-series containing a response is *z*, each of the 15 samples is generated by means of the following steps:


The number of samples in the training, validation and testing sets per appliance is shown in Table 2.


**Table 2.** Number of positive samples per set.

#### *3.2. Performance Metrics*

The proposed methodology is evaluated in terms of the event detection algorithm, the CNN classifiers as well as the overall system performance. For each case, different metrics are used.

#### 3.2.1. Metrics for Event Detection Evaluation

For the event detection algorithm the true positive rate (*TPR* = *TP*/(*TP* + *FN*)), the false positive rate (*FPR* = *FP*/(*FP* + *TN*)) and false negative rate (*FNR* = *FN*/(*TP* + *FN*)) are calculated; *TP*, *FN*, *FP* and *TN* are the number of true positives, false negatives, false positives and true negatives, respectively. Here, a sample (a time instant) is positive if it is an actual event (there is an appliance turning-on or off) and negative if not.
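These rates follow directly from the confusion counts; a trivial helper for illustration (the function name is ours):

```python
def detection_rates(tp: int, fn: int, fp: int, tn: int):
    """True positive, false positive and false negative rates for event detection."""
    tpr = tp / (tp + fn)
    fpr = fp / (fp + tn)
    fnr = fn / (tp + fn)
    return tpr, fpr, fnr
```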

#### 3.2.2. Metrics for Classifier Evaluation

To evaluate the classifier, the most common metrics used in classification and NILM problems are adopted [18,29,32,34,35]. Specifically, the accuracy, precision, recall and *F*1-score, defined in (10)–(13), respectively, are calculated

$$accuracy = \frac{TP + TN}{TP + TN + FP + FN} \tag{10}$$

$$precision = \frac{TP}{TP + FP} \tag{11}$$

$$recall = \frac{TP}{TP + FN} \tag{12}$$

$$F\_1 = 2 \cdot \frac{precision \cdot recall}{precision + recall}.\tag{13}$$

In this context, for a transient response classifier, a sample (i.e., transient response of 6 s) is positive if the transient response corresponds to the target appliance. Otherwise, it is assumed negative.
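A minimal sketch of (10)–(13) from the confusion counts (hypothetical function name):

```python
def classification_metrics(tp: int, tn: int, fp: int, fn: int):
    """Accuracy, precision, recall and F1-score, per (10)-(13)."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1
```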

#### 3.2.3. Metrics for Overall NILM System Evaluation

The overall proposed NILM system is tested by using the same metrics as previously, i.e., accuracy, precision, recall, and F1-score to evaluate the predicted status of the appliance (ON or OFF). Thus, a sample (i.e., a time instant) is considered positive if the appliance is ON and negative if not. It should be mentioned that an appliance is considered turned-on if the measured active power is higher than 5 W. Additionally, for energy estimation, the mean absolute error (*MAE*) and the root mean square error (*RMSE*) in (14) and (15), respectively, are computed

$$MAE = \frac{1}{N} \sum\_{n=1}^{N} |y[n] - \hat{y}[n]| \tag{14}$$

$$RMSE = \sqrt{\frac{1}{N} \sum\_{n=1}^{N} (y[n] - \hat{y}[n])^2} \tag{15}$$

where *y*[*n*] and *ŷ*[*n*] are the original and the estimated power responses with *N* samples. Moreover, the relative error in total energy (*RE*), defined in (16), is calculated

$$RE = \frac{|E - \hat{E}|}{\max(E, \hat{E})} \tag{16}$$

where *E* and *Ê* are the original and the estimated total energy consumption of the appliance, respectively.
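The error metrics in (14)–(16) can be sketched in NumPy (assuming 1 Hz samples so that summing power approximates energy; the function name is ours):

```python
import numpy as np

def energy_errors(y: np.ndarray, y_hat: np.ndarray):
    """MAE, RMSE and relative error in total energy, per (14)-(16).

    Assumes 1 Hz power samples, so summing the samples gives total energy.
    """
    mae = np.mean(np.abs(y - y_hat))
    rmse = np.sqrt(np.mean((y - y_hat) ** 2))
    e, e_hat = y.sum(), y_hat.sum()
    re = abs(e - e_hat) / max(e, e_hat)
    return mae, rmse, re
```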
