*2.4. Extracting Events*

Evaluating event-based NILM algorithms requires ground truth data for all events in the dataset. The authors of the UK-DALE [11] dataset therefore recorded appliance turn on/off events for house 1 using switchable sockets. When the user presses the button on such a switchable socket, the current timestamp, the device, and the state of the socket are logged. We see three particular drawbacks of this approach: (1) Devices that are hardwired to the mains, like the stove or lighting, cannot be equipped with such a socket. (2) Only on/off events can be logged, although most household appliances are multi-state devices that have more than just a binary state *on* or *off*. (3) Devices that change their state without user interaction cannot be labeled (e.g., a kettle turns off automatically once the water is boiling).

We build on the idea of a post-conducted labeling approach introduced by Pereira et al. in [21] and developed a semi-automatic labeling algorithm that consists of three steps: event detection, unique event identification, and high variance filtering.

#### 2.4.1. Event Detection

The event detector uses the Log-Likelihood Ratio (LLR) test introduced by Pereira et al. in [21], which was enhanced with adaptive thresholding by Völker et al. in [23]. The detector calculates the likelihood (*L*(*i*)) that an event has happened at sample *i* by using a detection window over the power signal (*S*(*i*)). The detection window is split into two sub-windows, the *pre-event* window [*a*, *i*[ and the *post-event* window [*i*, *b*]. *L*(*i*) is calculated as

$$L(i) = \ln\left(\frac{\sigma\_{[a,i[}}{\sigma\_{[i,b]}}\right) + \frac{\left(S(i) - \mu\_{[a,i[}\right)^2}{2 \cdot \sigma\_{[a,i[}^2} - \frac{\left(S(i) - \mu\_{[i,b]}\right)^2}{2 \cdot \sigma\_{[i,b]}^2},\tag{1}$$

where *σ*<sub>[*a*,*i*[</sub>, *σ*<sub>[*i*,*b*]</sub>, *μ*<sub>[*a*,*i*[</sub>, and *μ*<sub>[*i*,*b*]</sub> are the standard deviations and means of the pre-event and post-event windows, respectively. This signal is cleaned using an adaptive threshold (*thres*<sub>*i*</sub>). If the absolute change of the mean value between the pre-event and post-event windows does not exceed this threshold, *L*(*i*) is forced to zero using

$$L(i) = \begin{cases} L(i), & \text{if } \left| \mu\_{[a,i[} - \mu\_{[i,b]} \right| > \text{thres}\_i \\ 0, & \text{otherwise} \end{cases} \tag{2}$$

*thres*<sub>*i*</sub> is defined as

$$\text{thres}\_i = \text{thres}\_{\text{min}} + m \cdot \mu\_{[a,i[},\tag{3}$$

with *thres*<sub>*min*</sub> being the minimum power change of interest and *m* a linear coefficient.

This coefficient causes *thres*<sub>*i*</sub> to increase linearly with the power currently drawn (the mean power of the pre-event window). Typically, the variance in the power signal is proportional to the amount of power drawn. This effect is caused by increasing noise in the appliance or in the analog frontend of the electricity meter. If a small fixed threshold is set, a large number of false events may occur in regions where more power is drawn. If a high fixed threshold is set, low power events may be missed. Pereira et al. used a relatively large threshold of 30 W [28], which does not allow for detecting state changes of low power devices such as battery chargers or lights. We use the linearly increasing threshold because it adapts to larger fluctuations at higher power levels, which reduces both false events and missed events.
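Equations (1) to (3) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `llr_signal`, the parameters `pre` and `post` (window lengths in samples rather than seconds), and the `eps` guard against a zero standard deviation on perfectly flat windows are assumptions made here.

```python
import math
from statistics import fmean, pstdev


def llr_signal(s, pre, post, thres_min=3.0, m=0.005, eps=1e-3):
    """Return L(i) following Eqs. (1)-(3) for a power signal s (watts).

    pre and post are window lengths in samples (hypothetical names).
    """
    L = [0.0] * len(s)
    for i in range(pre, len(s) - post + 1):
        before = s[i - pre:i]     # pre-event window [a, i[
        after = s[i:i + post]     # post-event window [i, b]
        mu_a, mu_b = fmean(before), fmean(after)
        # Eq. (3): the threshold grows with the pre-event power level.
        thres_i = thres_min + m * mu_a
        if abs(mu_a - mu_b) <= thres_i:
            continue              # Eq. (2): L(i) stays zero
        # Assumption: clamp the deviations so flat windows do not divide by zero.
        sd_a = max(pstdev(before), eps)
        sd_b = max(pstdev(after), eps)
        # Eq. (1): log-likelihood ratio at sample i.
        L[i] = (math.log(sd_a / sd_b)
                + (s[i] - mu_a) ** 2 / (2 * sd_a ** 2)
                - (s[i] - mu_b) ** 2 / (2 * sd_b ** 2))
    return L
```

For a signal that steps from roughly 10 W to roughly 110 W, the result is zero in the flat regions and has its largest magnitude at the sample where the step occurs.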

If an event is detected at sample *i*, the likelihood will also be non-zero around that sample, because a mean change is still observable close to the event, depending on the pre-event and post-event window sizes. The exact sample at which the event occurred is identified using a *voting window*. This window is applied to the signal *L*. Inside the window, only the maximum of the absolute value of *L* is kept. We further restrict the minimum distance between two events with an additional parameter *l*.
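The voting step can be sketched as follows. The function name `pick_events` and its parameters (`vote` and `min_dist`, both in samples) are hypothetical, and the text does not state which of two events closer than *l* is retained; the sketch assumes that the one with the larger |*L*| wins.

```python
def pick_events(L, vote, min_dist):
    """Return the sorted sample indices of detected events.

    vote and min_dist are lengths in samples (hypothetical names).
    """
    half = vote // 2
    candidates = []
    for i, value in enumerate(L):
        if value == 0.0:
            continue
        window = L[max(0, i - half):i + half + 1]
        # Keep i only if it holds the largest |L| inside its voting window.
        if abs(value) >= max(abs(v) for v in window):
            candidates.append(i)
    # Assumption: when two candidates are closer than min_dist,
    # the stronger one is kept and the weaker one is dropped.
    events = []
    for i in sorted(candidates, key=lambda k: -abs(L[k])):
        if all(abs(i - j) >= min_dist for j in events):
            events.append(i)
    return sorted(events)
```

With this rule, the side lobes that surround a strong peak in *L* are discarded by the voting window, and the minimum distance only matters for separate peaks that lie close together.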

This algorithm has six adjustable parameters: the durations of the pre-event, post-event, and voting windows, the minimum detection threshold *thres*<sub>*min*</sub>, the linear coefficient *m*, and the minimum distance between two events *l*. A user should specifically adjust the parameters *thres*<sub>*min*</sub> and *l* according to prior knowledge of the data: a low threshold *thres*<sub>*min*</sub> is required if events with small mean changes are expected, and a short *l* should be chosen if events can happen close in time. Values that worked well across different devices in our experience are: pre-event window = 1 s, post-event window = 1.5 s, voting window = 2 s, *thres*<sub>*min*</sub> = 3 W, *m* = 0.005, and *l* = 1 s.
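As a worked example of Equation (3) with *thres*<sub>*min*</sub> = 3 W and *m* = 0.005, consider two pre-event power levels, 20 W and 2000 W, which are chosen here for illustration only:

$$\text{thres}\_i = 3\,\text{W} + 0.005 \cdot 20\,\text{W} = 3.1\,\text{W}, \qquad \text{thres}\_i = 3\,\text{W} + 0.005 \cdot 2000\,\text{W} = 13\,\text{W}.$$

A mean change of 5 W would therefore pass the threshold at the low power level but be suppressed at the high one.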

#### 2.4.2. Unique Event Identification

To further reduce the labeling effort, we try to identify similar events of an appliance so that they can be labeled consistently. We exploit the fact that most home appliances draw different but constant power before and after an event (e.g., a kettle after being switched on), and that these power levels represent constant states of the appliance (e.g., *off* and *on* for the kettle). Depending on its complexity, an appliance can easily have more than ten unique states (e.g., a dishwasher).

The data is split at each event and the mean power demand of each resulting segment is calculated. Unique mean values (representing unique appliance states) are then identified using hierarchical clustering with a distance threshold determined by *thres*<sub>*min*</sub>. Each cluster is given a textual ID which is used to assign labels to each event (*S*0, *S*1, ..., as shown in Figure 11). As some appliances show a higher inrush power followed by a settling phase due to moving parts in the appliance, we discard the highest and lowest 10 % of the values before calculating the mean value.
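A minimal sketch of this step is given below. The text does not name the linkage criterion, so the sketch assumes single linkage, which for one-dimensional data reduces to sorting the segment means and starting a new cluster wherever the gap between neighbours exceeds the threshold. The function names, the direct use of *thres*<sub>*min*</sub> as the distance threshold, and the numbering of states by ascending power are likewise assumptions.

```python
from statistics import fmean


def trimmed_mean(values, trim=0.10):
    """Mean after discarding the lowest and highest `trim` share of samples."""
    ordered = sorted(values)
    k = int(len(ordered) * trim)
    kept = ordered[k:len(ordered) - k] or ordered
    return fmean(kept)


def label_states(s, events, thres_min=3.0):
    """Return a state ID (S0, S1, ...) and the mean for each segment of s."""
    bounds = [0, *events, len(s)]
    means = [trimmed_mean(s[a:b]) for a, b in zip(bounds, bounds[1:]) if b > a]
    # Single linkage in 1-D: walk the sorted means and open a new
    # cluster whenever the gap to the previous mean exceeds thres_min.
    order = sorted(range(len(means)), key=means.__getitem__)
    cluster_of = [0] * len(means)
    cluster = 0
    for prev, cur in zip(order, order[1:]):
        if means[cur] - means[prev] > thres_min:
            cluster += 1
        cluster_of[cur] = cluster
    return [f"S{c}" for c in cluster_of], means
```

Here the event at `events[k]` separates segment `k` from segment `k + 1`, so the event label can be taken from the state of the segment that follows it.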

#### 2.4.3. High Variance Filtering

Appliances such as PCs or televisions draw variable power depending on the current context (e.g., the computations performed by the PC or the content shown on the television). This causes a large number of false events when using the LLR test. To filter these false events, we first identify regions in the signal that show such high variance and afterwards remove all events found in those regions. To this end, we calculate the mean (*μ*(*i*)) and variance (*σ*(*i*)) of a sliding window. If *σ*(*i*) is larger than *n* · *μ*(*i*), the window is marked. If the length of consecutively marked windows exceeds a certain length (*w*), all events in these windows are removed. The parameters *n* and *w* can be adjusted. Values which show good results were found empirically as *w* = 4 s and *n* = 0.005.
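The filter can be sketched as follows. Several details are assumptions: the sliding-window length `win` is not given in the text, all lengths are expressed in samples rather than seconds, and *σ*(*i*) is computed as the standard deviation (the usual meaning of *σ*); `statistics.pvariance` can be substituted if the variance is intended.

```python
from statistics import fmean, pstdev


def filter_high_variance(s, events, win, w, n=0.005):
    """Drop events that fall into long, noisy regions of s.

    win: sliding-window length in samples (hypothetical, not in the text)
    w:   run length of marked windows, in samples, that must be exceeded
    """
    # marked[i] is True when the window starting at i is noisy
    # relative to its own mean power.
    marked = []
    for i in range(len(s) - win + 1):
        chunk = s[i:i + win]
        marked.append(pstdev(chunk) > n * fmean(chunk))
    noisy = set()
    start = None
    for i, flag in enumerate(marked + [False]):  # sentinel closes the last run
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start > w:
                # All samples covered by this run of marked windows.
                noisy.update(range(start, i - 1 + win))
            start = None
    return [e for e in events if e not in noisy]
```

A single clean step marks at most `win - 1` consecutive windows, so choosing `w` larger than that keeps genuine events while events inside long fluctuating regions are removed.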

By using the event extraction algorithm, an appliance power signal can be pre-labeled. Each detected event is marked and a unique label is assigned. The extensive task of labeling can thereby be reduced to supervision and inspection: remaining falsely detected events (false positives, FP) need to be removed, events that were not found (false negatives, FN) need to be added, and each unique state label should be changed to a meaningful label representing the state of the appliance.
