**2. Fundamentals**

Because we assume that the reader is familiar with both NILM and deep neural networks, the following sections only skim these subjects and refer the reader to the relevant literature.

#### *2.1. The Disaggregation Problem*

The aggregate active power $x_t^a$ of a set of appliances measured at time $t$ can be formally defined as:

$$\mathbf{x}_t^a = \sum_{m=1}^M \mathbf{y}_t^m + \underbrace{\sum_{k=1}^K w_t^k + \epsilon_t}_{=c_t} \tag{1}$$

where $y_t^m$ is the contribution of the individual appliance $m$ that was sub-metered at the time of data acquisition, and $M$ is the total number of such appliances. The sum over $k$ is the contribution of $K$ further appliances $w_t^k$ that were not sub-metered during the measurement campaign, and $\epsilon_t$ is a noise term originating from the measurement equipment. In the literature, the NILM problem is typically stated such that the noise term $\epsilon_t$ includes the sum over non-measured equipment. We explicitly separate the two contributions because their nature is quite different. We can assume that the measurement noise $\epsilon_t$ is well behaved, i.e., it approximately follows a Gaussian distribution and is small compared to the actual signal. In contrast, no such assumption can be made about the term $\sum_k w_t^k$: the contribution from non-sub-metered appliances typically amounts to a major part of $x_t^a$, and its power distribution is non-Gaussian. From the point of view of disaggregation, the sum over $m$ covers the appliances that are disaggregated, and the sum over $k$ covers all remaining appliances in the aggregate signal. If only a single appliance $y_t^m$ is disaggregated, then $M = 1$.
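To make the decomposition in Equation (1) concrete, the following minimal sketch assembles a synthetic aggregate signal from sub-metered appliances, non-sub-metered appliances, and Gaussian measurement noise. The function name `simulate_aggregate`, the toy load profiles, and the noise level are hypothetical and chosen only for illustration; they do not come from any dataset discussed in this review.

```python
import random

def simulate_aggregate(n_steps, metered, unmetered, noise_std=1.0, seed=0):
    """Build x_t^a = sum_m y_t^m + sum_k w_t^k + eps_t for t = 0..n_steps-1.

    metered, unmetered: lists of per-appliance power series (lists of floats, in W).
    Returns (aggregate, residual), where residual[t] = c_t = sum_k w_t^k + eps_t.
    """
    rng = random.Random(seed)
    aggregate, residual = [], []
    for t in range(n_steps):
        y_sum = sum(series[t] for series in metered)    # sub-metered appliances
        w_sum = sum(series[t] for series in unmetered)  # non-sub-metered appliances
        eps = rng.gauss(0.0, noise_std)                 # assumed Gaussian measurement noise
        c_t = w_sum + eps
        residual.append(c_t)
        aggregate.append(y_sum + c_t)
    return aggregate, residual

# Hypothetical toy profiles (W), one sample per time step.
fridge = [100.0 if (t // 5) % 2 == 0 else 0.0 for t in range(20)]
kettle = [2000.0 if 8 <= t < 11 else 0.0 for t in range(20)]
lighting = [60.0] * 20  # treated here as not sub-metered

x_a, c = simulate_aggregate(20, [fridge, kettle], [lighting], noise_std=1.0)
```

In this toy setting $M = 2$ and $K = 1$. Real non-sub-metered loads are far less regular than the constant `lighting` series, which is exactly why the term $c_t$ cannot be treated as simple noise.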

One goal of energy disaggregation is to determine the individual $y_t^m$ based only on the measurement of the aggregate signal. If machine learning, and deep learning in particular, is used to solve the problem, this leads to a regression problem. While many authors work with the active power component $x_t^a$ only, other information from the aggregate signal, such as apparent power, reactive power, or current, can also be used to solve the disaggregation challenge. In countries where the residential power supply is delivered on three phases, features from the aggregate signal may even be available on each of the three phases.
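One possible way to frame the regression problem, shown here purely as an illustrative assumption, is to pair a short window of the aggregate signal with the power of the target appliance at the centre of that window. The helper `make_regression_pairs` and the window length are hypothetical; the approaches covered in the literature differ in how inputs and targets are constructed.

```python
def make_regression_pairs(aggregate, appliance, window=5):
    """Pair each window of aggregate samples with the appliance power
    at the window's centre sample (one assumed input/target layout)."""
    if len(aggregate) != len(appliance):
        raise ValueError("aggregate and appliance series must have equal length")
    if window < 1 or window > len(aggregate):
        raise ValueError("window must be between 1 and the series length")
    half = window // 2
    pairs = []
    for start in range(len(aggregate) - window + 1):
        features = aggregate[start:start + window]  # model input: x_t^a over the window
        target = appliance[start + half]            # regression target: y_t^m at the centre
        pairs.append((features, target))
    return pairs

# Toy usage with made-up numbers.
toy_aggregate = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
toy_appliance = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0]
toy_pairs = make_regression_pairs(toy_aggregate, toy_appliance, window=3)
```

A regression model would then be trained to map each `features` window to its `target`. Additional inputs such as reactive power could be appended to the feature vector in the same way.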

A second, slightly less challenging goal of energy disaggregation is to find the on/off state $s_t^m$ of appliance $m$ at time $t$ from the aggregate signal. If machine learning is used, this leads to a binary classification problem. In this formulation, only the state of the appliance is output. Once the on and off states have been recognized, the run-time of an appliance can be calculated. Multiplying the run-time by the average power consumption of the appliance still yields an energy estimate. Such an estimate is more in line with use cases that require only the average consumption over a certain time period.
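The run-time-based energy estimate can be sketched as follows. The example assumes that binary state labels are obtained by thresholding a sub-metered power series, which is one simple labelling choice and not necessarily the one used in any particular study; in a deployed system the states would instead come from a classifier operating on the aggregate signal. The function names, the 10 W threshold, the sampling period, and the average power value are all hypothetical.

```python
def on_off_states(power, threshold_w=10.0):
    """Return s_t = 1 where the power exceeds the threshold, else 0."""
    return [1 if p > threshold_w else 0 for p in power]

def estimate_energy_wh(states, sample_period_s, avg_power_w):
    """Estimate energy (Wh) as run-time (h) times the average power while on (W)."""
    run_time_h = sum(states) * sample_period_s / 3600.0
    return run_time_h * avg_power_w

# Toy usage: a kettle sampled once per minute, on for three samples.
kettle_power = [0.0, 0.0, 2000.0, 2000.0, 2000.0, 0.0]
states = on_off_states(kettle_power)
energy_wh = estimate_energy_wh(states, sample_period_s=60, avg_power_w=2000.0)
# Three minutes at 2000 W corresponds to about 100 Wh.
```

The estimate is only as good as the assumed average power, so it suits appliances with a roughly constant draw better than appliances with strongly varying consumption.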

#### *2.2. Deep Neural Networks*

Deep neural networks are a vast subject, and the focus of this review is merely their application to NILM. In this text, we therefore refrain from giving an introduction to DNNs and refer the reader to the following books:


The references mentioned are the books we found useful in our work. The selection is of course a small subset of the many excellent resources available on the topic.
