### 3.2. Data Processing

Raw datasets can be employed in different ways for training and evaluating DNN-NILM approaches. Below, we review the relevant aspects.

#### 3.2.1. Training and Evaluation Scenarios

Training and evaluation of NILM algorithms can be done under different scenarios. Typical scenarios appearing in the literature are defined in the following.

OBSERVED VS. SYNTHETIC: In a *synthetic* scenario, the term ∑<sub>*k*</sub> *w<sup>k</sup><sub>t</sub>* in Equation (1) is set to zero. Corresponding data are typically created by summing up the power consumption of individual appliances; only the measurement noise *ε<sub>t</sub>* is therefore included in the noise term *e<sub>t</sub>*. In an *observed* scenario, the noise term *e<sub>t</sub>* also includes further appliances that have not been measured individually, i.e., ∑<sub>*k*</sub> *w<sup>k</sup><sub>t</sub>* ≠ 0. The *synthetic* scenario can be considered a laboratory setting for a basic assessment of algorithms. Data in a real scenario will typically be *observed*.
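The difference between the two scenarios can be sketched in a few lines of NumPy. This is a minimal illustration with hypothetical appliance traces, not data from any of the reviewed datasets; the variable names (`metered`, `unmetered`, `eps`) are our own labels for the terms of Equation (1).

```python
import numpy as np

rng = np.random.default_rng(0)
T = 8  # number of time steps in this toy example

# Hypothetical sub-metered appliance traces (in W), one row per appliance.
metered = np.vstack([
    rng.uniform(0, 150, T),   # e.g., a fridge
    rng.uniform(0, 2000, T),  # e.g., a kettle
])
eps = rng.normal(0, 5, T)            # measurement noise (the eps_t term)
unmetered = rng.uniform(0, 300, T)   # appliances without individual meters

# Synthetic scenario: the aggregate is the sum of the sub-metered
# appliances plus measurement noise only (the unmetered term is zero).
synthetic_aggregate = metered.sum(axis=0) + eps

# Observed scenario: the noise term additionally contains the
# consumption of appliances that were never measured individually.
observed_aggregate = metered.sum(axis=0) + eps + unmetered
```

The two aggregates differ exactly by the unmetered consumption, which is why the *observed* scenario is the harder and more realistic test case.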

We use the terms *observed* and *synthetic* scenario equivalently to *noised* and *denoised* scenario, respectively, as the latter are commonly used in the literature. We introduce this new nomenclature because we believe that (i) the original terms can mislead readers with less experience in the field, and (ii) the proposed terms express the essential difference between the two scenarios more precisely.

SEEN VS. UNSEEN VS. CROSS-DOMAIN TRANSFER: The terms seen and unseen are used in the context of the evaluation of NILM algorithms. In the *seen* case, an algorithm is evaluated on new data from households that it has already been trained on. The resulting score therefore gives an indication of how well the trained algorithm can detect a particular appliance. *Unseen* means that the algorithm is evaluated on data from a new household that was not available in the training data. This scenario tests the capability of algorithms to detect an appliance type [145]. Corresponding test results indicate the performance of a pre-trained model that is deployed on data from houses not seen during training. In the *cross-domain transfer learning* [91] scenario, the unseen house is taken from a different dataset. This scenario tests the transferability of the tested approach in an even more diverse setting than in the unseen case: the data could have been recorded with different electrical meters or could originate from a different country. To the best of our knowledge, this scenario has only been investigated in [39,42,63,71,91]. The different scenarios are illustrated in Figure 2. The column 'Evaluation Scenario' in Table 2 lists the scenarios employed for the reviewed references.
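The three evaluation splits can be made concrete with a short sketch. The house identifiers and dataset names below are purely hypothetical placeholders, not drawn from any dataset discussed in this review.

```python
# Hypothetical house IDs from two different datasets.
dataset_a = ["house_1", "house_2", "house_3", "house_4"]
dataset_b = ["house_5", "house_6"]  # a second, independent dataset

train_houses = dataset_a[:3]  # houses whose data are used for training

# Seen: evaluate on *new time windows* from the training houses
# (same households, data not used during training).
seen_test = train_houses

# Unseen: evaluate on a held-out house from the same dataset.
unseen_test = [h for h in dataset_a if h not in train_houses]

# Cross-domain transfer: evaluate on houses from a different dataset,
# possibly recorded with different meters or in a different country.
cross_domain_test = dataset_b

# Sanity checks: unseen and cross-domain houses never overlap with training.
assert not set(unseen_test) & set(train_houses)
assert not set(cross_domain_test) & set(dataset_a)
```

Note that in the *seen* case the household overlaps with training on purpose; only the time windows differ, which is what separates it from simple train-set evaluation.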

**Figure 2.** Different NILM evaluation scenarios: *seen*: the algorithm is evaluated on new data from a house that was already available during training; *unseen*: the algorithm is evaluated on data from a house not seen during training; *cross-domain transfer learning*: the algorithm is evaluated on data from a different dataset.
