3.2.2. Preprocessing

Before data can be used by the DNNs, the raw data are transformed. Below we discuss typical data transformation steps employed in the literature.

RESAMPLING, FORWARD-FILLING, AND CLIPPING: The sampling frequencies of the published datasets are given in Table 3. As datasets exhibit missing values due to measurement or transmission equipment failures as well as jitter in the timestamps, resampling is used to obtain evenly sampled data. While the range of sampling frequencies in the reviewed literature extends from 1/3600 Hz [92,107] to 10 Hz [75], the large majority of the reviewed works employ either 1/60 Hz or values between 1 and 1/10 Hz. It is noteworthy that in two cases, data were upsampled to a higher frequency than that of the original dataset [36,112]. Results on the influence of the sampling frequency on disaggregation performance are presented in several studies [51,58,75,77]. Most of these studies find a marked dependence on
the device [51,58,75]. This can be attributed to certain devices exhibiting more frequent fluctuations that get lost at lower resolution. Ref. [75] analyzes sampling rates from 10 Hz down to 0.03 Hz for on/off classification and energy estimation for a TV, a washing machine, and a rice cooker. They find that to prevent performance loss for the classification and regression tasks, the sampling rates should be at least 1 Hz and 3 Hz, respectively. Ref. [58] compares results obtained with 10 s and 1 min sampling intervals. The authors find "that the performance for dishwashers remains comparable while the performance for washing machine and washer dryer deteriorates dramatically". The publication [51] focuses exclusively on the influence of the sampling rate on performance; the authors conclude that data sampled at 1/30 Hz might be sufficient to run NILM at high accuracy. It is important to note that [51], contrary to [58,75], fixed the number of inputs to the DNNs instead of the temporal window. Consequently, the temporal window seen by the network in this study differs depending on the sampling rate. Finally, [77] investigates the influence of the sampling rate on appliance on-event detection.
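
As an illustration of the resampling step, the sketch below maps irregularly sampled mains readings onto an evenly spaced grid with pandas. The readings, the series name, and the 10 s target interval are illustrative assumptions, not values taken from a specific study.

```python
import pandas as pd

# Hypothetical mains readings (W) with timestamp jitter and a gap;
# values and timestamps are purely illustrative.
timestamps = pd.to_datetime([
    "2015-01-01 00:00:01", "2015-01-01 00:00:09", "2015-01-01 00:00:22",
    "2015-01-01 00:00:31", "2015-01-01 00:01:05", "2015-01-01 00:01:12",
])
raw = pd.Series([120.0, 118.5, 2130.0, 2125.0, 119.0, 118.0],
                index=timestamps, name="power")

# Resample onto an evenly spaced 10 s grid: readings falling into the same
# interval are averaged, intervals without readings become NaN.
resampled = raw.resample("10s").mean()
print(resampled)
```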

Short spans of missing data, attributed to WiFi connectivity problems, are forward-filled by many authors with the last available measurement. Typically, up to three minutes of missing data are filled in this manner [14]. Measurements exceeding the rating of the employed meter are clipped.
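
A minimal sketch of the gap-filling and clipping steps on an evenly resampled series: the three-minute fill limit follows the typical value reported above [14], while the 4 kW meter rating is an assumed placeholder.

```python
import numpy as np
import pandas as pd

# Evenly resampled mains power (W) with a short gap; values are illustrative.
index = pd.date_range("2015-01-01", periods=8, freq="10s")
power = pd.Series([120.0, np.nan, np.nan, 118.0, 2130.0, 9999.0, 119.0, 118.0],
                  index=index)

# Forward-fill gaps of up to three minutes (18 samples at a 10 s interval)
# with the last available measurement; longer gaps remain NaN.
power = power.ffill(limit=18)

# Clip readings that exceed the rating of the meter (4 kW assumed here).
power = power.clip(upper=4000.0)
```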

NORMALIZATION: In the DNN-NILM literature, the input normalization for the DNNs comes in two main flavors:

$$\mathbf{x}\_{stdScal} = \frac{\mathbf{x} - \bar{\mathbf{x}}}{\sigma(\mathbf{x})} \tag{2}$$

$$\mathbf{x}\_{minmaxScal} = \frac{\mathbf{x} - \mathbf{x}\_{min}}{\mathbf{x}\_{max} - \mathbf{x}\_{min}} \tag{3}$$

where *x* is the input window (see Section 3.3.1) before normalization and *xstdScal* and *xminmaxScal* are the normalized windows. *x̄* corresponds to a mean value over the input. Different strategies have been employed: most approaches calculate the mean over the complete training set so that the training data are centered. Other strategies center the data per house (see, e.g., [75]) or per input window (see, e.g., [14,107]). *σ*(*x*) denotes the standard deviation, which is typically calculated on the complete training set. Alternatively, each input window was divided by the standard deviation of a random subset of the training data [14]. *xmax* and *xmin* correspond to the maximal and minimal values. These can be the maximal and minimal values of the training dataset, parameters fixed by the authors [53], or quantile values [40]. In order to make the statistics of the data less sensitive to outliers, [44] transformed them with an *arcsinh* before normalizing. Some authors also normalized the target values for the training of the DNNs. While some publications mention that different normalization strategies were tried out, only two studies report on the influence of the normalization strategy on training efficiency and testing performance: [34] finds that instance normalization [146] performs better than batch normalization [147], and [40] concludes that L2-normalization works best.
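To make Equations (2) and (3) concrete, the sketch below applies both normalizations with statistics computed over the complete training set, which is the most common choice described above; per-window centering is indicated as a commented alternative. The array shapes and values are illustrative assumptions.

```python
import numpy as np

def standard_scale(windows, mean, std):
    """Equation (2): subtract a mean and divide by a standard deviation."""
    return (windows - mean) / std

def minmax_scale(windows, x_min, x_max):
    """Equation (3): map values onto [0, 1] via fixed minimum and maximum."""
    return (windows - x_min) / (x_max - x_min)

# Hypothetical training set of mains power (W) and input windows drawn from it.
rng = np.random.default_rng(0)
train_power = rng.uniform(0.0, 3000.0, size=100_000)
windows = train_power[:1000].reshape(10, 100)  # 10 windows of 100 samples each

# Most common strategy: mean and standard deviation over the whole training set.
x_std = standard_scale(windows, train_power.mean(), train_power.std())

# Per-window centering (e.g., [14,107]) would instead use
# windows.mean(axis=1, keepdims=True) as the mean.

# Min-max scaling with bounds taken from the training data
# (alternatively fixed by the authors or set to quantile values).
x_minmax = minmax_scale(windows, train_power.min(), train_power.max())
```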
