3.2.4. Data Augmentation

A common strategy in deep learning to cope with scarce labeled data or underrepresented classes is data augmentation. It describes the process of transforming existing measured data or creating new synthetic data in order to obtain DNNs that generalize better. Recent overviews of data augmentation in the domains of computer vision and time series can be found in [148–150].

In the reviewed DNN-NILM literature, we see different data augmentation variants: First, some approaches train in a synthetic scenario, see Section 3.2.1. Only synthetic data consisting of the summed-up loads from appliance sub-meters are used to train and test the algorithms. Such publications are denoted with a '*dn*' in the column 'data augmentation' of Table 2. A second group of publications trains on measured aggregate data but adds synthetic data—also created by summing up sub-metered load curves—to increase the size of the training set. Four authors added individual activations from appliances to a measured aggregate [6,58,66,115]. Finally, some authors employed specialized strategies: The authors of [35] found that by adding varying offsets specifically to the on state of the fridge, they were able to greatly enhance the corresponding disaggregation performance. So-called 'background filtering' has been proposed by [69] to remove all windows in the aggregate load curve that contain the target appliance. Activations from the target appliance are then added randomly to the filtered aggregate to create synthetic data for training. The authors of [44] use data obtained from SMACH [151], a tool that generates synthetic data based on time-of-use surveys and real appliance signatures. They compare scenarios with different amounts of synthetic data and find good generalization performance for models trained only on synthetic data. We are not aware of any study that compares different data augmentation strategies.
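The two generic strategies above—summing sub-metered load curves into a synthetic aggregate, and injecting individual appliance activations into a measured aggregate at random positions—can be illustrated with a minimal sketch. The function names and the NumPy-based representation of load curves as 1-D arrays are our own assumptions for illustration, not code from any of the cited works:

```python
import numpy as np

def make_synthetic_aggregate(submeters):
    """Create a synthetic aggregate load curve by summing per-appliance
    sub-meter curves (the fully synthetic 'dn' scenario).

    submeters: array of shape (n_appliances, n_samples)
    """
    return np.sum(np.asarray(submeters), axis=0)

def augment_with_activations(aggregate, activation, n_copies, rng=None):
    """Add copies of a target-appliance activation to a (measured or
    filtered) aggregate at random positions, as in the activation-based
    augmentation and 'background filtering' strategies.

    aggregate: 1-D array of aggregate power samples
    activation: 1-D array holding one appliance activation (shorter than aggregate)
    """
    rng = rng or np.random.default_rng()
    out = np.asarray(aggregate, dtype=float).copy()
    act = np.asarray(activation, dtype=float)
    for _ in range(n_copies):
        # pick a random start index so the activation fits entirely
        start = rng.integers(0, len(out) - len(act) + 1)
        out[start:start + len(act)] += act
    return out
```

A training pipeline would typically mix such synthetic windows with measured ones; the ratio of synthetic to measured data is a design choice that, as noted above, has not been systematically compared across strategies.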
