# 4.5. Applied DNN-NILM

The best results of current DNN-NILM approaches are very promising, see Section 4.1. However, several aspects relevant to an actual deployment of DNN-NILM approaches have not yet been well investigated in the literature. In the following subsections, we motivate these aspects and subsequently point out the connected research gaps.

## 4.5.1. Data Scarcity

For a practical application of NILM, we see one of the main challenges in the scarcity of labeled data. While this challenge is not specific to DNN approaches, we think that recent developments in semi-supervised deep learning might be adaptable to NILM and could then be a great opportunity to tackle the problem of data scarcity for practical applications. In the following, we will first detail the challenge and subsequently formulate possible future research directions.

Net2Grid is a company providing NILM services to utilities. In a presentation, they stressed that "accurate NILM requires [...] a lot of high-quality data" [174]. Specifically, the company bases its NILM service on data from *hundreds* of houses. They also emphasized that machines with different programs or settings exhibit highly variable load patterns and therefore require many observed cycles. The authors of [75] investigate how the disaggregation error of a DNN-NILM approach depends on the number of distinct houses used for its training. In agreement with Net2Grid, they find that for the washing machine, an appliance with a variety of programs, the disaggregation error steadily decreases with each house added to the training dataset, without any sign of saturation up to 40 houses, the maximum used in the investigation. Thus, both sources indicate that complex machines require a large variability in the training data to generalize successfully to unseen data. This observation is at the core of what we call 'data scarcity'.

These findings indicate that a company that wants to start a NILM service first has to obtain data from hundreds of houses. A possible source is public NILM datasets that include aggregate and appliance consumption, see, e.g., [18,127] for an overview. While for some simple appliances these public datasets are certainly sufficient, the data are too restricted if we want to disaggregate appliances with variable load patterns [75,174]. Furthermore, appliances such as heat pumps or charging stations for electric vehicles are almost absent from public datasets. The only alternative is to engage in a measurement campaign involving *hundreds* of houses. This is an expensive and time-consuming undertaking, as the metering, appliance-specific sub-metering, and the corresponding infrastructure have to be installed and maintained. It is also worth noting that, even if large datasets covering a large variety of appliances were recorded, new devices come to the market continuously. This means that the effort for data collection is actually a recurrent one.

As DNNs are particularly data hungry, the shortage of labeled data has recently received a lot of attention from the research communities in the computer vision and natural language processing (NLP) domains. A promising remedy is semi-supervised deep learning [175]. We see a lot of potential in transferring these developments to the NILM domain, as unlabeled data—the data obtained from the smart meter—are relatively easy to access compared to sub-metered ground truth data.
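To make the idea concrete, one classic semi-supervised scheme is self-training (pseudo-labeling): a model trained on the few labeled examples labels the abundant unlabeled smart-meter data, and its confident predictions are fed back into training. The following is a minimal sketch under strong simplifying assumptions: the one-dimensional threshold 'model', the confidence margin, and all numbers are illustrative stand-ins for a real DNN and real measurements, and the task is reduced to binary ON/OFF detection on power windows.

```python
import numpy as np

rng = np.random.default_rng(0)

# Small labeled set: mean window power (W), label 1 = appliance ON.
X_lab = np.array([10.0, 12.0, 300.0, 320.0])
y_lab = np.array([0, 0, 1, 1])

# Plenty of unlabeled smart-meter windows (easy to collect at scale).
X_unl = rng.choice([11.0, 310.0], size=100) + rng.normal(0, 5, 100)

def fit_threshold(X, y):
    """Fit a 1-D threshold classifier: midpoint between class means."""
    return (X[y == 0].mean() + X[y == 1].mean()) / 2

def predict(thr, X):
    return (X > thr).astype(int)

# 1) Train on the small labeled set.
thr = fit_threshold(X_lab, y_lab)

# 2) Pseudo-label unlabeled windows where the model is 'confident',
#    i.e., far from the decision boundary (margin is an assumption).
margin = np.abs(X_unl - thr)
confident = margin > 50
X_pseudo = X_unl[confident]
y_pseudo = predict(thr, X_pseudo)

# 3) Retrain on labeled plus pseudo-labeled data.
thr2 = fit_threshold(np.concatenate([X_lab, X_pseudo]),
                     np.concatenate([y_lab, y_pseudo]))
```

In a real NILM setting, the choice of confidence criterion and the risk of confirmation bias (the model reinforcing its own errors) are central design questions for such a scheme.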

In the reviewed DNN-NILM literature, semi-supervised deep learning techniques have so far been employed by the following authors: Ref. [103] used the ladder network [176]; the presented results give no clear indication that the unlabeled data actually improved the performance, which might be caused by the relatively simple DNN. The authors of [66] trained an autoencoder on unlabeled data and subsequently used the learned embedding in a supervised training setting; this work was done on aggregate data with 15 min resolution, which naturally led to large estimation errors. Ref. [81] derived their classification approach from the mean teacher approach [177], while ref. [67] adopted virtual adversarial training [178]. Both works present evidence that the disaggregation performance of the semi-supervised approaches improves compared to a strictly supervised setting. However, experiments were only conducted on data from houses already seen during training, and no conclusion can be drawn about improved generalization on previously unseen houses.

There is a growing field of newer semi-supervised DNN approaches from the vision domain ready to be adapted to the NILM problem [175]. A particularly successful [179] semi-supervised strand of research is called *consistency learning* [175]. The method's main assumption is that a small perturbation or realistic transformation applied to a data point should not influence the prediction. DNNs are then trained to provide a consistent output for an unlabeled data point and its perturbed version. Recent publications demonstrate that for image classification it is feasible to get close to the performance of a supervised approach with one order of magnitude fewer labeled samples [180,181]. While some of the consistency learning approaches seem to be well adaptable to NILM, many open questions remain, to name a few: What type of consistency loss should be used in the case of NILM? What types of data augmentation strategies should be employed? The last question is of particular interest because [180] demonstrated that the 'quality' of data transformations is the key to significant performance gains. While data augmentation was used in various DNN-NILM approaches, see Table 2, we are not aware of any work that investigated this aspect in detail.
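As an illustration of the consistency idea in a NILM-like setting, the sketch below computes a consistency loss between a model's outputs on an unlabeled aggregate-power window and on a perturbed version of it; this loss requires no label. The linear 'model' and the augmentations (Gaussian noise and a small circular shift) are assumptions chosen for illustration, not choices taken from the reviewed papers.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical tiny 'network': a linear map from a 32-sample power
# window to an appliance-power estimate (w stands in for DNN weights).
w = rng.normal(0, 0.1, size=32)

def model(window, w):
    return window @ w

def augment(window, rng):
    """Example augmentations for load curves (illustrative assumptions):
    small Gaussian noise plus a random circular time shift."""
    noisy = window + rng.normal(0, 1.0, size=window.shape)
    return np.roll(noisy, rng.integers(-2, 3))

# One unlabeled aggregate-consumption window from the smart meter.
x_unl = rng.normal(200, 20, size=32)

def consistency_loss(w, x, rng):
    """Squared difference between the prediction on the window and on
    its perturbed version; training would minimize this on unlabeled data."""
    return (model(x, w) - model(augment(x, rng), w)) ** 2

loss = consistency_loss(w, x_unl, rng)
```

The open questions raised above map directly onto this sketch: the squared difference could be replaced by another divergence, and the augmentations by any transformation under which appliance activity should be preserved.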

In summary, we see many worthwhile research questions in the application of semi-supervised DNN approaches and a great opportunity to tackle the problem of data scarcity for practical NILM applications.

## 4.5.2. NILM on Embedded Systems

If one imagines a NILM deployment at scale, the amount of data to transfer, store, and process becomes an important factor. In the reviewed DNN-NILM literature, different aspects of such a deployment have been addressed. The work of [93] investigated data reduction policies: different sampling strategies for data compression (1/4 to 1/20 compression) in combination with DNN inference were tested, and the authors found that the best sub-sampling policies outperform the results obtained with the original sampling rates. Another option is to process the data directly in or close to the electric meter and only relay disaggregated high-level information. This means that the DNN-NILM inference has to work on an embedded system, which can be quite challenging in terms of computational, storage, and energy resources. This direction has been investigated by [47–49,65,106]: Ref. [106] is, to our knowledge, the first to publish an implementation of DNN-NILM inference on an embedded device. Both [106] and later [65] used a Raspberry Pi computer for that purpose. Ref. [47] uses an efficient MobileNet-inspired [182,183] DNN for disaggregation and compresses it by lowering the precision from 32-bit floating point (used for training) to an 8-bit integer representation by means of the TensorFlow Lite library. The resulting model was then evaluated with the Android SDK. The authors report that "disaggregation accuracy deviates up to ≈9.4% from original disaggregation model, but, on average, remains satisfactory". Both refs. [48] and [49] investigate different pruning methods based on the network from [119]. Pruning methods aim at removing neurons that contribute little to the final output. The goal is to obtain sparse networks that have lower storage and computational requirements but similar performance compared to the original networks. Both publications found that networks can be heavily pruned with only a slight decrease in performance: Ref. [48] reports a reduction of the number of network weights by 87%, and [49] reports a 100-fold reduction in model size and a 25-fold reduction in inference times. Ref. [49] additionally investigates multi-task learning and vector decomposition as further paths towards efficient computation on embedded systems.
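The pruning idea can be illustrated with simple magnitude pruning, where the smallest-magnitude weights are zeroed out until a target sparsity is reached; zeroed weights need not be stored or multiplied at inference time. This is a generic sketch of the principle, not the specific methods of [48,49].

```python
import numpy as np

rng = np.random.default_rng(2)

# Stand-in for one trained layer's weight matrix of a NILM network.
W = rng.normal(0, 1, size=(64, 64))

def magnitude_prune(W, sparsity):
    """Zero out the smallest-magnitude weights so that a fraction
    `sparsity` of them is removed (e.g., 0.87 for the 87% weight
    reduction reported in [48])."""
    k = int(sparsity * W.size)
    thresh = np.sort(np.abs(W), axis=None)[k]
    mask = np.abs(W) >= thresh          # True = weight is kept
    return W * mask, mask

W_pruned, mask = magnitude_prune(W, 0.87)
achieved_sparsity = 1 - mask.mean()     # fraction of weights set to zero
```

In practice, pruning is typically followed by fine-tuning the remaining weights, and the storage gain only materializes when the sparse matrix is stored in a compressed format.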

While the DNN-NILM community has taken first steps towards implementations on embedded devices, the corresponding research field for DNNs in the vision and speech domains is vast, see, e.g., [184]. A multitude of research questions therefore remains in this direction. From our perspective, an interesting question is how the best-performing approaches (see Section 4.1) could be adapted to embedded devices, because their architectures are more elaborate than the ones used by [47–49,106].

## 4.5.3. 3-Phase Data

In some European countries, such as Switzerland, residential power supply arrives in three phases at the master distribution board (breaker panel) and is then split into single phases. As a consequence, measurements from the electrical metering infrastructure are in principle also available on three phases. With respect to a practical NILM application, this additional information makes the problem at first glance easier to solve, as on average only one third as many devices are connected to each phase compared to a household attached to a single phase. However, the challenge comes in the form of multi-phase appliances such as heat pumps, pool pumps, electrical heat storage radiators, or charging stations for electric vehicles. These appliances require the NILM algorithm to combine information from all three phases. When considering an approach that should perform on any household, the main challenge is that multi-phase devices can be connected in arbitrary permutations. Thus, the result of the DNN-NILM approach needs to be invariant to these permutations.

We are not aware of any DNN-NILM publication that works on 3-phase data and tackles the raised challenge. (This might partially be because there are currently only few datasets with 3-phase information; we are aware of iAWE [185], ECO [140], and BLOND [186].) The desired permutation invariance is analogous to the rotational invariance required in computer vision: an object needs to be recognized as such independently of its orientation in the image. This analogy also points to possible future research questions: Could permutation invariance be obtained by training a DNN with augmented data? Could the symmetry be directly anchored in the layers of the neural network via Group Equivariant Convolutions, see, e.g., [187,188]? These are convolutional layers specially designed to produce the same result for data subject to a group of symmetry operations. How do these two solution approaches compare to each other with respect to performance and complexity?
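One way to see how the symmetry could be anchored in the network itself: apply the same transform to every phase and pool the results with a symmetric function, so that all 3! = 6 phase orderings yield identical features. The shared linear transform and the sort-based pooling below are illustrative assumptions, chosen only to demonstrate the invariance; a real architecture would use learned per-phase encoders and a learned symmetric aggregation.

```python
import numpy as np
from itertools import permutations

rng = np.random.default_rng(3)

# A window of 3-phase power measurements: shape (3 phases, 16 samples).
x = rng.normal(200, 30, size=(3, 16))

# Hypothetical shared per-phase transform (stand-in for a phase encoder).
w = rng.normal(0, 0.1, size=16)

def invariant_features(x, w):
    """Apply the same transform to each phase, then pool with a
    symmetric function (sorting), making the output independent of
    the order in which the phases are wired."""
    per_phase = x @ w            # one value per phase, shared weights
    return np.sort(per_phase)    # symmetric in the phase order

# Evaluate the features under every possible phase permutation.
feats = [invariant_features(x[list(p)], w) for p in permutations(range(3))]
all_equal = all(np.allclose(feats[0], f) for f in feats[1:])
```

The data-augmentation alternative raised above would instead feed all six permutations of each training window to an unconstrained network and rely on training to approximate the same invariance.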
