#### *5.3. Complexity Analysis*

Figure 12 compares the computational complexity of the MLkNN baseline and the proposed CNN multi-label learning approach. As expected, since the proposed method is an eager learner (i.e., a model is created in the training phase), it takes significantly longer to train than the MLkNN baseline (Figure 12a). In contrast, its inference time is much shorter, since the model has already been built during training. Furthermore, Figure 12b shows that the proposed method achieves better performance even with less training data, which is an advantage given that labeled data is scarce and often hard to acquire.
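The trade-off between eager and lazy learners can be illustrated with a minimal sketch. It is not the experimental setup of this work: a plain k-nearest-neighbour vote stands in for MLkNN, a least-squares linear model stands in for the CNN, and the data, class names (`LazyKNN`, `EagerLinear`) and the `timed` helper are hypothetical. The absolute timings depend on the machine and on the stand-in models and say nothing about Figure 12; the sketch only shows where each kind of learner does its work.

```python
import time
import numpy as np


class LazyKNN:
    """Lazy learner (stand-in for MLkNN): fit only stores the training data."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, X, Y):
        self.X, self.Y = X, Y
        return self

    def predict(self, X):
        # All the work happens here: distances to every stored sample.
        d = ((X[:, None, :] - self.X[None, :, :]) ** 2).sum(axis=2)
        idx = np.argsort(d, axis=1)[:, : self.k]
        # Per-label majority vote among the k nearest neighbours.
        return (self.Y[idx].mean(axis=1) >= 0.5).astype(int)


class EagerLinear:
    """Eager learner (stand-in for the CNN): fit builds a model up front."""

    def fit(self, X, Y):
        Xb = np.hstack([X, np.ones((len(X), 1))])  # append a bias column
        self.W, *_ = np.linalg.lstsq(Xb, Y, rcond=None)
        return self

    def predict(self, X):
        # Inference is a single matrix product followed by a threshold.
        Xb = np.hstack([X, np.ones((len(X), 1))])
        return (Xb @ self.W >= 0.5).astype(int)


def timed(fn, *args):
    """Return the result of fn(*args) and the elapsed wall-clock time."""
    start = time.perf_counter()
    out = fn(*args)
    return out, time.perf_counter() - start


# Synthetic multi-label data (assumption): 3 labels derived from 20 features.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 20))
Y_train = (X_train[:, :3] > 0).astype(int)
X_test = rng.normal(size=(100, 20))

knn, knn_fit_time = timed(LazyKNN(k=3).fit, X_train, Y_train)
knn_pred, knn_predict_time = timed(knn.predict, X_test)

lin, lin_fit_time = timed(EagerLinear().fit, X_train, Y_train)
lin_pred, lin_predict_time = timed(lin.predict, X_test)

print(f"lazy : fit {knn_fit_time:.6f}s, predict {knn_predict_time:.6f}s")
print(f"eager: fit {lin_fit_time:.6f}s, predict {lin_predict_time:.6f}s")
```

In this sketch the lazy learner defers its entire cost to `predict`, whereas the eager learner pays in `fit` and answers queries cheaply afterwards.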

**Figure 12.** (**a**) Distributions of type errors the model makes. (**b**) Number of correct predictions for single, double and triple activations.

#### *5.4. Comparison with State-of-the-Art Methods*

Table 1 provides an overview of the results obtained in related works. As can be observed, there are many differences that make a fair and objective comparison impossible. For instance, while our approach uses current waveforms extracted from high-frequency power measurements, the results presented in [26] were obtained on low-frequency data and on a different dataset. Still, that work also used the MLkNN multi-label classifier and reported considerably lower results. Moreover, our results cannot be directly compared with those presented in [49], as these were obtained from a private dataset under very different experimental settings, including a different performance metric. In [23,37], the *F*1 macro scores for TCNN and FCNN DNN-based multi-label classifiers are given; however, these works use the UK-DALE dataset, which rules out a direct comparison.

An almost direct comparison is only possible between our method and the results from [19], which used the same dataset and performance metric. Still, it should be stressed that the evaluation procedure was different, since that work targets single-label classification. Even so, the results obtained with our approach are higher by six percentage points.

In short, for a fair comparison, we would have to re-implement all these approaches, which unfortunately is not always possible. Nevertheless, to make this task easier for other authors, we open-sourced the code necessary to replicate our experiments.
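Because differing metrics are one of the obstacles to comparison noted above, it may help to state one of them explicitly. The following is a minimal sketch of a macro-averaged *F*1 score for multi-label predictions; the function name `f1_macro` and the toy label matrices are hypothetical, and the exact metric variant used in each of the compared works may differ (for example, example-based or micro-averaged scores).

```python
import numpy as np


def f1_macro(y_true, y_pred):
    """Macro-averaged F1 for binary indicator matrices (samples x labels)."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = (y_true & y_pred).sum(axis=0)   # true positives per label
    fp = (~y_true & y_pred).sum(axis=0)  # false positives per label
    fn = (y_true & ~y_pred).sum(axis=0)  # false negatives per label
    denom = 2 * tp + fp + fn
    # Per-label F1; a label with no positives and no predictions scores 0
    # here (a convention chosen for this sketch).
    f1 = np.where(denom > 0, 2 * tp / np.maximum(denom, 1), 0.0)
    return float(f1.mean())


# Toy example (assumption): 4 samples, 3 appliance labels.
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1]]
score = f1_macro(y_true, y_pred)
print(round(score, 4))  # per-label F1 is 1, 2/3, 2/3, so the mean is 7/9
```

Macro averaging weights every label equally, so rarely active appliances influence the score as much as frequent ones.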



#### **6. Conclusions and Future Work Directions**

In this work, we have approached appliance recognition in NILM as a multi-label learning problem that links multiple appliances to an observed aggregate current signal. We first showed that features derived from the activation current alone can be useful in recognizing devices from aggregate measurements. We then applied Fryze's power theory, which decomposes the current waveform into active and non-active components. The decomposed current signal was then transformed into an image-like representation using the Euclidean-distance-similarity function and fed into the CNN multi-label classifier. Experimental evaluation on the PLAID aggregated dataset shows that the proposed approach is very successful at recognizing multiple appliances from aggregated measurements, with an overall F-score of 0.94.
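The two signal-processing steps summarized above can be sketched as follows. This is an illustration under stated assumptions, not the released implementation: the function names `fryze_decompose` and `distance_matrix`, the synthetic one-period waveforms and the sample count are hypothetical, and the distance matrix is the plain pairwise Euclidean distance between samples, without any normalization or further processing that the full method may apply.

```python
import numpy as np


def fryze_decompose(v, i):
    """Split one period of current into Fryze active and non-active parts."""
    v = np.asarray(v, dtype=float)
    i = np.asarray(i, dtype=float)
    p_active = np.mean(v * i)    # average (active) power over the period
    v_rms_sq = np.mean(v ** 2)   # squared RMS voltage
    # The active current is the part of i proportional to the voltage.
    i_active = (p_active / v_rms_sq) * v
    i_nonactive = i - i_active   # the remainder carries no average power
    return i_active, i_nonactive


def distance_matrix(x):
    """Pairwise Euclidean distance between the samples of a 1-D signal."""
    x = np.asarray(x, dtype=float)
    # For scalar samples the Euclidean distance is the absolute difference.
    return np.abs(x[:, None] - x[None, :])


# Synthetic single period (assumption): a lagging fundamental plus a
# third harmonic, sampled at 500 points.
n = 500
t = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
v = 325.0 * np.sin(t)
i = 2.0 * np.sin(t - np.pi / 4) + 0.5 * np.sin(3 * t)

i_a, i_f = fryze_decompose(v, i)
img_active = distance_matrix(i_a)     # image-like n x n representation
img_nonactive = distance_matrix(i_f)
print(img_active.shape, img_nonactive.shape)
```

By construction, the non-active component is orthogonal to the voltage over the period, so all of the average power is carried by the active component; the resulting matrices are symmetric with a zero diagonal.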

We further showed the effectiveness of the proposed CNN multi-label learning in recognizing a single running appliance, with over 98% accuracy. We will investigate the use of Fryze's current decomposition and the distance similarity matrix for single-label appliance recognition in future iterations of this work. Finally, we presented a detailed error analysis and identified three types of errors: *zero-error*, *one-to-one*, and *many-to-many* errors.

At this point, we acknowledge that the performance of the proposed approach is not yet satisfactory in detecting three simultaneously running appliances. A possible explanation for this issue is the small number of training samples with more than two running appliances. In the future, we would like to evaluate our approach on datasets with more training data. However, this may require the development of such a dataset, since currently available datasets with high-frequency measurements are still scarce [52,53].

Finally, it should be mentioned that the proposed method assumes that the appliance state transition (power event) is known in advance. However, in practice, this information has to be provided by an event detection algorithm (e.g., [54–56]). Therefore, future work should investigate how to integrate the proposed approach in the event-based NILM pipeline. Specifically, we plan to explore the use of the proposed Fryze current decomposition for event detection in multi-label appliance recognition.

**Author Contributions:** Conceptualization, A.F.; data curation, A.F.; formal analysis, A.F. and L.P.; methodology, A.F. and L.P.; resources, A.F.; software, A.F.; supervision, L.P.; validation, L.P.; writing—original draft, A.F.; writing—review and editing, A.F. and L.P. All authors have read and agreed to the published version of the manuscript.

**Funding:** Lucas Pereira has received funding from the Portuguese Foundation for Science and Technology (FCT) under grants CEECIND/01179/2017 and UIDB/50009/2020.

**Acknowledgments:** The authors thank Christoph Klemenjak and Shridhar Kulkarni for providing insightful comments and advice towards the completion of this work.

**Conflicts of Interest:** The authors declare no conflict of interest.
