*5.2. Error Analysis*

We also analyze the misclassification errors of the proposed CNN model. To this end, we identify three types of errors, namely *zero-error*, *one-to-one*, and *many-to-many errors*.

The *zero-error* occurs when the model predicts that no appliance is running while at least one appliance is active. Figure 11 shows that the number of zero-errors is very low for all three feature types, with the decomposed current distance feature producing no error of this type.

**Figure 11.** (**a**) Distribution of the error types the model makes. (**b**) Number of correct predictions for single, double, and triple activations.

The *one-to-one* error, on the other hand, is the error the model makes when only one appliance is active. Figure 11 shows that the V-I binary image yields 45 one-to-one errors, while the current-based features reduce this to seven for the decomposed current and six for the decomposed current distance. The low error rate when one appliance is running can be attributed to the high proportion of single activations, over 50%, as presented in Figure 7a. It further shows the effectiveness of the proposed CNN multi-label learning in recognizing individually operating appliances, with over 98% accuracy, as shown in Figure 11b.
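A per-activation breakdown of the kind reported in Figure 11b can be sketched as follows. This is a minimal illustration, not the evaluation code used in our experiments: the function name `accuracy_by_activation_count` is hypothetical, and we assume that labels are binary vectors with one entry per appliance and that a prediction counts as correct only if it matches the ground truth exactly.

```python
from collections import defaultdict


def accuracy_by_activation_count(y_true, y_pred):
    """Exact-match accuracy grouped by the number of active appliances.

    y_true, y_pred: sequences of binary label vectors (one entry per appliance).
    Assumption: a prediction is correct only if every label matches.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        n_active = sum(1 for v in truth if v)  # 1 = single, 2 = double, 3 = triple
        totals[n_active] += 1
        if list(truth) == list(pred):
            hits[n_active] += 1
    return {n: hits[n] / totals[n] for n in sorted(totals)}


# Made-up labels for illustration only (not PLAID data).
example_true = [[1, 0, 0], [0, 1, 0], [1, 1, 0], [1, 1, 1]]
example_pred = [[1, 0, 0], [0, 0, 1], [1, 1, 0], [1, 1, 0]]
example_accuracy = accuracy_by_activation_count(example_true, example_pred)
```

Grouping by the number of active appliances keeps the dominant single-activation samples from masking the behavior on the rarer double and triple activations.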

The *many-to-many errors* are confusions that the model makes when several appliances are active. Since the PLAID dataset used in our experiment contains up to three simultaneously active appliances, we further categorize *many-to-many errors* into *single*, *double*, and *complete-error*. A *single* error occurs when the model confuses only one appliance while two or three appliances are active, whereas a *double* error occurs when the model confuses two appliances while three appliances are active. The *complete-error* is the case in which the model produces incorrect predictions for all the active appliances. It can be inferred from Figure 11 that the proposed CNN multi-label model makes a higher number of *double* errors for all three input feature types. This is likely caused by the small share of samples with more than two appliances running simultaneously, about 5.8%, as depicted in Figure 7a.
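The error taxonomy above can be made concrete with a short sketch. It is one possible reading of the definitions, not the exact procedure used to produce Figure 11, and it rests on several assumptions: the function name `classify_error` and the label strings are hypothetical; labels are binary vectors with one entry per appliance; an empty prediction is counted as a zero-error even when only one appliance is active; and the number of "confused" appliances is taken as the larger of the missed and the spurious appliance counts, so that swapping one appliance for another counts as one confusion. Cases the text does not cover are returned as `"other"`.

```python
from collections import Counter


def classify_error(y_true, y_pred):
    """Assign one multi-label prediction to an error type (illustrative only)."""
    true_on = {i for i, v in enumerate(y_true) if v}
    pred_on = {i for i, v in enumerate(y_pred) if v}

    if true_on == pred_on:
        return "correct"
    if not true_on:
        return "other"  # nothing active but something predicted: outside the taxonomy
    if not pred_on:
        return "zero"  # assumption: takes priority over the remaining categories
    if len(true_on) == 1:
        return "one-to-one"

    # Many-to-many: two or more appliances are active.
    missed = true_on - pred_on    # active appliances the model did not predict
    spurious = pred_on - true_on  # predicted appliances that were not active
    if len(missed) == len(true_on):
        return "complete"
    confused = max(len(missed), len(spurious))  # assumption: a swap counts once
    if confused == 1:
        return "single"
    if confused == 2 and len(true_on) >= 3:
        return "double"
    return "other"


# Made-up labels for illustration only (not PLAID data).
example_true = [[1, 0, 0, 0], [1, 0, 0, 0], [1, 1, 0, 0], [1, 1, 1, 0], [1, 1, 1, 0]]
example_pred = [[0, 0, 0, 0], [0, 1, 0, 0], [1, 0, 1, 0], [1, 0, 0, 1], [0, 0, 0, 1]]
error_counts = Counter(classify_error(t, p) for t, p in zip(example_true, example_pred))
```

Tallying the returned labels with `Counter` gives a per-type distribution comparable in form to Figure 11a; a different definition of "confused" would shift counts between the *single*, *double*, and `"other"` buckets.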
