In the experimental case study, the TensorFlow 2.0 deep learning framework is used to construct a sample set of matrix heat maps from the PLAID and REDD datasets in the way described in this paper. After the sample set is produced, 80% of each class is selected as the training set and 20% as the test set. A confusion matrix is used to evaluate the recognition accuracy of the proposed method. Each column of the confusion matrix represents a predicted category, and the column total is the number of samples predicted to be in that category; each row represents the true category of the data, and the row total is the number of samples in that category. Each cell therefore gives the number of samples of a given true category predicted as a given category. From the confusion matrix, TP, FP, FN, and TN can be calculated: TP is the number of positive samples correctly predicted as positive, FP the number of negative samples incorrectly predicted as positive, FN the number of positive samples incorrectly predicted as negative, and TN the number of negative samples correctly predicted as negative. The precision (P), recall (R), harmonic mean of precision and recall (F1), and accuracy (A) can be calculated from TP, FP, FN, and TN. P and R measure how correctly positive samples are identified, and higher values are better. The formulas for calculating each indicator are as follows:
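These four indicators take the standard form:

```latex
P = \frac{TP}{TP + FP}, \qquad
R = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2PR}{P + R}, \qquad
A = \frac{TP + TN}{TP + FP + FN + TN}
```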
4.1. REDD
The nine loads in the REDD dataset are represented by the numbers shown in Table 1 below: 1-1 and 1-2 denote different states of the same device, and light 2 and light 3 denote two different lighting loads.
The load identification results for the REDD dataset are shown in Figure 9 and Figure 10: green represents the number and percentage of correctly identified samples, and yellow represents the number and percentage of incorrectly identified samples.
From the results, it can be seen that the correct identification rate of each type of load is above 91.7%; light 1 and light 2 reach 100%, and the average correct rate is 96.4%. For the socket, washer-dryer, and electric furnace loads, the correct identification rate is above 94.4%, with an average of 94.9%. For multi-state devices such as the fridge, light, microwave, and socket, the correct recognition rate of each state is above 92.3%, with an average of 96.5%.
Table 2 evaluates each load by P, R, F1, and A. As can be seen from Table 2, most values are 0.85 or above, with light 3 reaching 1.00; the averages are 0.95, 0.96, 0.96, and 0.99, respectively.
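The per-class indicators reported in Table 2 and Table 4 can be derived directly from a confusion matrix laid out with true categories as rows and predicted categories as columns, as described above. The following NumPy sketch shows the computation; the 3 × 3 matrix used here is illustrative only, not the paper's actual data.

```python
import numpy as np

def per_class_metrics(cm):
    """Compute per-class precision, recall, F1, and accuracy from a
    confusion matrix whose rows are true labels and whose columns are
    predicted labels (the convention used in the text)."""
    cm = np.asarray(cm, dtype=float)
    total = cm.sum()
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp   # predicted as class k but actually another class
    fn = cm.sum(axis=1) - tp   # actually class k but predicted as another class
    tn = total - tp - fp - fn
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    f1 = 2 * p * r / (p + r)
    a = (tp + tn) / total
    return p, r, f1, a

# Illustrative 3-class confusion matrix (not the paper's data)
cm = [[9, 1, 0],
      [0, 10, 0],
      [0, 0, 10]]
p, r, f1, a = per_class_metrics(cm)
print(np.round(p, 2), np.round(r, 2))
```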
4.2. PLAID
The 15 loads in the PLAID dataset were identified. The load representation is shown in Table 3, and multi-state loads are represented in the same way as in the previous example.
The results of load identification are shown in Figure 11 and Figure 12. The correct identification rate of each type of load is above 90.8%; the hair curler, laptop, and vacuum reach 100%, and the average correct rate is 96.24%. Although the kettle is affected by loads with similar characteristics and has a slightly lower recognition rate than the other loads, it still reaches 92%, which is higher than the rate obtained using the original V-I trajectory.
Meanwhile, the correct recognition rate of the soldering iron is limited by its small number of samples and is lower than that of the other loads; in terms of the absolute number of incorrectly identified samples, however, its recognition is no worse. Multi-state loads such as the microwave, soldering iron, and vacuum have a correct recognition rate of 86.7% or more for each state, the vacuum reaches 100% for every state, and the average correct rate across the states of multi-state loads is 97.79%. The results for the four similar loads described previously are shown in Figure 11; the average correct identification rate is 96%.
Table 4 evaluates each load by P, R, F1, and A. Most index values are 0.82 or above; the hair curler, laptop, and vacuum reach 1.00, and the averages are 0.96, 0.96, 0.96, and 0.99, respectively.
4.3. Comparative Analysis of Results
Using the same dataset, sample sets built from different features are compared across different recognition models to evaluate the effectiveness of each method in terms of recognition accuracy or F1 value.
As shown in Table 5, the literature identifying the REDD dataset quantizes the V-I trajectory and uses a softmax classifier, achieving an F1 value of 0.88; because only the single V-I trajectory feature is quantized, similar loads with close quantized values are difficult to distinguish. CNN and SE-ResNet networks are used to identify the V-I trajectory binary map, the binary map fusing the V-I trajectory with harmonics, and the matrix heat map; varying the feature class, fusion method, and identification network forms the control groups. The average F1 value of the method proposed in this paper is 0.99, a better result than that of any control group.
As shown in Table 6, in the literature on identifying the PLAID dataset, Gao J et al. [12] and De Baets Leen et al. [16] used random forests to identify V-I trajectories; however, with a single feature, random forests have a long training time and high data requirements and are otherwise prone to overfitting. Shouxiang Wang et al. proposed fusing V-I trajectories with power: different networks extract the features, the hidden-layer output vectors of the two neural networks are concatenated into a composite feature vector, and a BP neural network performs recognition with a correct rate of 83.7% [15]. Xiang Y et al. incorporated power and current magnitudes into V-I trajectories to form true-color feature images recognized by a CNN; however, the true-color image of each load type is a single color determined by the power and current magnitudes, so loads with large power or current values tend to be concentrated in the same color interval, such as microwave ovens and washing machines, making the images hard to distinguish. In this paper, the data intervals are evenly distributed over a 32 × 32 matrix, so loads within the same value range show more pronounced color variability [18]. Z Zheng et al. used features composed of amplitude and phase angle with a multilayer perceptron for recognition [14]. De Baets L et al. used a CNN for recognition with a correct rate of 77.60% [23]. Meanwhile, recognizing the matrix heat map with a CNN and with the SE-ResNet network yields correct rates of 89.35% and 96.24%, respectively, which shows that the SE-ResNet network outperforms the CNN under the same conditions.
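The claim about evenly distributed data intervals can be illustrated with a minimal binning sketch: one cycle of voltage-current samples is mapped onto a 32 × 32 occupancy matrix whose axes are divided into equal intervals over each signal's min-max range. The normalization and binning choices here are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def vi_heatmap(voltage, current, size=32):
    """Bin one cycle of (voltage, current) samples into a size x size
    occupancy matrix. Each axis is divided into `size` equal intervals
    over the signal's min-max range, so loads in the same value range
    still spread across distinct cells. Illustrative sketch only."""
    v = np.asarray(voltage, dtype=float)
    i = np.asarray(current, dtype=float)
    # Map each sample to a bin index in [0, size - 1]
    vb = np.clip(((v - v.min()) / (np.ptp(v) + 1e-12) * size).astype(int), 0, size - 1)
    ib = np.clip(((i - i.min()) / (np.ptp(i) + 1e-12) * size).astype(int), 0, size - 1)
    m = np.zeros((size, size))
    np.add.at(m, (ib, vb), 1.0)   # count samples falling in each cell
    return m / m.max()            # normalize to [0, 1] for a heat map

# Synthetic sinusoidal V-I cycle with a phase shift (illustrative only)
t = np.linspace(0, 2 * np.pi, 500, endpoint=False)
m = vi_heatmap(np.sin(t), np.sin(t + 0.5))
print(m.shape)
```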
The above comparison shows that the studies of Wang Shouxiang et al. [15] and Xiang Y et al. [18] were based on the same PLAID dataset and achieved good overall recognition results, but some loads, such as the air conditioner, fan, fridge, heater, and washing machine, were not recognized with high accuracy. As shown in Figure 13, the accuracy of this paper's method for these loads is higher than that of the above literature. Existing studies of the PLAID dataset generally select 11 load types; this paper additionally identifies four other load types, the blender, coffee maker, hair curler, and soldering iron, with identification accuracy higher than 91%.
Instead of using a dedicated one-to-one model for each device, the method proposed in this article is built on a single multiclass classifier, so one model covers multiple devices with higher computational efficiency. SE-ResNet adds residual connections and an SE block, built from squeeze and excitation operations, to the CNN; although the model still needs retraining when new devices are added, its overall training time is greatly reduced. The proposed method is trained on the REDD dataset with six houses and the PLAID dataset with 56 houses, which indicates that the method is scalable and generalizable; testing with data from additional houses and on new datasets will continue. Follow-up work should investigate how the proposed method can reduce the training volume and improve computational efficiency when identifying new devices.
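The squeeze-and-excitation mechanism mentioned above can be sketched in plain NumPy: the squeeze step is global average pooling over the spatial dimensions, and the excitation step is two fully connected layers (ReLU then sigmoid) that produce per-channel weights used to rescale the feature map, wrapped in an identity residual connection. The layer sizes and reduction ratio below are illustrative assumptions, not the paper's exact architecture, and the convolutional layers of a full SE-ResNet block are omitted for brevity.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_block(feature_map, w1, w2):
    """Squeeze-and-excitation recalibration of a (H, W, C) feature map.
    Squeeze: global average pooling over H and W. Excitation: two fully
    connected layers (ReLU then sigmoid) producing per-channel weights
    in (0, 1) that rescale the input channel-wise."""
    z = feature_map.mean(axis=(0, 1))            # squeeze -> (C,)
    s = sigmoid(np.maximum(z @ w1, 0.0) @ w2)    # excitation -> (C,)
    return feature_map * s                       # channel-wise rescaling

def se_residual_block(x, w1, w2):
    """Identity residual connection around the SE recalibration,
    mirroring the SE-ResNet structure described in the text."""
    return x + se_block(x, w1, w2)

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 16))              # (H, W, C) feature map
w1 = rng.standard_normal((16, 4))                # reduction ratio r = 4 (assumed)
w2 = rng.standard_normal((4, 16))
y = se_residual_block(x, w1, w2)
print(y.shape)
```

Because the sigmoid keeps the channel weights in (0, 1), the residual output is x · (1 + s), so each activation keeps its sign and is scaled by a factor between 1 and 2.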