1. Introduction
Balancing electricity consumption with sustainability is one of the central issues of modern society. Although all participants in the electricity market seek to increase their earnings, as pointed out by Mansouri et al. [
1], current solutions in the electricity market have been incorporating more sustainable options such as renewable sources, as discussed in Nie et al. [
2] and Mansouri et al. [
3]; additionally, the current market is considering the consumer as a prosumer, as indicated by Zhang et al. [
4] and Zhou et al. [
5], capable of using and supplying energy to the grid, which ensures the prominence of the residential sector. As Cary and Benton [
6] pointed out, the residential sector had an energy-saving capacity of more than 66 TWh. In recent reports, as demonstrated by Bang et al. [
7] and Rashid et al. [
8], this sector accounts for approximately 30% of electricity waste in various ways, such as device inefficiency and unsuitable consumption habits. One of the most promising solutions for efficient consumption comes from Smart Homes (SHs) employing Home Energy Management Systems (HEMSs).
A HEMS architecture can manage energy usage in residential environments. For this task, the HEMS collects data from the appliances and bases its management decisions on the combination of this information. Generally, a HEMS consists of a controller and smart outlets, as illustrated in
Figure 1. A HEMS can monitor many kinds of information about household appliances’ activity, such as frequency, temperature, and active power. However, in recent discussions, as manifested by Mahapatra and Nayyar [
9] and Motta et al. [
10], a modern architecture ensures added functions for HEMSs, such as load forecasting as in Jo et al. [
11], load disaggregation as demonstrated in Lemes et al. [
12], appliance anomaly detection as in Tsai et al. [
13] and Lee et al. [
14], and load recognition as documented in Cabral et al. [
15].
The motivation for this study stems from several relevant issues and the broader context of energy management in residential settings. One of the primary challenges faced in the field is the significant amount of electricity waste due to inefficient appliances and unsuitable consumption habits, which accounts for approximately 30% of total residential electricity usage [
7,
8]. Addressing this waste is critical for achieving sustainability goals and reducing energy consumption. Another challenge is the complexity of accurately identifying the various devices operating simultaneously within a household. Traditional methods often fall short in accuracy and reliability, especially in environments where multiple appliances are active concurrently. The implementation of advanced techniques, as proposed in this study, becomes vital in addressing these challenges. The existing literature explores a variety of solutions for load recognition. Nevertheless, gaps remain. For instance, while techniques like Principal Component Analysis (PCA) can reduce the volume of data for information processing, they may not always preserve the most informative patterns necessary for effective decision-making. The dynamic nature of household energy consumption patterns adds complexity to the load-recognition process, requiring robust and adaptive models, and many current approaches lack the robustness needed for reliable appliance identification. Addressing these gaps can have significant implications, such as efficient appliance management, reduced energy waste, and improved overall energy efficiency in residences. The potential benefits extend beyond individual households, contributing to broader societal goals of sustainability and environmental conservation.
However, what is load recognition? As per Faustine and Pereira [
16], load recognition is the process of identifying which device is in operation. Why is load recognition relevant? Load recognition plays a vital role in load disaggregation strategies, specifically for appliance identification in the post-disaggregation stage. Moreover, in the domestic environment, where multiple appliances operate concurrently—such as air conditioners, freezers, heaters, and other devices—it is crucial for HEMSs to discern accurately which devices are working, especially when replacing devices connected to smart outlets. Thus, in real-life scenarios, load recognition allows HEMSs to automatically determine which new appliance is operating. Another practical use is the automatic building of databases: through the analysis of electrical signals, load recognition improves the robustness of database production.
There are several ways to perform load recognition. The most advanced methods use machine learning (ML) techniques. Generally, these approaches incorporate robust strategies for feature handling and employ ML models that demonstrate stability and reliability in decision-making, even under challenging conditions. Current works employ diverse approaches to process the features. In Borin et al. [
17], the study used the Stockwell transform for feature extraction. Qaisar and Alsharif [
18] and Soe and Belleudy [
19] chose to employ electrical operating patterns from household appliances. In De Baets et al. [
20], the authors utilized Voltage–Current (VI) trajectories, represented as images, to analyze the device patterns. In Zhiren et al. [
21] and in Cabral et al. [
15], the researchers used the widely known PCA to extract the features. Following this trend at the decision-making stage, more modern studies also explore a variety of techniques. Qaisar and Alsharif [
18] employ the models Support Vector Machine (SVM) and
k-Nearest Neighbors (
k-NN) for appliance identification. On the other hand, besides
k-NN and SVM, Soe and Belleudy [
19] use Classification and Regression Trees (CART), Linear Discriminant Analysis (LDA), Logistic Regression (LR), and Naive Bayes (NB). In Huang et al. [
22], the authors apply Long Short-Term Memory Back-Propagation (LSTM-BP) in the classification phase. Furthermore, as presented by Cabral et al. [
15], it is possible to employ ensemble methods based on decision trees (DTs), like Random Forest (RF), for load recognition. Although the methods propose diverse ways of processing the features and identifying the appliances, some gaps in the literature remain unaddressed.
Gaps in approaches to load recognition still exist, such as enhancing appliance identification performance by improving inter-class separability and boosting the overall performance of the load-recognition system through more reliable models. The present work addresses both of the previously mentioned gaps through what we call the ANOVA–GBM approach. Unlike our principal competitor, the work presented by Cabral et al. [
15], which uses PCA to process the features aiming to improve inter-class separability, we used the Analysis of Variance (ANOVA)
F-test with SelectKBest to enhance the inter-class separability. PCA is a feature extraction technique that projects the characteristics of the data into another feature space that may have reduced dimensionality. However, this projection is not always enough to preserve the most informative patterns that feed the decision-making model. To address this issue, we propose a feature selection technique in which the most informative patterns are chosen by applying the ANOVA
F-test with SelectKBest, thus avoiding the need for a forced projection of the data. We also introduce gradient-boosting machine (GBM) architectures in load recognition to ensure higher reliability for appliance identification at the decision-making stage. GBM approaches are ensemble architectures that combine multiple models to produce a more robust final model, where each model is intended to correct the errors made by the previous model or set of previous models. In addition to our propositions to deal with the gaps, the proposed approach includes other strategies that make it a robust system for load recognition, such as data preprocessing to determine the ON/OFF appliance state, a procedure to determine the optimal number of features via Cumulative Explained Variance (CEV), and grid search (GS) with K-fold cross-validation (K-CV) to optimize the chosen GBM, contributing to the model's generalization capacity. The results of our approach, based on ANOVA–GBM, show the highest values of accuracy, weighted average F1-Score, and Kappa index in comparison with the competitors’ strategies from the literature. It is relevant to mention that our solution is part of an ongoing research project called Open Middleware and Energy Management System for the Home of the Future. This initiative is a collaboration between the University of Campinas, the Brazilian energy company Copel Distribuição S.A. (Curitiba, Brazil), and the Eldorado Research Institute.
Principal Contributions
The principal contributions of our work consist of the following:
Novel approach to load recognition: our study proposes a pioneering approach to load-recognition systems based on the ANOVA F-test with SelectKBest and GBMs. This research is the first to use the ANOVA F-test with SelectKBest to recognize loads in HEMSs, which improves the feature selection and, consequently, aids inter-class separability. This characteristic improves system performance, as enhanced separability ensures that GBMs can more efficiently differentiate the classes. Furthermore, this work is the first to apply GBMs such as the histogram-based gradient-boosting machine (HistGBM), light gradient-boosting machine (LightGBM), and XGBoost (extreme gradient boosting) for load recognition in HEMS applications. Employing robust models like GBMs results in higher reliability for the load-recognition system. Due to this original proposal, this paper also presents a pioneering analysis of the ANOVA–HistGBM, ANOVA–LightGBM, and ANOVA–XGBoost combinations for the task of load recognition;
Practical implications: the ANOVA–GBM approach achieves greater efficiency in training time, even when compared to PCA for a higher number of features. It should be noted that ANOVA–XGBoost is approximately 4.31 times faster than PCA–XGBoost, ANOVA–LightGBM is about 5.15 times faster than PCA–LightGBM, and ANOVA–HistGBM is 2.27 times faster than PCA–HistGBM. In addition, the results show that the ANOVA–GBM approach achieves the highest values for accuracy, weighted average F1-Score, and Kappa index—96.75%, 96.64%, and 0.9452, respectively—compared to competing strategies in the literature. These practical implications are driven by the enhanced feature selection capability and the use of more robust and reliable models, leading to significant improvements in the performance of the load-recognition system and demonstrating the effectiveness and refinement of the proposed approach;
Advances in the load-recognition field: in addition to significantly enhancing the performance and efficiency of load-recognition systems, our study contributes to fundamental elements present in load-recognition systems, such as data preprocessing, feature handling, machine learning architectures, optimization methodologies, level of intrusiveness, and reliability. Additionally, this study addresses remaining gaps in load-recognition approaches, such as improving appliance identification performance by enhancing feature selection and boosting the overall performance of the load-recognition system through more reliable models. Notably, the ANOVA F-test with SelectKBest and the GBM models establish a new standard for feature selection and ML architectures in load-recognition systems. This advancement fosters the development of more robust, accurate, and reliable systems, positively impacting academic research and practical applications in the home energy management sector;
Bibliographic survey of contemporary load-recognition systems: we offer a bibliographic survey of contemporary load-recognition systems, concentrating on key aspects such as data preprocessing, feature processing, machine learning architectures, optimization techniques, degree of intrusiveness, and reliability. This review provides insights into the latest advancements in load-recognition technology, addressing crucial components that determine system performance and usability.
The structure of the remaining sections is outlined as follows:
Section 2 presents detailed background to contextualize this study.
Section 3 offers a meticulous description of the proposed system, detailing its processing flow. This section displays the approach to feature selection, criteria for feature relevance, and the optimization of machine learning models.
Section 4 introduces the metrics employed in this study, alongside their rationale, and examines the results obtained with the proposed approach. Additionally, this section analyzes the findings and offers new insights.
Section 5 presents the manuscript conclusion, evaluating the implications of the proposed strategy, highlighting key findings, and identifying promising aspects of the proposed system.
3. Proposed System: ANOVA–GBM Approach for Load Recognition
Figure 3 illustrates the processing chains comprising the designed system, beginning with the active power collected from appliances. For data collection, the system uses the Reference Energy Disaggregation Dataset (REDD) from Kolter and Johnson [
35], which provides comprehensive power usage data from various household appliances over eight days, registered at a frequency of 1/3 Hz. The REDD dataset provides data from Household 1 with a wide variety of appliances, which includes the following devices: oven, refrigerator, dishwasher, kitchen oven, lighting, washer dryer, microwave, bathroom Ground Fault Interrupter (GFI) outlets, heat pump, stove, and unknown devices. From the appliances in Household 1, our system generated 4609 images with a resolution of 32 × 32
pixels (1024 features in total), a sufficient quantity to assess the robustness of the proposed approach. This was followed by feature selection, determination of the optimal number of features, and selection based on that number. Subsequently, the system performed an optimization of the GBM models, culminating in the final output, which identifies the type of appliance.
According to
Figure 3A, the system incorporates a preprocessing stage responsible for detecting the ON/OFF states and generating images from active power. For this task, our system employs the Discrete Wavelet Transform (DWT) in the same manner as Lemes et al. [
36]. This preprocessing involves the application of the DWT to the active power data to identify the operational states of the appliances. Here, the system uses the level-1 detail coefficients obtained with the Daubechies 4 mother wavelet applied to the active power of the household appliances in the HEMS. The coefficients extracted by the DWT allow for the observation of transition instants between OFF–ON and ON–OFF states through the higher magnitude peaks, where these peaks indicate the beginning and end of the appliance cycles. Afterward, the system converts the identified activity segments into images, following the method outlined by Cabral et al. [
15]. Each resulting image captures a cycle of the appliance activity. The system translates the electrical activity curve of the appliance into black pixels on a white background, creating a visual representation of the appliance in operation. The resolution of these images is adjustable, and, for our experiments, the system used a resolution of 32 × 32
pixels, resulting in 1024 features per image. Next, the method produces a set of
m images with
k pixels, arranging this set into a matrix with dimensions m × k. According to the proposed Algorithm 1, these data are then divided into training and testing sets with a ratio of 80% for training and 20% for testing, following the partition suggested by Géron [
37]. The training set is used for hyperparameter tuning and model training, while the testing set is reserved for evaluating the final model performance.
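The transition-detection idea above can be sketched in a few lines. This is a simplified illustration rather than the paper's implementation: it replaces the level-1 Daubechies 4 detail coefficients with a plain first difference (which is also a high-pass operation, so sharp OFF–ON and ON–OFF power steps still appear as high-magnitude peaks), and the 50 W threshold is an arbitrary illustrative value.

```python
import numpy as np

def detect_transitions(power, threshold=50.0):
    """Locate OFF->ON and ON->OFF instants in an active-power signal.

    Simplified stand-in for the paper's DWT step: a first difference
    acts as the high-pass filter, so sharp power steps show up as
    high-magnitude peaks. The 50 W threshold is illustrative.
    """
    d = np.diff(np.asarray(power, dtype=float))
    return np.flatnonzero(np.abs(d) > threshold) + 1

# Toy signal: appliance OFF (0 W), ON (~1200 W), then OFF again.
signal = np.concatenate([np.zeros(20), 1200.0 * np.ones(30), np.zeros(20)])
edges = detect_transitions(signal)
print(edges)  # transitions at samples 20 (OFF->ON) and 50 (ON->OFF)
```

With a proper wavelet library, the first difference would be swapped for the Daubechies 4 level-1 detail coefficients, as the paper describes; the peak-thresholding logic is unchanged.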
Figure 3 summarizes this processing flow, where the load-recognition system solely utilizes the active power gathered from appliances, depicted by the light blue color (Input) in
Figure 3A. Subsequently, the data preprocessing is depicted in light gray in
Figure 3A.
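The image-generation step can be illustrated with a minimal rasterizer. This sketch assumes a simple normalize-and-plot scheme (the exact rasterization of Cabral et al. [15] may differ), and `power_to_image` is a hypothetical helper name.

```python
import numpy as np

def power_to_image(segment, size=32):
    """Rasterize one appliance activity cycle into a size x size binary
    image: black pixels (1) trace the active-power curve on a white
    background (0). Minimal sketch; the paper's exact scheme may differ."""
    seg = np.asarray(segment, dtype=float)
    cols = np.linspace(0, size - 1, num=len(seg)).round().astype(int)
    span = seg.max() - seg.min()
    if span > 0:
        rows = ((seg - seg.min()) / span * (size - 1)).round().astype(int)
    else:
        rows = np.zeros(len(seg), dtype=int)
    img = np.zeros((size, size), dtype=np.uint8)
    img[size - 1 - rows, cols] = 1  # row 0 on top: higher power sits higher
    return img

cycle = 600 + 400 * np.sin(np.linspace(0, np.pi, 100))  # toy ON cycle
img = power_to_image(cycle)
print(img.shape)  # (32, 32) -> 1024 pixel features when flattened
```

Flattening each such image row-wise yields the 1024-feature vectors that form the rows of the data matrix described above.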
Figure 3B shows the initial feature selection stage in yellow, where we apply the ANOVA
F-test with SelectKBest. Algorithm 1 begins by dividing the generated data
into training
and testing
sets, adhering to a specified proportion. This step is vital in many machine learning processes to ensure the testing of the trained model and accurate validation of its predictions on previously unseen data. After dividing the dataset, the algorithm applies the ANOVA
F-test with SelectKBest on the training set
using an initial number of components. As there are no restrictions on this value, which acts only as an initial assumption in Algorithm 1, the initial number of features is set to 1024, i.e., the maximum number of features. The ANOVA
F-test with SelectKBest helps in selecting more informative features. In this step, the selected data then serve as the foundation for determining the optimal number of features in the next stage.
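In scikit-learn terms, this initial selection step looks roughly like the sketch below, using `SelectKBest` with the `f_classif` scorer (the ANOVA F-test). The data here are synthetic stand-ins for the image matrix, with two features made deliberately class-dependent.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

rng = np.random.default_rng(0)

# Synthetic stand-in for the image matrix: 300 samples x 64 pixel
# features (the paper uses 4609 images x 1024 features); labels are
# appliance-class identifiers.
y = rng.integers(0, 3, size=300)
X = rng.normal(size=(300, 64))
X[:, 5] += 3.0 * y   # make feature 5 strongly class-dependent
X[:, 17] -= 2.0 * y  # and feature 17 as well

# The ANOVA F-test scores each feature by its between-class variance
# relative to its within-class variance; SelectKBest keeps the k
# highest-scoring features.
selector = SelectKBest(score_func=f_classif, k=8)
X_selected = selector.fit_transform(X, y)

kept = np.flatnonzero(selector.get_support())
print(X_selected.shape)       # (300, 8)
print(5 in kept, 17 in kept)  # the class-dependent features survive
```

Unlike PCA, no projection is involved: the selected columns are the original pixel features, which is precisely the property the proposed approach exploits.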
Algorithm 1 Approach for load recognition based on the Analysis of Variance F-test with SelectKBest and the model optimization of the gradient-boosting machines
- Input:
Generated dataset, proportion of training data (p.train), proportion of test data (p.test), initial number of components, threshold, number of folds (K), and the set of J candidate values for the maximum-depth hyperparameter of the chosen model/GBM.
- Output:
Type of load in operation
- 1:
First step: Divide the database into the training set and the test set.
- 2:
Second step: Employ the ANOVA with SelectKBest using the initial number of features. In the sequel, obtain the selected data.
- 3:
Third step: Compute the covariance matrix from the selected data. We calculate the covariance matrix based on Lemes et al. [12].
- 4:
Fourth step: Obtain the eigenvalues via eigendecomposition of the covariance matrix, in which the eigenvector matrix and the diagonal matrix of eigenvalues are obtained.
- 5:
Fifth step: Sort the eigenvalues in descending order.
- 6:
Sixth step: Discover the optimal number of features (k) through CEV: generate the variable k and set its value to zero; compute CEV_r as the ratio between the sum of the r largest eigenvalues and the sum of all eigenvalues; if CEV_r is greater than or equal to the threshold, set k to the number of the r-th feature.
- 7:
Seventh step: Employ the ANOVA with SelectKBest, according to the k features, to obtain the new selected data for the training set.
- 8:
Eighth step: Employ the candidate values for the hyperparameters for each k.
- 9:
Ninth step: Apply GS with K-CV: divide the training data into K folds; train the model on each fold; calculate the accuracy; measure the average accuracy; assign the average accuracy to the current candidate hyperparameter values; adopt the hyperparameters with the highest average accuracy achieved.
- 10:
Tenth step: Train the chosen GBM with the optimized hyperparameters.
- 11:
Eleventh step: Test the optimized model (the chosen GBM) and return the type of load in operation.
Subsequently, as highlighted in orange in
Figure 3C, according to Algorithm 1, the system computes the covariance matrix from the data selected by the ANOVA
F-test with SelectKBest. After obtaining the eigenvalues from the covariance matrix, the algorithm sorts them in descending order to determine the optimal number of features
k using Cumulative Explained Variance (CEV). This procedure identifies the minimum number of features that retain most of the original data variability, ensuring a balance between dimensionality reduction and information preservation.
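A minimal sketch of this CEV criterion, assuming the eigenvalues come from the sample covariance matrix; the 0.95 threshold and the data shapes below are illustrative (the paper tunes a far higher threshold on its own feature matrix).

```python
import numpy as np

def optimal_k(X, threshold=0.95):
    """Smallest number of leading eigenvalues of cov(X) whose Cumulative
    Explained Variance reaches `threshold` (0.95 is illustrative)."""
    cov = np.cov(X, rowvar=False)            # features in columns
    eigvals = np.linalg.eigvalsh(cov)[::-1]  # eigenvalues, descending
    cev = np.cumsum(eigvals) / np.sum(eigvals)
    # First index where the cumulative ratio crosses the threshold.
    return int(np.searchsorted(cev, threshold) + 1)

rng = np.random.default_rng(1)
# 200 samples, 10 features, but nearly all variance lives in 2 features.
scales = np.array([10.0, 10.0] + [0.01] * 8)
X = rng.normal(size=(200, 10)) * scales
print(optimal_k(X, 0.95))  # 2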
Upon determining the optimal number of features, the algorithm re-applies the ANOVA
F-test with SelectKBest using only this number of features, with reduced dimensionality, both for the training and testing sets. This ensures that the dataset is now reduced to the most informative features, simplifying the model without significant loss of information. This stage is depicted in
Figure 3D, represented in yellow, and Algorithm 1 presents this procedure in the seventh step.
In the final stage, Algorithm 1 applies cross-validation using grid search with K-fold to optimize the GBM’s hyperparameters, ensuring that the model achieves maximum robustness. In line with Kuhn et al. [
38], we used a 10-fold cross-validation, which means splitting the dataset into ten parts, and the model is trained and validated ten times, each time using a different part as the validation set and the remaining parts as the training set. This processing chain is depicted in
Figure 3E, highlighted in light green. The proposed approach employs the grid search to exhaustively search for the best hyperparameters by analyzing different hyperparameter combinations, ensuring optimal performance. The GBM model is then trained with the optimized hyperparameters and tested using the selected test dataset, i.e., the test set. It is worth noting that, in this manuscript, we evaluated three GBM architectures: the XGBoost, the LightGBM, and the HistGBM. Finally, the algorithm outputs the operational load type, depicted in
Figure 3 in red (Output), which is the primary objective of the modeling. This meticulous sequential procedure ensures that the final model is well tuned and achieves high levels of robustness and reliability.
4. Results and Discussions
Our work does not merely propose an innovative approach but also commits to evaluating its robustness and reliability. Consequently, it is vital to employ multiple metrics in performance evaluation. This manuscript uses three distinct metrics: accuracy, weighted average F1-Score (F1), and the Kappa index. All these metrics are widely known in the literature. Here, it is pertinent to highlight that each metric offers a unique perspective on the performance of ML models, contributing to a comprehensive inspection. As per Laburú et al. [
39], accuracy is essential for overall performance analysis. In our manuscript, accuracy evaluates the overall success rate of the model. We applied accuracy as per Sellami and Rhinane [
40] and Lemes et al. [
12]. On the other hand, according to Guo et al. [41], F1 can provide a subtle analysis of the model performance, especially in situations where class imbalance can exist. Because F1 incorporates this effect, we employed F1 as one of the evaluation metrics. We applied such a metric following Alswaidan and Menai [42]. In addition, it is necessary to analyze the reliability of the system. As outlined by Matindife et al. [23], Kappa can infer the agreement of the system. In this manner, we can verify the reliability of the proposed approach. The Kappa statistic ranges from −1 to 1: a value of −1 indicates no agreement, 0 signifies agreement by chance, and 1 denotes total agreement. We employed Kappa according to Sellami and Rhinane [
40] and Cabral et al. [
15].
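All three metrics are available directly in scikit-learn; the labels below are toy values for illustration only, not the paper's results.

```python
from sklearn.metrics import accuracy_score, cohen_kappa_score, f1_score

# Toy labels for a 3-appliance identification task (illustrative only).
y_true = [0, 0, 1, 1, 1, 2, 2, 2, 2, 0]
y_pred = [0, 0, 1, 1, 2, 2, 2, 2, 1, 0]

acc = accuracy_score(y_true, y_pred)               # overall success rate
f1 = f1_score(y_true, y_pred, average="weighted")  # weights classes by support
kappa = cohen_kappa_score(y_true, y_pred)          # chance-corrected agreement

print(acc, round(f1, 4), round(kappa, 4))
```

Note that Kappa can be noticeably lower than accuracy on the same predictions, since it discounts the agreement expected by chance.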
For the results analysis, this study employed one of the most relevant and widely utilized datasets in the load recognition literature, from Kolter and Johnson [
35], the REDD. As highlighted in the table of comparison to other approaches, the REDD is commonly used in the performance evaluation of state-of-the-art approaches. The REDD dataset provides data from eight days of collection from Household 1. According to Kolter and Johnson [
35] and Cabral et al. [
15], the active power of appliances is registered at a frequency of 1/3 Hz. Additionally, this dataset features a wide variety of appliances, particularly in Household 1, which includes the following devices: oven, refrigerator, dishwasher, kitchen oven, lighting, washer dryer, microwave, bathroom GFIs, outlet, heat pump, stove, and unknown devices. From the appliances in Household 1, our system generated 4609 images with a resolution of 32 × 32 pixels (1024 features in total), a sufficient quantity to assess the robustness of the proposed approach. It should be noted that the system used 4609 images with a resolution of
pixels, which resulted in
features. It is pertinent to mention that we did not reduce the number of samples, meaning that the number of images (4609) remained unchanged. Additionally, the images consisted of the electrical activity of the appliances, which are the active power curves that characterize the ON state of each appliance. However, our approach selects the most relevant features, reducing the number of features. Thus, the system reduced the 1024 features to a smaller number, considering the most relevant ones through the proposed approach. It is worth saying that
Section 3 and Algorithm 1 detail the procedure for obtaining these relevant features. In line with Géron [
37], 80% of the total images were allocated for training and 20% for testing, with only the training data used for the hyperparameter search. For all ML architectures in Algorithm 1, we used K = 10 in the hyperparameter search. As discussed in Kuhn et al. [
38], this value provides test error rate estimates without being affected by improper bias or high variance. In addition, we employed the initial number of features
= 1024, i.e., the maximum number of features. There are no restrictions on this value because it acts only as an initial assumption. This choice was not critical because Algorithm 1 determines the suitable number of features
k. As per Algorithm 1, the system uses CEV and the threshold
to impose a feasible value of
k. In simulations, we analyzed different values for
k, such as 32, 64, 128, 256, 512, and 1024 (by adjusting the threshold values). However, to surpass the competitor, Cabral et al. [
15], we needed a threshold of 0.999999, and our system found CEV_r = 0.99999985, where k = 512. It is worth pointing out that the feature selection required more components and, consequently, a higher threshold. On the other hand, to maintain computational efficiency and performance reliability, while the system ran the hyperparameter search, Algorithm 1 applied
the same set of candidate hyperparameter values in each GBM hyperparameter search, i.e., the same candidate set for the XGBoost, LightGBM, and HistGBM searches.
As depicted in
Figure 3E, the proposed system applies grid search with K-fold cross-validation to determine the optimal XGBoost hyperparameters. To perform this procedure, Algorithm 1 uses the candidate values for the max depth search and the same values to define the number of estimators. At the end of this procedure, Algorithm 1 finds the optimal max depth and the optimal number of estimators. At this stage, Algorithm 1 found an optimal max depth of 29 and an optimal number of estimators of 30; in other words, the optimal hyperparameter pair is (29, 30). In this scenario,
Table 2 lists the average results using the optimized XGBoost over 50 runs. Comparing the performance gain between techniques, the accuracy gain of ANOVA over PCA reaches an advantage of 0.87 percentage points (pp). For the F1, this difference increases to a gain of 1.03 pp. Upon examining the agreement index, the Kappa, we observe a gain of 1.42 pp. By checking
Table 2, ANOVA reaches the highest accuracy, F1, and Kappa values—96.75%, 96.64%, and 0.9452, respectively.
Employing LightGBM, Algorithm 1 uses the same candidate values for the search of hyperparameters. During this phase, the system found the optimal hyperparameter pair.
Table 3 shows the average results from 50 iterations with the optimized LightGBM. By measuring performance gains, the proposed approach achieves a gain of 1.07 pp. in accuracy, 1.17 pp. in F1, and 1.87 pp. in Kappa. In
Table 3, ANOVA once again achieves the highest accuracy, F1, and Kappa values.
In this latter scenario, Algorithm 1 employed HistGBM with the same candidate values for hyperparameter tuning and, consequently, found the optimal value for the max depth and the optimal value for the max number of leaf nodes, identifying the optimal hyperparameter pair. Evaluating the performance enhancements reported in
Table 4, the proposed approach results in an accuracy improvement of 0.96 pp., an F1 increase of 1.00 pp., and a Kappa enhancement of 1.61 pp. As shown in
Table 4, the ANOVA method consistently achieves the highest values in accuracy, F1, and Kappa.
Another interesting aspect is the training time, for which
Table 5 lists the average training times for the approaches. Comparing the training times of the strategies, ANOVA–XGBoost presents a training time of 3.67 s, saving approximately 77% of the time compared to PCA–XGBoost, which requires 15.80 s. This means that ANOVA–XGBoost is approximately 4.31 times faster than PCA–XGBoost. Similarly, ANOVA–LightGBM, with a time of 10.22 s, saves about 81% of the time compared to the PCA–LightGBM technique, which takes 52.61 s, making ANOVA–LightGBM approximately 5.15 times faster than PCA–LightGBM. Finally, the ANOVA–HistGBM technique, with a time of 29.79 s, reduces the training time by about 56% compared to PCA–HistGBM, which requires 67.65 s, making ANOVA–HistGBM approximately 2.27 times faster than PCA–HistGBM. Thus, the ANOVA–XGBoost, ANOVA–LightGBM, and ANOVA–HistGBM approaches are more efficient in terms of training time than their respective counterparts.
The load-recognition methods presented in
Table 6 vary in their technical approaches, each combining different feature processing strategies and machine learning models to achieve their objectives. For instance, PCA is employed in various methods, such as those by Huang et al. [
22] and Cabral et al. [
15], due to its ability to reduce data dimensionality without losing crucial information. However, as depicted in
Table 6, there is no definitive approach to feature processing. The authors also employ VI trajectories, GADF, Stockwell transform, APF, and consumption pattern analysis. In this context, the proposed approach innovatively employs the ANOVA
F-test with SelectKBest for feature processing, effectively selecting features that enhance classification performance.
On the other hand, researchers employ a wide diversity of machine learning models in load recognition.
Table 6 shows various architectures, such as LSTM-BP and HT-LSTM, which handle sequential data as variations of recurrent neural networks for device identification. Additionally, many methods, including those by De Baets et al. [
20] and Matindife et al. [
23], frequently use CNNs for automatic feature extraction from complex data. The Artificial Intelligence (AI) models encompass a comprehensive range, including
k-NN, DT, RF, AdaBoost-ELM, and SVM. In this context, our proposed system leads the way in utilizing GBMs, thereby ensuring both robust performance and high reliability.
As highlighted in
Table 6, evaluation metrics vary from the F1-Score, precision, and accuracy to the Kappa index, providing a comprehensive view of model performance across different contexts. However, only the works of Matindife et al. [
23], Cabral et al. [
15], and ours employ a dedicated metric for system agreement evaluation, the Kappa index. In addition, there is no consensus regarding the employed dataset. On the other hand, REDD is the most commonly used dataset to evaluate approaches developed by researchers, particularly in more contemporary studies. This dataset offers a rich diversity of appliances and a substantial dataset size, facilitating thorough analysis.
Comparing the performance of the methods, it is evident that each approach has limitations, reaching different values. Huang et al. [
22] utilize PCA with LSTM-BP on the REDD dataset, whereas De Baets et al. [
20] employ VI trajectories with a CNN, reporting the F1-macro on the Plug Load Appliance Identification Dataset (PLAID). Conversely, Borin et al. [
17] utilize the Stockwell transform with VPC, reaching 90.00% accuracy on a private dataset. More recent methods, such as those by Cabral et al. [
15], employ PCA with different models (k-NN, DT, RF, and SVM) and, as listed in
Table 6, achieve accuracies starting from 94.14% on the REDD dataset. Our pioneering method employs GBMs (LightGBM, HistGBM, and XGBoost) and reaches the highest accuracies, including 96.64% with HistGBM and 96.75% with XGBoost.
Based on the data presented in Table 6, our proposed method demonstrates noteworthy improvements in accuracy over other approaches. The highest accuracy previously reported is 96.31% by Cabral et al. [15] using PCA and SVM on the REDD dataset. Our method, which utilizes the ANOVA F-test with SelectKBest for feature processing and XGBoost for classification, achieves an accuracy of 96.75%, an improvement of 0.44 percentage points. Compared to the next highest accuracies, 95.40% by Qaisar and Alsharif [18] with SVM and 94.80% by Zhiren et al. [21] with AdaBoost-ELM, our method shows enhancements of 1.35 and 1.95 percentage points, respectively. Overall, our approach yields a clear performance increase over the CNN of Faustine and Pereira [16] (94.00%), the k-NN of Soe and Belleudy [19] (94.05%), and PCA with DT by Cabral et al. [15] (94.14%), with gains of 2.75, 2.70, and 2.61 percentage points, respectively. Although these methods employ different databases, compared to the CNN of Matindife et al. [23] (83.33%), the VPC of Borin et al. [17] (90.00%), and the HT-LSTM of Heo et al. [24] (90.04%), the improvements are 13.42, 6.75, and 6.71 percentage points, respectively.
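The percentage-point gains quoted above are simple differences of accuracies; the quick sketch below reproduces the XGBoost (96.75%) comparisons, with baseline labels abbreviated purely for illustration.

```python
# Sanity check of the percentage-point gains for ANOVA + XGBoost (96.75%).
# Keys are shorthand for the cited baselines, not names from the paper.
baselines = {"Cabral (PCA+SVM)": 96.31, "Qaisar (SVM)": 95.40,
             "Zhiren (AdaBoost-ELM)": 94.80, "Faustine (CNN)": 94.00,
             "Soe (k-NN)": 94.05, "Cabral (PCA+DT)": 94.14,
             "Matindife (CNN)": 83.33, "Borin (VPC)": 90.00,
             "Heo (HT-LSTM)": 90.04}
# Gain in percentage points = proposed accuracy minus baseline accuracy
gains = {name: round(96.75 - acc, 2) for name, acc in baselines.items()}
```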
On the other hand, our proposed ANOVA–HistGBM method achieves an accuracy of 96.64%. This result represents a gain of 0.93 percentage points over the 95.71% reported by Cabral et al. [15]. Compared with Qaisar and Alsharif [18], who reach 95.40%, and Zhiren et al. [21], at 94.80%, the ANOVA–HistGBM method shows improvements of 1.24 and 1.84 percentage points, respectively. Furthermore, it outperforms the CNN of Faustine and Pereira [16] (94.00%), the k-NN of Soe and Belleudy [19] (94.05%), and PCA with DT by Cabral et al. [15] (94.14%), with enhancements of 2.64, 2.59, and 2.50 percentage points, respectively. Compared to the CNN of Matindife et al. [23] (83.33%), the VPC of Borin et al. [17] (90.00%), and the HT-LSTM of Heo et al. [24] (90.04%), ANOVA–HistGBM exhibits gains of 13.31, 6.64, and 6.60 percentage points, respectively.
When applying the ANOVA F-test with SelectKBest and LightGBM, our method achieves an accuracy of 96.42%, an increase of 0.11 percentage points over the highest accuracy reported by Cabral et al. [15]. Compared to Qaisar and Alsharif [18], at 95.40%, and Zhiren et al. [21], at 94.80%, our method exhibits improvements of 1.02 and 1.62 percentage points, respectively. It also demonstrates gains over the CNN of Faustine and Pereira [16] (94.00%), the k-NN of Soe and Belleudy [19] (94.05%), and PCA with DT by Cabral et al. [15] (94.14%), with improvements of 2.42, 2.37, and 2.28 percentage points, respectively. The method likewise outperforms the CNN of Matindife et al. [23] (83.33%), the VPC of Borin et al. [17] (90.00%), and the HT-LSTM of Heo et al. [24] (90.04%), with improvements of 13.09, 6.42, and 6.38 percentage points, respectively. Finally, these results underscore the efficacy of our approach in achieving superior accuracy across a diverse set of benchmark comparisons. We attribute the performance gains of the proposed method over its direct rival to the feature selection technique, which retains only the most significant features.