1. Introduction
Smart grids (SG) are among the most important forms of the Internet of Things (IoT), bringing comfort to users through the easily managed production, distribution, and utilization of energy [1]. An SG provides reliability, flexibility, and efficiency of power systems to consumers [2,3]. With the auspicious development of the SG, it has attracted expanding consideration from states, industries, and researchers. Recent research indicates that the SG market will increase from USD 23.8 billion in 2018 to USD 61.3 billion by 2023 [4]. SG can be enhanced further by integrating it with new technologies such as machine learning (ML), cloud computing, and fifth-generation (5G) cellular networks [5]. With the popularity of the IoT, appliances such as smart meters (SMs) can produce an enormous amount of data [6]. Data-driven AI technologies could benefit from this data to enhance the user experience with customized energy strategies. These technologies also enable service providers (SPs) to better predict power consumption and increase profits [7].
Despite the conspicuous features of SMs, they endanger the user’s privacy [8,9]. For instance, SPs can easily infer a consumer’s routines and daily lifestyle from the real-time energy consumption data collected by SMs. Moreover, knowledge of these trends can even result in crimes such as energy theft. In particular, client energy consumption data needs to be transferred to the central server (CS) for knowledge extraction [10,11], which, as a result, compromises the security and confidentiality of the user energy data. Previous studies show significant annual financial losses due to energy theft, e.g., Canada faces a loss of USD 100 million [12], USD 170 million are lost in the United Kingdom [13], and in the United States, energy theft can cause yearly losses of USD 6 billion [14].
Federated learning (FL), also known as collaborative learning, is a novel ML approach that trains the ML model over different devices. These devices hold different local data samples and are placed at various locations. FL allows SPs to extract insights from users’ data while enabling clients to keep their private data on their respective devices [15].
Figure 1 depicts the generic framework for FL-enabled SGs. Here, only the parameters of the ML model need to be shared with the CS, while the user data remain secure on the trusted theft detection station (TDS). The TDS downloads the parameters from the CS and performs training and evaluation of AI models using the local data. FL is an iterative process that is repeated for a specific number of iterations, until a predefined accuracy is achieved [16], or until the loss of the ML model is minimized [17]. The CS, instead of receiving all the user data, only aggregates the parameters of the model from the various TDSs, which enables collaborative learning.
Despite the benefits of FL, it faces some basic challenges when implementing ML models. For instance, cutting-edge deep neural networks (DNNs) have been used extensively for identifying energy theft in SGs. The problem with these models is that they require significant computing resources, which is not a viable option for resource-constrained SGs. Additionally, these models have low accuracy and precision in energy theft detection [17,18,19]. Motivated by this, this research proposes a novel energy theft detection model for SGs that has relatively high accuracy and can preserve consumer data privacy.
Contributions
The major contributions of this research are summed up below.
- (1) First, a novel framework, federated data privacy (FedDP), is presented for energy theft detection in SGs.
- (2) Secondly, to improve the accuracy and to avoid the bias of any single ML algorithm, an ensemble learning classifier, named the federated voting classifier (FVC), is proposed in a federated learning environment.
- (3) FVC can identify energy theft in SGs, even in the presence of highly unbalanced data, with an accuracy of 91.67%, which is relatively better performance than existing techniques.
- (4) Moreover, FVC can also significantly outperform other state-of-the-art algorithms in terms of execution time when implemented on the same hardware.
More specifically, FedDP has significant characteristics. First, energy utilization data from SMs are kept on TDSs, which can preserve their privacy. Second, all TDSs can cooperatively train and evaluate the ML model by applying FL, in which only the parameters of the ML model are shared with the CS for aggregation. FVC takes the consensus of traditional ML models, namely, random forests (RF), k-nearest neighbors (KNN), and bagging classifiers (BG). Experiments on a real energy usage dataset show that FVC can surpass other advanced models in terms of precision and log loss. The major contributions of this research are illustrated in Figure 2 below.
3. Methodology
In this section, the fundamental ideas of the proposed FedDP architecture are introduced. Mainly, the FedDP model is explained, and the proposed FVC model is elaborated.
3.1. System Model
As described earlier, federated data privacy (FedDP) is designed as a privacy-preserving FL framework that can exploit ML classifiers to identify energy theft in a distributed fashion. FL is a platform that supports ML classification algorithms over multiple decentralized clients that hold the local data samples. This provides data privacy by enabling on-device prediction without sharing the complete data. Moreover, for predicting energy theft in SGs, FedDP proposes a voting classifier in a federated manner. The voting classifier is an ensemble learning classification model that operates by taking the consensus of different ML classification models to predict the final class.
FedDP is a two-tier framework that has two major constituents.
- (1) Theft detection station (TDS): A TDS can obtain real-time data on energy utilization from the group of SMs in its vicinity. This research assumes that (1) the wired or wireless connection between an SM and a TDS is secure; (2) TDSs are low-powered devices but have sufficient storage and processing power to store the data and train the ML model; (3) each TDS can automatically infer the data label that associates the previous data of an SM with electricity theft; and (4) a TDS can securely communicate with the CS to exchange the ML model parameters. In FedDP, a TDS is considered a federated client. During the training stage, TDSs download the model parameters (e.g., weights in neural networks; number of neighbors, leaf_size, etc., in KNN) from the server and evaluate them using local data.
- (2) Central server (CS): The CS initializes the FL process and is responsible for broadcasting the default parameters and learning models to all TDSs. In the FL process, the CS receives the model parameters, aggregates them, and broadcasts the improved parameters to all TDSs.
3.2. Privacy-Preserving FedDP
As the energy-related data of each user is limited, each TDS relies on collecting a large amount of high-quality data from its SMs. In each training round, TDS $k$ can determine a local dataset $D_k$ for training the global classification model $f(\cdot; w)$. Additionally, $D_k$ can be represented as a collection of input samples with their respective labels, $D_k = \{(x_i, y_i)\}$, where $x_i$ is a single SM record and $y_i$ is its corresponding label (i.e., theft or no theft). A fundamental task of any ML model is to learn the mapping of input samples to output labels, i.e., the specification of parameters $w$ such that $f(x_i; w)$ predicts $y_i$ relative to $x_i$ while increasing the accuracy or mitigating the loss [17]. The loss $\ell(x_i, y_i; w)$ identifies the difference between the predicted and true labels of each training instance $(x_i, y_i)$. For all samples in $D_k$, $L_k(w) = \frac{1}{|D_k|} \sum_{(x_i, y_i) \in D_k} \ell(x_i, y_i; w)$ is the mean loss of all instances in $D_k$. Therefore, for each TDS, the overall loss is the mean prediction loss of all instances in its local dataset. The notations used in this paper are listed in the Glossary.
The major purpose of FL is to find the most advantageous parameters of the ML model, i.e., those that minimize the global loss function.
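In standard federated-averaging notation (here $D_k$ is the local dataset of TDS $k$, $\ell$ the per-sample loss, and $w$ the shared model parameters; these symbols are assumed for illustration, not taken from the paper), this objective is typically written as a dataset-size-weighted average of the local losses:

```latex
\min_{w} \; L(w) \;=\; \sum_{k=1}^{K} \frac{|D_k|}{\sum_{j=1}^{K} |D_j|} \, L_k(w),
\qquad
L_k(w) \;=\; \frac{1}{|D_k|} \sum_{(x_i,\, y_i) \in D_k} \ell(x_i, y_i; w)
```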
FedDP proposes a novel framework for collaborative learning of TDSs. It has various phases that are elaborated below and illustrated in Figure 3.
Phase 1: In phase 1 of FedDP, CS initializes CM with any parameters that are necessary for the training of the ML model. Moreover, CS also disseminates these specifications to all the participating clients, i.e., TDS. Once each TDS receives the parameters from CS, the local model is trained on the local heterogeneous dataset to generate a new set of model parameters.
Phase 2: Once the set of TDSs, θ(T), have updated their local parameters, these parameters are sent to the CS for aggregation. The CS executes the federated aggregation algorithm, which computes the average of the parameters received from the individual TDSs. In this way, a global and more accurate model is developed. The job of the CS is to enable collective learning in such a way that each TDS learns from the experience of the other detection stations and builds an accurate machine learning model. This enables the ML model to continually evolve and update itself.
Phase 3: Subsequently, the CS transmits the aggregated parameters to each participating TDS, so that each TDS again trains and evaluates the CM on its local data by integrating these updated parameters. After the third phase, the federated process continues for a specific number of iterations.
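The three phases above amount to an iterative broadcast–train–aggregate loop. A minimal pure-Python sketch follows; the function names and the `local_train` step are hypothetical stand-ins, since the paper does not publish its implementation and real TDS training is model-specific.

```python
def local_train(global_params, local_shift):
    # Phase 1 (client side): stand-in for training on a TDS's local data;
    # a real TDS would fit its classifier starting from the received parameters.
    return [p + s for p, s in zip(global_params, local_shift)]

def federated_average(updates):
    # Phase 2 (server side): the CS averages the parameters from all TDSs.
    n = len(updates)
    return [sum(u[i] for u in updates) / n for i in range(len(updates[0]))]

global_params = [0.0, 0.0]                 # CS initialization of the model
tds_shifts = [[0.2, -0.1], [0.4, 0.1]]     # hypothetical per-TDS data effects

for round_ in range(3):                    # Phase 3: repeat for a fixed number of rounds
    updates = [local_train(global_params, s) for s in tds_shifts]
    global_params = federated_average(updates)

print(global_params)
```

Only parameter vectors cross the network in this loop; the per-TDS data (here abstracted as `tds_shifts`) never leaves the client.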
3.3. Predictive Methods
Several classification algorithms are used to predict energy theft in the network. These models are trained and evaluated on the training and test data, respectively. This research proposes a voting-based ensemble of RF, KNN, and BG in a federated manner. To the best of our knowledge, conventional ML algorithms have rarely been used in FL. These classifiers are briefly described below.
3.3.1. Random Forest (RF)
RF was inspired by decision-tree learning and was first proposed by Breiman in 2001 [35] as a classifier. It consists of an ensemble of tree-structured classifiers such that each tree depends on the values of an independently sampled random vector with the same distribution, and each tree votes for the most frequent class in the dataset. In RF, every node is split using the best attribute among a subspace of features picked at random for that node. This technique is robust and performs well compared with other regularly used ML models. However, its performance depends on the number of trees to grow and the number of candidate features randomly selected at each split. Generally, the user specifies the number of trees in RF, starting from a low number and slowly increasing it. This study uses 10 trees to train the RF model. Moreover, the model uses the Gini index as the criterion for splitting the dataset into the subset of data at each node of each tree.
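As a concrete sketch of these settings (10 trees, Gini criterion) using scikit-learn; the library choice and the synthetic, imbalanced data are illustrative assumptions, not the paper's actual pipeline:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# synthetic stand-in for SM consumption records (majority: no theft)
X, y = make_classification(n_samples=500, n_features=10,
                           weights=[0.8, 0.2], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# 10 trees, Gini impurity as the node-splitting criterion
rf = RandomForestClassifier(n_estimators=10, criterion="gini", random_state=0)
rf.fit(X_tr, y_tr)
acc = rf.score(X_te, y_te)
print(f"RF test accuracy: {acc:.2f}")
```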
3.3.2. k-Nearest Neighbors
The k-nearest neighbors (KNN) classification technique was proposed by Cover et al. in [36]. It is a classification method that classifies test data by looking at the nearby set of previously classified samples. Two choices should be kept in mind while executing KNN: the value of $k$, which determines how many neighbors ought to be viewed to characterize the test sample, and the distance metric used to evaluate the distance between the test sample and the previously classified samples (i.e., the training samples). This study uses 3 as the value of $k$, and the Euclidean distance function calculates the interspace between samples. To optimize memory utilization, leaf_size is set to 5. These values were chosen on a trial-and-error basis. If $d(x, y)$ denotes the distance between datapoints $x$ and $y$, and $x_i$ and $y_i$ constitute the values of the $i$-th variable for samples $x$ and $y$, respectively, then $d(x, y)$ can be given as:

$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$
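The distance $d(x, y)$ and the study's KNN settings ($k = 3$, leaf_size = 5) can be sketched with scikit-learn as follows; the toy two-cluster data is purely illustrative:

```python
import math
from sklearn.neighbors import KNeighborsClassifier

def euclidean(x, y):
    # Euclidean distance between two sample vectors, per the formula in the text
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

print(euclidean([0, 0], [3, 4]))  # 5.0

# two well-separated toy clusters standing in for theft / no-theft records
X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y = [0, 0, 0, 1, 1, 1]

knn = KNeighborsClassifier(n_neighbors=3, leaf_size=5, metric="euclidean")
knn.fit(X, y)
print(knn.predict([[0.5, 0.5], [5.5, 5.5]]))  # each query takes its cluster's label
```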
3.3.3. Bagging Classifier (BG)
In 1996, Breiman [37] first presented bagging predictors as a method for creating multiple versions of a predictor and using them to form an aggregated predictor. It is an ensemble model that aims to further boost the strength and accuracy of a machine learning model. It takes the mean over the versions when predicting a numerical outcome and takes a vote when predicting a class. The essential concept of the bagging classifier is that it generates many “weak learners” and utilizes them to construct a “strong learner”. It builds numerous decision trees (DTs) as weak learners and merges them to produce a strong learner. Each tree casts a vote in favor of a class, and the final prediction is the class that receives the most votes. Using decision trees in the bagging classifier suits our dataset, which has a high disparity among classes: DTs behave well by weighting the outputs of the trees, lessening the variance of the dataset, and avoiding over-fitting. This study uses an ensemble of 10 DTs, with the majority class of their votes giving the final prediction.
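A sketch of this setup with scikit-learn, whose `BaggingClassifier` uses a decision tree as its default base estimator, so this is an ensemble of 10 DTs voting on the class; the imbalanced synthetic data echoes the class disparity noted above but is an assumption:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split

# imbalanced synthetic data (majority: no theft)
X, y = make_classification(n_samples=500, n_features=10,
                           weights=[0.85, 0.15], random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

# 10 bootstrap-trained decision trees; final class = majority vote
bg = BaggingClassifier(n_estimators=10, random_state=1)
bg.fit(X_tr, y_tr)
acc = bg.score(X_te, y_te)
print(f"Bagging test accuracy: {acc:.2f}")
```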
3.4. Proposed FVC Algorithm
This research proposes a novel federated voting classifier (FVC) for energy theft detection. FVC is an ensemble technique that combines various ML models to make one optimal predictive model. The ensemble combines the output of different ML models, regarding them as a “committee” of multiple decision-makers. FVC is similar to the bagging classifier, with the main difference being that the bagging classifier uses an ensemble of decision trees, whereas FVC takes the vote of RF, KNN, and BG, as shown in Figure 4. Every classifier in the voting classifier is run independently on the training data, and FVC uses the majority consensus of all the models. For example, let the individual models RF, KNN, and BG in FVC be given by $C_1$, $C_2$, and $C_3$, and let their corresponding predictions be specified by $p_1$, $p_2$, and $p_3$, respectively. Moreover, if $\hat{y}$ represents the final prediction of the FVC model, then the consensus of FVC can be given by:

$\hat{y} = \operatorname{mode}(p_1, p_2, p_3)$
FVC is trained on each client on the local data, partitioned into training and test sets. Furthermore, evaluation of the FVC model is performed on the TDS. After that, each client sends its model parameters to the server, which executes the aggregation of the parameters from each TDS. These rounds are repeated multiple times until an optimized model is achieved.
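The voting step on a single client can be sketched with scikit-learn's hard-voting ensemble over the three base models named above; the federated parameter exchange is not shown, and the hyperparameters and synthetic data are illustrative assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (RandomForestClassifier, BaggingClassifier,
                              VotingClassifier)
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=600, n_features=10,
                           weights=[0.8, 0.2], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# hard voting = majority consensus of the three classifiers' predicted labels
fvc = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=10, random_state=0)),
        ("knn", KNeighborsClassifier(n_neighbors=3, leaf_size=5)),
        ("bg", BaggingClassifier(n_estimators=10, random_state=0)),
    ],
    voting="hard",
)
fvc.fit(X_tr, y_tr)
acc = fvc.score(X_te, y_te)
print(f"Voting-ensemble test accuracy: {acc:.2f}")
```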
6. Conclusions
This study investigates how to identify energy theft in SGs while ensuring the privacy of users’ data. A novel federated framework, FedDP, is proposed that uses traditional ML algorithms to predict energy theft behaviors. FedDP first ensures the privacy of the data sensed by the SMs; then, to enable collaborative learning, a server-based approach is used to aggregate the parameters of the ML classifier with a computational overhead of approximately 35 s. This overhead is significantly better when compared with state-of-the-art models. The current research also proposes a novel federated voting classifier (FVC) to accurately detect energy theft on a real-world dataset. Comparative results demonstrate that the proposed FVC performs better than other models, with the highest accuracy and precision of 91.67% and 89.03%, respectively, among the models compared. Moreover, slight improvements can also be seen in terms of F-measure and recall (Table 5). The efficacy of the FVC classifier was also estimated using RMSE and log loss. Results illustrate that FVC has the lowest loss (i.e., an RMSE of 0.2884 and a log loss of 0.5050). In future work, we may also include security schemes to protect the parameters when they are exchanged between server and client. Moreover, it is possible for the CS to learn user behavior from the parameters it receives. To avoid this, the parameters could be encrypted using homomorphic encryption so that the CS cannot even inspect them.