Article

An Unsupervised Abnormal Power Consumption Detection Method Combining Multi-Cluster Feature Selection and the Gaussian Mixture Model

1
State Grid AnHui Marketing Service Center, Hefei 230000, China
2
China Electric Power Research Institute, Beijing 100192, China
*
Author to whom correspondence should be addressed.
Electronics 2024, 13(17), 3446; https://doi.org/10.3390/electronics13173446
Submission received: 29 July 2024 / Revised: 25 August 2024 / Accepted: 27 August 2024 / Published: 30 August 2024

Abstract

Power theft and other abnormal power consumption behaviors seriously affect the safety, reliability, and stability of the power grid system. The traditional abnormal power consumption detection methods have complex models and low accuracy. In this paper, an unsupervised abnormal power consumption detection method based on multi-cluster feature selection and the Gaussian mixture model is proposed. First of all, twelve features are extracted from the load sequence to reflect the overall form, fluctuation, and change trend of the user’s electricity consumption. Then, the multi-cluster feature selection algorithm is employed to select a subset of important features. Finally, based on the selected features, the Gaussian mixture model is formulated to cluster the normal power users and abnormal power users into different groups, so as to realize abnormal power consumption detection. The proposed method is evaluated through experiments based on a power load dataset from Anhui Province, China. The results show that the proposed method works well for abnormal power consumption detection, with significantly superior performance compared to the traditional approaches in terms of the popular binary evaluation indicators like recall rate, precision rate, and F-score.

1. Introduction

With the large-scale popularization of intelligent energy meters and the gradual improvement of integrated reading systems, it has become possible to collect, transmit, and store a large amount of fine-grained information regarding the consumption of electricity (including reactive power, energy consumption, active power, and so on) based on advanced metering infrastructure [1]. Correspondingly, the power industry has transitioned into the era of big data, which implies restructuring the core value of electricity and shifting the paradigm of its development [2]. By exploring personalized market demand and supporting the healthy development of enterprises themselves, the industry is shifting from a production-centered model to a customer-centered one, thereby promoting and accelerating its transformation toward a green development mode of low energy consumption, low emissions, and high efficiency [3]. At present, power distribution big data are promoting the transformation of the traditional business model to the big data business model [4,5]. In particular, the former is based on the physical model of the power grid, while the latter is based on the correlation of data information.
Generally, the transmission and distribution losses in power grid operation can be divided into two categories [6,7]: technical loss (TL) and non-technical loss (NTL). In detail, TL refers to the loss of electrical energy generated by components such as power lines and transformers due to their own heating during the transmission of electrical energy. These losses can be predicted and will decrease with the progression of technology. Conversely, NTL denotes the residual portion of power transmission and distribution loss that remains unaccounted for even after removing TL, where the abnormal behavior of users like stealing electricity is the main cause of NTL. Power theft and other abnormal power consumption behaviors seriously affect the safety, reliability, and stability of the power grid system. To be specific, electricity theft is generally realized through private tampering with metering devices and power lines, which has serious security risks. The existence of NTL also interferes with the calculation of power grid operation parameters, thus endangering the reliability of power system operation. In addition, NTL can also cause economic losses in grid operation. According to a research report released in January 2017 by Northeast Group [8], a US-based firm specializing in smart grid consulting services, most of the 50 developing countries surveyed face severe NTL, with a combined annual deficit of USD 64.7 billion. Therefore, abnormal power consumption detection (APCD) is particularly important for the operation of independent power distribution and sales companies [9]. In particular, according to [10,11], any unexpected emergencies that might occur in a smart grid have a great impact on the system in terms of autonomy, sustainability, and cost. The consumers’ need for a low System Average Interruption Duration Index (SAIDI) and System Average Interruption Frequency Index (SAIFI) is very urgent.
Fundamentally speaking, APCD can identify faulty meters and perpetrators of electricity theft. The traditional APCD methods rely on the distribution company to send technicians for manual screening, and there are also some methods that rely on camera or drone monitoring to prevent the occurrence of electricity theft [12]. Unfortunately, these methods are usually time-consuming and labor-intensive [13]. With the popularity of smart meters, power companies can collect a large amount of data about power loads, providing a rich data source for data-driven abnormal power consumption detection. The existing APCD methods using power big data analysis techniques and advanced computer algorithms are able to locate abnormal power users more accurately and analyze their behavior more carefully [14]. Typically, these APCD methods include statistical methods, support vector machines, and cluster analysis, among others. From the perspective of data mining, these methods can be roughly categorized into unsupervised learning and supervised learning [15]. Supervised learning requires a training set—that is, it needs to know the type of electricity consumption behavior of some users (e.g., normal, abnormal). In practice, however, it is difficult to obtain enough labelled training samples. Unsupervised learning is characterized by the absence of training samples in advance—that is, it learns from data sets without conceptual labels (i.e., class labels) to discover structural knowledge hidden in the data [16]. Therefore, this paper focuses on the unsupervised APCD method without a training set—that is, in the case that all user types are unknown—by analyzing the relationships between users to find outliers.
The general procedure of unsupervised APCD consists of four stages [17]: data collection, feature extraction, feature selection, and anomaly detection. During the data collection phase, the power user data are collected by the smart meters for subsequent analysis. During the feature extraction phase, we extract some representative features from the gathered data to better reveal the intrinsic information hidden in the data. In the stage of feature selection, a subset of salient features is selected from all features to describe each sample more finely. In the stage of anomaly detection, the normal power users and abnormal power users are distinguished based on the selected features. Among the above four stages, feature selection and anomaly detection are two crucial stages playing a vital role in improving the detection effect. In particular, feature selection endeavors to choose a subset of features that are most pertinent and informative by means of some specific evaluation criteria [18]. Hence, it holds great potential for reducing computational complexity by removing the irrelevant or redundant features and improving the detection accuracy by retaining the most salient features. Moreover, advanced anomaly detection algorithms are key to achieving high detection accuracy.
Over the past few years, plenty of feature selection algorithms have been developed and extensively applied to abnormal power consumption detection [19,20]. One type of efficient feature selection method is filtering, which independently scores features one by one and selects the top-ranked features; it includes the distance evaluation technique, the compensation distance evaluation technique, ReliefF [21], the Laplacian score [22], and self-weighting [23]. The scores can reflect the ability of each feature to discern distinct classes or clusters. However, these filter methods ignore the intrinsic correlation between features, leading to difficulty in selecting an optimal subset of features. In addition, the task of anomaly detection can be realized through clustering algorithms, where the normal samples and abnormal samples are grouped into distinct clusters. K-means [24] is a very simple and popular clustering algorithm which partitions each sample into one cluster in a hard-assignment manner (i.e., the membership is 1 or 0), whereas real-world samples are better described as belonging to classes in a probabilistic way. Moreover, its use of Euclidean distance as a similarity measure limits its ability to identify complex non-linear usage structures. As an alternative, the Gaussian mixture model (GMM) [25] is an effective generative and probabilistic clustering method which formulates its model based on the assumption that the collected data come from multiple Gaussian distributions, with each data cluster described by a separate Gaussian distribution. A detailed comparison of K-means and GMM was conducted in [26], and the results show that GMM can provide better clustering performance.
In this work, we present an effective unsupervised abnormal power consumption detection method, including the modules of data collection, feature extraction, feature selection, and anomaly detection. Firstly, a variety of characteristic quantities are extracted from the load dataset to characterize the consumption patterns of the users. Secondly, the multi-cluster feature selection (MCFS) [27] algorithm is employed to determine important features to realize dimensionality reduction, and the load sequence of all users is mapped to a two-dimensional plane after dimensionality reduction. In particular, MCFS proposes a systematic approach to assessing correlations among features without label information. It is capable of selecting features that can best preserve the multi-cluster structure of the data. Thirdly, GMM is used for data clustering so as to detect different types of electricity users, and its performance is evaluated by means of a series of typical evaluation indicators. Finally, the proposed method is applied to detect the data of smart meters from a region in Anhui Province, China, and its effectiveness is validated.
The remainder of this paper is organized as follows. Section 2 describes the proposed abnormal power consumption detection method, where the basic principles about MCFS and GMM are provided in detail. In Section 3, we present the experimental verifications and discuss the results. Finally, the conclusions are given in Section 4.

2. The Proposed Method

This work presents an effective unsupervised abnormal power consumption detection method including four modules, as shown in Figure 1. Details about these modules are described below.

2.1. Data Collection

The users’ power load data collected by the smart meters constitute the dataset. This dataset contains N electricity users’ electricity consumption data for H months, where each user’s consumption pattern is expressed by its monthly average load. Every electricity user can thus be represented by a load sequence with H data points, so the dataset is composed of a total of N H-dimensional samples. Each sample undergoes preprocessing by the “min–max” normalization strategy before implementation, as illustrated in Equation (1):
$$ s' = \frac{s - \min(s)}{\max(s) - \min(s)}, $$
where $s$ and $s'$ stand for the data sample before and after “min–max” normalization, respectively.
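As a quick illustration, Equation (1) can be implemented in a few lines; the guard for a constant (zero-range) sequence is an added assumption of this sketch, which the paper does not discuss:

```python
import numpy as np

def min_max_normalize(s):
    """Scale a load sequence into [0, 1] as in Equation (1)."""
    s = np.asarray(s, dtype=float)
    rng = s.max() - s.min()
    if rng == 0:                    # constant sequence: map everything to 0
        return np.zeros_like(s)
    return (s - s.min()) / rng
```

For example, `min_max_normalize([2, 4, 6])` yields `[0.0, 0.5, 1.0]`.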

2.2. Feature Extraction

Feature extraction aims at extracting informative indexes to characterize a user’s electricity consumption pattern, and these indexes can work together to reflect the overall form, fluctuation, and change trend of the user’s electricity consumption. Here, three kinds of typical indexes are adopted and described as follows [28,29,30].

2.2.1. Morphological Index

The four most commonly used morphological indexes are (1) the average monthly electricity consumption; (2) the monthly electricity consumption rate, which can be expressed as the ratio between the average power consumption and the maximum power consumption; (3) the peak–valley difference rate of monthly electricity consumption, namely the ratio of the difference between the maximum and minimum power consumption to the maximum power consumption; and (4) the proportion of electricity consumed in each quarter to the annual electricity consumption.
It should be noted that the monthly electricity consumption rate reflects the overall change in electricity consumption, the peak–valley difference rate reflects the magnitude of electricity consumption change, and the share of electricity usage per quarter reflects the distribution of electricity consumption.
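The four morphological indexes above can be sketched as follows, assuming a 12-month load sequence ordered January to December (the grouping of consecutive months into quarters is an assumption of this sketch):

```python
import numpy as np

def morphological_indexes(load):
    """Four morphological indexes for a 12-month load sequence."""
    load = np.asarray(load, dtype=float)
    avg = load.mean()                                     # (1) average monthly consumption
    rate = avg / load.max()                               # (2) monthly consumption rate
    peak_valley = (load.max() - load.min()) / load.max()  # (3) peak–valley difference rate
    quarters = load.reshape(4, 3).sum(axis=1) / load.sum()  # (4) quarterly shares
    return avg, rate, peak_valley, quarters
```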

2.2.2. Fluctuation Index

The three most commonly used fluctuation indexes are (1) the monthly electricity consumption dispersion coefficient, expressed as the ratio of the standard deviation of monthly electricity consumption to the mean monthly electricity usage; (2) the ratio of the monthly electricity consumption dispersion coefficient to the industry monthly electricity consumption dispersion coefficient; and (3) the difference in electricity consumption between the initial and final months.
It is necessary to point out that the dispersion coefficient of each household’s electricity consumption indicates the extent to which a user’s electricity consumption curve deviates from the mean value curve, which can reflect the variability in the user’s electricity consumption in more detail. High electricity consumption recorded in some daily data will exert a more significant influence on the dispersion coefficient. The difference between the beginning and end of the electricity consumption data is based on the monthly electricity consumption data over an extended duration, which can reflect the overall fluctuation in the user’s electricity consumption.
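A minimal sketch of the three fluctuation indexes, where the industry-wide dispersion coefficient `industry_cv` is assumed to be supplied externally (it is computed over all users in the same industry, which is outside this snippet):

```python
import numpy as np

def fluctuation_indexes(load, industry_cv):
    """Three fluctuation indexes for one user's monthly load sequence."""
    load = np.asarray(load, dtype=float)
    cv = load.std() / load.mean()        # (1) dispersion coefficient (std / mean)
    cv_ratio = cv / industry_cv          # (2) ratio to the industry-wide coefficient
    head_tail_diff = load[-1] - load[0]  # (3) difference between final and initial months
    return cv, cv_ratio, head_tail_diff
```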

2.2.3. Trend Index

Two trend indexes reflect the rising trend and declining trend of the monthly electricity consumption series, which can be calculated through the simple moving average method. As introduced previously, each electricity user has H months of electricity consumption data. Denoting $h_1, h_2, \dots, h_H$ as the monthly electricity consumption data, the simple moving average $S_t$ over the previous $n$ points at time $t$ is calculated as follows:
$$ S_t = \frac{h_{t-1} + h_{t-2} + \cdots + h_{t-n}}{n}. $$
With Equation (2), the steps in the calculation of two trend indexes are given as below.
Step 1: The monthly average time series of all users are used as their typical monthly electricity consumption curves.
Step 2: The simple moving average series S is utilized in the computation of the monthly electricity consumption data of each power user.
Step 3: For each user, the actual sequence $h_t$ and the moving average sequence $S_t$ are compared point by point. The points where $h_t$ is greater than $S_t$ are recorded as $a_1, a_2, \dots, a_u$ (that is, the data points where the actual value is greater than the predicted value), while the points where $h_t$ is less than $S_t$ are recorded as $b_1, b_2, \dots, b_v$ (that is, the data points where the actual value is less than the predicted value).
Step 4: The uptrend index tra and the downtrend index trb are calculated based on Equation (3):
$$ tra = \frac{\sum_{i=1}^{u} a_i^2}{u}, \qquad trb = \frac{\sum_{i=1}^{v} b_i^2}{v}. $$
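Steps 1–4 can be sketched as below. The moving-average window length `n` is an assumed parameter of this sketch, and `tra`/`trb` average the squared values of the recorded points $a_i$ and $b_i$, following Equation (3):

```python
import numpy as np

def trend_indexes(h, n=3):
    """Uptrend/downtrend indexes from a monthly series h, using an n-point
    simple moving average of the previous months as the predicted value."""
    h = np.asarray(h, dtype=float)
    a, b = [], []
    for t in range(n, len(h)):
        s_t = h[t - n:t].mean()     # simple moving average, Equation (2)
        if h[t] > s_t:
            a.append(h[t])          # actual value above prediction
        elif h[t] < s_t:
            b.append(h[t])          # actual value below prediction
    tra = np.mean(np.square(a)) if a else 0.0   # uptrend index, Equation (3)
    trb = np.mean(np.square(b)) if b else 0.0   # downtrend index, Equation (3)
    return tra, trb
```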

2.3. Feature Selection

Since there are a large number of features and these features might encompass redundant or overlapping information, it is necessary to carry out dimension reduction (dimensionality reduction) on the dataset to visually represent the power consumption patterns of individual users on a low-dimensional plane and effectively detect abnormal users. In broad terms, dimensionality reduction can be realized through subspace learning or feature selection. Subspace learning involves transforming the initial high-dimensional features into a lower-dimensional subspace by combining them to create new features that represent the data. On the other hand, feature selection focuses on choosing a subset of valuable features from the initial high-dimensional feature set. Feature selection, when contrasted with subspace learning, can maintain the inherent significance of the initial features, thereby offering improved interpretability of the results.
In this work, the MCFS algorithm is employed for feature selection, since it is an effective unsupervised feature selection algorithm noted for its simplicity and efficiency. Fundamentally, MCFS treats feature selection as a two-stage optimization problem, comprising a sparse feature decomposition subproblem and an L1-regularized least squares subproblem. Details about these two subproblems are presented as follows.

2.3.1. Sparse Feature Decomposition Subproblem

Given a sample set $X = [x_1, x_2, \dots, x_N]$ with $N$ data samples and $M$ features, where the $i$th sample $x_i \in \mathbb{R}^M$, the sparse feature decomposition subproblem fundamentally contains three steps.
Step 1: Create a graph whose vertices represent the data points, and connect each data point to its closest neighbors by adding edges between them. There are various methods that are able to determine the weight matrix $W \in \mathbb{R}^{N \times N}$ on the graph, and three commonly used ones are introduced in Equation (4):
$$ W_{ij} = \begin{cases} 1 \text{ or } 0, & \text{0--1 weighting} \\ \exp\!\left(-\dfrac{\|x_i - x_j\|^2}{\sigma}\right), & \text{heat kernel weighting} \\ x_i^{\mathsf T} x_j, & \text{dot-product weighting}, \end{cases} $$
where it is necessary to point out that “0–1 weighting” is straightforward and requires minimal computational effort, making it highly accessible. The $i$th sample $x_i$ and the $j$th sample $x_j$ refer to the $i$th node and the $j$th node, respectively, and $W_{ij}$ measures the similarity between $x_i$ and $x_j$. Moreover, $W$ is processed as $W = (W + W^{\mathsf T})/2$ before implementation, for the purpose of ensuring that $W$ is symmetric.
Step 2: Compute a diagonal matrix $D \in \mathbb{R}^{N \times N}$ and the graph Laplacian $L \in \mathbb{R}^{N \times N}$. In particular, the diagonal elements of $D$ are $D_{ii} = \sum_j W_{ij}$ $(i = 1, 2, \dots, N)$, and the graph Laplacian is $L = D - W$.
Step 3: The sparse feature decomposition subproblem is equivalent to addressing the subsequent generalized eigenvalue problem [31]:
L y = λ D y ,
where the matrix $Y = [y_1, y_2, \dots, y_C]$, and $y_c$ is the eigenvector corresponding to the $c$th smallest eigenvalue of the above generalized eigen-problem. Each row of $Y$ is the flattened (“flat”) embedding of an individual data point, and $C$ denotes the number of clusters within the data; each column of $Y$ describes how the data are distributed along one intrinsic dimension of the cluster structure.
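Steps 1–3 might be sketched with NumPy/SciPy as follows, using the simple 0–1 weighting on a kNN graph; the neighbor count and the use of `scipy.linalg.eigh` for the generalized eigen-problem are implementation choices of this sketch, not prescriptions from the paper:

```python
import numpy as np
from scipy.linalg import eigh
from scipy.spatial.distance import cdist

def spectral_embedding(X, n_neighbors=5, C=5):
    """Steps 1-3: kNN graph with 0-1 weights, graph Laplacian, and the
    eigenvectors of L y = lambda D y with the C smallest eigenvalues."""
    N = X.shape[0]
    dist = cdist(X, X)
    W = np.zeros((N, N))
    for i in range(N):
        nn = np.argsort(dist[i])[1:n_neighbors + 1]  # skip the point itself
        W[i, nn] = 1.0
    W = (W + W.T) / 2                  # symmetrize, as in the text
    D = np.diag(W.sum(axis=1))
    L = D - W                          # graph Laplacian
    vals, vecs = eigh(L, D)            # generalized eigenproblem, Equation (5)
    return vecs[:, :C]                 # "flat" embedding Y, one row per sample
```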

2.3.2. L1-Regularized Least Squares Subproblem

Once the "flat" embedding Y of the data points is obtained, it becomes possible to evaluate the significance of each feature in each inner dimension (each column of Y). This assessment involves determining the contribution of each feature to the differentiation of each cluster. For each row vector of Y, namely y c , one can identify a pertinent subset of features by minimizing the fitting error formulated in Equation (6):
$$ \min_{a_c} \; \| y_c - X^{\mathsf T} a_c \|^2 + \beta \| a_c \|_1, $$
where the vector $a_c \in \mathbb{R}^M$ $(c = 1, 2, \dots, C)$ contains the combination coefficients for different features. Essentially speaking, the non-zero elements in $a_c$ correspond to the most relevant features with respect to $y_c$. Theoretically, Equation (6) is a standard L1-regularized regression problem (the so-called LASSO), which can be efficiently solved by means of existing optimization algorithms such as matching pursuit, orthogonal matching pursuit, least angle regression, etc.
After optimization, one can obtain $C$ sparse coefficient vectors $\{a_c\}_{c=1}^{C} \subset \mathbb{R}^M$. Then, we compute the MCFS score $MCFS_m$ of every feature, namely
$$ MCFS_m = \max_{c \in \{1, \dots, C\}} |a_{c,m}|, $$
where $a_{c,m}$ refers to the $m$th element of the vector $a_c$. The feature with the largest MCFS score is considered the most important. As a result, we can arrange the $M$ MCFS scores in descending order and choose the top-ranked $D$ features $(D < M)$, which are of the greatest importance.
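The second stage can be sketched using scikit-learn's `Lasso` as the L1-regularized solver; the paper mentions LARS-type solvers, so `Lasso` and the value of `beta` here are substitutions assumed purely for illustration:

```python
import numpy as np
from sklearn.linear_model import Lasso

def mcfs_scores(X, Y, beta=0.01):
    """MCFS score per feature: fit one L1-regularized regression per
    embedding column y_c (Equation (6)), then take the max absolute
    coefficient over c (Equation (7)). X is N samples x M features."""
    M = X.shape[1]
    coefs = np.zeros((Y.shape[1], M))
    for c in range(Y.shape[1]):
        coefs[c] = Lasso(alpha=beta, max_iter=10_000).fit(X, Y[:, c]).coef_
    return np.abs(coefs).max(axis=0)   # one MCFS score per feature
```

Sorting the returned scores in descending order and keeping the top-ranked indices yields the selected feature subset.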

2.4. Anomaly Detection

The clustering algorithm is the core of the anomaly detection model in this paper. According to the clustering-based anomaly detection method, samples that do not fall within any cluster are considered as outliers, while samples that are distant from the cluster center can also be classified as outliers. Clustering can divide similar objects into several subsets via the static classification method through some special algorithms, and thus objects within the same subset exhibit certain similarities. Through clustering, the new collection is divided into several clusters (different clusters represent different types of power users). The same type of power users have similar electricity consumption patterns, and through more refined anomaly detection in each type of power user, one can eliminate the mutual interference between different categories of users to a certain extent. A high-quality clustering algorithm needs to meet two requirements: high similarity of data within a class and weak data similarity between classes.
GMM is usually applied to data clustering owing to its three advantages. Firstly, GMM describes the data using Gaussian distribution, which is a common probability distribution with good mathematical properties and interpretability. Secondly, GMM can be adapted to various shapes of data distribution, including round, oval, and irregular shapes. This is because GMM uses a combination of multiple Gaussian distributions to approximate the probability distribution of arbitrary shapes, and is therefore highly adaptable. Thirdly, GMM takes the probability distribution of the data into account when clustering, so it can better handle noisy data and outliers. At the same time, the clustering results of GMM have the characteristics of soft clustering—that is, the data points can belong to multiple clustering centers, which can better reflect the real situation of the data. Therefore, GMM is utilized for data clustering in this work, through which different types of electricity users are clustered into different groups and displayed visually on a two-dimensional plane. In the visualization, a small number of abnormal power users are clearly distinguished from normal users.
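In practice, this clustering step can be prototyped with scikit-learn's `GaussianMixture` on synthetic two-dimensional features; the data below are fabricated for illustration only and merely stand in for a selected feature pair:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Synthetic 2-D features: a dense "normal" group and a small "abnormal" group
normal = rng.normal(loc=[0.3, 0.3], scale=0.05, size=(500, 2))
abnormal = rng.normal(loc=[0.9, 0.9], scale=0.05, size=(10, 2))
X = np.vstack([normal, abnormal])

gmm = GaussianMixture(n_components=2, max_iter=100, random_state=0).fit(X)
labels = gmm.predict(X)          # hard cluster assignments
probs = gmm.predict_proba(X)     # soft memberships (posterior probabilities)
# flag the smaller cluster as the abnormal group
abnormal_cluster = int(np.argmin(np.bincount(labels)))
```

The `predict_proba` output illustrates the soft-clustering property discussed above: each sample receives a posterior membership for every component rather than a single hard label.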

2.4.1. Descriptions about GMM

GMM refers to a linear combination of multiple Gaussian distribution functions, and real-world data can generally be regarded as sampled from a mixture Gaussian distribution. Denoting $F = [f_1, f_2, \dots, f_N]$ as the sample set of $X$ after feature selection, with $f_i \in \mathbb{R}^D$, it is natural to assume that $f_1, f_2, \dots, f_N$ obey a mixture Gaussian distribution [32]:
$$ P(f) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(f \mid \mu_k, \Sigma_k), \qquad \mathcal{N}(f \mid \mu_k, \Sigma_k) = \frac{1}{\sqrt{(2\pi)^D |\Sigma_k|}} \exp\!\left( -\frac{1}{2} (f - \mu_k)^{\mathsf T} \Sigma_k^{-1} (f - \mu_k) \right), $$
where $0 \le \pi_k \le 1$ is the weight coefficient of each sub-Gaussian distribution, $\sum_{k=1}^{K} \pi_k = 1$, and $K$ is the number of Gaussian distributions. The distribution $\mathcal{N}(f \mid \mu_k, \Sigma_k)$ represents the $k$th Gaussian distribution, $\mu_k \in \mathbb{R}^D$ and $\Sigma_k \in \mathbb{R}^{D \times D}$ represent the mean and covariance matrix of the $k$th Gaussian distribution, respectively, and $D$ represents the sample dimensionality after feature selection.
Next, we estimate the model parameters $\{\pi_1, \dots, \pi_K; \mu_1, \dots, \mu_K; \Sigma_1, \dots, \Sigma_K\}$ of GMM, introducing a new $K$-dimensional variable $z = (z_1, z_2, \dots, z_K)^{\mathsf T}$, where $z_k$ only takes the values 0 or 1. $z_k = 1$ indicates that the $k$th class is selected—that is, $P(z_k = 1) = \pi_k$. More mathematically, $z_k$ satisfies two conditions: $z_k \in \{0, 1\}$ and $\sum_{k=1}^{K} z_k = 1$.
Assuming the components of $z$ are independent, the joint probability distribution of the variable $z$ is
$$ p(z) = p(z_1)\, p(z_2) \cdots p(z_K) = \prod_{k=1}^{K} \pi_k^{z_k}. $$
Therefore, we can acquire $p(f \mid z) = \prod_{k=1}^{K} \mathcal{N}(f \mid \mu_k, \Sigma_k)^{z_k}$. Furthermore, we can obtain
$$ p(f) = \sum_{z} p(z)\, p(f \mid z) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(f \mid \mu_k, \Sigma_k). $$
In Equation (10), a latent variable z is introduced, which means that we know that the data have K classes, but when we randomly select a data point, we do not know which class this data point belongs to. Therefore, we use a latent variable to describe this phenomenon. Note that in Bayesian theory, p ( z ) is a prior and p ( f | z ) is the likelihood probability, which is easy to understand and can be used to calculate the posterior probability p ( z | f ) :
$$ \gamma_k = p(z_k = 1 \mid f) = \frac{p(z_k = 1)\, p(f \mid z_k = 1)}{p(f)} = \frac{\pi_k \, \mathcal{N}(f \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \, \mathcal{N}(f \mid \mu_j, \Sigma_j)}, $$
where γ k represents the probability that sample f comes from the kth submodel.

2.4.2. Estimation of GMM Model Parameters Using the Expectation-Maximization Algorithm

The expectation-maximization (EM) algorithm is commonly adopted to estimate the parameters of models containing latent variables. It consists of two alternating steps: the E-step computes the expected latent-variable assignments given the current parameter estimates, and the M-step updates the parameters by maximizing the likelihood function under those assignments. The model parameter set is $\{\pi_1, \dots, \pi_K; \mu_1, \dots, \mu_K; \Sigma_1, \dots, \Sigma_K\}$. For the entire dataset, the likelihood is $L = \prod_{i=1}^{N} P(f_i)$; taking the logarithm yields
$$ \ln L = \sum_{i=1}^{N} \ln P(f_i) = \sum_{i=1}^{N} \ln \sum_{k=1}^{K} \pi_k \, \mathcal{N}(f_i \mid \mu_k, \Sigma_k). $$
Step 1: Taking the partial derivative of Equation (12) with respect to $\mu_k$ and setting $\partial \ln L / \partial \mu_k = 0$, we can obtain
$$ \sum_{i=1}^{N} \frac{\pi_k \, \mathcal{N}(f_i \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \, \mathcal{N}(f_i \mid \mu_j, \Sigma_j)} \, \Sigma_k^{-1} (f_i - \mu_k) = 0 \;\Longrightarrow\; \mu_k = \frac{1}{N_k} \sum_{i=1}^{N} \gamma(z_{ik}) f_i, $$
where $N_k = \sum_{i=1}^{N} \gamma(z_{ik})$ represents the effective number of sample points assigned to the $k$th Gaussian model, $\gamma(z_{ik})$ represents the probability that sample point $f_i$ comes from the $k$th model, and $\mu_k$ is thus a weighted mean of all sample points.
Step 2: Taking the partial derivative of Equation (12) with respect to $\Sigma_k$ and setting $\partial \ln L / \partial \Sigma_k = 0$, we can obtain
$$ \sum_{i=1}^{N} \gamma(z_{ik}) \left[ (f_i - \mu_k)(f_i - \mu_k)^{\mathsf T} - \Sigma_k \right] = 0 \;\Longrightarrow\; \Sigma_k = \frac{1}{\sum_{i=1}^{N} \gamma(z_{ik})} \sum_{i=1}^{N} \gamma(z_{ik}) (f_i - \mu_k)(f_i - \mu_k)^{\mathsf T}. $$
Step 3: This step requires the partial derivative with respect to $\pi_k$, subject to the constraint $\sum_{k=1}^{K} \pi_k = 1$. Therefore, introducing a Lagrange multiplier $\lambda$, we have $\psi = \sum_{i=1}^{N} \ln \sum_{k=1}^{K} \pi_k \, \mathcal{N}(f_i \mid \mu_k, \Sigma_k) + \lambda \left( \sum_{k=1}^{K} \pi_k - 1 \right)$; then
$$ \frac{\partial \psi}{\partial \pi_k} = \sum_{i=1}^{N} \frac{\mathcal{N}(f_i \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \, \mathcal{N}(f_i \mid \mu_j, \Sigma_j)} + \lambda = 0. $$
Multiplying Equation (15) by $\pi_k$ and summing over $i$, we can obtain
$$ \sum_{i=1}^{N} \gamma(z_{ik}) + \lambda \pi_k = N_k + \lambda \pi_k = 0, \qquad \pi_k = -\frac{N_k}{\lambda}. $$
According to Equation (16), we have the following derivation:
$$ \sum_{k=1}^{K} \pi_k = -\sum_{k=1}^{K} \frac{N_k}{\lambda} = -\frac{1}{\lambda} \sum_{k=1}^{K} N_k = -\frac{N}{\lambda} = 1, $$
so that $\lambda = -N$, and the estimation is $\hat{\pi}_k = N_k / N$.
In summary, the detailed procedure of estimating GMM parameters using the EM algorithm is available in Algorithm 1. It is necessary to point out that GMM is applicable to data clustering. To be specific, the model parameters of GMM are learned from the unlabeled data. Then, based on the learned GMM model, the posterior probability of each unlabeled sample is calculated, and the group with the highest posterior probability is determined as the corresponding label for that sample.
Algorithm 1 Estimation of GMM parameters using EM
1: Input: the unlabeled dataset $\{f_1, f_2, \dots, f_N\}$ and $K$
2: Output: $\theta = \{\hat{\pi}_1, \dots, \hat{\pi}_K; \hat{\mu}_1, \dots, \hat{\mu}_K; \hat{\Sigma}_1, \dots, \hat{\Sigma}_K\}$
3: Initialization: $\pi_k = 1/K$, $\mu_k = 0$, $\Sigma_k = I$
4: E-step: calculate the probability $\gamma_{ik}$ of sample $f_i$ coming from the $k$th model using the following equation:
$$ \gamma_{ik} = \frac{\pi_k \, \mathcal{N}(f_i \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \, \mathcal{N}(f_i \mid \mu_j, \Sigma_j)}, \qquad i = 1, 2, \dots, N; \; k = 1, 2, \dots, K $$
5: M-step: update $\hat{\pi}_k$, $\hat{\mu}_k$, $\hat{\Sigma}_k$ using the following equations:
$$ \hat{\pi}_k = \frac{N_k}{N}, \qquad \hat{\mu}_k = \frac{1}{N_k} \sum_{i=1}^{N} \gamma_{ik} f_i, \qquad \hat{\Sigma}_k = \frac{1}{N_k} \sum_{i=1}^{N} \gamma_{ik} (f_i - \hat{\mu}_k)(f_i - \hat{\mu}_k)^{\mathsf T} $$
6: Alternate the E-step and M-step until convergence
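Algorithm 1 might be sketched from scratch as follows. One deviation from the listing is flagged in the comments: the means are initialized from data quantiles rather than zeros, an assumption made here because identical zero means would leave the components indistinguishable:

```python
import numpy as np

def gmm_em(F, K, n_iter=100):
    """EM estimation of GMM parameters following Algorithm 1.
    Deviation from the listing: means are initialized from data quantiles
    instead of zeros, so that the components start distinct."""
    N, D = F.shape
    pi = np.full(K, 1.0 / K)
    mu = np.quantile(F, (np.arange(K) + 1) / (K + 1), axis=0)
    sigma = np.array([np.eye(D) for _ in range(K)])
    for _ in range(n_iter):
        # E-step: responsibilities gamma_ik (step 4)
        gamma = np.empty((N, K))
        for k in range(K):
            diff = F - mu[k]
            inv = np.linalg.inv(sigma[k])
            norm = ((2 * np.pi) ** D * np.linalg.det(sigma[k])) ** -0.5
            quad = np.einsum('ij,jk,ik->i', diff, inv, diff)
            gamma[:, k] = pi[k] * norm * np.exp(-0.5 * quad)
        gamma /= gamma.sum(axis=1, keepdims=True)
        # M-step: closed-form updates for pi, mu, sigma (step 5)
        Nk = gamma.sum(axis=0)
        pi = Nk / N
        mu = (gamma.T @ F) / Nk[:, None]
        for k in range(K):
            diff = F - mu[k]
            sigma[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k] + 1e-6 * np.eye(D)
    return pi, mu, sigma, gamma
```

The small `1e-6` diagonal term keeps each covariance invertible on degenerate clusters; it is a numerical safeguard, not part of the derivation.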

3. Experiment Verification

This section presents an experimental evaluation to verify the proposed method. First of all, experimental data are described in Section 3.1, where the dataset contains a year’s worth of consumer electricity consumption data for a district in Anhui Province, China. After that, detailed results and analysis of abnormal power consumption detection using the proposed method are provided in Section 3.2. Finally, in Section 3.3, we make comparisons between our method and two other classical methods in terms of the recall rate, precision rate, and F-score and discuss the results in detail.

3.1. Data Description

The dataset used in this experiment contains twelve months of power load data from 5424 power users in Anhui Province, China, including 5356 normal power users and 68 abnormal power users; the proportion of abnormal users is around 1.25%. The sampling interval is 30 min. The power data contain abundant information reflecting users' electricity consumption patterns. Thus, each user can be described by a time-domain power signal sample with numerous data points. In total, the collected dataset consists of 5424 samples belonging to two electricity consumption patterns.
It is necessary to point out that some values may be missing during the sampling process, inevitably leading to incomplete information. To this end, a simple data filling strategy is adopted to fix the missing data: each missing point takes the mean value of the points immediately before and after it. After data recovery, each sample is pre-processed via the min–max strategy in Equation (1), so as to obtain a load sequence for each sample.
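The filling rule described above might be implemented as follows, with missing points encoded as NaN (an encoding assumption of this sketch); boundary points fall back to the single available neighbor:

```python
import numpy as np

def fill_missing(seq):
    """Replace each NaN with the mean of its nearest valid neighbors
    before and after; ends use the single available neighbor."""
    seq = np.asarray(seq, dtype=float).copy()
    for i in np.flatnonzero(np.isnan(seq)):
        prev_vals = seq[:i][~np.isnan(seq[:i])]
        next_vals = seq[i + 1:][~np.isnan(seq[i + 1:])]
        neighbors = [v for v in (prev_vals[-1] if prev_vals.size else None,
                                 next_vals[0] if next_vals.size else None)
                     if v is not None]
        seq[i] = np.mean(neighbors)   # mean of the points before and after
    return seq
```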

3.2. Results and Analysis

A total of twelve feature indexes (i.e., M = 12 ) are calculated from each load sequence for each power user, as listed in Table 1. After that, the MCFS algorithm is employed to perform feature selection for them. Regarding the parameter setting for MCFS, ‘0–1’ weighting in Equation (4) is adopted because of its simplicity and the neighbor count is configured to be 5. Empirically, the cluster number C is set to be 5. The twelve MCFS scores for these features are plotted in Figure 2, where it is noted that these values are normalized into the range [0, 1]. From Figure 2, it is clear that the MCFS scores of different features are different, while the scores of F3 and F8 are obviously higher than those of other features. This means that F3 and F8 contain the important information of the whole feature set, and hence both F3 and F8 are selected and determined as the salient and sensitive features, and the rest of the features are simply discarded.
We treat the values of F3 and F8 as the horizontal coordinate and vertical coordinate, respectively, so the load sequence of all users can be mapped as scatter points on a two-dimensional plane, as shown in Figure 3. In this figure, each dot represents a user, with the black dot representing a normal power user and the red dot representing an abnormal power user. It is clear to see that most of the points corresponding to abnormal users are distributed in areas with low density. The core idea of the unsupervised anomaly detection method based on GMM is to give each user the anomaly degree according to the local density.
After feature selection, GMM is utilized to cluster the 5424 power user samples into normal and abnormal groups. Regarding the parameter setting for GMM, the number of Gaussian components K is set to 2, and the maximum number of iterations is fixed at 100. The abnormal power consumption detection results are summarized in the confusion matrix in Table 2, where TP, FP, FN, and TN denote the numbers of true positive, false positive, false negative, and true negative samples, respectively. The table shows that 127 normal users are incorrectly classified as abnormal, whereas only 12 abnormal users are incorrectly classified as normal; the vast majority of users' electricity consumption behavior is correctly identified. From these counts, three well-accepted indicators, i.e., the recall rate, precision rate, and F-score, are computed as defined in Equation (18):
$$\begin{aligned} \text{recall rate} &= \frac{TP}{TP+FN} \times 100\%,\\ \text{precision rate} &= \frac{TP}{TP+FP} \times 100\%,\\ F\text{-score} &= \frac{2\,TP}{2\,TP+FN+FP} \times 100\%, \end{aligned}$$
where all three indicators are well suited to evaluating binary classification problems. Through calculation, the recall rate, precision rate, and F-score of the proposed method reach 97.63%, 99.77%, and 98.69%, respectively, indicating that the proposed method detects abnormal power users with very high accuracy.
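The three indicators follow directly from the confusion matrix in Table 2 (TP = 5229, FP = 12, FN = 127); a short check of the arithmetic:

```python
def binary_metrics(tp, fp, fn):
    """Compute recall rate, precision rate, and F-score (Eq. (18)),
    all expressed as percentages."""
    recall = tp / (tp + fn) * 100
    precision = tp / (tp + fp) * 100
    f_score = 2 * tp / (2 * tp + fn + fp) * 100
    return recall, precision, f_score

# Values from Table 2 reproduce the reported results:
r, p, f = binary_metrics(tp=5229, fp=12, fn=127)
# r ≈ 97.63, p ≈ 99.77, f ≈ 98.69
```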

3.3. Comparison with Other Methods

In order to further verify the effectiveness of the proposed approach, two alternative methods are constructed for comparison, as graphically displayed in Figure 4. In Method 1, PCA replaces MCFS for feature selection, while GMM is still used for anomaly detection. In Method 2, the K-means algorithm replaces GMM for anomaly detection, while MCFS is retained for feature selection. The recall rate, precision rate, and F-score are compared across the three methods.
As shown in Figure 4, the proposed method attains the highest recall rate (97.63%), precision rate (99.77%), and F-score (98.69%), all significantly better than those of the two compared approaches. This demonstrates the superiority and effectiveness of the proposed method for detecting abnormal power consumption. In the proposed method, MCFS selects two salient features by exploiting the multi-cluster structure of the original high-dimensional data, and GMM clusters the data by modeling the multi-source distribution characteristics of complex data; these factors work together to ensure the excellent performance of our method. Moreover, Figure 4 shows that both Method 1 and the proposed method outperform Method 2, indicating that GMM is more effective than K-means for data clustering.
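The component swaps behind Method 1 and Method 2 can be sketched as follows. This is an illustrative harness, assuming scikit-learn implementations of PCA, K-means, and GMM; the helper names are ours:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

def select_top2_pca(X):
    """Method 1's selector: keep the two leading principal components
    instead of the two MCFS-selected raw features."""
    return PCA(n_components=2).fit_transform(X)

def cluster_labels(features, use_gmm=True, seed=0):
    """Method 2 swaps GMM for K-means; both split users into two groups."""
    if use_gmm:
        model = GaussianMixture(n_components=2, random_state=seed)
    else:
        model = KMeans(n_clusters=2, n_init=10, random_state=seed)
    return model.fit_predict(features)
```

Running each selector/clusterer combination on the same feature matrix and scoring the resulting labels with the recall, precision, and F-score definitions above reproduces the kind of comparison shown in Figure 4.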

4. Conclusions

In this paper, an unsupervised abnormal power consumption detection method is proposed that accounts for the highly imbalanced proportions of normal and abnormal users. The method uses the MCFS algorithm for feature selection and GMM for data clustering, and it is evaluated through experiments on twelve months of power load data from Anhui Province, China. The main findings are as follows. First, MCFS, rather than PCA, selects two salient features (the peak–valley difference of monthly electricity consumption and the monthly electricity consumption dispersion coefficient) that dominate the detection performance and clearly separate normal from abnormal users. Second, GMM detects abnormal power consumption behavior more accurately than the commonly used K-means algorithm. In summary, the results show that the proposed method can accurately and effectively detect abnormal power consumption behavior in an unsupervised manner, and the comparison analysis shows that it performs significantly better than traditional approaches in terms of the recall rate, precision rate, and F-score.
Although the proposed method performs well in identifying abnormal power consumption behavior, it still has several limitations. First, it is difficult to accurately detect users with weakly abnormal electricity consumption behavior. Second, the method involves many hyperparameters, such as the heat kernel weighting coefficient, the number of selected features, and the cluster number; improper settings might adversely affect the results. Therefore, future work will focus on a feature discriminant enhancement mechanism to deal with weak classification boundaries, as well as an effective adjustment algorithm to adaptively determine the hyperparameters.

Author Contributions

D.L. (Methodology, Funding acquisition, and Project administration); D.H. (Data Curation and Formal analysis); X.C. (Writing—Original Draft); J.D. (Writing—Review and Editing); L.T. (Writing—Original Draft and Visualization); Z.Z. (Conceptualization and Supervision). All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Science and Technology Project of State Grid “Research on Key Technologies for Integrated Operation of Electricity Consumption Information Collection and Application of Network-wide Collaborative Analysis” Grant No. 5108-202218280A-2-255-XG.

Data Availability Statement

The experimental data are private; they are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Jiang, R.; Lu, R.; Wang, Y.; Luo, J.; Shen, C.; Shen, X. Energy-theft detection issues for advanced metering infrastructure in smart grid. Tsinghua Sci. Technol. 2014, 19, 105–120.
2. di Vimercati, S.D.C.; Facchinetti, D.; Foresti, S.; Livraga, G.; Oldani, G.; Paraboschi, S.; Rossi, M.; Samarati, P. Scalable distributed data anonymization for large datasets. IEEE Trans. Big Data 2022, 9, 818–831.
3. Zhao, B.; Dong, X.; Ren, G.; Liu, J. Optimal user pairing and power allocation in 5G satellite random access networks. IEEE Trans. Wirel. Commun. 2021, 21, 4085–4097.
4. Zhang, Z.; Wang, B.; Liu, M.; Qin, Y.; Wang, J.; Tian, Y.; Ma, J. Limitation of Reactance Perturbation Strategy Against False Data Injection Attacks on IoT-based Smart Grid. IEEE Internet of Things J. 2023, 11, 11619–11631.
5. Weber, M.; Turowski, M.; Çakmak, H.K.; Mikut, R.; Kühnapfel, U.; Hagenmeyer, V. Data-driven copy-paste imputation for energy time series. IEEE Trans. Smart Grid 2021, 12, 5409–5419.
6. Esmael, A.A.; Da Silva, H.H.; Ji, T.; da Silva Torres, R. Non-technical loss detection in power grid using information retrieval approaches: A comparative study. IEEE Access 2021, 9, 40635–40648.
7. Ayokunle, A.; Misra, S.; Oluranti, J.; Ahuja, R. Technical Losses (TL) and Non-technical Losses (NTL) in Nigeria. In Proceedings of the 3rd International Conference on Information Systems and Management Science (ISMS) 2020; Springer: Berlin/Heidelberg, Germany, 2022; pp. 147–159.
8. Massaferro, P.; Di Martino, J.M.; Fernández, A. Fraud detection in electric power distribution: An approach that maximizes the economic return. IEEE Trans. Power Syst. 2019, 35, 703–710.
9. Nimmy, K.; Dilraj, M.; Sankaran, S.; Achuthan, K. Leveraging power consumption for anomaly detection on IoT devices in smart homes. J. Ambient Intell. Humaniz. Comput. 2023, 14, 14045–14056.
10. Fotopoulou, M.; Rakopoulos, D.; Petridis, S.; Drosatos, P. Assessment of smart grid operation under emergency situations. Energy 2024, 287, 129661.
11. Prettico, G.; Marinopoulos, A.; Vitiello, S. Guiding electricity distribution system investments to improve service quality: A European study. Util. Policy 2022, 77, 101381.
12. McLoughlin, F.; Duffy, A.; Conlon, M. A clustering approach to domestic electricity load profile characterisation using smart metering data. Appl. Energy 2015, 141, 190–199.
13. Viegas, J.L.; Esteves, P.R.; Melicio, R.; Mendes, V.; Vieira, S.M. Solutions for detection of non-technical losses in the electricity grid: A review. Renew. Sustain. Energy Rev. 2017, 80, 1256–1268.
14. Leite, J.B.; Mantovani, J.R.S. Detecting and locating non-technical losses in modern distribution networks. IEEE Trans. Smart Grid 2016, 9, 1023–1032.
15. Yap, K.S.; Tiong, S.K.; Nagi, J.; Koh, J.S.; Nagi, F. Comparison of supervised learning techniques for non-technical loss detection in power utility. Int. Rev. Comput. Softw. 2012, 7, 626–636.
16. Júnior, L.A.P.; Ramos, C.C.O.; Rodrigues, D.; Pereira, D.R.; de Souza, A.N.; da Costa, K.A.P.; Papa, J.P. Unsupervised non-technical losses identification through optimum-path forest. Electr. Power Syst. Res. 2016, 140, 413–423.
17. Zhang, W.; Dong, X.; Li, H.; Xu, J.; Wang, D. Unsupervised detection of abnormal electricity consumption behavior based on feature engineering. IEEE Access 2020, 8, 55483–55500.
18. Lu, Z.; Chu, Q. Feature selection using class-level regularized self-representation. Appl. Intell. 2023, 53, 13130–13144.
19. Ahmad, R.; Wazirali, R.; Bsoul, Q.; Abu-Ain, T.; Abu-Ain, W. Feature-selection and mutual-clustering approaches to improve DoS detection and maintain WSNs' lifetime. Sensors 2021, 21, 4821.
20. Teh, H.Y.; Kevin, I.; Wang, K.; Kempa-Liehr, A.W. Expect the unexpected: Unsupervised feature selection for automated sensor anomaly detection. IEEE Sens. J. 2021, 21, 18033–18046.
21. Robnik-Šikonja, M.; Kononenko, I. Theoretical and empirical analysis of ReliefF and RReliefF. Mach. Learn. 2003, 53, 23–69.
22. Si, L.; Wang, Z.; Tan, C.; Liu, X. A feature extraction method based on composite multi-scale permutation entropy and Laplacian score for shearer cutting state recognition. Measurement 2019, 145, 84–93.
23. Wei, Z.; Wang, Y.; He, S.; Bao, J. A novel intelligent method for bearing fault diagnosis based on affinity propagation clustering and adaptive feature selection. Knowl.-Based Syst. 2017, 116, 1–12.
24. Ahmed, M.; Seraj, R.; Islam, S.M.S. The k-means algorithm: A comprehensive survey and performance evaluation. Electronics 2020, 9, 1295.
25. Zhang, H.; Zhu, Z.; Li, H.; He, S. Network Biomarker Detection From Gene Co-Expression Network Using Gaussian Mixture Model Clustering. IEEE/ACM Trans. Comput. Biol. Bioinform. 2023, 20, 3523–3534.
26. Patel, E.; Kushwaha, D.S. Clustering cloud workloads: K-means vs Gaussian mixture model. Procedia Comput. Sci. 2020, 171, 158–167.
27. Cai, D.; Zhang, C.; He, X. Unsupervised feature selection for multi-cluster data. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, 24–28 July 2010; pp. 333–342.
28. Chijie, Z.; Bin, Z.; Jun, H.; Qiushuo, L.; Rong, Z. Anomaly detection for power consumption patterns based on unsupervised learning. Proc. CSEE 2016, 36, 379–387.
29. Junhui, L.; Jiahui, Z.; Gang, M.; Yanfeng, G.; Gangui, Y.; Songjie, S. Day-ahead optimal scheduling strategy of peak regulation for energy storage considering peak and valley characteristics of load. Proc. CSEE 2020, 40, 128–133.
30. Huang, Q.; Tang, Z.; Weng, X.; He, M.; Liu, F.; Yang, M.; Jin, T. A novel electricity theft detection strategy based on dual-time feature fusion and deep learning methods. Energies 2024, 17, 275.
31. Belkin, M.; Niyogi, P. Laplacian eigenmaps and spectral techniques for embedding and clustering. Adv. Neural Inf. Process. Syst. 2001, 14.
32. Guan, Y.; He, S.; Ren, S.; Liu, S.; Li, D. Mixture Gaussian process model with Gaussian mixture distribution for big data. Chemom. Intell. Lab. Syst. 2024, 253, 105201.
Figure 1. The anomaly detection flowchart of electricity consumption behavior.
Figure 2. MCFS scores of twelve features.
Figure 3. Scatter diagram for all customers after mapping using F3 and F8.
Figure 4. Abnormal power consumption detection results for different methods.
Table 1. Descriptions of the extracted features.

Feature Number | Description
F1 | Average monthly electricity consumption
F2 | Monthly electricity consumption rate
F3 | Peak–valley difference of monthly electricity consumption
F4–F7 | Proportion of electricity consumption in each of the four quarters to the annual electricity consumption
F8 | Monthly electricity consumption dispersion coefficient
F9 | Ratio of the monthly electricity consumption dispersion coefficient to the annual dispersion coefficient
F10 | Difference in electricity consumption between the first half year and the second half year
F11–F12 | The uptrend and downtrend indexes
Table 2. Confusion matrix of the abnormal power consumption detection.

 | Normal Users (Real) | Abnormal Users (Real)
Normal users (predicted) | TP = 5229 | FP = 12
Abnormal users (predicted) | FN = 127 | TN = 58
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
