Spatiotemporal Joint Cleaning of Distribution Network Measurement Data Based on Correntropy Criterion with Variable Center Unscented Kalman Filter

Sun, Zhenglong; Liu, Chuanlin; Liu, Baihan; Zhou, Jun; Chen, Chen; Song, Qipeng

doi:10.3390/app12178436


by Zhenglong Sun ¹,*, Chuanlin Liu ¹, Baihan Liu ¹, Jun Zhou ², Chen Chen ² and Qipeng Song ³

¹ Key Laboratory of Modern Power System Simulation and Control & Renewable Energy Technology, Ministry of Education, Northeast Electric Power University, Jilin 132012, China

² NARI Group Corporation (State Grid Electric Power Research Institute), Nanjing 211106, China

³ State Grid Shanghai Energy Interconnection Research Institute Co., Ltd., Pudong New District, Shanghai 201203, China

* Author to whom correspondence should be addressed.

Appl. Sci. 2022, 12(17), 8436; https://doi.org/10.3390/app12178436

Submission received: 31 July 2022 / Revised: 19 August 2022 / Accepted: 19 August 2022 / Published: 24 August 2022


Abstract:

Measurement data cleaning is a key step of edge computing in a distribution network; it is beneficial to improve the state perception and regional autonomy level of a distribution network. According to the temporal and spatial correlation of measurement data in the distribution network, a joint cleaning method of measurement data in a distribution network is proposed based on the correntropy criterion with variable center unscented Kalman filter (CC-VC-UKF). Initially, the mean square error (MSE) in the original unscented Kalman filter (UKF) is replaced by the correntropy criterion with variable center (CC-VC) to improve the accuracy of filtering the measurement data in the distribution network with a non-Gaussian non-zero mean measurement deviation. Then, the measured data of different measuring devices located on the same section of the line are filtered based on the CC-VC-UKF algorithm according to their respective reference time series to improve the signal-to-noise ratio of the measured data. Then, the filtered measured data are filtered and cleaned based on the CC-VC-UKF algorithm according to the space–time joint filtering and cleaning technology. Finally, the method is used to test the measurement data of the distribution network obtained by a power supply company in a city in north China to solve the problem of measurement deviation caused by the existence of space distance. Results show that this method can obtain FTU measurement data with higher precision from network topology based on the filtered TTU measurement data through the media of filtered spatial measurement deviation.

Keywords:

data cleaning; CC-VC; spatiotemporal union; spatial measurement deviation; Kalman algorithm

1. Introduction

With the proposal of the strategic goal of “carbon peaking and carbon neutrality” in China and the large-scale integration of new energy power generation, the traditional power production model has been broken, and the planning, management, and dispatch of power production have become increasingly complex [1]. Power grid data are multisource and heterogeneous, scattered, large in scale, fast-changing, and varied in type [2]. The acquisition of distribution data is the basis for the analysis of distribution network operation [3]. The use of feeder terminal units (FTUs), distribution terminal units (DTUs), and, in recent years, the emerging transformer terminal units (TTUs) has allowed multistate measurement data in the distribution network to be collected in an orderly manner. At the same time, effective cleaning of these measurement data can provide high-quality data sources for multitype protection, transparent fault isolation, and online safety checking of the distribution network [4,5]. Therefore, the cleaning of abnormal data is of great importance in practical engineering applications [1].

Data cleaning is a method used to detect and eliminate errors and inconsistencies in data [6]. The power grid can be simplified into a physical topology composed of power generation nodes and power transmission networks, and the complex physical connections create a strong correlation between nodes and lines. At the same time, because of the inertia of power system equipment and of the system as a whole, power system operation data are correlated in time over a continuous period [7]. At present, research on the identification and processing of abnormal data in the power system can be divided into three categories according to the temporal and spatial correlation analysis of the measured data. The first category cleans the data according to the temporal correlation of the measured data. For example, a wavelet neural network can filter the sampled signal from a DC system, organically combining the time–frequency local characteristics of the wavelet transform with the self-learning and adaptive nature of the neural network. Literature [8] proposed a method that converts the detection of similar and repeated records of text data into the detection of similar and repeated records of their binary strings; it provides ideas for the mutual cleaning of daily load data in the distribution network, which show daily-cycle similarity. Literature [9] proposed a power grid data cleaning and fusion algorithm based on a time series similarity measure, which uses symbol aggregation, the Euclidean algorithm, and similarity weighting adjusted by similar sequences to complete cleaning; it uses a distributed Kalman filtering algorithm to complete data fusion. However, the cleaning algorithm requires a relatively large amount of calculation and is unsuitable for distributed equipment with small computing power.

The second category completes data cleaning according to the spatial correlation of measurement data. Literature [10,11] proposed a state estimation model based on a deep neural network and a power system state estimation model that combines a particle filter with a convolutional neural network; these estimation models demonstrate the spatial correlation of measurement data. Literature [12] used the correlation between fault remote signaling data to group remote signaling displacement data, and then transformed the fault diagnosis problem of remote signaling displacement data into the classification problem of sample data in multidimensional space. Literature [13] proposed a load data repair method based on a collaborative filtering recommendation algorithm, which calculates the load range recommendation degree according to the horizontal correlation of load changes within the distribution network area and realizes rapid data correction for abnormal loads. Literature [14] proposed a system protection method for an integrated pipe gallery power cabin based on multisource heterogeneous data fusion. The method first integrates multiple distributed data sources from the pipe gallery power cabin using middleware technology for data layer fusion. Then, the data with similar characteristics are divided into subspaces, and the proposed LGP method is used to extract features in each subspace. Finally, the features extracted in each subspace are fused, making full use of the spatial correlation of multiple distributed data sources. In fact, data cleaning that relies on temporal or spatial correlation alone cannot sufficiently discriminate dirty data, and the correlation of data should be measured from the dual correlation of time and space [15].

The third category cleans the data according to both the temporal and the spatial correlation of the measured data. Literature [16] used the two-way comparison method to identify and process abnormal data in spatial power load forecasting. This method uses the load variation relative to the load at the previous moment as the criterion for judging whether the data are abnormal for horizontal comparison, and it processes the abnormal data at the same time. For multiyear data, after judging the abnormal values in the load data of each year at the same time, the average value of the normal data is used as the correction value for horizontal comparison. Literature [17] proposed a multilevel cleaning and identification method of measurement data based on the spatial–temporal correlation characteristics of data. The second-level data identification is carried out according to the time–series correlation of the measurement data, and a convolutional neural network is used to establish a spatial–physical correlation model for the third-level data identification. However, when considering the spatial correlation of the measurement equipment, these articles on the spatiotemporal correlation of measurement data ignore the measurement deviation caused by the different spatial positions of the measurement equipment.

To this end, this paper proposes a joint spatiotemporal cleaning technique for distribution network measurement data based on the CC-VC-UKF. In the presence of spatial measurement deviations, the FTU measurement sequence is filtered with the CC-VC-UKF algorithm using the TTU measurement data sequence as the reference sequence. FTU measurement data with higher accuracy can then be obtained from the filtered data sequence and the filtered measurement deviations according to the network topology.

The rest of this paper is organized as follows: The related work is discussed in Section 2; the proposed CC-VC-UKF method is presented in Section 3; combined spatiotemporal cleaning of measurement data in the distribution network based on the CC-VC-UKF algorithm is presented in Section 4; extensive simulations are conducted in Section 5; and finally, the conclusion is given in Section 6.

2. Related Work

Even if the electrical quantities of the same line are measured by multiple measuring devices, the measurement results may contain measurement deviations with a non-zero mean and non-Gaussian distribution because the devices are installed at different locations. To solve this problem, the CC-VC-UKF algorithm is designed to filter and clean the measurement data of devices located at different positions on the same section of a line. Some abnormal FTU data are removed through the data change trend of the TTU, under the inherent existence of the spatial measurement deviation, to ensure the quality of the FTU measurement data.

In this work, we use the flexible center position of the variable center (VC) correntropy to match the non-zero mean deviation distribution, which reduces the computational requirement and addresses the non-zero mean, non-Gaussian measurement deviation caused by the spatial inconsistency of the measuring equipment. According to the characteristics of [18], a spatiotemporal joint cleaning technology of distribution network measurement data based on the correntropy criterion with variable center unscented Kalman filter (CC-VC-UKF) is proposed. The K-means algorithm is used to cluster and weight the historical data [19] to obtain the reference sequence for data error correction. Improving the accuracy of FTU measurement data in the presence of a non-zero mean non-Gaussian measurement deviation provides a new idea for data cleaning in a distribution network.

3. CC-VC-UKF

3.1. Correntropy Criterion with Variable Center

Considering two random variables X and Y, the CC [20] is defined as follows:

V (X, Y) = E [k_{σ} (X, Y)] = \int k_{σ} (x, y) d F_{X, Y} (x, y)

(1)

where $k_{σ} (e) = \frac{1}{σ \sqrt{2 π}} \exp (\frac{- e^{2}}{2 σ^{2}})$ represents the Gaussian kernel function with kernel width $σ$, $e = X - Y$, and $E [\cdot]$ denotes the expectation operator.

The sample mean estimator of the correntropy is usually obtained from a finite number of samples ${\{(x_{i}, y_{i})\}}_{i = 1}^{N}$:

{\hat{V}}_{σ} (X, Y) = \frac{1}{N} \sum_{i = 1}^{N} k_{σ} (x_{i}, y_{i})

(2)

The CC-VC [20] is defined as follows:

V_{σ, c} (x, y) = E [k_{σ} (e - c)] = \frac{1}{σ \sqrt{2 π}} E [\exp (- \frac{{(e - c)}^{2}}{2 σ^{2}})]

(3)

where $c \in R$ is the center position and $e = X - Y$.

Similar to the CC, the CC-VC can be estimated from a finite number of samples, with $e_{i} = x_{i} - y_{i}$:

\hat{V} (X, Y) = \frac{1}{N} \sum_{i = 1}^{N} k_{σ} (e_{i} - c)

(4)

The CC cost function with VC can be defined as follows:

\begin{aligned} {\hat{J}}_{V C L} (X, Y) &= k_{σ} (0, 0) - \hat{V} (X, Y) \\ &= \frac{1}{σ \sqrt{2 π}} \exp (- \frac{c^{2}}{2 σ^{2}}) - \frac{1}{N} \sum_{i = 1}^{N} k_{σ} (e_{i} - c) \end{aligned}

(5)
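
As a rough illustration of Equations (3) to (5), the sketch below evaluates the Gaussian kernel, the sample CC-VC estimate, and the cost function on a toy data set. The function names and the sample values are illustrative assumptions and do not come from the paper; the cost follows the explicit second line of Equation (5).

```python
import math


def gaussian_kernel(e, sigma):
    """Gaussian kernel k_sigma(e) with kernel width sigma."""
    return math.exp(-e * e / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))


def ccvc_estimate(x, y, sigma, c):
    """Sample CC-VC estimate, Eq. (4): mean of k_sigma(e_i - c) with e_i = x_i - y_i."""
    errors = [xi - yi for xi, yi in zip(x, y)]
    return sum(gaussian_kernel(e - c, sigma) for e in errors) / len(errors)


def ccvc_cost(x, y, sigma, c):
    """Cost of Eq. (5), second line: k_sigma evaluated at (0 - c) minus the estimate."""
    return gaussian_kernel(0.0 - c, sigma) - ccvc_estimate(x, y, sigma, c)


# Toy data (assumed values): every error equals 0.5, i.e. a non-zero mean deviation.
x = [1.0, 2.0, 3.0, 4.0]
y = [0.5, 1.5, 2.5, 3.5]

centered = ccvc_estimate(x, y, sigma=1.0, c=0.5)    # center matches the error mean
uncentered = ccvc_estimate(x, y, sigma=1.0, c=0.0)  # ordinary correntropy (c = 0)
print(centered, uncentered)
```

With the center placed at the error mean, the estimate reaches the kernel peak, whereas the zero-centered criterion scores the same data lower. This is the property that lets the variable center match a non-zero mean deviation.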

3.2. CC-VC-UKF Algorithm Derivation

With the definition of the CC cost function with VC, the CC-VC-UKF algorithm can be derived. The specific derivation process of the algorithm is shown in Figure 1.

The initial values of the nonlinear measurement system, the state equation, and the measurement equation are set first. In the time update, the prior state mean value ${\hat{x}}_{i| i - 1}$ and the one-step prediction covariance matrix $P_{i| i - 1}$ are obtained by the unscented transformation (UT) according to the state variable and the estimated deviation covariance matrix at time $i - 1$. Similarly, in the measurement update, the prior measurement mean and the cross-covariance matrix are obtained through the UT according to these variables. Finally, the Kalman gain and the updated covariance matrix are obtained through fixed-point iteration, and the state estimate is updated [21]. The fixed-point iterative equation is established as follows:

After the regression model is established [22], the CC-VC cost function used to obtain the optimal state variable can be expressed as follows:

{\hat{J}}_{C C - V C} (x_{i}) = \frac{1}{σ \sqrt{2 π}} \times [\exp (- \frac{c^{2}}{2 σ^{2}}) - \frac{1}{L} \sum_{k = 1}^{L} \exp (- \frac{{(e_{k, i} - c)}^{2}}{2 σ^{2}})]

(6)

where $e_{k, i}$ is the kth element of $e_{i}$:

e_{k, i} = d_{k, i} - w_{k, i} x_{i}

(7)

The optimal estimate can be obtained by minimizing Equation (6), as follows:

{\hat{x}}_{i} = \arg \min_{x_{i}} \frac{1}{σ \sqrt{2 π}} \times [\exp (- \frac{c^{2}}{2 σ^{2}}) - \frac{1}{L} \sum_{k = 1}^{L} \exp (- \frac{{(e_{k, i} - c)}^{2}}{2 σ^{2}})]

(8)

Setting the gradient of the cost function to zero,

\frac{\partial {\hat{J}}_{C C - V C} (x_{i})}{\partial x_{i}} = 0

yields the following:

\frac{1}{σ \sqrt{2 π} L} \times \frac{1}{2 σ^{2}} \times \sum_{k = 1}^{L} \exp (- \frac{{(e_{k, i} - c)}^{2}}{2 σ^{2}}) \times (d_{k, i} - w_{k, i} x_{i}) w_{k, i} = 0

(9)

After simplification we can obtain the following:

\begin{aligned} x_{i} = & {(\sum_{k = 1}^{L} \exp (- \frac{{(e_{k, i} - c)}^{2}}{2 σ^{2}}) w_{k, i}^{T} w_{k, i})}^{- 1} \\ & \times (\sum_{k = 1}^{L} \exp (- \frac{{(e_{k, i} - c)}^{2}}{2 σ^{2}}) w_{k, i}^{T} d_{k, i}) \end{aligned}

(10)

This formula is the fixed point equation of the state variable, which can be expressed as follows:

x_{i} = g (x_{i})

(11)

From Formula (11), the iterative equation of the fixed point can be obtained, as follows:

{\hat{x}}_{i, t + 1} = g ({\hat{x}}_{i, t})

(12)

where ${\hat{x}}_{i, t}$ is the estimate of $x_{i}$ at the $t$-th fixed-point iteration. The optimal solution of the state value can also be expressed as follows:

x_{i} = {(W_{i}^{T} C_{i} W_{i})}^{- 1} (W_{i}^{T} C_{i} D_{i})

(13)

where

C_{i} = \begin{pmatrix} C_{x, i} & 0 \\ 0 & C_{y, i} \end{pmatrix}

(14)

C_{x, i} = \mathrm{diag} \{\exp (- \frac{{(e_{1, i} - c)}^{2}}{2 σ^{2}}), \dots, \exp (- \frac{{(e_{n, i} - c)}^{2}}{2 σ^{2}})\}

(15)

C_{y, i} = \mathrm{diag} \{\exp (- \frac{{(e_{n + 1, i} - c)}^{2}}{2 σ^{2}}), \dots, \exp (- \frac{{(e_{n + m, i} - c)}^{2}}{2 σ^{2}})\}

(16)
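
The fixed-point iteration of Equations (10) to (13) can be sketched as a reweighted least-squares loop. This is a minimal illustration, assuming a generic regression model with matrix `W` and vector `d`; the function name, the toy data, and the fixed iteration count are assumptions and are not taken from the paper.

```python
import numpy as np


def ccvc_fixed_point(W, d, sigma, c, x0, n_iter=20):
    """Iterate x_{t+1} = g(x_t), Eq. (12), using the weighted solution of Eq. (13)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iter):
        e = d - W @ x                                      # residuals, Eq. (7)
        weights = np.exp(-(e - c) ** 2 / (2.0 * sigma ** 2))
        C = np.diag(weights)                               # diagonal weights, Eqs. (14)-(16)
        x = np.linalg.solve(W.T @ C @ W, W.T @ C @ d)      # Eq. (13)
    return x


# Toy example (assumed values): estimate a constant from samples with one gross outlier.
W = np.ones((5, 1))
d = np.array([2.0, 2.1, 1.9, 2.0, 10.0])
x_ls = np.linalg.lstsq(W, d, rcond=None)[0]                # ordinary least squares
x_hat = ccvc_fixed_point(W, d, sigma=1.0, c=0.0, x0=x_ls)
print(x_ls, x_hat)
```

The least-squares start is pulled toward the outlier, while the reweighted iterations give the outlier a weight close to zero and settle near the bulk of the samples. A practical implementation would normally stop on a convergence tolerance rather than a fixed count.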

After performing the corresponding mathematical operations through the variable substitution and matrix inversion principle, we can obtain the following:

x_{i} = {\hat{x}}_{i| i - 1} + {\tilde{K}}_{i} (y_{i} - {\hat{y}}_{i| i - 1})

(17)

where

{\tilde{K}}_{i} = {\tilde{P}}_{i| i - 1} {\bar{H}}_{i}^{T} {({\bar{H}}_{i} {\tilde{P}}_{i| i - 1} {\bar{H}}_{i}^{T} + {\tilde{R}}_{i})}^{- 1}

(18)

{\tilde{P}}_{i| i - 1} = S_{p, i| i - 1} C_{x, i}^{- 1} {(S_{p, i| i - 1})}^{T}

(19)

{\tilde{R}}_{i} = S_{r, i} C_{y, i}^{- 1} S_{r, i}^{T}

(20)

Finally, the derivation of the CC-VC-UKF algorithm can be completed by updating the state variance and the state estimated value through the relevant formula in Figure 1.
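
To make Equations (18) to (20) concrete, the following sketch computes the modified gain for given residuals. It assumes that $S_{p, i| i - 1}$ and $S_{r, i}$ are the Cholesky factors of the prior covariance and the measurement noise covariance, which the text above does not state explicitly, and it treats ${\bar{H}}_{i}$ as a given matrix. All names and numbers are illustrative.

```python
import numpy as np


def ccvc_gain(P_prior, R, H_bar, e_x, e_y, sigma, c):
    """Modified Kalman gain of Eqs. (18)-(20) for given residual vectors e_x and e_y."""
    S_p = np.linalg.cholesky(P_prior)      # assumed: P_prior = S_p S_p^T
    S_r = np.linalg.cholesky(R)            # assumed: R = S_r S_r^T
    w_x = np.exp(-(e_x - c) ** 2 / (2.0 * sigma ** 2))   # diagonal of C_x, Eq. (15)
    w_y = np.exp(-(e_y - c) ** 2 / (2.0 * sigma ** 2))   # diagonal of C_y, Eq. (16)
    P_tilde = S_p @ np.diag(1.0 / w_x) @ S_p.T            # Eq. (19)
    R_tilde = S_r @ np.diag(1.0 / w_y) @ S_r.T            # Eq. (20)
    innovation_cov = H_bar @ P_tilde @ H_bar.T + R_tilde
    return P_tilde @ H_bar.T @ np.linalg.inv(innovation_cov)  # Eq. (18)


# Scalar toy case (assumed values).
P = np.array([[1.0]])
R = np.array([[1.0]])
H = np.array([[1.0]])
zero = np.array([0.0])

K_nominal = ccvc_gain(P, R, H, e_x=zero, e_y=zero, sigma=1.0, c=0.0)
K_outlier = ccvc_gain(P, R, H, e_x=zero, e_y=np.array([3.0]), sigma=1.0, c=0.0)
print(K_nominal, K_outlier)
```

When every residual sits at the center `c`, all weights equal one and the expression reduces to the ordinary Kalman gain. A measurement residual far from the center inflates the effective measurement noise and shrinks the gain, so the outlying measurement has little influence on the update.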

3.3. Relationship between the CC-VC and the UKF

When the noise does not follow a Gaussian distribution and its mean value is non-zero, the UKF handles the noise poorly and is prone to large estimation errors. The center of the variable center correntropy criterion can be located at any position [20]; thus, it can effectively match the non-zero mean non-Gaussian deviation distribution. As a result, the variable center correntropy criterion is used to replace the mean square error criterion in the unscented Kalman filter.

As shown in Figure 2, in the measurement update of the unscented Kalman filtering algorithm, the covariance matrix can be obtained by the weighted summation of the observed and predicted values of the Sigma point set in the time update. According to the mean square error criterion, the Kalman gain can be obtained directly from ${\tilde{K}}_{i} = {\tilde{P}}_{x y, i} {\tilde{P}}_{y y, i}^{- 1}$. The CC-VC-UKF algorithm instead derives a fixed-point iterative equation from the variable center correntropy criterion and obtains the Kalman gain through fixed-point iteration.

4. Combined Spatiotemporal Cleaning of Measurement Data in the Distribution Network Based on the CC-VC-UKF Algorithm

Complex physical connections exist between various nodes and lines in the power grid, with a strong correlation [23]. Therefore, the measurement data collected by the measurement device also have a time and space correlation, as shown in Figure 3.

The active daily load curves measured by the FTU over 30 days are obtained from the field measured data, as shown in Figure 4. The trend of the measured data varies periodically in time: observation points separated by one period are similar, and the daily load curves of adjacent days are similar. At the same time, network topology constraints and power flow constraints exist between the nodes of the power grid [24]. In addition, as shown in the schematic topology diagram of the power flow in Figure 5, the measurement data have a spatial correlation. Therefore, this study introduces the variable center correntropy function into the unscented Kalman filter algorithm and performs filtering and cleaning based on the spatiotemporal correlation of the measurement data, which allows for significant improvements in the accuracy and data quality of the final cleaned data in the presence of non-zero mean non-Gaussian distributed spatial measurement deviations.

The specific process of the multilevel data filtering and cleaning technology based on space and time is shown in Figure 6. In the whole process, all filtering algorithms adopt the CC-VC-UKF algorithm to deal with the measurement deviation of a non-zero mean and non-Gaussian distribution. The measurement data of the measurement equipment at different positions of the line are temporally filtered according to the respective reference time series to improve the quality of the measurement data. After filtering the data by the measuring equipment, the spatial measurement deviation after filtering and the line end measurement data after secondary filtering can be obtained. Finally, according to the network topology relationship in the figure, the measurement data of the head end of the line can be obtained after the spatiotemporal joint filtering and cleaning.

4.1. Obtain a Reference Time Series

Before spatial filtering and fusion, the measurement data of each measurement device should be filtered according to its reference time series to reduce the volatility of the measurement data and improve the signal-to-noise ratio. The reference time series is obtained by classifying and weighting the historical data with the K-means clustering algorithm. The similarity measurement then quickly identifies the measurement device whose data have a relatively small spatial measurement deviation from the data sequence to be cleaned. The K-means clustering algorithm and the similarity measure are described as follows:

4.1.1. K-Means Clustering

Although the daily load characteristic shows a certain periodicity, only the changing trend is periodic. The load difference between different days is still relatively large due to the influence of various factors. Therefore, to obtain the reference sequence, the historical data should be clustered; here they are divided into six categories. With the experimental verification data divided into six categories, the errors caused by the clustering can be ignored.

K-means clustering is performed on the historical data. The K-means algorithm flow is shown in Algorithm 1:

Algorithm 1 K-Means Algorithm Flow
Input:	Historical time series
Output:	Classification result
1	Select k initial samples as the initial cluster centers $a_{1}, a_{2}, \dots, a_{k}$.
2	For each sample $x_{i}$ in the dataset, calculate its distance to the k cluster centers and assign it to the class of the cluster center with the smallest distance.
3	For each class $c_{j}$, recalculate its cluster center $a_{j} = \frac{1}{| c_{j} |} \sum_{x \in c_{j}} x$.
4	Repeat steps 2 and 3 for 10 iterations.
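
The four steps of the K-means flow can be sketched in a few lines of NumPy. This is a minimal illustration: taking the first k samples as initial centers is one reading of step 1, and the toy curves and the function name are assumptions made for the example.

```python
import numpy as np


def kmeans(curves, k, n_iter=10):
    """Plain K-means following steps 1-4 of the algorithm flow."""
    X = np.asarray(curves, dtype=float)
    centers = X[:k].copy()                      # step 1 (assumed: first k samples)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):                     # step 4: fixed number of iterations
        # step 2: Euclidean distance of every sample to every center
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # step 3: each center becomes the mean of its members
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:                # keep the old center if a class is empty
                centers[j] = members.mean(axis=0)
    return labels, centers


# Toy "daily curves" with two points each (assumed values).
curves = [[1.0, 1.0], [5.0, 5.0], [1.2, 0.9], [5.1, 4.8]]
labels, centers = kmeans(curves, k=2)
print(labels, centers)
```

In the setting of this paper each row would be one 288-point daily load curve, and the center of the selected class would act as the reference time series.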

In the K-means algorithm, if the value of $K$ is too small, that is, there are too few classes, then each class contains more data, and the load characteristics of different days cannot be distinguished effectively, which degrades the filtering results. If the value of K is too large, that is, there are too many classes, then each class contains too little data, which magnifies the influence of accidental measurement errors and is also detrimental to the filtering result. Therefore, K should be as large as possible while satisfying the accuracy requirements. The procedure for choosing K is shown in Algorithm 2:

Algorithm 2 Value Process of K
Input:	TTU history sequence
Output:	Value of K
1	Use the TTU sequence of a certain day to filter the FTU sequence of that day to obtain the filtered sequence FTU1, and calculate the difference $e$ between the mean of FTU1 and the mean of FTU.
2	Add a number $x$ to the data in the TTU sequence to obtain TTU2, use TTU2 to filter the FTU to obtain the filtered sequence FTU2, and calculate the difference $e^{'}$ between the mean of FTU2 and the mean of FTU.
3	Adjust $x$ continuously until $e^{'} = e$, and record $x$ at this point.
4	Average the TTU sequences of all days, and take the difference between the maximum value and the minimum value as the range $R$ of the TTU sequence.
5	Obtain the number of categories as $K = R / x$.

4.1.2. Similarity Measurement

The Minkowski distance is one of the most widely used measures of similarity. It requires that the two sequences to be compared have the same length and that their points correspond one-to-one [23]; under these conditions the distance between the two sequences can be calculated rapidly:

D (A, C) = \sqrt[p]{\sum_{i = 1}^{n} | a_{i} - c_{i} |^{p}}

(21)

Here, A and C are the two time series, and $a_{i}$ and $c_{i}$ are the i-th points of A and C, respectively. When p = 2, the Minkowski distance is the Euclidean distance. The distance between the sequence to be cleaned and each class of reference sequence is computed, and the sequence with the smallest distance is taken as the reference sequence. In this study, the sequence closest to the time sequence to be cleaned is the TTU measurement time series.
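
A short sketch of Equation (21) and of the nearest-reference selection described above is given below. The function names, the dictionary layout, and the sample sequences are illustrative assumptions.

```python
def minkowski_distance(a, c, p=2):
    """Minkowski distance of Eq. (21); p = 2 gives the Euclidean distance."""
    if len(a) != len(c):
        raise ValueError("sequences must have the same length")
    return sum(abs(ai - ci) ** p for ai, ci in zip(a, c)) ** (1.0 / p)


def nearest_reference(target, references, p=2):
    """Return the key of the candidate sequence closest to the target sequence."""
    return min(references, key=lambda name: minkowski_distance(target, references[name], p))


# Toy sequences (assumed values).
target = [1.0, 2.0, 3.0]
references = {"class_1": [1.0, 2.0, 4.0], "class_2": [3.0, 4.0, 5.0]}

print(minkowski_distance([0.0, 0.0], [3.0, 4.0]))   # Euclidean case
print(nearest_reference(target, references))
```

The length check mirrors the requirement that the compared sequences are aligned point by point; sequences sampled on different grids would need resampling first.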

4.2. Space–Time Joint Cleaning Based on CC-VC-UKF Algorithm

4.2.1. Filtering Based on Reference Time Series

(1) The reference time series corresponding to FTU and TTU measurement data is obtained by K-means algorithm clustering. Taking FTU as an example, the state equation is fitted according to the reference time series. The specific process of fitting is as follows:

First, the measurement value of the equipment is set as the ordinate value. The measurement equipment samples every 5 min, giving 288 measurement time points in one day, and the unit of the abscissa is set to 1. Thus, the trigonometric function relationship between measurement data values and measurement time points can be fitted, as follows:

x_{k}^{a} = 408.6 + 117.1 \sin (0.02274 k + 0.145 π)

(22)

where $x_{k}^{a}$ represents the FTU measurement reference time series data value at time $k$.

(2) After mathematical derivation of the trigonometric function relationship, the measured data relationship at two adjacent moments can be obtained as the fitted state equation, as follows:

x_{k}^{a} = f (x_{k - 1}^{a}) + w_{k}^{a}

(23)

where $x_{k}^{a}$ represents the FTU measurement reference time series data value at time $k$, $f (x_{k - 1}^{a})$ represents the conversion relationship between the variables at time $k - 1$ and time $k$, and $w_{k}^{a}$ represents the deviation between the FTU reference time series value at time $k$ and the reference time series value predicted from time $k - 1$. Then, the FTU and TTU measurement data are filtered based on the CC-VC-UKF algorithm.
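
The fitted curve of Equation (22) and one possible one-step transition can be sketched as follows. The paper does not give $f$ explicitly; the `transition` function below is an assumption obtained from the angle-addition identity, and it uses the time index $k$ to resolve the cosine term. The constant names are also illustrative.

```python
import math

# Coefficients of the fitted reference curve, Eq. (22).
A, B, OMEGA, PHI = 408.6, 117.1, 0.02274, 0.145 * math.pi


def reference_value(k):
    """Reference value x_k^a at sample index k, Eq. (22)."""
    return A + B * math.sin(OMEGA * k + PHI)


def transition(x_prev, k):
    """Assumed form of f: maps x_{k-1} to x_k with the angle-addition identity.

    sin(w*k + phi) = sin(w*(k-1) + phi) * cos(w) + cos(w*(k-1) + phi) * sin(w)
    """
    cos_prev = math.cos(OMEGA * (k - 1) + PHI)
    return A + (x_prev - A) * math.cos(OMEGA) + B * cos_prev * math.sin(OMEGA)


# One day of reference values at 5 min resolution (288 points).
series = [reference_value(k) for k in range(1, 289)]
max_step_error = max(
    abs(transition(reference_value(k - 1), k) - reference_value(k))
    for k in range(2, 289)
)
print(len(series), max_step_error)
```

Applied to a noise-free reference value, this transition reproduces the fitted curve up to rounding, so the deviation $w_{k}^{a}$ in Equation (23) captures only the mismatch between the measured series and the fitted curve.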

4.2.2. Cleaning Based on the Reference Sequence of TTU Measurement

When the FTU and TTU measurement data are filtered, the FTU measurement data can be cleaned according to the TTU measurement data as a reference sequence. Taking the measured active power as an example, the power flow relationship between FTU and TTU is shown in Figure 7 in the measured data obtained from a city’s electric power company.

Therefore, the equation of state and the measurement equation can be set as follows:

\begin{array}{l} x_{k}^{b} = f (x_{k - 1}^{b}) + w_{k}^{b} \\ y_{k}^{b} = x_{k}^{b} + v_{k}^{b} \end{array}

(24)

where $x_{k}^{b}$ represents the TTU measurement data value at time $k$, $f$ represents the conversion relationship of the TTU measurement data value between time $k - 1$ and time $k$, $y_{k}^{b}$ represents the FTU measurement data value at time $k$, $w_{k}^{b}$ represents the deviation of the TTU measurement data value between time $k$ and time $k - 1$, and $v_{k}^{b}$ represents the spatial measurement deviation at time $k$. The distribution histogram of $v_{k}^{b}$ is shown in Figure 8, which indicates that $v_{k}^{b}$ follows a non-zero mean non-Gaussian distribution. In addition, spatial measurement deviations are theoretically inherent: they include line losses between measurement devices and the measurement errors of the devices themselves. The CC-VC-UKF algorithm proposed in this paper is not designed to eliminate these deviations, but rather to remove some of the FTU anomalies by filtering against the trend of the TTU data.

After filtering by the CC-VC-UKF algorithm, let $x_{k}^{'}$ be the filtered TTU measurement data value and $v_{k}^{'}$ be the filtered spatial measurement deviation. Then, the FTU measurement data value after joint spatiotemporal cleaning can be expressed as $y_{k}^{'} = x_{k}^{'} + v_{k}^{'}$.

5. Numerical Results

5.1. Original Data Analysis

The data types and data volumes of different types of power grids vary greatly. The distribution network measurement data used here were obtained from a municipal bureau in a northern region; the sampling period of the measuring equipment is five minutes, so a daily load data series contains 288 points. The data include data sets from various measurement equipment, such as FTU and TTU.

The communication of the measuring equipment in the distribution network is not fully reliable; thus, most erroneous data are 0 values. Figure 9 shows an example of FTU measurement data for one day in the distribution network, where the red and blue lines indicate the zero and non-zero values of the measurement data, respectively. In the data obtained for this study, the FTU data are zero every day from 1:00 to 4:00 in the morning.

After obtaining the raw data of FTU, the 0 value in the early morning must be preprocessed first. No particularly good interpolation algorithm is available due to the large number and concentration of missing values. However, the load is relatively stable in the early morning, and the variation is small. Thus, the method of inserting the nearest value can be used to repair the 0 value in the early morning.
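
The nearest-value repair of the early-morning zeros can be illustrated with the sketch below. The function name and the sample values are assumptions; when a zero is equally far from two valid samples, this sketch takes the earlier one, which is a choice made here and not stated in the paper.

```python
def fill_zeros_with_nearest(values):
    """Replace each 0 value with the nearest non-zero sample in time."""
    valid = [i for i, v in enumerate(values) if v != 0]
    if not valid:
        return list(values)          # nothing to copy from, leave the data unchanged
    filled = list(values)
    for i, v in enumerate(values):
        if v == 0:
            # min() returns the first minimum, so ties go to the earlier sample
            nearest = min(valid, key=lambda j: abs(j - i))
            filled[i] = values[nearest]
    return filled


# Toy sequence (assumed values): three consecutive missing samples.
raw = [410.0, 0, 0, 0, 398.0]
print(fill_zeros_with_nearest(raw))
```

The search is quadratic in the sequence length, which is negligible for a 288-point daily series.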

In the same way, the long-term on-site measured data of a certain section of the distribution network are preprocessed, which yields one month of active daily load curves measured by the FTU at point A and by different measurement equipment at other points. Taking the FTU as an example, K-means clustering is performed on its active daily load curves, and the resulting classification is shown in Figure 10.

5.2. Experimental Result

Figure 6 shows the third type of load curve as an example, indicating the active daily load curve for the 12th day to the 19th day, 21st day, and 22nd day for the FTU measurement. The FTU measurement active daily load curve on the 14th day is considered the active power curve to be cleaned up. Then, the average value of the remaining active daily load curves is obtained as a reference time series. Finally, the active power curve to be cleaned is filtered based on the CC-VC-UKF algorithm. As shown in Figure 11, the random disturbance of the filtered measurement data is reduced, and the data quality is improved.

In the same way, the TTU measurement active power curve determined by the shortest Euclidean distance is filtered based on its reference time series, and the result is shown in Figure 12.

According to the spatiotemporal joint cleaning technique described in Section 4.2, the FTU measurement data sequence is filtered based on the CC-VC-UKF algorithm with the TTU measurement data as the reference data sequence. The spatial measurement deviation before and after filtering is compared, as shown in Figure 13.

Finally, after the joint space–time filtering and cleaning, the active power curve measured by the FTU is shown in Figure 14.

The spatial measurement deviation is calculated as $v_{k}^{c} = y_{k}^{c} - x_{k}^{c}$, where $x_{k}^{c}$ represents the TTU measurement data value at time $k$, and $y_{k}^{c}$ represents the FTU measurement data value at time $k$. The power flow relationship between FTU and TTU in Figure 7 shows that the spatial measurement deviation between FTU and TTU should theoretically be positive. In practice, field conditions contaminate the power system with dirty data to varying degrees, which causes large deviations in the measurement data of the measurement equipment. Thus, the spatial measurement deviation may be negative. This study selects the measurement data of FTU and TTU from 3:00 to 7:00 on the 30th day, during which the spatial measurement deviation is negative. The comparison of spatial measurement deviation before and after spatiotemporal joint cleaning using the CC-VC-UKF algorithm is shown in Figure 15.

The figure shows that after filtering based on the CC-VC-UKF algorithm, the negative part of the spatial measurement deviation that deviates from the normal range is transformed into a more reasonable normal value, and the volatility of measurement deviation is reduced, showing that the proposed algorithm has a better filtering effect on spatial measurement deviation under a dirty data attack.

5.3. Experimental Result Verification

An original data sequence with no more than 1% missing values is selected as the experimental set, and a random proportion of its values is set to empty. The proposed method is then used to clean the data sequence, and the values it fills in at the blanked positions are taken as the gap predictions. The gap predictions are compared with the original values, and the mean absolute percentage error (MAPE), root mean square error (RMSE) [25], mean absolute error (MAE) [11], and signal-to-noise ratio (SNR) are calculated. The SNR is calculated as follows:

$$\mathrm{SNR} = \frac{\sum (y_{i} - \bar{y})^{2} - \sum (y_{i} - \hat{y}_{i})^{2}}{\sum (y_{i} - \hat{y}_{i})^{2}} \qquad (25)$$

where $y_{i}$ represents the original data value at a moment whose value was set to empty, $\hat{y}_{i}$ represents the predicted value for that gap, and $\bar{y}$ represents the average of the original data values.
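The four indicators can be computed as in the following sketch. The SNR follows Equation (25); the MAPE, RMSE, and MAE use their standard textbook definitions, which is an assumption because their formulas are not written out in this section. The function names and the sample values are hypothetical.

```python
import numpy as np

def mape(y, y_hat):
    """Mean absolute percentage error, in percent (requires y != 0)."""
    return float(np.mean(np.abs((y - y_hat) / y)) * 100.0)

def rmse(y, y_hat):
    """Root mean square error."""
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def mae(y, y_hat):
    """Mean absolute error."""
    return float(np.mean(np.abs(y - y_hat)))

def snr(y, y_hat):
    """SNR as in Equation (25): (total variation - residual) / residual."""
    ss_total = np.sum((y - np.mean(y)) ** 2)
    ss_resid = np.sum((y - y_hat) ** 2)
    return float((ss_total - ss_resid) / ss_resid)

# Hypothetical original values at the blanked positions and their predictions.
y_true = np.array([10.0, 20.0, 30.0, 40.0])
y_pred = np.array([12.0, 18.0, 33.0, 39.0])

scores = {
    "MAPE": mape(y_true, y_pred),
    "RMSE": rmse(y_true, y_pred),
    "MAE": mae(y_true, y_pred),
    "SNR": snr(y_true, y_pred),
}
```

Lower MAPE, RMSE, and MAE and a higher SNR indicate gap predictions that are closer to the original values.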

This study compares the proposed method with the random forest algorithm [26], the LSTM algorithm [27], and the deep neural network algorithm [28] for data cleaning. The CC-VC algorithm is also applied to the traditional extended Kalman filter (EKF) for comparison. The comparative experimental results are shown in Table 1, and the corresponding radar chart of the data cleaning indicators is shown in Figure 16 to display the comparison more clearly. Compared with these algorithms published in the past two years, the proposed algorithm achieves the lowest MAPE, RMSE, and MAE and the highest SNR in Table 1, indicating an advantage in data cleaning accuracy and data quality that is helpful for the subsequent processing and storage of measurement data by power distribution terminals.

In addition, to study the time cost of the proposed algorithm, the Gaussian filtering algorithm, the wavelet threshold algorithm [5], and the proposed algorithm are used to filter data sequences with different numbers of measurement time points, and the time consumed is recorded. The computations are run on a computer with a 2.1 GHz CPU and 16 GB of memory. The results are shown in Table 2.

The table shows that with 60 measurement time points, the CC-VC-UKF algorithm takes about 95% more time than the wavelet threshold algorithm (1.25 s versus 0.64 s). With 576 measurement time points, it takes about 28% more (8.07 s versus 6.32 s). This finding shows that as the number of measurement time points increases, the relative gap in time consumption between the CC-VC-UKF algorithm and the wavelet threshold algorithm gradually narrows. When the scale of the measurement data is sufficiently large, the difference in time consumption can be neglected.
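The percentages quoted above can be reproduced directly from the Table 2 timings. The short sketch below uses the table values as given; the variable and function names are hypothetical.

```python
# Time consumed in seconds, copied from Table 2:
# measurement time points -> (wavelet threshold, Gaussian filter, CC-VC-UKF)
table2 = {
    60: (0.64, 0.97, 1.25),
    144: (1.35, 2.08, 2.39),
    288: (3.51, 4.14, 5.27),
    576: (6.32, 7.96, 8.07),
}

def extra_time_pct(baseline_s, candidate_s):
    """Extra time of the candidate relative to the baseline, in percent."""
    return (candidate_s - baseline_s) / baseline_s * 100.0

# Relative extra time of CC-VC-UKF compared with the wavelet threshold algorithm.
extra_vs_wavelet = {
    points: extra_time_pct(wavelet_s, ukf_s)
    for points, (wavelet_s, _gaussian_s, ukf_s) in table2.items()
}

for points, pct in extra_vs_wavelet.items():
    print(f"{points} points: CC-VC-UKF takes {pct:.1f}% more time")
```

The relative extra time falls at every step as the sequence length grows, which is the narrowing gap described in the text.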

In addition, because the proposed algorithm is aimed at offline data, it can satisfy the data cleaning requirements without real-time performance having to be considered.

6. Conclusions

A spatiotemporal joint cleaning technology based on CC-VC-UKF is proposed to solve the problem of inconsistent measurement data that arises from the non-zero mean, non-Gaussian measurement deviation introduced by the spatial distance between measurement equipment. First, the measurement data of each piece of measurement equipment are filtered with CC-VC-UKF according to the temporal correlation of the measurement data. Then, taking the filtered data as the reference sequence, the data sequence to be cleaned is filtered with the CC-VC-UKF algorithm according to the spatial correlation of the measurement data. Through simulation verification, the following conclusions are drawn:

Compared with other advanced algorithms, such as LSTM and deep neural network, the spatiotemporal joint cleaning technology based on the proposed CC-VC-UKF algorithm improves the accuracy and quality of data cleaning to a certain extent.
For the non-zero mean, non-Gaussian measurement deviation caused by the spatial distance of the measurement equipment and the measurement deviation caused by dirty data, the spatiotemporal joint cleaning technology based on CC-VC-UKF reduces the fluctuation of the measurement deviation and converts the negative part of the measurement deviation that deviates from the normal range into a more reasonable normal value.
In this study, the CC-VC algorithm is also applied to the traditional EKF. Because the EKF ignores the influence of high-order terms in the linearization process, the experimental results show that applying the CC-VC algorithm to the UKF yields more accurate data than applying it to the EKF.

Author Contributions

Conceptualization: Z.S.; methodology: Z.S. and C.L.; validation: Z.S., C.L. and B.L.; data curation: Z.S., C.L. and B.L.; writing—review and editing: Z.S., C.L. and B.L.; project administration: Z.S., J.Z., C.C. and Q.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by State Grid Corporation of China headquarters science and technology project under grant number 5500-202055466A-0-0-00.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Huang, M.; Wei, Z.; Pau, M.; Ponci, F.; Sun, G. Interval state estimation for low-voltage distribution systems based on smart meter data. IEEE Trans. Instrum. Meas. 2018, 68, 3090–3099.
2. Hongda, Z.; Jian, Y.; Xiong, L.; Li, Y.; Jiaying, W.; Yingjun, H. Research on Remote Diagnosis Model of Metering Point Anomaly of 10kV Distribution Line Based on AMI Data. In Proceedings of the 2021 IEEE 4th International Conference on Automation, Electronics and Electrical Engineering (AUTEEE), Shenyang, China, 19–21 November 2021; pp. 55–60.
3. Louis, A.; Ledwich, G.; Walker, G.; Mishra, Y. Measurement sensitivity and estimation error in distribution system state estimation using augmented complex Kalman filter. J. Mod. Power Syst. Clean Energy 2020, 8, 657–668.
4. Davarzani, S.; Pisica, I.; Taylor, G.A. Study of missing meter data impact on domestic load profiles clustering and characterization. In Proceedings of the 2016 51st International Universities Power Engineering Conference (UPEC), Coimbra, Portugal, 6–9 September 2016; pp. 1–6.
5. Long, H.; Wu, Z.; Fang, C.; Gu, W.; Wei, X.; Zhan, H. Cyber-attack detection strategy based on distribution system state estimation. J. Mod. Power Syst. Clean Energy 2020, 8, 669–678.
6. Li, Y.; Shi, F.; Zhang, H. Panoramic synchronous measurement system for wide-area power system based on the cloud computing. In Proceedings of the 2018 13th IEEE Conference on Industrial Electronics and Applications (ICIEA), Wuhan, China, 31 May–2 June 2018; pp. 764–768.
7. Hu, Y.; Wang, Y.; Kuang, J. Performance evaluation of a clock synchronization over fiber data links for large experiments. In Proceedings of the 2019 IEEE Nuclear Science Symposium and Medical Imaging Conference (NSS/MIC), Manchester, UK, 26 October–2 November 2019; pp. 1–2.
8. Deltuva, R.; Lukočius, R.; Otas, K. Dynamic Stability Analysis of Isolated Power System. Appl. Sci. 2022, 12, 7220.
9. Qu, Y.; Yang, T.; Li, T.; Zhan, Y.; Fu, S. Path Tracking of Underground Mining Boom Roadheader Combining BP Neural Network and State Estimation. Appl. Sci. 2022, 12, 5165.
10. Liu, W.; Pokharel, P.P.; Principe, J.C. Correntropy: Properties and applications in non-Gaussian signal processing. IEEE Trans. Signal Process. 2007, 55, 5286–5298.
11. Zhao, H.; Tian, B. Robust Power System Forecasting-Aided State Estimation With Generalized Maximum Mixture Correntropy Unscented Kalman Filter. IEEE Trans. Instrum. Meas. 2022, 71, 1–10.
12. Qin, Z.; Luan, Z.; Yang, S. Generalized load modeling considering random proportion of wind power connected to distribution networks. In Proceedings of the 2016 International Conference on Smart Grid and Clean Energy Technologies (ICSGCE), Chengdu, China, 19–22 October 2016; pp. 252–256.
13. Song, S.; Wei, H.; Lin, Y.; Wang, C.; Gómez-Expósito, A. A Holistic State Estimation Framework for Active Distribution Network with Battery Energy Storage System. J. Mod. Power Syst. Clean Energy 2021, 10, 627–636.
14. Wang, C.; Mu, G.; Cao, Y. A method for cleaning power grid operation data based on spatiotemporal correlation constraints. IEEE Access 2020, 8, 224741–224749.
15. Ai, M.; Chen, N.; Ge, X.; Li, Z.; Pu, T.; Li, Y.; Chen, Z.; Lin, P.; Wu, W. A CEP based ETL method of active distribution network operation monitoring and controlling signal data. In Proceedings of the CIRED Workshop 2016, Helsinki, Finland, 14–15 June 2016.
16. Modarresi, M.S.; Huang, T.; Ming, H.; Xie, L. Robust phase detection in distribution systems. In Proceedings of the 2017 IEEE Texas Power and Energy Conference (TPEC), College Station, TX, USA, 9–10 February 2017; pp. 1–5.
17. Motepe, S.; Hassan, A.N.; Stopforth, R. South African distribution networks load forecasting using ANFIS. In Proceedings of the 2018 IEEE International Conference on Power Electronics, Drives and Energy Systems (PEDES), Chennai, India, 18–21 December 2018; pp. 1–6.
18. Sun, Z.; Liu, C.; Peng, S. Maximum Correntropy with Variable Center Unscented Kalman Filter for Robust Power System State Estimation. Entropy 2022, 24, 516.
19. Khodaparast, J.; Khederzadeh, M. Least square and Kalman based methods for dynamic phasor estimation: A review. Prot. Control Mod. Power Syst. 2017, 2, 1–18.
20. Li, J.; Gao, M.; Liu, B.; Cai, Y. Forecasting Aided Distribution Network State Estimation Using Mixed μPMU-RTU Measurements. IEEE Syst. J. 2022, 1–11.
21. Ma, W.; Qiu, J.; Liu, X.; Xiao, G.; Duan, J.; Chen, B. Unscented Kalman filter with generalized correntropy loss for robust power system forecasting-aided state estimation. IEEE Trans. Ind. Inform. 2019, 15, 6091–6100.
22. Xu, C.; Li, Q.; Ying, D. An Effective Adaptive Combination Strategy for Distributed Learning Network. Appl. Sci. 2021, 11, 5723.
23. Jerouschek, D.; Tan, Ö.; Kennel, R.; Taskiran, A. Data preparation and training methodology for modeling lithium-ion batteries using a long short-term memory neural network for mild-hybrid vehicle applications. Appl. Sci. 2020, 10, 7880.
24. Liang, W.; Zhu, Y.; Li, G. Power Grid Cloud Resource Status Data Compression Method Based on Deep-learning. Recent Adv. Comput. Sci. Commun. 2021, 14, 941–951.
25. Zhang, B.; Zhang, W.; Jin, X.; Hong, W. Prediction of Cooling Load of District Cooling System Based on TRNSYS. J. Northeast Electr. Power Univ. 2021, 41, 105–110.
26. Yan, X.; Jin, Y.; Xu, Y.; Li, R. Wind turbine generator fault detection based on multi-layer neural network and random forest algorithm. In Proceedings of the 2019 IEEE Innovative Smart Grid Technologies-Asia (ISGT Asia), Chengdu, China, 21–24 May 2019; pp. 4132–4136.
27. Daoud, N.; Eltahan, M.; Elhennawi, A. Aerosol Optical Depth Forecast over Global Dust Belt Based on LSTM, CNN-LSTM, CONV-LSTM and FFT Algorithms. In Proceedings of the IEEE EUROCON 2021-19th International Conference on Smart Technologies, Lviv, Ukraine, 6–8 July 2021; pp. 186–191.
28. Li, J.; Wang, P.; Lin, L.; Shi, W.; Li, X.; Wang, J.; Zhang, P. Intelligent diagnosis and recognition method of GIS partial discharge data map based on deep learning. In Proceedings of the 2021 Power System and Green Energy Conference (PSGEC), Shanghai, China, 20–22 August 2021; pp. 253–256.

Figure 1. Schemes following the same format.

Figure 2. UKF algorithm derivation flowchart.

Figure 3. Temporal and spatial correlation of distribution network measurement data.

Figure 4. Daily active power load curves measured by the FTU over 30 days.

Figure 5. Power flow topology.

Figure 6. Flow chart of multistage data filtering technology based on space–time.

Figure 7. Power flow relationship between FTU and TTU.

Figure 8. Histogram of spatial measurement deviation prior to filtering.

Figure 9. Example of FTU generating data errors.

Figure 10. Classification of active daily load curve after cluster analysis.

Figure 11. Active power curve to be cleaned based on reference time series filtering.

Figure 12. TTU measured active power curve based on reference time series filtering.

Figure 13. Comparison of the spatial measurement deviation on the 29th day before and after filtering.

Figure 14. FTU-measured data curve after space–time combined filtering cleaning.

Figure 15. Comparison of the spatial measurement deviation on the 30th day before and after filtering.

Figure 16. Comparison results of different algorithms.

Table 1. Comparison of filtering results of different algorithms.

Algorithm	MAPE	RMSE (MW)	MAE (MW)	SNR
Random Forest	9.4527%	7.7264	8.0462	8.7561 × 10⁻⁵
LSTM	8.7819%	7.3861	7.5257	8.9366 × 10⁻⁵
Deep Neural Network	8.4667%	6.9628	6.4102	9.3422 × 10⁻⁵
CC-VC-EKF	10.0353%	8.4243	7.6343	9.2547 × 10⁻⁵
Proposed algorithm (CC-VC-UKF)	7.8918%	6.8967	5.8653	9.8751 × 10⁻⁵

Table 2. Time used for filtering and cleaning by different algorithms.

Measurement Time Points	Wavelet Threshold (s)	Gaussian Filter (s)	CC-VC-UKF (s)
60	0.64	0.97	1.25
144	1.35	2.08	2.39
288	3.51	4.14	5.27
576	6.32	7.96	8.07

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

