Detection of False Data Injection Attacks on Smart Grids Based on A-BiTG Approach

He, Wei; Liu, Weifeng; Wen, Chenglin; Yang, Qingqing

doi:10.3390/electronics13101938

by Wei He ¹, Weifeng Liu ¹, Chenglin Wen ²,* and Qingqing Yang ¹

¹ School of Electrical and Control Engineering, Shaanxi University of Science and Technology, Xi’an 710021, China

² School of Automation, Guangdong University of Petrochemical Technology, Maoming 525000, China

* Author to whom correspondence should be addressed.

Electronics 2024, 13(10), 1938; https://doi.org/10.3390/electronics13101938

Submission received: 10 April 2024 / Revised: 10 May 2024 / Accepted: 13 May 2024 / Published: 15 May 2024


Abstract:

A false data injection attack (FDIA) is a main attack method that threatens the security of smart grids. FDIAs mislead the control center into making wrong judgments by modifying the measurement data of the power grid system. Therefore, the effective and accurate detection of FDIAs is crucial for the safe operation of smart grids. However, current deep learning-based methods do not fully exploit the short-term local characteristics and long-term dependencies of power grid data and make poor use of past and future time series information, which undermines the credibility of their detection results. In view of this, an FDIA detection model combining a bidirectional temporal convolutional network and a bidirectional gated recurrent unit with an attention mechanism (A-BiTG) is proposed. The proposed model utilizes a bidirectional temporal convolutional network (BiTCN) and a bidirectional gated recurrent unit (BiGRU) to consider past and future temporal information in the grid. This enhances the ability of the model to capture long-term dependencies and extract features, while also mitigating the problem of exploding and vanishing gradients. In addition, an attention mechanism (AM) is added to dynamically assign weights to the extracted feature information and retain the most valuable features, improving the detection accuracy of the model. Finally, the proposed method is compared with existing methods on the IEEE 14-bus and IEEE 118-bus test systems. The results show that the proposed detection model is more robust and achieves better detection performance under different noise environments and FDIA signals of different intensities.

Keywords:

smart grid; bidirectional temporal convolutional network; bidirectional gated recurrent unit; attention mechanism; FDIAs

1. Introduction

With the rapid development of the Internet, artificial intelligence, and cyber–physical system (CPS) technology, the traditional power grid has evolved into a highly informationized smart grid that integrates sensing, control, computation, decision-making, and communication functions. The smart grid is now a typical CPS that merges physical power transmission systems with computer networks, and its optimization and standardization are vital to ensuring its safe and stable operation [1,2]. However, because smart terminals are integrated on a large scale and numerous smart devices transmit data over networks, the increasing intelligence of the power grid also gives malicious attackers many entry points. This makes the grid more vulnerable to network attacks, such as denial-of-service attacks [3], replay attacks [4], topological tampering attacks [5], and false data injection attacks (FDIAs) [6].

FDIAs are a new attack mode that maliciously tampers with the measurement data collected by the smart grid supervisory control and data acquisition (SCADA) system. This poses a serious threat to the safe operation of the smart grid system, so ensuring the integrity of the measurement data is essential [7]. An FDIA on a smart grid is illustrated in Figure 1. In this attack scenario, the attacker injects carefully designed false data into the data transmission channel or sensor equipment of the smart grid. Subsequently, the SCADA system transmits the data collected by sensors, instruments, and remote terminal equipment to the control center. The control center performs state estimation on the collected measurement data and then issues corresponding instructions to the power system. However, the injected false data distort the results of the state estimation while evading the traditional bad data detection (BDD) mechanism of the power system. As a result, the control center issues misleading instructions, which cause serious errors in power generation, transmission, distribution, consumption, and other aspects of the smart grid and can ultimately lead to instability or even paralysis of the smart grid system.

Ensuring the secure and effective operation of smart grids has become an urgent and critical issue, with BDD being a common detection method. However, research has shown that traditional BDD methods have limitations when it comes to detecting FDIAs that attackers have carefully designed [8]. These highly covert attacks are difficult to detect effectively. Because FDIAs, as a new type of network attack, pose a significant threat to the secure operation of smart grids, establishing an effective and efficient FDIA detection mechanism is crucial [9]. In view of this, this study considers a new approach for the effective detection of FDIAs in smart grids.

Specifically, the main contributions of this article are as follows:

For the first time, an A-BiTG model is proposed for the detection of FDIAs in power grids. The A-BiTG model is able to effectively capture the diversity of local information data in power grids and enhance the model’s ability to perceive dynamic changes in the time series. Meanwhile, the model solves the common problems of gradient vanishing and explosion in neural networks and also helps the model to better capture the long-term dependencies between input information.
Secondly, the proposed BiTCN-BiGRU parallel structure enhances the parallel processing capability of the model, enabling it to manage multiple input streams simultaneously and improving the computational speed. Furthermore, the integration of the attention layer into the BiGRU layer helps to dynamically adjust the weights of each time step in the learning process, which enhances the expressive ability of the model and improves the accuracy of the model detection.
Finally, this study conducted experiments on the IEEE 14-bus and IEEE 118-bus datasets to evaluate the performance of the A-BiTG model. The experimental results indicate that compared to some mainstream detection models, the proposed A-BiTG model demonstrates a superior detection accuracy and precision when facing covert attacks. It also exhibits lower false positive rates, a faster convergence speed of neural networks, and better stability and robustness.

The structure of this paper is as follows. Section 2 presents related work. Section 3 describes relevant models in power systems. Section 4 presents the proposed FDIA detection model in detail, including each submodule and the overall detection process. Section 5 reports a large number of simulation experiments conducted to validate the effectiveness of the model. The conclusions are provided in Section 6.

2. Related Work

Currently, FDIA detection studies for smart grids are divided into two main categories: model-driven algorithms and data-driven algorithms. Model-driven algorithms mainly detect potential attacks by monitoring abnormal changes in model outputs or intermediate states. Among them, ref. [10] proposed a network attack detection strategy based on distributed system state estimation (DSSE) for distribution system state estimation. This method analyzes detection through changes in the state estimation of the power grid model, addressing the issue of network attack detection in distributed systems. Ref. [11] introduced an improved unscented Kalman filter (UKF) for FDIA detection in the power grid, which exhibits a strong detection accuracy for dynamic models. Refs. [12,13] presented detection schemes based on a square root cubature Kalman filter (SRCKF) and square root extended Kalman filter (SREKF). These methods use filtering for power grid state detection to determine whether FDIAs have occurred. In [14], Wen et al. proposed a multi-level fine fingerprint authentication method that compares high-order information extracted from measurement data with data from a secure database to detect FDIA signals. Additionally, in [15], Rashed et al. proposed a new Kalman filter-based state estimator based on the traditional weighted least squares-based distributed state estimation (DSE), which uses the deviation of the estimator’s output from the DSE to detect FDIAs in the grid system. While model-driven methods have shown excellent performances in transmission systems, they may encounter failures in distribution systems [16]. Moreover, model-based detection methods can be affected by measurement noise, communication noise, and telemetry noise, leading to a decrease in the accuracy of the grid state estimation, which affects the detection performance of FDIAs in smart grids.

As the global wave of industrialization and information technology advances, GPU computing power continues to grow, providing strong support for data-driven algorithm research [17]. In [18], Zhao et al. proposed a binary classification detection model based on long short-term memory (LSTM) and used the dropout method to prevent overfitting, aiming to improve the detection accuracy of FDIAs; the model achieved F1-scores of 82% and 75% for the IEEE 14-bus and IEEE 118-bus systems, respectively. In [19], Zhang et al. combined multi-layer convolutional neural networks (CNNs) for spatial feature extraction and designed a multi-label classifier to detect FDIA signals. In [20], Wahid et al. designed an FDIA detection method based on CNN-LSTM, combining LSTM with a CNN, and utilized this model to extract mixed features from different multisource sensor data in the power grid for FDIA anomaly signal detection. In [21], Y. Raghuvamsi et al. used temporal convolutional networks (TCNs) for spatiotemporal feature extraction from power grid data for FDIA detection. However, a unidirectional TCN only performs forward convolutions over the input sequence and neglects the impact of backward information on the prediction results, which can affect the model’s detection performance. In [22], Su et al. proposed an interpretable method for detecting FDIAs in the grid; the method is based on the multihead graph attention network and temporal convolutional network (MGAT-TCN) model, which fully exploits the temporal and spatial feature information of the grid and achieves interpretability of the model in the spatial dimension. To overcome the challenges that model-based and data-driven machine learning methods face when used in isolation, ref. [23] introduced an autoencoder-based neural network architecture for detecting cyber-attacks in power grids, which achieved good detection results. In [24], Jabbari Zideh et al. proposed a physics-informed machine learning approach for anomaly detection, classification, and other operations on data. Furthermore, federated learning (FL) is regarded as an efficient technique for using training data and enhancing model generalization in the distributed environment of smart grids, and it is therefore also seen as an effective means of addressing FDIAs in smart grids. Li et al. [25] designed an end-to-end FL-based cloud collaboration mechanism to extract spatio-temporal features from grid data and jointly detect local FDIA data. The traditional neural networks mentioned above do not consider issues such as data transmission security and privacy. To address this, in [26], Li et al. combined federated learning with a CNN-GRU and designed a secure communication protocol based on the Paillier cryptosystem. This not only ensures detection accuracy but also enhances the security and privacy of the training model. However, this approach increases the risk of overfitting.

Although some of the current mainstream methods are effective in detecting FDIAs in smart grids, they still face problems and challenges. Most model-based methods are limited to simple linear models, which have poor robustness when dealing with dynamic data and incur high modeling costs and complexity. In addition, with the explosive growth of smart grid data volume, existing data-driven detection methods do not consider both the historical and the future information of the time series, leading to poor feature extraction and unsatisfactory detection results. Therefore, we propose an A-BiTG detection model for FDIAs in massive amounts of grid data. This model simultaneously considers the past and future information of the smart grid and uses convolution to extract the temporal features of the data. The weights of each time step are then dynamically adjusted by the attention mechanism, which improves the detection accuracy and efficiency. This design aims to provide strong protection for the safe operation of the smart grid.

3. Model Description of Power Systems

In this section, we introduce the state estimation model of the power system, the BDD mechanism, and the relevant models of FDIAs.

3.1. Power System State Estimation

State estimation is a fundamental analytical tool for energy management systems (EMSs) to achieve reliable power system detection. The power system’s operating state can be inferred from the measurement values obtained from various sensors in the grid, thereby controlling the safe and orderly operation of the power system [1]. Given the known network topology and parameters of the instruments in the power system, including bus voltages [7], active power injections, and reactive power injections, the measurement equation based on AC nonlinearity is as follows:

Z = h (x) + e

(1)

where Z \in R^{m \times 1} represents the measurement vector, with m as the dimension of the measurement vector; x \in R^{n \times 1} represents the system state vector, with n as the dimension of the state vector; and e \in R^{m \times 1} represents the measurement noise vector, which is assumed to follow a Gaussian distribution. Additionally, h (x) represents the nonlinear relationship between the measurement values and the state variables.

The measurement vector Z obtained from the sensors and the state vector x are denoted as follows:

Z = {(P_{i}, Q_{i}, P_{i j}, Q_{i j})}^{T}

(2)

x = {(v_{i}, θ_{i})}^{T}

(3)

where P_{i} represents the real power injection at bus i, and Q_{i} represents the reactive power injection at bus i. P_{ij} represents the real power flow from bus i to bus j, and Q_{ij} represents the reactive power flow from bus i to bus j. v_{i} and θ_{i} are the voltage magnitude and voltage phase angle at bus i, respectively.

The measurement function h (x) reflects the nonlinear relationship between the state values and the measured values. It can be derived from the expressions for active power, reactive power, active flow, and reactive flow in the electrical line topology [27], as follows:

h (x) = \begin{bmatrix} P_{i} (v_{i}, v_{j}, θ_{i}, θ_{j}) \\ Q_{i} (v_{i}, v_{j}, θ_{i}, θ_{j}) \\ P_{ij} (v_{i}, v_{j}, θ_{i}, θ_{j}) \\ Q_{ij} (v_{i}, v_{j}, θ_{i}, θ_{j}) \\ v_{i} \end{bmatrix}

(4)

The specific expressions for P_{i}, Q_{i}, P_{ij}, and Q_{ij} are as follows:

P_{i} (v_{i}, v_{j}, θ_{i}, θ_{j}) = v_{i} \sum_{j} v_{j} (G_{ij} \cos θ_{ij} + B_{ij} \sin θ_{ij})

(5)

Q_{i} (v_{i}, v_{j}, θ_{i}, θ_{j}) = v_{i} \sum_{j} v_{j} (G_{ij} \sin θ_{ij} - B_{ij} \cos θ_{ij})

(6)

P_{ij} (v_{i}, v_{j}, θ_{i}, θ_{j}) = v_{i}^{2} g_{ij} - v_{i} v_{j} (g_{ij} \cos θ_{ij} + b_{ij} \sin θ_{ij})

(7)

Q_{ij} (v_{i}, v_{j}, θ_{i}, θ_{j}) = - v_{i}^{2} (b_{ij} + y_{c}) - v_{i} v_{j} (g_{ij} \sin θ_{ij} - b_{ij} \cos θ_{ij})

(8)

where θ_{ij} = θ_{i} - θ_{j} is the phase angle difference between bus i and bus j. G_{ij} and B_{ij} represent the real and imaginary parts, respectively, of the element in row i and column j of the bus admittance matrix. g_{ij} and b_{ij} represent the conductance and susceptance of branch ij, respectively, and y_{c} represents the line-to-ground (shunt) susceptance of the branch.
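The branch flow expressions can be checked numerically. The following is a minimal sketch, assuming a π-model branch with series admittance g_{ij} + j b_{ij} and shunt susceptance y_{c}, and using the standard sign convention for the reactive flow (g_{ij} sin θ_{ij} − b_{ij} cos θ_{ij}). The function names and the per-unit values are hypothetical and chosen only for illustration.

```python
import cmath
import math

def branch_flows(v_i, v_j, theta_i, theta_j, g_ij, b_ij, y_c):
    """Real and reactive power flow from bus i to bus j (trigonometric form)."""
    theta_ij = theta_i - theta_j
    p_ij = v_i**2 * g_ij - v_i * v_j * (
        g_ij * math.cos(theta_ij) + b_ij * math.sin(theta_ij))
    q_ij = -v_i**2 * (b_ij + y_c) - v_i * v_j * (
        g_ij * math.sin(theta_ij) - b_ij * math.cos(theta_ij))
    return p_ij, q_ij

def branch_flows_phasor(v_i, v_j, theta_i, theta_j, g_ij, b_ij, y_c):
    """Same quantities from S_ij = V_i * conj(I_ij), as an independent check."""
    V_i = cmath.rect(v_i, theta_i)
    V_j = cmath.rect(v_j, theta_j)
    y_series = complex(g_ij, b_ij)
    # Current leaving bus i: series branch current plus shunt charging current.
    I_ij = y_series * (V_i - V_j) + 1j * y_c * V_i
    S_ij = V_i * I_ij.conjugate()
    return S_ij.real, S_ij.imag

# Hypothetical per-unit operating point and branch parameters.
example = dict(v_i=1.02, v_j=0.98, theta_i=0.05, theta_j=-0.02,
               g_ij=1.5, b_ij=-6.0, y_c=0.03)
p_ij, q_ij = branch_flows(**example)
p_chk, q_chk = branch_flows_phasor(**example)
print(p_ij, q_ij)
```

Both routes give the same flows, which is a quick way to confirm the signs of the individual terms.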

Ref. [28] presented various methods for solving for the system state vector, the most commonly used in the smart grid being the weighted least squares (WLS) algorithm. The WLS estimate \hat{x} of the system state vector is as follows:

\hat{x} = \underset{x}{\arg\min} {[Z - h (x)]}^{T} W [Z - h (x)]

(9)

where W is a diagonal weight matrix whose diagonal elements are the reciprocals of the corresponding measurement error variances.
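To make the WLS objective concrete, the sketch below minimizes it with Gauss–Newton iterations, which is one common way of solving nonlinear WLS problems and is not necessarily the solver used in the paper. It assumes a toy two-state system x = (v, θ) with a hypothetical measurement function `h`; the weights are taken as the reciprocals of assumed measurement error variances.

```python
import math

def h(x):
    """Toy nonlinear measurement function (hypothetical stand-in for Eq. (4))."""
    v, th = x
    return [v, v * math.cos(th), v * math.sin(th), v * v]

def jacobian(x):
    """Analytic Jacobian of h with respect to (v, theta)."""
    v, th = x
    return [[1.0, 0.0],
            [math.cos(th), -v * math.sin(th)],
            [math.sin(th), v * math.cos(th)],
            [2.0 * v, 0.0]]

def wls_estimate(z, sigmas, x0, iters=20, tol=1e-10):
    """Gauss-Newton minimization of [z - h(x)]^T W [z - h(x)], W = diag(1/sigma^2)."""
    w = [1.0 / s**2 for s in sigmas]
    x = list(x0)
    for _ in range(iters):
        r = [zi - hi for zi, hi in zip(z, h(x))]
        H = jacobian(x)
        # Gain matrix G = H^T W H (2x2, symmetric) and right-hand side H^T W r.
        g00 = sum(wk * Hk[0] * Hk[0] for wk, Hk in zip(w, H))
        g01 = sum(wk * Hk[0] * Hk[1] for wk, Hk in zip(w, H))
        g11 = sum(wk * Hk[1] * Hk[1] for wk, Hk in zip(w, H))
        b0 = sum(wk * Hk[0] * rk for wk, Hk, rk in zip(w, H, r))
        b1 = sum(wk * Hk[1] * rk for wk, Hk, rk in zip(w, H, r))
        det = g00 * g11 - g01 * g01
        dx0 = (b0 * g11 - g01 * b1) / det   # Cramer's rule for the 2x2 system
        dx1 = (g00 * b1 - g01 * b0) / det
        x = [x[0] + dx0, x[1] + dx1]
        if max(abs(dx0), abs(dx1)) < tol:
            break
    return x

x_true = [1.03, 0.12]                 # hypothetical true state
z = h(x_true)                         # noise-free measurements for the demo
x_hat = wls_estimate(z, sigmas=[0.01, 0.01, 0.01, 0.02], x0=[1.0, 0.0])
print(x_hat)
```

With noise-free measurements and a flat start, the estimate converges to the true state within a few iterations; with noisy measurements, the more precise sensors (smaller variance) pull the estimate more strongly.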

3.2. Bad Data Detection

In power systems, sensor measurements are often influenced by factors such as environmental noise and sensor faults during their transmission to the power system control center, leading to deviations in the measured data from the true values and causing errors. This type of disturbed measurement data is commonly referred to as “bad data”. Researchers typically use a BDD mechanism based on the principle of residual analysis to detect such data [29]. The residual r is defined as follows:

{∥r∥}_{2} = {∥Z - h (\hat{x})∥}_{2}

(10)

Comparing the residual norm {∥r∥}_{2} with the detection threshold τ yields the following decision rule:

{∥r∥}_{2} < τ \Rightarrow Θ_{1}, \quad {∥r∥}_{2} > τ \Rightarrow Θ_{2}

(11)

where Θ_{1} represents normal data, Θ_{2} indicates bad data in the SCADA system’s measurement data, and τ is the detection threshold.

3.3. False Data Injection Attacks

The basic principle of constructing FDIAs in AC power systems is to manipulate a set of measurement vectors to change certain state vectors [30]. For instance, if an attacker intends to change the actual power on bus i, they should create a series of attack vectors that ensure that there is no discrepancy between the estimated state vector and the actual state value. However, because the measurements are subject to the power flow equations, all measurements that depend on this state variable are also affected.

As indicated by the power flow constraints in Equations (5)–(8), a change in a state value causes a change in the corresponding measured values. According to reference [28], for the attack to be covert, the attack vector α can be constructed as follows:

α = h (\hat{x} + l) - h (\hat{x})

(12)

where l denotes the deviation that the attack introduces into the state values.

According to Equation (12), the attacker needs full knowledge of the power grid, including the topology h (\cdot) and the estimated state \hat{x}. Adding the attack vector α to the measurement vector gives the post-attack measurement vector:

Z_{α} = Z + α

(13)

Then the estimated system state vector under attack is as follows:

{\hat{x}}_{α} = \hat{x} + l

(14)

The residual after the attack can be represented as follows:

\begin{aligned} {∥r_{α}∥}_{2} &= {∥Z_{α} - h ({\hat{x}}_{α})∥}_{2} = {∥Z + α - h (\hat{x} + l)∥}_{2} \\ &= {∥Z + h (\hat{x} + l) - h (\hat{x}) - h (\hat{x} + l)∥}_{2} \\ &= {∥Z - h (\hat{x})∥}_{2} = {∥r∥}_{2} \end{aligned}

(15)

As shown in Equation (15), the residual values of the system remain unchanged before and after the attack. Therefore, such stealthy FDIA signals can effectively bypass the BDD detector, posing a threat to the secure operation of the smart grid system.
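The stealth property can be reproduced numerically. The sketch below is an illustration under stated assumptions: the measurement function `h`, the threshold `tau`, and all numerical values are hypothetical, and the "naive" attack is evaluated against the unchanged estimate, which is a simplification. It builds the attack vector from Equation (12) and compares the residual norms before and after the attack.

```python
import math

def h(x):
    """Toy nonlinear measurement function (hypothetical stand-in for Eqs. (5)-(8))."""
    v, theta = x
    return [v * math.cos(theta), v * math.sin(theta), v * v]

def l2_norm(vec):
    return math.sqrt(sum(c * c for c in vec))

def residual_norm(z, x_hat):
    """||z - h(x_hat)||_2, the quantity checked by the BDD mechanism."""
    return l2_norm([zi - hi for zi, hi in zip(z, h(x_hat))])

def stealthy_attack(x_hat, l):
    """Attack vector alpha = h(x_hat + l) - h(x_hat), as in Eq. (12)."""
    shifted = [xi + li for xi, li in zip(x_hat, l)]
    return [a - b for a, b in zip(h(shifted), h(x_hat))]

def bdd_flags(r_norm, tau):
    """True if the residual norm exceeds the detection threshold."""
    return r_norm > tau

x_hat = [1.02, 0.10]                       # estimated state before the attack
noise = [0.004, -0.003, 0.002]             # small measurement noise
z = [hi + e for hi, e in zip(h(x_hat), noise)]
tau = 0.05                                 # hypothetical BDD threshold

l = [0.05, -0.08]                          # state deviation chosen by the attacker
alpha = stealthy_attack(x_hat, l)
z_alpha = [zi + ai for zi, ai in zip(z, alpha)]
x_hat_alpha = [xi + li for xi, li in zip(x_hat, l)]

r_before = residual_norm(z, x_hat)
r_after = residual_norm(z_alpha, x_hat_alpha)

# A naive attack that ignores h(.) shifts one measurement arbitrarily.
z_naive = [z[0] + 0.3, z[1], z[2]]
r_naive = residual_norm(z_naive, x_hat)

print(r_before, r_after, r_naive)
```

The structured attack leaves the residual norm unchanged and stays below the threshold, whereas the unstructured perturbation is flagged, which is why residual-based BDD alone is not sufficient against FDIAs.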

4. A-BiTG Detection Model

In this section, we present the A-BiTG detection model, which mainly includes the input–output module, BiTCN module, and BiGRU-AM module. Subsequently, the overall framework of the A-BiTG detection model and the loss function are elaborated. Finally, the specific implementation process of the model is presented.

4.1. Input and Output Module

In this subsection, the active and reactive power measurement data of all nodes in the SCADA module of the smart grid are taken as the input features of the model and transformed into the time series data X_{t} required by the A-BiTG model. That is, X_{t} = [x_{1}, x_{2}, \dots, x_{t}, \dots, x_{T}] \in R^{T \times N \times V} with x_{t} = [P_{i}, Q_{i}] \in R^{N \times V}, where P_{i} = [p_{1}, p_{2}, \dots, p_{N}] and Q_{i} = [q_{1}, q_{2}, \dots, q_{N}] represent the time series injections of active power and reactive power at node i, respectively, i = 1, 2, \dots, N. Here, t is the time step, T is the input sequence length, N is the number of grid topology nodes, and V is the grid power feature dimension.
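The construction of the input tensor can be sketched as follows. This is a minimal illustration, assuming that the samples are produced by a sliding window over the measurement history; the function name `build_windows`, the stride, and the synthetic values are hypothetical.

```python
def build_windows(p_series, q_series, seq_len, stride=1):
    """Arrange P/Q injections into windows shaped [T][N][V] with V = 2.

    p_series[t][i] and q_series[t][i] hold the active and reactive power
    injection at time step t and node i.
    """
    if len(p_series) != len(q_series):
        raise ValueError("P and Q series must cover the same time steps")
    # One frame per time step: N rows of [P_i, Q_i].
    frames = [[[p, q] for p, q in zip(p_t, q_t)]
              for p_t, q_t in zip(p_series, q_series)]
    last_start = len(frames) - seq_len
    return [frames[s:s + seq_len] for s in range(0, last_start + 1, stride)]

# Synthetic example: 6 time steps, N = 3 nodes, window length T = 4.
p = [[1.0 * t + 0.1 * i for i in range(3)] for t in range(6)]
q = [[0.5 * t - 0.1 * i for i in range(3)] for t in range(6)]
windows = build_windows(p, q, seq_len=4)
print(len(windows), len(windows[0]), len(windows[0][0]), len(windows[0][0][0]))
```

Each window is one model input X of shape T × N × V; a label (attack or normal) would be attached to each window in a supervised setting.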

After processing through the A-BiTG model, the detection result of the model can be represented as follows:

I = f (X)

(16)

where f is the detection model function, and I is the detection result of the model. I = 1 represents attack data, and I = 0 represents normal data.

4.2. BiTCN Module

In 2018, Bai et al. [31] proposed the TCN module, which has significant advantages over LSTM in handling time series data. It can efficiently process time series data in parallel and maintain gradient stability through the mechanism of residual networks.

As shown in Figure 2, the TCN module consists of three main components: causal convolution, dilated convolution, and residual blocks [32]. In this module, the convolution kernel size is k = 3 and the dilation factors are d = [1, 2, 4]; dilation factors are typically powers of 2.
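The effect of the kernel size and the dilation factors can be illustrated with a single-channel causal dilated convolution. The sketch below is a simplification: it assumes one convolution per dilation level, in which case the receptive field is 1 + (k − 1) Σ d; if each level contains two convolutions, as in many TCN residual blocks, the `convs_per_level` argument can be set to 2. The function names are hypothetical.

```python
def causal_dilated_conv1d(x, weights, dilation):
    """Single-channel causal convolution: out[t] uses x[t], x[t-d], x[t-2d], ...

    Positions before the start of the sequence are treated as zeros.
    """
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(weights):
            idx = t - j * dilation
            if idx >= 0:
                acc += w * x[idx]
        out.append(acc)
    return out

def receptive_field(kernel_size, dilations, convs_per_level=1):
    """Number of past time steps (including the current one) seen by the top layer."""
    return 1 + convs_per_level * (kernel_size - 1) * sum(dilations)

# Impulse response with k = 3 and d = 2: taps appear at t = 0, 2, 4.
impulse = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
response = causal_dilated_conv1d(impulse, [1.0, 1.0, 1.0], dilation=2)

# Causality check: changing a future input must not change earlier outputs.
x_a = [0.3, -0.1, 0.8, 0.2, 0.5, -0.4]
x_b = x_a[:5] + [9.9]
out_a = causal_dilated_conv1d(x_a, [0.5, 0.2, -0.1], dilation=1)
out_b = causal_dilated_conv1d(x_b, [0.5, 0.2, -0.1], dilation=1)

print(response, receptive_field(3, [1, 2, 4]))
```

With k = 3 and d = [1, 2, 4], three stacked levels cover 15 time steps under the one-convolution-per-level assumption, which shows how doubling the dilation grows the history window without deepening the network linearly.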

4.2.1. BiTCN Module Architecture

The traditional unidirectional TCN cannot fully account for the bidirectional nature of the data [33]. Its neurons at moment t can only see information from before that moment and cannot access information from after it. As a result, the network mines too little information to filter out features that are irrelevant to the detection task, which in turn reduces the detection accuracy. To alleviate this drawback, this paper introduces the BiTCN module, which integrates bidirectional information to facilitate global feature extraction and enhance the robustness of the model.

Given the input sequence X_{t} = [x_{1}, x_{2}, \dots, x_{t}, \dots, x_{T}] \in R^{T \times N \times V} of the model, we first denote the forward input vector as \vec{X}, that is, {\vec{X}}_{t} = [x_{1}, x_{2}, \dots, x_{t}, \dots, x_{T}] \in R^{T \times N \times V}. We then reverse X to obtain the backward sequence \overset{\leftarrow}{X}, that is, {\overset{\leftarrow}{X}}_{t} = [x_{T}, \dots, x_{t}, \dots, x_{2}, x_{1}] \in R^{T \times N \times V}, so that the sequence orders of \vec{X} and \overset{\leftarrow}{X} are opposite, as shown in Figure 3. The temporal data \vec{X} and \overset{\leftarrow}{X} are input into the BiTCN module for feature extraction. After learning, the forward features are represented as \vec{G} and the backward features as \overset{\leftarrow}{G}. Finally, the concatenated sequence features from both directions are obtained as follows:

G = \mathrm{Concatenate} (\vec{G}, \overset{\leftarrow}{G})

(17)

where \mathrm{Concatenate} (\cdot) represents the concatenation of the encoded information from the forward and backward directions, and G is the fused vector, whose dimension is twice the unidirectional feature dimension.

The BiTCN module processes past time series data through forward convolution and future time series data through backward convolution. It can reduce bias caused by information leakage, making it more suitable for the global feature extraction task of time series data in FDIA detection for smart grids. It exhibits a more comprehensive and accurate predictive performance.
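The forward/backward processing and the concatenation in Equation (17) can be sketched for a single-channel sequence. This is an illustrative simplification, not the authors' implementation: it uses one causal convolution per direction, and it assumes that the backward features are flipped back to the original time order before concatenation, which the text does not state explicitly. All names are hypothetical.

```python
def causal_conv(x, weights, dilation=1):
    """Single-channel causal convolution with implicit zero padding on the left."""
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(weights):
            idx = t - j * dilation
            if idx >= 0:
                acc += w * x[idx]
        out.append(acc)
    return out

def bidirectional_features(x, w_fwd, w_bwd, dilation=1):
    """Concatenate forward and backward convolution features per time step."""
    fwd = causal_conv(x, w_fwd, dilation)
    # The backward branch sees the reversed sequence, so a "causal" convolution
    # there looks at future samples of the original sequence.
    bwd_reversed = causal_conv(x[::-1], w_bwd, dilation)
    bwd = bwd_reversed[::-1]          # assumption: re-align to original order
    return [[f, b] for f, b in zip(fwd, bwd)]

x = [1.0, 2.0, 3.0, 4.0]
G = bidirectional_features(x, w_fwd=[1.0, 1.0], w_bwd=[1.0, 1.0])
print(G)
```

In this toy case the forward feature at step t is x[t] + x[t−1] and the backward feature is x[t] + x[t+1], so every time step receives context from both sides and the feature dimension doubles.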

4.2.2. BiTCN Residual Block

Because the BiTCN is essentially a deep network, extracting more historical information from the sequence requires increasing the depth of the network. However, as the depth increases, the network becomes unstable and converges more slowly. To address this issue, we introduce residual blocks. As shown in Figure 4, the composite residual block is divided into a left half and a right half. The left half consists of two branches: one branch performs causal convolution operations, while the other branch performs the output operation. The outputs of these two branches are added together and passed through an activation function to obtain the forward residual output unit. The right half is the mirror image of the left half. Finally, the forward residual output unit from the left half is combined with the backward residual output unit from the right half to produce the merged output. The output of the (h + 1)-th layer residual block can be represented as follows:

\vec{X}^{h + 1} = σ (F (\vec{X}^{h}) + \vec{X}^{h})

(18)

\overset{\leftarrow}{X}^{h + 1} = σ (F (\overset{\leftarrow}{X}^{h}) + \overset{\leftarrow}{X}^{h})

(19)

X^{h + 1} = \mathrm{Concatenate} (\vec{X}^{h + 1}, \overset{\leftarrow}{X}^{h + 1})

(20)

where \vec{X}^{h} and \overset{\leftarrow}{X}^{h} represent the inputs of the h-th forward and backward residual blocks, respectively. F (\cdot) denotes the transformation operations, including dilated causal convolution, weight normalization, batch normalization, ReLU activation, and the dropout layer. σ (\cdot) represents the activation function. \mathrm{Concatenate} (\cdot) signifies the concatenation of the outputs from the forward and backward residual blocks, and X^{h + 1} indicates the residual output after concatenation.
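The residual update σ(F(X) + X) in Equations (18)–(20) can be sketched as follows. This is a deliberately reduced illustration: F is taken to be a single dilated causal convolution followed by ReLU, σ is assumed to be ReLU, and the normalization and dropout layers are omitted. The sequences are single-channel and all names are hypothetical.

```python
def relu(values):
    return [max(0.0, a) for a in values]

def causal_conv(x, weights, dilation):
    """Single-channel dilated causal convolution with zero padding on the left."""
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(weights):
            idx = t - j * dilation
            if idx >= 0:
                acc += w * x[idx]
        out.append(acc)
    return out

def residual_block(x, weights, dilation):
    """sigma(F(x) + x): the skip connection adds the block input to F(x)."""
    fx = relu(causal_conv(x, weights, dilation))   # simplified F(.)
    return relu([a + b for a, b in zip(fx, x)])

def bidirectional_residual_block(x, w_fwd, w_bwd, dilation):
    """Forward and backward residual outputs, concatenated per time step."""
    fwd = residual_block(x, w_fwd, dilation)
    bwd = residual_block(x[::-1], w_bwd, dilation)[::-1]
    return [[f, b] for f, b in zip(fwd, bwd)]

x = [1.0, -2.0, 3.0]
identity_like = residual_block(x, [0.0, 0.0, 0.0], dilation=1)
merged = bidirectional_residual_block(x, [0.5, 0.1, 0.0], [0.5, 0.1, 0.0], dilation=1)
print(identity_like, merged)
```

When F contributes nothing (zero weights), the block reduces to σ(X), which is the property that keeps gradients flowing through deep stacks.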

4.3. BiGRU Module

The traditional GRU is a variant of an RNN. As shown in Figure 5, it addresses the vanishing gradient problem in traditional RNNs by introducing an update gate and a reset gate [34]. The update gate controls when to update the hidden state, while the reset gate determines when to allow past information to influence the current hidden state. The GRU has a lower complexity, performs well in handling long-term dependencies, and has fewer parameters, making it easier to implement.

In Figure 5, r_{t} represents the reset gate; the smaller its value, the more information from the previous state is forgotten. v_{t} is the update gate; according to Equation (24), a larger value gives more weight to the candidate state {\tilde{h}}_{t} and less to the previous state h_{t - 1}. h_{t - 1} denotes the output state of the previous time step, x_{t} represents the input, {\tilde{h}}_{t} is the candidate hidden state, and h_{t} indicates the current output value. Based on the structure of the GRU, the main formulas for forward propagation are as follows:

v_{t} = σ (W_{z} \cdot [h_{t - 1}, x_{t}])

(21)

r_{t} = σ (W_{r} \cdot [h_{t - 1}, x_{t}])

(22)

{\tilde{h}}_{t} = tanh (W \cdot [r_{t} * h_{t - 1}, x_{t}])

(23)

h_{t} = (1 - v_{t}) * h_{t - 1} + v_{t} * {\tilde{h}}_{t}

(24)

where W_{z}, W_{r}, and W represent the weight matrices, σ (\cdot) denotes the sigmoid activation function, and ∗ indicates the element-wise multiplication of vectors.
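Equations (21)–(24) translate directly into a single GRU step. The sketch below follows the equations as written, without bias terms, and uses plain Python lists; the helper names and the numerical values are hypothetical.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def matvec(W, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(w * c for w, c in zip(row, v)) for row in W]

def gru_step(x_t, h_prev, W_z, W_r, W):
    """One GRU update following Eqs. (21)-(24); weights act on [h_prev, x_t]."""
    concat = h_prev + x_t
    v_t = [sigmoid(a) for a in matvec(W_z, concat)]          # update gate, Eq. (21)
    r_t = [sigmoid(a) for a in matvec(W_r, concat)]          # reset gate, Eq. (22)
    gated = [r * h for r, h in zip(r_t, h_prev)] + x_t
    h_tilde = [math.tanh(a) for a in matvec(W, gated)]       # candidate, Eq. (23)
    return [(1.0 - v) * hp + v * ht                          # output, Eq. (24)
            for v, hp, ht in zip(v_t, h_prev, h_tilde)]

# Hidden size 2, input size 1, so each weight matrix is 2 x 3.
zeros = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
h_prev = [0.4, -0.2]
h_zero_weights = gru_step([1.0], h_prev, zeros, zeros, zeros)

W_z = [[0.1, -0.2, 0.3], [0.0, 0.4, -0.1]]
W_r = [[0.2, 0.1, -0.3], [-0.1, 0.0, 0.2]]
W_h = [[0.5, -0.4, 0.3], [0.1, 0.2, -0.2]]
h_next = gru_step([1.0], h_prev, W_z, W_r, W_h)
print(h_zero_weights, h_next)
```

With all-zero weights both gates equal 0.5 and the candidate state is zero, so the output is exactly half of the previous state, which makes the mixing role of the update gate easy to see.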

4.3.1. BiGRU Module Architecture

In the detection of FDIAs in smart grids, future information is crucial for predicting the current moment. However, a unidirectional GRU cannot directly utilize this information. To better predict time series data in smart grids, our detection model employs a BiGRU module, which is an improved GRU structure, as illustrated in Figure 6. The BiGRU consists of a forward GRU and a backward GRU; thus, its output is influenced by both forward and backward states, enabling a better understanding of the overall characteristics of the sequence and thereby enhancing the final detection accuracy.

After the measurement input sequence X_{t} is fed into the BiGRU module, the sequence data are read by the forward module and the backward module, resulting in the forward hidden state sequence \vec{h_{t}} and the backward hidden state sequence \overset{\leftarrow}{h_{t}}, respectively. The hidden state output h_{t} of the BiGRU at time step t is then obtained by concatenating the forward hidden state \vec{h_{t}} and the backward hidden state \overset{\leftarrow}{h_{t}}. Furthermore, after the sequence has been processed, the set of hidden states is denoted as H = (h_{1}, h_{2}, \dots, h_{T}). The module equations are given in Equations (25)–(27):

\vec{h_{t}} = \mathrm{GRU} (x_{t}, \vec{h_{t - 1}})

(25)

\overset{\leftarrow}{h_{t}} = \mathrm{GRU} (x_{t}, \overset{\leftarrow}{h_{t + 1}})

(26)

h_{t} = \mathrm{Concatenate} (\vec{h_{t}}, \overset{\leftarrow}{h_{t}})

(27)

where the function \mathrm{GRU} (\cdot) represents the nonlinear transformation of the input vector into the corresponding hidden state, \mathrm{Concatenate} (\cdot) denotes the concatenation of the hidden state outputs in the forward and backward directions, and \vec{h_{t}} and \overset{\leftarrow}{h_{t}} represent the forward and backward hidden states at time step t, respectively.
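Equations (25)–(27) can be sketched by running the same GRU update in both directions and concatenating the states per time step. The sketch assumes zero initial hidden states and separate parameter sets for the two directions, as is usual for bidirectional recurrent layers; the parameter values and function names are hypothetical.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def matvec(W, v):
    return [sum(w * c for w, c in zip(row, v)) for row in W]

def gru_step(x_t, h_prev, params):
    """One GRU update (Eqs. (21)-(24)); params = (W_z, W_r, W)."""
    W_z, W_r, W = params
    concat = h_prev + x_t
    v_t = [sigmoid(a) for a in matvec(W_z, concat)]
    r_t = [sigmoid(a) for a in matvec(W_r, concat)]
    h_tilde = [math.tanh(a)
               for a in matvec(W, [r * h for r, h in zip(r_t, h_prev)] + x_t)]
    return [(1.0 - v) * hp + v * ht for v, hp, ht in zip(v_t, h_prev, h_tilde)]

def bigru(xs, fwd_params, bwd_params, hidden_size):
    """Return H = (h_1, ..., h_T) with h_t = [forward state, backward state]."""
    T = len(xs)
    h = [0.0] * hidden_size
    fwd = []
    for t in range(T):                      # Eq. (25): left to right
        h = gru_step(xs[t], h, fwd_params)
        fwd.append(h)
    h = [0.0] * hidden_size
    bwd = [None] * T
    for t in range(T - 1, -1, -1):          # Eq. (26): right to left
        h = gru_step(xs[t], h, bwd_params)
        bwd[t] = h
    return [f + b for f, b in zip(fwd, bwd)]   # Eq. (27): concatenation

fwd_params = ([[0.1, -0.2, 0.3], [0.0, 0.4, -0.1]],
              [[0.2, 0.1, -0.3], [-0.1, 0.0, 0.2]],
              [[0.5, -0.4, 0.3], [0.1, 0.2, -0.2]])
bwd_params = ([[-0.3, 0.2, 0.1], [0.2, -0.1, 0.4]],
              [[0.1, 0.3, -0.2], [0.0, -0.2, 0.1]],
              [[-0.2, 0.3, 0.4], [0.3, -0.1, 0.2]])
xs = [[0.5], [-1.0], [0.25], [0.8]]        # T = 4, input size 1
H = bigru(xs, fwd_params, bwd_params, hidden_size=2)
print(H)
```

The forward half of h_1 depends only on x_1, while its backward half has already seen x_2 to x_T; this is how each time step gains access to future information.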

4.3.2. Attention Mechanism Module

The history of attention mechanisms can be traced back to the early 1980s. Early research was mainly focused on psychology and cognitive science, aiming to understand human visual attention and cognitive processes. With the rapid rise of deep learning, the “neural attention mechanism” was proposed by Bahdanau in 2014 [35]. This mechanism dynamically assigns importance weights to different parts of the model’s input sequence, thereby ignoring features that are irrelevant or have low relevance to the task, and dynamically weighting features with high relevance. This mechanism greatly improves the performance of the model.

As shown in Figure 6, the proposed model takes the feature vectors obtained from training the BiGRU network model as the input to the attention layer. Through the attention mechanism, the information of the feature vectors is dynamically weighted to ensure that the model retains the most important features and improves the FDIA detection accuracy. This process is mainly shown in Equations (28)–(30):

D = \tanh (W_{a} H + b_{a})

(28)

A = \mathrm{softmax} (D)

(29)

C = A H

(30)

where D represents the attention scores, H represents the output of the BiGRU module, \tanh (\cdot) is the activation function, W_{a} is a weight matrix obtained through training, and b_{a} is the bias term. A is the attention weight, and C is the attention-weighted output, obtained by applying the attention weights A to the BiGRU output H.
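Equations (28)–(30) leave the shapes of W_{a} and A open, so the sketch below makes one common choice explicit: W_{a} is treated as a single row vector that produces one scalar score per time step, the softmax is taken over time, and C is the resulting weighted sum of the hidden states. This is an assumption for illustration, and the names and values are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scalars."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(H, w_a, b_a):
    """Score each time step, normalize over time, and return the weighted sum."""
    scores = [math.tanh(sum(w * h for w, h in zip(w_a, h_t)) + b_a)   # Eq. (28)
              for h_t in H]
    weights = softmax(scores)                                         # Eq. (29)
    dim = len(H[0])
    context = [sum(a * h_t[k] for a, h_t in zip(weights, H))          # Eq. (30)
               for k in range(dim)]
    return weights, context

H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]      # T = 3 hidden states of size 2
weights, context = attention_pool(H, w_a=[2.0, 0.0], b_a=0.0)
uniform_weights, mean_context = attention_pool(H, w_a=[0.0, 0.0], b_a=0.0)
print(weights, context)
```

Time steps whose hidden state aligns with w_a receive larger weights; with zero scoring weights the mechanism falls back to a plain average over time.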

4.4. The A-BiTG Overall Framework and Loss Function

The overall framework of the A-BiTG and the input–output sequence after passing through each module are shown in Figure 7. Firstly, we input the measurement data of the smart grid into a parallel network composed of the BiTCN and BiGRU. The BiTCN branch explores the bidirectional time series features, extracts short-term local features from the time series data, and enhances the algorithm’s ability to extract time series features. Meanwhile, the BiGRU branch captures the long-term dependencies of past and future information in the time series data. It uses the attention mechanism to dynamically redistribute weights for important feature information in the time series, ensuring the model’s detection accuracy. Secondly, the feature information G extracted by the BiTCN module and the feature information C extracted by the BiGRU-AM are concatenated through a connection layer, forming a feature sequence η containing both local short-term features and long-term features. Then, the concatenated vector sequence η is flattened. Finally, the fully connected layer completes the nonlinear mapping of the input features through the softmax activation function and outputs the classification results:

$$\eta = \mathrm{Concatenate}(G, C) \tag{31}$$

$$I = \mathrm{Softmax}(\lambda \psi + \xi) \tag{32}$$

where $\mathrm{Concatenate}(\cdot)$ represents the concatenation operation, $\eta$ represents the concatenated feature vector, $\psi$ represents the flattened output result, $I$ represents the model’s predicted output, $\lambda$ represents the weight parameters, and $\xi$ represents the bias vector.
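The fusion and output stage in Equations (31) and (32) can be sketched as follows. This is only an illustration under stated assumptions: the two branches are assumed to produce sequences of equal length that are joined along the feature axis, the output layer is assumed to have two units (normal and FDIA), and all function names and numbers are hypothetical.

```python
import math

def softmax(z):
    """Numerically stable softmax over a list of floats."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def concatenate(G, C):
    """Eq. (31): join the BiTCN features G and BiGRU-AM features C per time step."""
    return [g + c for g, c in zip(G, C)]

def flatten(eta):
    """Flatten the concatenated sequence eta into a single vector psi."""
    return [x for row in eta for x in row]

def dense_softmax(psi, lam, xi):
    """Eq. (32): I = Softmax(lambda * psi + xi) with one row of lam per class."""
    logits = [sum(w * x for w, x in zip(row, psi)) + b for row, b in zip(lam, xi)]
    return softmax(logits)

# Toy features: two time steps, 2 BiTCN features and 1 BiGRU-AM feature each
G = [[0.2, 0.4], [0.1, 0.3]]
C = [[0.5], [0.7]]
eta = concatenate(G, C)
psi = flatten(eta)
lam = [[0.1] * 6, [-0.1] * 6]   # assumed 2 x 6 weight matrix (two classes)
xi = [0.0, 0.0]                 # assumed bias vector
I = dense_softmax(psi, lam, xi)
```

The two entries of `I` can be read as class probabilities; the predicted class is the index of the larger one.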

The detection of FDIAs in smart grids is a binary classification problem. To enhance the learning performance of the A-BiTG model, this study adopted the cross-entropy loss function and used the Adam optimizer together with a cosine annealing schedule to adjust the learning rate during model parameter updates. The loss function is expressed as follows:

$$\phi_{loss} = -\sum_{i=1}^{M} \left[ a_{i} \log I_{i} + (1 - a_{i}) \log (1 - I_{i}) \right] \tag{33}$$

where $M$ represents the batch size, $a_{i}$ represents the actual operational status label of the power grid for the $i$-th sample, and $I_{i}$ represents the model’s predicted result for the $i$-th sample.
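As a worked illustration of Equation (33), the sketch below evaluates the cross-entropy loss for one batch in plain Python. It follows the summed form of the equation (no division by the batch size). The clipping constant `eps` is an added assumption to avoid `log(0)` and is not part of the equation; the function name is hypothetical.

```python
import math

def binary_cross_entropy(labels, probs, eps=1e-12):
    """Summed binary cross-entropy as in Eq. (33).

    labels : list of 0/1 ground-truth labels a_i
    probs  : list of predicted attack probabilities I_i
    eps    : clipping constant (assumption, avoids log(0))
    """
    total = 0.0
    for a, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)
        total += a * math.log(p) + (1 - a) * math.log(1.0 - p)
    return -total

# Confident, correct predictions give a small loss ...
good = binary_cross_entropy([1, 0], [0.9, 0.1])
# ... while confident, wrong predictions are penalised heavily.
bad = binary_cross_entropy([1, 0], [0.1, 0.9])
```

In a PyTorch implementation this role would typically be filled by a built-in loss, which usually averages over the batch rather than summing.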

4.5. FDIA Detection Steps Based on A-BiTG Model

The detection process of FDIAs based on the A-BiTG model consists of three parts: grid data preprocessing, training the detection model, and testing the detection model, as shown in Figure 8.

The specific process is as follows:

(1) Preprocessing: the measurement data from the SCADA system are preprocessed, and the processed time series data are segmented, divided into training and testing sets, and used as input for the A-BiTG model.
(2) Training: the training set data are imported into the constructed A-BiTG model, and the model parameters are iteratively updated for the set number of training epochs. The best-performing model is selected and saved.
(3) Testing: the trained model is applied to the test set data for validation, and the model evaluation metrics are computed, completing the model testing process.
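The segmentation part of the preprocessing step can be sketched as a sliding window over the measurement time series. The window length, the stride, and the rule that a window takes the label of its last time step are all assumptions made for illustration, since the text does not specify them; `segment` is a hypothetical helper.

```python
def segment(series, labels, window, stride):
    """Cut a measurement time series into fixed-length windows.

    series : list of measurement vectors, one per time step
    labels : list of 0/1 labels, one per time step
    window : number of time steps per window (assumed hyperparameter)
    stride : step between consecutive window starts (assumed hyperparameter)
    """
    X, y = [], []
    for start in range(0, len(series) - window + 1, stride):
        X.append(series[start:start + window])
        # Assumption: a window is labelled by its final time step
        y.append(labels[start + window - 1])
    return X, y

# Toy series: 20 time steps with one measurement each,
# the second half marked as attacked
series = [[float(t)] for t in range(20)]
labels = [0] * 10 + [1] * 10
X, y = segment(series, labels, window=5, stride=5)
```

Each element of `X` is then one input sequence for the model, and `y` holds the matching class labels.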

In practical grid applications, detecting FDIAs usually requires integration with the BDD mechanism, which makes the entire process more complex. First, the measurement data of the smart grid undergo BDD based on state estimation. Second, the measurement data are passed to the A-BiTG model proposed in this paper for FDIA detection. Finally, the detection results output by the model are processed to ensure the safe and reliable operation of the smart grid.

5. Experimental Simulation and Result Analysis

This section presents a series of simulation experiments conducted to assess the overall performance of the proposed A-BiTG model. It includes information on the model evaluation metrics, computer configuration requirements, the generation of simulation data, and relevant hyperparameter settings.

5.1. Evaluation Metrics

To evaluate the FDIA detection accuracy of the A-BiTG model, this study used four metrics commonly used in machine learning (accuracy, precision, recall, and F1-score) as the evaluation criteria for the model’s detection output [36]. $TP$, $FN$, $TN$, and $FP$ represent true positives, false negatives, true negatives, and false positives, respectively. The four evaluation metrics are defined as follows.

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \tag{34}$$

where Accuracy is the ratio of correctly classified samples to all samples. It is simple and intuitive and is particularly applicable when the class distribution is balanced. The higher the accuracy, the better the overall effectiveness of the detection model.

$$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{35}$$

where Precision is the ratio of correctly predicted attack samples to all samples predicted as attacks. It measures the model’s ability to identify attacks accurately. The higher the precision, the lower the false alarm rate of the detection model, and the better the detection effect.

$$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{36}$$

where Recall is the ratio of correctly predicted attack samples to all real attack samples. The higher the recall, the lower the missed detection rate of the detection model, and the better the detection effect.

$$\mathrm{F1\text{-}score} = 2 \times \frac{\mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{37}$$

where F1-score is the harmonic mean of the precision and recall. It is used as a comprehensive evaluation metric because relying on precision alone can hide missed alarms and relying on recall alone can hide false alarms. The higher the F1-score, the better the overall performance of the detection model.
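Equations (34)–(37) can be checked on a small example. The sketch below counts the confusion-matrix entries from label lists (1 = FDIA, 0 = normal) and evaluates the four metrics. The function names are hypothetical, the toy labels are not data from the paper, and returning 0.0 for an empty denominator is an added convention.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FN, TN, FP for binary labels (1 = attack, 0 = normal)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp, fn, tn, fp

def detection_metrics(tp, fn, tn, fp):
    """Evaluate Eqs. (34)-(37); a zero denominator yields 0.0 (assumed convention)."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return accuracy, precision, recall, f1

# Toy example: 4 attack samples and 4 normal samples
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
tp, fn, tn, fp = confusion_counts(y_true, y_pred)
accuracy, precision, recall, f1 = detection_metrics(tp, fn, tn, fp)
```

Here one attack is missed and one normal sample is flagged, so all four metrics equal 0.75.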

5.2. Experimental Environment

The experiment was conducted on a Windows 11 operating system with an Intel(R) Core(TM) i7-12700H CPU, 16.00 GB of RAM, and an NVIDIA GeForce RTX 3060 Laptop GPU. The computer is a Deliverer Y9000P purchased from Lenovo in China. The algorithm was implemented and run using the PyTorch deep learning framework, and Matpower 7.1 running on MATLAB R2023b (MathWorks, United States) was used for power flow calculations and state estimation.

5.3. Dataset Settings

5.3.1. FDIA Data Generation

In this experiment, to generate normal measurement data, we obtained real load data from the New York Independent System Operator (NYISO) [37]. The load profiles were proportionally added to the IEEE 14-bus and IEEE 118-bus system loads, and the system measurements were sampled. To keep the experiments realistic, we added Gaussian noise with a mean of 0 and a variance of 0.25; the added noise was calculated to be mainly between 0.35% and 2% of the corresponding measured data. The BDD process was then executed to obtain the valid measurement data. Because carrying out FDIAs on a real smart grid is very dangerous, this study generated the FDIA data by modifying the load, following the principle and construction method of FDIAs in Section 3, and kept the load-modified measurement values between 80% and 125% of the actual values. In addition, we used a random walk attack method: over a period of time, false data were continuously injected into the system buses in the form of a multi-point attack. The Matpower toolkit was used for FDIA data generation.
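The two numeric ingredients of this procedure, zero-mean Gaussian noise with variance 0.25 and a load modification kept between 80% and 125% of the actual value, can be sketched in plain Python. This is not the authors' Matpower pipeline: the random-walk step size, the single-bus load series, and the function names are assumptions for illustration, and the power flow and state estimation steps are omitted.

```python
import random

def add_measurement_noise(values, variance=0.25, rng=random):
    """Add zero-mean Gaussian noise with the given variance to each value."""
    std = variance ** 0.5
    return [v + rng.gauss(0.0, std) for v in values]

def random_walk_load_attack(loads, step=0.02, low=0.80, high=1.25, rng=random):
    """Scale a load series by a factor that follows a bounded random walk.

    step      : maximum change of the factor per time step (assumption)
    low, high : bounds from the text (80% and 125% of the actual value)
    """
    factor = 1.0
    attacked = []
    for load in loads:
        factor += rng.uniform(-step, step)
        factor = min(max(factor, low), high)  # keep within the stated bounds
        attacked.append(load * factor)
    return attacked

rng = random.Random(0)             # fixed seed for repeatability
loads = [100.0] * 50               # hypothetical load series for one bus
attacked = random_walk_load_attack(loads, rng=rng)
noisy = add_measurement_noise(loads, rng=rng)
```

In the actual procedure the modified loads would be fed through the power flow calculation so that every affected measurement changes consistently.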

5.3.2. Dataset Partitioning

The data for the IEEE 14-bus and IEEE 118-bus test systems were divided into training and test sets in the same way. Each test system contained 16,000 sets of measurement data, including 8000 normal samples and 8000 FDIA samples, with normal samples labeled as 0 and FDIA samples labeled as 1. As shown in Table 1, 11,200 samples (70%) were used for training and 4800 samples (30%) for testing, with both classes equally represented in each set.
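The class-balanced split in Table 1 can be reproduced with a simple stratified split. The sketch assumes that samples are shuffled at random within each class before splitting, which the text does not state; `stratified_split` is a hypothetical helper.

```python
import random

def stratified_split(labels, train_per_class, seed=0):
    """Split sample indices so each class contributes train_per_class to training."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)  # assumption: random assignment within each class
        train_idx += idx[:train_per_class]
        test_idx += idx[train_per_class:]
    return train_idx, test_idx

# 8000 normal samples (label 0) and 8000 FDIA samples (label 1)
labels = [0] * 8000 + [1] * 8000
train_idx, test_idx = stratified_split(labels, train_per_class=5600)
```

With 5600 training samples per class, the remaining 2400 per class form the test set, matching the counts in Table 1.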

5.4. Model Detection Performance Analysis

5.4.1. Comparison of Detection Performance of Different Models

To assess the performance of the A-BiTG model, we compared it with several mainstream FDIA detection models currently used in smart grids: the LSTM model [18], CNN model [19], CNN-LSTM model [20], and TCN model [21] mentioned previously. All comparisons were conducted using the same dataset.

For all models in the experiment, the batch size was 32 and the number of training epochs was 70. The Adam optimizer was used to regulate the learning rate, with an initial learning rate of 0.01 and a maximum time step of 100. The CNN model consisted of two convolutional layers, each with a kernel size of 1, followed by a 1 × 1 max-pooling layer and then a fully connected layer that activated the output. The TCN model consisted of two dilated convolutional layers with a kernel size of 1 and a residual block, followed by a dropout layer with a dropout rate of 0.3; the output was activated by a fully connected layer. The LSTM model had two LSTM layers, each containing 64 hidden units, with a dropout layer (dropout rate of 0.3) after each LSTM layer. The CNN-LSTM model consisted of three convolutional layers with a kernel size of 1 and a 1 × 1 pooling and activation layer, followed by an LSTM layer containing 128 hidden units with two skip connections; the results were output through a fully connected layer. To ensure the fairness of the experimental results, we took the average of five tests as the final result. To verify the superiority of the A-BiTG model proposed in this paper, we compared the detection metrics of the five algorithms in the training phase under the same dataset and environment.
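For reference, the standard closed form of a cosine annealing learning-rate schedule is sketched below. Two points are assumptions rather than statements from the paper: that the maximum time step of 100 is the annealing period `t_max`, and that the minimum learning rate `eta_min` is 0.

```python
import math

def cosine_annealing_lr(step, base_lr=0.01, t_max=100, eta_min=0.0):
    """Learning rate after `step` scheduler steps under cosine annealing.

    base_lr : initial learning rate (0.01 in the text)
    t_max   : annealing period (assumed to be the maximum time step of 100)
    eta_min : lower bound of the learning rate (assumption)
    """
    cosine = math.cos(math.pi * step / t_max)
    return eta_min + 0.5 * (base_lr - eta_min) * (1.0 + cosine)

# The rate starts at base_lr, is halved at the midpoint, and reaches eta_min at t_max
schedule = [cosine_annealing_lr(s) for s in range(0, 101)]
```

In PyTorch, a schedule of this shape is available as `torch.optim.lr_scheduler.CosineAnnealingLR`, which would be attached to the Adam optimizer; whether the authors used exactly this scheduler is not stated.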

As can be seen from Table 2, the A-BiTG model achieved the highest accuracy, precision, recall, and F1-score during model training, at 96.23%, 95.47%, 97.27%, and 96.37%, respectively. The CNN, TCN, and LSTM models each have a single network structure, and all of their indicators were lower than those of the A-BiTG model when tested for FDIAs. These results illustrate that, because the BiTCN is good at capturing short-term features and the BiGRU is good at handling long-term dependencies, their combination enables the model to understand the time series data more comprehensively and improves the accuracy of FDIA detection, an advantage that a single network model cannot match. In addition, the CNN-LSTM model was inferior to the A-BiTG model in all evaluation indexes. This is mainly because the CNN-LSTM model has a large number of parameters, which makes training more complex, and because it processes time series data in one direction only and cannot adequately capture bidirectional temporal dependencies. As a result, its understanding of the overall characteristics of the data is not comprehensive enough, which makes its detection less accurate.

Table 3 lists the detection performances of these models on the IEEE 118-bus system, which are similar to the results on the IEEE 14-bus system. The results show that the A-BiTG model proposed in this paper maintains a high detection accuracy on a larger and more complex system, with an F1-score of 95.91%. This is also due to the attention mechanism, which makes the decisions of the model more interpretable, retains more valuable feature information in the detection process, and improves the overall detection performance of the model. Therefore, the A-BiTG model can provide an effective guarantee for the safe operation of the smart grid.

In addition, we compared the training times of the different models on the IEEE 14 and IEEE 118 test systems. The results are shown in Table 4. The training time of the A-BiTG model was longer than those of the single network structure models, mainly because the single network models have simpler structures with fewer parameters. However, compared to the CNN-LSTM model, the training time of the A-BiTG model was shorter by 8.73 s and 153.50 s on the IEEE 14 and IEEE 118 test systems, respectively. Although the A-BiTG model takes longer to train than the single network models, its detection performance is substantially better than those of the other models, and the existing hardware level is sufficient to bridge the gap in training efficiency. Therefore, the method proposed in this paper is more competitive.

5.4.2. Impact of Attack Intensity

FDIAs of different strengths affect the detection performance of the model; higher-strength attacks in particular tend to have more serious impacts on the grid and challenge the accuracy, robustness, and reliability of the model. In this paper, we experimentally verified the detection performance of the A-BiTG model under three FDIA intensities: low, medium, and high.

From the detection results on the IEEE 14-bus system in Figure 9 and Table 5, it can be seen that the F1-score of the A-BiTG model is above 95% under all three attack intensities. As the attack intensity increases, its accuracy, recall, and F1-score all improve, and its precision is highest under high-intensity attacks; in particular, under high-intensity FDIAs, the F1-score reaches 97.72%. By contrast, the other four models in the comparison experiments performed worse, which reflects the limited feature extraction capability of some current mainstream models. These models cannot effectively avoid the influence of undesirable features, resulting in unsatisfactory detection performances. The BiTCN and BiGRU have a natural advantage in the feature extraction of time series and can better consider the global information of the data. Combined with the attention mechanism, they can assign a higher weight to the important feature information in the detection task, which greatly improves the detection performance of the model.

Figure 10 and Table 6 show the results of the five models under different attack strengths on the IEEE 118-bus system. Similar to the results on the IEEE 14-bus system, even with the larger dimension of the grid measurement data, the A-BiTG model proposed in this paper achieves F1-scores of 92.85%, 94.61%, and 96.28% under low, medium, and high attack strengths, which are 5.35, 5.23, and 2.74 percentage points higher than those of the existing advanced CNN-LSTM detection model. The experimental results show that the A-BiTG model still has a strong detection advantage in the face of more complex attack environments and power network systems.

5.4.3. Impact of Environmental Noise

In order to verify the robustness of the A-BiTG model to ambient noise, we used three noise settings. The noise followed a normal distribution with standard deviations of $0.5\theta$, $\theta$, and $1.5\theta$, representing low, medium, and high ambient noise, respectively, where $\theta$ is the initial setting of the ambient noise size. Figure 11 and Figure 12 display the results of the robustness comparison between some of the current mainstream detection models and the A-BiTG model for different noise environments on the IEEE 14-bus and IEEE 118-bus systems, respectively.
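The three noise settings can be illustrated by generating noisy copies of a measurement series with standard deviations of 0.5θ, θ, and 1.5θ. The value of θ, the constant toy series, and the function name are assumptions for illustration only.

```python
import random
import statistics

def noisy_copies(measurements, theta, multipliers=(0.5, 1.0, 1.5), seed=0):
    """Return one noisy copy of the measurements per noise level k * theta."""
    rng = random.Random(seed)
    return {
        k: [m + rng.gauss(0.0, k * theta) for m in measurements]
        for k in multipliers
    }

clean = [1.0] * 5000                    # hypothetical per-unit measurements
levels = noisy_copies(clean, theta=0.1) # theta = 0.1 is an assumed value
# Empirical standard deviation of each copy grows with the multiplier
spread = {k: statistics.pstdev(v) for k, v in levels.items()}
```

Evaluating a trained detector on each copy in turn gives the kind of low, medium, and high noise comparison described here.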

As can be seen from Figure 11, when the noise error is $0.5\theta$, the F1-score of the A-BiTG detection model proposed in this paper is 97.84%, which is higher than those of the other models. When the noise error reaches its maximum, the proposed model outperforms the CNN, TCN, LSTM, and CNN-LSTM models by 19.84%, 15.06%, 6.76%, and 4.25%, respectively, which shows that the A-BiTG model is the least affected by noise.

Figure 12 shows the detection results of the IEEE 118-bus system. Although the amount of measurement data has increased, the A-BiTG model is still minimally affected by noise when compared to other mainstream detection models, and our model still has a strong detection advantage in performing the detection of FDIAs even in a high-noise environment. The experimental results indicate that the A-BiTG model exhibits greater robustness.

6. Conclusions

To address false data injection attacks in smart grids, we proposed a detection method called the A-BiTG model, which aims to fully extract the temporal characteristics of grid measurement data. This method addresses the poor feature extraction capability for grid time series data and the low detection accuracy of current mainstream models. Through simulations of the IEEE 14-bus and IEEE 118-bus systems under FDIAs, we drew the following conclusion: our model shows an excellent detection performance in the face of FDIAs with different intensities, which is attributed to the ability of the BiTCN and BiGRU to fully mine past and future temporal information from smart grid measurement data for deep feature extraction. In addition, the A-BiTG model shows strong robustness in different noise environments, so it can better adapt to power grid attack detection tasks in various complex environments, maintain the stability and reliability of the detection performance, and provide an important guarantee for the safe operation of smart grids.

Although our method achieved good results in detecting FDIAs, it does not take into account situations in the power grid where sensor failure leads to the data loss of a node. This can have an impact on the accuracy of the model. Additionally, the method does not address how to effectively locate and defend against attack signals after detecting FDIA signals. Therefore, in the future, while continuing to study attack detection, we should focus on attack localization and defense mechanisms for smart grids.

Author Contributions

Conceptualization, W.H., C.W., W.L. and Q.Y.; methodology, W.H.; software, W.H.; validation, W.H. and Q.Y.; formal analysis, W.H.; investigation, W.H.; resources, W.H.; data curation, W.H.; writing—original draft preparation, W.H.; writing—review and editing, C.W. and W.L.; visualization, C.W.; supervision, C.W. and W.L.; project administration, C.W.; funding acquisition, C.W. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Key Research and Development Program of China under Grant 2023YFB4704000 and the National Natural Science Foundation of China under Grants 62125307, 62376147, U22A2046, and 61933013.

Data Availability Statement

The data presented in this study are available in this article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Li, Y.; Wei, X.; Li, Y.; Dong, Z.; Shahidehpour, M. Detection of False Data Injection Attacks in Smart Grid: A Secure Federated Deep Learning Approach. IEEE Trans. Smart Grid 2022, 13, 4862–4872.
2. Jin, B.; Zhao, X.; Yuan, D. Attack–Defense Confrontation Analysis and Optimal Defense Strategy Selection Using Hybrid Game Theoretic Methods. Symmetry 2024, 16, 156.
3. Yang, H.; Li, T.; Yan, J.; Elvira, V. Hierarchical Average Fusion With GM-PHD Filters Against FDI and DoS Attacks. IEEE Signal Process. Lett. 2024, 31, 934–938.
4. Sriranjani, R.; M, B.K.; K, P.A.; Saleem, M.; Hemavathi, N.; Parvathy, A. Machine Learning Based Intrusion Detection Scheme to Detect Replay Attacks in Smart Grid. In Proceedings of the 2023 IEEE International Students’ Conference on Electrical, Electronics and Computer Science (SCEECS), Bhopal, India, 18–19 February 2023; pp. 1–5.
5. Gao, S.; He, Z.; Wei, X.; Liu, Y.; Huang, T.; Lei, J. Bilevel Model for Protection-Branch Measurements-Based Topology Attack Against DC and AC State Estimations. IEEE Syst. J. 2022, 16, 5369–5379.
6. Baul, A.; Sarker, G.C.; Sadhu, P.K.; Yanambaka, V.P.; Abdelgawad, A. XTM: A Novel Transformer and LSTM-Based Model for Detection and Localization of Formally Verified FDI Attack in Smart Grid. Electronics 2023, 12, 797.
7. Shen, K.; Yan, W.; Ni, H.; Chu, J. Localization of False Data Injection Attack in Smart Grids Based on SSA-CNN. Information 2023, 14, 180.
8. Musleh, A.S.; Chen, G.; Dong, Z.Y. A Survey on the Detection Algorithms for False Data Injection Attacks in Smart Grids. IEEE Trans. Smart Grid 2020, 11, 2218–2234.
9. Mo, Y.; Chabukswar, R.; Sinopoli, B. Detecting Integrity Attacks on SCADA Systems. IEEE Trans. Control Syst. Technol. 2014, 22, 1396–1407.
10. Long, H.; Wu, Z.; Fang, C.; Gu, W.; Wei, X.; Zhan, H. Cyber-attack Detection Strategy Based on Distribution System State Estimation. J. Mod. Power Syst. Clean Energy 2020, 8, 669–678.
11. Lisheng, W.; Qian, Z. Detection of False Data Injection Attack in Smart Grid Based on Improved UKF. J. Syst. Simul. 2023, 35, 1508–1516.
12. Wang, Z.; Zhang, Q.; Sun, H.; Hu, J. Detection of False Data Injection Attacks in smart grids based on cubature Kalman Filtering. In Proceedings of the 2021 33rd Chinese Control and Decision Conference (CCDC), Kunming, China, 22–24 May 2021; pp. 2526–2532.
13. Luo, X.; Bai, M.; Wang, X.; Sun, X. Square-root Extended Kalman Filter-based Detection of False Data Injection Attack in Smart Grids. In Proceedings of the 2021 IEEE 5th Conference on Energy Internet and Energy System Integration (EI2), Taiyuan, China, 22–24 October 2021; pp. 2376–2381.
14. Yang, L.; Wen, C.; Wen, T. Multilevel Fine Fingerprint Authentication Method for Key Operating Equipment Identification in Cyber-Physical Systems. IEEE Trans. Ind. Inform. 2023, 19, 1217–1226.
15. Rashed, M.; Gondal, I.; Kamruzzaman, J.; Islam, S. State Estimation within IED Based Smart Grid Using Kalman Estimates. Electronics 2021, 10, 1783.
16. Musleh, A.S.; Chen, G.; Yang Dong, Z.; Wang, C.; Chen, S. Spatio-temporal data-driven detection of false data injection attacks in power distribution systems. Int. J. Electr. Power Energy Syst. 2023, 145, 108612.
17. Niu, X.; Li, J.; Sun, J.; Tomsovic, K. Dynamic Detection of False Data Injection Attack in Smart Grid using Deep Learning. In Proceedings of the 2019 IEEE Power & Energy Society Innovative Smart Grid Technologies Conference (ISGT), Washington, DC, USA, 18–21 February 2019; pp. 1–6.
18. Zhao, Y.; Jia, X.; An, D.; Yang, Q. LSTM-Based False Data Injection Attack Detection in Smart Grids. In Proceedings of the 2020 35th Youth Academic Annual Conference of Chinese Association of Automation (YAC), Zhanjiang, China, 16–18 October 2020; pp. 638–644.
19. Zhang, G.; Li, J.; Bamisile, O.; Cai, D.; Hu, W.; Huang, Q. Spatio-Temporal Correlation-Based False Data Injection Attack Detection Using Deep Convolutional Neural Network. IEEE Trans. Smart Grid 2022, 13, 750–761.
20. Wahid, A.; Breslin, J.G.; Intizar, M.A. Prediction of Machine Failure in Industry 4.0: A Hybrid CNN-LSTM Framework. Appl. Sci. 2022, 12, 4221.
21. Raghuvamsi, Y.; Kalyani, V.; Teeparthi, K.; Geetha, S.N.; Srimannarayana, Y.; Batchu, S. Temporal Convolutional Network-based Locational Detection of False Data Injection Attacks in Power System State Estimation. In Proceedings of the 2023 7th International Conference on Computer Applications in Electrical Engineering-Recent Advances (CERA), Roorkee, India, 27–29 October 2023; pp. 1–6.
22. Su, X.; Deng, C.; Yang, J.; Li, F.; Li, C.; Fu, Y.; Dong, Z.Y. DAMGAT Based Interpretable Detection of False Data Injection Attacks in Smart Grids. IEEE Trans. Smart Grid, 2024; early access.
23. Gaggero, G.B.; Caviglia, R.; Armellin, A.; Rossi, M.; Girdinio, P.; Marchese, M. Detecting Cyberattacks on Electrical Storage Systems through Neural Network Based Anomaly Detection Algorithm. Sensors 2022, 22, 3933.
24. Zideh, M.J.; Chatterjee, P.; Srivastava, A.K. Physics-Informed Machine Learning for Data Anomaly Detection, Classification, Localization, and Mitigation: A Review, Challenges, and Path Forward. IEEE Access 2024, 12, 4597–4617.
25. Li, H.; Dou, C.; Yue, D.; Hancke, G.P.; Zeng, Z.; Guo, W.; Xu, L. End-Edge-Cloud Collaboration-Based False Data Injection Attack Detection in Distribution Networks. IEEE Trans. Ind. Inform. 2024, 20, 1786–1797.
26. Li, B.; Wu, Y.; Song, J.; Lu, R.; Li, T.; Zhao, L. DeepFed: Federated Deep Learning for Intrusion Detection in Industrial Cyber–Physical Systems. IEEE Trans. Ind. Inform. 2021, 17, 5615–5624.
27. Wang, Y.; Xia, M.; Yang, Q.; Song, Y.; Chen, Q.; Chen, Y. Augmented State Estimation of Line Parameters in Active Power Distribution Systems with Phasor Measurement Units. IEEE Trans. Power Deliv. 2022, 37, 3835–3845.
28. Abur, A.; Expósito, A.G. Power System State Estimation: Theory and Implementation; CRC Press: Boca Raton, FL, USA, 2004.
29. Liu, Y.; Ning, P.; Reiter, M.K. False data injection attacks against state estimation in electric power grids. ACM Trans. Inf. Syst. Secur. 2011, 14, 13.
30. Jorjani, M.; Seifi, H.; Varjani, A.Y. A Graph Theory-Based Approach to Detect False Data Injection Attacks in Power System AC State Estimation. IEEE Trans. Ind. Inform. 2021, 17, 2465–2475.
31. Bai, S.; Kolter, J.Z.; Koltun, V. An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling. arXiv 2018, arXiv:1803.01271.
32. Wang, Y.; Chen, J.; Chen, X.; Zeng, X.; Kong, Y.; Sun, S.; Guo, Y.; Liu, Y. Short-Term Load Forecasting for Industrial Customers Based on TCN-LightGBM. IEEE Trans. Power Syst. 2021, 36, 1984–1997.
33. Xuan, B.; Li, J.; Song, Y. BiTCN malware classification method based on multi-feature fusion. In Proceedings of the 2022 International Conference on Image Processing, Computer Vision and Machine Learning (ICICML), Xi’an, China, 28–30 October 2022; pp. 359–364.
34. Chung, J.; Gulcehre, C.; Cho, K.; Bengio, Y. Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling. arXiv 2014, arXiv:1412.3555.
35. Bahdanau, D.; Cho, K.; Bengio, Y. Neural Machine Translation by Jointly Learning to Align and Translate. arXiv 2016, arXiv:1409.0473.
36. Zhang, Y.; Wang, J.; Chen, B. Detecting False Data Injection Attacks in Smart Grids: A Semi-Supervised Deep Learning Approach. IEEE Trans. Smart Grid 2021, 12, 623–634.
37. Feng, H.; Han, Y.; Si, F.; Zhao, Q. Detection of False Data Injection Attacks in Cyber-Physical Power Systems: An Adaptive Adversarial Dual Autoencoder with Graph Representation Learning Approach. IEEE Trans. Instrum. Meas. 2024, 73, 9000411.

Figure 1. A graphical representation of a smart grid under FDIAs.

Figure 2. TCN architecture diagram.

Figure 3. BiTCN architecture diagram.

Figure 4. BiTCN residual block architecture diagram.

Figure 5. GRU architecture diagram.

Figure 6. BiGRU-AM architecture diagram.

Figure 7. Framework diagram of the A-BiTG model.

Figure 8. A-BiTG detection flow chart.

Figure 9. Comparison results of F1-score for different attack strengths for IEEE 14-bus system.

Figure 10. Comparison results of F1-score for different attack strengths for IEEE 118-bus system.

Figure 11. Comparative robustness results for ambient noise in IEEE 14-bus systems.

Figure 12. Comparative robustness results for ambient noise in IEEE 118-bus systems.

Table 1. Model training and test dataset partitioning.

Dataset	Normal Samples	FDIA Samples	Total Samples
Training	5600	5600	11,200
Test	2400	2400	4800

Table 2. Comparison of detection methods in IEEE 14-bus power systems.

Data	Model	Accuracy	Precision	Recall	F1-Score
IEEE 14	CNN	81.91%	80.84%	80.72%	78.72%
IEEE 14	TCN	85.03%	92.33%	76.37%	83.60%
IEEE 14	LSTM	91.83%	95.10%	87.18%	91.42%
IEEE 14	CNN-LSTM	93.23%	93.61%	93.18%	93.39%
IEEE 14	A-BiTG	96.23%	95.47%	97.27%	96.37%

Table 3. Comparison of detection methods in IEEE 118-bus power systems.

Data	Model	Accuracy	Precision	Recall	F1-Score
IEEE 118	CNN	80.75%	82.55%	78.04%	78.68%
IEEE 118	TCN	84.60%	88.77%	79.17%	83.70%
IEEE 118	LSTM	90.13%	88.48%	92.26%	90.33%
IEEE 118	CNN-LSTM	92.03%	92.77%	91.62%	92.19%
IEEE 118	A-BiTG	95.77%	95.14%	96.69%	95.91%

Table 4. Comparison of training times of different models.

Model	Total Training Time, IEEE 14 (s)	Total Training Time, IEEE 118 (s)
CNN	71.26	697.19
TCN	82.18	842.43
LSTM	89.45	975.78
CNN-LSTM	124.37	1229.19
A-BiTG	115.64	1075.69

Table 5. Comparison of detection methods of models on IEEE 14-bus system under different attack strengths.

Model	Accuracy (Low)	Precision (Low)	Recall (Low)	Accuracy (Medium)	Precision (Medium)	Recall (Medium)	Accuracy (High)	Precision (High)	Recall (High)
CNN	76.98%	79.06%	73.56%	80.18%	81.18%	77.55%	86.30%	85.93%	87.66%
TCN	80.51%	82.31%	77.80%	84.87%	82.42%	88.58%	90.73%	88.34%	94.42%
LSTM	86.87%	86.85%	86.85%	89.30%	87.60%	91.52%	92.87%	90.53%	96.17%
CNN-LSTM	90.93%	90.22%	91.29%	91.70%	90.02%	94.29%	95.03%	94.21%	96.23%
A-BiTG	94.33%	96.96%	91.79%	95.77%	96.39%	95.32%	97.15%	98.84%	96.72%

Table 6. Comparison of detection methods of models on IEEE 118-bus system under different attack strengths.

Model	Accuracy (Low)	Precision (Low)	Recall (Low)	Accuracy (Medium)	Precision (Medium)	Recall (Medium)	Accuracy (High)	Precision (High)	Recall (High)
CNN	73.49%	75.31%	70.04%	76.02%	76.45%	73.40%	83.49%	85.52%	80.92%
TCN	79.73%	80.64%	77.15%	82.10%	79.79%	85.91%	88.00%	91.35%	83.91%
LSTM	82.34%	83.75%	79.59%	86.30%	82.26%	92.52%	91.73%	94.86%	88.70%
CNN-LSTM	87.43%	86.90%	88.12%	89.53%	90.55%	88.25%	93.37%	93.51%	93.57%
A-BiTG	92.67%	90.49%	95.33%	94.31%	91.87%	96.53%	96.42%	98.15%	94.82%

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
