Optimizing CNN-LSTM for the Localization of False Data Injection Attacks in Power Systems

Li, Zhuo; Xie, Yaobin; Ma, Rongkuan; Wei, Zihan

doi:10.3390/app14166865



School of Cyberspace Security, Information Engineering University, Zhengzhou 450000, China

* Author to whom correspondence should be addressed.

Appl. Sci. 2024, 14(16), 6865; https://doi.org/10.3390/app14166865

Submission received: 1 July 2024 / Revised: 3 August 2024 / Accepted: 5 August 2024 / Published: 6 August 2024

(This article belongs to the Special Issue Machine Learning and Deep Learning-Based Fault Detection and Diagnosis)


Abstract


As the informatization of power systems advances, their secure operation faces a growing range of potential cyberattacks and threats. The false data injection attack (FDIA) is a common attack that injects abnormal data into terminal links or devices and can cause abnormal system operation and serious economic losses. Current FDIA research focuses primarily on detecting whether an attack exists, and comparatively little work addresses localizing the attack. To address this gap, this study proposes a novel FDIA localization method (GA-CNN-LSTM) that combines convolutional neural networks (CNNs), long short-term memory (LSTM), and a genetic algorithm (GA) to identify the attacked bus or line. The method uses a CNN to extract local features and LSTM to extract global features from time series information. Integrating the CNN and LSTM allows the model to explore complex patterns and dynamic changes in the data and to extract FDIA features effectively, while the GA optimizes the network's hyperparameters to improve model performance. Simulation experiments were conducted on the IEEE 14-bus and 118-bus test systems. The GA-CNN-LSTM method achieved localization F1 scores of 99.71% and 99.10%, respectively, outperforming the compared methods.

Keywords:

power systems; attack detection; deep learning; genetic algorithm

1. Introduction

With the rapid advancement of information technology, the degree of informatization in power systems is continuously improving, offering numerous conveniences and opportunities to the power industry. Power systems comprise both a physical and an information system. The integration of information and physical systems represents the long-term developmental trajectory of power systems. Following the deep integration of these systems, failures in the information system and network attacks will not only disrupt the information system’s function but also pose additional risks to the physical system, endangering its secure operation [1]. As power systems are critical national infrastructure and key drivers of economic and social progress, any cyber attacks on them would not only jeopardize their safe and stable operation but also result in significant economic losses and potentially disrupt normal social functions, thereby endangering personal safety [2].

The false data injection attack (FDIA), first proposed by Liu et al. in 2009 [3], is a common cyberattack on power systems. By tampering with, deleting, or forging measurement data, the attacker constructs false measurements that bypass bad data detection and mislead state estimation, thereby disrupting system control and normal operation. Designing an effective FDIA detection and localization method that promptly identifies the attacked data, enables a quick response, and mitigates the attack's impact is therefore of great significance for the secure and stable operation of power systems.

Since Liu et al. first introduced the concept of FDIA, researchers have developed a range of detection methods, which fall mainly into model-based [4,5] and data-driven [6,7,8,9,10,11,12,13,14,15,16] approaches. Model-based methods require no training on historical data sets; they build the FDIA detection model solely from the relationship between the measurements and the system state. However, their detection accuracy depends strongly on the system parameters, and the continuing expansion of power grids steadily increases model complexity. Data-driven FDIA detection algorithms, by contrast, do not require an explicit system model and rely instead on the volume of available data. Because power systems generate abundant data, these algorithms are attracting increasing attention. For example, to capture the spatial differences between false and normal data, Refs. [6,7,8] use the multi-layer structure of convolutional neural networks (CNNs) to extract spatial features from the measurements and combine them with bad data detection to identify FDIAs. To address the difficulty of selecting CNN hyperparameters, such as the kernel size and the number of convolutional layers, when localizing faults in power systems, Ref. [9] introduces the sparrow search algorithm (SSA) and proposes an SSA-based CNN optimization method that improves detection accuracy. To capture the temporal characteristics of grid measurements during an FDIA, Refs. [10,11] exploit the memory capability of long short-term memory (LSTM) networks in time series prediction to identify potentially compromised measurements.
To capture the dependencies between the nodes of a power system and the spatial correlation among measurement devices, Refs. [12,13] employ graph neural networks (GNNs) that extract spatial features of the measurements over the system's inherent graph topology to identify FDIAs. Ref. [14] combines the spatial characteristics of the data with the system's topological information in a GNN-based method that remains effective at detecting FDIAs even when the topology changes. To cope with the diversity and uncertainty of FDIAs, Ref. [15] proposes a detection method based on deep reinforcement learning: the power system is modeled as a reinforcement learning environment, and an agent trained with a deep reinforcement learning algorithm observes the current state, selects actions, and judges from the environmental feedback whether an FDIA is present. Ref. [16] presents a method for detecting large-scale FDIAs using an autoencoder. It first generates normal measurement vectors with an adversarial network and then analyzes the discrepancies between the attacked and normal vectors to pinpoint the location of the attack.

Despite this progress, current FDIA detection methods still face several challenges in real-world applications. First, most methods only determine whether an attack is present and cannot locate the attacked position, which hinders a rapid and effective response; only a few methods can localize the attacked data. For example, Ref. [7] used a CNN in a multi-label classification setting, classifying each element of the input vector into one of two classes (attacked or not attacked) to identify the specific attacked node. Second, most methods extract features along a single dimension, either spatial or temporal. However, measurements during an FDIA exhibit strong spatiotemporal correlation, so single-dimension feature extraction may limit detection accuracy. Many studies have improved model performance by combining spatial and temporal features: in [17], a CNN-LSTM model extracted spatial and temporal features from video and significantly improved video classification performance; [18] proposed a neural network framework combining a CNN and LSTM to extract temporal and spatial features for short-term user load forecasting; and [19] showed that a CNN-LSTM model has a significant advantage in extracting the spatial and temporal features of dynamic gestures, enabling effective dynamic gesture recognition. Third, hyperparameter selection is difficult for deep learning-based localization methods, especially when several models are combined: the hyperparameters of the components accumulate, and poorly chosen values prevent the combined model from performing well.
For example, [20] used a genetic algorithm to optimize the hyperparameters of an AdaBoost-based ensemble deep learning network, and [21] used Particle Swarm Optimization (PSO) to optimize the structural parameters of a combined model consisting of a Support Vector Machine (SVM) and a Gated Recurrent Unit network (GRU-LSTM). In both cases, selecting hyperparameters with an optimization algorithm improved model performance.

In response to the aforementioned challenges and building upon the existing research, this paper introduces a method for locating FDIAs in power systems by optimizing the hyperparameters of the CNN-LSTM model using a genetic algorithm. This approach treats the localization of FDIAs in power systems as a multi-label classification problem, utilizing the CNN-LSTM model to extract the temporal and spatial features of power system data. It distinguishes between compromised and normal measurement values, thereby achieving the localization of the attacked data. The main contributions are outlined as follows:

A deep learning-based method for locating FDIAs in power systems is proposed, utilizing a CNN-LSTM classifier to locate the compromised bus or line.
The method employs a GA to optimize various hyperparameters of the neural network, thereby achieving a model with superior positioning performance.
The performance and practicality of the model are demonstrated through extensive experiments, and this method is compared with other advanced FDIA localization methods.

The rest of this paper is organized as follows. Section 2 provides a brief introduction to power system state estimation and the false data injection attack model. Section 3 elaborates on the architecture and implementation issues of the proposed FDIA localization method. Section 4 demonstrates the performance of the proposed FDIA localization method through experimental simulations. Section 5 discusses the paper and provides an outlook for future research. Finally, Section 6 summarizes this paper.

2. Power System State Estimation and False Data Injection Attack

2.1. State Estimation and Bad Data Detection

State estimation plays a crucial role in power systems. The supervisory control and data acquisition (SCADA) system collects measurement data from field equipment and feeds it to the state estimator, which estimates the actual operating state of the power system from these measurements [22]. Accurate state estimates let operators monitor real-time operation, including key quantities such as voltage, current, and power, identify potential issues and failures, and promptly make adjustments and repairs to maintain system stability. State estimation also supplies reliable data to other applications, supporting operational strategy optimization and decision-making. In modern power systems, state estimation has become essential for secure and economic power supply: accurate estimates improve operational efficiency, reduce energy consumption, and support the integration of renewable energy sources. This paper performs state estimation with the direct-current (DC) power flow model, a linearized approximation of the alternating-current (AC) network equations that makes the relationship between measurements and states linear.

In the DC model, the relationship between the measurements and the state variables can be approximated as

z = H x + e

where $z = (z_{1}, z_{2}, \dots, z_{n})^{T}$ is the n-dimensional measurement vector, $x = (x_{1}, x_{2}, \dots, x_{m})^{T}$ is the m-dimensional state vector, $e = (e_{1}, e_{2}, \dots, e_{n})^{T}$ is the measurement noise, and H is the n × m Jacobian matrix.

On this basis, the objective function F(x) for state estimation using linear weighted least squares is obtained:

\min F (x) = {(z - Hx)}^{T} R^{- 1} (z - Hx)

(1)

where R is the error covariance matrix of the measurements.

Solving the minimization problem (1) with the weighted least squares method yields the system state estimate $\hat{x}$:

\hat{x} = {(H^{T} R^{- 1} H)}^{- 1} H^{T} R^{- 1} z

(2)

To ensure the reliability of state estimation results, power systems employ a Bad Data Detection (BDD) mechanism. BDD substitutes the state estimate $\hat{x}$ back into the measurement model and computes the residual between the actual measurement vector and its estimated value:

r = \lVert z - H \hat{x} \rVert_{2}

If the residual r is less than or equal to the threshold τ, the measurements are considered free of bad data; if r exceeds τ, bad data are assumed to be present.
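The estimator (2) and the residual test can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the 4 × 2 Jacobian, the noise level, and the threshold `tau` are hypothetical toy values, and `wls_estimate` and `bad_data_detect` are illustrative names.

```python
import numpy as np


def wls_estimate(H, R, z):
    """Weighted least squares state estimate, Eq. (2)."""
    R_inv = np.linalg.inv(R)
    gain = H.T @ R_inv @ H                    # H^T R^-1 H
    return np.linalg.solve(gain, H.T @ R_inv @ z)


def bad_data_detect(H, R, z, tau):
    """Residual-based BDD: returns (r, True if bad data are suspected)."""
    x_hat = wls_estimate(H, R, z)
    r = np.linalg.norm(z - H @ x_hat)         # L2 norm of the measurement residual
    return r, bool(r > tau)


# Hypothetical toy system: n = 4 measurements, m = 2 states.
H = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, -1.0],
              [1.0, 1.0]])
R = np.diag(np.full(4, 0.05 ** 2))            # measurement error covariance
rng = np.random.default_rng(0)
x_true = np.array([0.10, -0.05])
z = H @ x_true + rng.normal(0.0, 0.05, size=4)
tau = 0.2                                     # hypothetical detection threshold

r_clean, flag_clean = bad_data_detect(H, R, z, tau)

z_bad = z.copy()
z_bad[2] += 1.0                               # a gross error on one measurement
r_bad, flag_bad = bad_data_detect(H, R, z_bad, tau)
```

In practice the threshold is commonly derived from a chi-squared test on the weighted residual for a chosen false-alarm rate; the fixed value above only illustrates the comparison.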

2.2. False Data Injection Attack

Suppose the attacker constructs an attack vector $a = (a_{1}, a_{2}, \dots, a_{n})^{T}$ with the same dimension as the original measurement vector z. The tampered measurement vector is then $z_{a} = z + a$, and the state estimated from the false data is $\hat{x}_{a} = \hat{x} + c$, where c is the deviation from the normal state estimate.

The goal of an FDIA is to mislead the control center into accepting the attacked state estimate $\hat{x}_{a} = \hat{x} + c$ as valid. To do so, the attacker must keep the post-attack residual unchanged, or at least no larger than the threshold τ, so that the attack bypasses BDD. In the DC model, the attack vector a is usually constructed as a linear combination of the columns of the Jacobian matrix H, that is, $a = Hc$, where c is any m × 1 non-zero vector. Because the estimator (2) is linear and $(H^{T} R^{-1} H)^{-1} H^{T} R^{-1} H = I$, injecting $a = Hc$ shifts the state estimate by exactly c. The L2-norm of the resulting measurement residual is given in Equation (3).

\begin{aligned} r_{a} &= \lVert z_{a} - H \hat{x}_{a} \rVert_{2} \\ &= \lVert z + a - H (\hat{x} + c) \rVert_{2} \\ &= \lVert z - H \hat{x} + (a - H c) \rVert_{2} \\ &= \lVert z - H \hat{x} \rVert_{2} \\ &= r \end{aligned}

(3)

Consequently, the compromised measurement vector $z_{a}$ produces the same residual as the true measurement vector z, so residual-based BDD cannot detect the false data mixed into the measurements.
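The invariance in (3) is easy to check numerically. The sketch below is illustrative only: it uses a randomly generated, hypothetical 6 × 3 Jacobian instead of a real grid model, injects a = Hc, and compares the residuals and state estimates before and after the attack. A single-entry injection, which is generally not in the column space of H, is included for contrast.

```python
import numpy as np


def wls_estimate(H, R, z):
    R_inv = np.linalg.inv(R)
    return np.linalg.solve(H.T @ R_inv @ H, H.T @ R_inv @ z)


def residual(H, R, z):
    """L2 norm of z - H x_hat, the quantity checked by BDD."""
    return np.linalg.norm(z - H @ wls_estimate(H, R, z))


rng = np.random.default_rng(1)
n, m = 6, 3                                   # n measurements, m states (toy sizes)
H = rng.normal(size=(n, m))                   # hypothetical full-column-rank Jacobian
R = np.diag(np.full(n, 0.05 ** 2))
z = H @ rng.normal(size=m) + rng.normal(0.0, 0.05, size=n)

c = np.array([0.3, 0.0, -0.2])               # state deviation chosen by the attacker
a_stealthy = H @ c                            # a = Hc lies in the column space of H
a_naive = np.zeros(n)
a_naive[0] = 0.5                              # arbitrary single-entry injection

r = residual(H, R, z)
r_stealthy = residual(H, R, z + a_stealthy)   # unchanged, as in Eq. (3)
r_naive = residual(H, R, z + a_naive)         # generally different
shift = wls_estimate(H, R, z + a_stealthy) - wls_estimate(H, R, z)
```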

3. Methodology

In this study, we formulated FDIA localization as a multi-label classification problem whose goal is to identify which measurements in the power system were compromised by the attack. The CNN, a deep learning model widely used in image recognition and computer vision [23], performs well on large-scale classification tasks, and its feature-extraction capability makes it suitable for analyzing data from networked systems for attack detection. LSTM, a special recurrent neural network designed for time series analysis [24], uses memory cells to model long sequences. Its gated units selectively update, forget, and output information, which mitigates the vanishing-gradient problem of conventional recurrent networks and allows LSTM to capture long-term dependencies.

Because power system measurements are closely correlated in time and space, the failure of one node may change the measurements of multiple related nodes. We therefore used a CNN to extract local correlations and spatial features from the measurement data, capturing local patterns through the convolution operations of its kernels. LSTM, through its gating mechanism, captures long-term dependencies in the data and analyzes trends and periodicities in the measurements over time. By combining the strengths of the CNN and LSTM, our model analyzes the spatiotemporal characteristics of the power system comprehensively, distinguishes compromised from normal measurements, and thereby localizes the attacked positions precisely.

Selecting appropriate hyperparameters is crucial for building an effective deep learning classifier, and hyperparameter optimization aims to maximize the performance of a deep neural network on a given dataset. Manual selection, however, is time-consuming, relies on experience and intuition, and may miss the optimal configuration. The GA is a search algorithm modeled on natural selection and genetics; because it needs no gradient information and evolves a population of candidate solutions in parallel, it can search complex parameter spaces that mix discrete and continuous values efficiently. Since combining the CNN and LSTM increases the number of hyperparameters to be tuned, we employed a GA to select them and thereby improve the model's performance.

3.1. Convolutional Neural Network Model

The CNN structure consists of an input layer, a convolutional layer, a pooling layer, a fully connected layer, and an output layer, as illustrated in Figure 1.

At the input layer, n nodes correspond to the n real-time measurements. In the first convolutional layer, filters slide over windows of the input and produce features through convolution, normalization, and a ReLU nonlinearity. The feature map $c_{1, j}$ generated by the first convolutional layer from the input data z can be expressed as:

c_{1, j} = ReLU (z * h_{1, j} + b_{1, j})

(4)

where $h_{1, j}$ denotes the j-th convolution kernel and $b_{1, j}$ the corresponding bias. The feature maps produced by the (q-1)-th convolutional layer serve as inputs to the q-th convolutional layer and are processed in the same way, so its output can be expressed as:

c_{q, j} = ReLU (c_{q - 1, j} * h_{q, j} + b_{q, j})

(5)
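Equations (4) and (5) can be illustrated with a small NumPy forward pass. This is a sketch for exposition, not the trained model: it uses cross-correlation without padding (the convention of common deep learning libraries), omits the normalization step, and, unlike Eq. (5) as written, follows the usual multi-channel convention in which each output map sums the filtered versions of all input maps. The kernel values in the usage lines are hypothetical.

```python
import numpy as np


def relu(v):
    return np.maximum(v, 0.0)


def conv1d_valid(x, kernel):
    """1D cross-correlation without padding (output length len(x) - k + 1)."""
    k = len(kernel)
    return np.array([x[i:i + k] @ kernel for i in range(len(x) - k + 1)])


def conv_layer(inputs, kernels, biases):
    """One convolutional layer in the spirit of Eqs. (4)-(5).

    inputs:  list of 1D feature maps from the previous layer ([z] for layer 1)
    kernels: array of shape (num_filters, num_inputs, kernel_size)
    biases:  array of shape (num_filters,)
    """
    outputs = []
    for j in range(kernels.shape[0]):
        # Sum the filtered input maps, add the bias, then apply ReLU.
        acc = sum(conv1d_valid(inp, kernels[j, i]) for i, inp in enumerate(inputs))
        outputs.append(relu(acc + biases[j]))
    return outputs


z = np.array([1.0, 2.0, 3.0, 4.0, 5.0])           # toy measurement vector
k1 = np.array([[[-1.0, 0.0, 1.0]]])               # 1 filter, 1 input map, size 3
c1 = conv_layer([z], k1, np.array([0.0]))          # Eq. (4)
k2 = np.array([[[0.5, 0.5]], [[1.0, -1.0]]])      # 2 filters, 1 input map, size 2
c2 = conv_layer(c1, k2, np.array([0.0, 0.5]))      # Eq. (5) with q = 2
```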

where $c_{q, j}$ denotes the j-th feature map of the q-th convolutional layer. The features learned by the last convolutional layer are concatenated into a single vector in the flatten layer and passed to the fully connected layer through the ReLU activation function:

c_{F, j} = ReLU (w_{F} \times c_{q_{\max}} + b_{F})

(6)

where $c_{F, j}$, $w_{F}$, and $b_{F}$ denote the feature map, weight, and bias of the flatten layer, respectively. Every node in the fully connected layer is connected to each of the n nodes in the output layer, and the sigmoid function maps each output to a value between 0 and 1 so that each measurement can be classified as attacked or normal. The final multi-label classification result $\hat{y}_{j}^{t}$ is:

{\hat{y}}_{j}^{t} = sigmoid (w_{D} \times c_{F} + b_{D})

(7)

where $w_{D}$ and $b_{D}$ denote the weight and bias of the fully connected layer, respectively.
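The output head described by Eqs. (6) and (7) amounts to flattening, one dense ReLU layer, and a dense sigmoid layer with one unit per measurement. The NumPy sketch below uses random, untrained weights and a hypothetical hidden width of 16 purely to show the shapes involved for the 34 measurements of the IEEE 14-bus case.

```python
import numpy as np


def relu(v):
    return np.maximum(v, 0.0)


def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))


def output_head(feature_maps, w_F, b_F, w_D, b_D):
    """Flatten the last conv maps, apply Eq. (6), then Eq. (7)."""
    flat = np.concatenate([np.ravel(f) for f in feature_maps])  # flatten layer
    c_F = relu(w_F @ flat + b_F)                                # Eq. (6)
    return sigmoid(w_D @ c_F + b_D)                             # Eq. (7)


rng = np.random.default_rng(0)
n_meas, hidden = 34, 16                          # hidden width is hypothetical
maps = [rng.normal(size=10) for _ in range(4)]   # four toy feature maps, 40 values
w_F = rng.normal(scale=0.1, size=(hidden, 40))
b_F = np.zeros(hidden)
w_D = rng.normal(scale=0.1, size=(n_meas, hidden))
b_D = np.zeros(n_meas)

y_hat = output_head(maps, w_F, b_F, w_D, b_D)    # one attack probability per measurement
```

Because each output has its own sigmoid rather than a shared softmax, several measurements can be flagged at once, which is what the multi-label formulation requires.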

3.2. Long Short-Term Memory Model

LSTM, which is designed to overcome the short-term memory limitations of traditional recurrent neural networks (RNNs), features a cell state that retains information for capturing both short- and long-term dependencies. As shown in Figure 2, it employs three key gate mechanisms, i.e., the input gate, the forget gate, and the output gate, all of which are essential for controlling cell state modifications and output production.

These gate mechanisms enable the learning and determination of whether information should be updated, retained, or outputted. This capability is crucial for LSTM to effectively capture and remember long-term dependencies within sequence data, thereby enhancing its ability to understand contextual information.

The forget gate determines how much information from the previous cell state $c_{t-1}$ is retained in the current cell state $c_{t}$:

f_{t} = σ (w_{f} \cdot [h_{t - 1}, x_{t}] + b_{f})

(8)

where σ is the sigmoid function, $w_{f}$ is the weight of the forget gate, $h_{t-1}$ is the hidden state at time t-1, $x_{t}$ is the input at time t, and $b_{f}$ is the bias of the forget gate.

The input gate, which consists of a sigmoid layer and a tanh layer, determines what new information is written into the cell state:

\begin{matrix} i_{t} & = σ (w_{i} \times [h_{t - 1}, x_{t}] + b_{i}) \\ a_{t} & = \tanh (w_{c} \times [h_{t - 1}, x_{t}] + b_{c}) \end{matrix}

(9)

where $w_{i}$ and $b_{i}$ are the weight and bias of the input gate, tanh is the hyperbolic tangent function, $a_{t}$ is the candidate state information to be written into the cell state, and $w_{c}$ and $b_{c}$ are the weight and bias of the candidate cell state.

The update process of the LSTM cell module is:

c_{t} = f_{t} \times c_{t - 1} + i_{t} \times a_{t}

(10)

The output gate determines the output from the cell state, controlling how much the long-term memory influences the current output:

\begin{matrix} o_{t} & = σ (w_{o} \times [h_{t - 1}, x_{t}] + b_{o}) \\ h_{t} & = o_{t} \times \tanh (c_{t}) \end{matrix}

(11)

where $o_{t}$ is the output of the output gate, $w_{o}$ and $b_{o}$ are the weight and bias of the output gate, and $h_{t}$ is the hidden state at time t.
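Equations (8)-(11) describe a single LSTM time step, implemented below in NumPy for illustration. Stacking the four gate weight matrices into one matrix `W` (in the order forget, input, candidate, output) is a common implementation choice assumed here, not something the paper specifies. The usage lines run a random toy sequence and then check that a saturated forget gate with a closed input gate carries the cell state through unchanged.

```python
import numpy as np


def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))


def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step following Eqs. (8)-(11).

    W has shape (4 * hidden, hidden + input_dim) and maps [h_{t-1}, x_t]
    to the forget, input, candidate, and output pre-activations (in that order).
    """
    hidden = h_prev.shape[0]
    pre = W @ np.concatenate([h_prev, x_t]) + b
    f_t = sigmoid(pre[0:hidden])                 # forget gate, Eq. (8)
    i_t = sigmoid(pre[hidden:2 * hidden])        # input gate, Eq. (9)
    a_t = np.tanh(pre[2 * hidden:3 * hidden])    # candidate state, Eq. (9)
    o_t = sigmoid(pre[3 * hidden:4 * hidden])    # output gate, Eq. (11)
    c_t = f_t * c_prev + i_t * a_t               # cell update, Eq. (10)
    h_t = o_t * np.tanh(c_t)                     # hidden state, Eq. (11)
    return h_t, c_t


rng = np.random.default_rng(0)
input_dim, hidden = 34, 8                        # toy sizes
W = rng.normal(scale=0.1, size=(4 * hidden, hidden + input_dim))
b = np.zeros(4 * hidden)
h, c = np.zeros(hidden), np.zeros(hidden)
for x_t in rng.normal(size=(5, input_dim)):      # a 5-step toy sequence
    h, c = lstm_step(x_t, h, c, W, b)

# Forget gate saturated open, input gate saturated closed: memory is preserved.
b_keep = np.zeros(4 * hidden)
b_keep[:hidden] = 50.0
b_keep[hidden:2 * hidden] = -50.0
c_prev = rng.normal(size=hidden)
_, c_kept = lstm_step(np.zeros(input_dim), np.zeros(hidden), c_prev,
                      np.zeros_like(W), b_keep)
```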

3.3. Genetic Algorithm

The GA is a probabilistic optimization strategy inspired by natural selection and evolution [25]. It is an efficient, parallel, global search method, and because selection concentrates the search around well-performing candidates, it is generally better suited than random search to optimizing networks with many hyperparameters.

To apply the GA, one first defines parameters such as the population size, number of genes, crossover and mutation probabilities, and termination rule. Starting from an initial population, the GA then produces individuals better adapted to the environment through fitness-based selection, crossover, and mutation, so that the population moves toward better regions of the search space. After multiple generations of reproduction and evolution, the population converges toward the individuals best adapted to the environment, yielding a high-quality solution to the problem.

The flowchart of the GA is shown in Figure 3. The steps of the algorithm are as follows:

Population initialization: a random initial population is generated, and individuals within the population are encoded.
Fitness evaluation: the fitness of each individual is calculated, and individuals with low fitness are eliminated.
Selection: based on fitness assessment, the individuals with high fitness are selected to participate in the reproduction of the next generation.
Crossover: the selected individuals are paired according to a set crossover rate, and offspring are produced through crossover operations.
Mutation: random variations are introduced in the genotype, simulating the process of biological mutation.
Iteration and termination: the genetic operations are repeated until the termination condition is met, and the individual with the highest fitness is selected as the optimal solution to the problem.
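The steps above can be condensed into a short, generic GA. This is a minimal sketch rather than the authors' implementation: the discrete gene encoding, binary tournament selection, single-point crossover, per-gene resampling mutation, and elitism are common textbook choices assumed here, and the toy fitness function in the usage lines is hypothetical. Only the defaults of 30 individuals and 20 iterations mirror the settings reported in Section 4.3.

```python
import random


def genetic_algorithm(fitness, gene_space, pop_size=30, generations=20,
                      crossover_rate=0.8, mutation_rate=0.1, seed=0):
    """Maximize `fitness` over individuals whose i-th gene is drawn from gene_space[i]."""
    rng = random.Random(seed)
    # Population initialization: one random value per gene.
    pop = [[rng.choice(vals) for vals in gene_space] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        # Fitness evaluation.
        scored = [(fitness(ind), ind) for ind in pop]

        def select():
            # Selection: binary tournament, so low-fitness individuals tend to drop out.
            a, b = rng.sample(scored, 2)
            return (a if a[0] >= b[0] else b)[1]

        children = [list(best)]                          # elitism: keep the best so far
        while len(children) < pop_size:
            p1, p2 = select(), select()
            if rng.random() < crossover_rate and len(gene_space) > 1:
                cut = rng.randrange(1, len(gene_space))  # single-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = list(p1)
            # Mutation: resample each gene with probability mutation_rate.
            child = [rng.choice(vals) if rng.random() < mutation_rate else g
                     for g, vals in zip(child, gene_space)]
            children.append(child)
        pop = children                                   # next generation
        best = max(pop, key=fitness)                     # never worse, thanks to elitism
    return best, fitness(best)


def toy_fitness(ind):
    """Hypothetical objective with its maximum (0) at [7, 7, 7]."""
    return -sum((g - 7) ** 2 for g in ind)


best, score = genetic_algorithm(toy_fitness, [list(range(10))] * 3)
```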

3.4. FDIA Localization Method Based on GA-CNN-LSTM

The architecture of the designed model is depicted in Figure 4 and includes an input layer, a 1D convolutional layer, an LSTM layer, a fully connected layer, and an output layer. The input is a one-dimensional measurement vector containing the readings of n instruments at each time instant. The CNN-LSTM model first uses the convolution operations of its CNN part to extract key spatial features from the input vector. The LSTM network then analyzes these features as a temporal sequence to learn dynamic patterns and long-term dependencies that evolve over time, giving the model a more complete picture of the system's operating state and helping it capture FDIAs. The fully connected layer maps the feature vectors to the final classification results, and the output layer applies a sigmoid activation function to output the probability that each instrument is attacked. A threshold of 0.5 converts these outputs into classification labels: an output greater than or equal to 0.5 is classified as 1, indicating an attack, and an output below 0.5 as 0, indicating no attack.
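The decision rule above turns the sigmoid outputs into labels and, from those, into a list of suspected attacked measurements. A minimal sketch follows; the function name `locate_attacks` and the example probabilities are illustrative.

```python
import numpy as np


def locate_attacks(probs, threshold=0.5):
    """Return 0/1 labels (1 = attacked) and the indices of measurements flagged as attacked."""
    labels = (np.asarray(probs) >= threshold).astype(int)  # >= 0.5 -> 1, < 0.5 -> 0
    return labels, np.flatnonzero(labels)


probs = [0.10, 0.97, 0.50, 0.49, 0.80]   # hypothetical sigmoid outputs for 5 measurements
labels, attacked = locate_attacks(probs)
```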

Based on their influence on the model's learning capability, generalization ability, training efficiency, and resistance to overfitting, the hyperparameters selected for GA optimization were the learning rate, number of iterations, batch size, activation function, number of convolution kernels, kernel size, number of convolutional and LSTM layers, number of LSTM units, and dropout rate. In the initialization phase, the GA randomly generated a set of hyperparameter combinations to form the initial population, and a fitness function assessed the performance of each individual. In each iteration, the population evolved through selection, crossover, and mutation in search of better-performing hyperparameter combinations. When the preset maximum number of iterations was reached, the combination with the highest fitness in the population was selected, and this optimal set was used to train the CNN-LSTM model on the given dataset. Figure 5 shows a detailed flowchart of how the GA optimizes the model parameters.
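One way to encode the ten hyperparameters listed above as a GA chromosome is to store, for each hyperparameter, an index into a list of allowed values. The sketch below does this; the candidate values are hypothetical placeholders (the actual search ranges are those in Table 1). In the paper's setting, the fitness of a decoded configuration would be the F1 score of a CNN-LSTM trained with it, which is not reproduced here.

```python
import random

# Hypothetical candidate values; Table 1 of the paper gives the real search ranges.
SEARCH_SPACE = {
    "learning_rate": [1e-4, 5e-4, 1e-3, 5e-3],
    "epochs": [20, 50, 100],
    "batch_size": [32, 64, 128, 256],
    "activation": ["relu", "tanh", "elu"],
    "num_filters": [16, 32, 64, 128],
    "kernel_size": [2, 3, 5],
    "num_conv_layers": [1, 2, 3],
    "num_lstm_layers": [1, 2],
    "lstm_units": [32, 64, 128],
    "dropout": [0.0, 0.1, 0.2, 0.3],
}
KEYS = list(SEARCH_SPACE)


def random_individual(rng):
    """Chromosome: one gene (an index into the allowed values) per hyperparameter."""
    return [rng.randrange(len(SEARCH_SPACE[k])) for k in KEYS]


def decode(individual):
    """Map a chromosome to a concrete hyperparameter configuration."""
    return {k: SEARCH_SPACE[k][g] for k, g in zip(KEYS, individual)}


def crossover(p1, p2, rng):
    """Uniform crossover: each gene is copied from either parent with probability 0.5."""
    return [g1 if rng.random() < 0.5 else g2 for g1, g2 in zip(p1, p2)]


def mutate(individual, rate, rng):
    """Replace each gene by a random valid index with probability `rate`."""
    return [rng.randrange(len(SEARCH_SPACE[k])) if rng.random() < rate else g
            for k, g in zip(KEYS, individual)]


rng = random.Random(42)
parent1, parent2 = random_individual(rng), random_individual(rng)
child = mutate(crossover(parent1, parent2, rng), rate=0.1, rng=rng)
config = decode(child)   # would be passed to a model-building routine (not shown)
```

Encoding genes as indices keeps crossover and mutation uniform across numeric and categorical hyperparameters such as the activation function.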

4. Simulation Experiments

4.1. Simulation Experiment Settings

The experiments were run on Ubuntu 20.04 with an Intel Core i7-10700 CPU (Intel, Santa Clara, CA, USA), an NVIDIA Quadro RTX 6000 GPU (NVIDIA, Santa Clara, CA, USA), and 32 GB of RAM. All experiments used Python 3.8.10 and TensorFlow 2.13.0.

To validate the proposed GA-CNN-LSTM-based FDIA localization method, simulation experiments were conducted on IEEE bus test systems whose topology and parameters were obtained from MATPOWER [26]. The load data were taken from the “household_power_consumption” data set [27], which records the electricity consumption of a single household over almost four years.

In the simulation experiments, node voltage amplitudes and phase angles were used as state variables, and the nodal active power injections and branch active power flows as measurements. For the IEEE 14-bus system, whose topology is shown in Figure 6, the measurement vector has 34 dimensions (14 nodal injections and 20 branch flows). We first obtained the true measurement vector from MATPOWER and simulated measurement error by adding Gaussian noise with a standard deviation of 0.05. Attack vectors were then added to these measurement vectors to form attacked measurement vectors, and corresponding classification labels were generated for each vector. For the IEEE 14-bus and IEEE 118-bus test systems, we generated 50,000 and 12,000 compromised measurement vectors, respectively, and split each set into training and testing sets at a 4:1 ratio. To speed up convergence and reduce the risk of overfitting, the network was trained with mini-batch gradient descent. Within each batch, 7/10 of the data were allocated to training and 3/10 to validation. Because the FDIA dataset is imbalanced, we used a weighted cross-entropy loss for model training [28]; assigning higher weights to the attack data effectively improved the model's ability to identify attacked measurements.
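The data-generation and loss steps can be sketched on a toy network. The block below is illustrative only: it uses a hypothetical 3-bus DC model (unit line susceptances, bus 1 as reference) rather than the IEEE systems, builds each attack as a = Hc against one randomly chosen state in line with Section 2.2, labels every measurement that the attack changes, and splits the samples 4:1. The positive-class weight of `weighted_bce` is an assumed free parameter.

```python
import numpy as np

# Hypothetical 3-bus DC model, states [theta2, theta3] (bus 1 is the reference).
# Rows: injections P1, P2, P3, then flows F12, F23, F13 (unit susceptances).
H = np.array([[-1.0, -1.0],
              [ 2.0, -1.0],
              [-1.0,  2.0],
              [-1.0,  0.0],
              [ 1.0, -1.0],
              [ 0.0, -1.0]])


def make_attacked_sample(H, z_true, c, rng, noise_std=0.05):
    """Noisy measurements plus a stealthy attack a = Hc, with per-measurement labels."""
    z = z_true + rng.normal(0.0, noise_std, size=z_true.shape)  # measurement error
    a = H @ c
    labels = (np.abs(a) > 1e-12).astype(int)  # 1 where the attack altered the measurement
    return z + a, labels


def weighted_bce(y_true, y_prob, pos_weight, eps=1e-7):
    """Binary cross-entropy with the attacked (positive) class up-weighted."""
    p = np.clip(y_prob, eps, 1.0 - eps)
    loss = -(pos_weight * y_true * np.log(p) + (1 - y_true) * np.log(1 - p))
    return float(np.mean(loss))


rng = np.random.default_rng(0)
z_true = H @ np.array([0.05, -0.02])             # toy operating point
X, Y = [], []
for _ in range(100):
    c = np.zeros(2)
    c[rng.integers(2)] = rng.uniform(0.05, 0.2)  # attack one state at random
    z_a, y = make_attacked_sample(H, z_true, c, rng)
    X.append(z_a)
    Y.append(y)
X, Y = np.array(X), np.array(Y)
split = int(0.8 * len(X))                        # 4:1 train/test split
X_train, X_test, Y_train, Y_test = X[:split], X[split:], Y[:split], Y[split:]
```

Note how attacking a single state alters five of the six measurements: under a = Hc, localization means labeling every affected measurement, not just one entry.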

4.2. Evaluation Metrics

This study used precision, recall, and F1-score, three prevalent evaluation metrics in the field of FDIA detection, to verify the feasibility and effectiveness of this method. First, the following variables were defined:

True Negative (TN) denotes the number of normal samples correctly identified as normal samples.

False Negative (FN) denotes the number of FDIA samples misidentified as normal measurements.

True Positive (TP) denotes the number of FDIA samples correctly identified as FDIA samples.

False Positive (FP) denotes the number of normal measurements misidentified as FDIA samples.

In this paper, accuracy is the proportion of correctly classified samples among all samples; precision is the proportion of samples predicted to be attacked that are actually attacked; recall is the proportion of actually attacked samples that are predicted to be attacked; and the F1-score is a composite indicator that balances precision and recall. Accuracy, precision, recall, and F1-score are computed as follows:

\begin{aligned} \text{Accuracy} &= \frac{TP + TN}{TP + FP + TN + FN} \\ \text{Precision} &= \frac{TP}{TP + FP} \\ \text{Recall} &= \frac{TP}{TP + FN} \\ F_{1}\text{-score} &= 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \end{aligned}

(12)
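Equation (12) can be computed directly from the predicted and true multi-label matrices. How the counts are aggregated is not stated in the text; the sketch below assumes they are pooled over every (sample, measurement) entry, i.e., micro-averaging, and the small label matrices in the usage lines are hypothetical.

```python
import numpy as np


def localization_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 from Eq. (12), pooled over all label entries."""
    t = np.asarray(y_true).ravel()
    p = np.asarray(y_pred).ravel()
    tp = int(np.sum((t == 1) & (p == 1)))   # attacked, flagged as attacked
    tn = int(np.sum((t == 0) & (p == 0)))   # normal, flagged as normal
    fp = int(np.sum((t == 0) & (p == 1)))   # normal, flagged as attacked
    fn = int(np.sum((t == 1) & (p == 0)))   # attacked, missed
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1


y_true = [[1, 0, 0, 1], [0, 1, 0, 0]]   # 2 samples x 4 measurements
y_pred = [[1, 0, 1, 1], [0, 0, 0, 0]]
acc, prec, rec, f1 = localization_metrics(y_true, y_pred)
```

A macro average over measurement positions would instead weight each position equally, which can give different numbers when some buses or lines are attacked rarely.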

4.3. Simulation Results

For the CNN-LSTM model, 10 hyperparameters need to be optimized: the learning rate, number of iterations (training epochs), batch size, activation function, number of convolution kernels, kernel size, number of convolutional layers, number of LSTM layers, number of LSTM units, and dropout rate. Table 1 lists the GA search range for each hyperparameter and the optimized values for each test system. The search ranges were set based on related research and preliminary experiments [29,30,31,32]. The learning rate, batch size, and number of epochs are fundamental training hyperparameters that govern the learning dynamics and final performance of the model: the learning-rate range was chosen so that the model learns quickly yet converges stably, the batch-size range trades off training efficiency against convergence stability, and the epoch range was intended to let the model reach good generalization. Architectural hyperparameters, namely the number of convolutional layers, number of kernels, and kernel size, together with the number of LSTM layers and LSTM units, determine the model's feature extraction capability and its ability to process time series data. Their ranges account for the dimensionality and complexity of the measurement data as well as the model's learning capacity.
The range of the number of convolutional layers was chosen to balance the depth of the model and the computational cost; the range of the number of convolutional kernels enabled the model to capture rich features while avoiding overfitting issues; the range of kernel sizes took into account the spatial dimensions of the measured data; the range of LSTM layers and LSTM units was designed to enhance the model’s ability to process time series data while avoiding excessive increases in model complexity. Due to differences in system sizes and data, the optimization results of hyperparameters vary for different test systems.
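The search space in Table 1 can be encoded directly as candidate lists, with one GA individual being one draw from each list. The sketch below is illustrative only. It assumes that kernel count and kernel size are sampled separately for each convolution layer, which mirrors the four per-layer values reported for the IEEE 118-bus system, and the names `SEARCH_SPACE` and `random_individual` are hypothetical.

```python
import random

# Candidate values transcribed from Table 1 (GA search ranges).
SEARCH_SPACE = {
    "learning_rate": [0.0005, 0.0008, 0.001, 0.002, 0.003],
    "epochs": [25, 30, 35, 40],
    "batch_size": [32, 64, 128, 256],
    "activation": ["relu", "tanh"],
    "conv_layers": [1, 2, 3, 4],
    "kernels": [32, 64, 128, 256],
    "kernel_size": [3, 4, 5, 6],
    "lstm_layers": [1, 2],
    "lstm_units": [64, 128, 256, 512],
    "dropout": [0.1, 0.2, 0.3],
}


def random_individual(rng):
    """Sample one candidate configuration (one GA chromosome).
    Kernel count and size are drawn once per convolution layer
    (an assumed encoding)."""
    ind = {k: rng.choice(v) for k, v in SEARCH_SPACE.items()
           if k not in ("kernels", "kernel_size")}
    n_layers = ind["conv_layers"]
    ind["kernels"] = [rng.choice(SEARCH_SPACE["kernels"])
                      for _ in range(n_layers)]
    ind["kernel_size"] = [rng.choice(SEARCH_SPACE["kernel_size"])
                          for _ in range(n_layers)]
    return ind


rng = random.Random(0)
print(random_individual(rng))
```

A variable-length per-layer encoding lets one chromosome describe architectures with one to four convolution layers. However, crossover operators must then handle parents with different depths.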

In this paper, we chose the F1 score as the performance evaluation indicator and as the GA fitness value. Because it is the harmonic mean of precision and recall, the F1 score is high only when the model both avoids false alarms and finds the attacked positions. This advantage is particularly prominent for imbalanced datasets, where the F1 score evaluates model performance more fairly and mitigates biases arising from uneven class distributions. We set the population size of the genetic algorithm to 30 and the number of generations to 20. The maximum F1 score in each generation for the IEEE test systems is depicted in Figure 7. As the number of generations increased, the fitness value gradually rose and eventually stabilized. In the IEEE 14-bus test system, it reached its maximum of 99.71% after the 14th generation and remained stable thereafter; in the IEEE 118-bus test system, it reached its maximum of 99.10% after the 15th generation and likewise stabilized.
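The loop below is a minimal sketch of a GA with a population of 30 and 20 generations that records the best fitness per generation, analogous to the curve in Figure 7. Several choices are assumptions, not the paper's confirmed procedure: tournament selection, uniform crossover, per-gene mutation, elitism, and the reduced search space. In a real run, the hypothetical `toy_fitness` would be replaced by training the CNN-LSTM and returning its validation F1 score.

```python
import random

# Subset of the Table 1 search space, kept small for illustration.
SPACE = {
    "learning_rate": [0.0005, 0.0008, 0.001, 0.002, 0.003],
    "batch_size": [32, 64, 128, 256],
    "lstm_units": [64, 128, 256, 512],
    "dropout": [0.1, 0.2, 0.3],
}
KEYS = list(SPACE)


def toy_fitness(ind):
    """Hypothetical stand-in for 'train the CNN-LSTM with these
    hyperparameters and return its validation F1 score'. It peaks at
    1.0 for learning rate 0.0005, batch size 128 and 512 LSTM units."""
    lr = SPACE["learning_rate"].index(ind["learning_rate"])
    bs = SPACE["batch_size"].index(ind["batch_size"])
    units = SPACE["lstm_units"].index(ind["lstm_units"])
    return 1.0 - 0.01 * (lr + abs(bs - 2) + abs(units - 3))


def tournament(pop, fits, rng, k=3):
    """Return the fittest of k randomly chosen individuals."""
    contenders = rng.sample(range(len(pop)), k)
    return pop[max(contenders, key=lambda i: fits[i])]


def crossover(a, b, rng):
    """Uniform crossover: each gene is copied from one parent at random."""
    return {k: a[k] if rng.random() < 0.5 else b[k] for k in KEYS}


def mutate(ind, rng, rate=0.1):
    """Resample each gene from its candidate list with probability `rate`."""
    return {k: rng.choice(SPACE[k]) if rng.random() < rate else v
            for k, v in ind.items()}


def run_ga(fitness, pop_size=30, generations=20, seed=0):
    rng = random.Random(seed)
    pop = [{k: rng.choice(v) for k, v in SPACE.items()}
           for _ in range(pop_size)]
    history = []  # best fitness in each generation
    for _ in range(generations):
        fits = [fitness(ind) for ind in pop]
        best = max(range(pop_size), key=lambda i: fits[i])
        history.append(fits[best])
        new_pop = [pop[best]]  # elitism: keep the current best unchanged
        while len(new_pop) < pop_size:
            parent_a = tournament(pop, fits, rng)
            parent_b = tournament(pop, fits, rng)
            new_pop.append(mutate(crossover(parent_a, parent_b, rng), rng))
        pop = new_pop
    fits = [fitness(ind) for ind in pop]
    best = max(range(pop_size), key=lambda i: fits[i])
    return pop[best], fits[best], history


best, best_fit, history = run_ga(toy_fitness)
print(best, round(best_fit, 4))
```

Elitism guarantees that the recorded best fitness never decreases from one generation to the next. This matches the rising-then-flat shape described for Figure 7, although the paper does not state which operators it uses.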

The optimal hyperparameters obtained through genetic algorithm optimization were utilized to build a model for testing, with the results presented in Table 2. Notably, the detection F1 scores for the two test systems were 99.71% and 99.10%, respectively, indicating excellent detection performance in both cases.

The localization performance of the two test systems over the training epochs is shown in Figure 8 and Figure 9. For the IEEE 14-bus system, the F1 score reached its maximum after the 23rd training epoch and then stabilized; for the IEEE 118-bus system, it reached its maximum after the 11th epoch and likewise remained stable.

4.4. Comparative Experiments and Analysis

4.4.1. Robustness Comparison

To assess the robustness of the proposed method, the test systems were evaluated under three conditions: no noise, and Gaussian noise with standard deviations of 5% and 10%. The test results are presented in Table 3. The model maintained high localization performance in all three scenarios: for the IEEE 14-bus test system, the F1 scores were 99.73%, 99.71%, and 99.70%, respectively, and for the IEEE 118-bus test system, they were 99.15%, 99.10%, and 99.08%. Raising the noise level from 0% to 10% therefore lowered the F1 score by only 0.03 and 0.07 percentage points, respectively.
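A minimal sketch of how such noisy measurements might be generated is shown below. It assumes that "5% standard deviation" means zero-mean Gaussian noise whose standard deviation is 5% of each measurement's magnitude. This relative-noise reading is an assumption; an absolute noise level would be an equally valid interpretation. The function name and the measurement values are hypothetical.

```python
import random


def add_relative_gaussian_noise(z, level, seed=None):
    """Return a noisy copy of measurement vector z. Each entry receives
    zero-mean Gaussian noise with standard deviation level * |z_i|
    (assumed relative-noise interpretation)."""
    rng = random.Random(seed)
    return [zi + rng.gauss(0.0, level * abs(zi)) for zi in z]


z = [1.02, 0.98, -0.35, 0.0]  # hypothetical measurements (p.u.)
for level in (0.0, 0.05, 0.10):
    print(level, add_relative_gaussian_noise(z, level, seed=42))
```

Under this interpretation, a zero-valued measurement stays noise-free. If that is undesirable, a small absolute noise floor could be added.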

Additionally, we tested the model under different attack intensities to evaluate its performance under various levels of attack strength. As per Section 2.2, it was assumed that an attacker constructed an attack vector a = (a_{1}, a_{2}, \dots, a_{m})^{T}. We introduced the L2-norm as a measure of the intensity of the attack. The L2-norm of the attack vector can be expressed as:

\|a\|_{2} = \sqrt{a_{1}^{2} + a_{2}^{2} + \dots + a_{m}^{2}}

(13)

The larger the L2-norm of the attack vector, the greater the intensity of the attack. The localization results are shown in Figure 10. The proposed method maintained effective localization under all tested attack intensities, and its localization performance tended to improve as the attack intensity increased. This is because stronger attacks perturb the measurements more, which makes attacked data easier to distinguish from normal data.
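An evaluation like Figure 10 could be produced by grouping test samples by the L2-norm of Eq. (13) and computing an F1 score for each group. The sketch below is a hypothetical protocol. The bin edges, the sample format `(attack_vector, true_labels, predicted_labels)`, and the micro-averaged F1 = 2TP/(2TP + FP + FN) are all illustrative choices, not the paper's confirmed setup.

```python
import math
from collections import defaultdict


def attack_intensity(a):
    """L2-norm of the attack vector a = (a_1, ..., a_m)^T, Eq. (13)."""
    return math.sqrt(sum(ai * ai for ai in a))


def f1_by_intensity(samples, bin_edges):
    """Group samples by attack intensity and return a micro-averaged F1
    per bin. Bin index = number of edges the intensity reaches or exceeds."""
    counts = defaultdict(lambda: [0, 0, 0])  # bin -> [tp, fp, fn]
    for a, y_true, y_pred in samples:
        b = sum(attack_intensity(a) >= edge for edge in bin_edges)
        c = counts[b]
        for t, p in zip(y_true, y_pred):
            c[0] += int(t == 1 and p == 1)
            c[1] += int(t == 0 and p == 1)
            c[2] += int(t == 1 and p == 0)
    result = {}
    for b, (tp, fp, fn) in sorted(counts.items()):
        denom = 2 * tp + fp + fn  # equals the harmonic-mean form of Eq. (12)
        result[b] = 2 * tp / denom if denom else 0.0
    return result


# Hypothetical samples: (attack vector, true labels, predicted labels).
samples = [
    ([3.0, 0.0, -4.0], [1, 0, 1], [1, 0, 0]),  # intensity 5.0
    ([1.0, 0.0, 0.0], [1, 1, 0], [1, 1, 0]),   # intensity 1.0
]
print(attack_intensity([3.0, 0.0, -4.0]))  # 5.0
print(f1_by_intensity(samples, bin_edges=[2.0]))
```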

4.4.2. Comparison with Other Detection Methods

Figure 11 and Figure 12 compare the proposed GA-CNN-LSTM localization algorithm with several other detection algorithms: GA-CNN-MLP [33], GA-CNN [34], CNN [35], deep neural networks (DNN) [36], and multilayer perceptron (MLP) [37]. To ensure a fair comparison, all algorithms were evaluated on the same dataset. For the IEEE 14-bus test system, the proposed algorithm achieved an accuracy of 99.88%, an F1 score of 99.71%, and a recall of 99.84%, outperforming all other algorithms in this evaluation. For the IEEE 118-bus test system, it also achieved the best performance, with an accuracy of 99.63%, an F1 score of 99.10%, and a recall of 99.45%.

4.4.3. Comparison of the Detection Times

To thoroughly assess the performance of the different models, we compared the time expenditure of each model during the detection process, with the results presented in Table 4.

Because its model structure is more complex, the method proposed in this paper had a longer detection time than the other models. However, power systems generally acquire measurement data every few seconds and perform state estimation every few minutes. Since the detection times of 14 ms (IEEE 14-bus) and 31 ms (IEEE 118-bus) are far shorter than these intervals, our method can locate the attacked data in time when an FDIA occurs.
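The timing argument can be checked mechanically by measuring the mean per-sample inference time and comparing it with the measurement interval. In the sketch below, `dummy_predict` is a hypothetical stand-in for a trained model. The 2 s interval is an assumed value for "every few seconds", and wall-clock timing with `time.perf_counter` is one reasonable choice rather than the paper's stated method.

```python
import time


def mean_latency_ms(predict, samples, repeats=3):
    """Average wall-clock inference time per sample, in milliseconds."""
    start = time.perf_counter()
    for _ in range(repeats):
        for x in samples:
            predict(x)
    elapsed = time.perf_counter() - start
    return 1000.0 * elapsed / (repeats * len(samples))


def within_interval(latency_ms, interval_s):
    """True if per-sample latency is shorter than the measurement interval."""
    return latency_ms < 1000.0 * interval_s


def dummy_predict(x):
    # Hypothetical stand-in model: flag measurements with magnitude above 1.5.
    return [int(abs(v) > 1.5) for v in x]


latency = mean_latency_ms(dummy_predict, [[0.9, 2.1, 1.0]] * 100)
print(f"{latency:.4f} ms per sample; fits a 2 s interval: "
      f"{within_interval(latency, 2.0)}")
# The 31 ms reported in Table 4 for the IEEE 118-bus system also fits.
print(within_interval(31.0, 2.0))  # True
```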

5. Discussion

Accurately locating false data injection attacks is of great significance for maintaining the stability and security of power systems. By quickly and accurately identifying attack behavior, operators can take effective countermeasures in a timely manner, thereby reducing potential losses. The proposed method achieves high localization accuracy with detection times far shorter than typical measurement intervals, and it demonstrates strong adaptability and robustness. In the simulation experiments on the IEEE 14- and 118-bus test systems, the model achieved localization F1 scores of 99.71% and 99.10%, respectively, outperforming the comparison methods. In actual deployment, computational resources and system compatibility are critical considerations, so the model structure should be further optimized to reduce resource demands while maintaining high detection performance.

6. Conclusions

This study presented an FDIA localization method based on GA-CNN-LSTM. The proposed approach treats FDIA localization as a multi-label classification problem, using a CNN to extract local features and LSTM to integrate time series information and capture global features. By combining the strengths of the CNN and LSTM, it accurately differentiates between attacked and normal measurement values, enabling precise localization of the attack. Furthermore, we incorporated a genetic algorithm to optimize the neural network’s hyperparameters, thereby enhancing the detection performance of the model.

The proposed method identifies compromised data by analyzing the characteristics of historical measurement data and does not depend on the specific power system topology. Consequently, the approach should remain effective when the power system topology changes. Simulation experiments on the IEEE 14-bus and IEEE 118-bus test systems validated the efficacy of the proposed method, and comparative experiments showed that it outperformed the other detection methods evaluated. Future research will focus on further optimizing the model structure to reduce its dependence on computational resources while maintaining or even improving detection performance.

Author Contributions

Conceptualization, Z.L. and Y.X.; methodology, Z.L. and Y.X.; software, Z.L. and Y.X.; validation, Z.L. and Z.W.; formal analysis, Y.X. and R.M.; investigation, Z.L. and Z.W.; resources, Z.L. and Y.X.; data curation, Z.L. and Y.X.; writing—original draft preparation, Z.L.; writing—review and editing, Y.X. and R.M.; visualization, Z.W.; supervision, Y.X.; project administration, R.M.; funding acquisition, R.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the NSFC Young Scientist Fund, grant number 62372465.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original data presented in the study are openly available in https://github.com/llzzwwyy/ga-cnn-lstm.git, accessed on 30 June 2024; see Section 4.1 for details.

Acknowledgments

We would like to express our gratitude to the anonymous referees for their valuable suggestions, which significantly contributed to the improvement of this work.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Amin, M.; El-Sousy, F.F.M.; Aziz, G.A.A.; Gaber, K.; Mohammed, O.A. CPS attacks mitigation approaches on power electronic systems with security challenges for smart grid applications: A review. IEEE Access 2021, 9, 38571–38601.
2. Shahidehpour, M.; Tinney, W.F.; Fu, Y. Impact of security on power systems operation. Proc. IEEE 2005, 93, 2013–2025.
3. Liu, Y.; Ning, P.; Reiter, M.K. False data injection attacks against state estimation in electric power grids. ACM Trans. Inf. Syst. Secur. 2011, 14, 1–33.
4. Qu, Z.; Zhang, J.; Wang, Y.; Georgievitch, P.M.; Guo, K. False data injection attack detection and improved WLS power system state estimation based on node trust. J. Electr. Eng. Technol. 2022, 17, 803–817.
5. Meng, A.; Wang, H.; Aziz, S.; Peng, J.; Jiang, H. Kalman filtering based interval state estimation for attack detection. Energy Procedia 2019, 158, 6589–6594.
6. He, Y.; Li, L.; Qian, H.; Yao, S. CNN-GRU Based Fake Data Injection Attack Detection Method for Power Grid. In Proceedings of the 2022 2nd International Conference on Electrical Engineering and Control Science (IC2ECS), Nanjing, China, 16 December 2022.
7. Wang, S.; Bi, S.; Zhang, Y.J.A. Locational detection of the false data injection attack in a smart grid: A multilabel classification approach. IEEE Internet Things J. 2020, 7, 8218–8227.
8. Zhang, G.; Li, J.; Bamisile, O.; Xing, Y.; Cao, D.; Huang, Q. Identification and classification for multiple cyber attacks in power grids based on the deep capsule CNN. Eng. Appl. Artif. Intell. 2023, 126, 106771.
9. Shen, K.; Yan, W.; Ni, H.; Chu, J. Localization of False Data Injection Attack in Smart Grids Based on SSA-CNN. Information 2023, 14, 180.
10. Yang, L.; Zhai, Y.; Li, Z. Deep learning for online AC false data injection attack detection in smart grids: An approach using LSTM-autoencoder. J. Netw. Comput. Appl. 2021, 193, 103178.
11. Mohammadpourfard, M.; Khalili, A.; Genc, I.; Konstantinou, C. Cyber-resilient smart cities: Detection of malicious attacks in smart grids. Sustain. Cities Soc. 2021, 75, 103116.
12. Boyaci, O.; Narimani, M.R.; Davis, K.; Serpedin, E. Cyberattack detection in large-scale smart grids using Chebyshev graph convolutional networks. In Proceedings of the 2022 9th International Conference on Electrical and Electronics Engineering (ICEEE), Alanya, Turkey, 29–31 March 2022.
13. Boyaci, O.; Umunnakwe, A.; Sahu, A.; Narimani, M.R.; Ismail, M.; Davis, K.R.; Serpedin, E. Graph neural networks based detection of stealth false data injection attacks in smart grids. IEEE Syst. J. 2021, 16, 2946–2957.
14. Li, X.; Wang, Y.; Lu, Z. Graph-based detection for false data injection attacks in power grid. Energy 2023, 263, 125865.
15. An, D.; Yang, Q.; Liu, W.; Zhang, Y. Defending against data integrity attacks in smart grid: A deep reinforcement learning-based approach. IEEE Access 2019, 7, 110835–110845.
16. Huang, X.; Qin, Z.; Xie, M.; Liu, H.; Meng, L. Defense of massive false data injection attack via sparse attack points considering uncertain topological changes. J. Mod. Power Syst. Clean Energy 2021, 10, 1588–1598.
17. Wu, Z.; Wang, X.; Jiang, Y.; Ye, H.; Xue, X. Modeling spatial-temporal clues in a hybrid deep learning framework for video classification. In Proceedings of the 2015 23rd ACM International Conference on Multimedia, New York, NY, USA, 13 October 2015.
18. Alhussein, M.; Aurangzeb, K.; Haider, S.I. Hybrid CNN-LSTM model for short-term individual household load forecasting. IEEE Access 2020, 8, 180544–180557.
19. Wu, J.; Wang, C.; Xie, L. Device-Free In-Air Gesture Recognition Based on RFID Tag Array. ZTE Commun. 2021, 19, 13–21.
20. Qu, Z.; Liu, H.; Wang, Z.; Xu, J.; Zhang, P.; Zeng, H. A combined genetic optimization with AdaBoost ensemble model for anomaly detection in buildings electricity consumption. Energy Build. 2021, 248, 111193.
21. Xing, F.; Song, X.; Wang, Y.; Qin, C. A new combined prediction model for ultra-short-term wind power based on variational mode decomposition and gradient boosting regression tree. Sustainability 2023, 15, 11026.
22. Alayande, A.S.; Nwulu, N.; Bakare, A.E. Modelling and countermeasures of false data injection attacks against state estimation in power systems. In Proceedings of the 2018 International Conference on Computational Techniques, Electronics and Mechanical Systems (CTEMS), Belgaum, India, 21–22 December 2018.
23. LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324.
24. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780.
25. Mirjalili, S. Genetic algorithm. In Evolutionary Algorithms and Neural Networks: Theory and Applications; Springer: Cham, Switzerland, 2019; Volume 780, pp. 43–55.
26. Zimmerman, R.D.; Murillo-Sánchez, C.E.; Gan, D. Matpower. Available online: http://www.pserc.cornell.edu/matpower (accessed on 30 June 2024).
27. Hebrail, G.; Berard, A. Individual Household Electric Power Consumption Data Set. UCI Machine Learning Repository, 2012. Available online: https://archive.ics.uci.edu/dataset/235/individual+household+electric+power+consumption (accessed on 30 June 2024).
28. Rezaei-Dastjerdehei, M.R.; Mijani, A.; Fatemizadeh, E. Addressing imbalance in multi-label classification using weighted cross entropy loss function. In Proceedings of the 2020 27th National and 5th International Iranian Conference on Biomedical Engineering (ICBME), Tehran, Iran, 26–27 November 2020.
29. Liao, L.; Li, H.; Shang, W.; Ma, L. An empirical study of the impact of hyperparameter tuning and model optimization on the performance properties of deep neural networks. ACM Trans. Softw. Eng. Methodol. (TOSEM) 2022, 31, 1–40.
30. Mishkin, D.; Sergievskiy, N.; Matas, J. Systematic evaluation of convolution neural network advances on the ImageNet. Comput. Vis. Image Underst. 2017, 161, 11–19.
31. Yoo, J.; Yoon, H.; Kim, H.; Yoon, H.; Han, S. Optimization of hyper-parameter for CNN model using genetic algorithm. In Proceedings of the 2019 1st International Conference on Electrical, Control and Instrumentation Engineering (ICECIE), Kuala Lumpur, Malaysia, 25 November 2019.
32. Gorgolis, N.; Hatzilygeroudis, I.; Istenes, Z.; Gyenne, L. Hyperparameter optimization of LSTM network models through genetic algorithm. In Proceedings of the 2019 10th International Conference on Information, Intelligence, Systems and Applications (IISA), Patras, Greece, 15–17 July 2019.
33. Shawky, O.A.; Hagag, A.; El-Dahshan, E.S.A.; Ismail, M.A. Remote sensing image scene classification using CNN-MLP with data augmentation. Optik 2020, 221, 165356.
34. Fan, Z.; Bai, K.; Zheng, X. Hybrid GA and Improved CNN Algorithm for Power Plant Transformer Condition Monitoring Model. IEEE Access 2023, 12, 60255–60263.
35. Mukherjee, D. Detection of data-driven blind cyber-attacks on smart grid: A deep learning approach. Sustain. Cities Soc. 2023, 92, 104475.
36. Li, J.; Yang, Y.; Sun, J.S.; Tomsovic, K.; Qi, H. Towards adversarial-resilient deep neural networks for false data injection attack detection in power grids. In Proceedings of the 2023 32nd International Conference on Computer Communications and Networks (ICCCN), Honolulu, HI, USA, 24–27 July 2023.
37. Alimi, O.A.; Ouahada, K.; Abu-Mahfouz, A.M. Real time security assessment of the power system using a hybrid support vector machine and multilayer perceptron neural network algorithms. Sustainability 2019, 11, 3586.

Figure 1. CNN model structure.

Figure 2. LSTM model structure.

Figure 3. Flow chart of the genetic algorithm.

Figure 4. CNN-LSTM model.

Figure 5. Flow chart of the genetic algorithm optimization model’s hyperparameters.

Figure 6. Topology of the IEEE 14-bus test system.

Figure 7. Evolution effect of the F1 score.

Figure 8. IEEE14-bus localization effect.

Figure 9. IEEE118-bus localization effect.

Figure 10. Localization effects under varying attack intensities.

Figure 11. IEEE14-bus comparison experiment.

Figure 12. IEEE118-bus comparison experiment.

Table 1. Range and results of hyperparameters for the GA-optimized CNN-LSTM model.

Hyperparameter	Search range	IEEE 14-Bus	IEEE 118-Bus
learning rate	0.0005, 0.0008, 0.001, 0.002, 0.003	0.0005	0.0005
epochs	25, 30, 35, 40	35	25
batch_size	32, 64, 128, 256	128	64
activation function	relu, tanh	relu	tanh
number of convolution layers	1, 2, 3, 4	1	4
number of kernels (per layer)	32, 64, 128, 256	32	256, 128, 256, 32
kernel size (per layer)	3, 4, 5, 6	6	3, 4, 3, 6
LSTM layers	1, 2	1	1
lstm_units	64, 128, 256, 512	512	256
dropout	0.1, 0.2, 0.3	/	0.2

Table 2. Detection results.

Test system	F1 Score	Accuracy	Recall
IEEE 14-bus	99.71%	99.88%	99.84%
IEEE 118-bus	99.10%	99.63%	99.45%

Table 3. Localization F1 scores under different noise levels.

Test system	No noise	5% noise	10% noise
IEEE 14-bus	99.73%	99.71%	99.70%
IEEE 118-bus	99.15%	99.10%	99.08%

Table 4. Comparison of detection time.

Model	IEEE 14-bus	IEEE 118-bus
GA-CNN-LSTM	14 ms	31 ms
GA-CNN-MLP	13 ms	29 ms
GA-CNN	11 ms	26 ms
CNN	12 ms	27 ms
DNN	6 ms	14 ms
MLP	2 ms	5 ms

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Li, Z.; Xie, Y.; Ma, R.; Wei, Z. Optimizing CNN-LSTM for the Localization of False Data Injection Attacks in Power Systems. Appl. Sci. 2024, 14, 6865. https://doi.org/10.3390/app14166865

AMA Style

Li Z, Xie Y, Ma R, Wei Z. Optimizing CNN-LSTM for the Localization of False Data Injection Attacks in Power Systems. Applied Sciences. 2024; 14(16):6865. https://doi.org/10.3390/app14166865

Chicago/Turabian Style

Li, Zhuo, Yaobin Xie, Rongkuan Ma, and Zihan Wei. 2024. "Optimizing CNN-LSTM for the Localization of False Data Injection Attacks in Power Systems" Applied Sciences 14, no. 16: 6865. https://doi.org/10.3390/app14166865

