Model-Free Resilient Grid-Forming and Grid-Following Inverter Control Against Cyberattacks Using Reinforcement Learning

Beikbabaei, Milad; Kwiatkowski, Brian Michael; Mehrizi-Sani, Ali

doi:10.3390/electronics14020288


by Milad Beikbabaei, Brian Michael Kwiatkowski, and Ali Mehrizi-Sani ^*

The Bradley Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Blacksburg, VA 24061, USA

^* Author to whom correspondence should be addressed.

Electronics 2025, 14(2), 288; https://doi.org/10.3390/electronics14020288

Submission received: 22 November 2024 / Revised: 6 January 2025 / Accepted: 8 January 2025 / Published: 13 January 2025

(This article belongs to the Section Power Electronics)


Abstract:

The U.S. movement toward clean energy generation has increased the number of inverter-based resources (IBRs) installed in the grid, introducing new challenges in IBR control and cybersecurity. IBRs receive their set points through a communication link, which may expose them to cyber threats. Previous work has developed various techniques to detect and mitigate cyberattacks on IBRs, but these schemes target new inverters being installed in the grid. This work focuses on developing model-free control techniques for IBRs already installed in the grid, without the need to access IBR internal control parameters. The proposed method is tested for both grid-forming and grid-following inverter control. Different detection and mitigation algorithms are used to enhance the accuracy of the proposed method. The proposed method is tested using the modified CIGRE 14-bus North American grid with seven IBRs in PSCAD/EMTDC. Finally, the performance of the detection algorithm is tested under normal grid transients, such as set point changes, load changes, and short-circuit faults, to make sure the proposed detection method does not produce false positives.

Keywords:

cyberattack; inverter-based resources (IBR); power system control; reinforcement learning (RL); renewable energy sources

1. Introduction

The U.S. has the goal of clean energy generation by 2035, and the number of inverter-based resources (IBR), such as photovoltaic systems and wind turbines, is on the rise [1]. Furthermore, there is a global effort to move from fossil-fueled energy generation to 100% clean power grids [2]. The vast number of IBRs in the grid introduces new challenges to the control and cybersecurity of the power system [3]. IBRs have faster transient responses compared to synchronous machines, which increases the importance of the inverter control algorithm in maintaining the stability of the grid under various transients. Furthermore, IBRs can be controlled in either a grid-forming or a grid-following mode, where the IBR set points are adjusted using the communication link, which offers openings for cyberattacks [4,5]. Moreover, the advent of fast and reliable communication has facilitated the broad installation of communication devices in the power system, where they are primarily used for exchanging electrical data and sending commands to breakers and generation units [6]. Multiple cyberattacks on the power system have been reported previously. In 2015, the Ukrainian power system was the target of a major cyberattack that caused a blackout lasting 6 hours and affecting 225,000 Ukrainians [7]. In 2020, attackers targeted SolarWinds’ Orion software through a supply chain attack, gaining unauthorized access to government and other systems [8]. In 2023 alone, the U.S. Department of Energy (DOE) reported 120 disturbances to the power system, of which 114 were due to vandalism and 6 were due to cyber events. One attack in March 2023 successfully interrupted the monitoring and control of a control center for roughly 30 min [9]. Standard protocols have known vulnerabilities, making equipment used in the power system more susceptible to cyberattacks [10]. For instance, flaws in the transmission control protocol/internet protocol (TCP/IP) have been reported [8]. 
Using secure communication protocols increases the power system’s resilience against cyberattacks [11]. As a result, designing cyber-resilient IBR control techniques is necessary for preventing system outages due to cyberattacks.

Previous work has focused on designing inverter controls that either need the microgrid model, can only be installed on the microgrid controller, or require an inverter control update [12,13,14,15]. The lack of a detailed grid model, often due to substantial changes in the grid, poses a significant challenge. Moreover, updating the control algorithms or parameters of an already installed IBR is often not possible due to vendor policies. Model-free control approaches that modify the IBR set points without accessing or modifying the internal control parameters can be used to enhance the performance of installed IBRs [16,17,18]. However, the aforementioned model-free control approaches are still susceptible to cyberattacks, which creates a research gap. A separate detection and mitigation layer can be added to develop a cyber-resilient model-free algorithm. Moreover, reinforcement learning (RL) is a popular choice for developing an optimal controller since it is model-free, can handle nonlinear behavior, and performs well for unknown dynamics and under highly uncertain conditions [19]. A model-free RL-based inverter control increases the grid’s resilience against cyberattacks.

Machine learning (ML) algorithms are widely used to detect different types of cyberattacks, such as false data injection (FDI) and denial-of-service (DoS) attacks. Moreover, ML algorithms can be used to mitigate the adverse effect of an attack after it has been detected [20,21,22,23]. Reference [5] detected anomalies in the real and reactive power set points of a grid-following inverter using an LSTM algorithm. Reference [24] proposes a machine learning-based mitigation method against DoS attacks in T-S fuzzy networked control systems. A semi-supervised learning approach utilizing autoencoders and generative adversarial networks is used in [25] to detect and mitigate FDI attacks that can pass conventional bad data detection algorithms. In [26], deep RL is used to adjust compromised DERs, mitigating the voltage instability during an FDI attack.

This work uses a model-free approach, modifying the inverter set points after the attack has been detected to mitigate the adverse effects of the attack. This work’s proposed control algorithm has the following salient features:

A detection algorithm for FDI attacks for both the grid-forming and grid-following inverter is developed.
A mitigation algorithm is developed using reinforcement learning for FDI attacks.
The detection algorithm correctly avoids interfering with the grid’s normal transients, such as load change, step change, and short circuit faults.

Section 2 discusses the inverter control, and Section 3 discusses the details of the proposed detection and mitigation method for both the grid-forming and grid-following inverter. Section 4 discusses the test system implementation, and Section 5 offers the performance evaluation. Section 6 discusses future works and Section 7 concludes the paper.

2. Inverter Control

An inverter can be controlled in either a grid-following or a grid-forming mode [4].

2.1. Grid-Following Control

An inverter in the grid-following mode of operation receives real and reactive power set points and adjusts the inverter output voltage to keep the output’s real and reactive power as close to their set points as possible. Since the inverter output voltage is generated by switching and includes harmonics, the inverter is connected to the grid through an $RL$ filter [4]. The grid-following inverter uses a phase-locked loop (PLL) to estimate the voltage angle, which is later used in the grid-following control loop to convert voltage from the $abc$-frame to the $dq$-frame and vice versa. Moreover, $PI$ controllers are used to adjust the inverter output voltage. Figure 1a shows the control loop of a grid-following inverter, where the real power set point, $P_{set}$, and the reactive power set point, $Q_{set}$, are the inputs used to calculate the inverter current set point, $I_{set}$. The inverter control uses electrical measurements such as the real power, P; reactive power, Q; inverter output current, I; terminal voltage, $V_{s}$; and inverter output voltage, $V_{t}$, as shown in Figure 1a [4].
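To make the $abc$-to-$dq$ conversion concrete, the sketch below applies an amplitude-invariant Park transformation to three phase quantities, using an angle such as the one a PLL would supply. The function names `abc_to_dq` and `balanced_abc`, the alignment of the d-axis with the phase-a cosine peak, and the 2/3 scaling are assumptions made for illustration; the text does not state which convention the inverter control uses.

```python
import math

PHASE_SHIFT = 2.0 * math.pi / 3.0  # 120 degrees between phases


def abc_to_dq(a, b, c, theta):
    """Amplitude-invariant Park transform (assumed convention, see lead-in).

    theta is the reference angle in radians, e.g. the PLL angle estimate.
    """
    d = (2.0 / 3.0) * (
        a * math.cos(theta)
        + b * math.cos(theta - PHASE_SHIFT)
        + c * math.cos(theta + PHASE_SHIFT)
    )
    q = -(2.0 / 3.0) * (
        a * math.sin(theta)
        + b * math.sin(theta - PHASE_SHIFT)
        + c * math.sin(theta + PHASE_SHIFT)
    )
    return d, q


def balanced_abc(theta, amplitude=1.0):
    """Hypothetical helper: balanced three-phase cosine set at angle theta."""
    return (
        amplitude * math.cos(theta),
        amplitude * math.cos(theta - PHASE_SHIFT),
        amplitude * math.cos(theta + PHASE_SHIFT),
    )
```

With a perfectly tracked angle, a balanced set of amplitude 1 maps to a constant d component of 1 and a q component of 0, which is why PI controllers can act on the $dq$ quantities as if they were DC signals.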

2.2. Model-Free RL-Based Grid-Following Control

The real and reactive power set points of an inverter in the grid-following mode of operation can be altered from their original values by cyberattacks, controller operator mistakes, or even communication noise, which may lead to voltage violations. Detection and mitigation techniques can be combined with the conventional grid-following inverter control [5,27]. This work proposes a cyber-resilient inverter control for grid-following inverters using a detection and mitigation algorithm based on an RL algorithm, as shown in Figure 1b. After the attack is detected, the RL algorithm modifies the real and reactive power set points by either increasing or decreasing them, ensuring the voltage is within the nominal range, which is 0.96 pu to 1.04 pu in this work.

2.3. Conventional Grid-Forming Control

An inverter in the grid-forming mode of operation helps maintain the grid voltage and frequency within the nominal range by adjusting the inverter output’s real and reactive power [4]. P − f and Q − V droop control algorithms can be used to maintain the grid voltage and frequency. Q − V droop modifies the inverter output reactive power based on the difference between the magnitude of the inverter terminal voltage and the set point voltage, as shown in Figure 2a. The P − f droop modifies the inverter output’s real power based on the frequency deviation, as shown in Figure 2a. Droop control outputs are the input of the voltage control loop, and the output of the voltage control loop is the input of the current control loop, as shown in Figure 2b. In a microgrid operating in the islanded mode, a specific number of inverters need to operate in the grid-forming mode to support the grid voltage and frequency.
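As a rough illustration of the two droop relationships described above, the sketch below computes proportional power adjustments from the voltage and frequency deviations. The function name `droop_adjustments` and the gains `k_qv` and `k_pf` are hypothetical, and the purely proportional form is an assumption; the control loops in Figure 2 may include additional stages.

```python
def droop_adjustments(v_set_pu, v_meas_pu, f_set_hz, f_meas_hz,
                      k_qv=1.0, k_pf=1.0):
    """Return (delta_p, delta_q) from simple proportional droop laws.

    k_qv and k_pf are placeholder gains chosen for illustration only.
    """
    # Q-V droop: a terminal voltage below its set point calls for more
    # reactive power; a voltage above it calls for less.
    delta_q = k_qv * (v_set_pu - v_meas_pu)
    # P-f droop: a frequency below its set point calls for more real power.
    delta_p = k_pf * (f_set_hz - f_meas_hz)
    return delta_p, delta_q
```

For example, a measured voltage of 0.98 pu against a 1.0 pu set point yields a positive reactive power adjustment, while a measurement equal to the set point yields none.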

2.4. Model-Free RL-Based Grid-Forming Control

The droop frequency, real power set point, and voltage are the inputs of an inverter in the grid-forming mode. The inverter voltage set point can be adjusted by the control center; however, the frequency is usually fixed and set to the grid frequency, 60 Hz. The voltage set point of an inverter in the grid-forming mode of operation can be altered from its original value by cyberattacks, controller operator mistakes, or even communication noise, which may lead to voltage violations. Detection and mitigation techniques can be combined with the grid-forming inverter control. This work proposes a cyber-resilient inverter control for grid-forming inverters using a detection and mitigation algorithm based on an RL algorithm, as shown in Figure 2c. After the attack is detected, the RL algorithm modifies the voltage set point by either increasing or decreasing it, ensuring the voltage is within the nominal range, which is 0.96 pu to 1.04 pu in this work.

3. Detection and Mitigation Method

This section discusses the basics of RL, the basics of inverter control, the proposed inverter control, the detection method, and RL implementation.

3.1. Reinforcement Learning (RL) Basics

Reinforcement learning (RL), along with supervised and unsupervised learning, is one of the three categories that encompass machine learning algorithms. Unlike supervised and unsupervised learning, RL interacts with an environment, taking actions and evaluating their effect [28]. This is done by establishing a reward system that evaluates the action taken on the environment and connects it to the state that the system is in when the action is taken. Over a period of trial and error, an RL system can analyze the state of the system and, using its developed dataset, predict the best action [28].

RL includes several methods that can be used to train the system. The Markov Decision Process (MDP) is one of the methods used for decision making in RL that uses the state–reward relationship to predict optimal actions at a given state [29]. Unlike Q-learning or other RL methods, an MDP is only concerned with the current state of the environment and does not consider any previous data in predicting the optimal action. The prediction of the action is independent of any previous states and is only concerned with the most up to date information.

Setting up an RL system first requires an environment to interact with. An environment is the set of rules and boundaries that the system can interact with over the course of training and testing. This includes defining the actions that the system can take, the states that can be evaluated, and the reward system that is implemented [29]. The set of variables that the RL system takes in for prediction is defined as the state. A state may consist of one or more variables; however, these must be clearly defined in the environment. Actions are the changes that an agent is allowed to perform within an environment [30]. In an RL environment, the number of actions may be large or small; however, a longer list of actions requires additional state–action pairs to be trained. After an agent has performed an action in the environment, a reward function is applied to produce a numerical value that represents how effectively the action performed its intended function. Typically, the reward function is structured so that the optimal reward has the highest positive value; however, an agent may be customized to search for rewards that are mostly negative or closest to zero [29]. Once a training dataset has been developed, an agent can analyze it and select the action that best suits the intended output of the RL system.

In [31], a similar approach is introduced to establish a method for RL to be used for real-time strategy games. The proposed method uses a Semi-Markov Decision Process (SMDP) and an $\epsilon$-greedy approach for the RL system. An $\epsilon$-greedy selection method is used for training an RL system that has a large number of state–action pairs. $\epsilon$-greedy selection takes a random action with probability $\epsilon$ and the greedy action with probability $1 - \epsilon$ [31]. Training the system occurs by selecting an action via $\epsilon$-greedy, performing the action, and evaluating the reward. The recorded information is used to adjust the option-value function used to select the optimal action. This work follows a similar approach to accomplish the designated task; however, $\epsilon$-greedy is not used, since the number of state–action pairs is smaller compared to that of other RL algorithms.
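For reference, the $\epsilon$-greedy rule described for [31] can be sketched in a few lines. This is only an illustration of the selection rule, not part of the method used in this work, which does not use $\epsilon$-greedy; the function name `epsilon_greedy` and the list-of-values representation are assumptions.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick an action index from a list of estimated action values.

    With probability epsilon a uniformly random action is explored;
    otherwise the action with the highest estimated value is exploited.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda i: q_values[i])
```

Setting `epsilon` to 0 always returns the greedy action, and setting it to 1 always explores, so the parameter directly trades exploitation against coverage of the state–action space.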

This work details the use of RL and MDP to mitigate FDI attacks on a microgrid once an attack has been detected. A training method is developed to train the RL agent to choose the best action for the state that the system is in. Additionally, a reward function is implemented that aids in the agent’s search for the optimal action to take given the state of the system.

3.2. RL Implementation

This section discusses the implementation of RL for this work. In total, three RL models are used in the construction of the system. For each grid-following (GFL) inverter, there is an RL model that makes predictions based on the changes in $P_{set}$ and another for changes in $Q_{set}$. For each grid-forming (GFM) inverter, there is an RL model that makes predictions based on changes in $V_{ref}$. The RL models for $P_{set}$, $Q_{set}$, and $V_{ref}$ have states that correspond to the current voltage measured at the output of their respective inverters. The RL is designed as follows:

3.2.1. State s

s is defined as the discrete state of the system. The state for each RL implementation is the voltage which is read by the inverter at the given time step. It is used for calculating the reward as well as for tuning the agent to find the best action at a particular state. Error e is defined as the difference between s and the voltage of 1.0 pu.

3.2.2. Action a

a is the action that is chosen by the agent to be performed on the system. For this work, the action is the value $a(t)$ that changes the value of the given reference point. For $P_{ref}$, $Q_{ref}$, and $V_{ref}$, $a(t)$ has the following values: 0.01, −0.01, 0.001, −0.001, and 0. Testing indicated that $a(t)$ did not require different magnitudes for the different reference points, given the observed effect that it has on the system with the time step used during simulation.

3.2.3. Reward Function R

R is the reward function that defines the reward that is applied to a state as a result of an action a. The purpose of the mitigation algorithm is to reduce the effects of an FDI attack on the voltage. The reward function is designed such that the mitigation system returns the voltage to a level within a specified range as quickly as possible. This reward function is designed with the following criteria in mind:

$$R(t) = \begin{cases} -e - \lambda \Delta a, & \text{if } |e| > 0.04 \\ 0, & \text{if } |e| \leq 0.04 \text{ and } \Delta a = 0 \\ -10, & \text{if } |e| \leq 0.04 \text{ and } \Delta a \neq 0 \end{cases} \qquad (1)$$

In (1), $\lambda$ is defined as the penalty factor applied to the value of $a(t)$ to optimize the agent for selecting an action that approaches the outer limit of the voltage [32].
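The piecewise reward in (1) can be written as a short function, shown here as a minimal sketch. Two points are assumptions rather than statements from the text: the error is taken with sign as $e = s - 1.0$ pu, and $\Delta a$ is taken to be the action value $a(t)$ itself. The default penalty factor of 0.3 follows the value selected in this work, and the action list mirrors the five values given for $a(t)$.

```python
ACTIONS = (0.01, -0.01, 0.001, -0.001, 0.0)  # a(t) values; index 4 is "no change"
TOLERANCE_PU = 0.04                          # |e| bound used in (1)


def reward(voltage_pu, delta_a, lam=0.3, tol=TOLERANCE_PU):
    """Reward from (1) for one step.

    Assumptions: e = voltage_pu - 1.0 (signed), and delta_a is the
    action value applied in this step.
    """
    e = voltage_pu - 1.0
    if abs(e) > tol:
        # Outside the tolerance band: reward depends on error and action.
        return -e - lam * delta_a
    if delta_a == 0:
        # Inside the band and no action taken: neutral reward.
        return 0.0
    # Inside the band but an action was taken anyway: heavy penalty.
    return -10.0
```

Under these assumptions, an overvoltage of 1.07 pu gives about −0.067 for the −0.01 action and about −0.073 for the +0.01 action, so the decreasing action lies closer to zero; for an undervoltage the increasing action lies closer to zero.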

3.2.4. Training

With the state, action, and reward defined, the RL agent needs to be trained. To produce the intended results, a training procedure must provide the agent with a sufficient dataset, one that contains enough states with different actions for the agent to choose from. Moreover, the generated dataset needs to ensure the robustness of the RL algorithm against diverse attack scenarios. Collecting all possible attack scenarios is impractical since it leads to a very large dataset. Additionally, attackers can develop novel attacks that are not considered in the training dataset. One way to overcome these challenges is to use a pseudo-random binary sequence (PRBS)-based technique. PRBS-based techniques are used in system identification and for testing controllers under various transients [33], since the random sequence generation exercises the system in many different ways. This work uses PRBS-based techniques to generate the dataset needed to train the RL agent, so that the agent gives the intended results under various scenarios. Since there are three different agents (one each for $P_{set}$, $Q_{set}$, and $V_{ref}$), three separate training procedures are developed.
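A PRBS is commonly produced with a linear-feedback shift register. The sketch below generates the standard 7-bit sequence (PRBS7, polynomial $x^7 + x^6 + 1$) and maps its bits to two set point levels. The choice of PRBS7, the function names `prbs7` and `prbs_set_points`, and the two-level mapping are assumptions for illustration; the text does not specify which sequence or mapping is used to build the training data.

```python
def prbs7(length, seed=0x7F):
    """Return `length` bits of a PRBS7 sequence (x^7 + x^6 + 1).

    The seed must be a non-zero 7-bit value; the sequence then repeats
    every 127 bits.
    """
    if not 0 < seed < 0x80:
        raise ValueError("seed must be a non-zero 7-bit value")
    state = seed
    bits = []
    for _ in range(length):
        # Feedback is the XOR of the two most significant register bits.
        new_bit = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | new_bit) & 0x7F
        bits.append(new_bit)
    return bits


def prbs_set_points(base, amplitude, length):
    """Hypothetical mapping: bit 1 -> base + amplitude, bit 0 -> base - amplitude."""
    return [base + amplitude if bit else base - amplitude
            for bit in prbs7(length)]
```

Because the sequence is deterministic for a given seed, the same excitation can be replayed across simulation runs while still covering a wide mix of short and long pulses.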

Each dataset is collected over the course of 100 s, with actions taken every 25 ms. Initially, the co-simulation is run for 5 s so that the microgrid reaches steady state before values are recorded. The agent is then allowed to take random actions for 1.5 s at a given set point, with the reward recorded along with the state and action that occurred. The system then returns to the starting set point for a period of 0.5 s. This occurs in intervals of 10 s before a new starting set point is chosen. This is done to capture the broadest number of states with enough actions taken to find the most suitable reward for a state. Training for $P_{set}$ has starting set points between 0 and 1 pu; for $Q_{set}$, it has starting set points between −1 and 1 pu; and for $V_{ref}$, the starting set points are between 0.85 and 1.15 pu. Figure 3 shows the plot for the collection of data for the $P_{set}$ RL. Contrary to the other RL training methods, $P_{set}$ can only take values between 0 and 1, which does not cover the entire range of values that the voltage may have. To remedy this, $Q_{set}$ is changed to increase the starting voltage for a third of the training length and to decrease it for another third of the training length. This allows further voltage ranges to be reached and actions to be taken. Figure 4 shows the plot for the collection of data for the $Q_{set}$ RL. Figure 5 shows the plot for the collection of data for the $V_{ref}$ RL. The training datasets for each of the RLs can now be used for the mitigation of simulated cyberattacks.
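The timing figures above translate into step counts as follows. This is a small worked calculation under one assumption that the text suggests but does not state outright: each 10 s interval is filled with back-to-back cycles of 1.5 s of random actions followed by a 0.5 s return to the starting set point. The 5 s warm-up is left out because the text does not say whether it counts toward the 100 s. All names are hypothetical.

```python
STEP_MS = 25           # interval between actions
TOTAL_MS = 100_000     # 100 s of data collection per dataset
RANDOM_MS = 1_500      # random actions at a given set point
RETURN_MS = 500        # return to the starting set point
INTERVAL_MS = 10_000   # time spent on one starting set point


def steps(duration_ms):
    """Number of 25 ms steps that fit in the given duration."""
    return duration_ms // STEP_MS


def schedule_summary():
    """Step and cycle counts implied by the training timings."""
    cycle_ms = RANDOM_MS + RETURN_MS
    return {
        "total_steps": steps(TOTAL_MS),
        "random_steps_per_cycle": steps(RANDOM_MS),
        "return_steps_per_cycle": steps(RETURN_MS),
        "cycles_per_set_point": INTERVAL_MS // cycle_ms,
        "set_points_per_dataset": TOTAL_MS // INTERVAL_MS,
    }
```

Under this reading, each dataset spans 4000 steps, each cycle contributes 60 random-action samples, and ten starting set points are visited with five cycles each.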

3.2.5. Agent

In the previous section, the training datasets were developed for each of the set points mitigated. The agent developed in this section uses the datasets developed to find the best action at the state the system is in at a given time step. The first step of the agent to predict the best action is to analyze the list of states that are available to choose from. The agent reads the current state of the system and then proceeds to find states inside the dataset that are within a range. For each of the agents, the range is set for states in the dataset that fall within 0.01 pu of the voltage read by the inverter. The agent then categorizes all of the states within the range with respect to the action that corresponds to them. Following the grouping of state–action relationships, the reward is averaged for each action that may be taken. This average reward for each action is known as the Q-value. The agent proceeds to select the action with a Q-value closest to a value of 0. The reward function is structured so that the optimal action to take is the one with the reward closest to 0, reducing the error. This process repeats during each time step for which the mitigation method is called to take action.

The value of 0.01 pu voltage for the search criteria is chosen to encompass enough data so that the Q-value is the most accurate. Cases where the search criteria are smaller may lead to taking the wrong action if one or more actions are not taken during training at that particular state range. In cases where the search criteria are too large, this could lead to taking actions when the voltage is inside the allowed range of tolerance or not taking an action when the voltage is outside the range of tolerance.
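The lookup performed by the agent can be sketched as follows: filter the dataset to states within 0.01 pu of the measured voltage, average the reward per action, and pick the action whose average is closest to zero. The sketch uses plain Python tuples instead of the dataframes used in this work, and the function name `find_action`, the record layout, and the fallback to "no change" when no nearby state exists are assumptions.

```python
from collections import defaultdict

NO_CHANGE = 4  # index of the zero-valued action


def find_action(voltage_pu, dataset, window_pu=0.01):
    """Return the index of the action whose mean reward is closest to 0.

    dataset is an iterable of (state_voltage_pu, action_index, reward)
    records collected during training.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for state, action, reward in dataset:
        # Keep only records whose state lies inside the search window.
        if abs(state - voltage_pu) <= window_pu:
            totals[action] += reward
            counts[action] += 1
    if not counts:
        # Assumption: with no nearby training data, leave the set point alone.
        return NO_CHANGE
    q_values = {a: totals[a] / counts[a] for a in counts}
    return min(q_values, key=lambda a: abs(q_values[a]))
```

A narrower window leaves fewer records per action, which is the failure mode described above: an action that was never sampled near the current state cannot be selected, however suitable it might be.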

3.2.6. Libraries Used

This work uses a Python 3.9.13 virtual environment, in which the system is designed. Three libraries are fundamental to the development of the RL system. NumPy 1.26.3 is used for formatting arrays in the agent’s search algorithm. Additionally, mhi-cosim 1.1.3 is used for the interaction between PSCAD and the virtual environment. Finally, pandas 2.2.0 is used for formatting the dataframes used for the collection and importation of data for the agent to search.

3.3. Detection and Mitigation Algorithm Implementation

The detection algorithm, presented in Algorithm 1, monitors the current and previous set points imposed on each inverter. If the detection algorithm identifies that an FDI attack has occurred, the system calls for the mitigation method to take action. The detection algorithm first checks that at least 100 ms have elapsed since the last event. This is represented in the pseudocode by the difference between the discrete time of the system and the updating_timer. Additionally, fewer than five actions must have been taken by the mitigation method before the following conditions are checked. If the detection algorithm identifies that a change in set point has occurred, the system raises a ready-flag. When action = 4, change = 0, and therefore no difference in set point occurs during the time step. If the ready-flag has been raised and the predicted action ≠ 4, the detection algorithm raises a detection-flag. This calls for the system to predict the action necessary to mitigate the attack, which is represented in line 14 of the pseudocode, where $V(t)$ is the discrete voltage state of the system. Otherwise, the mitigation algorithm does not take action and allows the original adjustment of the set point to be made to the system. After the detection flag becomes 1, the RL action is applied to the received set point, and the inverter set point is updated to mitigate the adverse effect of the attack.

The size of the actions taken in each time step and the time-step value are tunable variables of the mitigation method. Varying the time step of the PSCAD co-simulation requires adjusting the reward function and action size to produce consistent results. The results indicate that a smaller time step reduces the detection and mitigation times. However, a smaller time step leaves less time for computation, which can make real-time implementation of the algorithm infeasible. A time step of 25 ms is used to perform all PSCAD simulations since the mitigation algorithm takes 12.263 ms on average to find the best action. The size of the actions is tuned to prevent the system from becoming unstable and unable to return to the original voltage. The value of $\lambda$ is chosen to create an optimal range between actions of different sizes, since larger distances of the voltage from the tolerance range require a larger action. A value of 0.3 is selected for this work.

Algorithm 1 Detection algorithm pseudocode.

1: if (time − Activated_time) > 100 ms and Action_counter ≤ 5 then
2:     if ${INV}_{set} - {INV}_{set,prev} = 0$ then
3:         Ready_flag = 0
4:     else
5:         Ready_flag = 1
6:         Activated_time = 0
7:     end if
8: end if
9: if Ready_flag = 1 and action ≠ 4 then
10:     Detection_flag = 1
11:     Action_counter = Action_counter + 1
12: end if
13: if Detection_flag = 1 then
14:     ${INV}_{set,change}$ = find_action($V(t)$)
15:     ${INV}_{set}$ = ${INV}_{set}$ + ${INV}_{set,change}$
16: end if
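The flag logic can also be sketched in Python. The sketch follows the prose description, in which a set point change raises the ready-flag and a predicted action other than "no change" (index 4) then raises the detection-flag. Several details are assumptions made so the example runs: the class name `Detector`, storing the event time when the ready-flag is raised, and latching the detection flag once it is set.

```python
from dataclasses import dataclass

NO_CHANGE = 4      # action index whose set point change is zero
MIN_GAP_S = 0.100  # at least 100 ms since the last event
MAX_ACTIONS = 5    # cap on counted mitigation actions


@dataclass
class Detector:
    ready: bool = False
    detected: bool = False
    action_counter: int = 0
    activated_time: float = 0.0

    def step(self, t, inv_set, inv_set_prev, predicted_action):
        """Update the flags for one time step and return the detection flag."""
        gap_ok = (t - self.activated_time) > MIN_GAP_S
        if gap_ok and self.action_counter <= MAX_ACTIONS:
            if inv_set != inv_set_prev:
                self.ready = True
                # Assumption: remember when the set point changed.
                self.activated_time = t
            else:
                self.ready = False
        if self.ready and predicted_action != NO_CHANGE:
            self.detected = True
            self.action_counter += 1
        return self.detected
```

In this form, a set point change that keeps the voltage inside the tolerance band leaves the predicted action at index 4, so the detection flag stays low, which matches the intent of not reacting to normal set point changes.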

4. Implementation Test System

The detection and mitigation methods require a physical layer to interact with for both training and testing the system. The physical layer of this work is represented by PSCAD, since it is capable of modeling and simulating IBRs with high accuracy. Figure 6 shows the physical layer of the system, represented by a single-line diagram with seven IBRs. The base power of inverters 1 and 12 is 10 MVA, that of inverters 3, 4, 8, 10, and 14 is 5 MVA, and the base voltage of all inverters is 480 V. The primary and secondary voltages of the transformers connecting the IBRs to the grid are 480 V and 12.47 kV. Each IBR has an output voltage of 480 V and a DC bus voltage of 1.2 kV. The filter resistance and inductance of the IBRs are 2 mΩ and 30 μH, respectively. IBRs 1, 4, 12, and 14 are set to the grid-following mode, and IBRs 3, 8, and 10 are set to the grid-forming mode. The gains of the PI block for the grid-forming mode in the P − f droop control are $k_{P} = 01.5$ and $k_{I} = 0.02$. In the Q − V droop control, the grid-forming gains of the PI block are $k_{P} = 0.65$ and $k_{I} = 0.015$. The gains of the PI block for the grid-following mode are $k_{P} = 0.5$ and $k_{I} = 0.05$.

5. Performance Evaluation

This section discusses the simulation results for FDI attacks on the P and Q channels of the GFL units and the V channel of the GFM units, as well as results under normal grid transients.

5.1. FDI Attack Cases

5.1.1. Positive Ramp Attack of GFL Inverters

At $t = 0.05$ s, attackers launch an FDI cyberattack that increases the real power set point of IBR 1 from 5 MW to 5.5 MW, IBR 4 from 3 MW to 3.3 MW, IBR 12 from 5 MW to 5.5 MW, and IBR 14 from 3.5 MW to 3.85 MW. The reactive power set point is increased for IBR 1 from 2.1 MVAR to 2.31 MVAR, IBR 4 from 1.26 MVAR to 1.39 MVAR, IBR 12 from 2.1 MVAR to 2.31 MVAR, and IBR 14 from 1.47 MVAR to 1.62 MVAR over the course of 200 ms. Figure 7a,g,m,s shows the detection signal with and without mitigation applied and without an attack.

Figure 7b shows the real power set point of IBR 1, which increases from its pre-attack value of 5 MW to 5.5 MW after the attack is launched in the undetected attack case. Figure 7c shows the reactive power set point of IBR 1, which increases from its pre-attack value of 2.1 MVAR to 2.31 MVAR in the undetected attack case. IBR 1 does not call for the mitigation to be applied since the attack does not increase the voltage of IBR 1 above 1.04 pu.

Figure 7h shows the real power set point of IBR 4, which increases from its pre-attack value of 3 MW to 3.3 MW after the attack is launched in the undetected attack case. Figure 7i shows the reactive power set point of IBR 4, which increases from its pre-attack value of 1.26 MVAR to 1.39 MVAR in the undetected attack case. IBR 4 does not call for the mitigation to be applied since the attack does not increase the voltage of IBR 4 above 1.04 pu.

Figure 7n shows the real power set point of IBR 12, which increases from its pre-attack value of 5 MW to 5.5 MW after the attack is launched in the undetected attack case. Figure 7o shows the reactive power set point of IBR 12, which increases from its pre-attack value of 2.1 MVAR to 2.31 MVAR in the undetected attack case. With the mitigation method applied, the real power set point is adjusted to 5.1 MW and the reactive power set point is adjusted to 1.81 MVAR. Figure 7r shows the IBR 12 bus voltage, which increases from its pre-attack value of 1.03 pu to 1.07 pu in the undetected attack case. With the mitigation method applied, the IBR 12 bus voltage is 1.02 pu.

Figure 7t shows the real power set point of IBR 14, which increases from its pre-attack value of 3.5 MW to 3.85 MW after the attack is launched in the undetected attack case. Figure 7u shows the reactive power set point of IBR 14, which increases from its pre-attack value of 1.47 MVAR to 1.62 MVAR in the undetected attack case. With the mitigation method applied, the real power set point is adjusted to 3.71 MW and the reactive power set point is adjusted to 1.34 MVAR. Figure 7x shows the IBR 14 bus voltage, which increases from its pre-attack value of 1.03 pu to 1.05 pu in the undetected attack case. With the mitigation method applied, the IBR 14 bus voltage is 1.03 pu.

The detection method for IBR 12 takes 100 ms to recognize the attack and an additional 125 ms to mitigate the attack. The detection method for IBR 14 takes 100 ms to recognize the attack and an additional 100 ms to mitigate the attack.

5.1.2. Negative Bias Attack of GFL Inverters

At t = 0.05 s, attackers launch an FDI cyberattack that decreases the real power set point of IBR 1 from 5 MW to 4.5 MW, IBR 4 from 3 MW to 2.7 MW, IBR 12 from 5 MW to 4.5 MW, and IBR 14 from 3.5 MW to 3.15 MW. The reactive power set point is decreased for IBR 1 from 1.5 MVAR to 1.35 MVAR, IBR 4 from 0.9 MVAR to 0.81 MVAR, IBR 12 from 1.5 MVAR to 1.35 MVAR, and IBR 14 from 1.05 MVAR to 0.95 MVAR. Figure 8a,g,m,s shows the detection signals with and without mitigation applied and without an attack.

Figure 8b shows the real power set point of IBR 1, which drops from its pre-attack value of 5 MW to 4.5 MW when the attack goes undetected. Figure 8c shows the reactive power set point of IBR 1, which drops from 1.5 MVAR to 1.35 MVAR when the attack goes undetected. With the mitigation method applied, the real power set point is adjusted to 4.9 MW and the reactive power set point to 1.63 MVAR. Figure 8f shows the IBR 1 bus voltage, which drops from 0.96 pu to 0.95 pu when the attack goes undetected; with mitigation, it settles at 0.98 pu.

Figure 8h shows the real power set point of IBR 4, which drops from 3 MW to 2.7 MW when the attack goes undetected. Figure 8i shows the reactive power set point of IBR 4, which drops from 0.9 MVAR to 0.81 MVAR when the attack goes undetected. With the mitigation method applied, the real power set point is adjusted to 2.94 MW and the reactive power set point to 1.01 MVAR. Figure 8l shows the IBR 4 bus voltage, which drops from 0.96 pu to 0.95 pu when the attack goes undetected; with mitigation, it settles at 0.98 pu.

Figure 8n shows the real power set point of IBR 12, which drops from 5 MW to 4.5 MW when the attack goes undetected. Figure 8o shows the reactive power set point of IBR 12, which drops from 1.5 MVAR to 1.35 MVAR when the attack goes undetected. With the mitigation method applied, the real power set point is adjusted to 5 MW and the reactive power set point to 1.85 MVAR. Figure 8r shows the IBR 12 bus voltage, which drops from 0.97 pu to 0.94 pu when the attack goes undetected; with mitigation, it settles at 0.99 pu.

Figure 8t shows the real power set point of IBR 14, which drops from 3.5 MW to 3.15 MW when the attack goes undetected. Figure 8u shows the reactive power set point of IBR 14, which drops from 1.05 MVAR to 0.95 MVAR when the attack goes undetected. With the mitigation method applied, the real power set point is adjusted to 3.29 MW and the reactive power set point to 1.09 MVAR. Figure 8x shows the IBR 14 bus voltage, which drops from 0.98 pu to 0.96 pu when the attack goes undetected; with mitigation, it settles at 1 pu.

The detection method takes 50 ms to recognize the attack on IBR 1 and an additional 100 ms to mitigate it. For IBR 4, detection takes 50 ms and mitigation an additional 100 ms. For IBR 12, detection takes 50 ms and mitigation an additional 125 ms. For IBR 14, detection takes 75 ms and mitigation an additional 50 ms.
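The attacked set points quoted above are all consistent with a 10% reduction of the pre-attack values. The following minimal sketch reproduces that arithmetic. The function name `bias_attack`, the step-change model, and the 0.9 factor are illustrative assumptions inferred from the quoted numbers, not the authors' implementation.

```python
def bias_attack(set_point, attack_factor, t, t_attack=0.05):
    """Return the set point seen by the inverter at time t (in seconds).

    Assumption: the bias attack is modeled as a step change that scales the
    legitimate set point by attack_factor from t_attack onward.
    """
    return set_point * attack_factor if t >= t_attack else set_point


# Pre-attack set points quoted in the text: (P in MW, Q in MVAR).
pre_attack = {
    "IBR 1": (5.0, 1.5),
    "IBR 4": (3.0, 0.9),
    "IBR 12": (5.0, 1.5),
    "IBR 14": (3.5, 1.05),
}

# Evaluate the set points after the attack has started (t = 0.1 s).
attacked = {
    name: (bias_attack(p, 0.9, t=0.1), bias_attack(q, 0.9, t=0.1))
    for name, (p, q) in pre_attack.items()
}

for name, (p, q) in attacked.items():
    print(f"{name}: P = {p:.2f} MW, Q = {q:.3f} MVAR")
```

Under this assumption the reactive power set point of IBR 14 evaluates to 0.945 MVAR, which the text reports rounded to 0.95 MVAR.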

5.1.3. Positive Ramp Attack on GFM Inverters

At t = 0.05 s, attackers launch an FDI cyberattack that increases the voltage reference points of IBR 3, IBR 8, and IBR 10 from 1 pu to 1.04 pu over the course of 200 ms. Figure 9a,d,g shows the detection signals with and without mitigation applied and without an attack.

Figure 9b shows the voltage reference point of IBR 3, which rises from its pre-attack value of 1 pu to 1.04 pu when the attack goes undetected; with the mitigation method applied, it is adjusted to 1.01 pu. Figure 9c shows the IBR 3 bus voltage, which rises from 1.01 pu to 1.05 pu when the attack goes undetected; with mitigation, it settles at 1.02 pu.

Figure 9e shows the voltage reference point of IBR 8, which rises from 1 pu to 1.04 pu when the attack goes undetected; with the mitigation method applied, it is adjusted to 1.01 pu. Figure 9f shows the IBR 8 bus voltage, which rises from 1.01 pu to 1.05 pu when the attack goes undetected; with mitigation, it settles at 1.03 pu.

Figure 9h shows the voltage reference point of IBR 10, which rises from 1 pu to 1.04 pu when the attack goes undetected; with the mitigation method applied, it is adjusted to 1.02 pu. Figure 9i shows the IBR 10 bus voltage, which rises from 1.01 pu to 1.05 pu when the attack goes undetected; with mitigation, it settles at 1.02 pu.

The detection method takes 225 ms to recognize the attack on IBR 3 and an additional 75 ms to mitigate it. For IBR 8, detection takes 225 ms and mitigation an additional 75 ms. For IBR 10, detection takes 225 ms and mitigation an additional 50 ms.
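The ramp attack described above can be illustrated with a short sketch that moves the voltage reference from 1 pu to 1.04 pu over 200 ms, starting at t = 0.05 s. The linear ramp shape and the function name `ramp_attack` are assumptions made for illustration; the text only gives the start value, end value, and ramp duration.

```python
def ramp_attack(pre_value, post_value, t, t_attack=0.05, ramp_time=0.2):
    """Return the attacked reference at time t (in seconds).

    Assumption: the reference moves linearly from pre_value to post_value
    between t_attack and t_attack + ramp_time, then stays at post_value.
    """
    if t <= t_attack:
        return pre_value
    if t >= t_attack + ramp_time:
        return post_value
    fraction = (t - t_attack) / ramp_time
    return pre_value + fraction * (post_value - pre_value)


# Sample the attacked voltage reference (pu) before, during, and after the ramp.
for t in (0.0, 0.05, 0.15, 0.25, 0.4):
    print(f"t = {t:.2f} s -> V_ref = {ramp_attack(1.0, 1.04, t):.3f} pu")
```

Because the reference changes gradually, the deviation from the pre-attack value stays small in the first time steps, which is consistent with the longer detection times reported for the ramp attack on GFM inverters.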

5.1.4. Negative Bias Attack on GFM Inverters

At t = 0.05 s, attackers launch an FDI cyberattack that decreases the voltage reference points of IBR 3, IBR 8, and IBR 10 from 1 pu to 0.96 pu. Figure 10a,d,g shows the detection signals with and without mitigation applied and without an attack.

Figure 10b shows the voltage reference point of IBR 3, which drops from its pre-attack value of 1 pu to 0.96 pu when the attack goes undetected; with the mitigation method applied, it is adjusted to 0.991 pu. Figure 10c shows the IBR 3 bus voltage, which drops from 0.98 pu to 0.94 pu when the attack goes undetected; with mitigation, it settles at 0.98 pu.

Figure 10e shows the voltage reference point of IBR 8, which drops from 1 pu to 0.96 pu when the attack goes undetected; with the mitigation method applied, it is adjusted to 1 pu. Figure 10f shows the IBR 8 bus voltage, which drops from 0.98 pu to 0.94 pu when the attack goes undetected; with mitigation, it settles at 0.98 pu.

Figure 10h shows the voltage reference point of IBR 10, which drops from 1 pu to 0.96 pu when the attack goes undetected; with the mitigation method applied, it is adjusted to 1.01 pu. Figure 10i shows the IBR 10 bus voltage, which drops from 0.98 pu to 0.94 pu when the attack goes undetected; with mitigation, it settles at 0.98 pu.

The detection method takes 225 ms to recognize the attack on IBR 3 and an additional 100 ms to mitigate it. For IBR 8, detection takes 225 ms and mitigation an additional 100 ms. For IBR 10, detection takes 225 ms and mitigation an additional 125 ms.

5.1.5. Positive Ramp Attack on GFL Inverters and Positive Bias Attack on GFM Inverters

At t = 0.05 s, attackers launch an FDI cyberattack that increases the real power set point of IBR 1 from 5 MW to 5.5 MW and IBR 14 from 3.5 MW to 3.85 MW. The reactive power set points are increased for IBR 1 from 2.1 MVAR to 2.31 MVAR and IBR 14 from 1.47 MVAR to 1.62 MVAR over the course of 200 ms. The voltage reference points are increased for IBR 3 from 1 pu to 1.05 pu and IBR 8 from 1 pu to 1.05 pu. Figure 11a,g and Figure 12a,d show the detection signals with and without mitigation applied and without an attack.

Figure 11b shows the real power set point of IBR 1, which rises from its pre-attack value of 5 MW to 5.5 MW when the attack goes undetected. Figure 11c shows the reactive power set point of IBR 1, which rises from 2.1 MVAR to 2.31 MVAR when the attack goes undetected. With the mitigation method applied, the real power set point is adjusted to 5.2 MW and the reactive power set point to 2.12 MVAR. Figure 11f shows the IBR 1 bus voltage, which rises from 1.02 pu to 1.06 pu when the attack goes undetected; with mitigation, it settles at 1.03 pu.

Figure 11h shows the real power set point of IBR 14, which rises from 3.5 MW to 3.85 MW when the attack goes undetected. Figure 11i shows the reactive power set point of IBR 14, which rises from 1.47 MVAR to 1.62 MVAR when the attack goes undetected. With the mitigation method applied, the real power set point is adjusted to 3.15 MW and the reactive power set point to 0.98 MVAR. Figure 11l shows the IBR 14 bus voltage, which rises from 1.04 pu to 1.07 pu when the attack goes undetected; with mitigation, it settles at 1.04 pu.

Figure 12b shows the voltage reference point of IBR 3, which rises from 1 pu to 1.05 pu when the attack goes undetected; with the mitigation method applied, it is adjusted to 1.04 pu. Figure 12c shows the IBR 3 bus voltage, which rises from 1.01 pu to 1.05 pu when the attack goes undetected; with mitigation, it settles at 1.02 pu.

Figure 12e shows the voltage reference point of IBR 8, which rises from 1 pu to 1.05 pu when the attack goes undetected; with the mitigation method applied, it is adjusted to 1.03 pu. Figure 12f shows the IBR 8 bus voltage, which rises from 1.01 pu to 1.05 pu when the attack goes undetected; with mitigation, it settles at 1.02 pu.

The detection method takes 100 ms to recognize the attack on IBR 1 and an additional 75 ms to mitigate it. For IBR 14, detection takes 75 ms and mitigation an additional 250 ms. For IBR 3, detection takes 125 ms and mitigation an additional 25 ms. For IBR 8, detection takes 100 ms and mitigation an additional 50 ms.

5.1.6. Negative Bias Attack on a GFL Inverter and Negative Ramp Attack on a GFM Inverter

At t = 0.05 s, attackers launch an FDI cyberattack that decreases the real power set point of IBR 1 from 5 MW to 4.5 MW and IBR 12 from 5 MW to 4.5 MW. The reactive power set points are decreased for IBR 1 from 1.5 MVAR to 1.35 MVAR and IBR 12 from 1.5 MVAR to 1.35 MVAR. The voltage reference points are decreased for IBR 3 from 1 pu to 0.95 pu and IBR 8 from 1 pu to 0.95 pu over the course of 200 ms. Figure 13a,g and Figure 14a,d show the detection signals with and without mitigation applied and without an attack.

Figure 13b shows the real power set point of IBR 1, which drops from its pre-attack value of 5 MW to 4.5 MW when the attack goes undetected. Figure 13c shows the reactive power set point of IBR 1, which drops from 1.5 MVAR to 1.35 MVAR when the attack goes undetected. With the mitigation method applied, the real power set point is adjusted to 4.9 MW and the reactive power set point to 1.65 MVAR. Figure 13f shows the IBR 1 bus voltage, which drops from 0.97 pu to 0.92 pu when the attack goes undetected; with mitigation, it settles at 0.97 pu.

Figure 13h shows the real power set point of IBR 12, which drops from 5 MW to 4.5 MW when the attack goes undetected. Figure 13i shows the reactive power set point of IBR 12, which drops from 1.5 MVAR to 1.35 MVAR when the attack goes undetected. With the mitigation method applied, the real power set point is adjusted to 5 MW and the reactive power set point to 1.76 MVAR. Figure 13l shows the IBR 12 bus voltage, which drops from 0.97 pu to 0.9 pu when the attack goes undetected; with mitigation, it settles at 0.98 pu.

Figure 14b shows the voltage reference point of IBR 3, which drops from 1 pu to 0.95 pu when the attack goes undetected; with the mitigation method applied, it is adjusted to 0.99 pu. Figure 14c shows the IBR 3 bus voltage, which drops from 0.96 pu to 0.92 pu when the attack goes undetected; with mitigation, it settles at 0.97 pu.

Figure 14e shows the voltage reference point of IBR 8, which drops from 1 pu to 0.95 pu when the attack goes undetected; with the mitigation method applied, it is adjusted to 0.99 pu. Figure 14f shows the IBR 8 bus voltage, which drops from 0.96 pu to 0.92 pu when the attack goes undetected; with mitigation, it settles at 0.97 pu.

The detection method takes 50 ms to recognize the attack on IBR 1 and an additional 100 ms to mitigate it. For IBR 12, detection takes 75 ms and mitigation an additional 125 ms. For IBR 3, detection takes 50 ms and mitigation an additional 100 ms. For IBR 8, detection takes 50 ms and mitigation an additional 100 ms.

5.1.7. Overview of FDI Attack Cases

The results of the FDI attack cases are compiled in Table 1. The results show a low relative error with the proposed mitigation method applied. The minimum detection time is 50 ms, which corresponds to two consecutive time steps, and the shortest mitigation time listed in Table 1 is 25 ms. The maximum detection and mitigation times are 225 ms and 250 ms, respectively. The minimum mitigation accuracy in the study cases is 97.917%. In all cases, the proposed method recovers the voltage to between 0.95 pu and 1.05 pu.
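The mitigation accuracy values in Table 1 can be reproduced from the bus voltages quoted in the case descriptions. The sketch below assumes that accuracy is 100% minus the relative error between the mitigated bus voltage and the pre-attack bus voltage. This formula is an assumption that matches the Table 1 entries checked here; it is not stated explicitly in this section, and the function names are illustrative.

```python
def mitigation_accuracy(v_pre, v_mitigated):
    """Accuracy in percent, assuming it equals 100 * (1 - relative voltage error)."""
    return 100.0 * (1.0 - abs(v_mitigated - v_pre) / v_pre)


def within_band(v, low=0.95, high=1.05):
    """Check that a bus voltage (pu) lies strictly inside the 0.95 to 1.05 pu band."""
    return low < v < high


# (label, pre-attack bus voltage, mitigated bus voltage), all in pu, from the text.
cases = [
    ("GFL negative bias, IBR 1", 0.96, 0.98),
    ("GFL negative bias, IBR 12", 0.97, 0.99),
    ("GFM positive ramp, IBR 8", 1.01, 1.03),
    ("GFM negative bias, IBR 3", 0.98, 0.98),
]

for label, v_pre, v_mit in cases:
    acc = mitigation_accuracy(v_pre, v_mit)
    print(f"{label}: accuracy = {acc:.3f} %, in band = {within_band(v_mit)}")
```

Because the metric is normalized by the pre-attack voltage, the same 0.02 pu deviation gives slightly different accuracies on different buses.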

5.2. Under Grid Transient

This section studies the performance of the proposed method under grid transients.

5.2.1. Simultaneous Set Point and Load Changes

An additional load of 0.655 MW and 0.658 MVAR is connected to bus 1 at t = 0.05 s. After 100 ms, the real and reactive power set points of IBR 1 are increased to supply the added load. Figure 15a shows the detection signal with and without the mitigation method. Figure 15b shows the bus 1 voltage, which drops to 0.96 pu after the load change and recovers to 1 pu following the IBR set point adjustments. Figure 15c,d shows the real and reactive power set points with and without the mitigation method.

5.2.2. Three-Phase Short-Circuit Fault

A three-phase short circuit occurs on bus 1 at t = 0.05 s. Figure 16a shows the detection signal with and without the mitigation method. Figure 16b shows the voltage of bus 1 with and without the mitigation method. The detection method ensures that the mitigation method does not take any action during the fault.

6. Future Work

The proposed detection and mitigation method is tested using the modified CIGRE 14-bus North American grid with seven IBRs in PSCAD. Real-time simulation offers higher fidelity than software-based simulation and confirms that the proposed method can run in real time. Future work therefore includes testing and validating the proposed method on a real-time testbed. However, simulating a grid with seven inverters requires a powerful multi-core real-time simulator, which may not always be available. Testing of the mitigation method found that it takes an average of 12.263 ms to find an action for P_{set}, Q_{set}, and V_{ref} on a PC with an Intel Core i7 CPU and 16 GB of RAM. Achieving 25 ms on an off-the-shelf microcontroller can be challenging due to computational complexity; one remedy is to decrease the computational complexity of the RL model. Moreover, running the detection and mitigation algorithm for all inverters on one microcontroller can be computationally expensive; one solution is to use a separate microcontroller for each inverter. Using a larger grid also requires a more powerful real-time simulator capable of modeling a grid with more nodes and inverters.
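The timing figures above suggest a simple budget check. The sketch below assumes that 25 ms is the per-action deadline and that 12.263 ms is the reference latency measured on the PC; both readings, the helper names, and the placeholder policy are assumptions for illustration only. It estimates how much slower a target platform can be before the deadline is missed, and shows one way to measure the average latency of any action-selection function.

```python
import time


def max_slowdown(reference_latency_ms=12.263, deadline_ms=25.0):
    """Largest slowdown factor relative to the reference PC that still meets the deadline."""
    return deadline_ms / reference_latency_ms


def average_latency_ms(policy, observations):
    """Average wall-clock time in milliseconds that policy needs per observation."""
    start = time.perf_counter()
    for obs in observations:
        policy(obs)
    elapsed = time.perf_counter() - start
    return 1000.0 * elapsed / len(observations)


def placeholder_policy(obs):
    """Hypothetical stand-in for the trained RL agent: returns the index of the largest input."""
    return max(range(len(obs)), key=lambda i: obs[i])


observations = [[0.98, 1.0, 1.02]] * 1000
print(f"Allowed slowdown factor: {max_slowdown():.2f}")
print(f"Placeholder latency: {average_latency_ms(placeholder_policy, observations):.4f} ms")
```

Under these assumptions, a platform slightly more than twice as slow as the reference PC would already miss the 25 ms budget, which is consistent with the suggestion to reduce the computational complexity of the RL model.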

7. Conclusions

IBRs receive their set points through a communication link, which may expose them to cyber threats. Previous work has developed various techniques to detect and mitigate cyberattacks on IBRs, developing schemes for new inverters being installed in the grid. This work focuses on developing model-free control techniques for IBRs already installed in the grid without the need to access IBR internal control parameters. The proposed method is tested for both grid-forming and grid-following inverter control. Separate detection and mitigation algorithms are used to enhance the accuracy of the proposed method. The proposed method is tested using the modified CIGRE 14-bus North American grid with seven IBRs in PSCAD/EMTDC. Finally, the performance of the detection algorithm is tested under normal grid transients, such as set point change, load change, and short-circuit fault, to make sure the proposed detection method does not produce false positives. The proposed method works even when multiple inverters are under cyberattack.

Author Contributions

Writing—original draft: M.B. and B.M.K.; writing—review and editing: A.M.-S. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported in part by the National Science Foundation (NSF) under award ECCS-1953213, in part by the State of Virginia’s Commonwealth Cyber Initiative (www.cyberinitiative.org, accessed on 9 January 2025), in part by the U.S. Department of Energy’s Office of Energy Efficiency and Renewable Energy (EERE) under the Solar Energy Technologies Office Award Number 38637 (UNIFI Consortium led by NREL), and in part by Manitoba Hydro International. The views expressed herein do not necessarily represent the views of the U.S. Department of Energy or the United States Government.

Data Availability Statement

Restrictions apply to the datasets. The datasets presented in this article are not readily available because the data are part of ongoing research. Requests to access the datasets should be directed to mehrizi@vt.edu.

Conflicts of Interest

The authors declare no conflicts of interest.

Figure 1. Grid-following control loop for (a) conventional control and (b) the proposed cyber resilient method.

Figure 2. Grid-forming control loop for: (a) droop controls, (b) conventional voltage and current control, and (c) the proposed cyber resilient method.

Figure 3. Example training simulation for data collection. (a) IBR 1 real power set point and (b) voltage of bus 1.

Figure 4. Example training simulation for data collection. (a) IBR 1 reactive power set point and (b) Voltage of bus 1.

Figure 5. Example training simulation for data collection. (a) IBR 3 voltage reference point and (b) voltage of bus 3.

Figure 6. Single-line diagram of the 100% inverter-based microgrid.

Figure 7. Case 1: Simulation results for the GFL positive ramp attack. (a) Detection signal of IBR 1, (b) IBR 1 real power set point, (c) IBR 1 reactive power set point, (d) IBR 1 injected real power, (e) IBR 1 injected reactive power, (f) voltage of bus 1, (g) detection signal of IBR 4, (h) IBR 4 real power set point, (i) IBR 4 reactive power set point, (j) IBR 4 injected real power, (k) IBR 4 injected reactive power, (l) voltage of bus 4, (m) detection signal of IBR 12, (n) IBR 12 real power set point, (o) IBR 12 reactive power set point, (p) IBR 12 injected real power, (q) IBR 12 injected reactive power, (r) voltage of bus 12, (s) detection signal of IBR 14, (t) IBR 14 real power set point, (u) IBR 14 reactive power set point, (v) IBR 14 injected real power, (w) IBR 14 injected reactive power, and (x) voltage of bus 14.

Figure 8. Case 2: Simulation results for the GFL negative bias attack. (a) Detection signal of IBR 1, (b) IBR 1 real power set point, (c) IBR 1 reactive power set point, (d) IBR 1 injected real power, (e) IBR 1 injected reactive power, (f) voltage of bus 1, (g) detection signal of IBR 4, (h) IBR 4 real power set point, (i) IBR 4 reactive power set point, (j) IBR 4 injected real power, (k) IBR 4 injected reactive power, (l) voltage of bus 4, (m) detection signal of IBR 12, (n) IBR 12 real power set point, (o) IBR 12 reactive power set point, (p) IBR 12 injected real power, (q) IBR 12 injected reactive power, (r) voltage of bus 12, (s) detection signal of IBR 14, (t) IBR 14 real power set point, (u) IBR 14 reactive power set point, (v) IBR 14 injected real power, (w) IBR 14 injected reactive power, and (x) voltage of bus 14.

Figure 9. Case 3: Simulation results for the GFM positive ramp attack: (a) detection signal of IBR 3, (b) IBR 3 voltage reference point, (c) voltage of bus 3, (d) detection signal of IBR 8, (e) IBR 8 voltage reference point, (f) voltage of bus 8, (g) detection signal of IBR 10, (h) IBR 10 voltage reference point, and (i) voltage of bus 10.

Figure 10. Case 4: Simulation results for the GFM negative bias attack. (a) Detection signal of IBR 3, (b) IBR 3 voltage reference point, (c) voltage of bus 3, (d) detection signal of IBR 8, (e) IBR 8 voltage reference point, (f) voltage of bus 8, (g) detection signal of IBR 10, (h) IBR 10 voltage reference point, and (i) voltage of bus 10.

Figure 11. Case 5: Simulation results for the positive ramp attack on GFL inverters and positive bias attack on GFM inverters. (a) Detection signal of IBR 1, (b) IBR 1 real power set point, (c) IBR 1 reactive power set point, (d) IBR 1 injected real power, (e) IBR 1 injected reactive power, (f) voltage of bus 1, (g) detection signal of IBR 14, (h) IBR 14 real power set point, (i) IBR 14 reactive power set point, (j) IBR 14 injected real power, (k) IBR 14 injected reactive power, and (l) voltage of bus 14.

Figure 12. Case 5: Simulation results for the positive ramp attack on GFL inverters and positive bias attack on GFM inverters. (a) Detection signal of IBR 3, (b) IBR 3 voltage reference point, (c) voltage of bus 3, (d) detection signal of IBR 8, (e) IBR 8 voltage reference point, and (f) voltage of bus 8.

Figure 13. Case 6: Simulation results for the negative bias attack on GFL inverters and negative ramp attack on GFM inverters. (a) Detection signal of IBR 1, (b) IBR 1 real power set point, (c) IBR 1 reactive power set point, (d) IBR 1 injected real power, (e) IBR 1 injected reactive power, (f) voltage of bus 1, (g) detection signal of IBR 12, (h) IBR 12 real power set point, (i) IBR 12 reactive power set point, (j) IBR 12 injected real power, (k) IBR 12 injected reactive power, and (l) voltage of bus 12.

Figure 14. Case 6: Simulation results for the negative bias attack on GFL inverters and negative ramp attack on GFM inverters. (a) Detection signal of IBR 3, (b) IBR 3 voltage reference point, (c) voltage of bus 3, (d) detection signal of IBR 8, (e) IBR 8 voltage reference point, and (f) voltage of bus 8.

Figure 15. Case 1: Simulation results for the load and Q set point test. (a) Detection signal of IBR 1, (b) IBR 1 real power set point, (c) IBR 1 reactive power set point, (d) IBR 1 injected real power, (e) IBR 1 injected reactive power, and (f) voltage of bus 1.

Figure 16. Case 2: Simulation results for the short circuit test. (a) Detection signal of IBR 1, and (b) voltage of bus 1.

Table 1. Results of all FDI attack cases.

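Attack Case	Inverter	Detection Time (ms)	Mitigation Time (ms)	Mitigation Accuracy (%)
GFL positive ramp attack	IBR 12	100	125	99.029
GFL positive ramp attack	IBR 14	100	100	100
GFL negative bias attack	IBR 1	50	100	97.917
GFL negative bias attack	IBR 4	50	100	97.917
GFL negative bias attack	IBR 12	50	125	97.938
GFL negative bias attack	IBR 14	75	50	97.959
GFM positive ramp attack	IBR 3	225	75	99.01
GFM positive ramp attack	IBR 8	225	75	98.02
GFM positive ramp attack	IBR 10	225	50	99.01
GFM negative bias attack	IBR 3	225	100	100
GFM negative bias attack	IBR 8	225	100	100
GFM negative bias attack	IBR 10	225	125	100
GFL positive ramp attack and GFM positive bias attack	IBR 1	100	75	99.02
GFL positive ramp attack and GFM positive bias attack	IBR 3	125	25	99.01
GFL positive ramp attack and GFM positive bias attack	IBR 8	100	50	99.01
GFL positive ramp attack and GFM positive bias attack	IBR 14	75	250	100
GFL negative bias attack and GFM negative ramp attack	IBR 1	50	100	100
GFL negative bias attack and GFM negative ramp attack	IBR 3	50	100	98.958
GFL negative bias attack and GFM negative ramp attack	IBR 8	50	100	98.958
GFL negative bias attack and GFM negative ramp attack	IBR 12	75	125	98.969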

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Beikbabaei, M.; Kwiatkowski, B.M.; Mehrizi-Sani, A. Model-Free Resilient Grid-Forming and Grid-Following Inverter Control Against Cyberattacks Using Reinforcement Learning. Electronics 2025, 14, 288. https://doi.org/10.3390/electronics14020288

