Perspectives on Soft Actor–Critic (SAC)-Aided Operational Control Strategies for Modern Power Systems with Growing Stochastics and Dynamics
Abstract
1. Introduction
2. Principles of MDP and Reinforcement Learning
2.1. Markov Decision Process
2.2. Principles of RL
Algorithm 1. Algorithm for training the soft actor–critic (SAC) agent for power flow control.
1. Initialize the weights of neural networks, θ and ϕ, for the policy π_θ and the value function V_ϕ, respectively; initialize the weights of the two Q functions, Q_1 and Q_2; initialize the replay buffer D; set up the training environment, env
2. for k = 1, 2, … (k is the counter of episodes for training)
3.   for t = 1, 2, … (t stands for control iteration)
4.     reset the environment: s ← env.reset()
5.     obtain states s and sample action a from the policy π_θ(·|s)
6.     apply action a and obtain the next states s′, reward value r, and termination signal done
7.     store tuple ⟨s, a, r, s′, done⟩ in D
8.     set s ← s′
9.     if satisfying policy updating conditions, conduct
10.      for a required number of policy updates, conduct
11.        randomly sample a minibatch of tuples from D
12.        update the two Q functions, Q_1 and Q_2
13.        update the value function V_ϕ
14.        update the policy network π_θ
15.        update the target network
16.        update the temperature coefficient, α
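The listing below is a minimal Python sketch of the interaction and replay-buffer loop in Algorithm 1. The Gym-style environment `env` and the `agent` object, with its `select_action` and `update` methods wrapping the SAC gradient steps 12–16, are illustrative assumptions rather than the authors' implementation.

```python
# Minimal sketch of the interaction / replay-buffer loop of Algorithm 1.
# `env` (Gym-style reset/step API) and `agent` (with hypothetical
# select_action() and update() methods) are assumptions for illustration.
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size FIFO buffer D storing (s, a, r, s', done) tuples."""
    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def store(self, s, a, r, s_next, done):
        self.buffer.append((s, a, r, s_next, done))

    def sample(self, batch_size=256):
        # list() copy keeps the sketch simple; fine for illustration
        return random.sample(list(self.buffer), min(batch_size, len(self.buffer)))

def train(env, agent, episodes=1000, max_steps=50, warmup=1000, updates_per_step=1):
    buffer = ReplayBuffer()
    for k in range(episodes):                     # step 2: episode loop
        s = env.reset()                           # step 4: reset environment
        for t in range(max_steps):                # step 3: control iterations
            a = agent.select_action(s)            # step 5: sample a ~ pi_theta(.|s)
            s_next, r, done, _ = env.step(a)      # step 6: apply control action
            buffer.store(s, a, r, s_next, done)   # step 7: store tuple in D
            s = s_next                            # step 8: advance state
            if len(buffer.buffer) > warmup:       # step 9: update condition
                for _ in range(updates_per_step): # step 10: required policy updates
                    batch = buffer.sample()       # step 11: sample minibatch
                    agent.update(batch)           # steps 12-16: SAC gradient steps
            if done:                              # episode termination (Section 3.2)
                break
```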
3. Proposed RL-Based Real-Time Control Framework
3.1. Control Objectives and Constraints
3.2. Overall Flowchart of Training RL Agents for Power System Operation
A control episode terminates when any of the following conditions is met (a minimal sketch of this termination check is given after the list):
- (1) The RL agent reaches the maximum control iteration;
- (2) Power flow diverges;
- (3) The RL agent's action successfully meets the desired control performance goal.
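The following Python sketch expresses the three termination conditions; the `PowerFlowResult`-style fields (`converged`, `num_voltage_violations`, `num_line_violations`) are hypothetical names for a generic power flow solution object, not a specific tool's API.

```python
# Hedged sketch of the three episode-termination conditions listed above.
def episode_done(step, result, max_steps=50):
    """Return (done, reason) for the current control iteration."""
    if step >= max_steps:                       # (1) maximum control iteration reached
        return True, "max_iterations"
    if not result.converged:                    # (2) power flow diverges
        return True, "divergence"
    if result.num_voltage_violations == 0 and result.num_line_violations == 0:
        return True, "success"                  # (3) control performance goal met
    return False, "continue"
```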
3.3. Design of Episode, Reward, State Space, and Action Space
3.3.1. Episode
3.3.2. Reward
3.3.3. State Space
3.3.4. Action Space
3.4. Implementation of the Proposed RL-Based Control Framework
4. Case Studies
4.1. Corrective Voltage Control
4.1.1. Case Study 1
4.1.2. Case Study 2
4.2. Corrective Line Flow Control
4.3. Discussion
- (1) Given the importance of valid samples when training good RL agents, it is essential to periodically include new samples in SAC agent training, drawn from either a real-time EMS system or planning cases and covering the various types of disturbances that can capture major changes in the power system. From the authors' experience, updating the model training daily is good practice (a hedged sketch of this daily refresh follows this list).
- (2) Once the SAC agent is trained to satisfactory performance, applying it in real time is rapid, typically within dozens of milliseconds. However, the quality of the input samples used to construct the SAC agent's state space must be ensured before control strategies are obtained.
- (3) The proposed method mainly tackles the regulation of system voltage violations, line overloading, and system losses. When extending this approach to more complicated daily-operation control tasks with different, and sometimes conflicting, objectives, using multiple RL agents is a promising research direction, considering the trade-off among control objectives.
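The sketch below illustrates the daily sample-refresh practice in point (1): new operating snapshots are converted into training episodes and the SAC agent continues training on the enlarged data set. The helper `build_env_from_case`, the `agent`, and the `buffer` follow the hypothetical interfaces used in the earlier training-loop sketch and are assumptions, not the authors' code.

```python
# Illustrative sketch of daily retraining with fresh EMS / planning cases.
def daily_refresh(agent, buffer, case_sources, finetune_updates=5000):
    for case in case_sources:                 # e.g., EMS snapshots, planning cases
        env = build_env_from_case(case)       # environment seeded with the new case
        s = env.reset()
        done = False
        while not done:                       # roll out the current policy on the new case
            a = agent.select_action(s)
            s_next, r, done, _ = env.step(a)
            buffer.store(s, a, r, s_next, done)
            s = s_next
    for _ in range(finetune_updates):         # incremental SAC updates on the enlarged buffer
        agent.update(buffer.sample())
```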
5. Conclusions and Future Work
- (1) It is important to have high-fidelity power system simulators that accurately capture system behavior before and after each control action, providing a reliable environment for training the agents.
- (2) It is also important to include large-scale, representative operating conditions of the power system in the form of samples that cover the feature space more evenly, so that RL agents can learn directly from interacting with these samples, especially those carrying operational risks.
- (3) The design of reward functions plays an important role in the effectiveness and efficiency of RL agent training for specific control tasks. Hyperparameter tuning is always good practice to ensure better control performance of the agents.
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Standard TPL-001-4; Transmission System Planning Performance Requirements. NERC: Atlanta, GA, USA, 2014.
- Kamwa, I.; Grondin, R.; Hebert, Y. Wide-area measurement based stabilizing control of large power systems-a decentralized/hierarchical approach. IEEE Trans. Power Syst. 2001, 16, 136–153.
- Paul, J.P.; Leost, J.Y.; Tesseron, J.M. Survey of the secondary voltage control in France: Present realization and investigations. IEEE Trans. Power Syst. 1987, 2, 505–511.
- Corsi, S.; Pozzi, M.; Sabelli, C.; Serrani, A. The coordinated automatic voltage control of the Italian transmission grid—Part I: Reasons of the choice and overview of the consolidated hierarchical system. IEEE Trans. Power Syst. 2004, 19, 1723–1732.
- Corsi, S.; Pozzi, M.; Sforna, M.; Dell’Olio, G. The coordinated automatic voltage control of the Italian transmission grid—Part II: Control apparatuses and field performance of the consolidated hierarchical system. IEEE Trans. Power Syst. 2004, 19, 1733–1741.
- Sun, H.; Guo, Q.; Zhang, B.; Wu, W.; Wang, B. An adaptive zone-division-based automatic voltage control system with applications in China. IEEE Trans. Power Syst. 2013, 28, 1816–1828.
- Sun, H.; Zhang, B. A systematic analytical method for quasi-steady-state sensitivity. Electr. Power Syst. Res. 2002, 63, 141–147.
- Sun, H.; Guo, Q.; Zhang, B.; Wu, W.; Tong, J. Development and applications of the system-wide automatic voltage control system in China. In Proceedings of the IEEE PES General Meeting, Calgary, AB, Canada, 26–30 July 2009.
- Guo, R.; Chiang, H.; Wu, H.; Li, K.; Deng, Y. A two-level system-wide automatic voltage control system. In Proceedings of the IEEE PES General Meeting, San Diego, CA, USA, 22–26 July 2012.
- Shi, B.; Wu, C.; Sun, W.; Bao, W.; Guo, R. A practical two-level automatic voltage control system: Design and field experience. In Proceedings of the International Conference on Power System Technologies, Guangzhou, China, 6–8 November 2018.
- Duan, J.; Xu, H.; Liu, W. Q-learning-based damping control of wide-area power systems under cyber uncertainties. IEEE Trans. Smart Grid 2018, 9, 6408–6418.
- Liu, X.; Konstantinou, C. Reinforcement learning for cyber-physical security assessment of power systems. In Proceedings of the 2019 IEEE Milan PowerTech Conference, Milan, Italy, 23–27 June 2019.
- Yan, Z.; Xu, Y. Data-driven load frequency control for stochastic power systems: A deep reinforcement learning method with continuous action search. IEEE Trans. Power Syst. 2019, 34, 1653–1656.
- Feng, C.; Zhang, J. Reinforcement learning based dynamic model selection for short-term load forecasting. In Proceedings of the 2019 IEEE PES ISGT Conference, Washington, DC, USA, 18–21 February 2019.
- Dai, P.; Yu, W.; Wen, G.; Baldi, S. Distributed reinforcement learning algorithm for dynamic economic dispatch with unknown generation cost functions. IEEE Trans. Ind. Inform. 2020, 16, 2256–2267.
- Huang, Q.; Huang, R.; Hao, W.; Tan, J.; Fan, R.; Huang, Z. Adaptive power system emergency control using deep reinforcement learning. IEEE Trans. Smart Grid 2020, 11, 1171–1182.
- Lan, T.; Duan, J.; Zhang, B.; Shi, D.; Wang, Z.; Diao, R.; Zhang, X. AI-based autonomous line flow control via topology adjustment for maximizing time-series ATCs. In Proceedings of the IEEE PES General Meeting, Montreal, QC, Canada, 2–6 August 2020.
- Diao, R.; Wang, Z.; Shi, D.; Chang, Q.; Duan, J.; Zhang, X. Autonomous voltage control for grid operation using deep reinforcement learning. In Proceedings of the IEEE PES General Meeting, Atlanta, GA, USA, 4–8 August 2019.
- Duan, J.; Shi, D.; Diao, R.; Li, H.; Wang, Z.; Zhang, B.; Bian, D.; Yi, Z. Deep-reinforcement-learning-based autonomous voltage control for power grid operations. IEEE Trans. Power Syst. 2020, 35, 814–817.
- Zimmerman, R.D.; Murillo-Sanchez, C.E.; Thomas, R.J. MATPOWER: Steady-state operations, planning, and analysis tools for power systems research and education. IEEE Trans. Power Syst. 2011, 26, 12–19.
- Xu, T.; Birchfield, A.B.; Overbye, T.J. Modeling, tuning and validating system dynamics in synthetic electric grids. IEEE Trans. Power Syst. 2018, 33, 6501–6509.
- Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Graves, A.; Riedmiller, M.; Fidjeland, A.K.; Ostrovski, G.; et al. Human-level control through deep reinforcement learning. Nature 2015, 518, 529–533.
- Haarnoja, T.; Zhou, A.; Abbeel, P.; Levine, S. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In Proceedings of the International Conference on Machine Learning, Stockholm, Sweden, 10–15 July 2018.
- Lee, W.; Kim, H. Deep Reinforcement Learning-Based Dynamic Droop Control Strategy for Real-Time Optimal Operation and Frequency Regulation. IEEE Trans. Sustain. Energy 2025, 16, 284–294.
- Wu, Z.; Zhang, M.; Gao, S.; Wu, Z.G.; Guan, X. Physics-Informed Reinforcement Learning for Real-Time Optimal Power Flow with Renewable Energy Resources. IEEE Trans. Sustain. Energy 2025, 16, 216–226.
- Hu, B.; Gong, Y.; Liang, X. Safe Deep Reinforcement Learning-Based Real-Time Multi-Energy Management in Combined Heat and Power Microgrids. IEEE Access 2024, 12, 193581–193593.
- Belyakov, B.; Sizykh, D. Adaptive Algorithm for Selecting the Optimal Trading Strategy Based on Reinforcement Learning for Managing a Hedge Fund. IEEE Access 2024, 12, 189047–189063.
- Hou, S.; Fu, A.; Duque, E.; Palensky, P.; Chen, Q.; Vergara, P.P. DistFlow Safe Reinforcement Learning Algorithm for Voltage Magnitude Regulation in Distribution Networks. J. Mod. Power Syst. Clean Energy 2024, 1–12.
- Vora, K.; Liu, S.; Dhulipati, H. Deep Reinforcement Learning Based MPPT Control for Grid Connected PV System. In Proceedings of the 2024 IEEE 7th International Conference on Industrial Cyber-Physical Systems (ICPS), St. Louis, MO, USA, 12–15 May 2024.
- Liao, J.; Lin, J. A Distributed Deep Reinforcement Learning Approach for Reactive Power Optimization of Distribution Networks. IEEE Access 2024, 12, 113898–113909.
- Gan, J.; Li, S.; Lin, X.; Tang, X. Multi-Agent Deep Reinforcement Learning-Based Multi-Objective Cooperative Control Strategy for Hybrid Electric Vehicles. IEEE Trans. Veh. Technol. 2024, 73, 11123–11135.
| | Voltage Control | Line Flow Control |
|---|---|---|
| Control Objectives | | |
| Corrective Actions | minimum reactive power control actions: $\min \sum_i \lvert \Delta Q_i \rvert$ | minimum active power control actions: $\min \sum_i \lvert \Delta P_i \rvert$ |
| Loss Minimization | transmission loss as an objective function: $\min P_{\mathrm{loss}}$ | same |
| Constraints Modeled | | |
| AC Power Flow Constraints | full AC power flow equations, where $P_{ij}$ and $Q_{ij}$ are the active power and reactive power on branches, respectively | same |
| Generation Limits | $P_{Gi}^{\min} \le P_{Gi} \le P_{Gi}^{\max}$, $Q_{Gi}^{\min} \le Q_{Gi} \le Q_{Gi}^{\max}$ | same |
| Voltage Limits | $V_i^{\min} \le V_i \le V_i^{\max}$ | same |
| Transmission Line Limits | $\lvert S_{ij} \rvert \le S_{ij}^{\max}$ | same |
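The short Python sketch below shows how the operational limits in the table can be checked against a solved AC power flow; the dictionary field names (`bus_voltage`, `line_mva`, `line_limit_mva`, `gen_p`, and the limit arrays) are assumptions about a generic solver output, not a specific tool's API.

```python
# Hedged sketch: count voltage, line, and generation limit violations
# from a generic solved-power-flow result dictionary `sol`.
import numpy as np

def count_violations(sol, v_min=0.95, v_max=1.05):
    v = np.asarray(sol["bus_voltage"])              # per-unit bus voltage magnitudes
    volt_viol = np.sum((v < v_min) | (v > v_max))   # voltage limit violations
    s = np.asarray(sol["line_mva"])                 # apparent power flow on lines
    s_max = np.asarray(sol["line_limit_mva"])       # line thermal ratings
    line_viol = np.sum(s > s_max)                   # transmission line limit violations
    p_g = np.asarray(sol["gen_p"])                  # generator active power outputs
    p_lo = np.asarray(sol["gen_p_min"])
    p_hi = np.asarray(sol["gen_p_max"])
    gen_viol = np.sum((p_g < p_lo) | (p_g > p_hi))  # generation limit violations
    return {"voltage": int(volt_viol), "line": int(line_viol), "generation": int(gen_viol)}
```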
| Objective | State Space | Action Space | Reward |
|---|---|---|---|
| voltage security | bus voltage magnitudes, phase angles, active power on lines, reactive power on lines, controller status, controller settings | generator terminal voltage settings, shunt elements, transformer tap changing, flexible alternating current transmission system (FACTS) devices | penalize voltage violations and/or the total amount of control actions |
| voltage security + loss reduction | bus voltage magnitudes, phase angles, active power on lines, reactive power on lines, controller status, controller settings | generator terminal voltage settings, shunt elements, transformer tap changing, FACTS devices | penalize voltage violations, transmission losses, and/or the total amount of control actions; the loss term is assigned piecewise on delta_p_loss (the cases delta_p_loss < 0, delta_p_loss ≥ 0.02, and otherwise are treated separately), where p_loss is the present transmission loss value and p_loss_pre is the line loss at the base case |
| line flow | bus voltage magnitudes, phase angles, active power on lines, reactive power on lines, controller status, controller settings | generator active power, controllable load | penalize line flow violations and/or the total amount of control, with a weighting parameter applied to the amount of control action of each generator |
| line flow + loss reduction | bus voltage magnitudes, phase angles, active power on lines, reactive power on lines, controller status, controller settings | generator active power, controllable load | penalize line flow violations, transmission losses, and/or the total amount of control, with a weighting parameter applied to the amount of control action of each generator |
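As an illustration of the reward design in the second row of the table (voltage security plus loss reduction), the sketch below combines a penalty for voltage violations and control effort with a piecewise term on the loss change. The weights `w_v`, `w_u`, `w_loss` and the direction of the bonus/penalty in each branch are illustrative assumptions; only the delta_p_loss thresholds come from the table.

```python
# Hedged sketch of a "voltage security + loss reduction" reward.
def reward_voltage_loss(num_voltage_violations, control_effort,
                        p_loss, p_loss_pre,
                        w_v=1.0, w_u=0.1, w_loss=50.0):
    # Relative change of transmission loss versus the base case.
    delta_p_loss = (p_loss - p_loss_pre) / max(p_loss_pre, 1e-6)
    if delta_p_loss < 0:            # losses reduced relative to base case: bonus (assumed)
        loss_term = w_loss * (-delta_p_loss)
    elif delta_p_loss >= 0.02:      # losses increased by 2% or more: penalty (assumed)
        loss_term = -w_loss * delta_p_loss
    else:                           # small increase: neutral (assumed)
        loss_term = 0.0
    return -w_v * num_voltage_violations - w_u * control_effort + loss_term
```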