An Optimal Scheduling Method for Power Grids in Extreme Scenarios Based on an Information-Fusion MADDPG Algorithm
Abstract
1. Introduction
- An information-fusion MADDPG algorithm is proposed, in which both the policy network and the value network adopt lightweight MLP modules. Structured experiential knowledge is incorporated into the input of the value network, and decaying Gaussian noise is added to the state and action spaces. Together these measures make the value network input more robust and improve the stability of model training.
- A hybrid prioritized experience sampling strategy is designed. It uses hierarchical sampling to draw experience samples from different time steps and combines them with the most recent experience samples to construct fused experience data, so that the model can promptly learn from and adapt to recent environmental changes or policy updates. In addition, reference experiences obtained from coach-guided training are inserted into the replay buffer after multiple training iterations, further improving the learning efficiency of the model.
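
The sampling idea in the second contribution can be illustrated with a short sketch. It is not the authors' implementation: the function name `hybrid_sample`, the parameters `recent_fraction` and `num_strata`, and the choice of equal-width strata over insertion order are all assumptions made for illustration. The sketch reserves part of each batch for the newest transitions and fills the rest by sampling uniformly within time-ordered strata of the older buffer.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def hybrid_sample(
    buffer: Sequence[T],
    batch_size: int,
    recent_fraction: float = 0.25,  # hypothetical share of the batch taken from the newest data
    num_strata: int = 4,            # hypothetical number of time-ordered strata
    rng: Optional[random.Random] = None,
) -> List[T]:
    """Return stratified samples from older experience followed by the newest transitions."""
    rng = rng or random.Random()
    n = len(buffer)
    if n == 0:
        raise ValueError("replay buffer is empty")
    batch_size = min(batch_size, n)

    # Part 1: the most recent transitions, taken deterministically from the tail.
    n_recent = min(batch_size, max(1, round(batch_size * recent_fraction)))
    recent = list(buffer[n - n_recent:])

    # Part 2: split the older region into contiguous strata (oldest to newest)
    # and draw a near-equal quota from each one without replacement.
    older_len = n - n_recent
    n_strat = batch_size - n_recent
    stratified: List[T] = []
    if older_len > 0 and n_strat > 0:
        k = min(num_strata, older_len)
        bounds = [older_len * i // k for i in range(k + 1)]
        for i in range(k):
            lo, hi = bounds[i], bounds[i + 1]
            quota = n_strat // k + (1 if i < n_strat % k else 0)
            quota = min(quota, hi - lo)  # a small stratum may yield fewer samples
            stratified.extend(buffer[j] for j in rng.sample(range(lo, hi), quota))

    return stratified + recent
```

Reference experiences from coach-guided training could be appended to the same buffer before calling this function; how often and in what quantity is not specified here and would need to follow the paper's settings.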
2. Framework for Voltage-Security-Oriented Optimal Dispatch of the Power Grid
3. Method for Constructing an Extreme Scenario Dataset with Safety Boundaries Considered
4. Power Grid Optimal Dispatch Model Based on the Improved MADDPG Algorithm
4.1. Mathematical Model of Resource-Interactive Dispatch for Voltage-Security-Oriented Optimal Scheduling
4.1.1. Objective Function
- (1) Network loss cost:
- (2) Voltage violation cost:
4.1.2. Constraints
- (1) Power balance constraints
- (2) DPV power constraints
- (3) DWP power constraints
- (4) MT power constraints
- (5) SVC power constraints
4.2. Multi-Agent Interactive Dispatch Model for Distribution Network Resources
- (1) State space
- (2) Action space
- (3) Reward function
4.3. Hybrid Prioritized Experience Sampling Strategy
4.4. Information-Fusion MADDPG Algorithm
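
The Introduction states that decaying Gaussian noise is added to the state and action spaces to make the value network input more robust. A minimal sketch of one way to do this is given below. The exponential decay schedule, the values `sigma_start`, `sigma_end`, and `decay`, the action range of [-1, 1], and the function names are assumptions for illustration and are not taken from the paper.

```python
import numpy as np


def noise_std(step: int, sigma_start: float = 0.2, sigma_end: float = 0.01,
              decay: float = 0.99) -> float:
    """Exponentially decaying noise scale with a floor (all values hypothetical)."""
    return max(sigma_end, sigma_start * decay ** step)


def perturb_critic_input(states: np.ndarray, actions: np.ndarray, step: int,
                         rng: np.random.Generator) -> np.ndarray:
    """Add zero-mean Gaussian noise to states and actions, then concatenate them.

    states:  array of shape (batch, state_dim)
    actions: array of shape (batch, action_dim), assumed to be normalized to [-1, 1]
    """
    sigma = noise_std(step)
    noisy_states = states + rng.normal(0.0, sigma, size=states.shape)
    noisy_actions = actions + rng.normal(0.0, sigma, size=actions.shape)
    # Keep perturbed actions inside the assumed normalized action range.
    noisy_actions = np.clip(noisy_actions, -1.0, 1.0)
    return np.concatenate([noisy_states, noisy_actions], axis=-1)
```

Because the noise scale shrinks with the training step, early updates see a smoothed critic input while later updates operate on nearly unperturbed data.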
5. Case Study
5.1. Data Basis and Algorithm Parameters
5.2. Model Training Comparison
5.3. Analysis of Model Testing Results
6. Conclusions
- Guided by a scenario dissimilarity maximization principle, extreme scenarios with the greatest dissimilarity within each cluster are selected and combined with the cluster center data to construct an extreme scenario dataset suitable for optimal dispatch. The datasets thus obtained maintain similar overall trends while exhibiting pronounced differences, thereby supplying maximized incremental information for subsequent optimization tasks.
- The proposed improved MADDPG algorithm surpasses conventional MADDPG, GRPO, and other counterparts in training stability and efficiency. Compared with the benchmark algorithms, it converges at least 50 steps earlier and achieves an overall performance gain of approximately 5%, which reduces training resource consumption and makes it better suited to large-scale optimal dispatch problems.
- In terms of voltage compliance rate, average voltage deviation, and average network loss, the improved MADDPG outperforms traditional MADDPG, GRPO, and the other RL algorithms tested. It also surpasses the traditional optimization method on average network loss (0.6813 versus 0.6945) and approaches it on average voltage deviation (0.0268 versus 0.0266 p.u.) and voltage compliance rate (99.98% versus 100%). Moreover, the combined time for model training and decision making is far lower than that of the optimization approach, and the voltage distribution obtained by the improved MADDPG is more compact. These results indicate that the improved MADDPG has significant advantages for voltage-security-oriented optimal dispatch, combining the adaptability of reinforcement learning with the accuracy of optimization methods.
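
The two voltage metrics named in the last conclusion can be computed as in the sketch below. The voltage limits of 0.95 to 1.05 p.u. and the 1.0 p.u. reference are common conventions assumed here for illustration; the limits actually used in the paper are not stated in this text, and the function name `voltage_metrics` is hypothetical.

```python
import numpy as np


def voltage_metrics(v_pu, v_ref: float = 1.0, v_min: float = 0.95, v_max: float = 1.05):
    """Compute compliance rate (%) and mean absolute deviation (p.u.).

    v_pu: bus voltage magnitudes in p.u., any shape (e.g. time steps x buses).
    The limits and the reference value are assumed, not taken from the paper.
    """
    v = np.asarray(v_pu, dtype=float)
    within_limits = (v >= v_min) & (v <= v_max)
    return {
        "compliance_rate_pct": 100.0 * within_limits.mean(),
        "avg_deviation_pu": float(np.abs(v - v_ref).mean()),
    }


# Example: two time steps, three buses; two of the six samples violate the limits.
metrics = voltage_metrics([[1.00, 1.02, 0.94], [1.06, 0.98, 1.00]])
```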
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Appendix A
Controllable Resource Type | Location | Capacity Configuration |
---|---|---|
MT | Bus 13 | 200 MW |
MT | Bus 21 | 200 MW |
DPV | Bus 15 | 100 MW |
DPV | Bus 19 | 100 MW |
DPV | Bus 29 | 100 MW |
DWP | Bus 5 | 100 MW |
DWP | Bus 17 | 100 MW |
DWP | Bus 26 | 100 MW |
SVC | Bus 8 | 100 Mvar |
SVC | Bus 18 | 100 Mvar |
SVC | Bus 19 | 100 Mvar |

Parameter Type | Value |
---|---|
44 | |
1000 | |
500 | |
5 |

Parameter Type | Actor Network | Critic Network |
---|---|---|
Learning Rate | 0.0001 | 0.0001 |
Number of Network Layers | 3 | 3 |
Neurons per Layer | 256 | 256 |
Soft Update Coefficient for Target Network | 0.15 | 0.15 |
Discount Factor | 0.95 | 0.95 |
Number of Agents | 11 | |
Batch Size | 1024 | |
Training Steps | 144 | |
Initial Sampling Ratio | 0.99 | |
Sampling Ratio Decay Rate | 0.5 | |

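
To show where two of the tabulated hyperparameters enter training, the sketch below uses the soft update coefficient (0.15) and the discount factor (0.95) in the conventional DDPG/MADDPG update rules. It assumes the standard Polyak averaging form and the standard one-step temporal-difference target; the paper's exact update may differ, and the function names are hypothetical.

```python
import numpy as np

TAU = 0.15    # soft update coefficient from the table above
GAMMA = 0.95  # discount factor from the table above


def soft_update(target_params, online_params, tau: float = TAU):
    """Polyak averaging: target <- tau * online + (1 - tau) * target."""
    return [tau * w + (1.0 - tau) * w_t
            for w_t, w in zip(target_params, online_params)]


def td_target(reward, next_q, done, gamma: float = GAMMA):
    """One-step critic target: y = r + gamma * (1 - done) * Q_target(s', a')."""
    return reward + gamma * (1.0 - done) * next_q


# Example with toy parameters: the target moves 15% of the way toward the online weights.
new_target = soft_update([np.zeros(2)], [np.ones(2)])
```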
Algorithm | Average Network Loss | Average Voltage Deviation (p.u.) | Voltage Compliance Rate (%) |
---|---|---|---|
Optimization Algorithm | 0.6945 | 0.0266 | 100 |
MASAC | 1.2855 | 0.0367 | 89.27 |
GRPO | 1.3487 | 0.0387 | 81.31 |
DDPG | 0.7517 | 0.0270 | 99.24 |
MADDPG | 0.9085 | 0.0276 | 98.36 |
Proposed | 0.6813 | 0.0268 | 99.98 |
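
The comparison in the results table can be restated as relative reductions in average network loss. The short script below performs that arithmetic using only the tabulated values; the variable and function names are illustrative.

```python
# Average network loss per algorithm, copied from the results table above.
network_loss = {
    "Optimization Algorithm": 0.6945,
    "MASAC": 1.2855,
    "GRPO": 1.3487,
    "DDPG": 0.7517,
    "MADDPG": 0.9085,
    "Proposed": 0.6813,
}


def relative_reduction_pct(baseline: float, proposed: float) -> float:
    """Percentage by which `proposed` is lower than `baseline`."""
    return 100.0 * (baseline - proposed) / baseline


reductions = {
    name: relative_reduction_pct(loss, network_loss["Proposed"])
    for name, loss in network_loss.items()
    if name != "Proposed"
}

for name, pct in reductions.items():
    print(f"{name}: {pct:.2f}% lower network loss with the proposed method")
```

On these figures the proposed method's network loss is roughly 25% lower than that of conventional MADDPG and about 1.9% lower than that of the optimization algorithm.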
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Dou, X.; Li, C.; Niu, P.; Sun, D.; Zhang, Q.; Dou, Z. An Optimal Scheduling Method for Power Grids in Extreme Scenarios Based on an Information-Fusion MADDPG Algorithm. Mathematics 2025, 13, 3168. https://doi.org/10.3390/math13193168