Coordinated Control Optimization of Nuclear Steam Supply Systems via Multi-Agent Reinforcement Learning
Abstract
1. Introduction
2. Problem Formulation
2.1. NSSS Multi-Agent Control Optimization Framework
2.2. Control Optimization Problem Formulation
3. Method
3.1. MADDPG-DEB Algorithm
Algorithm 1 MADDPG-DEB for NSSS Control Optimization
3.2. Observations, Actions, and Rewards
4. Simulation Results
4.1. Training Process
4.2. Training Results
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
| --- | --- |
| DDP | Distributed data parallel |
| DDPG | Deep deterministic policy gradient |
| DEB | Dynamic experience buffer |
| DEC-POMDP | Decentralized partially observable Markov decision process |
| DMC | Dynamic matrix control |
| DRL | Deep reinforcement learning |
| ET-SAC | Soft actor–critic with event-triggered mechanism |
| FN | Fluid network |
| MADDPG | Multi-agent deep deterministic policy gradient |
| MARL | Multi-agent reinforcement learning |
| MHTGR | Modular high-temperature gas-cooled reactor |
| MPC | Model predictive control |
| NPP | Nuclear power plant |
| NSSS | Nuclear steam supply system |
| OTSG | Once-through steam generator |
| PID | Proportional–integral–derivative |
| ReLU | Rectified linear unit |
| RFP | Reactor full power |
| TD | Temporal difference |
References
- Poudel, B.; Gokaraju, R. Small modular reactor (SMR) based hybrid energy system for electricity & district heating. IEEE Trans. Energy Convers. 2021, 36, 2794–2802.
- Salehi, A.; Safarzadeh, O.; Kazemi, M.H. Fractional order PID control of steam generator water level for nuclear steam supply systems. Nucl. Eng. Des. 2019, 342, 45–59.
- Al Rashdan, A.; Roberson, D. A frequency domain control perspective on Xenon resistance for load following of thermal nuclear reactors. IEEE Trans. Nucl. Sci. 2019, 66, 2034–2041.
- Panciak, I.; Diab, A. Dynamic Multiphysics Simulation of the Load-Following Behavior in a Typical Pressurized Water Reactor Power Plant. Energies 2024, 17, 6373.
- Na, M.G.; Shin, S.H.; Kim, W.C. A model predictive controller for nuclear reactor power. Nucl. Eng. Technol. 2003, 35, 399–411.
- Holkar, K.; Waghmare, L.M. An overview of model predictive control. Int. J. Control Autom. 2010, 3, 47–63.
- Vajpayee, V.; Mukhopadhyay, S.; Tiwari, A.P. Data-driven subspace predictive control of a nuclear reactor. IEEE Trans. Nucl. Sci. 2017, 65, 666–679.
- Pradhan, S.K.; Das, D.K. Explicit model predictive controller for power control of molten salt breeder reactor core. Nucl. Eng. Des. 2021, 384, 111492.
- Jiang, D.; Dong, Z. Dynamic matrix control for thermal power of multi-modular high temperature gas-cooled reactor plants. Energy 2020, 198, 117386.
- Li, Y. Deep reinforcement learning: An overview. arXiv 2017, arXiv:1701.07274.
- Arulkumaran, K.; Deisenroth, M.P.; Brundage, M.; Bharath, A.A. Deep reinforcement learning: A brief survey. IEEE Signal Process. Mag. 2017, 34, 26–38.
- François-Lavet, V.; Henderson, P.; Islam, R.; Bellemare, M.G.; Pineau, J. An introduction to deep reinforcement learning. Found. Trends Mach. Learn. 2018, 11, 219–354.
- Haarnoja, T.; Zhou, A.; Hartikainen, K.; Tucker, G.; Ha, S.; Tan, J.; Kumar, V.; Zhu, H.; Gupta, A.; Abbeel, P. Soft actor-critic algorithms and applications. arXiv 2018, arXiv:1812.05905.
- Haarnoja, T.; Zhou, A.; Abbeel, P.; Levine, S. Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, 10–15 July 2018; pp. 1861–1870.
- Zhang, T.; Dong, Z.; Huang, X. Multi-objective optimization of thermal power and outlet steam temperature for a nuclear steam supply system with deep reinforcement learning. Energy 2024, 286, 129526.
- Buşoniu, L.; Babuška, R.; De Schutter, B. Multi-agent reinforcement learning: An overview. In Innovations in Multi-Agent Systems and Applications-1; Springer: Berlin/Heidelberg, Germany, 2010; pp. 183–221.
- Zhang, K.; Yang, Z.; Başar, T. Multi-agent reinforcement learning: A selective overview of theories and algorithms. In Handbook of Reinforcement Learning and Control; Springer: Cham, Switzerland, 2021; pp. 321–384.
- Canese, L.; Cardarilli, G.C.; Di Nunzio, L.; Fazzolari, R.; Giardino, D.; Re, M.; Spanò, S. Multi-agent reinforcement learning: A review of challenges and applications. Appl. Sci. 2021, 11, 4948.
- Dong, Z.; Pan, Y.; Zhang, Z.; Dong, Y.; Huang, X. Model-free adaptive control law for nuclear superheated-steam supply systems. Energy 2017, 135, 53–67.
- Lowe, R.; Wu, Y.I.; Tamar, A.; Harb, J.; Pieter Abbeel, O.; Mordatch, I. Multi-agent actor-critic for mixed cooperative-competitive environments. In Proceedings of the Advances in Neural Information Processing Systems 30 (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017.
- Lillicrap, T.P.; Hunt, J.J.; Pritzel, A.; Heess, N.; Erez, T.; Tassa, Y.; Silver, D.; Wierstra, D. Continuous control with deep reinforcement learning. arXiv 2015, arXiv:1509.02971.
- Dong, Z.; Pan, Y.; Zhang, Z.; Dong, Y.; Huang, X. Dynamical modeling and simulation of the six-modular high temperature gas-cooled reactor plant HTR-PM600. Energy 2018, 155, 971–991.
- Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; et al. PyTorch: An imperative style, high-performance deep learning library. In Proceedings of the Advances in Neural Information Processing Systems 32 (NeurIPS 2019), Vancouver, BC, Canada, 8–14 December 2019.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
MDPI and ACS Style: Zhang, T.; Cheng, Z.; Dong, Z.; Huang, X. Coordinated Control Optimization of Nuclear Steam Supply Systems via Multi-Agent Reinforcement Learning. Energies 2025, 18, 2223. https://doi.org/10.3390/en18092223

AMA Style: Zhang T, Cheng Z, Dong Z, Huang X. Coordinated Control Optimization of Nuclear Steam Supply Systems via Multi-Agent Reinforcement Learning. Energies. 2025; 18(9):2223. https://doi.org/10.3390/en18092223

Chicago/Turabian Style: Zhang, Tianhao, Zhonghua Cheng, Zhe Dong, and Xiaojin Huang. 2025. "Coordinated Control Optimization of Nuclear Steam Supply Systems via Multi-Agent Reinforcement Learning" Energies 18, no. 9: 2223. https://doi.org/10.3390/en18092223

APA Style: Zhang, T., Cheng, Z., Dong, Z., & Huang, X. (2025). Coordinated Control Optimization of Nuclear Steam Supply Systems via Multi-Agent Reinforcement Learning. Energies, 18(9), 2223. https://doi.org/10.3390/en18092223