Multi-Layer Energy Management and Strategy Learning for Microgrids: A Proximal Policy Optimization Approach
Abstract
1. Introduction
1.1. Train of Thoughts and Literature Review
1.2. Contributions
- Novelty in system architecture: A hierarchical multi-layer energy management system (EMS) is proposed for a microgrid comprising wind turbines, photovoltaics, pumped storage, battery storage, and regional users. The supply layer adopts centralized generation management to fully consume renewable energy and prevent overcapacity; the demand layer lets each user decide its consumption autonomously and participate in the energy auction; the scheduling layer, acting as a neutral operator, coordinates deployment and information exchange between the upper and lower layers.
- Novelty in optimization methodology: A centralized proximal policy optimization (PPO) algorithm is applied in the supply layer to efficiently obtain the optimal generation strategy. In the energy auction market of the demand layer, a multi-agent PPO (MAPPO) is used to develop an independent energy usage strategy for each user; this is the first combination of PPO, multi-agent reinforcement learning (MARL), and a market auction mechanism.
- Novelty in multi-agent learning: For the distributed multi-agent learning of auction decisions, an action-guidance-based method is proposed to promote equilibrium selection in the MAPPO algorithm.
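
For orientation, the clipped surrogate objective that PPO maximizes can be written as below. This is the standard form from the original PPO formulation, given here as a reference sketch; the symbols ($r_t$ for the probability ratio, $\hat{A}_t$ for the advantage estimate, $\epsilon$ for the clipping coefficient) are assumptions of this sketch and may differ from the notation used in Section 3.2.

```latex
L^{\mathrm{CLIP}}(\theta)
  = \hat{\mathbb{E}}_t\!\left[
      \min\!\left(
        r_t(\theta)\,\hat{A}_t,\;
        \operatorname{clip}\!\left(r_t(\theta),\,1-\epsilon,\,1+\epsilon\right)\hat{A}_t
      \right)
    \right],
\qquad
r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)}
```

The clipping keeps each policy update close to the policy that collected the data, which is what makes PPO stable enough to reuse in both the centralized and the multi-agent settings.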
1.3. Organization
2. System Model
2.1. Supply Layer
2.1.1. Renewable Generation
2.1.2. Pumped Storage Plant
2.1.3. Energy Storage System
2.1.4. Microturbine Group
2.2. Demand Layer
2.2.1. User Loads
2.2.2. Utility Function
2.3. Scheduling Layer
2.3.1. Energy Pricing
2.3.2. Mechanism of Energy Auction Market
3. Centralized PPO-Based Generation Optimization for Supply Layer
3.1. Markov Decision Process
3.2. Proximal Policy Optimization
3.3. Problem Formulation
3.4. Algorithm Implementation Process
Algorithm 1 Centralized PPO procedure for supply layer
4. MAPPO-Based Distributed Auction Decision for Demand Layer
4.1. Multi-Agent Reinforcement Learning with PPO
4.2. Action-Guidance-Based MAPPO
4.3. Problem Formulation
4.4. Algorithm Implementation Process
Algorithm 2 MAPPO procedures for demand layer
5. Case Studies
5.1. Modeling Settings
5.2. Data Sources
5.3. Scenario Verification
5.4. Comparison of Supply Layer’s Benefits
5.5. Comparison of Demand Layer’s Costs
6. Conclusions and Discussion
6.1. Conclusions
- The hierarchical EMS framework achieves unified management for the supply layer, decision autonomy for the demand layer, and privacy protection for all users by employing a neutral operator.
- The centralized PPO method used at the supply layer makes full use of renewable energy and increases the economic and environmental benefits for the microgrid.
- In the auction market of the demand layer, the multi-agent PPO method enables consumers to independently generate pricing strategies, ensuring an equilibrium of benefits for all participants.
- The action-guidance-based mechanism embedded in MAPPO accelerates the convergence of multi-agent interactive learning; in the case study, it reduces the number of episodes needed for MAPPO to converge from 1000 to 800.
6.2. Discussion
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
Parameter | Value | Parameter | Value |
---|---|---|---|
| | 100 MWh | | 800 MWh |
| | 1 | | 0.75 |
| | 200 MW | | 300 MWh |
| | 0.2 | | 0.8 |
| | 0.36 | | 0.9 |
| | 100 MW | | 1 |
Parameter | Value | Parameter | Value |
---|---|---|---|
Actor learning rate | | Memory size N | 1200 |
Critic learning rate | | Clipping coefficient | 0.2 |
Discount factor | 0.95 | Minibatch sampling size | 480 |
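
The hyperparameters in the table above can be tied to the PPO update with a minimal sketch. The four values (discount factor 0.95, clipping coefficient 0.2, memory size 1200, minibatch size 480) come from the table; the names `PPOConfig`, `discounted_returns`, `clipped_surrogate`, and `sample_minibatch` are hypothetical and only illustrate the standard PPO computations, not the authors' implementation. The learning rates are left out because their values are not recoverable from the table.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class PPOConfig:
    """Values taken from the hyperparameter table; field names are illustrative."""
    discount_factor: float = 0.95
    clip_coefficient: float = 0.2
    memory_size: int = 1200
    minibatch_size: int = 480


def discounted_returns(rewards, gamma):
    """Return G_t = r_t + gamma * G_{t+1} for every step of one trajectory."""
    returns, running = [], 0.0
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def clipped_surrogate(new_log_probs, old_log_probs, advantages, eps):
    """Mean of the standard PPO clipped surrogate objective (to be maximized)."""
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        ratio = math.exp(new_lp - old_lp)            # pi_new / pi_old
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
        total += min(ratio * adv, clipped * adv)     # pessimistic bound
    return total / len(advantages)


def sample_minibatch(memory, batch_size, rng):
    """Draw a minibatch without replacement from the stored transitions."""
    return rng.sample(memory, batch_size)


if __name__ == "__main__":
    cfg = PPOConfig()
    print(discounted_returns([1.0, 1.0, 1.0], cfg.discount_factor))
    memory = list(range(cfg.memory_size))            # stand-in for transitions
    batch = sample_minibatch(memory, cfg.minibatch_size, random.Random(0))
    print(len(batch))
```

With a clipping coefficient of 0.2, a probability ratio of 1.5 on a positive advantage is capped at 1.2, whereas the same ratio on a negative advantage is left unclipped, so the objective never rewards moving far from the data-collecting policy.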
Time | Output (MW) | Time | Output (MW) | Time | Output (MW) |
---|---|---|---|---|---|
0:00 | 13.0 | 8:00 | 0 | 16:00 | 188.2 |
1:00 | 14.0 | 9:00 | 0 | 17:00 | 218.5 |
2:00 | 6.0 | 10:00 | 0 | 18:00 | 656.9 |
3:00 | 11.0 | 11:00 | 0 | 19:00 | 1021.2 |
4:00 | 78.0 | 12:00 | 0 | 20:00 | 783.9 |
5:00 | 80.0 | 13:00 | 0 | 21:00 | 277.3 |
6:00 | 91.0 | 14:00 | 0 | 22:00 | 226.0 |
7:00 | 0 | 15:00 | 54.0 | 23:00 | 68.0 |
Time | Output (MW) | Time | Output (MW) | Time | Output (MW) |
---|---|---|---|---|---|
0:00 | 0 | 8:00 | 0 | 16:00 | 110.0 |
1:00 | 0 | 9:00 | 0 | 17:00 | 119.8 |
2:00 | 0 | 10:00 | 0 | 18:00 | 420.2 |
3:00 | 0 | 11:00 | 0 | 19:00 | 664.3 |
4:00 | 11.5 | 12:00 | 87.0 | 20:00 | 536.7 |
5:00 | 53.5 | 13:00 | 86.0 | 21:00 | 484.4 |
6:00 | 173.4 | 14:00 | 110.0 | 22:00 | 397.5 |
7:00 | 0 | 15:00 | 138.0 | 23:00 | 134.5 |
Time | Output (MW) | Time | Output (MW) | Time | Output (MW) |
---|---|---|---|---|---|
0:00 | 648.8 | 8:00 | 395.3 | 16:00 | 653.9 |
1:00 | 649.8 | 9:00 | 0 | 17:00 | 827.8 |
2:00 | 706.5 | 10:00 | 0 | 18:00 | 779.7 |
3:00 | 662.3 | 11:00 | 0 | 19:00 | 881.4 |
4:00 | 752.3 | 12:00 | 0 | 20:00 | 902.7 |
5:00 | 731.3 | 13:00 | 0 | 21:00 | 852.4 |
6:00 | 858.3 | 14:00 | 0 | 22:00 | 750.5 |
7:00 | 753.5 | 15:00 | 220.5 | 23:00 | 718.5 |
Algorithm | Single Update Time (Second) | Convergence Step Size (Episode) | Average Convergence Time (Minute) |
---|---|---|---|
Q-learning | 0.48 | | 96 |
DQN | 0.54 | | 90 |
PPO | 1.27 | | 106 |
Algorithm | Single Update Time (Second) | Convergence Step Size (Episode) | Average Convergence Time (Minute) |
---|---|---|---|
MADQN | 1.48 | 1500 | 37 |
MADQN with AGB | 1.48 | 1400 | 35 |
MAPPO | 3.61 | 1000 | 60 |
MAPPO with AGB | 3.61 | 800 | 48 |
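
The three columns of the table above appear to be linked by simple arithmetic: average convergence time is roughly the single update time multiplied by the number of episodes to convergence. The sketch below checks that reading against the four rows. The relation itself is an assumption inferred from the numbers, not something stated in the table, and the function names are hypothetical.

```python
# Rows copied from the demand-layer timing table:
# (algorithm, single update time [s], convergence episodes, reported time [min])
ROWS = [
    ("MADQN", 1.48, 1500, 37),
    ("MADQN with AGB", 1.48, 1400, 35),
    ("MAPPO", 3.61, 1000, 60),
    ("MAPPO with AGB", 3.61, 800, 48),
]


def convergence_minutes(update_seconds, episodes):
    """Assumed relation: total time = per-update time x episodes, in minutes."""
    return update_seconds * episodes / 60.0


def check_rows(rows=ROWS, tolerance_minutes=1.0):
    """Compare the estimate with the reported time for every row."""
    results = []
    for name, seconds, episodes, reported in rows:
        estimate = convergence_minutes(seconds, episodes)
        within = abs(estimate - reported) <= tolerance_minutes
        results.append((name, round(estimate, 1), within))
    return results


def episode_reduction(baseline_episodes, guided_episodes):
    """Fractional reduction in episodes when action guidance (AGB) is added."""
    return (baseline_episodes - guided_episodes) / baseline_episodes


if __name__ == "__main__":
    for name, estimate, within in check_rows():
        print(f"{name}: about {estimate} min, matches table: {within}")
    print(f"MAPPO episode reduction with AGB: {episode_reduction(1000, 800):.0%}")
```

Under this reading, every row agrees with the reported time to within one minute, and action guidance cuts the episode count by 20% for MAPPO and by about 7% for MADQN.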
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Fang, X.; Hong, P.; He, S.; Zhang, Y.; Tan, D. Multi-Layer Energy Management and Strategy Learning for Microgrids: A Proximal Policy Optimization Approach. Energies 2024, 17, 3990. https://doi.org/10.3390/en17163990