A Centralized–Distributed Joint Routing Algorithm for LEO Satellite Constellations Based on Multi-Agent Reinforcement Learning
Abstract
1. Introduction
- We propose MARL-JR, a centralized–distributed joint routing method based on multi-agent reinforcement learning that achieves lower end-to-end latency and a more balanced network load.
- We propose a Q-table initialization scheme that improves routing performance during the initial deployment phase of the satellite network.
- We compare MARL-JR with other routing algorithms under different link failure rates. The experimental results demonstrate the superior robustness of MARL-JR under sudden changes in network conditions.
2. System Model
2.1. LEO Satellite Networks
- (1) Inclined-orbit constellations
- (2) Polar-orbit constellations
2.2. Multi-Agent Reinforcement Learning Algorithm
3. Algorithm Description
3.1. Model Setup
- State: The global environmental state of the satellite network at a given time is defined by the current node and the destination node of the data packet, together with the queue length of each node at that time.
- Action: The action is the forwarding decision for the data packet, that is, the choice of the neighboring node to which the packet is forwarded; each such choice has an associated Q-value. In LEO satellite networks, each satellite can establish connections with at most four neighboring satellites [22], so the action space contains at most four actions.
- Reward: The reward function depends on two key factors: the propagation delay and the load condition of the node. A maximum load threshold is defined, and, according to Equation (4), the load condition is related to the number of received packets, the number of sent packets, and the occupied space, i.e., the number of packets stored at the node.
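The state, action, and reward definitions above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `(plane, slot)` node indexing, the +Grid neighbor rule (two intra-plane links and up to two inter-plane links, with no links across the polar seam), and the reward formula are all hypothetical, since Equation (4) is not reproduced here. Only the maximum queue length (200), load weight (5), and delay weight (1) are taken from the simulation parameter table.

```python
from dataclasses import dataclass
from typing import List, Tuple

Node = Tuple[int, int]  # hypothetical indexing: (orbital plane, slot within plane)


def neighbors(node: Node, n_planes: int, n_per_plane: int) -> List[Node]:
    """Candidate next hops (the action set) under an assumed +Grid topology."""
    plane, slot = node
    # Two intra-plane neighbors: the satellites ahead of and behind this one in its ring.
    nbrs = [(plane, (slot + 1) % n_per_plane), (plane, (slot - 1) % n_per_plane)]
    # Up to two inter-plane neighbors. Assumes no links across the polar seam,
    # so satellites in the first and last planes have only one of them.
    if plane > 0:
        nbrs.append((plane - 1, slot))
    if plane < n_planes - 1:
        nbrs.append((plane + 1, slot))
    return nbrs  # never more than four entries


@dataclass(frozen=True)
class State:
    current: Node                   # node currently holding the packet
    destination: Node               # destination node of the packet
    queue_lengths: Tuple[int, ...]  # queue length of every node at this time step


def reward(delay_ms: float, received: int, sent: int, occupied: int,
           max_queue: int = 200, w_delay: float = 1.0, w_load: float = 5.0) -> float:
    """Hypothetical reward: a weighted penalty on link delay and next-hop load."""
    # Assumed load estimate: stored packets plus net inflow, normalised by the
    # maximum queue length and capped at 1.
    load = min(max(0, occupied + received - sent) / max_queue, 1.0)
    return -(w_delay * delay_ms + w_load * load)


state = State(current=(3, 0), destination=(5, 4), queue_lengths=(0,) * 49)
actions = neighbors(state.current, n_planes=7, n_per_plane=7)
print(actions)
print(reward(delay_ms=10.0, received=20, sent=10, occupied=90))
```

Normalising the load by the maximum queue length keeps the two penalty terms on comparable scales, so the weights alone set their relative importance; whether the original formulation normalises in this way is not stated here.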
3.2. Information Exchange
3.3. Q-Table Initialization
3.4. Operational Phase
4. Simulation Analysis
4.1. Scenario Setup
4.2. Parameter Settings
4.3. Resource Utilization
4.4. Simulation Results Analysis
4.4.1. Comparison Algorithms
4.4.2. Comparison Results Analysis
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Tsai, K.-C.; Fan, L.; Wang, L.-C.; Lent, R.; Han, Z. Multi-Commodity Flow Routing for Large-Scale LEO Satellite Networks Using Deep Reinforcement Learning. In Proceedings of the 2022 IEEE Wireless Communications and Networking Conference (WCNC), Austin, TX, USA, 10–13 April 2022; pp. 626–631.
- Su, Y.; Liu, Y.; Zhou, Y.; Yuan, J.; Cao, H.; Shi, J. Broadband LEO Satellite Communications: Architectures and Key Technologies. IEEE Wirel. Commun. 2019, 26, 55–61.
- Lei, Y.H.; Cao, L.F.; Han, M.D. A Handover Strategy Based on User Dynamic Preference for LEO Satellite. In Proceedings of the 2021 7th International Conference on Computer and Communications (ICCC), Chengdu, China, 10–13 December 2021; IEEE: New York, NY, USA, 2021; pp. 1925–1929.
- Werner, M. A dynamic routing concept for ATM-based satellite personal communication networks. IEEE J. Sel. Areas Commun. 1997, 15, 1636–1648.
- Gounder, V.V.; Prakash, R.; Abu-Amara, H. Routing in LEO-based satellite networks. In Proceedings of the 1999 IEEE Emerging Technologies Symposium Wireless Communications and Systems, Richardson, TX, USA, 12–13 April 1999; IEEE: New York, NY, USA, 1999; pp. 22.1–22.6.
- Mauger, R.; Rosenberg, C. QoS guarantees for multimedia services on a TDMA-based satellite network. IEEE Commun. Mag. 1997, 35, 56–65.
- Li, J.; Lu, H.; Xue, K.; Zhang, Y. Temporal Netgrid Model-Based Dynamic Routing in Large-Scale Small Satellite Networks. IEEE Trans. Veh. Technol. 2019, 68, 6009–6021.
- Wang, X.; Dai, Z.; Xu, Z. LEO Satellite Network Routing Algorithm Based on Reinforcement Learning. In Proceedings of the 2021 IEEE 4th International Conference on Electronics Technology (ICET), Chengdu, China, 7–10 May 2021; pp. 1105–1109.
- Xu, L.; Huang, Y.-C.; Xue, Y.; Hu, X. Deep Reinforcement Learning-Based Routing and Spectrum Assignment of EONs by Exploiting GCN and RNN for Feature Extraction. J. Light. Technol. 2022, 40, 4945–4955.
- Huang, Y.; Wu, S.; Kang, Z.; Mu, Z.; Huang, H.; Wu, X.; Tang, A.J.; Cheng, X. Reinforcement learning based dynamic distributed routing scheme for mega LEO satellite networks. Chin. J. Aeronaut. 2023, 36, 284–291.
- Wang, C.; Wang, H.; Wang, W. A two-hops state-aware routing strategy based on deep reinforcement learning for LEO satellite networks. Electronics 2019, 8, 920.
- Zuo, P.; Wang, C.; Yao, Z.; Hou, S.; Jiang, H. An intelligent routing algorithm for LEO satellites based on deep reinforcement learning. In Proceedings of the 2021 IEEE 94th Vehicular Technology Conference (VTC2021-Fall), Norman, OK, USA, 27–30 September 2021; IEEE Press: New York, NY, USA, 2021; pp. 1–5.
- Shi, Y.; Yuan, Z.; Zhu, X.; Zhu, H. An Adaptive Routing Algorithm for Inter-Satellite Networks Based on the Combination of Multipath Transmission and Q-Learning. Processes 2023, 11, 167.
- Ma, F.; Zhang, X.; Li, X.; Cheng, J.; Guo, F.; Hu, J.; Pan, L. Hybrid constellation design using a genetic algorithm for a LEO-based navigation augmentation system. GPS Solut. 2020, 24, 62.
- Qu, Z.; Zhang, G.; Cao, H.; Xie, J. LEO satellite constellation for Internet of Things. IEEE Access 2017, 5, 18391–18401.
- Amanor, D.N.; Edmonson, W.W.; Afghah, F. Intersatellite communication system based on visible light. IEEE Trans. Aerosp. Electron. Syst. 2018, 54, 2888–2899.
- Handley, M. Using ground relays for low-latency wide-area routing in megaconstellations. In Proceedings of the 18th ACM Workshop on Hot Topics in Networks, Princeton, NJ, USA, 13–15 November 2019; pp. 125–132.
- McDowell, J.C. The Low Earth Orbit Satellite Population and Impacts of the SpaceX Starlink Constellation. Astrophys. J. Lett. 2020, 892, L36.
- Henri, Y. The OneWeb Satellite System. In Handbook of Small Satellites; Pelton, J.N., Madry, S., Eds.; Springer: Cham, Switzerland, 2020.
- Hu, Z.; Wen, F.; Yong, J.; Fan, F.; Wu, B.; Qiu, K. Delay performance comparison across seven Low-Earth-Orbit (LEO) satellite constellations. In Proceedings of the Ninth Symposium on Novel Photoelectronic Detection Technology and Applications, Hefei, China, 2–4 November 2022; SPIE: Bellingham, WA, USA, 2023; Volume 12617, pp. 515–524.
- Chang, H.S.; Kim, B.W.; Lee, C.G.; Choi, Y.; Min, S.L.; Yang, H.S.; Kim, C.S. Topological design and routing for low-Earth orbit satellite networks. In Proceedings of the GLOBECOM’95, Singapore, 14–16 November 1995; Volume 1, pp. 529–535.
- Lowe, R.; Wu, Y.; Tamar, A.; Harb, J.; Abbeel, P.; Mordatch, I. Multi-agent actor-critic for mixed cooperative-competitive environments. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 6379–6390.
- Jung, W.-S.; Yim, J.; Ko, Y.-B. QGeo: Q-Learning-Based Geographic Ad Hoc Routing Protocol for Unmanned Robotic Networks. IEEE Commun. Lett. 2017, 21, 2258–2261.
- Tang, F.; Zhang, H.; Yang, L.T. Multipath Cooperative Routing with Efficient Acknowledgement for LEO Satellite Networks. IEEE Trans. Mob. Comput. 2019, 18, 179–192.
- Dijkstra, E.W. A note on two problems in connexion with graphs. Numer. Math. 1959, 1, 269–271.
- Soret, B.; Leyva-Mayorga, I.; Lozano-Cuadra, F.; Thorsager, M.D. Q-learning for distributed routing in LEO satellite constellations. In Proceedings of the 2024 IEEE International Conference on Machine Learning for Communication and Networking (ICMLCN), Stockholm, Sweden, 5–8 May 2024; pp. 208–213.
- Boyan, J.A.; Littman, M. Packet Routing in Dynamically Changing Networks: A Reinforcement Learning Approach. In Advances in Neural Information Processing Systems, Proceedings of the 7th International Conference on Neural Information Processing Systems, Denver, CO, USA, 29 November–2 December 1993; Morgan Kaufmann Publishers Inc.: San Francisco, CA, USA, 1993; Volume 6.
Category | Advantages | Shortcomings | Delay |
---|---|---|---|
LEO | low launch costs, low communication latency, and high-resolution observation capabilities | short coverage duration and large satellite constellation size | 10–100 ms |
MEO | fixed propagation delay and wider coverage area compared to LEO | antenna design complexities and high-latitude coverage limitations | 50–150 ms |
GEO | high operational stability and extensive coverage capability | long transmission distances with significant latency, coupled with high launch costs | 250–500 ms |
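To put the delay column in perspective, the sketch below computes the minimum propagation delay of a single bent-pipe hop (ground station to a satellite directly overhead and back) from orbital altitude alone. The 780 km LEO altitude is taken from the constellation parameters used in this paper's simulations, and 35,786 km is the standard geostationary altitude; the helper name `bent_pipe_delay_ms` is hypothetical. The ranges in the table are wider, presumably because they also reflect slant paths, multi-hop routing, and processing delays.

```python
C_KM_PER_S = 299_792.458  # speed of light in vacuum, km/s


def bent_pipe_delay_ms(altitude_km: float) -> float:
    """Propagation time ground -> satellite at zenith -> ground, in milliseconds."""
    return 2 * altitude_km / C_KM_PER_S * 1000


for name, altitude in [("LEO, 780 km", 780.0), ("GEO, 35,786 km", 35_786.0)]:
    print(f"{name}: {bent_pipe_delay_ms(altitude):.1f} ms")
```

This gives about 5.2 ms for the LEO case and about 238.7 ms for GEO, which is why the GEO lower bound in the table sits near 250 ms regardless of routing.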
State | Neighbor 1 | Neighbor 2 | Neighbor 3 | Neighbor 4 |
---|---|---|---|---|
… | … | … | … | … |
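The Q-table above, with one row per state and one Q-value per neighboring satellite, can be held as a mapping from state to per-neighbor values. The sketch below is hypothetical: it keys rows by (current node, destination node) only, leaving out the queue-length component of the state, and it selects the next hop ε-greedily under the assumption that a higher Q-value is better. The names `make_q_table` and `choose_next_hop` are illustrative.

```python
import random
from collections import defaultdict
from typing import DefaultDict, Dict, List, Tuple

Node = Tuple[int, int]
StateKey = Tuple[Node, Node]  # assumed key: (current node, destination node)

# One row per state; one Q-value per candidate neighbor (at most four per row).
QTable = DefaultDict[StateKey, Dict[Node, float]]


def make_q_table() -> QTable:
    return defaultdict(dict)


def choose_next_hop(q: QTable, state: StateKey, candidates: List[Node],
                    epsilon: float, rng: random.Random) -> Node:
    """Epsilon-greedy choice of the next hop from one row of the Q-table."""
    row = q[state]
    for n in candidates:
        row.setdefault(n, 0.0)  # unseen (state, neighbor) entries start at zero
    if rng.random() < epsilon:
        return rng.choice(candidates)             # explore: random neighbor
    return max(candidates, key=lambda n: row[n])  # exploit: highest Q-value


q = make_q_table()
s = ((0, 0), (3, 3))
q[s][(1, 0)] = 2.0
print(choose_next_hop(q, s, [(0, 1), (0, 6), (1, 0)], epsilon=0.0, rng=random.Random(0)))
```

Passing an explicit `random.Random` instance keeps runs reproducible, which is convenient when comparing algorithms under identical link-failure scenarios.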
Definition | Symbol |
---|---|
Satellite network graph | |
Set of satellite nodes | |
Set of inter-satellite links | |
Number of satellites | |
Individual satellite nodes | |
Set of satellite states | |
Set of forwarding actions | |
Q-value | |
Maximum satellite load | |
Load condition of a node | |
Parameters | Value |
---|---|
Total satellites | 49 |
Number of orbits | 7 |
Number of satellites per orbital plane | 7 |
Orbital altitude | 780 km |
Orbital inclination | 86.4° |
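As a back-of-the-envelope check on this configuration, the sketch below computes the orbital period, and the length and propagation delay of an intra-plane inter-satellite link, from the orbital altitude and the number of satellites per plane. It assumes circular orbits, a spherical Earth of mean radius 6371 km, and evenly spaced satellites within each plane; it is only an approximation, not the paper's simulation setup.

```python
import math

MU_EARTH = 398_600.4418   # km^3/s^2, Earth's standard gravitational parameter
R_EARTH = 6_371.0         # km, mean Earth radius (assumed spherical Earth)
C_KM_PER_S = 299_792.458  # km/s, speed of light in vacuum

altitude_km = 780.0   # orbital altitude from the table above
sats_per_plane = 7    # satellites per orbital plane from the table above

a = R_EARTH + altitude_km  # radius of the assumed circular orbit
period_s = 2 * math.pi * math.sqrt(a**3 / MU_EARTH)  # Kepler's third law

# Chord between adjacent, evenly spaced satellites in the same circular plane.
intra_plane_km = 2 * a * math.sin(math.pi / sats_per_plane)
intra_plane_delay_ms = intra_plane_km / C_KM_PER_S * 1000

print(f"period: {period_s / 60:.1f} min")
print(f"intra-plane ISL: {intra_plane_km:.0f} km, {intra_plane_delay_ms:.1f} ms")
```

Under these assumptions the period is about 100.3 min, and adjacent satellites in the same plane are roughly 6205 km apart, or about 20.7 ms of one-way propagation delay per intra-plane hop.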
Parameters | Value |
---|---|
Maximum queue length | 200 |
Load weight | 5 |
Delay weight | 1 |
Discount factor | 0.9 |
Greedy factor | 0.8 |
Decay factor | 0.998 |
Number of episodes | 40 |
Number of steps per episode | 300 |
Learning rate for Q-table Initialization | 0.7 |
Learning rate for operational phase | 0.3 |
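The learning-related settings above can be read as a standard tabular Q-learning update with an ε-greedy exploration schedule. The sketch below is one such reading, not the authors' code: it assumes that the greedy factor is the initial exploration probability ε, that the decay factor multiplies ε once per episode, and that rewards are negative costs, so the update bootstraps from the maximum next-hop Q-value. (Boyan and Littman's Q-routing, cited in the references, instead minimizes an estimated delivery time.) The function names are hypothetical.

```python
from typing import Any, Dict

GAMMA = 0.9           # discount factor
ALPHA_INIT = 0.7      # learning rate for Q-table initialization
ALPHA_OPER = 0.3      # learning rate for the operational phase
EPSILON_0 = 0.8       # greedy factor, assumed to be the initial exploration probability
EPSILON_DECAY = 0.998  # decay factor, assumed to be applied once per episode


def q_update(row: Dict[Any, float], action: Any, reward: float,
             next_row: Dict[Any, float], alpha: float, gamma: float = GAMMA) -> float:
    """One-step Q-learning update of row[action]; returns the new value."""
    # An empty next_row marks a terminal step (packet delivered): no bootstrap term.
    best_next = max(next_row.values()) if next_row else 0.0
    old = row.get(action, 0.0)
    row[action] = old + alpha * (reward + gamma * best_next - old)
    return row[action]


def epsilon_at(episode: int) -> float:
    """Exploration probability after `episode` multiplicative decays."""
    return EPSILON_0 * EPSILON_DECAY ** episode


row = {"n1": 0.0}
print(q_update(row, "n1", reward=-1.0, next_row={"n2": 2.0, "n3": 1.0}, alpha=ALPHA_INIT))
print(epsilon_at(40))
```

Under the per-episode assumption, ε only falls to about 0.74 after 40 episodes; if the decay were instead applied per step (40 × 300 steps), ε would be close to zero long before training ends, so the interpretation of the decay factor matters for how much exploration remains.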
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Xia, L.; Lin, B.; Zhao, S.; Zhao, Y. A Centralized–Distributed Joint Routing Algorithm for LEO Satellite Constellations Based on Multi-Agent Reinforcement Learning. Appl. Sci. 2025, 15, 4664. https://doi.org/10.3390/app15094664
Chicago/Turabian style: Xia, Licheng, Baojun Lin, Shuai Zhao, and Yanchun Zhao. 2025. "A Centralized–Distributed Joint Routing Algorithm for LEO Satellite Constellations Based on Multi-Agent Reinforcement Learning" Applied Sciences 15, no. 9: 4664. https://doi.org/10.3390/app15094664