Multi-Agent Deep Reinforcement Learning for Large-Scale Traffic Signal Control with Spatio-Temporal Attention Mechanism
Abstract
1. Introduction
2. Related Work
3. Hybrid Multi-Agent Reinforcement Learning (Hybrid MARL)
3.1. Spatio-Temporal Information Extraction
3.2. Single-Agent Reinforcement Learning Framework
- S (State Space) represents the current traffic conditions, including queue lengths at each lane, average vehicle speeds, and the current traffic light phase.
- A (Action Space) defines the possible traffic signal phase changes at an intersection. Each agent selects an action from a discrete set of phase transition options.
- P (State Transition Probability) specifies the probability of transitioning from one state to another, given a selected action.
- R (Reward Function) evaluates the effectiveness of an action based on metrics such as total vehicle waiting time, throughput, and fairness in traffic signal allocation.
- γ (Discount Factor) determines the importance of future rewards relative to immediate rewards.
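To make these MDP components concrete, the sketch below assembles a per-intersection state vector and a scalar reward from the quantities listed above. This is a minimal illustration under stated assumptions, not the authors' implementation: the feature layout, the reward weights `w_wait`, `w_thru`, and `w_fair`, and the use of the standard deviation of per-lane waiting times as a fairness term are all our own placeholders.

```python
import numpy as np

def build_state(queue_lengths, avg_speeds, phase_id, num_phases):
    """Assemble the state s in S: per-lane queue lengths, per-lane mean
    speeds, and a one-hot encoding of the current signal phase."""
    phase_onehot = np.zeros(num_phases)
    phase_onehot[phase_id] = 1.0
    return np.concatenate([queue_lengths, avg_speeds, phase_onehot])

def reward(total_wait, throughput, per_lane_waits,
           w_wait=1.0, w_thru=0.1, w_fair=0.5):
    """R combines waiting time (minimized), throughput (maximized), and
    fairness, here proxied by the spread of per-lane waiting times."""
    fairness_penalty = np.std(per_lane_waits)
    return -w_wait * total_wait + w_thru * throughput - w_fair * fairness_penalty

# Example: a 4-lane intersection with 4 signal phases.
s = build_state(np.array([3.0, 7.0, 0.0, 2.0]),
                np.array([8.4, 2.1, 13.9, 9.6]),
                phase_id=1, num_phases=4)
r = reward(total_wait=42.0, throughput=18, per_lane_waits=[20.0, 15.0, 2.0, 5.0])
```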
3.3. Multi-Agent Reinforcement Learning Framework
1. Sub-region Agents: These agents control a small cluster of intersections and optimize traffic flow within their local region. In our experimental setting, each sub-region typically contains 4 to 8 signalized intersections, depending on the density and structure of the urban road layout. This decomposition ensures both manageable learning complexity and sufficient spatial coordination.
2. Global Agent: This centralized agent aggregates traffic information from all sub-region agents and provides high-level guidance to ensure network-wide coordination.

This hierarchical structure reduces computational complexity while maintaining effective coordination across large road networks. The overall structure of the Hybrid MARL model is presented in Figure 3, where local agents operate within defined regions and a central agent optimizes traffic flow at the global level.
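As an illustration of this two-level hierarchy, the sketch below partitions a network into clusters of bounded size (matching the 4-to-8 range above) and routes aggregate summaries through a global coordinator. The class names, the greedy partitioning heuristic, and the summary contents are our own placeholders, not the paper's architecture.

```python
from dataclasses import dataclass, field

@dataclass
class SubRegionAgent:
    """Controls a local cluster of 4-8 signalized intersections."""
    intersections: list

    def summarize(self):
        # Report aggregate information so the global agent can coordinate;
        # a real summary would carry congestion statistics, not just size.
        return {"n_intersections": len(self.intersections)}

@dataclass
class GlobalAgent:
    """Aggregates sub-region summaries and emits coordination signals."""
    regions: list = field(default_factory=list)

    def coordinate(self):
        summaries = [r.summarize() for r in self.regions]
        # Placeholder guidance: simply return the collected summaries.
        return summaries

def partition(intersection_ids, max_cluster=8):
    """Greedy split of the network into clusters of <= max_cluster nodes."""
    return [intersection_ids[i:i + max_cluster]
            for i in range(0, len(intersection_ids), max_cluster)]

regions = [SubRegionAgent(c) for c in partition(list(range(30)), max_cluster=6)]
controller = GlobalAgent(regions)
print(controller.coordinate())
```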
3.4. Algorithm Workflow
1. Traffic Data Processing: Data is collected and preprocessed into structured state embeddings.
2. Spatio-Temporal Feature Extraction: CNNs, LSTMs, and GATs are used to extract relevant spatial and temporal patterns.
3. Policy Learning: Each sub-region agent optimizes its traffic signal policy using MADDPG.
4. Global Coordination: The central agent aggregates information and provides coordination signals.
5. Execution Phase: Trained policies are deployed for real-time traffic signal control.
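These five stages map onto a training loop like the skeleton below. Every function body is a stub standing in for the corresponding component (the CNN/LSTM/GAT encoder stack, the MADDPG actor-critic update); only the control flow mirrors the workflow above, and all names and update rules are placeholder assumptions rather than the authors' code.

```python
import numpy as np

def preprocess(raw):                         # Stage 1: traffic data processing
    return np.asarray(raw, dtype=float)

def extract_features(state):                 # Stage 2: spatio-temporal extraction
    # Stand-in for the CNN + LSTM + GAT encoder stack.
    return state / (np.linalg.norm(state) + 1e-8)

def maddpg_update(agent, features, coord):   # Stage 3: per-region policy learning
    # Stand-in for one MADDPG gradient step on the agent's policy.
    agent["policy"] += 0.001 * (features.mean() + coord)
    return agent

def global_coordination(all_features):       # Stage 4: global agent guidance
    return float(np.mean([f.mean() for f in all_features]))

agents = [{"policy": 0.0} for _ in range(4)]
for episode in range(10):
    raw_obs = [np.random.rand(8) for _ in agents]
    feats = [extract_features(preprocess(o)) for o in raw_obs]
    coord = global_coordination(feats)
    agents = [maddpg_update(a, f, coord) for a, f in zip(agents, feats)]
# Stage 5: execution phase -- the frozen policies would now drive signal phases.
```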
4. Experiments
4.1. Experimental Setup
4.2. Comparison with Baseline Methods
1. Fixed-Time Control (FT): A conventional method that assigns pre-defined green time intervals to each phase, irrespective of real-time traffic conditions.
2. Actuated Control (AC): A semi-adaptive method that adjusts signal timings based on real-time vehicle detection sensors.
3. Max-Pressure Control (MP): A widely used optimization-based approach that balances incoming and outgoing vehicle flows at intersections (a minimal sketch of its decision rule follows this list).
4. Deep Q-Network (DQN): A reinforcement learning approach that uses a single-agent Q-learning framework for traffic signal control.
5. Proximal Policy Optimization (PPO): A policy-gradient reinforcement learning model designed for continuous control tasks.
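Of these baselines, Max-Pressure has a particularly compact decision rule: the pressure of a signal phase is the sum, over the traffic movements it serves, of the upstream queue length minus the downstream queue length, and the controller activates the phase with the highest pressure. A minimal sketch, assuming queue counts are directly observable:

```python
def phase_pressure(phase_movements, queues):
    """Pressure of a phase = sum over its (upstream, downstream) movements
    of upstream queue length minus downstream queue length."""
    return sum(queues[up] - queues[down] for up, down in phase_movements)

def max_pressure_control(phases, queues):
    """Activate the phase with the largest pressure (ties -> first key)."""
    return max(phases, key=lambda p: phase_pressure(phases[p], queues))

# Example: two phases at one intersection; lane ids map to queue counts.
queues = {"N_in": 9, "S_in": 7, "E_in": 2, "W_in": 3,
          "N_out": 1, "S_out": 2, "E_out": 0, "W_out": 4}
phases = {"NS_green": [("N_in", "S_out"), ("S_in", "N_out")],
          "EW_green": [("E_in", "W_out"), ("W_in", "E_out")]}
print(max_pressure_control(phases, queues))  # -> "NS_green"
```

In this example the north-south phase has pressure (9 - 2) + (7 - 1) = 13 versus (2 - 4) + (3 - 0) = 1 for east-west, so the north-south phase is selected.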
4.3. Performance Analysis Under Different Traffic Conditions
4.4. Ablation Study
1. Hybrid MARL (Full Model): The complete proposed model, integrating spatio-temporal feature extraction, hierarchical reinforcement learning, and global coordination.
2. Without Spatio-Temporal Attention (No-ST): This variant removes the Graph Attention Networks (GATs) and LSTMs, preventing agents from learning dynamic traffic dependencies.
3. Without Global Coordination (No-GC): This variant eliminates the global agent, forcing sub-region agents to operate independently.
4. Without Hierarchical Learning (No-HL): A single-layer reinforcement learning approach without hierarchical control.
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Appendix A. Algorithm Parameters
Parameter | Value | Description | Justification |
---|---|---|---|
Discount factor (γ) | 0.95 | Determines the importance of future rewards relative to immediate rewards. | A common value in reinforcement learning for traffic control systems. |
Learning rate | 0.0003 | Step size for updating the weights in the optimization process. | Chosen to ensure stable convergence during training. |
Batch size | 64 | Number of experiences used for each gradient update. | Empirically selected to balance learning speed and memory usage. |
Experience replay size | 100,000 | Number of experiences stored in the replay buffer for training. | A large buffer improves training stability and efficiency in large-scale systems. |
Reward scaling | Varies | Scaling factors applied to individual reward components (waiting time, throughput, fairness) for balancing objectives. | Tuned to ensure a balanced trade-off between competing objectives. |
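These settings translate directly into a configuration block. A sketch in Python, where the key names are our own and the placeholder `reward_weights` stands in for the per-deployment "Varies" scaling factors:

```python
# Hypothetical configuration mirroring Appendix A; key names are assumptions.
HYBRID_MARL_CONFIG = {
    "gamma": 0.95,                   # discount factor
    "learning_rate": 3e-4,           # optimizer step size
    "batch_size": 64,                # experiences per gradient update
    "replay_buffer_size": 100_000,   # transitions kept for experience replay
    # Reward scaling is tuned per deployment ("Varies" in Appendix A); these
    # placeholder weights balance waiting time, throughput, and fairness.
    "reward_weights": {"wait": 1.0, "throughput": 0.1, "fairness": 0.5},
}
```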
Average vehicle waiting time (s) by control method and traffic intensity:

Traffic Intensity (vehicles/h) | Fixed-Time Control (FT) | Actuated Control (AC) | Max-Pressure Control (MP) | Deep Q-Network (DQN) | Proximal Policy Optimization (PPO) | MADDPG | Hybrid MARL |
---|---|---|---|---|---|---|---|
Low (800) | 45.6 | 38.2 | 32.4 | 28.9 | 26.7 | 24.5 | 18.3 |
Medium (1200) | 62.5 | 55.8 | 48.2 | 41.3 | 38.9 | 35.7 | 27.4 |
High (1600) | 85.3 | 78.9 | 67.1 | 59.8 | 54.6 | 49.3 | 38.1 |
Very High (2000) | 110.7 | 103.2 | 89.4 | 82.5 | 76.2 | 70.4 | 52.7 |
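At high intensity (1600 vehicles/h), Hybrid MARL reduces waiting time by roughly 55% relative to fixed-time control ((85.3 - 38.1)/85.3) and by about 23% relative to the strongest learning baseline, MADDPG ((49.3 - 38.1)/49.3).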
Method | Throughput (vehicles/h) |
---|---|
Fixed-Time Control (FT) | 1250 |
Actuated Control (AC) | 1360 |
Max-Pressure Control (MP) | 1485 |
Deep Q-Network (DQN) | 1620 |
Proximal Policy Optimization (PPO) | 1680 |
MADDPG | 1745 |
Hybrid MARL | 1890 |
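In throughput terms, Hybrid MARL serves about 51% more vehicles per hour than fixed-time control ((1890 - 1250)/1250) and about 8% more than MADDPG ((1890 - 1745)/1745).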