Assisted-Value Factorization with Latent Interaction in Cooperate Multi-Agent Reinforcement Learning
Abstract
1. Introduction
- To effectively assess the latent interaction among agents, we analyze the correlation between current observations and history information and introduce a dynamic masking mechanism. The mechanism excises history information that falls below a dynamically changing threshold, keeping the retained history succinct and precise.
- We propose a novel latent interaction value (LIV) cooperative learning framework that integrates history information to generate a latent interaction value for each agent. These latent interaction values directly correct the individual Q-value functions, thereby enhancing cooperation among agents (see the illustrative sketch after this list).
- We conduct extensive experiments in the multi-agent particle environment and the StarCraft Multi-Agent Challenge to demonstrate the effectiveness of LIV. The results show that it significantly outperforms state-of-the-art multi-agent reinforcement learning methods in both final performance and learning speed.
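To make the first two contributions concrete, the following is a minimal, hypothetical PyTorch sketch of an agent network that applies an observation-dependent dynamic mask to its GRU history state and uses a latent interaction value to correct its individual Q-values. The module names, the sigmoid threshold rule, and the additive correction are our own illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

class DynamicMaskedAgent(nn.Module):
    """Illustrative agent network (assumed architecture): a GRU history encoder
    with a dynamic mask that suppresses history features whose change falls
    below an observation-dependent threshold, plus a latent-interaction head
    that corrects the individual Q-values."""
    def __init__(self, obs_dim, n_actions, hidden_dim=64):
        super().__init__()
        self.obs_enc = nn.Linear(obs_dim, hidden_dim)
        self.rnn = nn.GRUCell(hidden_dim, hidden_dim)
        self.threshold_net = nn.Linear(hidden_dim, 1)     # dynamic threshold from the current observation
        self.q_head = nn.Linear(hidden_dim, n_actions)    # individual Q-values
        self.liv_head = nn.Linear(hidden_dim, n_actions)  # latent interaction value

    def forward(self, obs, h_prev):
        x = torch.relu(self.obs_enc(obs))      # encode current observation
        h = self.rnn(x, h_prev)                # update history state
        # Dynamic masking (assumed rule): zero out history dimensions whose
        # absolute change since the previous step is below the threshold.
        delta = (h - h_prev).abs()
        tau = torch.sigmoid(self.threshold_net(x))   # per-sample threshold in (0, 1)
        mask = (delta >= tau).float()
        h_masked = h * mask                    # excised (succinct) history
        q_ind = self.q_head(h_masked)          # individual utility Q_i
        liv = self.liv_head(h_masked)          # latent interaction value for this agent
        q_corrected = q_ind + liv              # assumed additive correction of Q_i
        return q_corrected, h_masked

# Usage with hypothetical dimensions:
agent = DynamicMaskedAgent(obs_dim=30, n_actions=5)
obs = torch.randn(8, 30)                       # batch of 8 observations
h0 = torch.zeros(8, 64)                        # initial hidden state
q, h1 = agent(obs, h0)
greedy_actions = q.argmax(dim=-1)              # decentralized greedy action selection
```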
2. Related Works
3. Preliminaries
3.1. Dec-POMDP
3.2. Centralized Training with Decentralized Execution
4. Methods
4.1. Agent Network
4.2. Latent Interaction Generation
4.3. Dynamic Masking Mechanism
4.4. LIV Cooperation Learning Framework
Algorithm 1 LIV Cooperation Learning Framework
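The body of Algorithm 1 is not reproduced here. As a rough stand-in, the sketch below shows one centralized training step under CTDE, assuming per-agent networks like the one sketched in the Introduction and a QMIX-style monotonic mixing network `mixer(q_agents, state)`. The batch layout, function signatures, and plain TD objective are illustrative assumptions, not necessarily the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def liv_training_step(agents, mixer, target_agents, target_mixer,
                      batch, optimizer, gamma=0.99):
    """One illustrative centralized training step.
    Assumed batch layout: obs [B, n_agents, obs_dim], actions [B, n_agents] (long),
    reward [B], done [B], state [B, state_dim], next_obs, next_state.
    For brevity, a fresh hidden state is used instead of unrolling over episodes."""
    B, n_agents = batch["actions"].shape
    h0 = torch.zeros(B, 64)                      # matches the sketch's hidden_dim
    q_chosen, q_next_max = [], []
    for i in range(n_agents):
        # Corrected per-agent Q-values from the decentralized agent networks.
        q_i, _ = agents[i](batch["obs"][:, i], h0)
        q_next_i, _ = target_agents[i](batch["next_obs"][:, i], h0)
        q_chosen.append(q_i.gather(1, batch["actions"][:, i:i + 1]))   # value of chosen action
        q_next_max.append(q_next_i.max(dim=1, keepdim=True).values)    # greedy target value
    q_chosen = torch.cat(q_chosen, dim=1)        # [B, n_agents]
    q_next_max = torch.cat(q_next_max, dim=1)
    # Centralized mixing into Q_tot, conditioned on the global state.
    q_tot = mixer(q_chosen, batch["state"]).squeeze(-1)
    q_tot_next = target_mixer(q_next_max, batch["next_state"]).squeeze(-1)
    target = batch["reward"] + gamma * (1 - batch["done"]) * q_tot_next.detach()
    loss = F.mse_loss(q_tot, target)             # standard TD loss on Q_tot
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```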
5. Experiment
5.1. Experimental Environment
5.2. Baseline Method
5.3. Parameter Settings
5.4. Evaluation Metrics
5.5. Evaluation Results
5.5.1. Hard-MPE Results
5.5.2. SMAC Results
5.6. Network Scale Comparison
5.7. Ablation Results
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Bertsekas, D. Results in Control and Optimization. Results Control Optim. 2020, 1, 100003.
- Cassano, L.; Yuan, K.; Sayed, A.H. Multiagent Fully Decentralized Value Function Learning With Linear Convergence Rates. IEEE Trans. Autom. Control 2021, 66, 1497–1512.
- Cao, Y.; Yu, W.; Ren, W.; Chen, G. An Overview of Recent Progress in the Study of Distributed Multi-Agent Coordination. IEEE Trans. Ind. Informatics 2013, 9, 427–438.
- Zanol, R.; Chiariotti, F.; Zanella, A. Drone Mapping through Multi-Agent Reinforcement Learning. In Proceedings of the 2019 IEEE Wireless Communications and Networking Conference (WCNC), Marrakech, Morocco, 15–19 April 2019; pp. 1–7.
- Hüttenrauch, M.; Šošić, A.; Neumann, G. Guided Deep Reinforcement Learning for Swarm Systems. arXiv 2017, arXiv:1709.06011.
- Samvelyan, M.; Rashid, T.; Schroeder de Witt, C.; Farquhar, G.; Nardelli, N.; Rudner, T.G.J.; Hung, C.M.; Torr, P.H.S.; Foerster, J.; Whiteson, S. The StarCraft Multi-Agent Challenge. In Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems, Montreal, QC, Canada, 13–17 May 2019; pp. 2186–2188.
- Oliehoek, F.A.; Spaan, M.T.J.; Vlassis, N. Optimal and Approximate Q-Value Functions for Decentralized POMDPs. J. Artif. Intell. Res. 2008, 32, 289–353.
- Kraemer, L.; Banerjee, B. Multi-Agent Reinforcement Learning as a Rehearsal for Decentralized Planning. Neurocomputing 2016, 190, 82–94.
- Kim, G.; Chung, W. Tripodal Schematic Control Architecture for Integration of Multi-Functional Indoor Service Robots. IEEE Trans. Ind. Electron. 2006, 53, 1723–1736.
- Tan, M. Multi-Agent Reinforcement Learning: Independent versus Cooperative Agents. In Proceedings of the 10th International Conference on Machine Learning, Amherst, MA, USA, 27–29 June 1993; pp. 330–337.
- Tuyls, K.; Weiss, G. Multiagent Learning: Basics, Challenges, and Prospects. AI Mag. 2012, 33, 41.
- Hausknecht, M.; Stone, P. Deep Recurrent Q-Learning for Partially Observable MDPs. In Proceedings of the 29th AAAI Conference on Artificial Intelligence, Austin, TX, USA, 25–30 January 2015; pp. 29–37.
- Ni, T.; Eysenbach, B.; Salakhutdinov, R. Recurrent Model-Free RL Can Be a Strong Baseline for Many POMDPs. In Proceedings of the 39th International Conference on Machine Learning, Baltimore, MD, USA, 17–23 July 2022; pp. 16691–16723.
- Laurent, G.J.; Matignon, L.; Le Fort-Piat, N. The World of Independent Learners Is Not Markovian. Int. J. Knowl.-Based Intell. Eng. Syst. 2011, 15, 55–64.
- Mordatch, I.; Abbeel, P. Emergence of Grounded Compositional Language in Multi-Agent Populations. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; pp. 1495–1502.
- Foerster, J.N.; Assael, Y.M.; de Freitas, N.; Whiteson, S. Learning to Communicate with Deep Multi-Agent Reinforcement Learning. In Proceedings of the 30th International Conference on Neural Information Processing Systems, Barcelona, Spain, 5–10 December 2016; pp. 2145–2153.
- Sukhbaatar, S.; Szlam, A.; Fergus, R. Learning Multiagent Communication with Backpropagation. In Proceedings of the 30th International Conference on Neural Information Processing Systems, Barcelona, Spain, 5–10 December 2016; pp. 2252–2260.
- Lowe, R.; Foerster, J.; Boureau, Y.L.; Pineau, J.; Dauphin, Y. On the Pitfalls of Measuring Emergent Communication. In Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems, Montreal, QC, Canada, 13–17 May 2019; pp. 693–701.
- Kim, W.; Cho, M.; Sung, Y. Message-Dropout: An Efficient Training Method for Multi-Agent Deep Reinforcement Learning. In Proceedings of the 33rd AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; pp. 6079–6086.
- Sunehag, P.; Lever, G.; Gruslys, A.; Czarnecki, W.M.; Zambaldi, V.; Jaderberg, M.; Lanctot, M.; Sonnerat, N.; Leibo, J.Z.; Tuyls, K.; et al. Value-Decomposition Networks For Cooperative Multi-Agent Learning Based on Team Reward. In Proceedings of the 17th International Conference on Autonomous Agents and Multiagent Systems, Stockholm, Sweden, 10–15 July 2018; Volume 3, pp. 2085–2087.
- Rashid, T.; Samvelyan, M.; Schroeder, C.; Farquhar, G.; Foerster, J.; Whiteson, S. QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning. In Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, 10–15 July 2018; Volume 80, pp. 4295–4304.
- Ha, D.; Dai, A.; Le, Q.V. HyperNetworks. arXiv 2016, arXiv:1609.09106.
- Son, K.; Kim, D.; Kang, W.J.; Hostallero, D.; Yi, Y. QTRAN: Learning to Factorize with Transformation for Cooperative Multi-Agent Reinforcement Learning. In Proceedings of the 36th International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; pp. 5887–5896.
- Rashid, T.; Farquhar, G.; Peng, B.; Whiteson, S. Weighted QMIX: Expanding Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning. In Proceedings of the 34th International Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 6–12 December 2020; pp. 10199–10210.
- Yang, Y.; Hao, J.; Liao, B.; Shao, K.; Chen, G.; Liu, W.; Tang, H. Qatten: A General Framework for Cooperative Multiagent Reinforcement Learning. arXiv 2020, arXiv:2002.03939.
- Wang, J.; Ren, Z.; Liu, T.; Yu, Y.; Zhang, C. QPLEX: Duplex Dueling Multi-Agent Q-Learning. In Proceedings of the International Conference on Learning Representations, Addis Ababa, Ethiopia, 26–30 April 2020.
- Lowe, R.; Wu, Y.; Tamar, A.; Harb, J.; Abbeel, P.; Mordatch, I. Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 6382–6393.
- Iqbal, S.; Sha, F. Actor-Attention-Critic for Multi-Agent Reinforcement Learning. In Proceedings of the 36th International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; pp. 2961–2970.
- Foerster, J.N.; Farquhar, G.; Afouras, T.; Nardelli, N.; Whiteson, S. Counterfactual Multi-Agent Policy Gradients. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; pp. 2974–2982.
- Zhou, M.; Liu, Z.; Sui, P.; Li, Y.; Chung, Y.Y. Learning Implicit Credit Assignment for Cooperative Multi-Agent Reinforcement Learning. In Proceedings of the 34th International Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 6–12 December 2020; pp. 11853–11864.
- Peng, B.; Rashid, T.; Schroeder de Witt, C.; Kamienny, P.A.; Torr, P.; Boehmer, W.; Whiteson, S. FACMAC: Factored Multi-Agent Centralised Policy Gradients. Adv. Neural Inf. Process. Syst. 2021, 34, 12208–12221.
- Wang, Y.; Han, B.; Wang, T.; Dong, H.; Zhang, C. Off-Policy Multi-Agent Decomposed Policy Gradients. arXiv 2020, arXiv:2007.12322.
- Oliehoek, F.A.; Amato, C. A Concise Introduction to Decentralized POMDPs; Springer International Publishing: Cham, Switzerland, 2016; Volume 1.
- Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Graves, A.; Riedmiller, M.; Fidjeland, A.K.; Ostrovski, G.; et al. Human-Level Control through Deep Reinforcement Learning. Nature 2015, 518, 529–533.
- van Hasselt, H.; Guez, A.; Silver, D. Deep Reinforcement Learning with Double Q-Learning. In Proceedings of the 30th AAAI Conference on Artificial Intelligence, Phoenix, AZ, USA, 12–17 February 2016; pp. 2094–2100.
- Wang, Z.; Schaul, T.; Hessel, M.; Hasselt, H.; Lanctot, M.; Freitas, N. Dueling Network Architectures for Deep Reinforcement Learning. In Proceedings of the 33rd International Conference on Machine Learning, New York, NY, USA, 20–22 June 2016; pp. 1995–2003.
- Agarwal, A.; Kumar, S.; Sycara, K. Learning Transferable Cooperative Behavior in Multi-Agent Teams. arXiv 2019, arXiv:1906.01202.
- Naderializadeh, N.; Hung, F.H.; Soleyman, S.; Khosla, D. Graph Convolutional Value Decomposition in Multi-Agent Reinforcement Learning. arXiv 2021, arXiv:2010.04740.
- Sun, W.F.; Lee, C.K.; Lee, C.Y. DFAC Framework: Factorizing the Value Function via Quantile Mixture for Multi-Agent Distributional Q-Learning. In Proceedings of the 38th International Conference on Machine Learning, Virtual, 18–24 July 2021; pp. 9945–9954.
Scenarios | Methods | Average Test Return | Average Test Time
---|---|---|---
Coverage Control | LIV (Ours) | −27.51 | 36.82
Coverage Control | CW-QMIX | −41.61 | 36.83
Coverage Control | OW-QMIX | −44.33 | 38.40
Coverage Control | QMIX | −29.42 | 34.26
Formation Control | LIV (Ours) | −16.15 | 13.70
Formation Control | CW-QMIX | −16.10 | 13.74
Formation Control | OW-QMIX | −16.36 | 14.17
Formation Control | QMIX | −40.15 | 17.82
Methods | Layers | Parameters (Minimum) | Parameters (Maximum)
---|---|---|---
QMIX | 1 | 11,457 | 283,105
Weighted-QMIX | 2 | 157,891 | 1,021,667
Qplex | 6 | 17,165 | 708,581
GraphMix | 5 | 50,034 | 471,026
LIV (Ours) | 2 | 15,747 | 391,395
Citation: Zhao, Z.; Zhang, Y.; Wang, S.; Zhou, Y.; Zhang, R.; Chen, W. Assisted-Value Factorization with Latent Interaction in Cooperate Multi-Agent Reinforcement Learning. Mathematics 2025, 13, 1429. https://doi.org/10.3390/math13091429