MACA: Multi-Agent with Credit Assignment for Computation Offloading in Smart Parks Monitoring
Abstract
1. Introduction
- (1) To handle video monitoring analysis tasks in smart parks, we introduce edge computing nodes and a cloud data center to satisfy the computation offloading requirements. The system model includes multiple devices and multiple edge computing nodes and takes into account the dynamically changing communication channel states and task characteristics. We use reinforcement learning, with offline training and online decision-making, to avoid the prohibitive computation time of traditional methods, which makes computation offloading usable in real-time scenarios.
- (2) To deal with the curse of dimensionality caused by the expansion of the feasible decision region, we introduce a credit assignment method into value-based reinforcement learning, converting the problem from a single-agent scenario into a multi-agent one. The credit assignment method decomposes the global Q-value into individual Q-values and enforces a monotonicity constraint between the global and individual Q-values. Meanwhile, the centralized training and decentralized execution framework uses global state information when training the agents, which makes the agents cooperate more effectively and accelerates training.
- (3) In addition, we introduce a double Q-network, a dueling Q-network, and prioritized experience replay into our proposed multi-agent reinforcement learning algorithm and analyze the contribution of each component via an ablation study. Through numerical simulation, we demonstrate that the proposed MACA algorithm achieves better performance than the traditional DQN algorithm and other approaches, especially as the number of agents increases. We also verify the generalization capability of the proposed MACA algorithm.
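The monotonicity constraint in contribution (2) can be illustrated with a minimal sketch. It is not the MACA mixing network: it assumes a simple linear mixer with fixed, hypothetical weights (a state-dependent mixer, as in monotonic value factorisation methods such as QMIX, would generate them from the global state), and the names `monotonic_mix`, `greedy_joint_action`, and the example Q-tables are illustrative only. What it shows is that when every mixing weight is non-negative, each agent's local greedy choice also maximizes the global Q-value, which is what permits decentralized execution.

```python
from itertools import product


def monotonic_mix(agent_qs, raw_weights, bias):
    """Mix per-agent Q-values into one global Q-value.

    abs() forces every weight to be non-negative, so the global value
    never decreases when an individual Q-value increases.
    """
    return sum(abs(w) * q for w, q in zip(raw_weights, agent_qs)) + bias


def greedy_joint_action(q_tables):
    """Decentralized execution: every agent takes its own argmax action."""
    return [max(range(len(q)), key=q.__getitem__) for q in q_tables]


# Hypothetical Q-values for 3 devices with 3 options each
# (0 = run locally, 1 = edge node 1, 2 = edge node 2).
q_tables = [
    [0.2, 0.9, 0.4],
    [0.7, 0.1, 0.3],
    [0.5, 0.6, 0.8],
]
raw_weights = [-1.5, 0.4, 2.0]  # assumed values; the sign is removed by abs()
bias = 0.1


def global_q(joint_action):
    """Global Q-value of a joint action under the assumed linear mixer."""
    chosen = [q[a] for q, a in zip(q_tables, joint_action)]
    return monotonic_mix(chosen, raw_weights, bias)


greedy = greedy_joint_action(q_tables)

# Centralized check: brute-force search over all 27 joint actions.
best = max(product(range(3), repeat=3), key=global_q)

print("greedy joint action:", greedy)
print("brute-force optimum:", list(best))
```

The brute-force search covers 3^n joint actions for n devices, whereas the per-agent greedy choice needs only n independent argmax operations. This is the dimensionality argument made in contribution (2).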
2. Related Works
2.1. Computation Offloading Task
2.2. Reinforcement Learning
3. The Proposed Approach
3.1. Problem Definition
3.2. Multi-Agent Reinforcement Learning Scenario
3.3. MACA Algorithm Design for Computation Offloading
Algorithm 1: Multi-Agent Deep Reinforcement Learning Algorithm with Credit Assignment.
4. Experiment
4.1. Simulation Settings
4.2. Simulation Results
4.2.1. Training Process
4.2.2. Agent Number Comparison Experiment
4.2.3. Ablation Experiment
4.3. Discussion
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
Notation | Definition |
---|---|
N | The set of camera devices |
f | Edge computing node |
T | The computation offloading cycle |
 | The amount of data that must be uploaded to complete the task |
 | The number of CPU cycles required for the computing task |
 | The channel bandwidth between device n and edge computing node f |
P | The transmit power of the computation offloading task |
h | The gain of the communication channel when the task is transmitted |
 | The variance of the complex white Gaussian channel noise |
 | Time delay of executing the task locally |
 | Local computational capability |
 | Computational capability of the edge computing node |
 | Energy consumption of executing the task locally |
 | Transmission delay when executing the task on the edge node |
 | Computation delay when executing the task on the edge node |
 | Transmission energy consumption when executing the task on the edge node |
 | Computation energy consumption when executing the task on the edge node |
 | The cost of the computation offloading process |
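The notation above does not by itself fix the formulas, so the following is only a sketch of how such quantities are commonly related in computation offloading models, not the paper's exact equations. It assumes a Shannon-capacity uplink rate, a computation energy model of the form `kappa * f**2 * cycles`, and a weighted sum of delay and energy as the cost. The function names, the `kappa` factors, and the weight `w` are hypothetical.

```python
import math


def uplink_rate(bandwidth_hz, tx_power, channel_gain, noise_var):
    """Assumed Shannon-capacity rate in bit/s: B * log2(1 + P*h / sigma^2)."""
    return bandwidth_hz * math.log2(1 + tx_power * channel_gain / noise_var)


def local_execution(cpu_cycles, f_local, kappa_local):
    """Return (delay, energy) for running the task on the device itself."""
    delay = cpu_cycles / f_local
    energy = kappa_local * f_local ** 2 * cpu_cycles  # assumed energy model
    return delay, energy


def edge_execution(data_bits, cpu_cycles, rate, tx_power, f_edge, kappa_edge):
    """Return (delay, energy) for uploading the task and running it on an edge node."""
    t_tx = data_bits / rate        # transmission delay
    t_comp = cpu_cycles / f_edge   # computation delay on the edge node
    e_tx = tx_power * t_tx         # transmission energy
    e_comp = kappa_edge * f_edge ** 2 * cpu_cycles  # assumed energy model
    return t_tx + t_comp, e_tx + e_comp


def offloading_cost(delay, energy, w=0.5):
    """Assumed cost: weighted sum of delay and energy (w is hypothetical)."""
    return w * delay + (1 - w) * energy


# Illustrative numbers only; they are not the paper's simulation settings.
rate = uplink_rate(bandwidth_hz=1e6, tx_power=1.0, channel_gain=3.0, noise_var=1.0)
edge_delay, edge_energy = edge_execution(
    data_bits=2e6, cpu_cycles=1e9, rate=rate,
    tx_power=0.5, f_edge=2e9, kappa_edge=1e-27,
)
local_delay, local_energy = local_execution(
    cpu_cycles=1e9, f_local=1e9, kappa_local=1e-27,
)
print("edge cost :", offloading_cost(edge_delay, edge_energy))
print("local cost:", offloading_cost(local_delay, local_energy))
```

Under these assumptions, an offloading decision amounts to comparing the cost of local execution with the cost of each candidate edge node for the current channel state and task size.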
Parameter | Value | Description |
---|---|---|
 | | The data of the task to be uploaded |
 | | The CPU cycles required for the task |
 | | The computational capacity of the edge computing node |
 | | The computing constant factor of the edge computing node |
 | | The computing constant factor of the local device |
 | | Time-varying channel bandwidth |
h | | The channel gain |
P | | The transmit power |
 | | The learning rate |
 | 128 | The number of samples drawn from the buffer |
 | bit per second | Computing capacity of the edge computing node |
 | bit per second | Local computing capacity |
B | Hz | Channel bandwidth |
 | W | Communication channel noise |
Reward for different numbers of edge computing nodes (f) and devices (n).
Reward | f = 2, Exhaustive Search | f = 2, DQN | f = 2, MACA | f = 3, Exhaustive Search | f = 3, DQN | f = 3, MACA |
---|---|---|---|---|---|---|
n = 2 | 2.537 | 2.273 | 2.176 | 2.655 | 2.412 | 2.432 |
n = 3 | 1.906 | 1.583 | 1.764 | 2.387 | 2.053 | 2.124 |
n = 4 | 1.387 | 1.215 | 1.22 | 2.109 | 1.875 | 2.005 |
n = 5 | 1.138 | 0.965 | 1.104 | 1.995 | 1.748 | 1.869 |
n = 6 | 0.406 | 0.351 | 0.362 | 1.803 | 1.558 | 1.687 |
n = 7 | 0.319 | 0.257 | 0.297 | 1.604 | 1.232 | 1.432 |
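The reward table is easier to compare when each learned method is expressed as a fraction of the exhaustive-search reward. The short script below does this arithmetic on the values copied from the table. It assumes that a higher reward is better, which is consistent with exhaustive search giving the largest value in every row. The dictionary layout and function name are illustrative.

```python
# rewards[f][n] = (exhaustive search, DQN, MACA), copied from the table.
rewards = {
    2: {2: (2.537, 2.273, 2.176), 3: (1.906, 1.583, 1.764),
        4: (1.387, 1.215, 1.22), 5: (1.138, 0.965, 1.104),
        6: (0.406, 0.351, 0.362), 7: (0.319, 0.257, 0.297)},
    3: {2: (2.655, 2.412, 2.432), 3: (2.387, 2.053, 2.124),
        4: (2.109, 1.875, 2.005), 5: (1.995, 1.748, 1.869),
        6: (1.803, 1.558, 1.687), 7: (1.604, 1.232, 1.432)},
}


def fraction_of_optimum(f, n):
    """Return (DQN, MACA) rewards as fractions of the exhaustive-search reward."""
    exhaustive, dqn, maca = rewards[f][n]
    return dqn / exhaustive, maca / exhaustive


# Count the settings in which MACA obtains a higher reward than DQN.
maca_wins = sum(
    1
    for f in rewards
    for n in rewards[f]
    if rewards[f][n][2] > rewards[f][n][1]
)

for f in sorted(rewards):
    for n in sorted(rewards[f]):
        dqn_frac, maca_frac = fraction_of_optimum(f, n)
        print(f"f={f} n={n}  DQN {dqn_frac:.3f}  MACA {maca_frac:.3f}")
print("settings where MACA > DQN:", maca_wins, "of 12")
```

With these values, MACA obtains a higher reward than DQN in 11 of the 12 settings; the exception is n = 2 with f = 2.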
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
She, L.; Wang, J.; Bo, Y.; Zeng, Y. MACA: Multi-Agent with Credit Assignment for Computation Offloading in Smart Parks Monitoring. Mathematics 2022, 10, 4616. https://doi.org/10.3390/math10234616