Optimal User Scheduling in Multi Antenna System Using Multi Agent Reinforcement Learning
Abstract
1. Introduction
2. Related Work
3. Technical Background
3.1. Reinforcement Learning
Algorithm 1: Q-learning algorithm.
Initialize Q(s, a) arbitrarily, with Q(terminal, ·) = 0
Repeat (for each episode):
    Initialize s
    Repeat (for each step of the episode):
        Choose action a from s using a policy derived from Q (e.g., ε-greedy)
        Take action a, observe r and s′
        Q(s, a) ← Q(s, a) + α[r + γ max_{a′} Q(s′, a′) − Q(s, a)]
        s ← s′
    until s is terminal
until convergence
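A minimal tabular sketch of Algorithm 1 in Python follows; the environment interface (reset() returning a state, step(a) returning (next_state, reward, done)), the table sizes, and the hyperparameters α, γ, and ε are illustrative assumptions rather than details fixed by the paper.

```python
import numpy as np

def q_learning(env, n_states, n_actions, episodes=500,
               alpha=0.1, gamma=0.95, epsilon=0.1):
    """Tabular Q-learning (cf. Algorithm 1). `env` is assumed to expose
    reset() -> state and step(action) -> (next_state, reward, done)."""
    Q = np.zeros((n_states, n_actions))  # Q(terminal, .) remains zero
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            # epsilon-greedy action selection derived from Q
            if np.random.rand() < epsilon:
                a = np.random.randint(n_actions)
            else:
                a = int(np.argmax(Q[s]))
            s_next, r, done = env.step(a)
            # TD update: move Q(s, a) toward the bootstrapped target
            target = r + gamma * np.max(Q[s_next]) * (not done)
            Q[s, a] += alpha * (target - Q[s, a])
            s = s_next
    return Q
```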
Algorithm 2: Value iteration algorithm.
Initialize V(s) arbitrarily
Repeat
    Δ ← 0
    For each s ∈ S:
        v ← V(s)
        V(s) ← max_a Σ_{s′,r} p(s′, r | s, a)[r + γV(s′)]
        Δ ← max(Δ, |v − V(s)|)
until Δ < θ (a small positive number)
Output a deterministic policy π such that π(s) = argmax_a Σ_{s′,r} p(s′, r | s, a)[r + γV(s′)]
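Algorithm 2 can be sketched analogously, assuming the MDP dynamics are available as a table P[s][a] of (probability, next_state, reward) tuples; this representation is a hypothetical choice for illustration, not one prescribed by the paper.

```python
import numpy as np

def value_iteration(P, n_states, n_actions, gamma=0.95, theta=1e-6):
    """Value iteration (cf. Algorithm 2). P[s][a] is assumed to be a list
    of (prob, next_state, reward) tuples describing the known dynamics."""
    V = np.zeros(n_states)
    while True:
        delta = 0.0
        for s in range(n_states):
            # Bellman optimality backup over all actions
            q = [sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
                 for a in range(n_actions)]
            best = max(q)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < theta:  # theta is the small positive threshold
            break
    # deterministic greedy policy with respect to the converged V
    policy = [int(np.argmax([sum(p * (r + gamma * V[s2])
                                 for p, s2, r in P[s][a])
                             for a in range(n_actions)]))
              for s in range(n_states)]
    return V, policy
```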
- Take a prior distribution P(X);
- P(X) is the belief about RV X with no data observation;
- Take a statistical model P(Y | X);
- P(Y | X) is the statistical dependence and belief about RV Y given X;
- Make an observation on the data, Y = y;
- Find the posterior distribution P(X | Y = y) using Bayes' rule, P(X | Y = y) = P(Y = y | X) P(X) / P(Y = y), as in [36].
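As a worked example of this prior-to-posterior update, the following sketch applies Bayes' rule to a discrete two-hypothesis RV; the prior and likelihood numbers are invented purely for illustration.

```python
import numpy as np

# Assumed discrete example: prior belief P(X) over two hypotheses
prior = np.array([0.5, 0.5])            # P(X)
likelihood = np.array([[0.9, 0.1],      # P(Y | X = x0)
                       [0.3, 0.7]])     # P(Y | X = x1)

y = 0  # observed data, Y = y
unnormalized = likelihood[:, y] * prior        # P(Y = y | X) P(X)
posterior = unnormalized / unnormalized.sum()  # P(X | Y = y), Bayes' rule
print(posterior)  # belief about X after observing Y = y
```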
3.2. MIMO Communication
4. System Model
4.1. Problem Statement
4.2. Problem Formulation
- State—Each underlying problem has a state or set of states that an RL agent may visit/explore. We map the state at time t as a combination of multiple parameters. First, we consider that each candidate user can determine its transmit beamforming vector using CSI, which is available locally [39]. Second, we use the combination of SINR, Gram-Schmidt orthogonalization, and SLNR as given in Equation (3). Each parameter is weighted equally, normalized between zero and one, and serves as prior information for the BS. The former information is available to each user, while the latter is received at the BS. The normalization is with respect to the maximum SLNR among the users: when a user's SLNR equals this maximum, its normalized value is one.
- Action—Each user (agent) must choose a resource block on which to transmit its data to the BS. Therefore, the action chosen by agent m at time step t can be written as given in Equation (9).
- Reward—The natural objective of any user-selection technique is to enhance system capacity and make optimal use of the available radio resources. A reward function defines the goal in an RL problem. At every time step, the environment feeds back a scalar reward to the RL agent, whose goal is to maximize the total reward it receives over the long run. The reward thus distinguishes good actions from bad ones for the agent [6]. Sum-rate is the metric used to indicate system performance in MU-MIMO systems; therefore, the reward for the RL agent is the aggregated sum-rate of all selected users, as given in Equation (10), following [28]. (A minimal code sketch of this state/action/reward mapping appears after this list.)
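A minimal sketch of the state/action/reward mapping follows; the equal weights, the max-normalization, and the function names are illustrative assumptions, and the exact expressions are those of Equations (3), (9), and (10).

```python
import numpy as np

def state(sinr, gso, slnr):
    """Equally weighted combination of normalized SINR, Gram-Schmidt
    orthogonality, and SLNR per user (cf. Equation (3))."""
    norm = lambda x: x / x.max() if x.max() > 0 else x  # map into [0, 1]
    return (norm(sinr) + norm(gso) + norm(slnr)) / 3.0

def action(q_row, n_blocks, epsilon=0.1):
    """Agent m picks a resource block (cf. Equation (9)), epsilon-greedily
    from its row of the Q-table."""
    if np.random.rand() < epsilon:
        return np.random.randint(n_blocks)
    return int(np.argmax(q_row))

def reward(rates_of_selected_users):
    """Aggregated sum-rate of the selected users (cf. Equation (10))."""
    return float(np.sum(rates_of_selected_users))
```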
4.3. RL-Based User Scheduling
- Each user determines its transmit beamforming vector to quantify the amount of interference from the other users and selects a resource block for data transmission as defined in Equation (9).
- We use a Rayleigh fading channel; that is, the channel gain from any of the BS antennas to a user is described by a ZMCSCG RV. This model is suitable for narrow-band networks operating in non-line-of-sight scenarios [41].
- After receiving feedback from the users in the form of a priority resource block and the amount of interference, the BS computes the scheduling metric at each TTS. The BS then selects a subset of users equal in size to the number of BS antennas and allows them to transmit, based on all the estimated information.
- After selecting a group of users, the aggregate sum-rate is calculated according to Equation (10). This sum-rate acts as feedback (reward); based on the obtained reward, each selected user can choose the most suitable resource block for transmission, which results in optimal utilization of the available resources and an enhancement in system capacity. (One TTS of this loop is sketched below.)
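For illustration, here is a minimal sketch of one TTS of this loop, assuming Nt BS antennas, K candidate users, ZMCSCG (Rayleigh) channels, a placeholder channel-norm metric, and zero-forcing precoding for the sum-rate reward; none of these choices is fixed by the paper beyond what the text above states.

```python
import numpy as np

def tts_round(Nt=4, K=10, snr=10.0):
    """One TTS: generate Rayleigh channels, select Nt of K users by a
    placeholder metric, and compute a zero-forcing sum-rate as reward."""
    # Rayleigh fading: each entry is ZMCSCG with unit variance
    H = (np.random.randn(K, Nt) + 1j * np.random.randn(K, Nt)) / np.sqrt(2)
    metric = np.linalg.norm(H, axis=1)     # placeholder per-user metric
    selected = np.argsort(metric)[-Nt:]    # pick Nt users (one per antenna)
    Hs = H[selected]
    # Zero-forcing effective channel gains for the selected subset
    gains = 1.0 / np.real(np.diag(np.linalg.inv(Hs @ Hs.conj().T)))
    # Aggregate sum-rate as the reward (cf. Equation (10)), equal power split
    sum_rate = np.sum(np.log2(1.0 + (snr / Nt) * gains))
    return selected, sum_rate
```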
5. Results
6. Conclusions
Author Contributions
Funding
Conflicts of Interest
Abbreviations
AI | Artificial Intelligence
BS | Base Station
RV | Random Variable
RA | Resource Allocation
SINR | Signal-to-Interference plus Noise Ratio
POMDP | Partially Observable MDP
SLNR | Signal-to-Leakage plus Noise Ratio
SNR | Signal-to-Noise Ratio
AWGN | Additive White Gaussian Noise
ZF | Zero Forcing
DPC | Dirty Paper Coding
SVD | Singular Value Decomposition
ML | Machine Learning
CSI | Channel State Information
FPA | Flower Pollination Algorithm
TTS | Transmission Time Slot
DL | Deep Learning
DNN | Deep Neural Networks
RL | Reinforcement Learning
MIMO | Multiple-Input Multiple-Output
MU-MIMO | Multi-User MIMO
MDP | Markov Decision Process
References
- Foschini, G. On limits of wireless communication in a fading environment when using multiple antennas. Wirel. Pers. Commun. 1998, 6, 315–335.
- Sarieddeen, H.; Mansour, M.M.; Jalloul, L.M.; Chehab, A. Efficient near optimal joint modulation classification and detection for MU-MIMO systems. In Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Shanghai, China, 20–25 March 2016; pp. 3706–3710.
- Jindal, N.; Goldsmith, A. Dirty-paper coding versus TDMA for MIMO broadcast channels. IEEE Trans. Inf. Theory 2005, 51, 1783–1794.
- Lee, J.; Jindal, N. Dirty paper coding vs. linear precoding for MIMO broadcast channels. In Proceedings of the 2006 Fortieth Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA, USA, 29 October–1 November 2006; pp. 779–783.
- Peel, C.B.; Hochwald, B.M.; Swindlehurst, A.L. A vector-perturbation technique for near-capacity multiantenna multiuser communication-part I: Channel inversion and regularization. IEEE Trans. Commun. 2005, 53, 195–202.
- Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction; MIT Press: Cambridge, MA, USA, 2018.
- Di Sarno, C.; Formicola, V.; Sicuranza, M.; Paragliola, G. Addressing Security Issues of Electronic Health Record Systems through Enhanced SIEM Technology. In Proceedings of the 2013 International Conference on Availability, Reliability and Security, Regensburg, Germany, 2–6 September 2013; pp. 646–653.
- Coronato, A.; Di Napoli, C.; Paragliola, G.; Serino, L. Intelligent Planning of Onshore Touristic Itineraries for Cruise Passengers in a Smart City. In Proceedings of the 2021 17th International Conference on Intelligent Environments (IE), Dubai, United Arab Emirates, 21–24 June 2021; pp. 1–7.
- Coronato, A.; de Pietro, G.; Paragliola, G. A Monitoring System Enhanced by Means of Situation-Awareness for Cognitive Impaired People. In Proceedings of the 8th International Conference on Body Area Networks, Boston, MA, USA, 30 September–2 October 2013; ICST (Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering): Brussels, Belgium, 2013; pp. 124–127.
- Paragliola, G.; Coronato, A.; Naeem, M.; De Pietro, G. A Reinforcement Learning-Based Approach for the Risk Management of e-Health Environments: A Case Study. In Proceedings of the 2018 14th International Conference on Signal-Image Technology & Internet-Based Systems (SITIS), Las Palmas de Gran Canaria, Spain, 26–29 November 2018; pp. 711–716.
- Naeem, M.; Rizvi, S.T.H.; Coronato, A. A Gentle Introduction to Reinforcement Learning and its Application in Different Fields. IEEE Access 2020, 8, 209320–209344.
- Van Chien, T.; Canh, T.N.; Björnson, E.; Larsson, E.G. Power control in cellular massive MIMO with varying user activity: A deep learning solution. IEEE Trans. Wirel. Commun. 2020, 19, 5732–5748.
- Nie, J.; Haykin, S. A Q-learning-based dynamic channel assignment technique for mobile communication systems. IEEE Trans. Veh. Technol. 1999, 48, 1676–1687.
- Bennis, M.; Niyato, D. A Q-learning based approach to interference avoidance in self-organized femtocell networks. In Proceedings of the 2010 IEEE Globecom Workshops, Miami, FL, USA, 6–10 December 2010; pp. 706–710.
- Santos, E.C. A simple reinforcement learning mechanism for resource allocation in LTE-A networks with Markov decision process and Q-learning. arXiv 2017, arXiv:1709.09312.
- Kong, P.Y.; Panaitopol, D. Reinforcement learning approach to dynamic activation of base station resources in wireless networks. In Proceedings of the 2013 IEEE 24th Annual International Symposium on Personal, Indoor, and Mobile Radio Communications (PIMRC), London, UK, 8–11 September 2013; pp. 3264–3268.
- Kurve, A. Multi-user MIMO systems: The future in the making. IEEE Potentials 2009, 28, 37–42.
- Naeem, M.; Bashir, S.; Khan, M.U.; Syed, A.A. Performance comparison of scheduling algorithms for MU-MIMO systems. In Proceedings of the 2016 13th International Bhurban Conference on Applied Sciences and Technology (IBCAST), Islamabad, Pakistan, 12–16 January 2016; pp. 601–606.
- Naeem, M.; Bashir, S.; Khan, M.U.; Syed, A.A. Modified SINR based user selection for MU-MIMO systems. In Proceedings of the 2015 International Conference on Information and Communication Technologies (ICICT), Karachi, Pakistan, 12–13 December 2015; pp. 1–4.
- Wei, C.; Xu, K.; Xia, X.; Su, Q.; Shen, M.; Xie, W.; Li, C. User-centric access point selection in cell-free massive MIMO systems: A game-theoretic approach. IEEE Commun. Lett. 2022, 26, 2225–2229.
- Carvajal, H.; Orozco, N.; Cacuango, S.; Salazar, P.; Rosero, E.; Almeida, F. A Scheduling Scheme for Improving the Performance and Security of MU-MIMO Systems. Sensors 2022, 22, 5369.
- Yin, Z.; Chen, J.; Li, G.; Wang, H.; He, W.; Ni, Y. A Deep Learning-Based User Selection Scheme for Cooperative NOMA System with Imperfect CSI. Wirel. Commun. Mob. Comput. 2022, 2022, 7732029.
- Jang, J.; Lee, H.; Kim, I.M.; Lee, I. Deep Learning for Multi-User MIMO Systems: Joint Design of Pilot, Limited Feedback, and Precoding. IEEE Trans. Commun. 2022, 20, 4044–4057.
- Ahmed, I.; Shahid, M.K.; Khammari, H.; Masud, M. Machine Learning Based Beam Selection with Low Complexity Hybrid Beamforming Design for 5G Massive MIMO Systems. IEEE Trans. Green Commun. Netw. 2021, 5, 2160–2173.
- Salh, A.; Audah, L.; Abdullah, Q.; Shah, N.S.M.; Shipun, A. Energy-efficient low-complexity algorithm in 5G massive MIMO systems. Comput. Mater. Contin. 2021, 67, 3189–3214.
- Perdana, R.H.Y.; Nguyen, T.V.; An, B. Deep Learning-based Power Allocation in Massive MIMO Systems with SLNR and SINR Criterions. In Proceedings of the 2021 Twelfth International Conference on Ubiquitous and Future Networks (ICUFN), Jeju Island, Korea, 17–20 August 2021; pp. 87–92.
- Perdana, R.H.Y.; Nguyen, T.V.; An, B. Deep neural network design with SLNR and SINR criterions for downlink power allocation in multi-cell multi-user massive MIMO systems. ICT Express 2022.
- Xia, X.; Wu, G.; Liu, J.; Li, S. Leakage-based user scheduling in MU-MIMO broadcast channel. Sci. China Ser. F Inf. Sci. 2009, 52, 2259–2268.
- Xia, X.; Wu, G.; Fang, S.; Li, S. SINR or SLNR: Successive user scheduling in MU-MIMO broadcast channel with finite rate feedback. In Proceedings of the 2010 International Conference on Communications and Mobile Computing, Shenzhen, China, 12–14 April 2010; pp. 383–387.
- Naeem, M.; Khan, M.U.; Bashir, S.; Syed, A.A. Modified leakage based user selection for MU-MIMO systems. In Proceedings of the 2015 13th International Conference on Frontiers of Information Technology (FIT), Islamabad, Pakistan, 14–16 December 2015; pp. 320–323.
- Zhao, L.; Li, B.; Meng, K.; Gong, B.; Zhou, Y. A novel user scheduling for multiuser MIMO systems with block diagonalization. In Proceedings of the 2013 IEEE 24th Annual International Symposium on Personal, Indoor, and Mobile Radio Communications (PIMRC), London, UK, 8–11 September 2013; pp. 1371–1375.
- Naeem, M.; Bashir, S.; Ullah, Z.; Syed, A.A. A near optimal scheduling algorithm for efficient radio resource management in multi-user MIMO systems. Wirel. Pers. Commun. 2019, 106, 1411–1427.
- Sharifi, S. A POMDP Framework for Antenna Selection and User Scheduling in Multi-User Massive MIMO Systems. Ph.D. Thesis, Ontario Tech University, Oshawa, ON, Canada, 2022.
- Mohanty, J.; Pattanayak, P.; Nandi, A.; Baishnab, K.L.; Talukdar, F.A. Binary flower pollination algorithm based user scheduling for multiuser MIMO systems. Turk. J. Electr. Eng. Comput. Sci. 2022, 30, 1317–1336.
- Rajarajeswarie, B.; Sandanalakshmi, R. Machine learning based hybrid precoder with user scheduling technique for maximizing sum rate in downlink MU-MIMO system. Int. J. Inf. Technol. 2022, 14, 2399–2405.
- Ghavamzadeh, M.; Mannor, S.; Pineau, J.; Tamar, A. Bayesian Reinforcement Learning: A Survey. Found. Trends Mach. Learn. 2016, 8, 102–109.
- Naeem, M.; De Pietro, G.; Coronato, A. Application of reinforcement learning and deep learning in multiple-input and multiple-output (MIMO) systems. Sensors 2021, 22, 309.
- He, S.; Du, J.; Liao, Y. Multi-User Scheduling for 6G V2X Ultra-Massive MIMO System. Sensors 2021, 21, 6742.
- Lee, B.O.; Je, H.W.; Sohn, I.; Shin, O.S.; Lee, K.B. Interference-Aware Decentralized Precoding for Multicell MIMO TDD Systems. In Proceedings of the IEEE GLOBECOM 2008—2008 IEEE Global Telecommunications Conference, New Orleans, LA, USA, 30 November–4 December 2008; pp. 1–5.
- Dearden, R.; Friedman, N.; Andre, D. Model-Based Bayesian Exploration. arXiv 2013, arXiv:1301.6690.
- Tse, D.; Viswanath, P. Fundamentals of Wireless Communication; Cambridge University Press: Cambridge, UK, 2005.
- Yoo, T.; Goldsmith, A. On the optimality of multiantenna broadcast scheduling using zero-forcing beamforming. IEEE J. Sel. Areas Commun. 2006, 24, 528–541.