Search Results (50)

Search Parameters:
Keywords = bandit problems

25 pages, 974 KiB  
Article
Thompson Sampling for Non-Stationary Bandit Problems
by Han Qi, Fei Guo and Li Zhu
Entropy 2025, 27(1), 51; https://doi.org/10.3390/e27010051 - 9 Jan 2025
Viewed by 365
Abstract
Non-stationary multi-armed bandit (MAB) problems have recently attracted extensive attention. We focus on the abruptly changing scenario where reward distributions remain constant for a certain period and change at unknown time steps. Although Thompson sampling (TS) has shown success in non-stationary settings, there is currently no regret bound analysis for TS with uninformative priors. To address this, we propose two algorithms, discounted TS and sliding-window TS, designed for sub-Gaussian reward distributions. For these algorithms, we establish an upper bound for the expected regret by bounding the expected number of times a suboptimal arm is played. We show that the regret upper bounds of both algorithms are $\tilde{O}(\sqrt{T B_T})$, where $T$ is the time horizon and $B_T$ is the number of breakpoints. This upper bound matches the lower bound for abruptly changing problems up to a logarithmic factor. Empirical comparisons with other non-stationary bandit algorithms highlight the competitive performance of our proposed methods. Full article
(This article belongs to the Section Information Theory, Probability and Statistics)
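
The discounted and sliding-window TS algorithms described above both forget old observations so the posterior can re-adapt after a breakpoint. A minimal sketch of the discounted variant with Gaussian posteriors is shown below; it is not the authors' implementation, and the discount factor gamma, noise scale sigma, and the count floor are illustrative assumptions.

```python
# Discounted Thompson sampling sketch for Gaussian-reward bandits: per-arm
# statistics are decayed by a factor gamma each round, so old observations
# are gradually forgotten and the posterior re-adapts after a breakpoint.
import numpy as np

class DiscountedTS:
    def __init__(self, n_arms, gamma=0.99, sigma=1.0, seed=0):
        self.gamma = gamma              # discount factor (assumed hyperparameter)
        self.sigma = sigma              # assumed reward noise scale
        self.counts = np.zeros(n_arms)  # discounted pull counts
        self.sums = np.zeros(n_arms)    # discounted reward sums
        self.rng = np.random.default_rng(seed)

    def select(self):
        n = np.maximum(self.counts, 1e-3)   # floor so unseen arms sample widely
        mean = self.sums / n
        std = self.sigma / np.sqrt(n)
        return int(np.argmax(self.rng.normal(mean, std)))

    def update(self, arm, reward):
        self.counts *= self.gamma
        self.sums *= self.gamma
        self.counts[arm] += 1.0
        self.sums[arm] += reward

# Toy run on an abruptly changing two-armed problem.
rng = np.random.default_rng(1)
means = np.array([0.0, 1.0])
agent = DiscountedTS(2, gamma=0.98)
for t in range(2000):
    if t == 1000:            # breakpoint: the best arm switches
        means = means[::-1]
    a = agent.select()
    agent.update(a, rng.normal(means[a], 1.0))
```

A sliding-window variant would instead keep only the most recent W rewards per arm and drop older ones outright.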

19 pages, 402 KiB  
Article
A Novel Hyper-Heuristic Algorithm with Soft and Hard Constraints for Causal Discovery Using a Linear Structural Equation Model
by Yinglong Dang, Xiaoguang Gao and Zidong Wang
Entropy 2025, 27(1), 38; https://doi.org/10.3390/e27010038 - 6 Jan 2025
Viewed by 477
Abstract
Artificial intelligence plays an indispensable role in improving productivity and promoting social development, and causal discovery is one of the extremely important research directions in this field. Directed acyclic graphs (DAGs) are the most commonly used tool in causal modeling because of their excellent interpretability and structural properties. However, in the face of insufficient data, the accuracy and efficiency of DAG learning are greatly reduced, resulting in a false perception of causality. As intuitive expert knowledge, structural constraints control DAG learning by limiting the causal relationships between variables, which is expected to solve the above-mentioned problem. However, it is often impossible to build a DAG by relying on expert knowledge alone. To solve this problem, we propose the use of expert knowledge as a hard constraint and the structural prior gained via data learning as a soft constraint. In this paper, we propose a fitness-rate-rank-based multiarmed bandit (FRRMAB) hyper-heuristic that integrates soft and hard constraints into the DAG learning process. For a linear structural equation model (SEM), soft constraints are obtained via partial correlation analysis. The experimental results on different networks show that the proposed method has higher scalability and accuracy. Full article
(This article belongs to the Special Issue Causal Graphical Models and Their Applications)
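
As background for the FRRMAB hyper-heuristic named above, a rough sketch of fitness-rate-rank credit assignment combined with UCB-style operator selection is given below. It follows the general FRRMAB recipe rather than the paper's implementation; the window size, decay factor, and exploration scale are illustrative assumptions.

```python
# FRRMAB-style operator selector sketch: recent fitness-improvement rates are
# kept in a sliding window, aggregated per operator, rank-decayed, normalized,
# and an operator is chosen with a UCB-style rule.
import math
from collections import deque

class FRRMABSelector:
    def __init__(self, n_ops, window=50, decay=1.0, c=5.0):
        self.n_ops = n_ops
        self.window = deque(maxlen=window)   # (operator, fitness improvement rate)
        self.decay, self.c = decay, c

    def credit(self):
        raw = [0.0] * self.n_ops
        used = [0] * self.n_ops
        for op, fir in self.window:
            raw[op] += fir
            used[op] += 1
        # Rank-based decay: the best operator keeps full credit, others are damped.
        order = sorted(range(self.n_ops), key=lambda i: -raw[i])
        decayed = [0.0] * self.n_ops
        for rank, op in enumerate(order):
            decayed[op] = (self.decay ** rank) * raw[op]
        total = sum(decayed) or 1.0
        return [d / total for d in decayed], used

    def select(self):
        frr, used = self.credit()
        if 0 in used:                         # try every operator once first
            return used.index(0)
        n = sum(used)
        return max(range(self.n_ops),
                   key=lambda i: frr[i] + self.c * math.sqrt(2 * math.log(n) / used[i]))

    def update(self, op, fir):
        self.window.append((op, fir))
```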

20 pages, 351 KiB  
Article
Multilevel Constrained Bandits: A Hierarchical Upper Confidence Bound Approach with Safety Guarantees
by Ali Baheri
Mathematics 2025, 13(1), 149; https://doi.org/10.3390/math13010149 - 3 Jan 2025
Viewed by 589
Abstract
The multi-armed bandit (MAB) problem is a foundational model for sequential decision-making under uncertainty. While MAB has proven valuable in applications such as clinical trials and online advertising, traditional formulations have limitations; specifically, they struggle to handle three key real-world scenarios: (1) when decisions must follow a hierarchical structure (as in autonomous systems where high-level strategy guides low-level actions); (2) when there are constraints at multiple levels of decision-making (such as both system-wide and component-level resource limits); and (3) when available actions depend on previous choices or context. To address these challenges, we introduce the hierarchical constrained bandits (HCB) framework, which extends contextual bandits to incorporate both hierarchical decisions and multilevel constraints. We propose the HC-UCB (hierarchical constrained upper confidence bound) algorithm to solve the HCB problem. The algorithm uses confidence bounds within a hierarchical setting to balance exploration and exploitation while respecting constraints at all levels. Our theoretical analysis establishes that HC-UCB achieves sublinear regret, guarantees constraint satisfaction at all hierarchical levels, and is near-optimal in terms of achievable performance. Simple experimental results demonstrate the algorithm’s effectiveness in balancing reward maximization with constraint satisfaction. Full article
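
As a generic illustration of the two ingredients HC-UCB combines, hierarchical decision levels and constraint screening, one can stack two ordinary UCB1 learners and filter low-level actions against a known cost budget. This is my own simplification, not the paper's HC-UCB; the cost table and reward model are hypothetical.

```python
# Two-level UCB sketch: a high-level bandit picks a strategy, a per-strategy
# low-level bandit picks an action, and actions whose known cost exceeds the
# budget are screened out before selection.
import math, random

class UCB1:
    def __init__(self, n):
        self.n, self.counts, self.means = n, [0] * n, [0.0] * n

    def select(self, allowed=None):
        allowed = list(allowed) if allowed is not None else list(range(self.n))
        for i in allowed:
            if self.counts[i] == 0:      # play each allowed arm once first
                return i
        t = sum(self.counts)
        return max(allowed, key=lambda i: self.means[i]
                   + math.sqrt(2 * math.log(t) / self.counts[i]))

    def update(self, i, r):
        self.counts[i] += 1
        self.means[i] += (r - self.means[i]) / self.counts[i]

n_strategies, n_actions, budget = 2, 3, 0.7
top = UCB1(n_strategies)
low = [UCB1(n_actions) for _ in range(n_strategies)]
known_cost = [[0.2, 0.5, 0.9], [0.1, 0.8, 0.4]]   # hypothetical per-action costs

for t in range(1000):
    s = top.select()
    feasible = [a for a in range(n_actions) if known_cost[s][a] <= budget]
    a = low[s].select(feasible)
    reward = random.gauss(0.3 * s + 0.2 * a, 0.1)  # toy reward model
    low[s].update(a, reward)
    top.update(s, reward)
```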

30 pages, 1715 KiB  
Article
Multi-Armed Bandit Approaches for Location Planning with Dynamic Relief Supplies Allocation Under Disaster Uncertainty
by Jun Liang, Zongjia Zhang and Yanpeng Zhi
Smart Cities 2025, 8(1), 5; https://doi.org/10.3390/smartcities8010005 - 25 Dec 2024
Viewed by 577
Abstract
Natural disasters (e.g., floods, earthquakes) significantly impact citizens, economies, and the environment worldwide. Due to their sudden onset, devastating effects, and high uncertainty, it is crucial for emergency departments to take swift action to minimize losses. Among these actions, planning the locations of relief supply distribution centers and dynamically allocating supplies is paramount, as governments must prioritize citizens’ safety and basic living needs following disasters. To address this challenge, this paper develops a three-layer emergency logistics network to manage the flow of emergency materials, from warehouses to transfer stations to disaster sites. A bi-objective, multi-period stochastic integer programming model is proposed to solve the emergency location, distribution, and allocation problem under uncertainty, focusing on three key decisions: transfer station selection, upstream emergency material distribution, and downstream emergency material allocation. We introduce a multi-armed bandit algorithm, named the Geometric Greedy algorithm, to optimize transfer station planning while accounting for subsequent dynamic relief supply distribution and allocation in a stochastic environment. The new algorithm is compared with two widely used multi-armed bandit algorithms: the ϵ-Greedy algorithm and the Upper Confidence Bound (UCB) algorithm. A case study in the Futian District of Shenzhen, China, demonstrates the practicality of our model and algorithms. The results show that the Geometric Greedy algorithm excels in both computational efficiency and convergence stability. This research offers valuable guidelines for emergency departments in optimizing the layout and flow of emergency logistics networks. Full article
(This article belongs to the Section Applied Science and Humanities for Smart Cities)
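
For context on the comparison above, the ϵ-Greedy baseline reduces to a few lines; this is the generic form with an illustrative ϵ, not the Geometric Greedy variant proposed in the paper.

```python
# Generic epsilon-greedy selection: with probability eps explore a random
# candidate, otherwise pick the candidate with the best empirical mean so far.
import random

def epsilon_greedy_select(means, counts, eps=0.1):
    """means/counts: running per-candidate statistics; returns a candidate index."""
    if random.random() < eps or not any(counts):
        return random.randrange(len(means))
    return max(range(len(means)), key=lambda i: means[i])
```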

21 pages, 653 KiB  
Article
Non-Myopic Beam Scheduling for Multiple Smart-Target Tracking in Phased Array Radar Networks
by Yuhang Hao, Zengfu Wang, José Niño-Mora, Jing Fu, Quan Pan and Min Yang
Sensors 2024, 24(23), 7755; https://doi.org/10.3390/s24237755 - 4 Dec 2024
Viewed by 637
Abstract
This paper addresses beam scheduling for tracking multiple smart targets in phased array radar networks, aiming to mitigate the performance degradation in previous myopic scheduling methods and enhance the tracking performance, which is measured by a discounted cost objective related to the tracking error covariance (TEC) of the targets. The scheduling problem is formulated as a restless multi-armed bandit problem, where each bandit process is associated with a target and its TEC states evolve with different transition rules for different actions, i.e., either the target is tracked or not. However, non-linear measurement functions necessitate the inclusion of dynamic state information for updating future multi-step TEC states. To compute the marginal productivity (MP) index, the unscented sampling method is employed to predict dynamic and TEC states. Consequently, an unscented sampling-based MP (US-MP) index policy is proposed for selecting targets to track at each time step, which is applicable to large networks with a realistic number of targets. Numerical evidence shows that the bandit model with the scalar Kalman filter satisfies sufficient conditions for indexability based upon partial conservation laws, and extensive simulations validate the effectiveness of the proposed US-MP policy in practical scenarios with TEC states. Full article
(This article belongs to the Section Radar Sensors)
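
Stripped of the index computation itself, which the paper obtains via unscented prediction of the TEC state, the structure of such an index policy is simply "score every target, track the top m". A bare-bones skeleton under that abstraction:

```python
# Index-policy skeleton for a restless bandit: compute a priority index per
# target from its current state and track the m highest-priority targets.
# The index function is a placeholder, not the paper's MP index.
def index_policy_step(states, index_fn, m):
    """states: per-target states; index_fn: state -> priority; m: beams available."""
    ranked = sorted(range(len(states)), key=lambda i: index_fn(states[i]), reverse=True)
    return set(ranked[:m])   # indices of the targets to track this step
```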

38 pages, 1053 KiB  
Article
Thompson Sampling for Stochastic Bandits with Noisy Contexts: An Information-Theoretic Regret Analysis
by Sharu Theresa Jose and Shana Moothedath
Entropy 2024, 26(7), 606; https://doi.org/10.3390/e26070606 - 17 Jul 2024
Cited by 2 | Viewed by 1085
Abstract
We study stochastic linear contextual bandits (CB) where the agent observes a noisy version of the true context through a noise channel with unknown channel parameters. Our objective is to design an action policy that can “approximate” that of a Bayesian oracle that has access to the reward model and the noise channel parameter. We introduce a modified Thompson sampling algorithm and analyze its Bayesian cumulative regret with respect to the oracle action policy via information-theoretic tools. For Gaussian bandits with Gaussian context noise, our information-theoretic analysis shows that under certain conditions on the prior variance, the Bayesian cumulative regret scales as $\tilde{O}(m\sqrt{T})$, where $m$ is the dimension of the feature vector and $T$ is the time horizon. We also consider the problem setting where the agent observes the true context with some delay after receiving the reward, and show that delayed true contexts lead to lower regret. Finally, we empirically demonstrate the performance of the proposed algorithms against baselines. Full article
(This article belongs to the Special Issue Information Theoretic Learning with Its Applications)
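
The primitive underlying this work is Gaussian linear Thompson sampling; a compact sketch of that standard baseline is given below, without the paper's modification for noisy contexts and with hypothetical prior and noise variances.

```python
# Gaussian linear Thompson sampling sketch: maintain a Gaussian posterior over
# the reward parameter, sample a parameter each round, and act greedily
# against the observed context vectors.
import numpy as np

class LinearTS:
    def __init__(self, dim, prior_var=1.0, noise_var=1.0, seed=0):
        self.A = np.eye(dim) / prior_var    # posterior precision matrix
        self.b = np.zeros(dim)              # precision-weighted mean
        self.noise_var = noise_var
        self.rng = np.random.default_rng(seed)

    def select(self, contexts):
        # contexts: array of shape (n_actions, dim)
        cov = np.linalg.inv(self.A)
        theta = self.rng.multivariate_normal(cov @ self.b, cov)
        return int(np.argmax(contexts @ theta))

    def update(self, x, reward):
        self.A += np.outer(x, x) / self.noise_var
        self.b += reward * x / self.noise_var
```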

14 pages, 4717 KiB  
Article
Exploring Multi-Armed Bandit (MAB) as an AI Tool for Optimising GMA-WAAM Path Planning
by Rafael Pereira Ferreira, Emil Schubert and Américo Scotti
J. Manuf. Mater. Process. 2024, 8(3), 99; https://doi.org/10.3390/jmmp8030099 - 15 May 2024
Viewed by 1570
Abstract
Conventional path-planning strategies for GMA-WAAM may encounter challenges related to geometrical features when printing complex-shaped builds. One alternative to mitigate geometry-related flaws is to use algorithms that optimise trajectory choices—for instance, using heuristics to find the most efficient trajectory. The algorithm can assess several trajectory strategies, such as contour, zigzag, raster, and even space-filling, to search for the best strategy according to the case. However, handling complex geometries by this means poses computational efficiency concerns. This research aimed to explore the potential of machine learning techniques as a solution to increase the computational efficiency of such algorithms. First, reinforcement learning (RL) concepts are introduced and compared with supervised machine learning concepts. The Multi-Armed Bandit (MAB) problem is explained and justified as a choice within the RL techniques. As a case study, a space-filling strategy was chosen to have this machine learning optimisation artifice in its algorithm for GMA-AM printing. Computational and experimental validations were conducted, demonstrating that adding MAB to the algorithm helped to achieve shorter trajectories, using fewer iterations than the original algorithm, potentially reducing printing time. These findings position the RL techniques, particularly MAB, as a promising machine learning solution to address setbacks in the space-filling strategy applied. Full article
(This article belongs to the Special Issue Advances in Directed Energy Deposition Additive Manufacturing)

21 pages, 2816 KiB  
Article
Reinforcement Learning-Based Resource Allocation and Energy Efficiency Optimization for a Space–Air–Ground-Integrated Network
by Zhiyu Chen, Hongxi Zhou, Siyuan Du, Jiayan Liu, Luyang Zhang and Qi Liu
Electronics 2024, 13(9), 1792; https://doi.org/10.3390/electronics13091792 - 6 May 2024
Cited by 1 | Viewed by 1481
Abstract
With the construction and development of the smart grid, the power business puts higher requirements on the communication capability of the network. In order to improve the energy efficiency of the space–air–ground-integrated power three-dimensional fusion communication network, we establish an optimization problem for joint air platform (AP) flight path selection, ground power facility (GPF) association, and power control. In solving the problem, we decompose it into two subproblems: the AP flight path selection subproblem and the GPF association and power control subproblem. Firstly, based on the GPF distribution and throughput weights, we model the AP flight path selection subproblem as a Markov Decision Process (MDP) and propose a multi-agent iterative optimization algorithm based on the comprehensive judgment of GPF positions and workload. Secondly, we model the GPF association and power control subproblem as a multi-agent, time-varying K-armed bandit model and propose an algorithm based on multi-agent Temporal Difference (TD) learning. Then, by alternately iterating between the two subproblems, we propose a reinforcement learning (RL)-based joint optimization algorithm. Finally, the simulation results indicate that, compared to the three baseline algorithms (random path, average transmit power, and random device association), the proposed algorithm improves the overall energy efficiency of the system by 16.23%, 86.29%, and 5.11%, respectively, under various conditions (including different noise power levels, GPF bandwidths, and GPF quantities). Full article
(This article belongs to the Special Issue 5G and 6G Wireless Systems: Challenges, Insights, and Opportunities)

18 pages, 1434 KiB  
Article
Dynamic Grouping within Minimax Optimal Strategy for Stochastic Multi-Armed Bandits in Reinforcement Learning Recommendation
by Jiamei Feng, Junlong Zhu, Xuhui Zhao and Zhihang Ji
Appl. Sci. 2024, 14(8), 3441; https://doi.org/10.3390/app14083441 - 18 Apr 2024
Viewed by 967
Abstract
The multi-armed bandit (MAB) problem is a typical problem of exploration and exploitation. As a classical MAB problem, the stochastic multi-armed bandit (SMAB) is the basis of reinforcement learning recommendation. However, most existing SMAB and MAB algorithms have two limitations: (1) they do not make full use of feedback from the environment or agent, such as the number of arms and rewards contained in user feedback; (2) they overlook the utilization of different action selections, which can affect the exploration and exploitation of the algorithm. These limitations motivate us to propose a novel dynamic grouping within the minimax optimal strategy in the stochastic case (DG-MOSS) algorithm for reinforcement learning recommendation in small and medium-sized data scenarios. DG-MOSS does not require additional contextual data and can be used for recommendation of various types of data. Specifically, we designed a new exploration calculation method based on dynamic grouping which uses the feedback information automatically in the selection process and adopts different action selections. To train the algorithm thoroughly, we designed an adaptive episode length to effectively improve the training efficiency. We also analyzed and proved the upper bound of DG-MOSS’s regret. Our experimental results on datasets of different scales, densities, and fields show that, when sufficiently trained, DG-MOSS yields greater rewards than nine baselines and demonstrates better robustness. Full article
(This article belongs to the Section Computing and Artificial Intelligence)
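
For reference, the MOSS index that DG-MOSS builds on scores each arm by its empirical mean plus an exploration bonus that vanishes once the arm has been pulled more than its fair share of the horizon. A minimal sketch follows; the dynamic-grouping and adaptive-episode components of the paper are not reproduced.

```python
# Standard MOSS index and selection rule: exploration bonus shrinks to zero
# once an arm has been pulled more than horizon / n_arms times.
import math

def moss_index(mean_k, pulls_k, horizon, n_arms):
    bonus = math.sqrt(max(math.log(horizon / (n_arms * pulls_k)), 0.0) / pulls_k)
    return mean_k + bonus

def moss_select(means, pulls, horizon):
    k = len(means)
    for i, n in enumerate(pulls):       # play each arm once first
        if n == 0:
            return i
    return max(range(k), key=lambda i: moss_index(means[i], pulls[i], horizon, k))
```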

19 pages, 773 KiB  
Article
Distributed Data-Driven Learning-Based Optimal Dynamic Resource Allocation for Multi-RIS-Assisted Multi-User Ad-Hoc Network
by Yuzhu Zhang and Hao Xu
Algorithms 2024, 17(1), 45; https://doi.org/10.3390/a17010045 - 19 Jan 2024
Cited by 2 | Viewed by 2401
Abstract
This study investigates the problem of decentralized dynamic resource allocation optimization for ad-hoc network communication with the support of reconfigurable intelligent surfaces (RIS), leveraging a reinforcement learning framework. In the present context of cellular networks, device-to-device (D2D) communication stands out as a promising technique to enhance the spectrum efficiency. Simultaneously, RIS have gained considerable attention due to their ability to enhance the quality of dynamic wireless networks by maximizing the spectrum efficiency without increasing the power consumption. However, prevalent centralized D2D transmission schemes require global information, leading to a significant signaling overhead. Conversely, existing distributed schemes, while avoiding the need for global information, often demand frequent information exchange among D2D users, falling short of achieving global optimization. This paper introduces a framework comprising an outer loop and inner loop. In the outer loop, decentralized dynamic resource allocation optimization has been developed for self-organizing network communication aided by RIS. This is accomplished through the application of a multi-player multi-armed bandit approach, which yields strategies for RIS and resource block selection. Notably, these strategies operate without requiring signal interaction during execution. Meanwhile, in the inner loop, the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm has been adopted for cooperative learning with neural networks (NNs) to obtain optimal transmit power control and RIS phase shift control for multiple users, with a specified RIS and resource block selection policy from the outer loop. Through the utilization of optimization theory, distributed optimal resource allocation can be attained as the outer and inner reinforcement learning algorithms converge over time. Finally, a series of numerical simulations are presented to validate and illustrate the effectiveness of the proposed scheme. Full article
(This article belongs to the Collection Parallel and Distributed Computing: Algorithms and Applications)

15 pages, 4801 KiB  
Article
An Intelligent Control and a Model Predictive Control for a Single Landing Gear Equipped with a Magnetorheological Damper
by Quang-Ngoc Le, Hyeong-Mo Park, Yeongjin Kim, Huy-Hoang Pham, Jai-Hyuk Hwang and Quoc-Viet Luong
Aerospace 2023, 10(11), 951; https://doi.org/10.3390/aerospace10110951 - 11 Nov 2023
Cited by 3 | Viewed by 1783
Abstract
Aircraft landing gear equipped with a magnetorheological (MR) damper is a semi-active system that contains nonlinear behavior, disturbances, uncertainties, and delay times that can have a huge impact on the landing performance. To solve this problem, this paper adopts two types of controllers, an intelligent controller and a model predictive controller, for a landing gear equipped with an MR damper to improve the landing gear performance considering response time in different landing cases. A model predictive controller is built based on the mathematical model of the landing gear system. An intelligent controller based on a neural network is designed and trained using a greedy bandit algorithm to improve the shock absorber efficiency at different aircraft masses and sink speeds. In this MR damper, the response time is assumed to be constant at 20 ms, which is similar to the response time of a commercial MR damper. To verify the efficiency of the proposed controllers, numerical simulations comparing them with a passive damper and a skyhook controller in different landing cases are executed. The major finding indicates that the suggested controller performs better in various landing scenarios than other controllers in terms of shock absorber effectiveness and adaptability. Full article

31 pages, 1146 KiB  
Article
DELOFF: Decentralized Learning-Based Task Offloading for Multi-UAVs in U2X-Assisted Heterogeneous Networks
by Anqi Zhu, Huimin Lu, Mingfang Ma, Zongtan Zhou and Zhiwen Zeng
Drones 2023, 7(11), 656; https://doi.org/10.3390/drones7110656 - 1 Nov 2023
Cited by 7 | Viewed by 2365
Abstract
With multiple sensors embedded, flexible unmanned aerial vehicles (UAVs) can collect sensory data and provide various services for all walks of life. However, limited computing capability and battery energy put a great burden on UAVs to handle emerging compute-intensive applications, necessitating them to resort to innovative computation offloading techniques to guarantee quality of service. Existing research mainly focuses on solving the offloading problem under known global information, or applying centralized offloading frameworks when facing dynamic environments. Yet, the maneuverability of today’s UAVs, their large-scale clustering, and their increasing operation in environments with unrevealed information pose huge challenges to previous work. In this paper, in order to enhance the long-term offloading performance and scalability for multi-UAVs, we develop a decentralized offloading scheme named DELOFF with the support of mobile edge computing (MEC). DELOFF considers the information uncertainty caused by the dynamic environment, uses UAV-to-everything (U2X)-assisted heterogeneous networks to extend network resources and offloading flexibility, and tackles the joint strategy making related to computation mode, network selection, and offloading allocation for multi-UAVs. Specifically, the optimization problem of multi-UAVs is addressed by the proposed offloading algorithm based on a multi-armed bandit learning model, where each UAV itself can adaptively assess the offloading link quality through the designed fuzzy logic-based pre-screening mechanism. The convergence and effectiveness of the proposed DELOFF scheme are also demonstrated in simulations. The results confirm that DELOFF is superior to the four benchmarks in many respects, such as reduced consumed energy and delay in the task completion of UAVs. Full article
(This article belongs to the Special Issue Edge Computing and IoT Technologies for Drones)

28 pages, 686 KiB  
Article
Age of Information Cost Minimization with No Buffers, Random Arrivals and Unreliable Channels: A PCL-Indexability Analysis
by José Niño-Mora
Mathematics 2023, 11(20), 4394; https://doi.org/10.3390/math11204394 - 23 Oct 2023
Cited by 1 | Viewed by 1145
Abstract
Over the last decade, the Age of Information has emerged as a key concept and metric for applications where the freshness of sensor-provided data is critical. Limited transmission capacity has motivated research on the design of tractable policies for scheduling information updates to minimize Age of Information cost based on Markov decision models, in particular on the restless multi-armed bandit problem (RMABP). This allows the use of Whittle’s popular index policy, which is often nearly optimal, provided indexability (index existence) is proven, which has been recently accomplished in some models. We aim to extend the application scope of Whittle’s index policy to a broader AoI scheduling model. We address a model with no buffers, incorporating random packet arrivals, unreliable channels, and nondecreasing AoI costs. We use sufficient indexability conditions based on partial conservation laws previously introduced by the author to establish the model’s indexability and evaluate its Whittle index in closed form under discounted and average cost criteria. We further use the index formulae to draw insights on how scheduling priority depends on model parameters. Full article
(This article belongs to the Section D1: Probability and Statistics)

15 pages, 332 KiB  
Article
Spectrum Allocation and User Scheduling Based on Combinatorial Multi-Armed Bandit for 5G Massive MIMO
by Jian Dou, Xuan Liu, Shuang Qie, Jiayi Li and Chaoliang Wang
Sensors 2023, 23(17), 7512; https://doi.org/10.3390/s23177512 - 29 Aug 2023
Viewed by 1070
Abstract
As a key 5G technology, massive multiple-input multiple-output (MIMO) can effectively improve system capacity and reduce latency. This paper proposes a user scheduling and spectrum allocation method based on combinatorial multi-armed bandit (CMAB) for a massive MIMO system. Compared with traditional methods, the proposed CMAB-based method can avoid channel estimation for all users, significantly reduce pilot overhead, and improve spectral efficiency. Specifically, the proposed method is a two-stage method; in the first stage, we transform the user scheduling problem into a CMAB problem, with each user being referred to as a base arm and the energy of the channel being considered a reward. A linear upper confidence bound (UCB) arm selection algorithm is proposed. It is proved that the proposed user scheduling algorithm experiences logarithmic regret over time. In the second stage, by grouping the statistical channel state information (CSI), such that the statistical CSI of the users in the angular domain in different groups is approximately orthogonal, we are able to select one user in each group and allocate a subcarrier to the selected users, so that the channels of users on each subcarrier are approximately orthogonal, which can reduce the inter-user interference and improve the spectral efficiency. The simulation results validate that the proposed method has a high spectral efficiency. Full article
(This article belongs to the Special Issue Dynamic Spectrum Sharing for Future Wireless Systems)
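
A schematic sketch of the first-stage idea, selecting a top-M set of users by optimistic channel-energy scores, is given below. It is a generic combinatorial-UCB form; the scoring constant is an illustrative assumption and this is not the paper's linear UCB algorithm.

```python
# Combinatorial UCB-style scheduling sketch: each user (base arm) gets an
# optimistic score from its empirical channel energy, and the top-M users are
# selected as the super arm for this round.
import math

def cucb_schedule(means, counts, t, m):
    """means/counts: per-user statistics; t: 1-based round index; m: users to schedule."""
    def score(i):
        if counts[i] == 0:
            return float("inf")         # force initial exploration
        return means[i] + math.sqrt(1.5 * math.log(t) / counts[i])
    return sorted(range(len(means)), key=score, reverse=True)[:m]
```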

18 pages, 1298 KiB  
Article
Budgeted Bandits for Power Allocation and Trajectory Planning in UAV-NOMA Aided Networks
by Ramez Hosny, Sherief Hashima, Ehab Mahmoud Mohamed, Rokaia M. Zaki and Basem M. ElHalawany
Drones 2023, 7(8), 518; https://doi.org/10.3390/drones7080518 - 7 Aug 2023
Cited by 4 | Viewed by 1599
Abstract
Combining Unmanned Aerial Vehicles (UAVs) and Non-Orthogonal Multiple Access (NOMA) is a remarkable direction to sustain the exponentially growing traffic requirements of the forthcoming Sixth Generation (6G) networks. In this paper, we investigate effective Power Allocation (PA) and a Trajectory Planning Algorithm (TPA) for UAV-aided NOMA systems to assist multiple survivors in a post-disaster scenario, where ground stations are malfunctioning. Here, the UAV maneuvers to collect data from survivors, which are grouped in multiple clusters within the disaster area, to satisfy their traffic demands. The problem is formulated as a Budgeted Multi-Armed Bandit (BMAB) that optimizes the UAV trajectory and minimizes battery consumption, although challenges may arise in real-world scenarios. Herein, the UAV is the bandit player, the disaster area clusters are the bandit arms, the sum rate of each cluster is the payoff, and the UAV energy consumption is the budget. Hence, to tackle these challenges, two Upper Confidence Bound (UCB) BMAB schemes are leveraged, namely BUCB1 and BUCB2. Simulation results confirm the superior performance of the proposed BMAB solution against benchmark solutions for UAV-aided NOMA communication. Notably, the BMAB-NOMA solution exhibits remarkable improvements, achieving a 60% enhancement in the total number of assisted survivors, an 80% improvement in convergence speed, and considerable energy savings compared to UAV-OMA. Full article
(This article belongs to the Special Issue AI-Powered Energy-Efficient UAV Communications)
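
In the same spirit as the BMAB formulation above, a rough generic sketch of a budgeted UCB loop is given below: clusters are ranked by an optimistic reward-per-cost index and visited until the energy budget runs out. BUCB1 and BUCB2 themselves are not reproduced; the index form, constants, and the toy reward and cost functions are illustrative assumptions.

```python
# Generic budgeted UCB loop: maintain empirical reward and cost means per arm,
# rank arms by a reward-per-cost index plus an exploration bonus, and stop
# once the cumulative cost exceeds the budget.
import math, random

def budgeted_ucb(reward_fn, cost_fn, n_arms, budget):
    pulls = [0] * n_arms
    r_mean = [0.0] * n_arms
    c_mean = [0.0] * n_arms
    spent, t = 0.0, 0
    while spent < budget:
        t += 1
        if t <= n_arms:
            a = t - 1                                   # initial sweep over arms
        else:
            a = max(range(n_arms), key=lambda i:
                    r_mean[i] / max(c_mean[i], 1e-6)
                    + math.sqrt(2 * math.log(t) / pulls[i]))
        r, c = reward_fn(a), cost_fn(a)
        pulls[a] += 1
        r_mean[a] += (r - r_mean[a]) / pulls[a]
        c_mean[a] += (c - c_mean[a]) / pulls[a]
        spent += c

# Toy usage with hypothetical per-cluster reward and cost models.
budgeted_ucb(lambda a: random.random() * (a + 1),
             lambda a: 0.5 + 0.1 * a, n_arms=3, budget=50.0)
```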
