Article

Adaptive Resource Allocation for Emergency Communications with Unmanned Aerial Vehicle-Assisted Free Space Optical/Radio Frequency Relay System

by Yuanmo Lin 1,2, Yuxun Ai 2, Zhiyong Xu 1,*, Jingyuan Wang 1 and Jianhua Li 1

1 College of Communications Engineering, Army Engineering University of PLA, Nanjing 210000, China
2 College of Mechanical Electrical and Information Engineering, Putian University, Putian 351100, China
* Author to whom correspondence should be addressed.
Photonics 2024, 11(8), 754; https://doi.org/10.3390/photonics11080754
Submission received: 9 July 2024 / Revised: 5 August 2024 / Accepted: 6 August 2024 / Published: 13 August 2024
(This article belongs to the Special Issue Coherent Transmission Systems in Optical Wireless Communication)

Abstract: This paper investigates the problem of coordinated resource allocation for multiple unmanned aerial vehicles (UAVs) to address the scarcity of communication resources in disaster-affected areas. UAVs carrying free space optical (FSO) and radio frequency (RF) modules serve as relay nodes and edge offloading nodes, forming an FSO/RF dual-hop framework. Considering the varying urgency levels of tasks, we assign task priorities and transform the proposed problem into a distributed collaborative optimization problem. Building on the K-means algorithm and the multi-agent deep deterministic policy gradient (MADDPG) algorithm, we propose a UAV-coordinated K-means MADDPG (KMADDPG) to maximize the number of completed tasks while prioritizing high-priority tasks. Simulation results show that KMADDPG outperforms the benchmark DRL methods by 5% to 10% in convergence performance.

1. Introduction

Natural disasters often severely damage ground communication infrastructure, such as base stations, causing traditional communication networks to malfunction and hindering emergency response efforts. These events create a critical need for the efficient and rapid deployment of temporary communication networks to facilitate rescue operations and mitigate losses [1]. A primary issue caused by natural disasters is resource scarcity, which applies to communication networks as well, so the efficient utilization of network resources is a top priority. To prevent losses from network collapse, researchers are focusing on rapidly establishing temporary networks [2,3,4].
Due to their flexibility, low cost, and independence from ground conditions, UAVs are widely used in smart city applications such as urban sensing and delivery [5,6]. UAV-assisted communication networks do not rely on ground infrastructure and can be quickly established as temporary networks using UAVs and FSO communication. FSO communication offers high bandwidth, high speed, and license-free operation by transmitting data through the air using visible or infrared light [7]. Its immunity to electromagnetic interference and highly focused beam ensure communication security [8]. The rapid deployment capability and low installation and maintenance costs of FSO systems make them well suited to emergency communication recovery and complex geographical environments, particularly urban network interconnection, temporary communication facilities, military communication, and post-disaster emergency communication. Therefore, under bandwidth constraints, adopting FSO technology significantly improves the resource utilization of communication systems [9], and it is gradually becoming indispensable in network construction.
However, adverse weather severely affects the FSO backhaul, reducing the reliability of FSO links [10]. In contrast, RF technology works in various environments, is largely unaffected by weather, and suits both indoor and outdoor communication scenarios [11]. To handle occasional foggy conditions, it is more practical to quickly deploy emergency temporary networks than to construct expensive permanent parallel RF links [12]. UAVs can thus be employed to serve some users, reducing the load on ground base stations (GBSs). We therefore consider a dual-hop FSO/RF model with two channels: from mobile users (MUs) to UAVs and from UAVs to GBSs.
When UAV-assisted communication networks operate in disaster areas, providing equal service to all user devices is unreasonable [13]. When users make communication requests and emergency rescue requests, their urgency levels are different. A more reasonable strategy is for UAVs to prioritize higher-urgency tasks, so we need to incorporate priority levels to distinguish different tasks [14].
To further address the limited terminal computing power and meet MUs' Quality-of-Service (QoS) requirements, we consider using Mobile Edge Computing (MEC) technology in emergency communication. In this mode, MUs can offload computationally intensive tasks to nearby GBSs, reducing processing delays and saving the energy of user devices [15,16]. Specifically, to prevent base stations from being overloaded with tasks, in the proposed application scenario MUs can offload tasks to UAVs for computation, leveraging UAV mobility to mitigate the GBS load [17].
Considering that UAVs must autonomously perform edge offloading and resource allocation, we introduce deep reinforcement learning (DRL) technology. Multi-agent deep reinforcement learning has come to play a key role in multi-UAV-assisted communications [18,19]. As the number of UAVs and the task complexity increase, traditional control and optimization methods struggle to cope with complex environments and dynamic demands. DRL enables multiple UAVs to act as agents for collaborative learning and decision-making, achieving adaptive and efficient communication network management in dynamic and uncertain environments [20,21].
This study explores the challenges of edge offloading and resource allocation within UAV-supported emergency communication systems. To enhance convergence speed, we incorporate the K-means algorithm into multi-agent deep reinforcement learning. This approach allows each agent to share locally observed user data during the training phase of the actor neural networks, jointly processing it and moving to a better position. The key contributions of this research are outlined as follows:
(1)
A mathematical model for emergency communication scenarios is formulated, where several UAVs are used as offloading nodes for MEC or as access relays, facilitating the connection between MUs and GBSs. Specifically, we describe the scenario, RF model, FSO model, and delay model of emergency communication.
(2)
We propose a UAV-assisted resource allocation method known as K-means MADDPG (KMADDPG), which aims to maximize the number of successful tasks while prioritizing high-priority ones. The proposed algorithm builds on the MADDPG algorithm, integrating it with the K-means algorithm to handle the high data dimensionality in the above-mentioned scenarios. For tasks with different urgency levels, we incorporate a priority mechanism for mobile users.
(3)
We examine the time complexity of the proposed algorithm and assess its performance in emergency communication settings through simulation studies. The results indicate that the proposed KMADDPG effectively optimizes communication resource allocation in regions affected by disasters with compromised communication infrastructures. Additionally, extensive simulations reveal that KMADDPG surpasses several baseline methods regarding convergence speed and the number of successful tasks.
The structure of the rest of this paper is as follows: Section 2 covers a review of related work. Section 3 describes the system model. In Section 4, we introduce the KMADDPG for resource allocation and edge offloading. Section 5 provides and discusses the simulation results. Finally, conclusions are drawn in Section 6.

2. Related Work

The FSO/RF dual-hop model is a feasible approach to establishing a stable, cost-effective, and rapid heterogeneous network. Known for its extensive transmission range and high bandwidth, the FSO/RF dual-hop model has garnered significant research attention in recent years. For instance, Pang et al. [22] initially proposed an optical IRS-assisted dual-hop hybrid FSO and RF system for cloud radio access networks (C-RANs). They introduced polarization codes in the FSO link to mitigate signal fading and achieve optimal data rates for the RF link. Wang et al. [23] employed gamma–gamma distribution to characterize the atmospheric turbulence effects on the FSO link from UAV to HAP and proposed a novel RIS-assisted UAV secure multi-user FSO/RF system. Lee et al. [24] explored the forwarding of packets between ground terminals and backhaul through multi-HAPS relays in a dual-hop FSO/RF network.
The allocation of scarce communication resources is a critical issue that merits discussion. Che et al. [25] enhanced system-level energy efficiency by jointly designing FSO and RF links and optimizing UAV altitude using power allocation techniques. Qi et al. [26] proposed a UAV-assisted vehicular communication network, utilizing DRL for resource allocation to enhance UAV energy efficiency and ensure QoS. MEC significantly enhances the computational performance of MUs with constrained resources. In the study by Jiang et al.  [27], a new distributed DRL framework utilizing multi-agent systems is introduced to reduce latency and energy usage in solving optimization challenges within large-scale MEC systems.
Several studies have explored mechanisms for task prioritization. For instance, Qin et al. [28] addressed the issue of time-varying priorities for reconnaissance tasks, examining task selection and scheduling within UAV-enabled multi-access edge computing for reconnaissance. Liu et al. [29] focused on prioritizing industrial equipment and developed a dynamic priority multi-channel access algorithm using DRL. However, these works have not integrated prioritization with DRL in the UAV domain.
Researchers have introduced multi-agent reinforcement learning and integrated it with UAV-assisted networks to enable UAV autonomous decision-making [30]. For instance, a UAV-assisted communication network with edge offloading capabilities was developed, and the MADDPG algorithm was proposed to minimize service delay in internet of vehicle (IoV) task processing [31]. In contrast to previous optimization goals, a similar model was constructed, and the MADDPG algorithm was utilized to optimize UAV trajectory design and offloading strategy in a 3D environment [32]. Lee et al. [33] investigated the use of deep reinforcement learning to maximize communication efficiency in the context of hybrid FSO/RF communication models. Guan et al. [34] studied the cooperative trajectory optimization of multiple UAVs in the hybrid FSO/RF communication model to maximize the achievable rate for mobile users.
The above research has made significant contributions to the field of UAV-assisted communication networks. Studies such as  [22,23,24] primarily focus on constructing FSO/RF models. We have incorporated reinforcement learning to train UAVs for autonomous decision-making within the FSO/RF system model. In contrast to  [25,26], our focus is on resource allocation for bandwidth and CPU frequency. Our objective, distinct from the aforementioned studies  [31,32,33,34], is to maximize both the number of successful tasks and the number of high-priority tasks. In this article, we apply a multi-agent reinforcement learning scheme to achieve resource allocation, offloading control, and task prioritization in MEC, enabling UAVs to learn cooperative behaviors.

3. System Model

3.1. Scenario

This paper proposes an application scenario of a multi-UAV-assisted heterogeneous wireless network. As shown in Figure 1, J MUs are scattered on the plane following a Poisson distribution, denoted by a set $\mathcal{J} = \{1, \dots, J\}$. A fleet of UAVs hovers above the MUs, serving as aerial base stations for relay or as nodes for edge computing, receiving data transmitted by the MUs. The UAVs fly at height H and are represented by a set $\mathcal{I} = \{1, \dots, I\}$. GBSs are located at the edges of the plane, represented by a set $\mathcal{G} = \{1, \dots, G\}$. The main notations are listed in Table 1. This model can be considered a two-layer model, since the transmission channel between the UAVs and the MUs differs from the transmission channel to the GBSs: communication between UAVs and MUs uses traditional RF communication, while communication between UAVs and GBSs uses FSO communication. The positions of MU j and UAV i are represented by $Q_j^{MU} = (x_j, y_j, 0)^T$ and $Q_i^{UAV} = (x_i, y_i, H)^T$, respectively.
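To make the setup concrete, the following numpy sketch (our illustration, not the authors' code) generates such a scenario; the area size, altitude, and node counts follow the simulation settings of Section 5, while the random seed and the uniform placement over the square are assumptions.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

AREA = 200.0   # side of the square disaster area (m), per Section 5
H = 100.0      # UAV operating altitude (m), per Table 2
I, J_MEAN = 3, 8

J = max(1, rng.poisson(J_MEAN))                # Poisson-distributed MU count
mu_pos = np.column_stack([rng.uniform(0, AREA, J),
                          rng.uniform(0, AREA, J),
                          np.zeros(J)])        # Q_j^MU = (x_j, y_j, 0)^T
uav_pos = np.column_stack([rng.uniform(0, AREA, I),
                           rng.uniform(0, AREA, I),
                           np.full(I, H)])     # Q_i^UAV = (x_i, y_i, H)^T
```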

3.2. Air-to-Ground Channel

In the proposed channel model, due to the high complexity of urban environments and dynamic environmental changes, UAVs cannot obtain sufficient information, so directly applying the ideal free-space model is impractical. We introduce a probabilistic model for line-of-sight (LoS) and non-line-of-sight (NLoS) propagation, which depends on environmental parameters, the elevation angle, and the positions of MUs and UAVs.
The 3D distance between the MU and UAV can be expressed as
$$d_{ij} = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2 + H^2}.$$
According to [35], the probability of the channel being LoS can be modeled as
$$p_{ij}^{LoS}(\varphi_{ij}) = \frac{1}{1 + C \exp\left(-D(\varphi_{ij} - C)\right)},$$
where C and D are coefficients depending on environmental factors such as terrain, atmospheric conditions, and lighting conditions. The elevation angle $\varphi_{ij}$ between UAV i and MU j is given by
$$\varphi_{ij} = \frac{180}{\pi} \arcsin\left(H / d_{ij}\right).$$
According to [36], the LoS and NLoS path losses can be defined as
$$\Gamma_{ij}^{LoS} = \eta_{LoS} + \gamma_{LoS} \log d_{ij} + G,$$
$$\Gamma_{ij}^{NLoS} = \eta_{NLoS} + \gamma_{NLoS} \log d_{ij} + G,$$
where $\eta_{LoS}$ and $\eta_{NLoS}$ represent the path losses at the reference distance $d_{ij} = 1$ m; $\gamma_{LoS}$ and $\gamma_{NLoS}$ indicate the path loss exponents of the LoS and NLoS transmissions, respectively; and G represents a Gaussian random variable modeling shadowing. The average path loss is then
$$\Gamma_{ij}^{avg} = p_{ij}^{LoS} \Gamma_{ij}^{LoS} + (1 - p_{ij}^{LoS}) \Gamma_{ij}^{NLoS}.$$
Between the UAV and the MU, the Signal-to-Interference-plus-Noise Ratio (SINR) of the RF channel can be expressed as
$$SINR = \frac{P_{signal}}{P_{noise} + P_{intf}} = \frac{p_i^{RF} \, 10^{-\Gamma_{ij}^{avg}/10}}{\sigma^2 + \sum_{a=j+1}^{|\Phi_i|} p_a^{RF} \, 10^{-\Gamma_{ia}^{avg}/10}},$$
where $P_{signal}$ is the received signal power, and $P_{intf}$ and $P_{noise}$ are the co-channel interference and noise powers, respectively. The noise is modeled as additive white Gaussian noise (AWGN) with power $\sigma^2$. To improve communication quality, we employ Non-Orthogonal Multiple Access (NOMA) technology to reduce co-channel interference among multiple devices served by the same UAV. The set of tasks received by UAV i, $\Phi_i$, is sorted in descending order of signal strength. When a UAV processes offloading tasks, it decodes tasks with stronger signals first, reducing the interference experienced by subsequent tasks.
The achievable rate can be defined via the SINR as
$$\theta^{RF} = B_{ij} \log_2\left(1 + \frac{p_i^{RF} \, 10^{-\Gamma_{ij}^{avg}/10}}{\sigma^2 + \sum_{a=j+1}^{|\Phi_i|} p_a^{RF} \, 10^{-\Gamma_{ia}^{avg}/10}}\right),$$
where $B_{ij}$ is the bandwidth allocated by UAV i to MU j.
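As a numerical illustration of this air-to-ground model, consider the sketch below (ours, not the authors' code); it assumes a base-10 logarithm in the path-loss expressions, omits the shadowing term G, and uses placeholder values for the environment coefficients C and D.

```python
import numpy as np

def avg_path_loss(d, elev_deg, C=10.0, D=0.6,
                  eta_los=1.0, eta_nlos=20.0, g_los=2.09, g_nlos=3.75):
    """LoS-probability-weighted path loss (dB). The eta/gamma defaults
    follow Table 2; C and D are placeholder environment coefficients."""
    p_los = 1.0 / (1.0 + C * np.exp(-D * (elev_deg - C)))
    pl_los = eta_los + g_los * np.log10(d)
    pl_nlos = eta_nlos + g_nlos * np.log10(d)
    return p_los * pl_los + (1.0 - p_los) * pl_nlos

def rf_rate(bw_hz, p_tx_dbm, pl_db, noise_dbm=-95.0, intf_mw=0.0):
    """Achievable RF rate (bit/s) from the SINR expression above."""
    sig_mw = 10.0 ** ((p_tx_dbm - pl_db) / 10.0)   # p * 10^(-PL/10), in mW
    noise_mw = 10.0 ** (noise_dbm / 10.0)
    sinr = sig_mw / (noise_mw + intf_mw)
    return bw_hz * np.log2(1.0 + sinr)

# e.g., an MU 80 m away at 45-degree elevation, 30 dBm transmit power
rate = rf_rate(50e6, 30.0, avg_path_loss(80.0, 45.0))
```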

3.3. Air-to-Air Channel

We consider that the computing ability of the GBS, which is the endpoint of the FSO channel for task offloading, is sufficiently abundant. Therefore, the computing time for edge offloading tasks at the GBS can be neglected. We only need to focus on optimizing the transmission rate of the FSO channel between the UAV and the GBS. Due to the typical high-altitude operation range of UAVs, atmospheric conditions affecting the FSO channel must be considered. According to  [37], we introduce a size distribution of scattering particles based on weather conditions, defined as
$$\xi = \begin{cases} 1.6, & \nu > 50\ \text{km}, \\ 1.3, & 6\ \text{km} < \nu \le 50\ \text{km}, \\ 0.16\nu + 0.34, & 1\ \text{km} < \nu \le 6\ \text{km}, \\ \nu - 0.5, & 0.5\ \text{km} < \nu \le 1\ \text{km}, \\ 0, & \nu \le 0.5\ \text{km}, \end{cases}$$
where ν is the atmospheric visibility in kilometers. The atmospheric attenuation coefficient can be represented as [38]
$$\rho = \frac{3.91}{\nu} \left(\frac{\lambda}{550 \times 10^{-9}}\right)^{-\xi},$$
where λ is the transmission wavelength. Using the atmospheric attenuation coefficient ρ and the distance $d_{ig}$ between the UAV and the GBS, the atmospheric transmittance $\varkappa_{atm}$ at the laser transmitter wavelength can be expressed as [34]
$$\varkappa_{atm} = 10^{-\rho d_{ig}/10}.$$
The achievable rate of the FSO channel between the UAV and the GBS can be represented as [34]
$$\theta_i^{FSO} = \frac{p^{FSO} \, \varkappa_t \, \varkappa_{atm} \, \zeta^2}{2\pi (\upsilon_t/2)^2 \, d_{ig}^2 \, E_p N_b},$$
where $p^{FSO}$ is the transmission power of the UAV on the FSO channel; $\varkappa_t$ is the optical efficiency of the receiver and transmitter; ζ represents the free-space beam diameter at the UAV receiver; and $\upsilon_t$ denotes the divergence angle of the transmitter. $E_p = hc/\lambda_c$ represents the photon energy, where h denotes Planck's constant, c is the speed of light, and $\lambda_c$ is the wavelength. $N_b$ represents the average receiver sensitivity. Summarizing these considerations, we establish the high-altitude FSO channel model.
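The FSO link budget can be sketched as follows (our reading of the garbled source equation: the geometric-loss form and unit handling are assumptions, while the parameter defaults mirror Table 2).

```python
import numpy as np

H_PLANCK = 6.626e-34   # Planck's constant (J*s)
C_LIGHT = 3.0e8        # speed of light (m/s)

def size_dist_exponent(vis_km):
    """Scattering particle-size distribution exponent xi vs. visibility."""
    if vis_km > 50:  return 1.6
    if vis_km > 6:   return 1.3
    if vis_km > 1:   return 0.16 * vis_km + 0.34
    if vis_km > 0.5: return vis_km - 0.5
    return 0.0

def fso_rate(p_fso_w, vis_km, d_m, wl_m=1550e-9, k_t=0.8,
             zeta_m=0.06, div_rad=2.07e-4, n_b=100.0):
    """Achievable FSO rate (bit/s) between a UAV and its GBS."""
    xi = size_dist_exponent(vis_km)
    rho = (3.91 / vis_km) * (wl_m / 550e-9) ** (-xi)   # attenuation (dB/km)
    k_atm = 10.0 ** (-rho * (d_m / 1000.0) / 10.0)     # transmittance
    e_p = H_PLANCK * C_LIGHT / wl_m                    # photon energy (J)
    # geometric capture fraction, clamped: the receiver cannot collect
    # more than the full beam
    geo = min(1.0, zeta_m**2 / (2 * np.pi * (div_rad / 2)**2 * d_m**2))
    return p_fso_w * k_t * k_atm * geo / (e_p * n_b)

# e.g., 200 mW over 150 m with 10 km visibility
rate_fso = fso_rate(0.2, 10.0, 150.0)
```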

3.4. Computing Delay Model

In the proposed system model, computational tasks of user devices can be processed in three ways: local computing, partial offloading to UAVs via RF channels, and further offloading from UAVs to the GBSs via FSO channels. These methods are detailed as follows:
(1)
Local Computing: When the computing power of an MU's device satisfies the task requirements, the MU can choose local processing. The processing time of the task is $T_{ij}^{total} = T_{ij}^{local}$, where $T_{ij}^{local}$ can be expressed as
$$T_j^{local}[l] = \frac{(1 - R_{ij}[l]) \, D_j[l] \, s}{f_{m_j}[l]},$$
where l indicates the time slot, $R_{ij}[l]$ represents the pre-assigned offload ratio of UAV i for the task of MU j, $D_j$ is the size of the task submitted by user j, $f_{m_j}$ is the CPU frequency of the user's device, and s represents the CPU cycles required per bit.
(2)
Offloading to UAVs: When the computing power of an MU's device cannot meet the task requirements, the MU can offload part of the task to a UAV for concurrent execution. $T_{ij}^{total}$ can be represented as
$$T_{ij}^{total} = \max\left(T_{ij}^{local},\; T_{ij}^{RF} + T_{ij}^{comp}\right),$$
where $T_{ij}^{RF}$ and $T_{ij}^{comp}$ represent the transmission time and the computing time when the user transmits data to UAV i over the RF channel, respectively, and can be expressed as
$$T_{ij}^{RF}[l] = \frac{R_{ij}[l] \, D_j[l]}{\theta^{RF}}$$
and
$$T_{ij}^{comp}[l] = \frac{R_{ij}[l] \, D_j[l] \, s}{f_{u_i}[l]}.$$
(3)
Offloading to the GBS via UAV: When the computing power of both the user's device and the UAV cannot meet the task requirements, the UAV forwards the received task data to the GBS via the FSO channel. $T_j^{total}$ can be represented as
$$T_j^{total} = \max\left(T_j^{local},\; T_{ij}^{RF} + T_{ig}^{FSO}\right),$$
where $T_{ig}^{FSO}$ is the transmission time from UAV i to GBS g via the FSO channel, which can be expressed as
$$T_{ig}^{FSO}[l] = \frac{R_{ij}[l] \, D_j[l]}{\theta^{FSO}}.$$
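Taken together, the three cases can be written as one small delay function (our sketch; the symbols map one-to-one onto the equations above):

```python
def task_delay(mode, d_bits, s, r_off, f_local, f_uav=None,
               rate_rf=None, rate_fso=None):
    """Total delay T_total for one task. d_bits: task size D_j (bits);
    s: CPU cycles per bit; r_off: offload ratio R_ij in [0, 1]."""
    t_local = (1.0 - r_off) * d_bits * s / f_local
    if mode == "local":                    # case (1): all local
        return t_local
    t_rf = r_off * d_bits / rate_rf        # MU -> UAV over the RF channel
    if mode == "uav":                      # case (2): compute on the UAV
        return max(t_local, t_rf + r_off * d_bits * s / f_uav)
    if mode == "gbs":                      # case (3): relay to GBS via FSO
        return max(t_local, t_rf + r_off * d_bits / rate_fso)
    raise ValueError(f"unknown mode: {mode}")
```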

3.5. Problem Formulation

In scenarios with limited system resources and environmental obstructions, our objective is to maximize the task completion rate, with a particular emphasis on prioritizing high-importance tasks. We introduce a binary parameter $\lambda_{ij}$ to represent the matching relationship between user j and UAV i: $\lambda_{ij} = 1$ indicates that user j is served by UAV i, and $\lambda_{ij} = 0$ otherwise. We can formalize the system optimization problem as
$$\mathbf{P1}: \max_{x_i[l], y_i[l], z_i[l]} \sum_{j \in \mathcal{J}} \mathbb{I}\left(T_{ij}^{total} \le t_j^{delay}\right) \cdot pr_j,$$
$$\text{s.t.}\quad C1: \sum_{i \in \mathcal{I}} \lambda_{ij} \le 1, \;\forall j \in \mathcal{J}, \;\lambda_{ij} \in \{0, 1\},$$
$$C2: SINR \ge \delta_0,$$
$$C3: \sum_{j \in \mathcal{J}} \lambda_{ij} B_{ij} < B, \;\forall i \in \mathcal{I},$$
$$C4: \sum_{j \in \mathcal{J}} \lambda_{ij} f_{ij} < F, \;\forall i \in \mathcal{I},$$
$$C5: 0 \le R_{ij} \le 1, \;\forall i \in \mathcal{I}, j \in \mathcal{J},$$
$$C6: x_i[l] \in [X_{min}, X_{max}], \;\forall i \in \mathcal{I},$$
$$C7: y_i[l] \in [Y_{min}, Y_{max}], \;\forall i \in \mathcal{I},$$
$$C8: z_i[l] \in [Z_{min}, Z_{max}], \;\forall i \in \mathcal{I}.$$
Here, P1 states the objective of the proposed system model, where $pr_j$ indicates the priority weight of task j. $\mathbb{I}(T_{ij}^{total} \le t_j^{delay})$ is an indicator function: $\mathbb{I} = 1$ if $T_{ij}^{total} \le t_j^{delay}$, and $\mathbb{I} = 0$ otherwise. C1 states that no more than one UAV can serve user device j. C2 specifies that when users transmit data through the channel, the transmission is deemed a failure if the SINR falls below the minimum threshold $\delta_0$. C3 and C4 indicate that the bandwidth and CPU frequency allocated by UAV i to its user devices cannot exceed the total resources of the UAV. C5 states that the offloading ratio cannot be negative and cannot exceed 1. C6–C8 specify that the coordinates of the UAV cannot exceed the scope of the application scenario.
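For a candidate assignment, the value of P1 can be evaluated as in the sketch below (ours, for illustration; checks for constraints C2 through C8 are omitted):

```python
import numpy as np

def p1_objective(total_delay, deadline, priority, match):
    """Priority-weighted count of on-time tasks.
    total_delay, deadline, priority: arrays of shape (J,);
    match: binary lambda_ij matrix of shape (I, J) satisfying C1."""
    served = match.sum(axis=0) == 1          # MU is served by exactly one UAV
    on_time = total_delay <= deadline        # indicator I(T_total <= t_delay)
    return float(np.sum((served & on_time) * priority))
```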

4. KMADDPG-Based UAV Resource Allocation

4.1. Overview

In multi-agent scenarios, traditional reinforcement learning algorithms such as Deep Q-Network (DQN) face significant challenges. The primary issue is that during training, each agent treats other agents as part of the environment. As other agents change, the environment becomes unstable, violating the Markov assumption essential for Q-learning algorithms. Consequently, DQN struggles to converge in multi-agent environments.
Conversely, the MADDPG algorithm employs centralized training and decentralized execution strategies. This approach allows each agent to optimize its actions by considering the strategies of other agents, leading to more stable convergence and significantly enhancing the overall performance of the multi-agent system. Thus, MADDPG exhibits greater adaptability and efficiency in multi-agent contexts. Given the limited system resources, UAVs functioning as edge offloading nodes must carefully allocate resources to users.

4.2. Markov Decision Analysis

The proposed multi-UAV environment is typically considered a discrete partially observable Markov decision process (POMDP), as there are multiple interacting agents, each with its own local observations and decision space, and each agent's observation of the environment may be incomplete. The future state of a UAV depends only on the current state and the actions of other UAVs, not on past states. The UAVs are regarded as distributed agents capable of making resource allocation and task offloading decisions. The decisions made by UAVs are decentralized, but their actions affect other agents, necessitating coordination with other UAVs to achieve the objectives. The observations, states, actions, and rewards of the distributed UAVs at time t are defined as o(t), s(t), a(t), and r(t):
(1)
State s(t): In the proposed system model, the information obtained by agents through observations at time slot t is defined as s(t). Because each agent has its own local observation and local decision space, s(t) is typically incomplete. It is composed of $\{s_i(t), s_{m_0}(t), \dots, s_{m_J}(t), s_g(t)\}$, where $s_g(t)$ represents the state information of the GBS and $s_i(t)$ denotes the state information of UAV i at time slot t, expressed as
$$s_i(t) = \{x_i, y_i, z_i, B_i, f_i\},$$
where $B_i$ and $f_i$ represent the bandwidth and CPU computing power held by the UAV, respectively. $s_{m_j}(t)$ represents the state information of user equipment j, with components
$$s_{m_j}(t) = \{x_j, y_j, t_j^{delay}, f_j, D_j, pr_j\},$$
where $t_j^{delay}$ represents the delay requirement of the task delivered by user j, and $pr_j$ indicates the priority weight of the current task.
(2)
Observation o(t): At time slot t, all agents' local state information can be combined to form a global observation o(t), expressed as
$$o(t) = \{s_0(t), \dots, s_I(t), s_{m_0}(t), \dots, s_{m_J}(t), s_g(t)\}.$$
(3)
Action Space a(t): The action space of KMADDPG consists of three parts: task matching, edge offloading, and resource allocation decisions. The components of $a_i(t)$ are
$$a_i(t) = \{a_i^{mat}(t), a_i^{load}(t), a_i^{res}(t)\},$$
where $a_i^{mat}(t) = \{\lambda_{i0}, \dots, \lambda_{iJ}\}$ denotes the decision of UAV i matching with users, $a_i^{load}(t)$ denotes the task offloading ratio decision, and $a_i^{res}(t)$ indicates the resource allocation actions.
(4)
Reward Function r(t): r(t) is the sum of the rewards earned by all UAVs. In the proposed scenario, the priority weights of user requests differ, and the system should focus on handling high-priority urgent requests. Therefore, the reward function is related to the priority weights and completion status, which can be expressed as
$$R(t) = \sum_{i \in \mathcal{I}} \left\{\gamma_{ij} \, pr_j \, \omega + (1 - \gamma_{ij}) \, \xi\right\},$$
where $\gamma_{ij}$ is a binary coefficient indicating the completion status of the task. If the UAV satisfies the user's delay requirement, then $\gamma_{ij} = 1$, the task is considered completed, and the task reward is $\gamma_{ij} \times pr_j \times \omega$; otherwise, $\gamma_{ij} = 0$ and the task reward is ξ. Here, ω is the reward coefficient for task completion, and ξ is the (negative) penalty coefficient for task failure.
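A per-slot reward consistent with this definition can be sketched as follows (the values ω = 1 and ξ = -1 match the simulation settings of Section 5):

```python
def step_reward(completed, priority, omega=1.0, xi=-1.0):
    """Sum of task rewards over one time slot. completed: binary gamma
    flags per handled task; priority: the matching pr_j weights."""
    return sum(omega * pr if gamma else xi
               for gamma, pr in zip(completed, priority))

# e.g., two completed tasks (priorities 3 and 1) plus one failure -> 3.0
assert step_reward([1, 1, 0], [3.0, 1.0, 2.0]) == 3.0
```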

4.3. MADDPG-Based Resource Allocation and Task Offloading

In the proposed multi-agent algorithm, we adopt a centralized training and decentralized execution (CTDE) strategy. This approach maximizes rewards while minimizing interactions between agents to enhance convergence speed. As shown in Figure 2, we combine reinforcement learning with the K-means algorithm to optimize the initial positions of the agents and achieve better optimization results.
(1)
Joint K-Means Algorithm: We propose a joint K-means algorithm for initializing the positions of UAVs. According to the path loss formula, path loss is positively correlated with the distance between UAVs and MUs. During centralized training, the system obtains the positions of all agents. We define sets $\hat{Q}^{UAV}$ and $\hat{Q}^{MU}$ to represent the positions of UAVs and users, respectively. Here, $\hat{Q}$ differs from the Q mentioned earlier in that it represents only the 2D coordinates of UAVs and users. The K-means algorithm sets I cluster centers $\{C_0, \dots, C_I\}$ and assigns the J user devices to the nearest cluster centers $C_i$; the resulting cluster center positions become the target positions for the UAVs. The optimization objective of the K-means algorithm can be expressed as
$$\arg\min_{C} \sum_{i=1}^{I} \sum_{j \in C_i} \left\| \hat{Q}_j^{mu} - \hat{Q}_i^{UAV} \right\|^2.$$
The K-means update can be expressed as
$$\hat{Q}_i^{UAV}(t+1) = \frac{1}{|C_i(t)|} \sum_{j \in C_i(t)} \hat{Q}_j^{mu}.$$
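A minimal numpy implementation sketch of this joint K-means step (our code, following the assignment and update rules above):

```python
import numpy as np

def kmeans_uav_positions(mu_xy, uav_xy, n_iter=10):
    """Cluster MU 2D positions and move each UAV to its cluster centroid.
    mu_xy: (J, 2) MU coordinates; uav_xy: (I, 2) initial UAV coordinates."""
    centers = uav_xy.astype(float).copy()
    for _ in range(n_iter):
        # assign each MU to the nearest center (forming cluster C_i)
        dists = np.linalg.norm(mu_xy[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # move each center to the mean of its assigned MUs
        for i in range(len(centers)):
            members = mu_xy[labels == i]
            if len(members) > 0:
                centers[i] = members.mean(axis=0)
    return centers   # target 2D positions for the UAVs
```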
(2)
Deep Reinforcement Learning Algorithm: To avoid the additional load and delay caused by information exchange between a central controller and the UAVs, this paper proposes a distributed joint resource management algorithm based on MADDPG. We consider a centralized training, decentralized execution DRL mechanism. In the CTDE strategy, the collaborative operation of agents can be viewed as an optimization problem that maximizes the expected reward, which can be expressed as
$$J = \arg\max_{\mu} \; \mathbb{E}\left[\sum_{t} \gamma^{t-1} r(t)\right],$$
where $\mu = \{\mu_0, \dots, \mu_I\}$ represents the policy networks of the agents and $\gamma^{t-1}$ is the discount applied to the reward at time slot t. Setting the parameters of the policy networks as $\theta = \{\theta_0, \dots, \theta_I\}$, the expected return gradient for UAV i can be written as
$$\nabla_{\theta_i} J(\theta_i) = \mathbb{E}_{s, a \sim D}\left[\nabla_{\theta_i} \log \mu_i(a_i \mid o_i) \, Q_i^{\mu}(s, a_1, \dots, a_I)\right]\Big|_{a_i = \mu_i(o_i)},$$
where D is the experience replay buffer storing experience tuples $(S, a, r, S')$, and $Q_i^{\mu}$ is the centralized value function. The loss function of the Critic network for UAV i can be expressed as
$$L(\theta_i) = \mathbb{E}_{S, a, r, S'}\left[\left(Q_i^{\mu}(S, a_1, \dots, a_I) - \hat{y}\right)^2\right],$$
where $\hat{y}$ is the TD target, which can be expressed as
$$\hat{y} = r_i + \gamma \, Q_i^{\mu'}\left(S', a_1', \dots, a_I'\right)\Big|_{a_k' = \mu_k'(o_k)},$$
where $Q_i^{\mu'}$ and $\mu_i'$ are the target Critic network and target Actor network, respectively. The specific neural network update process is shown in Algorithm 1.
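The critic and actor updates above can be sketched in PyTorch as follows (our illustration of the equations, not the authors' code; the network modules, optimizers, and batch shapes are assumed):

```python
import torch
import torch.nn.functional as F

def maddpg_update(i, actors, critics, t_actors, t_critics,
                  opt_actor, opt_critic, batch, gamma=0.95):
    """One centralized-critic MADDPG update for agent i.
    batch: (obs, acts, rew, obs_next), where obs/acts/obs_next are lists
    of per-agent tensors and rew is agent i's reward tensor."""
    obs, acts, rew, obs_next = batch
    # TD target y_hat: target actors pick the next joint action
    with torch.no_grad():
        next_acts = [t_actors[k](obs_next[k]) for k in range(len(actors))]
        y = rew + gamma * t_critics[i](torch.cat(obs_next + next_acts, dim=1))
    # Critic loss L(theta_i): squared TD error on the sampled batch
    q = critics[i](torch.cat(obs + acts, dim=1))
    critic_loss = F.mse_loss(q, y)
    opt_critic[i].zero_grad(); critic_loss.backward(); opt_critic[i].step()
    # Actor update: ascend Q_i with agent i's action replaced by mu_i(o_i)
    acts_pg = [a.detach() for a in acts]
    acts_pg[i] = actors[i](obs[i])
    actor_loss = -critics[i](torch.cat(obs + acts_pg, dim=1)).mean()
    opt_actor[i].zero_grad(); actor_loss.backward(); opt_actor[i].step()
```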
(3)
Complexity Analysis: The complexity analysis of the KMADDPG algorithm can be divided into two main components: the K-means algorithm and the neural network. For the K-means algorithm, each iteration requires the position coordinates of each UAV and MU, resulting in a per-iteration time complexity of $O(I \times J)$. With the number of iterations denoted as $N_k$, the total time complexity of the K-means algorithm is
$$O(2 \times N_k \times I \times J).$$
   Since the dimensionality of the 2D coordinates and the number of iterations are generally constants, this complexity simplifies to O ( I × J ) . For the neural network part, the efficiency of the proposed algorithm is further evaluated through its time complexity. Both offline training and online execution involve mapping states to actions using deep neural networks. The computational complexity of the deep neural network, denoted as O ( N N ) , is expressed as follows [39]:
$$O(NN) = O\left(\sum_{n=1}^{N_{layer}} l_n l_{n+1}\right),$$
where $N_{layer}$ is the number of layers and $l_n$ represents the number of neurons in the n-th layer. The dimensions of the input and output layers of the deep neural network depend on the sizes of the state space and action space. For $N_{ep}$ episodes, S steps per episode, and $N_{exp}$ sampled experiences, the algorithm's time complexity is
$$O\left(\sum_{n=1}^{N_{layer}} l_n l_{n+1} \times N_{exp} \times N_{ep} S\right).$$
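As a worked example of the per-pass term $\sum_n l_n l_{n+1}$ (the layer sizes below are hypothetical, for illustration only):

```python
def nn_multiplies(layer_sizes):
    """sum(l_n * l_{n+1}): weight multiplications in one forward pass."""
    return sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))

# hypothetical actor: 20-dim state -> two 64-unit hidden layers -> 12 actions
assert nn_multiplies([20, 64, 64, 12]) == 20*64 + 64*64 + 64*12 == 6144
```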
Algorithm 1 KMADDPG Algorithm
Initialize: Max_episode, Max_ep_len
1: for episode = 1 to Max_episode do
2:    Initialize state S and local observations $o_i$ for each agent $i \in \mathcal{I}$; initialize cluster centroids $\{C_0, \dots, C_I\}$ randomly
3:    for each MU $\hat{Q}_j^{mu}$ do
4:        Calculate the distances between the UAVs and the MU
5:        Find the closest centroid $\hat{C}_i = \arg\min_i \hat{d}_{ij}$
6:        Assign $\hat{Q}_j^{mu}$ to the closest centroid $C_i$
7:    end for
8:    Update centroid positions: $\hat{Q}_i^{UAV} = \frac{1}{|C_i(t)|} \sum_{j \in C_i(t)} \hat{Q}_j^{mu}$
9:    for t = 1 to Max_ep_len do
10:        Each UAV moves to its target position $\hat{Q}^{UAV}$ from K-means
11:        Each UAV selects an action $a_i$
12:        Obtain the reward $r_i$ from the environment
13:        Store $(S, a, r, S')$ in the replay buffer D
14:        $S \leftarrow S'$
15:        for agent i = 1 to I do
16:            Randomly sample a batch of N samples $(S_j, a_j, r_j, S'_j)$ from D
17:            Calculate the TD target $\hat{y}$
18:            Update the Critic network by minimizing the loss $L(\theta_i)$
19:            Update the Actor network through the gradient $\nabla_{\theta_i} J(\mu_i)$
20:        end for
21:        Update the target network parameters for each agent i:
22:        $\omega_i' \leftarrow \tau \omega_i + (1 - \tau) \omega_i'$
23:        $\theta_i' \leftarrow \tau \theta_i + (1 - \tau) \theta_i'$
24:    end for
25: end for

5. Experimental Design and Analysis

5.1. Simulation Parameters

In this section, we evaluate the proposed KMADDPG-based UAV resource allocation and edge offloading optimization scheme. Specifically, we consider a disaster area of 200 m × 200 m with a GBS set up at its center, establishing FSO communication links with the UAVs. Meanwhile, the UAVs serve as relay base stations or edge offloading nodes, providing radio communication services or computational resources to the MUs. To simplify the system model, we set the penalty coefficient to -1 and the reward coefficient to 1. The reward is the cumulative reward of tasks completed by all UAVs in each time slot. We used 8 MUs, 3 UAVs, and 1 GBS as the experimental subjects. The detailed parameters are listed in Table 2. The algorithms to be evaluated are as follows:
MeanResource MADDPG (MeanRes): An algorithm where the neural network selects the matching objects, but the offloading ratio and resource allocation ratio are evenly distributed.
NearbyMatch MADDPG (NearbyMatch): An algorithm where the neural network selects the offloading and resource ratios, and the service objects are the nearest users.
Base MADDPG (Base): An algorithm where the neural network selects the matching objects, offloading, and resource ratios.
Proposed KMADDPG (Proposed): An algorithm that integrates K-means into the Base MADDPG algorithm.
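For reproducibility, the simulation settings of Table 2 can be collected into a single configuration, as in this sketch (the values come from the paper; the dictionary layout and key names are ours, not the authors' code):

```python
SIM_CONFIG = {
    "n_mu": 8, "n_uav": 3, "n_gbs": 1,
    "area_m": 200.0, "uav_altitude_m": 100.0,
    "episodes": 15_000, "episode_len": 10,
    "lr_actor": 2e-5, "lr_critic": 5e-4,
    "bandwidth_hz": 50e6,
    "f_uav_hz": 1.5e9, "f_mu_hz": 1e9,
    "task_size_mbit": (0.1, 1.5), "task_deadline_ms": (12, 15),
    "p_rf_dbm": 30, "p_fso_w": 0.2, "noise_dbm": -95,
    "fso_wavelength_m": 1550e-9,
    "reward_coeff": 1.0, "penalty_coeff": -1.0,
}
```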

5.2. Convergence Analysis

Figure 3 illustrates the trend of average system rewards obtained by each UAV per episode, and Figure 4 presents the tendency of total success rate obtained by each UAV per episode. This experiment evaluates the convergence of the proposed algorithm in comparison to other algorithms. It can be observed that the rewards for KMADDPG, Base, NearbyMatch, and MeanResource decrease in that order. The MeanRes algorithm exhibits significant fluctuations in performance, performs the worst, and fails to converge. This is due to the algorithm’s inability to customize based on user demand differences, resulting in a significant waste of computational resources.
The NearbyMatch algorithm performs somewhat better, converging to an average reward of 5.8 and an average total success rate of 78% at around episode 8000, as it devises strategies based on user demand. However, the NearbyMatch algorithm often falls into local optima: during training, MUs may cluster around a single UAV and offload tasks to it, leaving that UAV unable to satisfy all MU demands while the remaining UAVs stay idle.
The Base algorithm performs better still, converging to an average reward of 6.8 and an average total success rate of 86% at around episode 10,000. The proposed KMADDPG algorithm addresses these shortcomings by integrating the K-means algorithm to optimize UAV positioning, reducing path loss and achieving better convergence than Base MADDPG. Consequently, it achieves the best results, converging to an average reward of 7.8 and an average total success rate of 90% at around episode 3000. The proposed algorithm converges under minor constraints and at a faster rate than the other algorithms.

5.3. Priority Analysis

Figure 5 and Figure 6 depict the performance of various algorithms in terms of task completion within a single episode and their ability to handle tasks of different priority levels, respectively. Figure 5 shows the number of completed tasks within one episode. The KMADDPG algorithm completed 719 tasks, which is 8%, 12%, and 41% more than those completed by Base, NearbyMatch, and MeanResource, respectively. This indicates that the proposed algorithm exhibits superior task scheduling and processing capabilities. Figure 6 illustrates the performance of different algorithms in handling tasks with varying priorities, where P1, P2, and P3 represent low-, medium-, and high-priority tasks, respectively. It is evident from Figure 6 that all four algorithms perform better on higher-priority tasks, indicating that high-priority tasks receive more resource allocation. The proposed algorithm outperforms the other three algorithms across all three priority levels, with an increase in task completion by 12.5%, 5%, and 6% for low, medium, and high priorities, respectively, compared to Base. This demonstrates that the proposed algorithm not only maximizes the number of successful tasks but also exhibits superior handling capabilities for high-priority tasks.

5.4. Convergence under Different Proportions

Figure 7 and Figure 8 illustrate the trends of the average system reward and total success rate, respectively, under different UAV and MU combinations. In the proposed system model, as the number of UAVs and users increases, resources increase proportionally (to avoid task failures caused by insufficient resources) and the environment becomes more complex. In particular, a larger number of agents means that each agent's decisions affect the states of other agents more frequently, making convergence harder. To verify that the proposed algorithm still converges well in such complex situations, we used a UAV-to-MU ratio of 3:8 (denoted (3, 8)) as the baseline and scaled the numbers proportionally. From Figure 7, it is evident that the convergence rewards vary with the different combinations: the (2, 4) combination converges to around 4, the (6, 16) combination to around 15, and the (9, 24) combination to around 22. This variation reflects the different number of users each combination serves. The convergence speed decreases accordingly because, as the number of agents and users grows, the system environment becomes more complex and the neural networks require more training to achieve optimal results. However, as shown in Figure 8, even as the environment complexity increases, the final convergence still reaches a 90% success rate, demonstrating that the proposed algorithm performs well in more complex environments.

6. Conclusions

To address the challenges faced by multi-UAV rescue systems utilizing a hybrid FSO/RF dual-hop model, we introduce a priority-driven resource allocation strategy. This innovative approach is designed to optimize the completion rate of tasks, with a particular emphasis on ensuring that high-priority tasks are efficiently managed and expedited. This paper transforms the problem into a discrete partially observable Markov decision process and develops a KMADDPG algorithm to improve convergence speed. Simulation results indicate that the KMADDPG algorithm outperforms other DRL algorithms in enhancing system performance. In future work, we will explore resource allocation strategies for UAVs in terms of energy efficiency and transmission power, and incorporate dynamic priority mechanisms.

Author Contributions

Conceptualization, Y.L., Y.A. and Z.X.; Data curation, Y.A., Z.X. and J.W.; Formal analysis, Y.L., Y.A. and J.L.; Funding acquisition, Y.L., J.W. and J.L.; Investigation, Y.L., Y.A., Z.X. and J.W.; Methodology, Y.L., Z.X., J.W. and J.L.; Project administration, Y.L., Z.X., J.W. and J.L.; Resources, Y.A., Z.X. and J.W.; Software, Y.L., Y.A. and Z.X.; Supervision, Z.X. and J.L.; Validation, Y.A.; Visualization, J.W.; Writing—original draft, Y.L. and Y.A.; Writing—review and editing, Y.L., Y.A. and Z.X. All authors have read and agreed to the published version of the manuscript.

Funding

The research is supported by the National Natural Science Foundation of China (62171463, 62271502), Natural Science Foundation of Fujian Province, China (2021J011112), and Science and Technology Bureau of Putian, Fujian Province, China (2022GZ2001ptxy11).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on reasonable request from the corresponding authors.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Wang, D.; Cao, Y.; Lam, K.Y.; Hu, Y.; Kaiwartya, O. Authentication and Key Agreement Based On Three Factors and PUF for UAVs-Assisted Post-Disaster Emergency Communication. IEEE Internet Things J. 2024, 11, 20457–20472. [Google Scholar] [CrossRef]
  2. Yao, Z.; Cheng, W.; Zhang, W.; Zhang, H. Resource allocation for 5G-UAV-based emergency wireless communications. IEEE J. Sel. Areas Commun. 2021, 39, 3395–3410. [Google Scholar] [CrossRef]
  3. Wu, J.; Chen, Q.; Jiang, H.; Wang, H.; Xie, Y.; Xu, W.; Zhou, P.; Xu, Z.; Chen, L.; Li, B.; et al. Joint Power and Coverage Control of Massive UAVs in Post-Disaster Emergency Networks: An Aggregative Game-Theoretic Learning Approach. IEEE Trans. Netw. Sci. Eng. 2024, 11, 3782–3799. [Google Scholar] [CrossRef]
  4. Tang, X.; Chen, F.; Wang, F.; Jia, Z. Disaster Resilient Emergency Communication With Intelligent Air-Ground Cooperation. IEEE Internet Things J. 2023, 11, 5331–5346. [Google Scholar] [CrossRef]
  5. Gao, J.; Wang, Q.; Li, Z.; Zhang, X.; Hu, Y.; Han, Q.; Pan, Y. Towards Efficient Urban Emergency Response Using UAVs Riding Crowdsourced Buses. IEEE Internet Things J. 2024, 11, 22439–22455. [Google Scholar] [CrossRef]
  6. Zhou, M.; Chen, H.; Shu, L.; Liu, Y. UAV-assisted sleep scheduling algorithm for energy-efficient data collection in agricultural Internet of Things. IEEE Internet Things J. 2021, 9, 11043–11056. [Google Scholar] [CrossRef]
  7. Bekkali, A.; Fujita, H.; Hattori, M. New generation free-space optical communication systems with advanced optical beam stabilizer. J. Light. Technol. 2022, 40, 1509–1518. [Google Scholar] [CrossRef]
  8. Bekkali, A.; Hattori, M.; Hara, Y.; Suga, Y. Free Space Optical Communication Systems FOR 6G: A Modular Transceiver Design. IEEE Wirel. Commun. 2023, 30, 50–57. [Google Scholar] [CrossRef]
  9. Guo, Z.; Gao, W.; Ye, H.; Wang, G. A location-aware resource optimization for maximizing throughput of emergency outdoor–indoor UAV communication with FSO/RF. Sensors 2023, 23, 2541. [Google Scholar] [CrossRef] [PubMed]
  10. Yahia, O.B.; Erdogan, E.; Kurt, G.K.; Altunbas, I.; Yanikomeroglu, H. A weather-dependent hybrid RF/FSO satellite communication for improved power efficiency. IEEE Wirel. Commun. Lett. 2021, 11, 573–577. [Google Scholar] [CrossRef]
  11. Aboelala, O.; Lee, I.E.; Chung, G.C. A survey of hybrid free space optics (FSO) communication networks to achieve 5G connectivity for backhauling. Entropy 2022, 24, 1573. [Google Scholar] [CrossRef] [PubMed]
  12. Nafees, M.; Huang, S.; Thompson, J.; Safari, M. Backhaul-aware user association and throughput maximization in UAV-aided hybrid FSO/RF network. Drones 2023, 7, 74. [Google Scholar] [CrossRef]
  13. Li, Y.; Zhang, W.; Wang, C.X.; Sun, J.; Liu, Y. Deep reinforcement learning for dynamic spectrum sensing and aggregation in multi-channel wireless networks. IEEE Trans. Cogn. Commun. Netw. 2020, 6, 464–475. [Google Scholar] [CrossRef]
  14. Zhu, G.; Lyu, Z.; Jiao, X.; Liu, P.; Chen, M.; Xu, J.; Cui, S.; Zhang, P. Pushing AI to wireless network edge: An overview on integrated sensing, communication, and computation towards 6G. Sci. China Inf. Sci. 2023, 66, 130301. [Google Scholar] [CrossRef]
  15. Song, F.; Xing, H.; Wang, X.; Luo, S.; Dai, P.; Xiao, Z.; Zhao, B. Evolutionary multi-objective reinforcement learning based trajectory control and task offloading in UAV-assisted mobile edge computing. IEEE Trans. Mob. Comput. 2022, 22, 7387–7405. [Google Scholar] [CrossRef]
  16. Yang, Y.; Song, T.; Yang, J.; Xu, H.; Xing, S. Joint Energy and AoI Optimization in UAV-Assisted MEC-WET Systems. IEEE Sensors J. 2024, 24, 15110–15124. [Google Scholar] [CrossRef]
  17. Guo, S.; Zhao, X. Multi-Agent Deep Reinforcement Learning Based Transmission Latency Minimization for Delay-Sensitive Cognitive Satellite-UAV Networks. IEEE Trans. Commun. 2023, 71, 131–144. [Google Scholar] [CrossRef]
  18. Xiong, Z.; Zhang, Y.; Lim, W.Y.B.; Kang, J.; Niyato, D.; Leung, C.; Miao, C. UAV-assisted wireless energy and data transfer with deep reinforcement learning. IEEE Trans. Cogn. Commun. Netw. 2020, 7, 85–99. [Google Scholar] [CrossRef]
  19. Qin, Z.; Liu, Z.; Han, G.; Lin, C.; Guo, L.; Xie, L. Distributed UAV-BSs Trajectory Optimization for User-Level Fair Communication Service With Multi-Agent Deep Reinforcement Learning. IEEE Trans. Veh. Technol. 2021, 70, 12290–12301. [Google Scholar] [CrossRef]
  20. Westheider, J.; Rückin, J.; Popović, M. Multi-uav adaptive path planning using deep reinforcement learning. In Proceedings of the 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Detroit, MI, USA, 1–5 October 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 649–656. [Google Scholar]
  21. Orr, J.; Dutta, A. Multi-agent deep reinforcement learning for multi-robot applications: A survey. Sensors 2023, 23, 3625. [Google Scholar] [CrossRef] [PubMed]
  22. Pang, W.; Wang, P.; Han, M.; Li, S.; Yang, P.; Li, G.; Guo, L. Optical intelligent reflecting surface for mixed dual-hop FSO and beamforming-based RF system in C-RAN. IEEE Trans. Wirel. Commun. 2022, 21, 8489–8506. [Google Scholar] [CrossRef]
  23. Wang, D.; Wu, M.; Wei, Z.; Yu, K.; Min, L.; Mumtaz, S. Uplink secrecy performance of RIS-based RF/FSO three-dimension heterogeneous networks. IEEE Trans. Wirel. Commun. 2023, 23, 1798–1809. [Google Scholar] [CrossRef]
  24. Lee, J.H.; Park, K.H.; Ko, Y.C.; Alouini, M.S. Spectral-efficient network design for high-altitude platform station networks with mixed RF/FSO system. IEEE Trans. Wirel. Commun. 2022, 21, 7072–7087. [Google Scholar] [CrossRef]
  25. Che, Y.L.; Long, W.; Luo, S.; Wu, K.; Zhang, R. Energy-efficient UAV multicasting with simultaneous FSO backhaul and power transfer. IEEE Wirel. Commun. Lett. 2021, 10, 1537–1541. [Google Scholar] [CrossRef]
  26. Qi, W.; Song, Q.; Guo, L.; Jamalipour, A. Energy-efficient resource allocation for UAV-assisted vehicular networks with spectrum sharing. IEEE Trans. Veh. Technol. 2022, 71, 7691–7702. [Google Scholar] [CrossRef]
  27. Jiang, F.; Dong, L.; Wang, K.; Yang, K.; Pan, C. Distributed resource scheduling for large-scale MEC systems: A multiagent ensemble deep reinforcement learning with imitation acceleration. IEEE Internet Things J. 2021, 9, 6597–6610. [Google Scholar] [CrossRef]
  28. Qin, Z.; Wang, H.; Wei, Z.; Qu, Y.; Xiong, F.; Dai, H.; Wu, T. Task selection and scheduling in UAV-enabled MEC for reconnaissance with time-varying priorities. IEEE Internet Things J. 2021, 8, 17290–17307. [Google Scholar] [CrossRef]
  29. Liu, X.; Xu, C.; Yu, H.; Zeng, P. Deep reinforcement learning-based multichannel access for industrial wireless networks with dynamic multiuser priority. IEEE Trans. Ind. Inform. 2021, 18, 7048–7058. [Google Scholar] [CrossRef]
  30. Seid, A.M.; Boateng, G.O.; Anokye, S.; Kwantwi, T.; Sun, G.; Liu, G. Collaborative computation offloading and resource allocation in multi-UAV-assisted IoT networks: A deep reinforcement learning approach. IEEE Internet Things J. 2021, 8, 12203–12218. [Google Scholar] [CrossRef]
  31. Liu, Y.; Lin, P.; Zhang, M.; Zhang, Z.; Yu, F.R. Mobile-Aware Service Offloading for UAV-Assisted IoVs: A Multi-Agent Tiny Distributed Learning Approach. IEEE Internet Things J. 2024, 11, 21191–21201. [Google Scholar] [CrossRef]
  32. He, Y.; Gan, Y.; Cui, H.; Guizani, M. Fairness-based 3D multi-UAV trajectory optimization in multi-UAV-assisted MEC system. IEEE Internet Things J. 2023, 10, 11383–11395. [Google Scholar] [CrossRef]
  33. Lee, J.H.; Park, J.; Bennis, M.; Ko, Y.C. Integrating LEO satellites and multi-UAV reinforcement learning for hybrid FSO/RF non-terrestrial networks. IEEE Trans. Veh. Technol. 2022, 72, 3647–3662. [Google Scholar] [CrossRef]
  34. Guan, Y.; Zou, S.; Peng, H.; Ni, W.; Sun, Y.; Gao, H. Cooperative UAV trajectory design for disaster area emergency communications: A multi-agent PPO method. IEEE Internet Things J. 2023, 11, 8848–8859. [Google Scholar] [CrossRef]
  35. Ali, M.A.; Jamalipour, A. UAV-aided cellular operation by user offloading. IEEE Internet Things J. 2020, 8, 9855–9864. [Google Scholar] [CrossRef]
  36. Liu, C.; Ding, M.; Ma, C.; Li, Q.; Lin, Z.; Liang, Y.C. Performance analysis for practical unmanned aerial vehicle networks with LoS/NLoS transmissions. In Proceedings of the 2018 IEEE International Conference on Communications Workshops (ICC Workshops), Kansas City, MO, USA, 20–24 May 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 1–6. [Google Scholar]
  37. Nistazakis, H.E.; Tsiftsis, T.A.; Tombras, G.S. Performance analysis of free-space optical communication systems over atmospheric turbulence channels. IET Commun. 2009, 3, 1402–1409. [Google Scholar] [CrossRef]
  38. Kim, I.I.; Koontz, J.; Hakakha, H.; Adhikari, P.; Stieger, R.; Moursund, C.; Barclay, M.; Stanford, A.; Ruigrok, R.; Schuster, J.J.; et al. Measurement of scintillation and link margin for the TerraLink laser communication system. In Wireless Technologies and Systems: Millimeter-Wave and Optical; SPIE: Bellingham, WA, USA, 1998; Volume 3232, pp. 100–118. [Google Scholar]
  39. Wang, S.; Lv, T. Deep reinforcement learning based dynamic multichannel access in HetNets. In Proceedings of the 2019 IEEE Wireless Communications and Networking Conference (WCNC), Marrakesh, Morocco, 15–18 April 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 1–6. [Google Scholar]
Figure 1. UAVs cooperating to connect GBSs and MUs for emergency communication.
Figure 2. The KMADDPG algorithm in the UAV-assisted emergency communication network.
Figure 3. Convergence performance of reward under different algorithms.
Figure 4. Convergence performance of total success rate under different algorithms.
Figure 5. The distribution of successful tasks under different algorithms.
Figure 6. The distribution of successful tasks under different algorithms according to priority.
Figure 7. Convergence performance of reward under different UAV and MU combinations.
Figure 8. Convergence performance of total success rate under different UAV and MU combinations.
Table 1. List of main notations.
Notation - Definition
i / I - The index/set of UAVs
j / J - The index/set of MUs
g / G - The index/set of GBSs
$d_{ij}$ - Distance between UAV i and MU j
$Q_i^{UAV}$ / $Q_j^{MU}$ - The position of UAV i / MU j
$\Gamma_{ij}^{avg}$ - Average path loss between UAV i and MU j
$\theta_{ij}^{RF}$ - Transmission rate between UAV i and MU j
$\theta_i^{FSO}$ - Backhaul rate between UAV i and its GBS
$d_{ig}$ - Distance between UAV i and GBS g
$R_{ij}$ - Offload ratio between UAV i and MU j
$B_{ij}$ - Bandwidth allocated to MU j by UAV i
$f_{ij}$ - CPU frequency allocated to MU j by UAV i
$T_{ij}^{RF}$ / $T_{ig}^{FSO}$ - RF transmission time between UAV i and MU j / FSO transmission time between UAV i and GBS g
$\sigma$ - Additive white Gaussian noise
$pr_j$ - The priority weight of a task
S - State space
A - Action space
r(t) - Reward
$J(\theta_i)$ - Objective function
$Q^{\mu}(s, a)$ - Value function
Table 2. List of main simulation parameters.
Parameter - Value
Learning rate of Actor - 2 × 10⁻⁵
Learning rate of Critic - 5 × 10⁻⁴
Number of episodes - 15,000
Episode length - 10
Channel bandwidth (B) - 50 MHz
CPU frequency of UAV ($f_{ij}$) - 1.5 × 10⁹ Hz
CPU frequency of MU ($f_{local}$) - 10⁹ Hz
Size of task ($D_j$) - 0.1-1.5 Mbit
Requested delay of task ($t_j^{delay}$) - 12-15 ms
LoS additional path loss ($\eta_{LoS}$) - 1 dB
NLoS additional path loss ($\eta_{NLoS}$) - 20 dB
LoS path loss exponent ($\gamma_{LoS}$) - 2.09
NLoS path loss exponent ($\gamma_{NLoS}$) - 3.75
Noise power ($\sigma^2$) - -95 dBm
Operating altitude of UAV (H) - 100 m
Transmit power of UAV ($p^{RF}$) - 30 dBm
Transmission power of FSO ($p^{FSO}$) - 200 mW
Optical efficiency ($\varkappa_t$) - 0.8
Receiver diameter of FSO channel (ζ) - 0.06 m
Transmitter divergence of FSO channel ($\upsilon_t$) - 2.07 × 10⁻⁴ rad
Wavelength of FSO channel (λ) - 1550 nm
Sensitivity of receiver ($N_b$) - 100 photons/bit
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
