Article

UAV-Assisted Mobile Edge Computing: Dynamic Trajectory Design and Resource Allocation

Zhuwei Wang, Wenjing Zhao, Pengyu Hu, Xige Zhang, Lihan Liu, Chao Fang and Yanhua Sun
1 Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
2 Department of Smart Agriculture Engineering, Shanghai Vocational College of Agriculture and Forestry, Shanghai 201699, China
3 Beijing Institute of Astronautical Systems Engineering, Beijing 100076, China
4 School of Statistics and Data Science, Beijing Wuzi University, Beijing 101149, China
* Author to whom correspondence should be addressed.
Sensors 2024, 24(12), 3948; https://doi.org/10.3390/s24123948
Submission received: 5 May 2024 / Revised: 8 June 2024 / Accepted: 11 June 2024 / Published: 18 June 2024
(This article belongs to the Special Issue Remote Sensing-Based Intelligent Communication)

Abstract

The recent advancements of mobile edge computing (MEC) technologies and unmanned aerial vehicles (UAVs) have provided resilient and flexible computation services for ground users beyond the coverage of terrestrial services. In this paper, we focus on a UAV-assisted MEC system in which a UAV equipped with an MEC server assists user devices in computing their tasks. To minimize the weighted average energy consumption and delay of the UAV-assisted MEC system, an LQR-Lagrange-based DDPG (LLDDPG) algorithm is proposed that jointly optimizes user task offloading and UAV trajectory design. Specifically, the LLDDPG algorithm decomposes the problem into three subproblems. The DDPG algorithm is first used to plan the desired UAV trajectory; the LQR-based algorithm is then employed to achieve real-time tracking control of this desired trajectory; finally, the Lagrange duality method is used to solve the computational resource allocation problem. Simulation results indicate that the proposed LLDDPG algorithm can effectively improve system resource management and realize real-time UAV trajectory design.

1. Introduction

The rapid development of mobile intelligent devices is boosting the growth of the Internet of Things (IoT) and the advent of complex mobile applications with intelligent features, such as face recognition, video processing, and online games [1]. These applications are typically latency- and computation-sensitive. However, IoT devices, due to their relatively low computing and battery capabilities, are unable to maintain superior performance [2]. Although cloud computing can offload terminal computing tasks to cloud servers, thereby alleviating the computational burden on mobile devices, the task offloading will cause excessive latency and link congestion problems [3,4].
Mobile edge computing (MEC) provides a cost-efficient solution for computationally intensive and latency-critical tasks, by allocating computational resources towards the network edge to users [5]. The edge execution of user tasks extends the battery life of devices, reduces the power consumption and latency associated with communication and local computing, and improves the quality of service [6,7]. However, in traditional MEC application scenarios, the communication links are dominated by non-line-of-sight (NLoS), which indicates that the data transmission rate is severely restricted by the poor quality of the communication channel [8]. In addition, it poses a significant challenge in deploying the terrestrial MEC unit in certain situations, such as in remote areas or during emergency events [9].
Fortunately, the technology of unmanned aerial vehicles (UAVs), characterized by flexible mobility, easy deployment, and line-of-sight (LoS) connections, has gradually become an important component of future wireless networks. The UAV-assisted MEC system provides a potential solution to address the aforementioned challenges in terrestrial MEC systems [10,11]. Compared to traditional wireless networks, UAV-assisted MEC networks offer a multitude of advantages in terms of mobility, flexibility, cost, coverage, and reconfiguration. Moreover, UAVs equipped with MEC servers can approach users closely to provide services, which can notably reduce energy consumption and transmission delay.
However, designing a joint optimal scheme for resource management and UAV trajectory planning faces significant challenges due to the UAV’s inherent dynamics constraints and limited onboard computation capability and energy resources [12,13,14]. On the one hand, the actual flight acceleration and velocity of the UAV cannot be adjusted arbitrarily, so sudden acceleration, deceleration, and turning are impossible. These dynamics constraints, however, are often completely overlooked in existing UAV trajectory planning, resulting in significant deviations between the actual flight trajectory and the theoretically designed trajectory of the UAV [15]. On the other hand, UAV trajectory planning requires achieving coverage for all users and satisfying the task offloading requirements. However, both the users’ task offloading ratio and the communication channel between the user and the UAV are time-varying, which degrades the offloading efficiency, latency, and energy efficiency. Therefore, given the highly dynamic scenarios and the frequent task offloading requirements of users, the resource management of MEC systems and the design of UAV flight trajectories have become crucial research topics.
Motivated by the above-mentioned reasons, this paper focuses on the UAV-assisted MEC system considering UAV flight dynamics constraints. A novel linear quadratic regulator (LQR)-Lagrange-based deep deterministic policy gradient (LLDDPG) algorithm is proposed to minimize the weighted energy consumption and delays of the system through the joint optimization of dynamic computation resources and UAV flight trajectory. In fact, in light of the UAV flight dynamics restriction that the velocity and acceleration of the UAV cannot change arbitrarily, the UAV is required to replan a feasible flight trajectory based on the UAV’s current flight state and the task offloading requirements of users, thereby enhancing the system performance. The main contributions of this work are summarized as follows:
  • Taking into account the dynamic control of the UAV trajectory, the system architecture for a UAV-assisted MEC is investigated. The communication model, UAV control model, as well as the computing and transmission model are analyzed in detail. Subsequently, the joint optimization problem minimizing the weighted energy consumption and delay is formulated when considering the UAV dynamics constraint.
  • Constrained by the system dynamics of the UAV, where the velocity and acceleration are not allowed to change arbitrarily, an LLDDPG algorithm is proposed to address the joint dynamic trajectory and resource allocation problem. Specifically, for a practical solution, the optimization problem is decomposed into three distinct subproblems. Firstly, a DDPG-based UAV trajectory design algorithm is developed to acquire the desired optimal trajectory. Subsequently, the LQR-based tracking control algorithm is introduced to derive the actual UAV flight trajectory subject to the system dynamics. Finally, the resource allocation problem regarding the offloading ratio and computation frequency assignments is solved using the Lagrange duality method.
  • Numerical simulation results extensively demonstrate the efficacy of the proposed LLDDPG algorithm in terms of learning rate, loss function, and reward convergence. Additionally, the performance evaluations with different weight parameters and the effectiveness of the LLDDPG algorithm in actual UAV flight control are also investigated and analyzed.
The remainder of the paper is organized as follows. In Section 2, the related works are reviewed. The UAV-assisted MEC system model and the optimization problem formulation are presented in Section 3. In Section 4, the LLDDPG algorithm is proposed. In Section 5, numerical simulations and results are presented. Finally, conclusions are drawn in Section 6.

2. Related Works

This section briefly reviews the works related to UAV-assisted MEC resource allocation and trajectory design, and the existing issues and challenges are also discussed.

2.1. UAV-Assisted MEC Resource Allocation

In recent years, the increasing maturity of UAV-assisted wireless communication technology has boosted the further development of MEC systems, and how to combine the advantages of UAVs with MEC networks has become a research hotspot [16,17,18,19,20,21,22,23,24,25]. Guo et al. [16] introduced a UAV-enabled MEC system, in which the UAV served as a relay between the base station and the offloading user. This work investigated the joint optimization of flight trajectories and computational offloading, considering both user service quality and energy consumption. The work in [17] maximized the UAV’s transmit power efficiency by jointly optimizing bandwidth assignment, transmission time, UAV placement, and power allocation control. Furthermore, Qin et al. [18] investigated the energy efficiency of a UAV-assisted MEC system by considering energy consumption and the device task requirements.
The above works mainly focused on the energy efficiency of the MEC system, while ignoring the influence of the UAV trajectory design. In order to improve the network lifetime and computation capability associated with the UAV, Wang et al. [19] investigated an optimization problem that aimed to minimize the total energy consumption of the UAV through a combined approach of zone division and UAV trajectory planning. Wang et al. [20] addressed the efficiency maximization problem by jointly optimizing bandwidth management, UAV trajectory, computation offloading, and computation resource assignment. Diao et al. [21] optimized the computational offload strategy and UAV trajectory in the UAV-enabled MEC system. Their objective was to minimize the total energy consumption and delay while enhancing the user’s service quality. Hu et al. [22] also focused on the joint optimization problem to maximize the data offloading efficiency while minimizing the UAV energy consumption. Liu et al. [23] put forward a system energy minimization problem subject to constraints such as UAV trajectory, transmit power, and CPU frequency. Zeng et al. [24] investigated the problem of minimizing UAV energy consumption, including propulsion energy and communication-related energy, while satisfying the communication throughput requirements of each ground node. By leveraging the traveling salesman problem (TSP) with neighborhood and convex optimization techniques, a successive convex approximation (SCA)-based algorithm is proposed. Yang et al. [25] considered a UAV-enabled MEC system to jointly optimize UAV energy and trajectory control while satisfying long-term data queue stability, and then a perturbed Lyapunov optimization-based offloading and trajectory (PLOT) control algorithm was proposed.
In the aforementioned research, the authors delved into the joint optimization of variables such as UAV trajectory, offloading strategy, computation frequency, and transmission power. Nevertheless, these existing works predominantly focus on the design of desired UAV flight trajectory, completely disregarding the restrictions imposed by UAV flight dynamics. In reality, the inherent limitations in the UAV’s flight capabilities can result in a growing deviation between the desired and actual flight trajectories, which might potentially lead to significant performance degradation.

2.2. UAV Trajectory Control

In order to ensure the efficiency of the UAV-assisted MEC system, it is crucial to jointly optimize the system resource allocation and UAV trajectory control. Since the UAV trajectory directly influences the MEC system’s energy consumption and user service quality, it is of great importance to track and control the flight trajectory. The trajectory flight control problem continues to attract significant attention due to its potential to enhance the system’s adaptability and its capability to handle dynamics and uncertainties [26,27,28,29,30]. Addressing the UAV trajectory tracking control problem, Yan et al. [26] proposed a dynamic tracking method for UAV landing trajectories based on chaos genetic algorithms. Lee et al. [27] proposed a trajectory tracking control methodology utilizing backstepping and LQR control. Furthermore, Li et al. [28] presented a control-oriented UAV trajectory design approach that incorporates both the kinematics and dynamics equations of the UAV. However, these works primarily focused on the UAV’s trajectory tracking control and neglected the effect of tracking deviation on the overall performance of the UAV-assisted MEC system. Zhang et al. [29] addressed this limitation by considering a network control system with delays, and an adaptive dynamic programming-based tracking control algorithm was proposed to generate real-time control actions. Liu et al. [30] focused on the UAV trajectory planning problem for an environmental monitoring system. The formulated optimization problem was divided into two subproblems: the UAV velocity optimization and the trajectory optimization. To address these subproblems, solution algorithms based on SCA and a genetic algorithm (GA), respectively, were proposed.
Regrettably, there have been few studies that take into account the joint design of system resource allocation and real-time UAV trajectory control in the UAV-assisted MEC system. Most of the existing articles have studied the desired trajectory planning, assuming that the UAV has the perfect flight capability and operates in a static transmission environment. In this paper, we aim to address this gap to investigate the inherent constraints of UAV flight dynamics, and focus on the joint optimization problem of dynamic trajectory design and resource allocation for a UAV-assisted MEC system.

3. System Modeling and Optimization Problem Formulation

A UAV-assisted MEC system is depicted in Figure 1, which consists of the UAV and multiple users. The UAV, equipped with an MEC server, is capable of simultaneously transmitting information and providing edge computing services. Considering their limited computation capacities, the users are required to offload a portion of their computing tasks to the MEC server through the shared wireless network. Generally, a user’s computing task can be divided into two parts; one is computed locally, and the other is offloaded to the UAV for processing. The UAV aggregates the received information to form a new global model and then feeds the global information back to the users. In order to enhance the energy efficiency and address the dynamic nature of the entire system environment, the UAV’s flight needs to be frequently adjusted and controlled.

3.1. Communication Model

Let $q_k = \{x_k, y_k\}$ denote the location of the $k$-th ground user, which is assumed to be known by the UAV. The position of the UAV in the $n$-th time slot can be represented as $q[n] = \{x_u[n], y_u[n], H\}$, where $H$ is the fixed flight altitude of the UAV. Typically, it is assumed that the wireless channel between the user and the UAV is mainly dominated by the LoS. Thus, the channel gain $g_k[n]$ between the user $k$ and the UAV in the $n$-th time slot can be expressed as [31]

$$g_k[n] = \frac{\beta_0}{d_k^2[n]}$$

where $\beta_0$ is the channel coefficient and $d_k[n]$ is the distance between the user $k$ and the UAV, given by $d_k[n] = \sqrt{(x_u[n]-x_k)^2 + (y_u[n]-y_k)^2 + H^2}$.
Subsequently, the transmission data rate from the user $k$ to the UAV can be derived as

$$R_k[n] = B\log_2\!\left(1 + \frac{g_k[n]\, p}{N_0}\right)$$

where $N_0$ is the noise power, and $B$ and $p$ represent the assigned bandwidth and transmit power, respectively.
Similarly, the transmission data rate from the UAV to the user $k$ is given by $R_u[n] = B\log_2\!\left(1 + \frac{g_k[n]\, p_u}{N_0}\right)$, where $p_u$ represents the transmit power of the UAV.
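As a quick numerical illustration of the channel model above, the following sketch (not part of the original paper) evaluates the channel gain and uplink rate; the default values of $\beta_0$, $B$, $N_0$, and the user transmit power mirror Table 1 and Section 5.1, and the function names are illustrative.

```python
import numpy as np

def channel_gain(q_uav, q_user, H, beta0=1e-3):
    # LoS channel gain g_k[n] = beta0 / d_k^2[n]; beta0 = -30 dB by default.
    dx, dy = q_uav[0] - q_user[0], q_uav[1] - q_user[1]
    d_squared = dx ** 2 + dy ** 2 + H ** 2      # squared UAV-user distance
    return beta0 / d_squared

def uplink_rate(g, p=1e-5, B=1e6, N0=1e-12):
    # R_k[n] = B log2(1 + g p / N0); p = -20 dBm, B = 1 MHz, N0 = -90 dBm.
    return B * np.log2(1.0 + g * p / N0)

# Example: UAV at (10, 20) m with altitude 10 m, user at (25, 30) m.
g = channel_gain((10.0, 20.0), (25.0, 30.0), H=10.0)
print(f"rate = {uplink_rate(g) / 1e6:.2f} Mbit/s")
```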

3.2. Computing and Transmission Model

Considering the partial offloading computation scenario, the computation tasks can be divided into two parts. One is handled locally, while the other is offloaded to the MEC server for processing.
(1) Local Computation: Each user has a restricted computation capability for performing local computing, and the CPU frequency $f_k[n]$ serves as the key factor. The delay $T_k^L[n]$ and the energy consumption $E_k^L[n]$ for local computing can be, respectively, deduced as follows [32]:

$$T_k^L[n] = \frac{\beta_k[n] L_k[n] C_k}{f_k[n]}$$

$$E_k^L[n] = \eta\, T_k^L[n] f_k^3[n] = \eta\, \beta_k[n] C_k L_k[n] f_k^2[n]$$

where $C_k$ is the number of CPU cycles required for computing one bit, $\eta$ denotes the effective capacitance coefficient such that $\eta f_k^3[n]$ is the CPU power consumption, $L_k[n]$ is the total task size, and $\beta_k[n] L_k[n]$ represents the portion of the task processed locally.
(2) Task Offloading: The offloading delay is determined by the offloading task size, which is given by

$$T_k^o[n] = \frac{(1-\beta_k[n]) L_k[n]}{R_k[n]}$$

Similarly, the relevant transmission energy consumption of user task offloading is given by

$$E_k^o[n] = p_u T_k^o[n] = \frac{(1-\beta_k[n]) L_k[n]\, p_u}{R_k[n]}$$

(3) UAV Computation: Once the user offloads the task to the MEC server, the UAV processes the task, which causes the processing delay

$$T_k^c[n] = \frac{(1-\beta_k[n]) L_k[n] C_k}{f_{u,k}[n]}$$

where $f_{u,k}[n]$ denotes the CPU computing frequency of the UAV allocated to user $k$.
Similar to (3b), the energy consumption for offloaded task processing can be obtained as

$$E_k^c[n] = \psi f_{u,k}^3[n] T_k^c[n] = \psi f_{u,k}^2[n] (1-\beta_k[n]) L_k[n] C_k$$

where $\psi$ is the UAV effective capacitance coefficient.
(4) Result Feedback: Once the UAV task computation is completed, the results are fed back to the relevant user, and the transmission-induced delay $T_k^u[n]$ is given by

$$T_k^u[n] = \frac{L_u[n]}{R_u[n]}$$

where $L_u[n]$ represents the size of the data transmitted back to the user.
Then, the energy consumption for the information feedback is

$$E_k^u[n] = p_u T_k^u[n] = \frac{L_u[n]\, p_u}{R_u[n]}$$

From (3) to (9), the total delay and energy consumption in each time slot are given by

$$T[n] = \sum_{k=1}^{K}\left(T_k^L[n] + T_k^o[n] + T_k^c[n] + T_k^u[n]\right)$$

$$E[n] = \sum_{k=1}^{K}\left(E_k^L[n] + E_k^o[n] + E_k^c[n] + E_k^u[n]\right)$$
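The per-slot cost terms above can be checked with a short script. The sketch below is an illustrative implementation under the stated model; the argument names are chosen here for readability, the capacitance coefficients follow Table 1, the UAV transmit power follows Section 5.1, and the offloading energy uses $p_u$ exactly as written above.

```python
def user_slot_cost(beta, L, Ck, fk, fu_k, Rk, Ru, Lu,
                   pu=1.0, eta=1e-28, psi=1e-28):
    """Delay and energy of one user in one slot, following the expressions above.

    beta: offloading ratio (fraction computed locally); L: task size (bits);
    Ck: CPU cycles per bit; fk, fu_k: user and UAV CPU frequencies (Hz);
    Rk, Ru: uplink and downlink rates (bit/s); Lu: feedback size (bits).
    """
    T_loc = beta * L * Ck / fk                        # local computing delay
    E_loc = eta * beta * Ck * L * fk ** 2             # local computing energy
    T_off = (1.0 - beta) * L / Rk                     # task offloading delay
    E_off = pu * T_off                                # offloading transmission energy
    T_uav = (1.0 - beta) * L * Ck / fu_k              # UAV processing delay
    E_uav = psi * fu_k ** 2 * (1.0 - beta) * L * Ck   # UAV processing energy
    T_fb = Lu / Ru                                    # result feedback delay
    E_fb = pu * T_fb                                  # feedback transmission energy
    return (T_loc + T_off + T_uav + T_fb,
            E_loc + E_off + E_uav + E_fb)

# Example: 1 Mbit task, half offloaded, 1 GHz user CPU, 3 GHz UAV share.
print(user_slot_cost(0.5, 1e6, 1000.0, 1e9, 3e9, 1e6, 1e6, 1e5))
```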

3.3. UAV Control Model

The existing solutions for UAV trajectory planning are typically carried out under the assumption of perfect UAV flight capability, in which case the velocity and acceleration of the UAV can change arbitrarily. However, this is impractical in actual UAV flight, considering the constraints on acceleration and velocity as well as the underlying dynamics principles. In addition, the time-varying task requirements of users and the dynamic transmission environment contribute to frequent adjustments in the UAV trajectory. Therefore, real-time UAV trajectory control is required to reduce the performance degradation induced by the state deviations.
Typically, the dynamics of the UAV can be expressed as
$$\dot{q}(t) = v(t)$$

$$\dot{v}(t) = a(t - \Delta)$$

where $q(t)$ and $v(t)$, respectively, denote the UAV’s location and velocity, $a(t)$ is the UAV acceleration, and $\Delta$ is the time delay.
Define a new state vector as

$$w(t) = [q(t), \dot{q}(t)] = [q(t), v(t)]$$

Based on (11) and (12), the dynamics model can be rewritten as

$$\dot{w}(t) = A w(t) + B a(t - \Delta)$$

where

$$A = \begin{bmatrix} 0_{3\times 3} & I_3 \\ 0_{3\times 3} & 0_{3\times 3} \end{bmatrix}, \quad B = \begin{bmatrix} 0_{3\times 3} \\ I_3 \end{bmatrix}$$

with $I_3$ and $0_{3\times 3}$ denoting the $3\times 3$ identity and zero matrices.
Then, the relevant discrete-time dynamics is given by [33]

$$w[n+1] = A_0 w[n] + B_1 a[n] + B_2 a[n-1]$$

where

$$A_0 = e^{A \Delta T}, \quad B_1 = \int_{0}^{\Delta T - \Delta} e^{A t}\, \mathrm{d}t\; B, \quad B_2 = \int_{\Delta T - \Delta}^{\Delta T} e^{A t}\, \mathrm{d}t\; B, \quad w[n] = w(n \Delta T)$$

where $a[n]$ is the control strategy (i.e., acceleration) of the UAV, and $\Delta T$ is the duration of one time slot.
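The matrices $A_0$, $B_1$, and $B_2$ can be computed numerically. The sketch below assumes the standard zero-order-hold discretization of a double integrator with input delay $\Delta < \Delta T$, which matches the structure of the discrete dynamics above; $\Delta T = 0.1$ s follows Table 1, while the value of $\Delta$ is purely illustrative.

```python
import numpy as np
from scipy.linalg import expm

def discretize_with_delay(dT=0.1, delta=0.02, steps=400):
    # Continuous-time double integrator: w_dot = A w + B a(t - delta).
    A = np.block([[np.zeros((3, 3)), np.eye(3)],
                  [np.zeros((3, 3)), np.zeros((3, 3))]])
    B = np.vstack([np.zeros((3, 3)), np.eye(3)])
    A0 = expm(A * dT)                              # state transition over one slot

    def integrate_expm_B(t0, t1):
        # Numerically integrate e^{A t} B over [t0, t1].
        ts = np.linspace(t0, t1, steps)
        vals = np.stack([expm(A * t) @ B for t in ts])
        return np.trapz(vals, ts, axis=0)

    B1 = integrate_expm_B(0.0, dT - delta)         # weight of the current input a[n]
    B2 = integrate_expm_B(dT - delta, dT)          # weight of the delayed input a[n-1]
    return A0, B1, B2

A0, B1, B2 = discretize_with_delay()
print(A0.shape, B1.shape, B2.shape)   # (6, 6), (6, 3), (6, 3)
```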
The propulsion energy is the significant flight energy consumption for the UAV, which is typically given by [34]
$$E_u^{fly}[n] = \left(N_1 \|v[n]\|^3 + \frac{N_2}{\|v[n]\|}\left(1 + \frac{\|a[n]\|^2}{a_0^2}\right)\right)\Delta T + \frac{1}{2} M \left(\|v[n]\|^2 - \|v[n-1]\|^2\right)$$
where N 1 and N 2 are system-determined parameters, M is the mass of the UAV, and a 0 is the gravitational acceleration.
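For completeness, a direct implementation of the propulsion energy model is sketched below; the numerical values used for $N_1$, $N_2$, $M$, and $a_0$ are placeholders (the paper does not list them), so the output is only indicative.

```python
import numpy as np

def propulsion_energy(v, v_prev, a, dT=0.1,
                      N1=9.26e-4, N2=2250.0, M=9.65, a0=9.8):
    # Per-slot propulsion energy: steady flight term plus kinetic energy change.
    v_norm = np.linalg.norm(v)
    a_norm = np.linalg.norm(a)
    steady = (N1 * v_norm ** 3 + (N2 / v_norm) * (1.0 + a_norm ** 2 / a0 ** 2)) * dT
    kinetic = 0.5 * M * (v_norm ** 2 - np.linalg.norm(v_prev) ** 2)
    return steady + kinetic

# Example: cruising at 8 m/s after accelerating from 7.5 m/s, with a mild turn.
print(propulsion_energy(v=[8.0, 0.0, 0.0], v_prev=[7.5, 0.0, 0.0], a=[0.0, 1.0, 0.0]))
```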

3.4. Optimization Problem Formulation

The objective of UAV-assisted MEC is to minimize the weighted energy consumption and delay through the optimization design of UAV trajectory, task offloading strategy, and the allocation of computation resources for both UAV and users, which can be formulated as
$$\mathcal{P}_1: \min_{F, \beta, Q} \; \sum_{n=1}^{N} \varepsilon_1 E_u^{fly}[n] + \varepsilon_2 E[n] + \varepsilon_3 T[n]$$

$$\text{s.t.} \quad w[n+1] = A_0 w[n] + B_1 a[n] + B_2 a[n-1] \qquad \text{(17a)}$$

$$\sum_{k=1}^{K} f_{u,k} \le F_{u,\max}, \quad \forall k \qquad \text{(17b)}$$

$$0 \le f_{u,k}, \quad \forall k \qquad \text{(17c)}$$

$$0 \le f_k \le F_k, \quad \forall k \qquad \text{(17d)}$$

$$0 \le \beta_k \le 1, \quad \forall k \qquad \text{(17e)}$$

where $F = \{f_k[n], f_{u,k}[n]\}$ represents the assigned CPU frequencies, $\beta = \{\beta_k[n]\}$ is the offloading ratio, $Q = \{q_u[n]\}$ is the UAV flight trajectory, and $\varepsilon_1$, $\varepsilon_2$, $\varepsilon_3$ are the weight coefficients. In the optimization problem, (17a) is the corresponding discrete-time dynamics of the UAV, (17b)–(17d) are the maximum CPU computation frequency constraints, and (17e) is the offloading ratio constraint of the user.

4. LLDDPG Algorithm Design

As mentioned above, P 1 is a mixed optimization problem due to the system dynamics restriction (17a), which has been commonly ignored in existing studies. For a practical solution, the optimization problem P 1 can be decomposed into three distinct subproblems. The detailed analysis will be presented below.

4.1. UAV Trajectory Design

Given the offloading ratio and the CPU computation frequency assignments, the optimization problem P 1 can be simplified as a UAV trajectory design problem.

4.1.1. DDPG-Based Desired Trajectory Design

Assuming perfect UAV flight capability, the optimization problem $\mathcal{P}_1$ can be simplified to a desired trajectory design problem:

$$\mathcal{P}_{1.1}: \min_{Q} \; \sum_{n=1}^{N} \left( \varepsilon_1 E_u^{fly}[n] + \varepsilon_2 \sum_{k=1}^{K}\left(E_k^o[n] + E_k^u[n]\right) + \varepsilon_3 \sum_{k=1}^{K}\left(T_k^o[n] + T_k^u[n]\right) \right)$$
To address the subproblem of minimizing the system cost in terms of energy consumption and delay as in P 1.1 , deep reinforcement learning (DRL), which has shown remarkable ability in solving intricate network optimization challenges, is employed to achieve the desired trajectory of the UAV. Typically, DRL can be formulated as a Markov decision process (MDP), where the next system state depends solely on the current state and the action determined by the agent. To be specific, the MDP is defined by the tuple ( S , A , R , P ) , where S, A, and R, respectively, represent the set of states, actions, and rewards, and P is the transition probability from state s n to state s n + 1 [35]. The following are the definitions for the state, action, and reward functions.
(1) State: Based on the MEC system model and task offloading model formulated in Section 3, the state s n consists of the locations of the UAV and users, as well as the user task requirements, which can be defined as
$$s_n = \{x_1[n], y_1[n], x_2[n], y_2[n], \ldots, x_K[n], y_K[n], L_1[n], L_2[n], \ldots, L_K[n], x_u[n], y_u[n]\}$$
(2) Action: The UAV is required to determine its movement, including the flight velocity $v[n]$ and direction $\theta[n]$, which is given by

$$a_n = \{v[n]\cos(\theta[n]),\; v[n]\sin(\theta[n])\}$$
(3) Reward: The reward function is highly associated with the optimization objective. The objective in $\mathcal{P}_{1.1}$, with its sign reversed, can directly serve as the reward function:

$$r_n = -\left(\varepsilon_1 E_u^{fly}[n] + \varepsilon_2\left(E_k^o[n] + E_k^u[n]\right) + \varepsilon_3\left(T_k^o[n] + T_k^u[n]\right)\right)$$
In the proposed MDP framework, the UAV acts as the agent, interacting with the environment by observing a state s n . Then, it executes an action a n based on its policy π . Following the execution of action a n , the agent receives a reward r n and transitions to the next state s n + 1 . DDPG, as one of the classic DRL algorithms, stands out for its ability to leverage low-dimensional observations to learn effective strategies in continuous action spaces. In dynamic environments, this method has been demonstrated to be highly effective in making decisions and achieving the desired trajectory of the UAV. Therefore, a DDPG-based algorithm for the design of desired UAV trajectory is proposed to minimize both the energy consumption and time delays induced by the UAV and users.
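A compact sketch of how the state, action, and reward above can be assembled is given below; the weight values default to the op1 setting of Table 3, the reward is written as the negative weighted cost so that maximizing it minimizes the objective, and the helper names are illustrative rather than taken from the paper.

```python
import numpy as np

def build_state(user_xy, task_bits, uav_xy):
    # s_n: user coordinates, per-user task sizes, and the UAV position.
    return np.concatenate([np.ravel(user_xy), np.ravel(task_bits), np.ravel(uav_xy)])

def build_action(v, theta):
    # a_n: horizontal velocity components determined by speed and heading.
    return np.array([v * np.cos(theta), v * np.sin(theta)])

def reward(E_fly, E_off, E_fb, T_off, T_fb, eps=(0.001, 0.01, 1.0)):
    # Negative weighted cost of flight energy, transmission energy, and delay.
    e1, e2, e3 = eps
    return -(e1 * E_fly + e2 * (E_off + E_fb) + e3 * (T_off + T_fb))

s = build_state([[5.0, 5.0], [20.0, 35.0]], [1e6, 2e6], [10.0, 20.0])
a = build_action(v=8.0, theta=np.pi / 4)
```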
The architecture of the DDPG-based algorithm is depicted in Figure 2. The DDPG network comprises actor and critic neural networks. Specifically, the actor network, serving as the policy network, takes the current environmental state as input and generates relevant actions through the analysis of the neural network. To enhance the efficacy of the iterative update strategy, the critic network, which utilizes a value-based learning approach, can be updated at each step.
The evaluation value Q ( s n , a n | θ Q ) is acquired by executing action a n in the state s n . The action a n = μ ( s n | θ μ ) is taken in each state, reaching a specific value through a deterministic behavioral strategy. DDPG draws upon the dual network structure of DQN and experience replay to dissociate the behavior strategy network from the evaluation strategy network. The actor and critic have two networks with a similar structure but have asynchronous parameter renewals. In this manner, the convergence speed is quicker when training the network, and the soft update formula of the target network is set as follows:
$$\theta^{Q'} \leftarrow \tau \theta^{Q} + (1-\tau)\theta^{Q'}$$

$$\theta^{\mu'} \leftarrow \tau \theta^{\mu} + (1-\tau)\theta^{\mu'}$$

where $\tau$ represents the update rate, $\theta^{Q}$ and $\theta^{\mu}$ are the parameters of the critic and actor networks, while $\theta^{Q'}$ and $\theta^{\mu'}$ are the parameters of the corresponding target networks.
The critic network parameter is updated as
$$y_i = R + \gamma Q'\!\left(s_{n+1}, \mu'\!\left(s_{n+1} \mid \theta^{\mu'}\right) \mid \theta^{Q'}\right)$$
where y i represents the actual evaluation value calculated by the target network, and γ is the reward decay rate.
Then, the loss function can be expressed as
$$L = \frac{1}{N}\sum_{n}\left(y_n - Q(s_n, a_n \mid \theta^{Q})\right)^2$$
The actor network parameter is updated by
$$\nabla_{\theta^{\mu}} J \approx \frac{1}{N}\sum_{n} \nabla_{a} Q\!\left(s_n, a \mid \theta^{Q}\right)\Big|_{a=\mu(s_n)}\; \nabla_{\theta^{\mu}} \mu\!\left(s_n \mid \theta^{\mu}\right)$$

where $\nabla_{\theta^{\mu}}\mu$ denotes the modification trend of the actor parameters, and $\nabla_{a} Q$ indicates the actor network update direction calculated by the critic.
After completing the training process, the optimized network parameters of the actor, denoted as θ μ * , are obtained. Subsequently, the desired UAV trajectory is given by
$$q^*[n] = \mu\!\left(s_n \mid \theta^{\mu*}\right)$$
Consequently, the DDPG-based algorithm for the design of desired UAV trajectory can be summarized in Algorithm 1.
Algorithm 1 DDPG-based algorithm for desired UAV trajectory design
Input: The positions of the UAV q ( t )
Output: UAV movement policy
1:Initialize the main actor network and critic network.
2:Initialize the target actor network and critic network.
3:Initialize the replay memory B and initialize σ 2 = 2.0 , ε = 0.9 for action exploration.
4:for episode : = 1 , , M  do
5:   for step n : = 1 , , N  do
6:     Update the environment status, observe the current environment state s n .
7:     Set the current action $a_n \sim \mathcal{N}(\mu(s_n \mid \theta^{\mu}), \sigma^2)$;
8:     Execute the action a n , obtain the reward r n , and transit to the next state s n + 1 .
9:     Store the experience tuple ( s n , a n , r n , s n + 1 ) into replay memory B .
10:     if Update then
11:        Randomly sample a mini-batch of transitions from the replay memory B.
12:        Renew the critic network by minimizing the critic loss.
13:        Renew the actor network by maximizing the policy objective.
14:        Renew the target networks based on (29).
15:        Decay the exploration noise: $\sigma^2 \leftarrow \sigma^2 \cdot \varepsilon$
16:     end if
17:   end for
18:end for
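A minimal PyTorch sketch of the actor-critic update used in Algorithm 1 is given below. The network widths are illustrative and the optimizers are assumed to be created by the caller (e.g., Adam with the learning rates of Table 2), while the discount factor γ = 0.9 and update rate τ = 0.001 follow Table 2; this is a generic DDPG update step rather than the paper’s exact network configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Actor(nn.Module):
    def __init__(self, state_dim, action_dim, v_max=10.0):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 256), nn.ReLU(),
                                 nn.Linear(256, 256), nn.ReLU(),
                                 nn.Linear(256, action_dim), nn.Tanh())
        self.v_max = v_max            # scale tanh output to the feasible velocity range

    def forward(self, s):
        return self.v_max * self.net(s)

class Critic(nn.Module):
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim + action_dim, 256), nn.ReLU(),
                                 nn.Linear(256, 256), nn.ReLU(),
                                 nn.Linear(256, 1))

    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1))

def ddpg_update(batch, actor, critic, actor_t, critic_t,
                actor_opt, critic_opt, gamma=0.9, tau=0.001):
    s, a, r, s2 = batch                                  # mini-batch from replay memory
    with torch.no_grad():
        y = r.view(-1, 1) + gamma * critic_t(s2, actor_t(s2))   # target value
    critic_loss = F.mse_loss(critic(s, a), y)            # mean squared TD error
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    actor_loss = -critic(s, actor(s)).mean()             # deterministic policy gradient
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

    for tgt, src in ((critic_t, critic), (actor_t, actor)):   # soft target updates
        for p_t, p in zip(tgt.parameters(), src.parameters()):
            p_t.data.mul_(1.0 - tau).add_(tau * p.data)
```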

4.1.2. LQR-Based Trajectory Tracking Control

In practical applications, the UAV faces a dynamic transmission environment, time-varying user task requirements, and diverse flight disturbances. To enhance system performance and control stability, the UAV must dynamically adjust its flight trajectory. In addition, due to the inherent limitations in the UAV’s velocity and acceleration, the actual flight state cannot be adjusted arbitrarily to match the desired trajectory q * obtained from subproblem P 1.1 . Therefore, there exist inevitable deviations between the desired and actual flight trajectories, potentially degrading the system performance and even significantly diminishing the effectiveness of the optimization design. To mitigate this issue, it is imperative to minimize the desired UAV trajectory deviation through real-time flight control. In this regard, an LQR-based UAV trajectory tracking control algorithm is proposed to align the actual flight trajectory with the desired trajectory as closely as possible, thereby improving the overall system performance.
Given the desired UAV trajectory design q * obtained from subproblem P 1.1 , the trajectory tracking control problem is given by
$$\mathcal{P}_{1.2}: \quad q \rightarrow q^* \quad \text{s.t.}\; w[n+1] = A_0 w[n] + B_1 a[n] + B_2 a[n-1]$$

The UAV location and velocity deviations can be, respectively, obtained as

$$\tilde{q} = q - q^*$$

$$\tilde{v} = v - v^*$$

Define a new vector $\tilde{w} = (\tilde{q}, \tilde{v})$, and then the UAV deviation dynamics can be derived based on (12), (13), and (14) as

$$\tilde{w}[n+1] = A_0 \tilde{w}[n] + B_1 a[n] + B_2 a[n-1]$$

Then, by using the typical quadratic cost function, the UAV trajectory tracking control problem $\mathcal{P}_{1.2}$ is equivalent to the following optimization problem [36]:

$$\min_{a[n]} \; \tilde{w}^T[N] Q \tilde{w}[N] + \sum_{n=0}^{N-1}\left(\tilde{w}^T[n] Q \tilde{w}[n] + a^T[n] R a[n]\right) \quad \text{s.t.}\; \tilde{w}[n+1] = A_0 \tilde{w}[n] + B_1 a[n] + B_2 a[n-1]$$

where $N$ denotes the finite time horizon, and $Q$ and $R$ are system-determined weighting matrices.
It can be seen that the objective of (30) is to minimize the trajectory tracking deviation through the optimal design of UAV flight a [ n ] . Then, an LQR-based trajectory tracking control algorithm is proposed to solve the optimization problem (30).
Define a new state vector as
$$z[n] = \left[\tilde{w}[n],\; a[n-1]\right]$$

The optimization problem (30) can be rewritten as

$$\min_{a[n]} \; z^T[N] \dot{Q} z[N] + \sum_{n=0}^{N-1}\left(z^T[n] \dot{Q} z[n] + a^T[n] R a[n]\right) \quad \text{s.t.}\; z[n+1] = C[n] z[n] + D[n] a[n]$$

where

$$\dot{Q} = \begin{bmatrix} Q & 0 \\ 0 & 0 \end{bmatrix}, \quad C[n] = \begin{bmatrix} A[n] & B_2[n] \\ 0 & 0 \end{bmatrix}, \quad D[n] = \begin{bmatrix} B_1[n] \\ 0 \end{bmatrix}$$

The optimization problem (32) is a classic LQR control problem, and the optimal control strategy can be derived as [33]

$$a[n] = -l[n] z[n]$$

where

$$l[n] = \left(D^T[n] S[n+1] D[n] + R\right)^{-1} D^T[n] S[n+1] C[n], \quad S[n] = C^T[n] S[n+1] C[n] + \dot{Q} - l^T[n] D^T[n] S[n+1] C[n], \quad S[N] = \dot{Q}$$
Based on (34) and (35), the actual UAV flight trajectory $q[n]$ can be obtained from the UAV acceleration control strategy $a[n]$.
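The backward recursion and the closed-loop rollout can be sketched as follows, assuming the standard discrete-time finite-horizon LQR convention with feedback $a[n] = -l[n]z[n]$; the matrices $C$, $D$, $\dot{Q}$, and $R$ are supplied by the caller following the definitions above, so this is a generic sketch rather than the paper’s exact implementation.

```python
import numpy as np

def lqr_gains(C, D, Q_dot, R, N):
    # Backward Riccati recursion with terminal condition S[N] = Q_dot,
    # returning the time-varying feedback gains l[0..N-1].
    S = Q_dot.copy()
    gains = [None] * N
    for n in range(N - 1, -1, -1):
        l = np.linalg.solve(D.T @ S @ D + R, D.T @ S @ C)
        S = C.T @ S @ C + Q_dot - l.T @ (D.T @ S @ C)
        gains[n] = l
    return gains

def track(z0, gains, C, D):
    # Roll the closed loop z[n+1] = C z[n] + D a[n] forward with a[n] = -l[n] z[n].
    z = np.array(z0, dtype=float)
    traj = [z.copy()]
    for l in gains:
        a = -l @ z
        z = C @ z + D @ a
        traj.append(z.copy())
    return np.array(traj)
```

Here the returned gains and rollout are intended to correspond to (35) and (34) under the stated sign convention.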

4.2. Computation Resource Allocation Optimization

After the actual flight trajectory $q[n]$ is determined as described in Section 4.1, the optimization problem $\mathcal{P}_1$ reduces to a computation resource allocation problem, which is given by

$$\mathcal{P}_{1.3}: \min_{F, \beta} \; \sum_{n=1}^{N} E_k^L[n] + E_k^c[n] + T_k^L[n] + T_k^c[n]$$

$$\text{s.t.} \quad \sum_{k \in \mathcal{K}} f_k \le F_k, \quad \forall k$$

$$f_k, f_u \ge 0, \quad \forall k$$

$$0 \le \beta_k \le 1, \quad \forall k$$
Subproblem P 1.3 , as a convex problem, can be typically solved using the Lagrange duality method as follows [37].
Theorem 1.
Given the UAV trajectory $q[n]$, the optimal CPU frequencies of the users and the UAV, as well as the optimal offloading ratios, respectively denoted by $f_k^*[n]$, $f_u^*[n]$, and $\beta_k^*[n]$, can be expressed as follows.

$$f_k^*[n] = \sqrt{\frac{\gamma_k}{3\sigma_c M_K}\sum_{i=n}^{N}\kappa_{k,i}}, \quad \forall k \in \mathcal{K},\; n \in \mathcal{N}$$

$$f_u^*[n] = \begin{cases} 0, & n = 1 \\ \sqrt{\dfrac{\lambda_N - \sum_{i=n}^{N-1}\lambda_i}{3\sigma_c M}}, & n = 2, \ldots, N-1 \\ \sqrt{\dfrac{\lambda_N}{3\sigma_c M}}, & n = N \end{cases}$$

$$\beta_k^*[n] = \frac{B}{\psi}\log_2\!\left(\frac{B\, g_k[n]\left(\sum_{i=n+1}^{N-1}\lambda_i + \xi_k - \lambda_N - \sum_{i=n}^{N}\kappa_{k,i}\right)}{\sigma^2 \ln 2}\right)$$

where $\xi_k \ge 0$, $\kappa_{k,n} \ge 0$, and $\lambda_n \ge 0$ are the Lagrange multipliers associated with the constraints of $\mathcal{P}_{1.3}$.
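Because the closed-form expressions above depend on the dual variables, a direct numerical solve of the per-slot form of $\mathcal{P}_{1.3}$ is a convenient cross-check. The sketch below uses scipy.optimize with box constraints instead of the Lagrangian derivation; the task parameters and CPU frequency caps are illustrative ($C_k$ and the capacitance coefficients follow Table 1), so it is a sanity check rather than the paper’s method.

```python
import numpy as np
from scipy.optimize import minimize

def slot_cost(x, L, Ck, eta=1e-28, psi=1e-28):
    # Local/UAV computing energy plus delay for one user in one slot,
    # with x = [beta, f_k, f_u] (offloading ratio, user CPU, UAV CPU).
    beta, fk, fu = x
    T = beta * L * Ck / fk + (1.0 - beta) * L * Ck / fu
    E = eta * beta * Ck * L * fk ** 2 + psi * (1.0 - beta) * Ck * L * fu ** 2
    return E + T

def allocate(L=1e6, Ck=1000.0, Fk=1e9, Fu=3e9):
    # Optimize normalized variables [beta, f_k/Fk, f_u/Fu] so the numerical
    # gradients stay well scaled, then rescale the result.
    def cost(x):
        return slot_cost([x[0], x[1] * Fk, x[2] * Fu], L, Ck)
    x0 = np.array([0.5, 0.5, 0.5])
    bounds = [(0.0, 1.0), (1e-3, 1.0), (1e-3, 1.0)]
    res = minimize(cost, x0, bounds=bounds, method="L-BFGS-B")
    return res.x[0], res.x[1] * Fk, res.x[2] * Fu   # approx. (beta*, f_k*, f_u*)

print(allocate())
```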
Finally, based on the algorithm analysis presented in Section 4.1 and Section 4.2, the schematic of the proposed LLDDPG algorithm for the joint optimization of the UAV dynamic trajectory and resource allocation can be illustrated as shown in Figure 3, and its algorithmic procedure can be summarized in Algorithm 2.
Algorithm 2 LLDDPG Algorithm
1: Initialization: $q_k$, $q_u[0]$, $q_u[N]$, $L_k[0]$, $\beta_k[0]$, $f_{u,k}[0]$, $f_k[0]$, $p_u$, $p$, $B$, $C_k$, $N_0$, $\eta$, $\psi$.
2: Solve the subproblem $\mathcal{P}_{1.1}$ and obtain the desired UAV trajectory $q^*[n]$ based on the DDPG algorithm in Algorithm 1.
3: Solve the subproblem $\mathcal{P}_{1.2}$ by using the LQR method:
4:   Derive the control coefficient $l[n]$ offline based on (35).
5:   Obtain the acceleration control strategy $a[n]$ as in (34) based on the UAV state deviations and previous control strategies.
6:   Obtain the actual UAV flight trajectory $q[n]$ from the acceleration control strategy $a[n]$.
7: Solve the subproblem $\mathcal{P}_{1.3}$ by using the Lagrange duality method:
8:   Obtain the optimal offloading ratios $\beta_k^*[n]$ and the CPU frequencies of the users $f_k^*[n]$ and the UAV $f_u^*[n]$ based on (37).
9: Feed $q[n]$, $\beta_k^*[n]$, $f_k^*[n]$, and $f_u^*[n]$ back for the parameter update of the DDPG algorithm.

5. Simulations

In this section, the performances of the proposed LLDDPG algorithm are comprehensively evaluated through simulations and numerical results. Specifically, the convergence performance of the algorithm is analyzed, and the performance comparisons with other existing works are given.

5.1. Simulation Settings

In the simulations, the number of user devices is set to 100 and the task duration is set to 10 min. The ground users are distributed in a 50 m × 50 m area. The flight height of the UAV is 10 m and the maximum flight velocity is $V_{\max} = 10$ m/s. At the beginning, the UAV starts the task at a random location. The vertical and horizontal coverage radii of the UAV are set, respectively, to $X_d = 25$ m and $X_h = 10$ m. The user’s data cache is updated in each time slot according to a Poisson process. The data buffer capacity $U_{\max}$ is set to 5000 packets, and the relevant data transfer size is $Q = 10$ Mbits. The UAV and user transmit powers are, respectively, set to $P_u = 30$ dBm and $P_k = -20$ dBm. The other corresponding system parameters are shown in Table 1, where the parameter settings refer to [38]. The structure and parameters of the DDPG network are shown in Table 2. During the implementation, the final output layer of the actor network is a tanh layer, and all hidden layers are fully connected and activated using ReLU functions.

5.2. Results and Analysis

Figure 4 shows the convergence of the reward function and the effect of different discount factors on the reward. The results reveal that the model converges relatively quickly, and the final convergence level of the reward function is comparable across the different discount factors. When the discount factor is 0.99 or 0.7, there are abnormal fluctuations in the subsequent convergence stage, indicating that the exploration of the action space is not comprehensive. Since there is no significant change in the final performance when the discount factor is 0.9, the agent is able to learn the optimal policy in that case.
Figure 5 presents the correlation between the learning rate and the loss function. Initially, the loss function of the model is highly sensitive to the learning rate. With a low learning rate, the loss function still increases only slowly even after multiple training episodes; conversely, with a higher learning rate, the loss function rises rapidly but then takes a long time to converge. To ensure a more comprehensive exploration of the agent’s action space, it is preferable to keep a smooth increase in the loss function so that it ultimately reaches the optimal value. To sum up, a moderate discount factor of 0.9 is selected to achieve the desired result.
For performance evaluation, Figure 6, Figure 7 and Figure 8 illustrate the effects of the different weight parameters, namely $\varepsilon_1$, $\varepsilon_2$, and $\varepsilon_3$, on different system performance metrics, including the transmission data rate, average energy consumption, and number of service users. The parameters for the comparison experiments are set as shown in Table 3. The horizontal coverage distance is set to 5 m, 10 m, 15 m, 20 m, and 25 m. It is observed from Figure 6 that, with the increase in the horizontal coverage distance, the transmission data rate decreases under all policies. In Figure 7, the average energy consumption under “op2” is higher than that of “op1” and “op3” because the weight of energy consumption is set to 0 in “op2”. As can be seen from Figure 8, as the horizontal coverage distance increases, the number of service users also increases. Overall, the transmission data rate and the number of service users under the “op1” policy are better than those of the other two strategies, and the energy consumption performance is slightly worse than that of the “op2” policy. This also validates that the proposed algorithm can successfully learn control strategies that simultaneously optimize multiple optimization objectives.
Figure 9 shows the relationship between the weighted total energy consumption of all users and the quantity of user tasks under scenarios with different numbers of users. It can be seen that the weighted total energy consumption increases when the amount of computing tasks becomes larger, and this can be inferred from Formulas (3) and (5). In addition, the weighted total energy consumption increases with the increase in the number of users, owing to the requirement of consuming more energy between the UAV and the users for computing and transmission purposes.
In order to investigate the effectiveness of the LLDDPG algorithm in actual UAV flight control, a numerical simulation of the scenario of UAV-assisted users in task offloading is conducted. The comparisons of the UAV trajectory, velocity, and acceleration are, respectively, presented in Figure 10, Figure 11 and Figure 12. It can be observed that the size of the offloading task for each user has a significant effect on the UAV trajectory design, and the UAV needs to approach the user with a higher task demand as quickly as possible. Initially, the global planning of the UAV trajectory is carried out based on the DDPG algorithm to ensure the performance of each user and save more energy consumption. Subsequently, the LQR algorithm is used to track the desired trajectory to mitigate the performance degradation introduced by the UAV dynamics constraints.
A rigorous trajectory analysis reveals that the initial alignment between the actual and desired trajectories is apparent. However, as the desired trajectory is formulated based on the user’s initial task size, it gradually diverges from the actual trajectory, which can be dynamically replanned in accordance with the evolving user task size and the current UAV flight state. Consequently, the deviation between the UAV’s actual and desired trajectories translates into an increasing gap. To address this, the proposed LLDDPG algorithm demonstrates its proficiency in dynamically adjusting the flight trajectory in real-time, taking into account both the user’s task requirements and the UAV’s current flight status, ultimately enhancing the overall performance.
Figure 13 depicts the performance comparisons, including energy consumption, latency, and system cost, against existing optimization algorithms [24,25]. In [24], by leveraging the TSP with neighborhood and convex optimization techniques, an SCA-based algorithm is proposed to address the UAV trajectory planning problem, while the work in [25] proposed a PLOT control algorithm to maximize the aggregate execution of local and offloading tasks.
As depicted in Figure 13, it is revealed that, with increasing task bit size, the energy consumption of the GCO algorithm escalates significantly faster than that of the others. Meanwhile, the TSP algorithm exhibits a notably higher latency compared to the other two algorithms. In contrast, the proposed LLDDPG algorithm achieves the lowest total system cost in the UAV-assisted MEC system, effectively optimizing the total weighted energy consumption and delay of the system.

6. Conclusions

This work focuses on the intricate issue of resource allocation and real-time trajectory control for a UAV-assisted MEC system operating in a partial offloading mode. Through the joint optimization of the CPU frequencies, offloading ratio, and UAV trajectory, the minimization of the weighted average energy consumption and delay is achieved. In particular, to address the trajectory planning problem, the DDPG and LQR algorithms are employed together to realize the real-time control of the actual UAV flight. For the computation resource allocation problem, which is convex, a low-complexity Lagrange duality method is proposed to derive the optimal expressions for the CPU frequencies and offloading ratio. Finally, the efficacy of the proposed LLDDPG algorithm is comprehensively evaluated through simulations and numerical results.

Author Contributions

Conceptualization, Z.W., W.Z., P.H. and C.F.; methodology, Z.W., W.Z., P.H., X.Z., L.L. and C.F.; software, Z.W., W.Z., X.Z., L.L. and Y.S.; validation, Z.W., W.Z., P.H., X.Z., L.L., C.F. and Y.S.; formal analysis, W.Z., P.H., X.Z. and C.F.; investigation, W.Z., X.Z. and C.F.; data curation, W.Z., X.Z. and Y.S.; writing—original draft preparation, Z.W., W.Z., P.H., L.L., C.F. and Y.S.; writing—review and editing, Z.W., W.Z., P.H., X.Z., L.L., C.F. and Y.S.; project administration, Z.W. and P.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by National Natural Science Foundation of China under Grant 62371014, Beijing Natural Science Foundation 4222002, Urban Carbon Neutral Science and Technology Innovation Fund Project of Beijing University of Technology 040000514122607, and Special Research Program of Academic Cooperation between Taipei University of Technology and Beijing University of Technology NTUT-BJUT-112-02.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Al-Fuqaha, A.; Guizani, M.; Mohammadi, M.; Aledhari, M.; Ayyash, M. Internet of things: A survey on enabling technologies, protocols, and applications. IEEE Commun. Surv. Tutor. Syst. 2015, 17, 2347–2376. [Google Scholar] [CrossRef]
  2. Alwarafy, A.; Al-Thelaya, K.A.; Abdallah, M.; Schneider, J.; Hamdi, M. A survey on security and privacy issues in edge-computing-assisted internet of things. IEEE Int. Things J. 2020, 8, 4004–4022. [Google Scholar] [CrossRef]
  3. Hu, Z.; Zhong, R.; Fang, C.; Liu, Y. Caching-at-STARS: The next generation edge caching. IEEE Trans. Wire. Commun. 2024. early access. [Google Scholar] [CrossRef]
  4. Fang, C.; Hu, Z.; Meng, X.; Tu, S.; Wang, Z.; Zeng, D.; Ni, W.; Guo, S.; Han, Z. Drl-driven joint task offloading and resource allocation for energy-efficient content delivery in cloud-edge cooperation networks. IEEE Trans. Veh. Technol. 2023, 72, 16195–16207. [Google Scholar] [CrossRef]
  5. Hou, Y.; Wang, C.; Zhu, M.; Xu, X.; Tao, X.; Wu, X. Joint allocation of wireless resource and computing capability in MEC-enabled vehicular network. China Commun. 2021, 18, 64–76. [Google Scholar] [CrossRef]
  6. Baccour, E.; Mhaisen, N.; Abdellatif, A.A.; Erbad, A.; Mohamed, A.; Hamdi, M.; Guizani, M. Pervasive AI for IoT applications: A survey on resource-efficient distributed artificial intelligence. IEEE Commun. Surv. Tutor. Syst. 2022, 24, 2366–2418. [Google Scholar] [CrossRef]
  7. Porambage, P.; Okwuibe, J.; Liyanage, M.; Ylianttila, M.; Taleb, T. Survey on multi-access edge computing for internet of things realization. IEEE Commun. Surv. Tutor. Syst. 2018, 20, 2961–2991. [Google Scholar] [CrossRef]
  8. Yuan, S.; Zhang, Z.; Li, Q.; Li, W.; Zhang, Y. Joint optimization of dnn partition and continuous task scheduling for digital twin-aided mec network with deep reinforcement learning. IEEE Wire. Commun. 2023, 11, 27099–27110. [Google Scholar] [CrossRef]
  9. Nadeem, L.; Azam, M.A.; Amin, Y.; Al-Ghamdi, M.A.; Chai, K.K.; Khan, M.F.N.; Khan, M.A. Integration of D2D, network slicing, and MEC in 5G cellular networks: Survey and challenges. IEEE Access 2021, 9, 37590–37612. [Google Scholar] [CrossRef]
  10. Mehrabi, M.; You, D.; Latzko, V.; Salah, H.; Reisslein, M.; Fitzek, F.H. Device-enhanced MEC: Multi-access edge computing (MEC) aided by end device computation and caching: A survey. IEEE Access 2019, 7, 166079–166108. [Google Scholar] [CrossRef]
  11. Lin, Z.; Lin, M.; De Cola, T.; Wang, J.B.; Zhu, W.P.; Cheng, J. Supporting IoT with rate-splitting multiple access in satellite and aerial-integrated networks. IEEE Int. Things J. 2021, 8, 11123–11134. [Google Scholar] [CrossRef]
  12. Zhao, C.; Liu, J.; Sheng, M.; Teng, W.; Zheng, Y.; Li, J. Multi-UAV trajectory planning for energy-efficient content coverage: A decentralized learning-based approach. IEEE J. Sel. Area Commun. 2021, 39, 3193–3207. [Google Scholar] [CrossRef]
  13. Lin, Z.; Niu, H.; An, K.; Wang, Y.; Zheng, G.; Chatzinotas, S.; Hu, Y. Refracting RIS-aided hybrid satellite-terrestrial relay networks: Joint beamforming design and optimization. IEEE Trans. Aerosp. Electron. Syst. 2022, 58, 3717–3724. [Google Scholar] [CrossRef]
  14. Niu, H.; Chu, Z.; Zhou, F.; Zhu, Z.; Zhen, L.; Wong, K.K. Robust design for intelligent reflecting surface-assisted secrecy SWIPT network. IEEE Trans. Wire. Commun. 2021, 21, 4133–4149. [Google Scholar] [CrossRef]
  15. Yang, M.; Jeon, S.W.; Kim, D.K. Optimal trajectory for curvature-constrained UAV mobile base stations. IEEE Wire. Commun. Lett. 2020, 9, 1056–1059. [Google Scholar] [CrossRef]
  16. Guo, F.; Zhang, H.; Ji, H.; Li, X.; Leung, V.C. Joint trajectory and computation offloading optimization for UAV-assisted MEC with NOMA. In Proceedings of the IEEE INFOCOM 2019—IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), Paris, France, 29 April–2 May 2019; pp. 1–6. [Google Scholar]
  17. Pham, Q.V.; Zeng, M.; Ruby, R.; Huynh-The, T.; Hwang, W.J. UAV communications for sustainable federated learning. IEEE Trans. Veh. Technol. 2021, 70, 3944–3948. [Google Scholar] [CrossRef]
  18. Qin, X.; Song, Z.; Hou, T.; Yu, W.; Wang, J.; Sun, X. Joint optimization of resource allocation, phase shift and UAV trajectory for energy-efficient RIS-assisted UAV-enabled MEC systems. IEEE Trans. Green Commun. Netw. 2023, 7, 1778–1792. [Google Scholar] [CrossRef]
  19. Wang, D.; Tian, J.; Zhang, H.; Wu, D. Task offloading and trajectory scheduling for UAV-enabled MEC networks: An optimal transport theory perspective. IEEE Wire. Commun. Lett. 2021, 11, 150–154. [Google Scholar]
  20. Wang, L.; Zhou, Q.; Shen, Y. Computation efficiency maximization for UAV-assisted relaying and MEC networks in urban environment. IEEE Trans. Green Commun. Netw. 2021, 7, 565–578. [Google Scholar] [CrossRef]
  21. Diao, X.; Guan, X.; Cai, Y. Joint offloading and trajectory optimization for complex status updates in UAV-assisted Internet of Things. IEEE Int. Things J. 2022, 9, 23881–23896. [Google Scholar] [CrossRef]
  22. Hu, Z.; Zeng, F.; Fu, B.; Jiang, H.; Chen, H. Computation efficiency maximization and QoE-provisioning in UAV-enabled MEC communication systems. IEEE Trans. Netw. Sci. Eng. 2021, 8, 1630–1645. [Google Scholar] [CrossRef]
  23. Liu, B.; Wan, Y.; Zhou, F.; Wu, Q.; Hu, R.Q. Resource allocation and trajectory design for MISO UAV-assisted MEC networks. IEEE Trans. Veh. Technol. 2022, 71, 4933–4948. [Google Scholar] [CrossRef]
  24. Zeng, Y.; Xu, J.; Zhang, R. Energy minimization for wireless communication with rotary-wing UAV. IEEE Trans. Wire. Commun. 2019, 18, 2329–2345. [Google Scholar] [CrossRef]
  25. Yang, Z.; Bi, S.; Zhang, Y.J.A. Dynamic offloading and trajectory control for UAV-enabled mobile edge computing system with energy harvesting devices. IEEE Trans. Wire. Commun. 2022, 21, 10515–10528. [Google Scholar] [CrossRef]
  26. Yan, J.; Liu, Z.; Chen, C.; Zhao, J. Dynamic tracking method for landing trajectory of power line patrol UAV Based on Chaos Genetic Algorithm. In Proceedings of the 2021 International Conference on Machine Learning and Intelligent Systems Engineering (MLISE), Chongqing, China, 9–11 July 2021; pp. 301–304. [Google Scholar]
  27. Lee, S.H.; Kang, S.H.; Kim, Y. Trajectory tracking control of quadrotor UAV. In Proceedings of the 2017 IEEE International Conference on Robotics and Biomimetics (ROBIO), Macau, Macao, 5–8 December 2017; pp. 281–285. [Google Scholar]
  28. Li, B.; Li, Q.; Zeng, Y.; Rong, Y.; Zhang, R. 3D trajectory optimization for energy-efficient UAV communication: A control design perspective. IEEE Trans. Wire. Commun. 2021, 21, 4579–4593. [Google Scholar] [CrossRef]
  29. Zhang, M.; Wu, S.; Jiao, J.; Zhang, N.; Zhang, Q. Energy-and Cost-Efficient Transmission Strategy for UAV Trajectory Tracking Control: A Deep Reinforcement Learning Approach. IEEE Int. Things J. 2022, 10, 8958–8970. [Google Scholar] [CrossRef]
  30. Liu, K.; Zheng, J. UAV trajectory optimization for time-constrained data collection in UAV-enabled environmental monitoring systems. IEEE Int. Things J. 2022, 9, 24300–24314. [Google Scholar] [CrossRef]
  31. Zhou, F.; Wu, Y.; Hu, R.Q.; Qian, Y. Computation rate maximization in UAV-enabled wireless-powered mobile-edge computing systems. IEEE J. Sel. Area Commun. 2018, 36, 1927–1941. [Google Scholar] [CrossRef]
  32. Qian, Y.; Wang, F.; Li, J.; Shi, L.; Cai, K.; Shu, F. User association and path planning for UAV-aided mobile edge computing with energy restriction. IEEE Wire. Commun. Lett. 2019, 8, 1312–1315. [Google Scholar] [CrossRef]
  33. Wang, Z.; Gao, Y.; Fang, C.; Liu, L.; Zhou, H.; Zhang, H. Optimal control design for connected cruise control with stochastic communication delays. IEEE Trans. Veh. Technol. 2020, 69, 15357–15369. [Google Scholar] [CrossRef]
  34. Zeng, Y.; Zhang, R. Energy-efficient UAV communication with trajectory optimization. IEEE Trans. Wire. Commun. 2017, 16, 3747–3760. [Google Scholar] [CrossRef]
  35. Zhang, M.; Zhang, Y.; Gao, Z.; He, X. An improved DDPG and its application based on the double-layer BP neural network. IEEE Access 2020, 8, 177734–177744. [Google Scholar] [CrossRef]
  36. Wang, Z.; Jin, S.; Liu, L.; Fang, C.; Li, M.; Guo, S. Design of intelligent connected cruise control with vehicle-to-vehicle communication delays. IEEE Trans. Veh. Technol. 2022, 71, 9011–9025. [Google Scholar] [CrossRef]
  37. Zhou, F.; Wu, Y.; Sun, H.; Chu, Z. UAV-enabled mobile edge computing: Offloading optimization and trajectory design. In Proceedings of the 2018 IEEE International Conference on Communications (ICC), Kansas City, MO, USA, 20–24 May 2018; pp. 1–6. [Google Scholar]
  38. Yu, Y.; Tang, J.; Huang, J.; Zhang, X.; So, D.K.C.; Wong, K.K. Multi-objective optimization for UAV-assisted wireless powered IoT networks based on extended DDPG algorithm. IEEE Trans. Commun. 2021, 69, 6361–6374. [Google Scholar] [CrossRef]
Figure 1. A UAV-assisted MEC system.
Figure 2. Schematic of DDPG algorithm.
Figure 3. Schematic of the proposed LLDDPG algorithm.
Figure 4. Reward of different discount factors.
Figure 5. Loss of different discount factors.
Figure 6. Transmission data rate comparison under different weight parameters.
Figure 7. Average energy consumption comparison under different weight parameters.
Figure 8. Number of service users comparison under different weight parameters.
Figure 9. The relationship between weighted total energy consumption and tasks under different numbers of users.
Figure 10. Comparisons between UAV desired and actual trajectories (actual: purple, desired: red).
Figure 11. Comparisons of UAV velocity (actual: purple, desired: red).
Figure 12. Comparisons of UAV acceleration (actual: purple, desired: red).
Figure 13. Comparison with other algorithms.
Table 1. Simulation parameter settings.

Symbol | Description | Setting
H | UAV flight altitude | 10 m
V_max | Maximum flight velocity | 10 m/s
B | Bandwidth | 1 MHz
β_0 | Channel power gain | −30 dB
N_0 | Noise power | −90 dBm
a, b | LoS probability | 10, 0.6
C_k | Number of CPU cycles per bit | 1000 cycles/bit
ψ | Effective capacitance coefficient of UAV | 10^−28
η | Effective capacitance coefficient of users | 10^−28
τ | Time slot | 0.1 s
Table 2. Parameters of DDPG training.

Parameter | Value
Learning rate of critic | 0.001
Learning rate of actor | 0.001
Update rate | 0.001
Discount factor | 0.9
Batch size | 64
Table 3. Comparison of experiment parameters.

Name | Parameters
op1 | ε_1 = 0.001, ε_2 = 0.01, ε_3 = 1
op2 | ε_1 = 0, ε_2 = 0, ε_3 = 1
op3 | ε_1 = 0, ε_2 = 1, ε_3 = 0
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
