Article

Joint Optimization of Task Completion Time and Energy Consumption in UAV-Enabled Mobile Edge Computing

National Key Laboratory of Information Systems Engineering, National University of Defense Technology, Changsha 410073, China
*
Author to whom correspondence should be addressed.
Drones 2025, 9(4), 274; https://doi.org/10.3390/drones9040274
Submission received: 24 February 2025 / Revised: 17 March 2025 / Accepted: 1 April 2025 / Published: 3 April 2025

Abstract:
Unmanned Aerial Vehicles (UAVs) hold great promise for Mobile Edge Computing (MEC) owing to their flexible mobility, rapid deployment, and low-cost characteristics. However, UAV-enabled MEC still faces challenges in terms of the real-time arrival of computational tasks, energy reservation, and the actual response efficiency of the system. In this study, we focus on a UAV-enabled MEC scenario, where multiple UAVs function as airborne edge servers, offering computation services to multiple ground-based user devices (UDs). We aim to minimize the cost of the MEC system by optimizing the computation offloading policy. Specifically, we take task latency into account to ensure the timeliness of real-time tasks. The Lyapunov optimization method is employed to maintain a uniform and stable queue for energy consumption. Additionally, we draw on the concept of maximum completion time in shop-floor scheduling to optimize the actual response latency. To this end, we propose a joint optimization algorithm. First, the joint optimization problem is transformed into a per-time-slot real-time optimization problem (PROP) using the Lyapunov optimization framework. Then, a reinforcement learning method, LyraRD, is proposed to solve the PROP. Experimental results verify that the proposed approach outperforms the benchmarks in terms of system performance.

1. Introduction

With the widespread adoption of the Internet of Things (IoT) and wireless communication technologies, numerous intelligent applications with strict computational latency requirements have emerged, such as real-time video, autonomous driving, and online gaming [1]. Nevertheless, the terminal devices for these applications are limited in computing power and battery capacity, making it arduous to meet the low-latency demands. Mobile Edge Computing (MEC) is regarded as a promising solution due to its real-time computing and energy-saving capabilities [2,3,4]. Based on MEC, user devices can delegate computationally intensive tasks to edge servers instead of relying on remote data centers. This not only compensates for the computing power shortage of user devices but also reduces the response time.
Initially, most MEC servers were deployed on the ground [5,6,7]. In fact, these ground-based MEC servers have provided convenient services to users. However, they typically cover only a portion of the user population [2,8,9]. Fortunately, the rapid development of Unmanned Aerial Vehicles (UAVs) offers new opportunities to address the above-mentioned issues. UAVs can carry computing devices and hover in the air, enabling MEC servers to be deployed flexibly to meet user needs. As depicted in Figure 1, two UAVs hover in the air with different levels of battery capacity, computing resources, and bandwidth resources. We use a three-tuple to represent the different normalized resource values of the UAVs, i.e., $UAV_1: (20, 2, 80\%)$ and $UAV_2: (20, 1.5, 60\%)$. Two user devices, $UD_1$ and $UD_2$, each with unit computing resources, generate computation tasks at 5-second intervals in different time slots. $UD_1$ generates tasks $\{a_1=5, a_2=3\}$, $\{a_3=6, a_4=4\}$, $\{a_5=2, a_6=3\}$, and $UD_2$ generates tasks $\{b_1=3, b_2=1\}$, $\{b_3=5, b_4=4\}$, and $\{b_5=6, b_6=3\}$, where the values represent the units of computing resources required in the three slots, respectively. Without loss of generality, we assume that a UAV consumes one unit of energy for executing one unit of a computation task.
Figure 2 presents two solutions for offloading the computing tasks. In Figure 2a, $UD_1$ offloads tasks $\{a_1, a_3\}$ to $UAV_1$ and $\{a_4, a_6\}$ to $UAV_2$, while $UD_2$ offloads tasks $\{b_4, b_5\}$ to $UAV_1$ and $\{b_1\}$ to $UAV_2$. In the first slot (i.e., $T=1$), task $b_1$ is executed on $UAV_2$, taking $3/1.5 = 2$ seconds and $3/1 = 3$ units of energy; task $a_1$ is executed on $UAV_1$, taking $5/2 = 2.5$ seconds and $5/1 = 5$ units of energy; task $a_2$ is executed on $UD_1$, taking $3/1 = 3$ seconds; and task $b_2$ is executed on $UD_2$, taking $1/1 = 1$ second. Then, the makespan of all tasks of $UD_1$ is $\max(2.5, 3) = 3$ seconds, while the makespan of all tasks of $UD_2$ is $\max(2, 1) = 2$ seconds. Similarly, we can easily obtain the energy cost and makespan in the next two slots (see the detailed values in Figure 2a). Figure 2b presents the other solution. We can easily find that the total percentages of energy use of $UAV_1$ and $UAV_2$ in Figure 2a are $(5+10+6)/20 = 105\%$ and $(3+4+3)/20 = 50\%$, while in Figure 2b they are $(5+8+6)/20 = 95\%$ and $(3+6+3)/20 = 60\%$. The average makespan in Figure 2a is $(3+2+3+5+2+3)/6 = 3$, and in Figure 2b it is $(3+4+2+2+5+3)/6 \approx 3.17$. Considering that the scenario in Figure 1 is online, meaning that users constantly generate tasks, to ensure that users have a good experience in offloading tasks to the UAVs, the system should offer rapidity, finishing the tasks in each slot as soon as possible (i.e., minimizing the average makespan), and durability, serving as long as possible (i.e., minimizing the maximum energy cost).
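The arithmetic in this example can be reproduced with a few lines of Python. This is only a sketch of the toy example's assumptions (resource values from Figure 1), not part of the paper's algorithm:

```python
# Toy re-computation of the first-slot numbers from the running example.
# Resources: UAV1 has 2 compute units, UAV2 has 1.5; each UD has 1 unit.
# One unit of energy is spent per unit of computation offloaded to a UAV.

def exec_time(task_units, compute_units):
    """Seconds to finish a task on a device with the given compute rate."""
    return task_units / compute_units

# Slot T=1 of solution (a): a1 -> UAV1, b1 -> UAV2, a2 and b2 stay local.
t_a1 = exec_time(5, 2.0)    # 2.5 s on UAV1, costing 5 units of energy
t_b1 = exec_time(3, 1.5)    # 2.0 s on UAV2, costing 3 units of energy
t_a2 = exec_time(3, 1.0)    # 3.0 s locally on UD1
t_b2 = exec_time(1, 1.0)    # 1.0 s locally on UD2

# Makespan per UD: the latest of its parallel completion times.
makespan_ud1 = max(t_a1, t_a2)  # 3.0 s
makespan_ud2 = max(t_b1, t_b2)  # 2.0 s
```

The same per-slot computation, repeated over all three slots, yields the energy percentages and average makespans quoted above.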
In this paper, we are committed to exploring the optimization of both efficiency and durability to identify an optimal task offloading solution in UAV-enabled MEC. However, solving this problem poses three major challenges:
(i) The existing research [10,11,12] predominantly relies on offline task scheduling or prior knowledge of tasks. In contrast, practical applications such as autonomous driving and real-time video analytics typically involve the arrival of stochastic real-time task data. This implies that the current decision making must consider the potential influence of future tasks.
(ii) Most research [4,7,13] efforts concentrate on keeping energy consumption below a certain limit or minimizing the overall energy consumption throughout the process. However, effective energy management should not only aim at minimization but also strive for balance. This is crucial to ensure that UAVs equipped with high computational resources can operate for extended durations.
(iii) It is commonly assumed that task offloading occurs at the beginning of each time slot [3,5,6]. Given that transmitting the task to the UAV will inevitably consume the available time within the time slot, careful bandwidth allocation is essential.
To address these challenges, we model the joint optimization problem of task scheduling in an online fashion instead of an offline one. This approach optimizes the average values across all time slots. Next, to ensure the stable operation of the MEC system with randomly arriving tasks, we utilize Lyapunov optimization theory to introduce a virtual queue that can evenly distribute energy consumption. To solve the online model more accurately, we further decompose the multi-stage stochastic problem into a series of deterministic problems, which can be effectively solved by reinforcement learning. The main contributions of this study are as follows:
  • We present a novel online optimization framework to help users complete tasks quickly and provide them with more durable services. The framework integrates Lyapunov optimization and Mixed Integer Nonlinear Programming (MINLP), which dynamically adapts to real-time task arrivals while jointly optimizing task response time (by minimizing makespan) and energy consumption. Through Lyapunov optimization, we ensure a uniform energy distribution across the UAV, thereby effectively mitigating energy consumption fluctuations caused by bursty tasks while responding in a timely manner in the absence of a priori knowledge of the data.
  • We introduce Lyapunov virtual energy queues, decompose the problem into a real-time optimization problem (PROP) for each time slot via Lyapunov drift, and design the reinforcement learning-based algorithm LyraRD. LyraRD uniquely combines Lyapunov-induced stability constraints with adaptive decision making to reduce the computational complexity to millisecond response times.
  • We investigate the performance of our method under different parameter settings. With extensive experiments, we demonstrate that our method performs better than the benchmarks.
The rest of the paper is organized as follows. Section 2 summarizes the related work. Section 3 introduces the related system model and problem formulation in detail. Then, we describe the Lyapunov-based problem transformation in detail in Section 4. In Section 5, we introduce the two-stage optimization algorithm and theoretical analysis. Our evaluation method and experimental results are shown and analyzed in Section 6. Finally, this paper is concluded in Section 7.

2. Related Work

Due to their high mobility and flexibility, UAV-enabled MEC systems have attracted widespread attention in recent years.

2.1. Real-Time Task Arrival

Most studies have based their strategies on user task requirements given as a priori knowledge [10,11]. However, in most MEC applications, such as autonomous driving and online games, computational tasks arrive randomly in real time [12], and a priori knowledge of the tasks is unavailable. Therefore, considering the random arrival of task data, Wan et al. [14] developed a path planning algorithm based on deep reinforcement learning by discretizing the action space into a finite set, including eight steering directions and hovering modes. Yang et al. [13] considered the stochastic nature of user tasks and developed a perturbation-based online algorithm built on Lyapunov optimization to jointly optimize the energy consumption and task processing rate of UAVs while maintaining the stability of long-term data queues. Considering user mobility and random task arrivals, Yang et al. [15] proposed an online control algorithm for UAV trajectory and resource scheduling that minimizes UAV energy consumption under a given energy budget by converting the long-term optimization problem into a real-time control problem. Yan et al. [16] proposed an artificial-noise-based UAV-assisted non-orthogonal multiple access (NOMA) secure communication scheme, whose channel dynamics analysis and resource allocation mechanism provide a theoretical basis for future extensions of security enhancement strategies. Ali et al. [17] proposed a novel multi-objective adaptive learning framework called MOALF-UAV-MEC, tailored for dynamic IoT environments, which provides a scalable and adaptive solution for deploying computational resources in infrastructure-limited regions, ad hoc events, or emergencies. However, these works simply add the time delays linearly to form the completion delay of a task, which is biased compared to the actual response delay of the task.
For this reason, in this paper, we introduce the maximum completion delay (i.e., makespan) in the scheduling problem to represent the actual response delay of the task and improve the response efficiency of MEC by reducing it.
Meanwhile, most of the existing studies consider that the UAV computes the offloaded tasks at the initial moment of each time slot. Zeng et al. [11] optimized resource allocation and UAV trajectory to reduce the waiting latency of the task queue. Sheng et al. [18] considered the simultaneous occurrence of the time-varying/random ground channel and the air-to-ground line-of-sight channel, and optimized the UAV trajectory and the task offload strategy to ensure the robustness of task processing and, at the same time, minimize energy consumption.
All of the above studies theoretically assume that the computational delay of the tasks offloaded to the UAV is less than or equal to the time slot length. However, when the computational delay of the offloaded tasks is close to the time slot length, the task transmission delay may leave part of a task unfinished within its slot (e.g., task $b_2$ on $UAV_1$ at $T=2$ in Figure 2). Ignoring this remainder would amount to dropping part of the task, so this paper defers the remaining work to the next time slot. This deferral introduces an additional delay, meaning the UAV does not start computing at the initial moment of the next slot; this effect must be taken into account when calculating the task response delay.

2.2. UAV Energy Management

Traditional UAV energy control mainly considers the available energy of a given UAV and ensures that the consumption of the entire process stays below it [11,19,20]. This approach is mostly applicable to task scheduling problems with given a priori knowledge, where the allocation of energy consumption over the whole MEC process is adjusted according to the number of tasks offloaded in each time slot. New studies on UAV energy management and control are also underway. Michailidis et al. [21] maximize the minimum secure computational efficiency (SCE) in a UAV-RIS-assisted MEC-IoT network in the presence of airborne and ground eavesdroppers by jointly optimizing the transmission power allocation, time-slot scheduling, task assignment, and phase shifts of the RIS, ensuring that the system's energy consumption and security performance are optimized. Lakew et al. [22] proposed an LEO-satellite-assisted MEC model in which UAVs acquire a series of energy packets at the beginning of each time slot via energy harvesting (EH), and the harvesting and consumption of the EH-UAVs are managed efficiently to optimize overall system performance and the mission accomplishment rate. However, when a priori knowledge about the tasks is lacking, i.e., when tasks arrive randomly, this energy management approach leads to an uneven distribution of energy consumption and thus insufficient reserves, which reduces computational efficiency in the face of unexpected large-scale tasks and increases the response latency of the whole MEC system. For this reason, Lyapunov optimization methods are commonly used to stabilize the energy use of each time slot.
Wang et al. [23] considered the joint optimization of UAV trajectories and offloading decisions and used Lyapunov optimization to satisfy the long-term energy constraints of UAVs while distributing energy evenly, minimizing the linear sum of service latencies across time slots. Qin et al. [24] considered task caching policies, UAV trajectories, and resource allocation policies to reduce the same linear sum of service delays.
In recent years, reinforcement learning has shown significant potential in UAV energy optimization. For example, Li et al. [25] proposed a ground–air cooperative communication framework based on Evolutionary Multi-objective Deep Reinforcement Learning (EMODRL), which balances the multidimensional objectives of latency, energy consumption, and coverage through Pareto frontier search. Prakhar et al. [26] presented the SFRL method for distributed ML model training in wireless UAV-assisted MEC networks; compared to existing distributed learning algorithms, it yields comparably high test accuracy while consuming less energy. And Wu et al. [27] designed an evolutionary reinforcement learning framework for incomplete-information scenarios that improves strategy robustness through action sequence search; its evolutionary exploration mechanism can serve as a reference for improving strategy generalizability under stochastic task arrivals.

2.3. Differences of Our Work

Sequential Task Execution with Dynamic Deferral. Unlike existing approaches that either ignore deferred tasks or assume instantaneous computation starts at slot boundaries, we explicitly model the inter-slot dependency of task execution. By deferring unfinished tasks to subsequent slots and accounting for their cascading delays, our framework minimizes the actual response latency (via makespan optimization) while maintaining queue stability. This contrasts with methods like  [13,15], which linearly aggregate delays and fail to capture the nonlinear impact of deferred computations.
Non-Uniform Energy Balancing via Lyapunov-Guided RL. While previous studies address UAV task offloading with Lyapunov optimization [23,24] to stabilize energy queues, they do not integrate reinforcement learning (RL) to adaptively balance energy consumption across heterogeneous UAVs. Our LyraRD algorithm uniquely combines Lyapunov optimization with RL to dynamically adjust offloading decisions, ensuring that UAVs with higher resource capacities sustain longer operational durations. This prevents energy “hotspots” and improves the resilience of the system, addressing a critical gap in works such as [17,25], which treat energy allocation statically.
Inspired by these gaps, we consider a UAV-enabled MEC system with randomly arriving tasks. Task timeliness is ensured by deferring unprocessed tasks from the previous moment to the next moment, while UAV energy usage is controlled throughout the process to minimize the UAV energy consumption and actual response time of the system.

3. System Model and Problem Formulation

In this section, we first introduce a UAV-enabled MEC framework that is designed to offload computational tasks from UDs. Next, we model the latency and energy consumption of task offloading, transmission, and computation. We then present a joint optimization problem with the goal of minimizing the weighted sum of the actual task response latency and energy usage of UAVs across the MEC system.

3.1. System Overview

As illustrated in Figure 1, a UAV-enabled MEC system consists of $I$ UDs and $J$ rotary-wing UAVs, with sets $\mathcal{I} = \{1, 2, \ldots, I\}$ and $\mathcal{J} = \{1, 2, \ldots, J\}$. The system timeline is discretized into $T$ equal time slots [28], i.e., $t \in \mathcal{T} = \{1, 2, \ldots, T\}$, where the duration of each time slot is denoted as $T_s$. Each UAV acts as an aerial edge server of the MEC system, providing computation offloading services for UDs in each time slot.
UD Model. We assume that several computation tasks arrive at each UD in each time slot [29]. For $UD_i$ ($i \in \mathcal{I}$), the properties of this UD in time slot $t$ can be expressed as $M_i(t) = (f_i, \Phi_i(t), P_i(t))$, where $f_i$ represents the local computation capability of $UD_i$, and $P_i(t)$ represents the coordinates of $UD_i$. The set of tasks reaching $UD_i$ at time $t$ is $\Phi_i(t)$, which consists of $k$ tasks: $\Phi_i(t) = (\alpha_i^1(t), \alpha_i^2(t), \alpha_i^3(t), \ldots, \alpha_i^k(t))$. Each task is expressed as $\alpha_i^k(t) = \{D_i^k(t), \eta_i^k(t)\}$, where $D_i^k(t)$ and $\eta_i^k(t)$ represent the amount of data and the computing intensity of task $\alpha_i^k(t)$ generated at time $t$.
UAV Model. The attributes of $UAV_j$ ($j \in \mathcal{J}$) are represented as $U_j(t) = (r_j, \omega_j, P_j(t), H)$, where $r_j$ and $\omega_j$ represent the proportion of computing resources and bandwidth, respectively, $P_j(t)$ represents the coordinates of $UAV_j$ at time $t$, and $H$ represents the flight altitude [30].
Decision Variables. The decision variables include the task offloading strategy and the UAV resource allocation strategy. (i) Task Offloading Decision. For task set $\Phi_i(t)$, we define a binary variable $a_{i,j}^k(t)$ to represent the offloading decision of $UD_i$ in time slot $t$, where $a_{i,j}^k(t) = 0$ means that the $k$-th task of $UD_i$ in time slot $t$ is processed locally, and $a_{i,j}^k(t) = 1$ means that it is offloaded to $UAV_j$ for processing. (ii) Resource Allocation Decision. For the UAVs, the computing resources and bandwidth resources allocated to task $k$ offloaded at time $t$ are $r_j^{i,k}(t)$ and $\omega_j^{i,k}(t)$, respectively.
These factors affect each other, so joint decision making is required.

3.2. Communication Model

The probabilistic line-of-sight (LoS) channel model is used to simulate the communication between $UD_i$ and $UAV_j$ [19]. First, the probabilities of LoS and NLoS communication between $UD_i$ and $UAV_j$ in time slot $t$ are given by (1) and (2) [31]:

$$P_{i,j}^{\mathrm{los}}(t) = \frac{1}{1 + \xi_1 \exp\left(-\xi_2 \left(\theta_{i,j}(t) - \xi_1\right)\right)} \tag{1}$$

and

$$\bar{P}_{i,j}^{\mathrm{los}}(t) = P_{i,j}^{\mathrm{los}}(t) + \left(1 - P_{i,j}^{\mathrm{los}}(t)\right) \kappa \tag{2}$$

where $\xi_1$ and $\xi_2$ are constants depending on the propagation environment, $\theta_{i,j}(t) = \frac{180}{\pi} \arcsin\left(\frac{H}{d_{i,j}(t)}\right)$ represents the elevation angle, $d_{i,j}(t)$ represents the straight-line distance between $UD_i$ and $UAV_j$, and $\kappa$ represents the additional attenuation factor.
According to [32], the communication rate of $UD_i$ in time slot $t$ can be expressed as Equation (3):

$$R_{i,j}^k(t) = \omega_j^{i,k}(t) \, \varphi_{i,j}(t) \tag{3}$$

where $\varphi_{i,j}(t)$ is the spectrum efficiency of $UD_i$, which can be calculated via (4) and (5):

$$\varphi_{i,j}(t) = \log_2 \left(1 + \frac{\theta_i(t)}{\left(\|P_j(t) - P_i(t)\|^2 + H^2\right)^{\mu}}\right) \tag{4}$$

$$\theta_i(t) = \frac{P_i \beta_0 \bar{P}_{i,j}^{\mathrm{los}}(t)}{N_0}, \qquad \mu = \frac{\bar{\mu}}{2} \tag{5}$$

where $\beta_0$ represents the channel gain at the reference distance of 1 m, $N_0$ represents the noise power, $P_i$ is the transmission power of $UD_i$, and $\bar{\mu}$ represents the path loss exponent.
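As an illustration, the LoS model of (1) and (2) maps directly to code. The sketch below is illustrative only; the values of $\xi_1$, $\xi_2$, and $\kappa$ are placeholder environment constants, not parameters used in this paper:

```python
import math

def los_probability(h, d, xi1=11.95, xi2=0.14):
    """Eq. (1): probability of a LoS link between a UD and a UAV.
    xi1, xi2 are environment-dependent constants (illustrative values)."""
    theta = math.degrees(math.asin(h / d))  # elevation angle theta_{i,j}(t)
    return 1.0 / (1.0 + xi1 * math.exp(-xi2 * (theta - xi1)))

def effective_los(p_los, kappa=0.2):
    """Eq. (2): LoS probability blended with the NLoS attenuation kappa."""
    return p_los + (1.0 - p_los) * kappa
```

As expected from the model, a steeper elevation angle (the UD closer to being directly under the UAV) yields a higher LoS probability.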

3.3. Computation Model

Local Computation. The local completion delay of the task $\alpha_i^k(t)$ arriving at $UD_i$ in time slot $t$ can be calculated as Equation (6):

$$T_i^k(t)^{loc} = \frac{\eta_i^k(t) D_i^k(t)}{f_i} \tag{6}$$

Correspondingly, its energy consumption can be calculated as (7) [33]:

$$E_i^k(t)^{loc} = k_0 (f_i)^3 \, T_i^k(t)^{loc} \tag{7}$$

where $k_0$ represents the hardware architecture parameter.
Edge Computation. The transmission delay of the task $\alpha_i^k(t)$ arriving at $UD_i$ in time slot $t$ and offloaded to $UAV_j$ can be calculated as Equation (8):

$$T_{i,j}^k(t)^P = \frac{D_i^k(t)}{R_{i,j}^k(t)} \tag{8}$$

And the edge computing delay of this process can be calculated as Equation (9):

$$T_{i,j}^k(t)^C = \frac{\eta_i^k(t) D_i^k(t)}{r_j^{i,k}(t)} \tag{9}$$

Accordingly, the energy consumption of transmission can be expressed as (10):

$$E_{i,j}^k(t)^P = P_i \, \frac{D_i^k(t)}{R_{i,j}^k(t)} \tag{10}$$

where $P_i$ represents the transmission power. The energy consumption of edge computing can be expressed as Equation (11):

$$E_{i,j}^k(t)^C = \varpi \, \eta_i^k(t) D_i^k(t) \tag{11}$$

where $\varpi$ represents the energy consumption of the UAV per unit CPU cycle.
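The delay and energy formulas (6)–(11) are simple enough to restate as code. This sketch uses our own argument names (`D` for data size, `eta` for computing intensity) and is only a restatement of the equations, not an implementation from the paper:

```python
def local_delay(D, eta, f):
    """Eq. (6): local completion delay; eta*D CPU cycles at frequency f."""
    return eta * D / f

def local_energy(D, eta, f, k0):
    """Eq. (7): local energy, k0 * f^3 * delay."""
    return k0 * f**3 * local_delay(D, eta, f)

def edge_delays(D, eta, rate, r):
    """Eqs. (8)-(9): (transmission delay, edge computing delay)."""
    return D / rate, eta * D / r

def edge_energies(D, eta, rate, P_tx, varpi):
    """Eqs. (10)-(11): (transmission energy, edge computing energy)."""
    return P_tx * D / rate, varpi * eta * D
```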

3.4. Cost Model

UD Cost. Similar to [34,35], from the user’s perspective, in each time period, the task completion delay T i ( t ) and UD energy consumption E i ( t ) are considered the cost of UD, which reflects the quality of experience of the UD.
In fact, the task computations across the UDs and UAVs proceed in parallel within a given period, so simply optimizing the linear sum of all delays does little to reduce the delay of the entire MEC process. Therefore, this paper borrows the concept of minimizing the maximum completion time (makespan) from the flexible job-shop scheduling problem.
$T_i(t)$ can be formulated as (12):

$$T_i(t) = \min\left\{\max\left\{T_i^{loc}(t), T_i^{mec}(t)\right\}, T_s\right\} \tag{12}$$

where

$$T_i^{loc}(t) = \sum_{k=1}^{K} \left(1 - a_{i,j}^k(t)\right) T_i^k(t)^{loc} \tag{13}$$

and

$$T_i^{mec}(t) = \max_{j \in \mathcal{J}} \left\{ \Delta t_j + \sum_{i=1}^{I} \sum_{k=1}^{K} a_{i,j}^k(t) \, T_{i,j}^k(t)^C \right\} \tag{14}$$

$T_i^{loc}(t)$ represents the total latency of all local computing tasks of $UD_i$. $T_i^{mec}(t)$ represents the edge task completion delay of $UD_i$, that is, the time from the start of the time slot to the completion of the last task of $UD_i$ computed by any UAV (the makespan), where $\Delta t_j$ is the deferral offset carried over from the previous slot. The maximum of $T_i^{mec}(t)$ and $T_i^{loc}(t)$ is the task response delay of time slot $t$. Since the task response delay cannot exceed the time slot length, the smaller of this value and $T_s$ is taken.
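A minimal sketch of the per-slot response delay (12)–(14); the local execution times, per-UAV finish times, and slot length below are hypothetical inputs:

```python
def response_delay(local_times, uav_finish_times, slot_len):
    """Eq. (12): min of the slot length and the max of the local sum
    (Eq. (13)) and the UAV-side makespan (Eq. (14)).
    uav_finish_times are measured from the slot start and may already
    include a deferral offset carried over from the previous slot."""
    t_loc = sum(local_times)                    # sequential local tasks
    t_mec = max(uav_finish_times, default=0.0)  # last finish over all UAVs
    return min(max(t_loc, t_mec), slot_len)
```

For the Figure 2 example, `response_delay([3.0], [2.5, 2.0], 5.0)` recovers the 3-second makespan of the first slot.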
$E_i(t)$ can be presented as (15):

$$E_i(t) = \sum_{k=1}^{K} \left(1 - a_{i,j}^k(t)\right) E_i^k(t)^{loc} + \sum_{k=1}^{K} a_{i,j}^k(t) \, E_{i,j}^k(t)^P \tag{15}$$

Then, the cost of $UD_i$ at time slot $t$ can be given as (16) [30,35]:

$$C_i(t) = \gamma_i T_i(t) + (1 - \gamma_i) E_i(t) \tag{16}$$

The weights $\gamma_i$ on $T_i(t)$ and $(1 - \gamma_i)$ on $E_i(t)$ can be set flexibly according to the UD's preference between latency and energy consumption.
UAV Energy Cost. The cost $E_j(t)$ of $UAV_j$ in time slot $t$ includes the computing energy consumption and the movement energy consumption (17):

$$E_j(t) = E_j^C(t) + E_j^P(t) \tag{17}$$

where the computing energy consumption can be represented as (18):

$$E_j^C(t) = \sum_{i=1}^{I} \sum_{k=1}^{K} a_{i,j}^k(t) \, E_{i,j}^k(t)^C \tag{18}$$

and the movement energy consumption can be represented as (19) [36]:

$$E_j^P(t) = \frac{W_j}{2} \left(v_j(t)\right)^2 T^P \tag{19}$$

where $W_j$ is the weight of $UAV_j$, $v_j(t)$ is its flight speed, and $T^P$ is the travel time. Due to the randomness and uncertainty of task arrivals, we assume that each UAV travels along a fixed regular route, so this part of the energy consumption is treated as fixed, as in [21,22].

3.5. Data Queue Model

$A_i(t)$ represents the task data arriving at $UD_i$ at time $t$; in practice, the arrivals are random. Let $Q_i(t)$ denote the queue length of the $i$-th UD at the beginning of the time frame. Then, the queue dynamics can be modeled as

$$Q_i(t+1) = \max\left\{Q_i(t) - \bar{D}_i(t) + A_i(t), \, 0\right\}, \quad \forall i \in \mathcal{I} \tag{20}$$

where $\bar{D}_i(t) = \min(Q_i(t), D_i(t))$, $D_i(t)$ represents the amount of task data processed at time $t$, and $Q_i(1) = 0$. In this paper, we consider infinite queuing capacity for analytical tractability and enforce $Q_i(t) \geq 0$ at all times. The queue dynamics then simplify to

$$Q_i(t+1) = Q_i(t) - D_i(t) + A_i(t), \quad \forall i \in \mathcal{I} \tag{21}$$
Definition 1.
A discrete-time queue $Q_i(t)$ is strongly stable if the time-average queue length satisfies $\lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} \mathbb{E}[Q_i(t)] < \infty$, where the expectation is taken with respect to the random system events [30], i.e., the task data arrivals in this paper.
By Little's law, the average delay is proportional to the average queue length. Therefore, a strongly stable data queue translates into a bounded processing delay for each bit of task data.
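The queue recursion (20) and the stability criterion of Definition 1 can be checked numerically. The sketch below is illustrative only, with our own names for the per-slot serving capacity:

```python
def queue_update(q, arrivals, capacity):
    """Eq. (20): backlog recursion. At most min(q, capacity) data is
    served in the slot, and the queue never goes negative."""
    served = min(q, capacity)
    return max(q - served + arrivals, 0)

def time_avg_backlog(backlogs):
    """Empirical time-average of E[Q] used in the strong-stability test."""
    return sum(backlogs) / len(backlogs)
```

If `time_avg_backlog` stays bounded as the horizon grows, the queue is (empirically) strongly stable, and by Little's law the per-bit delay is bounded as well.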

3.6. Problem Formulation

Based on the above model, we minimize the time-averaged cost of the entire MEC system by jointly optimizing offloading decisions and resource allocation. The joint optimization problem can be expressed as (22) [30]:

$$\mathbf{P}: \quad \min_{A, F, W} \; \frac{1}{T} \sum_{t=1}^{T} \sum_{i=1}^{I} C_i(t) \tag{22}$$

$$\text{s.t.} \quad \lim_{T \to +\infty} \frac{1}{T} \sum_{t=1}^{T} \mathbb{E}\{E_j(t)\} \leq \bar{E}_j \tag{23}$$

$$a_{i,j}^k(t) \in \{0, 1\}, \quad \forall i \in \mathcal{I}, k \in \mathcal{K}, j \in \mathcal{J}, t \in \mathcal{T} \tag{24}$$

$$0 \leq r_j^{i,k}(t) \leq r_j, \quad \forall j \in \mathcal{J}, t \in \mathcal{T} \tag{25}$$

$$\sum_{i=1}^{I} \sum_{k=1}^{K} a_{i,j}^k(t) \, r_j^{i,k}(t) \leq r_j, \quad \forall i \in \mathcal{I}, k \in \mathcal{K}, j \in \mathcal{J}, t \in \mathcal{T} \tag{26}$$

$$\lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} \mathbb{E}[Q_i(t)] < \infty, \quad \forall i \in \mathcal{I} \tag{27}$$

where $\bar{E}_j$ is the long-term energy budget of $UAV_j$. Constraint (23) ensures that the time-averaged energy consumption of each UAV does not exceed this budget. Constraint (24) restricts each offloading decision to be binary, so that every computing task arriving at a UD is either processed locally or offloaded. (25) and (26) are the UAV resource constraints. (27) is the data queue stability constraint, which guarantees that all arriving tasks are eventually processed.
The solution to problem P depends on long-term information, such as the task arrivals at different times. Moreover, problem P contains two coupled decision variables, task offloading and resource allocation, and is a Mixed Integer Nonlinear Programming (MINLP) problem. Even if future information were known, solving it directly would be challenging.

4. Lyapunov-Based Decoupling of the Multi-Slot MINLP

In this section, we apply Lyapunov optimization to decouple problem P into per-slot deterministic problems. Since problem P depends on future information, we use Lyapunov optimization to decompose the long-term constraint (i.e., the long-term energy constraint (23) of the UAVs) into each time slot.
First, in order to satisfy constraint (23), a virtual energy queue is introduced: the UAV computing-energy queue $Y_j^c(t)$. The queue is initialized to zero, $Y_j^c(1) = 0$, and is updated as (28):

$$Y_j^c(t+1) = \max\left\{Y_j^c(t) + E_j^c(t) - \bar{E}_j^c, \, 0\right\}, \quad \forall t \in \mathcal{T} \tag{28}$$

where $\bar{E}_j^c$ represents the computational energy budget of $UAV_j$ in each time slot. The backlog state of the energy queue is represented by the vector $\Theta_j(t) = \{Y_j^c(t)\}$, and a scalar measure of the backlog is given by the Lyapunov function (29):

$$L(\Theta_j(t)) = \frac{1}{2}\left(Y_j^c(t)\right)^2 \tag{29}$$
Second, we define the Lyapunov drift, which measures the increment of the energy queue backlog from time slot $t$ to time slot $t+1$ (30):

$$\delta_{\Theta_j}(t) = L(\Theta_j(t+1)) - L(\Theta_j(t)) \tag{30}$$

The drift-plus-penalty expression for problem P can be given as (31):

$$\mathbf{P1}: \quad \min_{A, F, W} \; \delta_{\Theta_j}(t) + V \sum_{i=1}^{I} C_i(t) \quad \text{s.t.} \; (24), (25), (26), (27) \tag{31}$$

where $V$ is the parameter that weighs the total cost against the stability of the queue. The drift-plus-penalty term of problem P can then be bounded as follows:

$$\delta_{\Theta_j}(t) + V P(t) = \frac{1}{2} \sum_{j=1}^{J} \left( \left(Y_j(t+1)\right)^2 - \left(Y_j(t)\right)^2 \right) + V P(t) \tag{32}$$

$$\leq \frac{1}{2} \sum_{j=1}^{J} \left(y_j(t)\right)^2 + \sum_{j=1}^{J} Y_j(t) \, y_j(t) + V P(t) \tag{33}$$

$$\leq B_{max} + \sum_{j=1}^{J} Y_j(t) \, y_j(t) + V P(t) \tag{34}$$

where $Y_j(t)$ denotes the energy queue, $y_j(t) = E_j^c(t) - \bar{E}_j^c$ is its per-slot increment, $P(t)$ denotes the penalty (objective) term, and $B_{max}$ is a positive constant that upper-bounds $\frac{1}{2} \sum_{j=1}^{J} (y_j(t))^2$.
Finally, the drift-plus-penalty expression for problem P after this bounding is given as (35):

$$\mathbf{P2}: \quad \min_{A_t, F_t, W_t} \; \sum_{j=1}^{J} Y_j^c(t) \, E_j^c(t) + V \sum_{i=1}^{I} C_i(t) \tag{35}$$

$$\text{s.t.} \quad a_{i,j}^k(t) \in \{0, 1\}, \quad \forall i \in \mathcal{I}, k \in \mathcal{K}, j \in \mathcal{J}, t \in \mathcal{T} \tag{36}$$

$$0 \leq r_j^{i,k}(t) \leq r_j, \quad \forall j \in \mathcal{J}, t \in \mathcal{T} \tag{37}$$

$$\sum_{i=1}^{I} \sum_{k=1}^{K} a_{i,j}^k(t) \, r_j^{i,k}(t) \leq r_j, \quad \forall i \in \mathcal{I}, k \in \mathcal{K}, j \in \mathcal{J}, t \in \mathcal{T} \tag{38}$$

Removing the irrelevant constant term from P2 and dividing by $V$, we obtain problem P2′ (39):

$$\mathbf{P2'}: \quad \min_{A_t, F_t, W_t} \; \sum_{j=1}^{J} \frac{Y_j^c(t) \, E_j^c(t)}{V} + \sum_{i=1}^{I} C_i(t) \quad \text{s.t.} \; (36), (37), (38) \tag{39}$$
Since the Lyapunov drift relies on information from time slot $t+1$, this bounding turns problem P into a real-time optimization problem that only requires information from the current time slot. Owing to space limitations, we refer the reader to [37] for the related reasoning and proofs.
Nevertheless, the new problem P2′ is still an MINLP problem: the two decision variables, the task offloading strategy and the resource allocation strategy, still affect each other. To obtain the optimal solution of P2′, we decompose it into two sub-problems, P3 and P4, and design a joint optimization method to solve them.
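Once the virtual queues are tracked, the per-slot objective of P2′ is cheap to evaluate. A minimal sketch under our own variable names (`Y` for the virtual queues, `V` for the trade-off parameter):

```python
def virtual_queue_update(y, e_used, e_budget):
    """Eq. (28): virtual computing-energy queue of one UAV."""
    return max(y + e_used - e_budget, 0.0)

def per_slot_objective(Y, E_used, total_ud_cost, V):
    """Problem P2': queue-weighted energy drift (scaled by 1/V) plus
    the sum of per-UD costs C_i(t) for the current slot."""
    drift = sum(y * e for y, e in zip(Y, E_used))
    return drift / V + total_ud_cost
```

A larger `V` makes the controller favor the immediate cost over energy-queue stability, matching the role of $V$ in (31).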

5. Joint Optimization Algorithm

5.1. Optimal Resource Allocation Algorithm

Although problem P2′ is still a multi-slot MINLP optimization problem, when considering a single slot, we find a subtle relationship between the two decision variables: given the slot's task offloading strategy, there exists a unique resource allocation strategy that makes the slot's objective value optimal. The derivation of this relationship is as follows:
When the task offloading strategy $A_t^{\phi}$ is given, the UAV resource allocation strategy is adjusted to optimize the objective of problem P3 (40):

$P3: \; \min_{F_t^{\phi}, W_t^{\phi}} \; \sum_{i=1}^{I} \gamma_i T_i(t) + \sum_{i=1}^{I} \sum_{k=1}^{K} (1 - \gamma_i)\, a_{i,j}^k(t)\, E_{i,j}^k(t) \quad \text{s.t.} \; (36), (37), (38)$
Substituting Formulas (8)–(10) into problem P3, removing the fixed constant terms, and simplifying the coefficients, we obtain problem P3' (41):

$P3': \; \min \; \frac{k_1}{n_1} + \frac{k_2}{n_2} + \frac{k_3}{n_3} + \cdots + \frac{k_\zeta}{n_\zeta} + \cdots$

$\text{s.t.} \quad n_1, n_2, n_3, \ldots, n_\zeta, \ldots, n_{last} \ge 0$

$\qquad\; n_1 + n_2 + n_3 + \cdots + n_\zeta + \cdots + n_{last} \le N$

where $n_\zeta$ denotes the computing resources allocated to task $\zeta$ among all tasks offloaded to $UAV_j$, $k_\zeta$ (a known quantity) is the simplified coefficient in the objective function, and $N$ denotes the total computing resources of $UAV_j$.
Taking computing resources as an example, problem P3 becomes problem P3', which can be handled using the Lagrange multiplier method, and P3' is proved to have a unique solution (Appendix A) [38]. Based on this, we first design Algorithm 1 to solve the optimal resource and bandwidth allocation strategy for a given task offloading strategy.
Algorithm 1: Algorithm for optimal resource allocation of ( P 3 ).
Drones 09 00274 i001
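The closed-form allocation that Algorithm 1 relies on (derived in Appendix A) is straightforward to compute; the following is a minimal sketch, with function and variable names of our own choosing rather than taken from the paper:

```python
import math

def optimal_allocation(k, N):
    """Allocate N compute units among tasks to minimize sum(k[i] / n[i]).

    KKT closed form from Appendix A: n_i = N * sqrt(k_i) / sum_j sqrt(k_j).
    """
    roots = [math.sqrt(ki) for ki in k]
    total = sum(roots)
    return [N * r / total for r in roots]

# a heavier task (larger coefficient k) receives proportionally more resources
k = [4.0, 1.0, 1.0]
n = optimal_allocation(k, N=8.0)  # [4.0, 2.0, 2.0]
```

The allocations always sum to N, and each $n_\zeta$ scales with $\sqrt{k_\zeta}$, matching the unique stationary point of P3'.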
Now that we have the optimal resource and bandwidth allocation strategies $F_t^*$ and $W_t^*$, solving problem P2' is equivalent to finding the optimal task offloading strategy $A_t^*$. To do this, we introduce a new subproblem P4 of P2' (44):

$P4: \; (A_t)^* = \arg\min_{A_t} \; \frac{1}{V} \sum_{j=1}^{J} Y_j^c(t) E_j^c(t) + \sum_{i=1}^{I} C_i(t) \quad \text{s.t.} \; (36)$
Considering the different ways in which the tasks of $UD_i$ can be offloaded, we introduce the optimization cost function $O_i(A_t)$ (45) of the $UD_i$ task offloading strategy to simplify problem P4 into P4':

$O_i(A_t) = \begin{cases} \frac{1}{V} \sum_{j=1}^{J} Y_j^c(t) E_{i,j}^{k,c}(t) + \gamma_i T_i^{mec}(t) + (1 - \gamma_i) E_i^{mec}(t), & a_{i,j}^k(t) = 1 \\ \gamma_i T_i^{loc}(t) + (1 - \gamma_i) E_i^{loc}(t), & a_{i,j}^k(t) = 0 \end{cases}$

Then, we obtain P4' (46):

$P4': \; (A_t)^* = \arg\min_{A_t} \; \sum_{i=1}^{I} O_i(A_t) \quad \text{s.t.} \; (36)$
To tackle problem P4', we propose LyraRD, a Lyapunov-guided reinforcement-learning-based rapid and durable optimization framework. Traditional branch-and-bound and block coordinate descent methods require enumerating $2^N$ offloading strategies, which is prohibitively complex even for moderate N (for example, N = 10). Using RL techniques, LyraRD constructs a policy $\pi$ that maps the input I(t) to near-optimal actions $x^*(t)$ with very low complexity, requiring, for example, only tens of milliseconds of computation time (the duration from observing a state to producing a control action $x^*(t)$) when N = 10.

5.2. LyraRD Algorithm Description

As shown in Figure 3, LyraRD consists of four main modules: the actor module accepts the input I(t) and produces a set of candidate offloading actions; the critic module evaluates the candidates and selects the best offloading action $x^*(t)$; the policy update module improves the actor module's policy over time; and the queue module updates the system queue state Y(t) after an offloading action is performed. Through repeated interactions with the stochastic environment A(t), the four modules operate sequentially and iteratively as described below.
(1) Actor module: The actor module consists of a DNN and an action quantizer. The observation I(t) is taken as input; it includes the amount of task data A(t) generated by each UD in the current time slot and the virtual energy queue value Y(t) of each UAV, which enforces the long-term energy constraints. The DNN architecture consists of an input layer, convolutional layers, and a fully connected layer. Features are extracted by passing the input I(t) through three convolutional layers with ReLU activations, and the output of the convolutional layers is flattened and fed to the fully connected layer, which contains 64 neurons with ReLU activation and integrates the global features. A sigmoid activation in the output layer produces a relaxed offloading decision $\hat{x}(t) \in [0, 1]^N$, which the action quantizer then maps into $M_t$ feasible candidate binary offloading actions, where $M_t$ is a time-dependent design parameter. The quantization function is expressed as
$g\big(\hat{x}(t)\big) = \big\{ x_i(t) \mid x_i(t) \in \{0, 1\}^N, \; i = 1, \ldots, M_t \big\}$

where g denotes the quantizer that maps the relaxed decision $\hat{x}(t)$ to the set of $M_t$ candidate offloading actions in the t-th time frame.
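The quantizer g is not given in closed form here; one common choice in Lyapunov-guided DRL offloading work such as [40] is to generate candidates by flipping the entries of the relaxed decision that lie closest to 0.5. A sketch under that assumption (function and variable names are ours):

```python
import numpy as np

def quantize(x_hat, M):
    """Map a relaxed decision x_hat in [0,1]^N to M binary candidate actions.

    Candidate 0 is plain rounding at 0.5; candidate m additionally flips the
    m-th most ambiguous entry (the one with value closest to 0.5).
    """
    base = (x_hat > 0.5).astype(int)
    order = np.argsort(np.abs(x_hat - 0.5))  # most ambiguous entries first
    candidates = [base.copy()]
    for idx in order[: M - 1]:
        cand = base.copy()
        cand[idx] = 1 - cand[idx]
        candidates.append(cand)
    return candidates

cands = quantize(np.array([0.9, 0.45, 0.2, 0.6]), M=3)
# cands[0] = [1, 0, 0, 1]; the other two flip the entries nearest 0.5
```

The critic module can then evaluate each of the $M_t$ candidates and keep the one with the lowest objective value.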
(2) Critic module: Following the actor module, the critic module evaluates each candidate action $x_i(t)$ and selects the best offloading action $x^*(t)$ according to Formula (46). The average cross-entropy loss used later by the policy update module is given in (48):

$\mathcal{L}_S(\theta_t) = - \frac{1}{|S_t|} \sum_{\tau \in S_t} \left[ x^*(\tau)^{T} \log \Pi_{\theta_t}(I(\tau)) + \big(1 - x^*(\tau)\big)^{T} \log \big(1 - \Pi_{\theta_t}(I(\tau))\big) \right]$
(3) Policy update module: LyraRD uses the pairs $(I(t), x^*(t))$ as labeled input–output samples to update the DNN policy. In particular, we maintain a replay memory that stores only the most recent q data samples. In practice, with the memory initially empty, we start training the DNN after collecting more than q/2 samples. The DNN is then trained periodically, once every δt time interval, to avoid overfitting. We use the Adam algorithm [39] to minimize the average cross-entropy loss $\mathcal{L}_S(\theta_t)$ (48) over the sampled data and update the parameters of the DNN.
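The loss in (48) is the standard binary cross-entropy between the DNN's output probabilities and the critic's selected actions; a minimal NumPy sketch of the computation (the network itself and the Adam step are omitted):

```python
import numpy as np

def cross_entropy_loss(pi, x_star):
    """Average cross-entropy between policy outputs pi and labels x_star.

    pi:     (batch, N) offloading probabilities from the policy network
    x_star: (batch, N) binary best actions selected by the critic module
    """
    eps = 1e-12  # guard against log(0)
    ce = -(x_star * np.log(pi + eps) + (1 - x_star) * np.log(1 - pi + eps))
    return ce.sum(axis=1).mean()

pi = np.array([[0.9, 0.1], [0.8, 0.7]])
x_star = np.array([[1, 0], [1, 1]])
loss = cross_entropy_loss(pi, x_star)  # about 0.395
```

Minimizing this loss pushes the DNN's relaxed output toward the binary actions the critic found best, so the policy improves over the replay samples.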
(4) Queuing module: As a by-product of the critic module, we obtain the optimal resource allocation associated with $x^*(t)$. We then update the energy queue based on this allocation and the newly arrived task data, and the updated state serves as the system input for the next iteration, starting again from the actor module.
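The exact queue dynamics are defined earlier in the paper; as an illustration only, a Lyapunov virtual energy queue typically evolves as Y(t+1) = max(Y(t) + E(t) − Ē, 0), so backlog accumulates whenever the per-slot energy exceeds the long-term budget Ē. A sketch under that assumed form:

```python
def update_energy_queue(Y, E_used, E_budget):
    """One virtual-queue step: backlog grows when per-slot energy E_used
    exceeds the long-term per-slot budget E_budget, and drains otherwise."""
    return max(Y + E_used - E_budget, 0.0)

Y, history = 0.0, []
for E in [2.0, 5.0, 1.0]:  # hypothetical per-slot energy draws, budget 3.0
    Y = update_energy_queue(Y, E, 3.0)
    history.append(Y)
# history == [0.0, 2.0, 0.0]: the slot-2 overdraw is paid back in slot 3
```

Keeping this queue stable is what enforces the long-term energy constraint in the drift-plus-penalty objective.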
The entire process of the LyraRD algorithm is shown in Algorithm 2. LyraRD has been proven to have very low complexity and good convergence; for details, refer to [40].
Algorithm 2: The online LyraRD algorithm for solving (P4’).
Drones 09 00274 i002

6. Simulation Results

In this section, we perform simulations to verify the effectiveness of the proposed method.

6.1. Platforms and Tasks Data

We conduct simulation experiments on a computer with the following specifications: CPU: Intel i7-13650HX 3.2 GHz; GPU: NVIDIA RTX 4060; RAM: 64 GB; OS: 64-bit Ubuntu 16.04. All algorithms are developed using Python 3.7.
We consider a UAV-enabled MEC system consisting of 4 UAVs and 20 UDs, where the initial horizontal positions of the UAVs are set to P1(0) = [10, 10], P2(0) = [10, 80], P3(0) = [90, 20], and P4(0) = [75, 85], with a fixed flight height of H = 100 m. The UDs appear randomly in an area of 100 × 100 m², and the UAVs fly along fixed routes. The system timeline is discretized into 8000 time slots, each 1 s long [41]. The remaining parameter settings are listed in the table in Appendix B.
To simulate the random arrival of data, we assume that each user receives no more than three tasks simultaneously and that the arrivals of these tasks obey exponential, Poisson, and uniform distributions, respectively; that is, the task queues satisfy $Q_1 \sim E(\lambda)$, $Q_2 \sim \pi(\lambda')$, $Q_3 \sim U(a, b)$. To compare the impact of the different arrival types on the stability of the data queue, we set the same expectation for all three distributions, i.e., $\lambda = \lambda' = (a+b)/2$, with $\lambda = \lambda' = 3$, $a = 0$, $b = 6$. The resulting simulation results are shown in Figure 4, Figure 5 and Figure 6.
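The three equal-mean arrival processes can be generated as follows; a sketch of this setup using NumPy (the generator seed is an arbitrary choice of ours):

```python
import numpy as np

rng = np.random.default_rng(0)
T = 8000  # number of 1 s time slots

# three task queues with equal-mean arrivals:
# Q1 ~ Exponential(mean 3), Q2 ~ Poisson(lambda = 3), Q3 ~ Uniform(0, 6)
q1 = rng.exponential(scale=3.0, size=T)
q2 = rng.poisson(lam=3.0, size=T)
q3 = rng.uniform(low=0.0, high=6.0, size=T)

means = [q.mean() for q in (q1, q2, q3)]  # each close to the common mean 3
```

Matching the expectations isolates the effect of the arrival *shape* (heavy bursts under the exponential, discrete counts under the Poisson, bounded arrivals under the uniform) on queue stability.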

6.2. Overall Performance Analysis

Considering the randomness of data arrival, we conduct multiple experiments and select the four most representative results to verify the performance of the proposed method. Figure 4 shows the training loss of the algorithm, which illustrates the convergence of the LyraRD algorithm.
Further, we examine the energy usage shown in Figure 5: the energy queue fluctuates over the initial time period and then gradually stabilizes. In Figure 5b–d, the energy queue exhibits a second peak after the initial fluctuation but gradually plateaus thereafter.
Motivated by these fluctuations, we further investigate the arrival and processing of the data. As shown in Figure 6, the data queue essentially reaches a stable state after training; however, the data queues in Figure 6b–d all exhibit secondary fluctuations to varying degrees. To explain these fluctuations, we retrieve the specific queue values during the corresponding periods and analyze them.
Single-UD data accumulation. For the second peak in Figure 6b, the corresponding queue values are shown in Figure 7. We find that the data queue of the 20th user exhibits an abnormal accumulation significantly higher than that of the other users. This accumulation is likely caused by an excessive amount of data randomly arriving from this user in the preceding slots, so that the data could not be processed in time. Shortly afterwards, our algorithm processes the backlogged data, and the data queue returns to stability in subsequent slots. This also matches the energy consumption in Figure 5b.
Task-oriented offloading. The second peak in Figure 6c is lower, so we conjecture that it is caused not by a backlog in the data queue but by more data being offloaded to the UAVs for computation. To verify this, we examine the user data queues at that time. As Figure 8 shows, the amount of data awaiting processing in the users' queues is very small, indicating that most of the data were offloaded to the UAVs. The observed energy peak thus comes from the UAVs' heavy computation load.
Large-scale distributed tasks. We then retrieve the abnormal data queue values in Figure 6d, shown in Figure 9. The pending data of each user's subtask queue lies between 20 and 45, but the total pending data of Q1, Q2, and Q3 are 560.52, 556, and 576.38, respectively, which exceeds the processing capacity of the users and UAVs over that period and thus causes data accumulation. As the tasks are continuously offloaded, the data queue returns to stability, and the energy consumption also stabilizes.
The above results demonstrate that our method maintains stable data processing when handling randomly arriving data, keeps energy usage uniform and stable throughout the process, and can effectively cope with sudden arrivals of large-scale data.

6.3. Baseline Methods

Moreover, we compare LyraRD with the following benchmark schemes:
  • Average latency of tasks ( A L T ): The optimization goal is the linear sum of the average task completion delay.
  • Equal resource allocation ( E R A ): The UAV allocates computing and communication resources equally.

6.4. Evaluation Result

Figure 10 shows the cost in the three scenarios; the scheme proposed in this paper is denoted OBJ. First, when V is small (no more than 40), the cost of OBJ is not significantly lower than that of the two baselines, because a small V makes the objective focus too heavily on the stability of the energy queue. When V is greater than or equal to 50, the cost of OBJ becomes significantly lower than that of ALT, and OBJ balances queue stability, the time overhead of the MEC system, and the energy overhead of the UAVs well. Counterintuitively, ERA achieves a lower cost than OBJ and ALT in a few cases. This is because task arrivals are random, and when offloading actions are generated from these task data, ERA happens to allocate UAV resources better in one particular situation, namely a chaotic scenario in which very few tasks are offloaded to the UAVs. Even so, ERA does not demonstrate a clear cost advantage, especially in resource-constrained scenarios with large-scale task arrivals. Overall, the proposed OBJ scheme adapts to different task data sizes and delivers relatively superior cost performance for MEC systems.

7. Conclusions

In this paper, we study a UAV-enabled MEC system that satisfies long-term energy consumption constraints while maintaining high response efficiency. The joint optimization of the offloading decision and the resource allocation strategy is investigated to minimize the system cost. To solve this problem, we propose a joint optimization algorithm: the multi-stage MINLP problem is transformed into a per-slot MINLP problem using Lyapunov optimization, which is then solved with a reinforcement learning approach. Simulation results show that the method significantly reduces the system cost compared with the benchmark schemes. However, our work has limitations that should be addressed in future work. First, the current model does not explicitly consider the motion patterns of the UDs, which may limit the applicability of the results in highly dynamic scenarios. Second, adaptive collaboration mechanisms for multi-UAV systems remain understudied. In the future, we will integrate realistic UD motion models, explore the interactions between dynamic UD behaviors and UAV trajectory optimization, and further investigate adaptive UAV collaboration to enhance the utility of the framework. These extensions aim to provide more comprehensive solutions for complex real-world UAV-based MEC networks.

Author Contributions

Conceptualization, H.Z. and T.C.; methodology, H.Z., B.R. and T.C.; software, H.Z.; validation, H.Z. and T.C.; formal analysis, H.Z. and B.R.; investigation, H.Z., T.C. and B.R.; resources, H.Z., T.C. and B.R.; writing—original draft preparation, H.Z., T.C., B.R. and R.L.; writing—review and editing, H.Z., T.C., B.R. and H.Y.; supervision, B.R. and T.C.; funding acquisition, B.R. and T.C. All authors have read and agreed to the published version of the manuscript.

Funding

This work was partially supported by the National Natural Science Foundation of China (No. 62302510). The authors would like to thank the COSTA (complex system optimization) team of the College of System Engineering at NUDT for its support.

Data Availability Statement

The original contributions presented in this study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. Proof of the Uniqueness of the Solution to Problem P3'

Consider minimizing

$f(n_1, n_2, n_3, \ldots, n_i) = \frac{k_1}{n_1} + \frac{k_2}{n_2} + \frac{k_3}{n_3} + \cdots + \frac{k_i}{n_i}$

subject to the constraints $n_1, n_2, n_3, \ldots, n_i \ge 0$ and $n_1 + n_2 + n_3 + \cdots + n_i \le N$.
To solve this problem using the method of Lagrange multipliers, we introduce a multiplier $\lambda$ and construct the Lagrangian (since the objective decreases in every $n_j$, the resource constraint is active at the optimum):

$L(n_1, n_2, \ldots, n_i, \lambda) = \sum_{j=1}^{i} \frac{k_j}{n_j} + \lambda \left( \sum_{j=1}^{i} n_j - N \right)$

Taking the partial derivatives of L with respect to each $n_j$ and $\lambda$ and setting them to zero, we obtain

$\frac{\partial L}{\partial n_j} = -\frac{k_j}{n_j^2} + \lambda = 0 \quad \text{for } j = 1, 2, \ldots, i$

$\frac{\partial L}{\partial \lambda} = \sum_{j=1}^{i} n_j - N = 0$
From the first set of equations, we solve for $\lambda$:

$\lambda = \frac{k_j}{n_j^2}$

Since $\lambda$ is the same for all j, the expressions must be equal:

$\frac{k_1}{n_1^2} = \frac{k_2}{n_2^2} = \frac{k_3}{n_3^2} = \cdots = \frac{k_i}{n_i^2}$

Let this common value be C; then

$n_j^2 = \frac{k_j}{C}, \qquad n_j = \sqrt{\frac{k_j}{C}}$
Substituting $n_j$ into the active constraint $n_1 + n_2 + n_3 + \cdots + n_i = N$ gives

$\sqrt{\frac{k_1}{C}} + \sqrt{\frac{k_2}{C}} + \cdots + \sqrt{\frac{k_i}{C}} = N \;\Longrightarrow\; \sum_{j=1}^{i} \sqrt{k_j} = N \sqrt{C} \;\Longrightarrow\; C = \left( \frac{\sum_{j=1}^{i} \sqrt{k_j}}{N} \right)^2$

Finally, substituting C back into the expression for $n_j$, we obtain

$n_j = \sqrt{\frac{k_j}{C}} = \frac{N \sqrt{k_j}}{\sum_{j'=1}^{i} \sqrt{k_{j'}}}$

Since f is strictly convex for $n_j > 0$ and the feasible set is convex, this stationary point is the unique minimizer.
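This closed form can be sanity-checked numerically: no feasible allocation should achieve a smaller objective value. A short verification sketch (not part of the paper):

```python
import math
import random

def objective(k, n):
    return sum(ki / ni for ki, ni in zip(k, n))

k, N = [1.0, 4.0, 9.0], 12.0

# closed-form optimum: n_j = N * sqrt(k_j) / sum_j sqrt(k_j)
s = sum(math.sqrt(ki) for ki in k)
n_opt = [N * math.sqrt(ki) / s for ki in k]  # [2.0, 4.0, 6.0]
best = objective(k, n_opt)                   # 3.0

# random feasible allocations (summing to N) never beat the closed form
random.seed(1)
for _ in range(1000):
    w = [random.random() for _ in k]
    n = [N * wi / sum(w) for wi in w]
    assert objective(k, n) >= best - 1e-9
```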

Appendix B. Parameter Settings

Drones 09 00274 i003

References

  1. Dash, S.; Ahmad, M.; Iqbal, T. Mobile cloud computing: A green perspective. In Intelligent Systems: Proceedings of ICMIB 2020; Springer: Berlin/Heidelberg, Germany, 2021; pp. 523–533. [Google Scholar]
  2. Xu, Y.; Zhang, T.; Liu, Y.; Yang, D.; Xiao, L.; Tao, M. UAV-assisted MEC networks with aerial and ground cooperation. IEEE Trans. Wirel. Commun. 2021, 20, 7712–7727. [Google Scholar] [CrossRef]
  3. Ranaweera, P.; Jurcut, A.; Liyanage, M. MEC-enabled 5G use cases: A survey on security vulnerabilities and countermeasures. ACM Comput. Surv. (CSUR) 2021, 54, 186. [Google Scholar]
  4. Zhao, R.; Fan, C.; Ou, J.; Fan, D.; Ou, J.; Tang, M. Impact of direct links on intelligent reflect surface-aided MEC networks. Phys. Commun. 2022, 55, 101905. [Google Scholar]
  5. Dai, P.; Song, F.; Liu, K.; Dai, Y.; Zhou, P.; Guo, S. Edge intelligence for adaptive multimedia streaming in heterogeneous internet of vehicles. IEEE Trans. Mob. Comput. 2021, 22, 1464–1478. [Google Scholar]
  6. Khan, M.A.; Baccour, E.; Chkirbene, Z.; Erbad, A.; Hamila, R.; Hamdi, M.; Gabbouj, M. A survey on mobile edge computing for video streaming: Opportunities and challenges. IEEE Access 2022, 10, 120514–120550. [Google Scholar]
  7. Yang, C.; Xu, X.; Zhou, X.; Qi, L. Deep Q network–driven task offloading for efficient multimedia data analysis in edge computing–assisted IoV. ACM Trans. Multimed. Comput. Commun. Appl. (TOMM) 2022, 18, 124. [Google Scholar] [CrossRef]
  8. Pervez, F.; Sultana, A.; Yang, C.; Zhao, L. Energy and latency efficient joint communication and computation optimization in a multi-UAV assisted MEC network. IEEE Trans. Wirel. Commun. 2023, 23, 1728–1741. [Google Scholar]
  9. Du, J.; Wang, J.; Sun, A.; Qu, J.; Zhang, J.; Wu, C.; Niyato, D. Joint optimization in blockchain and mec enabled space-air-ground integrated networks. IEEE Internet Things J. 2024, 11, 31862–31877. [Google Scholar]
  10. Zhan, C.; Hu, H.; Sui, X.; Liu, Z.; Niyato, D. Completion time and energy optimization in the UAV-enabled mobile-edge computing system. IEEE Internet Things J. 2020, 7, 7808–7822. [Google Scholar] [CrossRef]
  11. Zeng, Y.; Chen, S.; Cui, Y.; Yang, J.; Fu, Y. Joint resource allocation and trajectory optimization in UAV-enabled wirelessly powered MEC for large area. IEEE Internet Things J. 2023, 10, 15705–15722. [Google Scholar] [CrossRef]
  12. Jiang, F.; Wang, K.; Dong, L.; Pan, C.; Xu, W.; Yang, K. AI driven heterogeneous MEC system with UAV assistance for dynamic environment: Challenges and solutions. IEEE Netw. 2020, 35, 400–408. [Google Scholar] [CrossRef]
  13. Yang, Z.; Bi, S.; Zhang, Y.J.A. Dynamic offloading and trajectory control for UAV-enabled mobile edge computing system with energy harvesting devices. IEEE Trans. Wirel. Commun. 2022, 21, 10515–10528. [Google Scholar] [CrossRef]
  14. Wan, S.; Lu, J.; Fan, P.; Letaief, K.B. Toward big data processing in IoT: Path planning and resource management of UAV base stations in mobile-edge computing system. IEEE Internet Things J. 2019, 7, 5995–6009. [Google Scholar] [CrossRef]
  15. Yang, Z.; Bi, S.; Zhang, Y.J.A. Online trajectory and resource optimization for stochastic UAV-enabled MEC systems. IEEE Trans. Wirel. Commun. 2022, 21, 5629–5643. [Google Scholar] [CrossRef]
  16. Yan, P.; Cao, Z.; Duan, W.; Li, B.; Zou, Y.; Li, C.; Wang, J. Securing UAV-Aided NOMA Wireless Powered Communications via Artificial Noise. IEEE Trans. Wirel. Commun. 2025. [Google Scholar] [CrossRef]
  17. AL-Bakhrani, A.A.; Li, M.; Obaidat, M.S.; Amran, G.A. MOALF-UAV-MEC: Adaptive Multi-Objective Optimization for UAV-Assisted Mobile Edge Computing in Dynamic IoT Environments. IEEE Internet Things J. 2025, 1. [Google Scholar] [CrossRef]
  18. Sheng, Z.; Hu, H.; Nasir, A.A.; Fang, Y.; da Costa, D.B. Online Trajectory Planning and Resource Allocation of UAV-Enabled MEC Networks Empowered by RIS. IEEE Trans. Green Commun. Netw. 2024, 1. [Google Scholar] [CrossRef]
  19. Zhang, L.; Zhang, Z.Y.; Min, L.; Tang, C.; Zhang, H.Y.; Wang, Y.H.; Cai, P. Task Offloading and Trajectory Control for UAV-Assisted Mobile Edge Computing Using Deep Reinforcement Learning. IEEE Access 2021, 9, 53708–53719. [Google Scholar] [CrossRef]
  20. Yuan, H.; Wang, M.; Bi, J.; Shi, S.; Yang, J.; Zhang, J.; Zhou, M.; Buyya, R. Cost-Efficient Task Offloading in Mobile Edge Computing With Layered Unmanned Aerial Vehicles. IEEE Internet Things J. 2024, 11, 30496–30509. [Google Scholar] [CrossRef]
  21. Michailidis, E.T.; Volakaki, M.G.; Miridakis, N.I.; Vouyioukas, D. Optimization of Secure Computation Efficiency in UAV-Enabled RIS-Assisted MEC-IoT Networks With Aerial and Ground Eavesdroppers. IEEE Trans. Commun. 2024, 72, 3994–4009. [Google Scholar] [CrossRef]
  22. Lakew, D.S.; Tran, A.T.; Dao, N.N.; Cho, S. Intelligent Self-Optimization for Task Offloading in LEO-MEC-Assisted Energy-Harvesting-UAV Systems. IEEE Trans. Netw. Sci. Eng. 2024, 11, 5135–5148. [Google Scholar] [CrossRef]
  23. Wang, J.; Wang, L.; Zhu, K.; Dai, P. Lyapunov-Based Joint Flight Trajectory and Computation Offloading Optimization for UAV-Assisted Vehicular Networks. IEEE Internet Things J. 2024, 11, 22243–22256. [Google Scholar] [CrossRef]
  24. Qin, P.; Wu, X.; Fu, M.; Ding, R.; Fu, Y. Latency Minimization Resource Allocation and Trajectory Optimization for UAV-Assisted Cache-Computing Network with Energy Recharging. IEEE Trans. Commun. 2025, 1. [Google Scholar] [CrossRef]
  25. Li, J.; Sun, G.; Wu, Q.; Niyato, D.; Kang, J.; Jamalipour, A.; Leung, V.C. Collaborative ground-space communications via evolutionary multi-objective deep reinforcement learning. IEEE J. Sel. Areas Commun. 2024, 42, 3395–3411. [Google Scholar]
  26. Consul, P.; Budhiraja, I.; Garg, D.; Garg, S.; Kaddoum, G.; Hassan, M.M. SFL-TUM: Energy efficient SFRL method for large scale AI model’s task offloading in UAV-assisted MEC networks. Veh. Commun. 2024, 48, 100790. [Google Scholar] [CrossRef]
  27. Wu, X.; Zhu, Q.; Chen, W.N.; Lin, Q.; Li, J.; Coello, C.A.C. Evolutionary reinforcement learning with action sequence search for imperfect information games. Inf. Sci. 2024, 676, 120804. [Google Scholar]
  28. Qu, Y.; Dai, H.; Wang, H.; Dong, C.; Wu, F.; Guo, S.; Wu, Q. Service provisioning for UAV-enabled mobile edge computing. IEEE J. Sel. Areas Commun. 2021, 39, 3287–3305. [Google Scholar]
  29. Jiang, H.; Dai, X.; Xiao, Z.; Iyengar, A. Joint task offloading and resource allocation for energy-constrained mobile edge computing. IEEE Trans. Mob. Comput. 2022, 22, 4000–4015. [Google Scholar] [CrossRef]
  30. He, L.; Sun, G.; Sun, Z.; Wang, P.; Li, J.; Liang, S.; Niyato, D. An Online Joint Optimization Approach for QoE Maximization in UAV-Enabled Mobile Edge Computing. arXiv 2024, arXiv:2404.02166. [Google Scholar]
  31. Sun, G.; Zheng, X.; Sun, Z.; Wu, Q.; Li, J.; Liu, Y.; Leung, V.C. UAV-enabled secure communications via collaborative beamforming with imperfect eavesdropper information. IEEE Trans. Mob. Comput. 2023, 23, 3291–3308. [Google Scholar] [CrossRef]
  32. Ndikumana, A.; Tran, N.; Ho, T.; Han, Z.; Saad, W.; Niyato, D.; Hong, C. Joint Communication, Computation, Caching, and Control in Big Data Multi-access Edge Computing. IEEE Trans. Mob. Comput. 2018, 19, 1359–1374. [Google Scholar] [CrossRef]
  33. Zhang, X.; Zhang, J.; Xiong, J.; Zhou, L.; Wei, J. Energy-Efficient Multi-UAV-Enabled Multiaccess Edge Computing Incorporating NOMA. IEEE Internet Things J. 2020, 7, 5613–5627. [Google Scholar] [CrossRef]
  34. Chen, Y.; Zhao, J.; Wu, Y.; Huang, J.; Shen, X. QoE-Aware Decentralized Task Offloading and Resource Allocation for End-Edge-Cloud Systems: A Game-Theoretical Approach. IEEE Trans. Mob. Comput. 2024, 23, 769–784. [Google Scholar] [CrossRef]
  35. Ding, Y.; Li, K.; Liu, C.; Li, K. A Potential Game Theoretic Approach to Computation Offloading Strategy Optimization in End-Edge-Cloud Computing. IEEE Trans. Parallel Distrib. Syst. 2022, 33, 1503–1519. [Google Scholar] [CrossRef]
  36. Xu, B.; Kuang, Z.; Gao, J.; Zhao, L.; Wu, C. Joint offloading decision and trajectory design for UAV-enabled edge computing with task dependency. IEEE Trans. Wirel. Commun. 2022, 22, 5043–5055. [Google Scholar] [CrossRef]
  37. Neely, M. Stochastic Network Optimization with Application to Communication and Queueing Systems; Springer Nature: Berlin/Heidelberg, Germany, 2022. [Google Scholar]
  38. Boyd, S.; Vandenberghe, L. Convex Optimization; Cambridge University Press eBooks; Cambridge University Press: Cambridge, UK, 2004. [Google Scholar]
  39. Marsland, S. Machine Learning: An Algorithmic Perspective; Chapman and Hall/CRC: Boca Raton, FL, USA, 2009. [Google Scholar]
  40. Bi, S.; Huang, L.; Wang, H.; Zhang, Y.J.A. Lyapunov-guided deep reinforcement learning for stable online computation offloading in mobile-edge computing networks. IEEE Trans. Wirel. Commun. 2021, 20, 7519–7537. [Google Scholar] [CrossRef]
  41. Wang, L.; Wang, K.; Pan, C.; Xu, W.; Aslam, N.; Nallanathan, A. Deep Reinforcement Learning Based Dynamic Trajectory Control for UAV-assisted Mobile Edge Computing. IEEE Trans. Mob. Comput. 2022, 21, 3536–3550. [Google Scholar] [CrossRef]
Figure 1. The UAV-enabled MEC system consists of several UAVs and multiple ground-based user devices (UDs). UAVs provide computing services to UDs by allocating communication and computing resources. Each UD randomly receives several tasks at the beginning of a time slot and independently decides to compute tasks locally or offload them to UAVs.
Drones 09 00274 g001
Figure 2. (a) The task offloading method with the smallest makespan. (b) A reasonable task offloading method balancing energy consumption and makespan.
Drones 09 00274 g002
Figure 3. LyraRD algorithm flow.
Drones 09 00274 g003
Figure 4. Four cases of training loss.
Drones 09 00274 g004
Figure 5. Four cases of the energy queue.
Drones 09 00274 g005
Figure 6. Four cases of the data queue.
Drones 09 00274 g006
Figure 7. Abnormal queue values at the second peak of Figure 6b.
Drones 09 00274 g007
Figure 8. Abnormal data queue corresponding to Figure 6c.
Drones 09 00274 g008
Figure 9. Abnormal data queue corresponding to Figure 6d.
Drones 09 00274 g009
Figure 10. Cost values of the three scenarios for different numbers of UDs and different values of V.
Drones 09 00274 g010

Share and Cite

MDPI and ACS Style

Zhang, H.; Chen, T.; Ren, B.; Li, R.; Yuan, H. Joint Optimization of Task Completion Time and Energy Consumption in UAV-Enabled Mobile Edge Computing. Drones 2025, 9, 274. https://doi.org/10.3390/drones9040274

