Article

Task Offloading and Resource Allocation for Augmented Reality Applications in UAV-Based Networks Using a Dual Network Architecture

Department of Electrical, Electronic and Computer Engineering, University of Ulsan, Ulsan 44610, Republic of Korea
*
Author to whom correspondence should be addressed.
Electronics 2024, 13(18), 3590; https://doi.org/10.3390/electronics13183590
Submission received: 29 July 2024 / Revised: 30 August 2024 / Accepted: 6 September 2024 / Published: 10 September 2024

Abstract

This paper proposes a novel UAV-based edge computing system for augmented reality (AR) applications, addressing the challenges posed by the limited resources of mobile devices. The system uses UAVs equipped with edge computing servers (UECs) to enable efficient task offloading and resource allocation for AR tasks with dependent relationships. In particular, this work focuses on the problem of dependent tasks in AR applications within UAV-based networks, a problem that has not been thoroughly addressed in previous research. A dual network architecture-based task offloading (DNA-TO) algorithm is proposed, leveraging the DNA framework to enhance decision-making in reinforcement learning while mitigating noise. In addition, a Karush–Kuhn–Tucker-based resource allocation (KKT-RA) algorithm is proposed to optimize resource allocation. Various simulations using real-world movement data are conducted. The results indicate that our proposed algorithms outperform existing approaches in terms of latency and energy efficiency.

1. Introduction

Augmented reality (AR) [1] is a technology that overlays digital information onto the real world, enhancing interactions with the environment. Augmented reality is widely used in fields such as urban planning [2], the military [3], education [4], healthcare [5], retail [6], and entertainment [7]. User equipment (UE) for running AR applications typically includes mobile devices such as glasses and smartphones [8,9]. However, the limited computation resources and energy capacity of mobile devices lead to long processing delays and high power consumption. One way to address this problem is to offload compute-intensive tasks to powerful servers in the cloud. However, even though such servers are fast, transferring large amounts of data back and forth over the internet can take a long time [10]. This can still cause delays when using AR, which is frustrating for users.
In order to reduce latency, studies [11,12] have investigated the use of mobile edge computing (MEC) for processing AR tasks. Placing edge servers closer to users allows devices to offload heavy processing to these nearby servers, effectively reducing the problem of slow data transfer over long distances. However, these studies assumed that MEC servers are located at the base stations (BSs) of cellular networks. This dependence limits where MEC services can be used; for example, they might not work well in areas without infrastructure.
To address these challenges, this study introduces a system model for AR applications, which uses a UAV-based network. UAVs with edge computing servers (UECs) can process intensive tasks, leading to substantial latency reduction and increased flexibility. This approach is well-suited to a variety of AR applications. For example, in critical scenarios like natural disasters, fires, or accidents, vital real-time information can be delivered to first responders, aiding in identifying safe routes, locating victims, and highlighting hazards.
Task offloading and resource allocation need to be considered in the proposed system model. In AR applications, tasks are dependent. Specifically, the output of previous tasks is often the input for a subsequent task. However, most recent studies [13,14,15,16] have focused on independent tasks, and only a few studies have addressed dependent tasks [17,18]. The existing approaches, while valuable for general task offloading scenarios, do not adequately capture the unique requirements of AR applications within the context of a UAV-based system. They fail to consider the specific characteristics of AR tasks, such as the sequential processing pipeline and the constraints on where each task can execute. Therefore, the existing approaches are not suitable for AR applications and the proposed system model; a more specialized solution is required that can effectively manage task offloading and resource allocation in the context of AR applications. In this work, we formulate the problem for dependent tasks in AR applications. Then, based on proximal policy optimization with a dual network architecture (DNA) [19], we propose a DNA-based task offloading (DNA-TO) algorithm to solve the task offloading decision. A DNA addresses the higher noise level in policy learning compared to value learning in reinforcement learning by separating them into distinct networks. This separation potentially reduces overall system noise. In addition, a Karush–Kuhn–Tucker-based resource allocation (KKT-RA) algorithm is proposed for resolving resource allocation.
To evaluate the performance of the proposed algorithms, we conducted simulations using real-world movement data from the Geolife GPS trajectory dataset [20], and then we compared the results with other methods. The results demonstrate that our algorithms outperform existing approaches.
The main contributions of this paper can be summarized as follows:
  • This work considers a novel system model for AR applications using UECs to enable efficient task offloading and resource allocation.
  • The challenge of dependent tasks in AR applications is addressed. Specifically, this work models the dependencies between tasks and formulates an optimization problem to determine optimal task offloading decisions and resource allocation considering these dependencies.
  • This work proposes the DNA-TO algorithm for task offloading decisions, leveraging the DNA framework to mitigate noise in reinforcement learning. Additionally, we propose the KKT-RA algorithm for optimal resource allocation.
  • Numerous simulations with various scenarios, using multiple performance metrics, were conducted to verify the performance of our proposed algorithms. The results show that the proposed algorithms achieved better results than existing methods.
The rest of this paper is structured as follows: Section 2 provides an overview of related work, while Section 3 describes the system model in more detail. The problem formulation is presented in Section 4, followed by problem decomposition in Section 5. Section 6 describes the algorithms in detail. Finally, evaluation results are presented in Section 7, and the paper concludes in Section 8.

2. Related Work

In the last few years, researchers have focused on optimizing both task offloading and resource allocation in MEC systems [11,12,21]. However, those studies assumed that MEC servers are located at cellular base stations (BSs). This makes the systems depend on fixed infrastructure, which can limit their flexibility and accessibility in various geographic and operational contexts.
To address these limitations, the concept of UAV-based MEC networks has recently emerged, in which UAVs equipped with edge servers function as mobile base stations [22,23]. In [22], a UAV-assisted VANET architecture to enhance vehicular network computation capabilities was proposed. This architecture uses UAVs as aerial base stations equipped with MEC servers, allowing vehicles to offload computationally intensive tasks. To optimize the system, a joint MEC selection/resource allocation/task offloading algorithm was proposed. The goal is to minimize task processing delays by considering both the transmission time from vehicles to UAVs and the computation time on the MEC servers. The work in [23] introduced a cooperative task offloading scheme for UAV-enabled MEC systems where UEs at a distance can offload tasks to a UAV with the assistance of nearby UEs. The UAV network model consists of multiple UEs (nearby and distant) with a UAV serving as a MEC server. Each UE has a divisible computation task that can be partially offloaded to the UAV. Task offloading by distant UEs is facilitated by associated nearby UEs, reducing UAV energy consumption and ensuring reliable task offloading. The system optimizes the UAV trajectories, computations, and communication resources to minimize the weighted sum energy consumption of both UEs and the UAV. However, in these studies, dependent tasks are not considered.
A few studies have addressed dependent tasks [17,18]. In [17], the authors focused on task dependency in cooperative MEC systems where tasks are composed of multiple concatenated subtasks executed sequentially. Unlike parallel computing, these sequential tasks require the completion of one subtask before starting the next. The authors investigated two scenarios: cooperative nodes without private tasks, and cooperative nodes with their own sequential tasks. In both scenarios, optimization problems are formulated to minimize energy consumption by considering task offloading policies, communication, and computation resource allocation. In [18], task dependency in multi-user mobile edge computing systems was considered, focusing on scenarios where the input of a task on one user device requires the output of another user’s task. A simplified two-user model was initially considered in order to analyze optimal solutions and the impact of task dependency on system performance. The model was then extended to a multi-user scenario, where a task by one mobile user may depend on the output from multiple mobile users. While these studies considered dependent tasks, they did not use UAV-based edge computing or address the specific characteristics of AR applications. Consequently, their findings are not directly applicable to the AR application context and our proposed system model.
Deep reinforcement learning (DRL)-based solutions have recently gained attention for addressing the challenges of task offloading and resource allocation owing to their ability to generalize and efficiently handle complex, high-dimensional problems. For example, the study in [24] proposed a meta critic method to determine the task offloading and resource allocation strategy in a digital twin (DT)-enabled UAV-assisted MEC system. In [25], DRL in a UAV-assisted multi-access edge computing network was proposed to address the challenge of task offloading and resource allocation in various scenarios. In [26], the authors proposed an approach to task offloading and resource allocation in UAV-aided MEC systems. That system model incorporates both ground MEC servers and hovering UAVs to provide low-latency communications, computations, and storage capabilities for IoT devices. The joint task offloading/load balancing/resource allocation problem was formulated as an integer optimization problem aimed at minimizing system cost. A DRL-based algorithm was studied to find an optimal offloading solution. Seid et al. [27] proposed a multi-agent deep reinforcement learning approach to optimize task offloading and resource allocation in a UAV-assisted IoT edge network. The approach considers the dynamic nature of UAV communication and resource demands to minimize computation costs while ensuring quality of service for IoT devices. However, the problem scenarios, system models, and objectives explored in previous studies are significantly different from the scenario, system model, and optimization problems addressed in our work. Consequently, these methods cannot be directly applied to our specific problem.

3. The System Model

A system model for processing AR applications is proposed where the area is divided into blocks, and a block is separated into units, as shown in Figure 1. The proposed system includes two key components.
  • User equipment (UEs): UEs represent mobile devices capable of handling AR tasks. These tasks can be processed locally on the device or offloaded for execution on edge servers to improve performance or handle computationally intensive tasks.
  • UAVs with edge computing servers (UECs): UECs are responsible for communicating with UEs in their designated blocks, and for processing offloaded AR tasks upon receiving requests from those UEs.
This system model can be used in many AR applications. For example, in an emergency, it can help people find a safe route to escape from dangers such as fires or earthquakes. In military operations, it can help locate and rescue people who are lost or injured. It can also be used for games that blend the real world with digital images and objects.
This work studies the problems of task offloading and resource allocation for AR applications in a block with multiple UEs served by a single UEC. The set of UEs is denoted as $S^{UE}$. For each UE $i \in S^{UE}$, its remaining energy and available computation resource are defined as $E_i^{UE}$ and $F_i^{UE}$, respectively. The UEC server also has its own resources: $E^{UEC}$ denotes the remaining energy of the UEC, and $F^{UEC}$ represents the available computation resource on the UEC.
Augmented reality technology seamlessly blends computer-generated content with a live video stream, creating an interactive experience where virtual elements are overlaid onto the real world. Figure 2 illustrates the processing pipeline for an AR application, which consists of five essential tasks.
  • The video source: This module acts as the initial stage, capturing raw video frames from the user’s camera. The captured frames serve as the foundation for subsequent processing.
  • The tracker: Playing a crucial role in determining the user’s position and orientation relative to the surrounding environment, the tracker analyzes the video frames to identify and track key features, enabling accurate placement of virtual elements within the real world.
  • The mapper: This task constructs a digital representation of the environment, often referred to as a map. By analyzing the video data and using information from the tracker, the mapper builds a model that includes feature points (details about the real world). This digital map is essential for aligning virtual objects and ensuring they interact and behave realistically within the user’s physical space.
  • The object recognizer: This task analyzes video frames to identify known objects and estimate their 3D positions; this information is the input for the renderer.
  • The renderer: The final stage of the processing pipeline, the renderer takes the processed data from previous tasks and prepares the frames for display on the user’s device. This may involve overlaying virtual elements onto the video frames, incorporating information from the object recognizer, and ensuring seamless integration of the virtual content with the real world.
The five tasks in an AR application are inherently interdependent. The output from one task often serves as the input for a subsequent task in the processing pipeline. For instance, the tracker’s output becomes the input for the mapper in order to build an accurate digital map. We denote the set of task types as $S^{task\_type} = \{n_1, n_2, \ldots, n_5\}$, where $n_1, n_2, \ldots, n_5$ represent the video source, tracker, mapper, object recognizer, and renderer, respectively. It is important to note that, owing to their real-time processing requirements, the video source and renderer ($n_1$ and $n_5$) must be executed locally on the user’s device. However, the remaining tasks ($n_2$, $n_3$, and $n_4$) can be offloaded for execution on a UEC for potentially faster or more efficient processing, particularly when dealing with computationally intensive tasks. $S_t^{task}$ denotes the set of tasks to be processed at time $t$. For each $l \in S_t^{task}$, $n(l)$, $c(l)$, and $d(l)$ denote the task type, computation cost, and input data size of task $l$, respectively. In the context of AR applications, the task offloading and resource allocation problem aims to determine the optimal execution strategy for tasks. We need to determine whether to execute a task locally on the user’s device or offload it to the UEC. This decision considers factors such as task complexity and resource constraints on the device and on the UEC server. Once the execution locations for tasks are determined, we need to allocate the computation resources of the UEC to the offloaded tasks. The task offloading decision and the resource allocation strategy are the keys to optimizing the overall performance of the AR application, minimizing energy consumption and task completion time. The definitions of the notations in the system model are given in Table 1.
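To make the pipeline dependency concrete, the following minimal Python sketch models the five task types, the local-only constraint on $n_1$ and $n_5$, and each task’s predecessor in the pipeline (all names are illustrative, not taken from the paper’s implementation):

```python
from dataclasses import dataclass
from typing import Optional

# n1..n5: the five AR task types in pipeline order.
TASK_TYPES = ["video_source", "tracker", "mapper", "object_recognizer", "renderer"]
LOCAL_ONLY = {"video_source", "renderer"}  # n1 and n5 must run on the UE

@dataclass
class Task:
    task_type: str    # n(l): one of TASK_TYPES
    cycles: float     # c(l): computation cost in CPU cycles
    data_size: float  # d(l): input data size in bits

def predecessor(task_type: str) -> Optional[str]:
    """Return the task type whose output feeds this task, or None for the source."""
    i = TASK_TYPES.index(task_type)
    return TASK_TYPES[i - 1] if i > 0 else None

def can_offload(task: Task) -> bool:
    """Only the tracker, mapper, and object recognizer (n2-n4) may be offloaded."""
    return task.task_type not in LOCAL_ONLY
```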

3.1. The Communication Model

This subsection describes the communication model between UEs and the UEC. UEs connect to the UEC for task processing. The output data size after task execution on the UEC is typically much smaller than the raw input data captured by the UEs. Additionally, downlink data rates (UEC to UE) are generally higher than uplink rates. Therefore, we focus on the uplink data transfer (UE to UEC) and do not consider transferring processed data back to UEs. Similar to previous studies like [28,29,30], a classic air-to-ground probabilistic path loss model [31] is used. There are line-of-sight (LoS) links and non-line-of-sight (NLoS) links for communications between the UEs and the UEC. Path loss considers both free-space path loss and additional loss due to environmental factors. $PL$ denotes path loss, calculated using the following equation:
$$PL = \frac{\eta_{LoS} - \eta_{NLoS}}{1 + a \exp\left(-b\left(\arctan\left(\frac{h}{d}\right) - a\right)\right)} + 10 \log\left(h^2 + d^2\right) + 20 \log(f) + 20 \log\left(\frac{4\pi}{c}\right) + \eta_{NLoS} \tag{1}$$
where $\eta_{LoS}$ and $\eta_{NLoS}$ are the average additional losses (on top of free-space propagation loss) under LoS and NLoS conditions, respectively; $a$ and $b$ are constants that depend on the propagation environment (e.g., in urban environments [31], $a = 9.61$, $b = 0.16$); $h$ is the height of the UEC, and $d$ is the distance between a UE and the UEC’s projection on the ground, while $f$ is the carrier frequency and $c$ is the speed of light.
Let $p_i^T$ denote the transmission power of UE $i$. The uplink transmission rate from UE $i$ to the UEC is denoted $r_i$. Based on the calculated path loss and the noise power during transmission, $r_i$ can be determined as follows:
$$r_i = B \log_2\left(1 + \frac{p_i^T}{PL \times \sigma}\right) \tag{2}$$
where $B$ is the sub-channel bandwidth and $\sigma$ is the noise power. This communication model allows us to determine the uplink transmission rate between UEs and the UEC.
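To illustrate Equations (1) and (2), here is a small Python sketch (our own illustrative code, not the authors’ implementation; we assume the elevation angle in the LoS weighting term is expressed in degrees, as in the underlying model of [31], and that the path loss is converted from dB to a linear factor before the SNR computation):

```python
import math

def path_loss_db(h, d, f, a=9.61, b=0.16, eta_los=1.0, eta_nlos=20.0, c=3e8):
    """Equation (1): probabilistic air-to-ground path loss in dB.
    h: UEC altitude (m); d: horizontal UE-to-UEC distance (m); f: carrier frequency (Hz)."""
    theta_deg = math.degrees(math.atan2(h, d))  # elevation angle arctan(h/d), in degrees
    los_term = (eta_los - eta_nlos) / (1 + a * math.exp(-b * (theta_deg - a)))
    fspl = 10 * math.log10(h**2 + d**2) + 20 * math.log10(f) + 20 * math.log10(4 * math.pi / c)
    return los_term + fspl + eta_nlos

def uplink_rate_bps(p_tx_w, pl_db, bandwidth_hz=20e6, noise_w=1e-13):
    """Equation (2): Shannon uplink rate; noise_w = -100 dBm expressed in watts."""
    pl_linear = 10 ** (pl_db / 10)
    return bandwidth_hz * math.log2(1 + p_tx_w / (pl_linear * noise_w))

# Example: UEC at 80 m altitude, UE 30 m away, 2 GHz carrier, 20 dBm (0.1 W) transmit power.
rate = uplink_rate_bps(0.1, path_loss_db(80, 30, 2e9))
```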

3.2. The Computation Model

In this section, the computation model is discussed. A task can be executed locally or computed by the UEC; both cases are presented below.

3.2.1. Local Computing

Let $l \in S_t^{task}$ be a task of UE $i$ at time $t$. Note that $F_i^{UE}$ and $c(l)$ are the computation resource of UE $i$ and the computation cost of task $l$, respectively. When $l$ is processed locally on UE $i$, the completion time is denoted $t^{Loc}(l)$ and is calculated as follows:
$$t^{Loc}(l) = \frac{c(l)}{F_i^{UE}} \tag{3}$$
Furthermore, the energy consumed per CPU cycle of UE $i$ is denoted $\mu_i^{UE}$. Let $e^{Loc}(l)$ denote the energy consumption for local processing; $e^{Loc}(l)$ is computed as
$$e^{Loc}(l) = c(l) \times \mu_i^{UE} \tag{4}$$

3.2.2. Computing in a UEC

This subsection discusses how to process a task on the UEC. Let $l \in S_t^{task}$ be the task of UE $i$ at time $t$. To execute task $l$ at the UEC, we first consider transmission of the input data for task $l$ from UE $i$ to the UEC. The size of the input data for task $l$ is denoted $d(l)$. Let $t_i^{T\_Input}(l)$ denote the time delay for input data transmission, calculated as
$$t_i^{T\_Input}(l) = \frac{d(l)}{r_i} \tag{5}$$
The energy consumption for uploading the input data from UE $i$ is denoted $e_i^{T\_Input}(l)$, given by
$$e_i^{T\_Input}(l) = t_i^{T\_Input}(l) \times p_i^T \tag{6}$$
Let $e_{UEC}^{R\_Input}(l)$ denote the energy consumption when receiving the input data at the UEC. It is calculated as follows:
$$e_{UEC}^{R\_Input}(l) = t_i^{T\_Input}(l) \times p_{UEC}^R \tag{7}$$
where $p_{UEC}^R$ represents the power consumption per time unit of the UEC when receiving data, measured in watts (W).
The execution time and the energy consumption when executing task $l$ at the UEC are denoted $t_{UEC}^{Exe}(l)$ and $e_{UEC}^{Exe}(l)$, respectively. These values are calculated using Equations (8) and (9):
$$t_{UEC}^{Exe}(l) = \frac{c(l)}{f_l} \tag{8}$$
$$e_{UEC}^{Exe}(l) = c(l) \times \mu_{UEC} \tag{9}$$
In these equations, $f_l$ is the computation resource assigned to task $l$ by the UEC, and $\mu_{UEC}$ represents the energy consumed per CPU cycle of the UEC, measured in joules per cycle (J/cycle).
To calculate the completion time and total energy consumption for task l processed on the UEC, we need to consider both input data transmission and task execution. It is important to note that tasks in AR applications are dependent. If two adjacent tasks are executed on the UEC, the input for the second task can be directly obtained from the output and input of the first task, eliminating the need for additional input data transfer from the UE to the UEC.
Let $v$ be the task of UE $i$ at time $t-1$. We define a binary variable $b_l$ to indicate whether task $l$ is the subsequent task of task $v$ in the processing pipeline and whether both tasks are executed on the UEC. If task $l$ is the subsequent task of task $v$ and both are processed on the UEC, $b_l = 0$; otherwise, $b_l = 1$. Let $t_{UEC}(l)$ denote the completion time and $e_{UEC}(l)$ the total energy consumption. Considering the dependency, these values are calculated with Equation (10) and Equation (11), respectively:
$$t_{UEC}(l) = b_l \times t_i^{T\_Input}(l) + t_{UEC}^{Exe}(l) \tag{10}$$
$$e_{UEC}(l) = b_l \times \left(e_i^{T\_Input}(l) + e_{UEC}^{R\_Input}(l)\right) + e_{UEC}^{Exe}(l) \tag{11}$$
From Equations (10) and (11), $b_l = 0$ indicates that task $l$ is the subsequent task of task $v$ and both are executed on the UEC. Consequently, the transmission time and energy consumption for sending and receiving the input data from UE $i$ to the UEC can be ignored.
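As an illustration of the computation model, the following Python sketch evaluates Equations (3)–(11) for a single task (function and parameter names are our own, not from the paper’s implementation):

```python
def local_cost(cycles, f_ue, mu_ue):
    """Equations (3)-(4): completion time and energy for local execution on a UE."""
    return cycles / f_ue, cycles * mu_ue

def uec_cost(cycles, data_bits, rate, f_alloc, mu_uec, p_tx, p_rx, b_l):
    """Equations (5)-(11): completion time and total energy when task l runs on the UEC.
    b_l = 0 if the predecessor task also ran on the UEC (input already there), else 1."""
    t_up = data_bits / rate                       # Eq. (5): uplink transmission delay
    e_up_ue = t_up * p_tx                         # Eq. (6): UE transmit energy
    e_rx_uec = t_up * p_rx                        # Eq. (7): UEC receive energy
    t_exe = cycles / f_alloc                      # Eq. (8): execution time on the UEC
    e_exe = cycles * mu_uec                       # Eq. (9): execution energy on the UEC
    t_total = b_l * t_up + t_exe                  # Eq. (10)
    e_total = b_l * (e_up_ue + e_rx_uec) + e_exe  # Eq. (11)
    return t_total, e_total
```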

4. Problem Formulation

This section formulates the optimal problem of task offloading and resource allocation based on the system model described previously.
Let $o_l$ be a binary variable at time $t$ that indicates the task offloading decision for task $l$. A value of 1 means that task $l$ is offloaded to the UEC, whereas 0 indicates local execution:
$$o_l = \begin{cases} 1, & \text{if task } l \text{ is offloaded to the UEC} \\ 0, & \text{otherwise} \end{cases} \tag{12}$$
The set of all offloading decision variables at time $t$ is denoted by $O = \{o_l \mid l \in S_t^{task}\}$. Note that the resource allocation variable for task $l$, denoted $f_l$, represents the amount of computational resource allocated to task $l$ by the UEC. For all tasks at time $t$, the set of computation resource variables is $F = \{f_l \mid l \in S_t^{task}\}$.
Let $T$ be the total completion time of all tasks at time $t$, calculated as
$$T = \sum_{l \in S_t^{task}} (1 - o_l) \times t^{Loc}(l) + o_l \times t_{UEC}(l) \tag{13}$$
The total energy consumption of all UEs and the UEC is denoted $E$. It is calculated using the following equation:
$$E = \sum_{l \in S_t^{task}} (1 - o_l) \times e^{Loc}(l) + o_l \times e_{UEC}(l) \tag{14}$$
The task offloading and resource allocation problem is formulated as an optimization problem to minimize the total completion time and total energy consumption. We first define the objective function as follows:
$$J = \alpha \frac{T}{w^T \times |S_t^{task}|} + (1 - \alpha) \frac{E}{w^E \times |S_t^{task}|} \tag{15}$$
where a tunable parameter, $\alpha \in [0, 1]$, adjusts the balance between minimizing completion time and minimizing energy consumption. Higher values of $\alpha$ prioritize a faster completion time, while lower values prioritize lower energy consumption. $|S_t^{task}|$ is the total number of tasks at time $t$, while $w^T$ and $w^E$ are the coefficients for rescaling the time and energy, respectively. Specifically, $w^T$ is the longest possible completion time for locally executing a task, and $w^E$ is the highest energy consumption from processing a task locally. These values help to normalize the contributions of time and energy to the objective function.
The problem can be formulated as an optimization problem minimizing $J$:
$$\min_{O, F} (J) \tag{16}$$
subject to:
$$o_l \in \{0, 1\}, \quad \forall l \in \{S_t^{task} \mid n(l) \notin \{n_1, n_5\}\} \tag{17}$$
$$o_l = 0, \quad \forall l \in \{S_t^{task} \mid n(l) \in \{n_1, n_5\}\} \tag{18}$$
$$\sum_{l \in S_t^{task}} (1 - o_l) \times e^{Loc}(l) + o_l \times b_l \times e_i^{T\_Input}(l) \le E_i^{UE}, \quad \forall i \in S^{UE} \tag{19}$$
$$\sum_{l \in S_t^{task}} o_l \times \left(e_{UEC}(l) - b_l \times e_i^{T\_Input}(l)\right) \le E^{UEC} \tag{20}$$
$$f_l = 0, \quad \forall l \in \{S_t^{task} \mid o_l = 0\} \tag{21}$$
$$f_l > 0, \quad \forall l \in \{S_t^{task} \mid o_l = 1\} \tag{22}$$
$$\sum_{l \in S_t^{task}} o_l \times f_l \le F^{UEC} \tag{23}$$
Equations (17) and (18) imply that the video source and renderer can only be executed locally, whereas other types of tasks can be offloaded to the UEC or processed locally. The energy constraints of the UEs and the UEC are given in Equation (19) and Equation (20), respectively. Specifically, the total energy consumption of each UE and of the UEC must not exceed their residual energy levels. Equation (21) states that the UEC does not assign resources to locally executed tasks. By contrast, Equation (22) guarantees that if a task is offloaded and needs to be executed, the UEC must assign computing resources to it. The limitation on the computation resources at the UEC is given in constraint (23), i.e., the total computation resources assigned to tasks must not exceed the computation resources of the UEC.
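For concreteness, a small Python sketch evaluating the objective in Equation (15) and the capacity constraint (23) for a candidate decision (variable names are ours, purely illustrative):

```python
def objective_j(offload, t_loc, e_loc, t_uec, e_uec, alpha, w_t, w_e):
    """Equation (15): normalized weighted sum of total completion time (13)
    and total energy consumption (14). offload maps each task l to o_l in {0, 1}."""
    n = len(offload)
    T = sum((1 - o) * t_loc[l] + o * t_uec[l] for l, o in offload.items())
    E = sum((1 - o) * e_loc[l] + o * e_uec[l] for l, o in offload.items())
    return alpha * T / (w_t * n) + (1 - alpha) * E / (w_e * n)

def satisfies_uec_capacity(offload, f_alloc, f_uec):
    """Constraint (23): total resources assigned to offloaded tasks <= F_UEC."""
    return sum(offload[l] * f_alloc.get(l, 0.0) for l in offload) <= f_uec
```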

5. Problem Decomposition

From the problem formulation, we obtain a problem that contains two types of variables (i.e., binary variables ($O$) and continuous variables ($F$)). Combining binary and continuous variables leads to a non-convex mixed-integer non-linear programming problem, which is difficult to solve and has high computational complexity. To address this, we decompose the problem into two simpler subproblems. The first subproblem, an integer problem, focuses on determining the optimal offloading decisions for tasks using the binary variables. The second subproblem, a continuous problem, allocates resources to the offloaded tasks using the continuous variables.
In order to decompose the problem, we reformulate it as a two-stage optimization problem. Accordingly, the reformulation is as follows:
$$(\text{P1}) \quad \min_{O} \min_{F} (J) \quad \text{subject to: Equations (17)–(23)} \tag{24}$$
Reformulated problem P1 can be characterized as a two-stage minimization process. The first stage minimizes the objective function with respect to the decision variables ($O$). However, this minimization is achieved by solving a nested minimization problem for a fixed value of $O$. This nested problem focuses on the resource allocation decisions denoted by $F$ and aims to find the minimum value of $J$. All the original problem constraints remain.
A key observation from reformulated problem P1 is the decoupling of the constraints related to task offloading decisions and resource allocation decisions. This separation allows us to analyze the problem characteristics based on fixed values of either $O$ or $F$. Specifically, if resource allocation $F$ is fixed, P1 becomes an integer optimization problem over the decision variables in $O$. In contrast, if we fix the task offloading decisions ($O$), P1 transforms into a continuous optimization problem, since minimization is performed only with respect to the continuous variables in $F$. Taking advantage of this decoupling, we decompose the problem into a master problem and a subproblem. Let us denote the master problem as P2 and the subproblem as P3. The master problem can be written as follows:
$$(\text{P2}) \quad \min_{O} (J^*) \quad \text{subject to: Equations (17)–(20)} \tag{25}$$
where $J^*$ represents the optimal objective value obtained from subproblem P3 for a given fixed task offloading decision. P3 is as follows:
$$(\text{P3}) \quad J^* = \min_{F} (J) \quad \text{subject to: Equations (21)–(23)} \tag{26}$$
P2 focuses on finding the optimal value of $J^*$ with respect to the task offloading decisions ($O$). P3 aims to minimize the original objective function $J$ with respect to the resource allocation decisions $F$, considering the fixed task offloading decisions provided by the master problem. The constraints for the master problem include Equations (17)–(20), whereas subproblem P3 is subject to the remaining constraints, Equations (21)–(23).

6. Algorithm

This section presents the dual network architecture-based task offloading algorithm and the Karush–Kuhn–Tucker-based resource allocation algorithm. The proposed framework is shown in Figure 3. Specifically, DNA-TO minimizes the objective, and the best task-offloading decision is obtained. The reward of each DNA-TO solution is calculated based on the resource allocation solution from KKT-RA. The remainder of this section describes DNA-TO and KKT-RA in detail.

6.1. DNA-Based Task Offloading

Proximal policy optimization with a dual network architecture [19] is a novel approach to the task offloading problem. The DNA addresses a key challenge in reinforcement learning: the presence of higher noise levels in policy learning compared to value learning. By separating these tasks into distinct networks, the DNA potentially reduces overall noise in the system. In this section, we propose the DNA-TO algorithm to solve problem P2.

6.1.1. Markov Decision Process Formulation

In reinforcement learning, a model-free Markov decision process (MDP) [32] describes environments where an agent interacts and learns without a pre-defined model of transition probabilities and rewards. By exploring actions and observing the resulting states and corresponding rewards, the agent builds its understanding of the environment. These experiences enable the agent to estimate transition probabilities and rewards. This section proposes a specific model-free MDP formulation for problem P2. Specifically, the state space, the action space, and the reward function are defined, enabling the DNA agent to learn optimal task-offloading strategies. Let $S$, $A$, and $R$ denote the state space, the action space, and the reward function, respectively. Their definitions are as follows.
(a)
State space $S$: At the beginning of each time slot $t$, the state, which includes the environment information, is collected by the UEC. Let $s_t \in S$ be the state in time slot $t$. It is defined as follows:
$$s_t = \{O, P, M, C, D, G\} \tag{27}$$
where $O$, $P$, $M$, $C$, $D$, and $G$ are defined as follows.
  • $O = \{o_i \mid i \in S_t^{task}\}$ represents the set of offloading decisions made for tasks in time slot $t$. The value of $o_i$ indicates whether task $i$ is executed locally ($o_i = 0$) or offloaded and processed by the UEC ($o_i = 1$).
  • $P = \{p_i \mid i \in S_{t-1}^{task}\}$ is the set of offloading decisions for tasks in the previous time slot $(t-1)$.
  • $M = \{m_i \mid i \in S^{UE}\}$ represents the set of relationships between tasks in time slot $t$ and the tasks in time slot $(t-1)$ for all UEs. Specifically, $m_i$ captures the relationship between the tasks of UE $i$. A value of $m_i = 1$ indicates that the current task of UE $i$ is the subsequent task in the processing pipeline of its previous task; otherwise, $m_i = 0$.
  • $C = \{c_i \mid i \in S_t^{task}\}$ represents the set of computation costs for tasks in the current time slot; the values are normalized to fall between 0 and 1.
  • $D = \{d_i \mid i \in S_t^{task}\}$ is the set of input data sizes for all tasks in the current time slot. The values are also normalized to fall between 0 and 1.
  • $G = \{(x_i, y_i) \mid i \in S^{UE}\}$ is the set of location coordinates for all UEs in the system; $(x_i, y_i)$ is the location of UE $i$. Both coordinates are normalized to fall between 0 and 1.
(b)
Action space $A$: This defines the set of all possible actions the agent can take in a single time slot. Let $a_t \in \{1, 2, \ldots, 2 \times |S_t^{task}|\}$ be the action in time slot $t$, where $|S_t^{task}|$ is the number of tasks in time slot $t$. Each action is represented by a single integer value within the range $[1, 2 \times |S_t^{task}|]$. The value of $a_t$ determines whether a task is offloaded or executed locally, as decoded in the sketch following this list.
  • Offloading: If $a_t \le |S_t^{task}|$, the task with index $a_t$ in $S_t^{task}$ is chosen for offloading to the UEC.
  • Local execution: If $a_t > |S_t^{task}|$, the task with index $(a_t - |S_t^{task}|)$ in $S_t^{task}$ is executed locally on the device.
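A minimal sketch of this decoding (illustrative code, not the authors’ implementation):

```python
def decode_action(a_t, tasks):
    """Map an integer action a_t in [1, 2*|S_t_task|] to a (task, offload) pair.
    a_t <= |S_t_task| selects a task for offloading; larger values select local execution."""
    n = len(tasks)
    assert 1 <= a_t <= 2 * n
    if a_t <= n:
        return tasks[a_t - 1], True    # offload task a_t to the UEC
    return tasks[a_t - n - 1], False   # execute task (a_t - n) locally
```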
(c)
Immediate reward function $R$: To minimize the objective, $R$ is designed to correlate negatively with the objective value, such that actions leading to lower objective values receive higher rewards. Additionally, to satisfy constraints (19) and (20) in P2, two penalty terms are considered. For constraint (19), let $v_i$ be a binary variable indicating whether the remaining energy of UE $i$ violates constraint (19). It is defined as follows:
$$v_i = \begin{cases} 1, & \text{if UE } i \text{ violates constraint (19)} \\ 0, & \text{otherwise} \end{cases} \tag{28}$$
We define $\omega_1$ as the number of UEs that violate constraint (19). It is calculated as follows:
$$\omega_1 = \sum_{i \in S^{UE}} v_i \tag{29}$$
For constraint (20), $\omega_2$ is a binary variable indicating whether constraint (20) is violated:
$$\omega_2 = \begin{cases} 1, & \text{if constraint (20) is violated} \\ 0, & \text{otherwise} \end{cases} \tag{30}$$
As penalty terms, $\omega_1$ and $\omega_2$ are used to discourage the agent from taking actions that violate constraints (19) and (20), respectively. Let $s$, $a$, and $s'$ be the current state, the action taken, and the next state, respectively. $J(s')$ denotes the value of the objective function for the next state. The reward $r(s, a, s')$, received by the agent for taking action $a$ in state $s$ and transitioning to state $s'$, is calculated as follows:
$$r(s, a, s') = -\left[\phi \times J(s') + (1 - \phi)\left[\rho \frac{\omega_1}{|S^{UE}|} + (1 - \rho)\omega_2\right]\right] \tag{31}$$
where $\phi$ and $\rho$ are two tunable parameters; $\phi$ balances the contribution of the objective value and the penalties to the overall reward. A higher $\phi$ emphasizes the objective, whereas a lower $\phi$ gives more weight to the penalties, and $\rho$ is used to adjust the relative effect of the two penalty terms. $|S^{UE}|$ denotes the total number of UEs and is used to normalize the value of $\omega_1$.
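A minimal sketch of this reward computation (our own illustrative code; the phi and rho defaults are arbitrary example values):

```python
def reward(j_next, n_violating_ues, n_ues, uec_violated, phi=0.5, rho=0.5):
    """Equation (31): negative weighted sum of the next-state objective J(s')
    and the two constraint-violation penalty terms."""
    omega_1 = n_violating_ues            # number of UEs violating constraint (19)
    omega_2 = 1 if uec_violated else 0   # flag for violating constraint (20)
    penalty = rho * omega_1 / n_ues + (1 - rho) * omega_2
    return -(phi * j_next + (1 - phi) * penalty)
```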

6.1.2. The DNA-TO Algorithm

A DNA allows for independent learning of value and policy, preventing interference. We propose a modified version of the DNA for the task offloading problem (DNA-TO). There are two networks in DNA-TO. The policy network, denoted by $\theta_\pi$, outputs policy $\pi(s)$ and an estimated value, $V_\pi(s)$, while the value network, denoted by $\theta_V$, outputs only a value, $V_V(s)$. To improve control over noise levels for both policy and value learning, the DNA uses two separate return estimations, both using $TD(\lambda)$ [33], as follows:
$$V_{adv}(s) = TD^{(\gamma, \lambda_\pi)}(s) \tag{32}$$
and
$$V_{targ}(s) = TD^{(\gamma, \lambda_V)}(s) \tag{33}$$
where $\gamma \in [0, 1]$ is a discount factor, while $\lambda_V$ and $\lambda_\pi$ are hyperparameters that determine the variance–bias trade-off in each estimate. $V_{targ}(s)$ is the target for training the value function, while the other value estimate, $V_{adv}(s)$, is used to compute advantage estimates for policy updates as follows:
$$\hat{A}(s) = V_{adv}(s) - V_V(s) \tag{34}$$
The DNA training process is divided into three distinct phases: policy, value, and distillation. Each phase focuses on optimizing a single objective using a unique optimizer and a hyperparameter set for a specific number of epochs.
  • Policy phase: The policy phase optimizes the $\theta_\pi$ network. It uses the clipped surrogate objective from proximal policy optimization (PPO) [34] with an entropy bonus, and the policy loss is defined as
$$L^{CLIP}(\theta_\pi) = \hat{\mathbb{E}}\left[\min\left(\varphi(\theta_\pi)\hat{A}(s),\ \text{clip}(\varphi(\theta_\pi), 1 - \epsilon, 1 + \epsilon)\hat{A}(s)\right)\right] + c_{eb} \cdot S[\pi(s)] \tag{35}$$
where $\varphi(\theta_\pi)$ is the probability ratio between the current and old policies, and $\epsilon$ is the clipping coefficient. $S[\pi(s)]$ and $c_{eb}$ are the entropy of the policy and the entropy bonus coefficient, respectively.
  • Value phase: The value phase optimizes the $\theta_V$ network. It uses the squared-error value loss. The loss function is defined as follows:
$$L^{VF}(\theta_V) = \hat{\mathbb{E}}\left[\left(V_V(s) - V_{targ}(s)\right)^2\right] \tag{36}$$
  • Distillation phase: In this phase, knowledge from value network $\theta_V$ is transferred to the policy network, $\theta_\pi$, through a constrained distillation update [35]. This is achieved using the mean squared error while softly constraining the network’s policy. The distillation loss function is
$$L^{D}(\theta_\pi) = \hat{\mathbb{E}}\left[\left(V_\pi(s) - V_V(s)\right)^2\right] + \delta \cdot \hat{\mathbb{E}}\left[\text{KL}\left(\pi_{old}(\cdot \mid s), \pi(\cdot \mid s)\right)\right] \tag{37}$$
where $\delta$ is the policy constraint coefficient, and $\text{KL}(\pi_{old}(\cdot \mid s), \pi(\cdot \mid s))$ is the Kullback–Leibler (KL) divergence between the old and new policy distributions.
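As a rough sketch of how these three losses might be computed with PyTorch (our own illustrative code under assumed tensor shapes, not the authors’ implementation; in the DNA each loss is optimized in its own phase with a separate optimizer):

```python
import torch
import torch.nn.functional as F

def dna_losses(logits, old_logits, v_policy_head, v_value_net,
               actions, adv, v_targ, eps=0.2, c_eb=1e-3, delta=1.0):
    """Equations (35)-(37) for a batch of transitions.
    logits/old_logits: [batch, n_actions]; the remaining tensors: [batch]."""
    dist = torch.distributions.Categorical(logits=logits)
    old_dist = torch.distributions.Categorical(logits=old_logits.detach())

    # Policy phase, Eq. (35): PPO clipped surrogate plus entropy bonus (negated as a loss).
    ratio = torch.exp(dist.log_prob(actions) - old_dist.log_prob(actions))
    surrogate = torch.min(ratio * adv, torch.clamp(ratio, 1 - eps, 1 + eps) * adv)
    policy_loss = -(surrogate.mean() + c_eb * dist.entropy().mean())

    # Value phase, Eq. (36): squared error against the TD(lambda_V) target.
    value_loss = F.mse_loss(v_value_net, v_targ)

    # Distillation phase, Eq. (37): match the value network while staying near the old policy.
    distill_loss = (F.mse_loss(v_policy_head, v_value_net.detach())
                    + delta * torch.distributions.kl_divergence(old_dist, dist).mean())
    return policy_loss, value_loss, distill_loss
```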

6.2. Karush–Kuhn–Tucker-Based Resource Allocation

In this section, resource allocation problem P3 is solved using the Karush–Kuhn–Tucker conditions [36]. As shown in constraint (21), if task $l$ is locally executed, $f_l = 0$. Therefore, only tasks that are offloaded and executed at the UEC must be assigned computation resources. Let $U_t^{task}$ be the set of tasks that are offloaded and executed at the UEC in time slot $t$. Then, resource allocation problem P3 can be reformulated as follows:
$$(\text{P4}) \quad \min_{F} \left(\frac{\alpha}{w^T} \sum_{l \in U_t^{task}} \frac{c(l)}{f_l}\right) \tag{38}$$
subject to:
$$f_l > 0, \quad \forall l \in U_t^{task} \tag{39}$$
$$\sum_{l \in U_t^{task}} f_l \le F^{UEC} \tag{40}$$
Constraints (39) and (40) ensure that each task in $U_t^{task}$ receives computation resources, while also ensuring that the total allocated resources do not exceed the UEC’s computational capacity. The constraints in Equations (39) and (40) are convex. Let us denote the objective function of P4 as $\Gamma(F)$. Its second-order derivatives with respect to $f_l$ are calculated as follows:
$$\frac{\partial^2 \Gamma(F)}{\partial f_l^2} = \frac{\alpha}{w^T} \times \frac{2c(l)}{f_l^3} > 0, \quad \forall l \in U_t^{task} \tag{41}$$
$$\frac{\partial^2 \Gamma(F)}{\partial f_l \partial f_{l'}} = 0, \quad \forall l, l' \in U_t^{task} \text{ and } l \ne l' \tag{42}$$
Equations (41) and (42) demonstrate that the Hessian matrix of $\Gamma(F)$ is positive-definite. Combined with the convexity of the constraints in Equations (39) and (40), this indicates that P4 is a convex optimization problem, making it suitable for optimization using the KKT conditions. Let $f_l^*$ represent the optimal resource allocation solution and let $\Gamma(F^*)$ be the corresponding optimal objective function value. By applying the KKT conditions, we obtain $f_l^*$ and $\Gamma(F^*)$ as follows:
$$f_l^* = \frac{F^{UEC} \sqrt{c(l)}}{\sum_{l' \in U_t^{task}} \sqrt{c(l')}} \tag{43}$$
$$\Gamma(F^*) = \frac{\alpha}{w^T} \times \frac{1}{F^{UEC}} \left(\sum_{l \in U_t^{task}} \sqrt{c(l)}\right)^2 \tag{44}$$
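A direct implementation of the closed-form allocation in Equation (43), as reconstructed above, looks as follows (illustrative sketch):

```python
import math

def kkt_resource_allocation(cycles, f_uec):
    """Equation (43): split the UEC capacity F_UEC across offloaded tasks in
    proportion to the square root of each task's computation cost c(l)."""
    denom = sum(math.sqrt(c) for c in cycles.values())
    return {l: f_uec * math.sqrt(c) / denom for l, c in cycles.items()}

# Example: three offloaded tasks on a 50 GHz UEC; sqrt costs are in ratio 2:3:4,
# so the tasks receive roughly 11.1, 16.7, and 22.2 GHz, respectively.
alloc = kkt_resource_allocation({"tracker": 4e9, "mapper": 9e9, "recognizer": 16e9}, 50e9)
```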

6.3. Algorithm Complexity

This section analyzes the complexity of the proposed algorithms for real-time decision-making in task offloading and resource allocation. In DNA-TO, a trained policy network with a fixed architecture drives task offloading decisions at each time slot $t$. Initially, a greedy algorithm with time complexity $O(|S_t^{task}|)$ provides an initial task offloading solution. Then, for $N$ iterations, the state is input to the policy network to obtain task offloading decisions, with time complexity $O(N)$. The time complexity of the DNA-TO algorithm is therefore $O(|S_t^{task}| + N)$. The KKT-RA algorithm, responsible for resource allocation, has a time complexity of $O(|S_t^{task}|)$. The combined time complexity of the proposed approach is $O(|S_t^{task}| + N)$.
For memory requirements, the primary requirement of the DNA-TO algorithm comes from loading the policy network into memory. In this work, the policy network has a feedforward architecture consisting of $L$ hidden layers, each containing $N$ neurons. Analyzing this architecture, let $M_\theta$ be the memory requirement (in bytes) for loading the network. It can be estimated as follows:
$$M_\theta = 4\left[|S_t^{task}|(8N + 1) + N^2(L - 1) + N(L + 1) + 1\right] \tag{45}$$
Let $M_{RA}$ denote the memory requirement (in bytes) for KKT-RA. It can be estimated as
$$M_{RA} = 28|S_t^{task}| + 4 \tag{46}$$
The total memory requirement of the proposed approach can be estimated as $(M_\theta + M_{RA})$ bytes. Overall, the proposed approach shows relatively low complexity, making it suitable for real-time scenarios.
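These estimates translate directly into code; a small sketch (assuming 4-byte float32 parameters, per Equations (45) and (46)):

```python
def policy_net_memory_bytes(n_tasks, n_neurons, n_layers):
    """Equation (45): bytes needed to load the feedforward policy network
    (L hidden layers of N neurons each, 4 bytes per parameter)."""
    return 4 * (n_tasks * (8 * n_neurons + 1)
                + n_neurons**2 * (n_layers - 1)
                + n_neurons * (n_layers + 1) + 1)

def kkt_ra_memory_bytes(n_tasks):
    """Equation (46): memory requirement of the KKT-RA algorithm."""
    return 28 * n_tasks + 4

# Example: 20 tasks, 64 neurons, 3 hidden layers -> a few tens of kilobytes in total.
total = policy_net_memory_bytes(20, 64, 3) + kkt_ra_memory_bytes(20)
```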

7. Evaluation Results

This section evaluates the performance of the proposed algorithm through simulations conducted with various parameter configurations. The results obtained are then compared to other methods.
A UAV-based network is used for the simulation scenario. As shown in Figure 4, a block in the simulation area comprises 169 units. Each unit is a hexagon with a side length of 5 m. A UEC is responsible for communication and task offloading within this block. The UEC is positioned at the center of its block, with its height randomly set between 70 and 100 m. This ensures communication with all UEs in the block at a reasonable speed.
The Geolife GPS trajectory dataset was selected to simulate UE movements due to its inclusion of both human location and time information within a suitably sized collection area, which is crucial for accurately simulating AR applications in our scenario. The dataset contains the movement trajectories of 182 individuals in Beijing, China, collected over five years. An area with a high density of data points within the dataset was selected and designated as the simulation area. The Geolife GPS dataset has sparse data for each day, so we combined one month of data (April 2009) to represent a single day in our simulation. Specifically, the original date information was disregarded, and the trajectories of a person on different days were treated as those of different people in the combined dataset. We extracted the latitude and longitude of UEs from this processed data, setting the altitude to zero. Finally, we randomly selected data from 20 people for the single-UEC scenario and 60 people for the multiple-UEC scenario.
The path loss model parameters ($a$, $b$, $\eta_{LoS}$, $\eta_{NLoS}$) were set to 9.61, 0.16, 1.0, and 20, respectively, following the setup for urban environments in [31]. These values characterize the signal attenuation expected in an urban setting, influencing the received signal strength and the reliability of the communication links between UEs and the UEC. The carrier frequency, $f$, is set to 2 GHz. This is a popular frequency range for various wireless communication technologies, is widely used in various studies [16,37,38], and can support the high data rates required for AR applications. The noise power, $\sigma$, from UEs to the UEC is set to −100 dBm. This represents a relatively low level of background noise, which is a reasonable assumption for UAV-based AR networks operating in open environments. The bandwidth, $B$, of each UE is set to 20 MHz. This bandwidth allocation aligns with common practices in modern mobile networks and enables sufficient data rates for AR content transmission. The transmission power of UEs is set to 20 dBm, a realistic value for mobile devices that is commonly used in various studies [16,39,40]. The tunable parameter, $\alpha$, is set to 0.5, meaning the impacts of execution latency and energy consumption are equally important in the objective function. The values of $\epsilon$, $c_{eb}$, and $\delta$ were set to 0.2, $10^{-3}$, and 1, respectively, as in [19]. The available processing capacity of the UEC was set to 50 GHz (Intel Core i9-13900K), while UE capacities were randomly assigned within the range of 1.8 GHz to 2 GHz (ARM Cortex-A57). Following [13,41], the thermal design power of each processor was used to estimate energy consumption. Specifically, the energy consumption per CPU cycle was set to $1.1 \times 10^{-9}$ J/cycle for UEs ($\mu_i^{UE}$) and $2.259 \times 10^{-9}$ J/cycle for the UEC ($\mu_{UEC}$). Additional experimental settings are detailed in Table 2. In the table, the data sizes for tasks and the required CPU cycles were set based on the input sizes and the required CPU cycles of AR applications [11,42,43,44].
The simulations were run on a computer equipped with an Intel Core i9-10900X CPU @3.70 GHz with an NVIDIA Quadro RTX 4000 GPU and 64 GB of RAM.
To assess how well the proposed algorithms performed, we compared the simulation results against three baseline methods.
  • Greedy algorithm: This method selects the processing device with the lowest weighted sum of energy consumption and execution latency for each task. The KKT-RA algorithm, the same algorithm used in our proposal, was used to calculate these metrics and determine the final resource allocation solution.
  • Genetic algorithm (GA): A modified genetic algorithm [45] determined the task offloading decision, while the KKT-RA algorithm was used to compute the objective function value and the final resource allocation solution.
  • Soft actor critic (SAC): This approach is similar to DNA-TO. However, instead of using the DNA for task offloading, a soft actor critic [46] determines the task offloading decision. Like other methods, KKT-RA was used to calculate the reward for each state and find the final resource allocation solution.
The performance of the proposed algorithm and the baseline algorithms was evaluated in two scenarios: a UAV-based network with a single UEC and a UAV-based network with multiple UECs (i.e., three UECs). The key performance metrics (i.e., average task execution latency and total energy consumption) were collected over 600 time slots (each time slot is one second) and used to compare the different methods.

7.1. Simulation Results

7.1.1. Results with a Single UEC

This section focuses on evaluating the proposed algorithm in a scenario with a single UEC, as shown in Figure 4. The impact of varying the number of UEs, the task sizes, and the required CPU cycles is presented and analyzed.
(a)
The effect from different numbers of UEs
Figure 5a shows the impact on average task execution latency from varying the number of UEs. As the number of UEs increased from 5 to 20, a general trend of increasing latency was observed across the algorithms. The greedy algorithm exhibited the highest latency. DNA-TO demonstrated the best overall performance, particularly in scenarios with a larger number of UEs (e.g., 20 UEs). Notably, when the number of UEs was relatively low (e.g., 5, 10, and 15 UEs), the performance gap between DNA-TO, the GA, and the SAC was small. However, this gap widened significantly at 20 UEs, indicating that DNA-TO is able to handle more complex environments, while the GA and the SAC only performed comparably well in simpler scenarios.
The relationship between the number of UEs and the total energy consumption is presented in Figure 5b. In general, an upward trend in energy consumption was observed for all algorithms as the number of UEs increased. The greedy algorithm and the GA exhibited similar energy consumption, both higher than the other methods. DNA-TO also obtained the best performance with a larger number of UEs. While the performance gap between DNA-TO and the SAC algorithm was minor with fewer UEs, it became more pronounced at 20 UEs, further demonstrating DNA-TO’s ability to efficiently manage task offloading decisions in complex scenarios with higher user densities.
(b)
The effect of different task data sizes
In this section, results from various task data sizes are presented for 20 UEs. Table 3 shows the different data size setups, which increased from setup 1 to setup 4 for each task.
The average execution latency with different task data sizes is shown in Figure 6a. Latency increased as data size increased, but the rate of increase varied among the algorithms, with the GA and the greedy algorithm exhibiting a significantly steeper incline compared to DNA-TO and the SAC. DNA-TO had lower latency, highlighting its efficiency in handling tasks with varying complexities. The gap between DNA-TO and other methods widened as data size increased, and DNA-TO was able to maintain low latency even with larger data sizes.
Figure 6b shows total energy consumption across different task data sizes. The trend shows a general decrease in total energy consumption as the data size increased for all algorithms. Observe that the GA and the greedy algorithm exhibited the highest energy consumption, whereas DNA-TO consistently showed the lowest. The energy difference between DNA-TO and the other algorithms became more significant with larger data sizes, highlighting its efficiency in energy consumption, especially for larger data sizes.
(c)
The effect of different CPU cycle requirements
In this section, results were collected for 20 UEs while varying the required CPU cycles. Table 4 presents the different requirements; the required CPU cycles increase from setup 1 to setup 4 for each task.
Figure 7a illustrates the relationship between average task execution latency and the required CPU cycles. As the required CPU cycles increased, there was a general upward trend in latency. The greedy algorithm showed the most significant increase in latency, followed by the GA, the SAC, and DNA-TO. DNA-TO obtained the lowest latency for all CPU cycle setups.
Total energy consumption is displayed in Figure 7b. In general, energy consumption increased as the required number of CPU cycles increased. DNA-TO maintained the lowest energy consumption across all CPU cycle requirements. This shows DNA-TO can optimize energy usage more effectively than the other methods.

7.1.2. Results with Multiple UECs

As shown in Figure 8, this section evaluates the proposed algorithm in a scenario with three UECs. Each UEC was responsible for 20 UEs in each block. The effects from varying the task sizes and the required CPU cycles are investigated and analyzed. For the different task sizes, the configurations detailed in Table 3 were employed, while varying the CPU cycle requirements used configurations specified in Table 4.
(a)
The effect from different task data sizes
Figure 9a displays the average task execution latency for different data size setups. It shows a trend similar to the single UEC scenario. When the data size increased, latency increased, indicating that larger data sizes lead to longer processing times. DNA-TO outperformed the other algorithms, maintaining the lowest average latency with larger data sizes (e.g., setups 3 and 4). This demonstrates that DNA-TO schedules tasks effectively, even with large data sizes.
Figure 9b presents the total energy consumption of the four algorithms (DNA-TO, SAC, GA, and greedy) across the various data size setups. Overall, total energy consumption decreased as data size increased. The lowest energy consumption was obtained by DNA-TO, while the GA achieved the highest value. This indicates DNA-TO can effectively minimize energy consumption while maintaining performance.
(b)
The effect of different CPU cycle requirements
The relationship between average task execution latency and different required CPU cycles is presented in Figure 10a. As the required CPU cycles increased, there was a clear upward trend in latency for all algorithms. DNA-TO obtained the shortest latency, while the SAC and the GA had similar latency, and the greedy algorithm had the longest. This demonstrates DNA-TO’s ability to maintain shorter latency than the others, even under high computational loads.
Figure 10b presents the total energy consumption for the four algorithms with different CPU cycle requirements. DNA-TO had the lowest energy consumption, while the SAC consumed slightly more energy than DNA-TO. Both DNA-TO and the SAC algorithm remained significantly more efficient than the GA and the greedy algorithm.
The proposed system operates independently on each UEC, handling the UEs within its designated block. This decentralized approach ensures that increasing the number of UECs does not affect the algorithm’s performance on any individual UEC. Consequently, the proposed system can readily scale to accommodate networks with a larger number of UECs. This scalability is demonstrated in the simulation results, where the performance in terms of latency remains consistent between scenarios with one UEC and three UECs. The observed increase in total energy consumption is expected due to the corresponding rise in the number of UECs, UEs, and processed tasks.

8. Conclusions

This paper presents a novel system model and algorithms for task offloading and resource allocation in UAV-based augmented reality networks. By equipping UAVs with edge computing servers, we overcame the limitations of traditional mobile devices and cloud-based solutions for AR applications. Our model explicitly addressed the challenges of dependent tasks in AR. An optimization problem was formulated to minimize the overall task completion time and energy consumption, subject to constraints on resource availability. To resolve these issues, we proposed the DNA-TO algorithm, leveraging the advantages of a dual network architecture to reduce noise and enhance decision-making in the reinforcement learning process. Additionally, the KKT-RA algorithm efficiently allocated computation resources to offloaded tasks. To evaluate the effectiveness of our proposed approach, we conducted various simulations using real movement data from the Geolife GPS trajectory dataset. The results show that our algorithm outperformed existing methods (e.g., the SAC algorithm, the GA, and the greedy algorithm) in terms of both latency and energy consumption. However, there are several potential challenges to implementing the proposed method in real-world scenarios. Factors such as adverse weather conditions, varying obstacle distributions, and limited UEC battery capacity can significantly impact the feasibility and quality of the AR experience. In future work, we plan to analyze the effects of these challenges and extend this work to study solutions, such as strategies for UEC charging and replacement, cooperative operation strategies between UECs, and using UECs with diverse capabilities to improve the AR experience. Additionally, we intend to test our proposed framework in real-world AR applications to validate its performance under realistic conditions and identify further areas for improvement.

Author Contributions

Conceptualization, D.V.A.D., S.A., and S.Y.; Formal analysis, D.V.A.D., S.A., and S.Y.; Funding acquisition, S.Y.; Investigation, D.V.A.D., S.A., and S.Y.; Methodology, D.V.A.D., S.A., and S.Y.; Project administration, S.Y.; Supervision, S.Y.; Validation, D.V.A.D., S.A., and S.Y.; Writing—original draft, D.V.A.D., S.A., and S.Y.; Writing—review and editing, D.V.A.D., S.A., and S.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the 2024 Research Fund of University of Ulsan.

Data Availability Statement

Dataset available on request from the authors.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Mendoza-Ramírez, C.E.; Tudon-Martinez, J.C.; Félix-Herrán, L.C.; Lozoya-Santos, J.d.J.; Vargas-Martínez, A. Augmented reality: Survey. Appl. Sci. 2023, 13, 10491. [Google Scholar] [CrossRef]
  2. Boos, U.C.; Reichenbacher, T.; Kiefer, P.; Sailer, C. An augmented reality study for public participation in urban planning. J. Locat. Based Serv. 2023, 17, 48–77. [Google Scholar] [CrossRef]
  3. Mao, C.C.; Chen, C.H. Augmented reality of 3D content application in common operational picture training system for army. Int. J. Hum.–Comput. Interact. 2021, 37, 1899–1915. [Google Scholar] [CrossRef]
  4. Al-Ansi, A.M.; Jaboob, M.; Garad, A.; Al-Ansi, A. Analyzing augmented reality (AR) and virtual reality (VR) recent development in education. Soc. Sci. Humanit. Open 2023, 8, 100532. [Google Scholar] [CrossRef]
  5. Ashwini, K.B.; Savitha, R.; Ananya, H. Application of Augmented Reality Technology for Home Healthcare Product Visualization. ECS Trans. 2022, 107, 10921. [Google Scholar] [CrossRef]
  6. Davis, L.; Aslam, U. Analyzing consumer expectations and experiences of Augmented Reality (AR) apps in the fashion retail sector. J. Retail. Consum. Serv. 2024, 76, 103577. [Google Scholar]
  7. Villagran-Vizcarra, D.C.; Luviano-Cruz, D.; Pérez-Domínguez, L.A.; Méndez-González, L.C.; Garcia-Luna, F. Applications analyses, challenges and development of augmented reality in education, industry, marketing, medicine, and entertainment. Appl. Sci. 2023, 13, 2766. [Google Scholar] [CrossRef]
  8. Huynh, L.N.; Lee, Y.; Balan, R.K. Deepmon: Mobile gpu-based deep learning framework for continuous vision applications. In Proceedings of the 15th Annual International Conference on Mobile Systems, Applications, and Services, Niagara Falls, NY, USA, 19–23 June 2017; pp. 82–95. [Google Scholar]
  9. Liu, Q.; Huang, S.; Opadere, J.; Han, T. An edge network orchestrator for mobile augmented reality. In Proceedings of the IEEE INFOCOM 2018-IEEE Conference on Computer Communications, Honolulu, HI, USA, 16–19 April 2018; pp. 756–764. [Google Scholar]
  10. Dustdar, S.; Avasalcai, C.; Murturi, I. Invited Paper: Edge and Fog Computing: Vision and Research Challenges. In Proceedings of the 2019 IEEE International Conference on Service-Oriented System Engineering (SOSE), San Francisco, CA, USA, 4–9 April 2019; pp. 96–9609. [Google Scholar] [CrossRef]
  11. Wang, C.; Zhang, S.; Qian, Z.; Xiao, M.; Wu, J.; Ye, B.; Lu, S. Joint server assignment and resource management for edge-based MAR system. IEEE/ACM Trans. Netw. 2020, 28, 2378–2391. [Google Scholar] [CrossRef]
  12. Chen, X.; Liu, G. Joint optimization of task offloading and resource allocation via deep reinforcement learning for augmented reality in mobile edge network. In Proceedings of the 2020 IEEE International Conference on Edge Computing (EDGE), Beijing, China, 19–23 October 2020; pp. 76–82. [Google Scholar]
  13. Akter, S.; Duong, D.V.A.; Kim, D.Y.; Yoon, S. Task Offloading and Resource Allocation in UAV-Aided Emergency Response Operations via Soft Actor Critic. IEEE Access 2024, 12, 69258–69275. [Google Scholar] [CrossRef]
  14. Sun, G.; He, L.; Sun, Z.; Wu, Q.; Liang, S.; Li, J.; Niyato, D.; Leung, V.C.M. Joint Task Offloading and Resource Allocation in Aerial-Terrestrial UAV Networks with Edge and Fog Computing for Post-Disaster Rescue. IEEE Trans. Mob. Comput. 2024, 23, 8582–8600. [Google Scholar] [CrossRef]
  15. Morshed Alam, M.; Moh, S. Joint Optimization of Trajectory Control, Task Offloading, and Resource Allocation in Air-Ground Integrated Networks. IEEE Internet Things J. 2024, 11, 24273–24288. [Google Scholar] [CrossRef]
  16. Wu, G.; Liu, Z.; Fan, M.; Wu, K. Joint Task Offloading and Resource Allocation in Multi-UAV Multi-Server Systems: An Attention-based Deep Reinforcement Learning Approach. IEEE Trans. Veh. Technol. 2024, 73, 11964–11978. [Google Scholar] [CrossRef]
  17. An, X.; Fan, R.; Hu, H.; Zhang, N.; Atapattu, S.; Tsiftsis, T.A. Joint Task Offloading and Resource Allocation for IoT Edge Computing with Sequential Task Dependency. IEEE Internet Things J. 2022, 9, 16546–16561. [Google Scholar] [CrossRef]
  18. Yan, J.; Bi, S.; Zhang, Y.J.; Tao, M. Optimal Task Offloading and Resource Allocation in Mobile-Edge Computing with Inter-User Task Dependency. IEEE Trans. Wirel. Commun. 2020, 19, 235–250. [Google Scholar] [CrossRef]
  19. Aitchison, M.; Sweetser, P. DNA: Proximal policy optimization with a dual network architecture. Adv. Neural Inf. Process. Syst. 2022, 35, 35921–35932. [Google Scholar]
  20. Zheng, Y.; Fu, H.; Xie, X.; Ma, W.Y.; Li, Q. Geolife GPS Trajectory Dataset—User Guide, Geolife GPS trajectories 1.1 ed.; Microsoft: Redmond, WA, USA, 2011. [Google Scholar]
  21. Fan, R.; Liang, B.; Zuo, S.; Hu, H.; Jiang, H.; Zhang, N. Robust Task Offloading and Resource Allocation in Mobile Edge Computing with Uncertain Distribution of Computation Burden. IEEE Trans. Commun. 2023, 71, 4283–4299. [Google Scholar] [CrossRef]
  22. He, Y.; Zhai, D.; Zhang, R.; Du, J.; Aujla, G.S.; Cao, H. A Mobile Edge Computing Framework for Task Offloading and Resource Allocation in UAV-assisted VANETs. In Proceedings of the IEEE INFOCOM 2021-IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), Vancouver, BC, Canada, 10–13 May 2021; pp. 1–6. [Google Scholar] [CrossRef]
  23. Xu, D.; Xu, D. Cooperative task offloading and resource allocation for UAV-enabled mobile edge computing systems. Comput. Netw. 2023, 223, 109574. [Google Scholar] [CrossRef]
  24. Consul, P.; Budhiraja, I.; Garg, D.; Kumar, N.; Singh, R.; Almogren, A.S. A Hybrid Task Offloading and Resource Allocation Approach for Digital Twin-Empowered UAV-Assisted MEC Network Using Federated Reinforcement Learning for Future Wireless Network. IEEE Trans. Consum. Electron. 2024, 70, 3120–3130. [Google Scholar] [CrossRef]
  25. Zhang, P.; Su, Y.; Li, B.; Liu, L.; Wang, C.; Zhang, W.; Tan, L. Deep Reinforcement Learning Based Computation Offloading in UAV-Assisted Edge Computing. Drones 2023, 7, 213. [Google Scholar] [CrossRef]
  26. Elgendy, I.A.; Meshoul, S.; Hammad, M. Joint task offloading, resource allocation, and load-balancing optimization in multi-UAV-aided MEC systems. Appl. Sci. 2023, 13, 2625. [Google Scholar] [CrossRef]
  27. Seid, A.M.; Boateng, G.O.; Mareri, B.; Sun, G.; Jiang, W. Multi-Agent DRL for Task Offloading and Resource Allocation in Multi-UAV Enabled IoT Edge Network. IEEE Trans. Netw. Serv. Manag. 2021, 18, 4531–4547. [Google Scholar] [CrossRef]
  28. Hao, J. Machine Learning for Road Active Safety in Vehicular Networks. Ph.D. Thesis, Institut Polytechnique de Paris, Palaiseau, France, 2024. [Google Scholar]
  29. Lin, N.; Tang, H.; Zhao, L.; Wan, S.; Hawbani, A.; Guizani, M. A PDDQNLP algorithm for energy efficient computation offloading in UAV-assisted MEC. IEEE Trans. Wirel. Commun. 2023, 22, 8876–8890. [Google Scholar] [CrossRef]
  30. Yan, J.; Zhao, X.; Li, Z. Deep Reinforcement Learning Based Computation Offloading in UAV-Assisted Vehicular Edge Computing Networks. IEEE Internet Things J. 2024, 11, 19882–19897. [Google Scholar] [CrossRef]
  31. Bor-Yaliniz, R.I.; El-Keyi, A.; Yanikomeroglu, H. Efficient 3-D placement of an aerial base station in next generation cellular networks. In Proceedings of the 2016 IEEE International Conference on Communications (ICC), Kuala Lumpur, Malaysia, 22–27 May 2016; pp. 1–5. [Google Scholar] [CrossRef]
  32. Shakya, A.K.; Pillai, G.; Chakrabarty, S. Reinforcement learning algorithms: A brief survey. Expert Syst. Appl. 2023, 231, 120495. [Google Scholar] [CrossRef]
  33. Sutton, R.S. Learning to predict by the methods of temporal differences. Mach. Learn. 1988, 3, 9–44. [Google Scholar] [CrossRef]
  34. Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; Klimov, O. Proximal policy optimization algorithms. arXiv 2017, arXiv:1707.06347. [Google Scholar]
  35. Hinton, G.; Vinyals, O.; Dean, J. Distilling the knowledge in a neural network. arXiv 2015, arXiv:1503.02531. [Google Scholar]
  36. Gordon, G.; Tibshirani, R. Karush-Kuhn-Tucker conditions. Optimization 2012, 10, 725. [Google Scholar]
  37. Al-Hourani, A.; Kandeepan, S.; Lardner, S. Optimal LAP Altitude for Maximum Coverage. IEEE Wirel. Commun. Lett. 2014, 3, 569–572. [Google Scholar] [CrossRef]
  38. Sadia, R.; Akter, S.; Yoon, S. Ellipsoidal Trajectory Optimization for Minimizing Latency and Data Transmission Energy in UAV-Assisted MEC Using Deep Reinforcement Learning. Appl. Sci. 2023, 13, 12136. [Google Scholar] [CrossRef]
  39. Tian, J.; Wang, D.; Zhang, H.; Wu, D. Service Satisfaction-Oriented Task Offloading and UAV Scheduling in UAV-Enabled MEC Networks. IEEE Trans. Wirel. Commun. 2023, 22, 8949–8964. [Google Scholar] [CrossRef]
  40. Zheng, J.; Cai, Y.; Wu, Y.; Shen, X. Dynamic Computation Offloading for Mobile Cloud Computing: A Stochastic Game-Theoretic Approach. IEEE Trans. Mob. Comput. 2019, 18, 771–786. [Google Scholar] [CrossRef]
  41. Akter, S.; Kim, D.Y.; Yoon, S. Task offloading in multi-access edge computing enabled UAV-aided emergency response operations. IEEE Access 2023, 11, 23167–23188. [Google Scholar] [CrossRef]
  42. Romli, R.; Razali, A.F.; Ghazali, N.H.; Hanin, N.A.; Ibrahim, S.Z. Mobile augmented reality (AR) marker-based for indoor library navigation. IOP Conf. Ser. Mater. Sci. Eng. 2020, 767, 012062. [Google Scholar] [CrossRef]
  43. Gherghina, A.; Olteanu, A.C.; Tapus, N. A marker-based augmented reality system for mobile devices. In Proceedings of the 2013 11th RoEduNet International Conference, Sinaia, Romania, 17–19 January 2013; pp. 1–6. [Google Scholar] [CrossRef]
  44. Siriwardhana, Y.; Porambage, P.; Liyanage, M.; Ylianttila, M. A Survey on Mobile Augmented Reality with 5G Mobile Edge Computing: Architectures, Applications, and Technical Aspects. IEEE Commun. Surv. Tutorials 2021, 23, 1160–1192. [Google Scholar] [CrossRef]
  45. Alhijawi, B.; Awajan, A. Genetic algorithms: Theory, genetic operators, solutions, and applications. Evol. Intell. 2024, 17, 1245–1256. [Google Scholar] [CrossRef]
  46. Haarnoja, T.; Zhou, A.; Hartikainen, K.; Tucker, G.; Ha, S.; Tan, J.; Kumar, V.; Zhu, H.; Gupta, A.; Abbeel, P.; et al. Soft actor-critic algorithms and applications. arXiv 2018, arXiv:1812.05905. [Google Scholar]
Figure 1. The system model for augmented reality applications.
Figure 2. Processing an AR application.
Figure 3. The framework of the proposed scheme.
Figure 4. The scenario with a single UEC.
Figure 5. The results from varying the number of UEs. (a) Average task execution latency. (b) Total energy consumption.
Figure 6. The results from the various task data sizes. (a) Average task execution latency. (b) Total energy consumption.
Figure 7. The results from the various CPU cycle requirements. (a) Average task execution latency. (b) Total energy consumption.
Figure 8. The scenario with three UECs.
Figure 9. The results from the various task data sizes. (a) Average task execution latency. (b) Total energy consumption.
Figure 10. The results from the various CPU cycle requirements. (a) Average task execution latency. (b) Total energy consumption.
Table 1. Parameters in the system model.

For the UEC:
E^UEC: the remaining energy of the UEC
F^UEC: the available computation resource on the UEC
f_l: the computation resource assigned to task l by the UEC
μ^UEC: the energy consumed per CPU cycle of the UEC
p_UEC^R: the power consumption per time unit of the UEC when receiving data
h: the height of the UEC

For the UEs:
S^UE: the set of UEs
E_i^UE: the remaining energy of UE i
p_i^T: the transmission power of UE i
r_i: the uplink transmission rate from UE i to the UEC
F_i^UE: the computation resource of UE i
μ_i^UE: the energy consumed per CPU cycle of UE i

Other parameters:
S^task_type: the set of task types
S_t^task: the set of tasks in time slot t
n(l): the type of task l
c(l): the computation cost of task l
d(l): the input data size of task l
a, b: constants that depend on the propagation environment in Equation (1)
c: the speed of light
B: the sub-channel bandwidth
t^Loc(l): the completion time of task l when executed locally
e^Loc(l): the energy consumed by UE i when task l is processed locally
t_i^T_Input(l): the time delay for transmitting the input of task l from UE i to the UEC
e_i^T_Input(l): the energy consumed by UE i when uploading task l to the UEC
e_UEC^R_Input(l): the energy consumed by the UEC when receiving the input data of task l
t_UEC^Exe(l): the execution time of task l when executed by the UEC
e_UEC^Exe(l): the energy consumed when executing task l at the UEC
t_UEC(l): the completion time of task l when offloaded to the UEC
e_UEC(l): the total energy consumption when offloading task l to the UEC
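The cost terms at the bottom of Table 1 combine in the way that is standard for MEC offloading models. As a minimal sketch (the paper's exact expressions are in its system model section, which is not reproduced here; the formulas and the numeric values below are the usual assumptions, with μ values chosen only for illustration):

```python
def local_cost(c_l, F_ue, mu_ue):
    """t_Loc(l) and e_Loc(l): task l executed on the UE itself."""
    t_loc = c_l / F_ue      # cycles / (cycles per second)
    e_loc = mu_ue * c_l     # joules per cycle * cycles
    return t_loc, e_loc

def offload_cost(c_l, d_l, r_i, f_l, p_tx, p_rx, mu_uec):
    """t_UEC(l) and e_UEC(l): task l uploaded to and executed on the UEC."""
    t_up = d_l / r_i        # t_i^T_Input(l): bits / (bits per second)
    t_exe = c_l / f_l       # t_UEC^Exe(l)
    e_up = p_tx * t_up      # e_i^T_Input(l), spent by the UE
    e_rx = p_rx * t_up      # e_UEC^R_Input(l), spent by the UEC
    e_exe = mu_uec * c_l    # e_UEC^Exe(l)
    return t_up + t_exe, e_up + e_rx + e_exe

# Illustrative numbers: a Mapper task (0.6e9 cycles, 8 MB = 6.4e7 bits)
# on a 2 GHz UE versus offloading at 40 Mbit/s with f_l = 10 GHz;
# p_tx = 0.1 W corresponds to Table 2's 20 dBm.
print(local_cost(0.6e9, 2e9, mu_ue=1e-9))
print(offload_cost(0.6e9, 6.4e7, 4e7, 1e10, p_tx=0.1, p_rx=0.2, mu_uec=1e-9))
```

Task dependency adds precedence on top of these per-task costs: a task cannot start before the outputs of its predecessors are available, which is what couples the offloading decisions across the AR pipeline.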
Table 2. Experimental settings.

Number of UECs: {1, 3}
Number of UEs: 20
Bandwidth B (in megahertz): 20
Data size of task (in megabytes): Video source [0.5, 8]; Tracker [0.5, 8]; Mapper [0.5, 16]; Object Recognizer [0.5, 8]; Renderer [0.5, 8]
Required CPU cycles to complete task: Video source [0.1 × 10^9, 0.5 × 10^9]; Tracker [0.1 × 10^9, 0.5 × 10^9]; Mapper [0.1 × 10^9, 1 × 10^9]; Object Recognizer [0.1 × 10^9, 0.5 × 10^9]; Renderer [0.1 × 10^9, 0.5 × 10^9]
Energy capacity of UECs (in megajoules): 6
Energy capacity of UEs (in kilojoules): [10, 20]
Computation resource of UECs (in gigahertz): 50
Computation resource of UEs (in gigahertz): [1.8, 2]
The carrier frequency f (in gigahertz): 2
The transmission power of UEs (in decibel-milliwatts): 20
Path loss parameter a: 9.61
Path loss parameter b: 0.16
Path loss parameter μ_LoS: 1.0
Path loss parameter μ_NLoS: 20
Tunable parameter α: 0.5
The clipping coefficient ϵ: 0.2
The entropy bonus coefficient c_eb: 10^−3
The policy constraint coefficient δ: 1
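Table 2's path loss parameters a, b, μ_LoS, and μ_NLoS correspond to the probabilistic air-to-ground channel model of [37], in which the likelihood of a line-of-sight link grows with the elevation angle between the UE and the UEC. A minimal sketch of how such parameters typically enter the mean path loss (the variable names and the example geometry are ours):

```python
import math

C = 3e8  # speed of light, m/s

def air_to_ground_loss_db(h, r, f_hz, a=9.61, b=0.16,
                          eta_los=1.0, eta_nlos=20.0):
    """Mean path loss (dB) for a UEC at height h serving a UE at
    horizontal distance r, following the A2G model of [37]."""
    d = math.hypot(h, r)                        # slant distance
    theta = math.degrees(math.atan2(h, r))      # elevation angle (deg)
    p_los = 1.0 / (1.0 + a * math.exp(-b * (theta - a)))
    fspl = 20.0 * math.log10(4.0 * math.pi * f_hz * d / C)
    return fspl + eta_los * p_los + eta_nlos * (1.0 - p_los)

# Example with Table 2's settings: f = 2 GHz, UEC 100 m above ground.
print(air_to_ground_loss_db(h=100.0, r=200.0, f_hz=2e9))  # ~94 dB
```

The resulting loss feeds the uplink rate r_i through the usual Shannon capacity over the sub-channel bandwidth B, which is where the height h of the UEC enters the latency and energy costs.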
Table 3. Different task data size setups (in megabytes).

Task | Setup 1 | Setup 2 | Setup 3 | Setup 4
Video source | [0.5, 2] | [2, 4] | [4, 6] | [6, 8]
Tracker | [0.5, 2] | [2, 4] | [4, 6] | [6, 8]
Mapper | [0.5, 4] | [2, 8] | [4, 12] | [6, 16]
Object Recognizer | [0.5, 2] | [2, 4] | [4, 6] | [6, 8]
Renderer | [0.5, 2] | [2, 4] | [4, 6] | [6, 8]
Table 4. CPU cycle requirements (in ×10^9 cycles).

Task | Setup 1 | Setup 2 | Setup 3 | Setup 4
Video source | [0.1, 0.2] | [0.2, 0.3] | [0.3, 0.4] | [0.4, 0.5]
Tracker | [0.1, 0.2] | [0.2, 0.3] | [0.3, 0.4] | [0.4, 0.5]
Mapper | [0.1, 0.4] | [0.2, 0.6] | [0.3, 0.8] | [0.4, 1.0]
Object Recognizer | [0.1, 0.2] | [0.2, 0.3] | [0.3, 0.4] | [0.4, 0.5]
Renderer | [0.1, 0.2] | [0.2, 0.3] | [0.3, 0.4] | [0.4, 0.5]
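For readers reproducing experiments of this kind, the ranges in Tables 3 and 4 translate directly into a sampler for AR application instances. The sketch below draws d(l) and c(l) for the five task types under Setup 2; the dictionary layout and helper function are ours, and we assume 1 MB = 8 × 10^6 bits:

```python
import random

# Per-setup ranges from Tables 3 and 4 (Setup 2 shown: MB and CPU cycles).
SETUP2 = {
    "video_source":      {"mb": (2, 4), "cycles": (0.2e9, 0.3e9)},
    "tracker":           {"mb": (2, 4), "cycles": (0.2e9, 0.3e9)},
    "mapper":            {"mb": (2, 8), "cycles": (0.2e9, 0.6e9)},
    "object_recognizer": {"mb": (2, 4), "cycles": (0.2e9, 0.3e9)},
    "renderer":          {"mb": (2, 4), "cycles": (0.2e9, 0.3e9)},
}

def sample_ar_app(setup):
    """Draw input size d(l) and computation cost c(l) for each AR task."""
    return {
        task: {
            "d_bits": random.uniform(*rng["mb"]) * 8e6,   # MB -> bits
            "c_cycles": random.uniform(*rng["cycles"]),
        }
        for task, rng in setup.items()
    }

print(sample_ar_app(SETUP2))
```

The Mapper's wider ranges mirror the tables, which is why it dominates the latency results as data sizes and CPU requirements grow across the setups.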
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
