Article

Task Offloading Strategy for UAV-Assisted Mobile Edge Computing with Covert Transmission

by Zhijuan Hu 1,*,†, Dongsheng Zhou 1,†, Chao Shen 1,*, Tingting Wang 2 and Liqiang Liu 1

1 School of Computer Science and Engineering, Xi’an Technological University, Xi’an 710021, China
2 School of Telecommunications Engineering, Xidian University, Xi’an 710071, China
* Authors to whom correspondence should be addressed.
† These authors contributed equally to this work.
Electronics 2025, 14(3), 446; https://doi.org/10.3390/electronics14030446
Submission received: 19 December 2024 / Revised: 14 January 2025 / Accepted: 15 January 2025 / Published: 23 January 2025
(This article belongs to the Special Issue Research in Secure IoT-Edge-Cloud Computing Continuum)

Abstract: Task offloading strategies for unmanned aerial vehicle (UAV)-assisted mobile edge computing (MEC) systems have emerged as a promising solution for computationally intensive applications. However, the broadcast and open nature of radio transmissions makes such systems vulnerable to eavesdropping threats. Therefore, developing strategies that can perform task offloading in a secure communication environment is critical for both ensuring the security and optimizing the performance of MEC systems. In this paper, we first design an architecture that utilizes covert communication techniques so that a UAV-assisted MEC system can securely offload highly confidential tasks from the relevant user equipment (UE) and perform the corresponding computations. Then, using the Markov Decision Process (MDP) as a framework and incorporating the Prioritized Experience Replay (PER) mechanism into the Deep Deterministic Policy Gradient (DDPG) algorithm, a PER-DDPG algorithm is proposed that minimizes the maximum processing delay of the system and the correct detection rate of the warden by jointly optimizing resource allocation, the movement of the UAV base station (UAV-BS), and the transmit power of the jammer. Simulation results demonstrate the convergence and effectiveness of the proposed approach. Compared to baseline algorithms such as Deep Q-Network (DQN) and DDPG, the PER-DDPG algorithm achieves significant performance improvements, with an average reward increase of over 16% compared to DDPG and over 53% compared to DQN. Furthermore, PER-DDPG exhibits the fastest convergence speed among the three algorithms, highlighting its efficiency in optimizing task offloading and communication security.

1. Introduction

With the advancement of 5G technology, CPU-bound applications on user equipment (UE) are increasingly prevalent, creating significant demand for computing resources and generating high levels of energy consumption. While Mobile Cloud Computing (MCC) can address this issue to some extent, the cloud server is generally physically far away from the UE, which causes high transmission latency and adversely affects user experience [1]. In this context, Mobile Edge Computing (MEC) has emerged as a promising technology that relocates cloud computation assets and facilities closer to the UE and provides real-time processing to speed up data analysis and decision-making [2].
In recent years, owing to the advantages of reduced production cost, high deployment flexibility, and a wide range of applications, research on unmanned aerial vehicle (UAV)-assisted MEC has attracted extensive attention, especially in resource allocation and path planning [3]. Li et al. [4] constructed a Dinkelbach-based algorithm that aims to minimize UAV energy expenditure and maximize energy efficiency by optimizing computational offloading. Diao et al. [5] formulated an optimization algorithm that jointly optimizes task data distribution and UAV trajectories to conserve energy. Lin et al. [6] proposed a PDDQNLP algorithm to tackle the challenges of UAV trajectory design and terminal device decisions, optimizing the offloading decisions and time allocation. Selim et al. [7] also suggested incorporating Device-to-Device (D2D) communication as a supplementary means to facilitate both computation offloading and communication within the UAV-MEC system. However, in most research on UAV-assisted MEC systems, information is primarily conveyed via radio transmissions. Given the inherently broadcast and accessible characteristics of radio communications, UAV-assisted MEC faces a significant risk of eavesdropping [8,9].
Traditionally, cryptographic techniques can be used to share keys between nodes to enable secure communications [10], but the associated computational load and time increase, which runs counter to the MEC goal of reducing latency [11]. In addition, physical layer security techniques can also enhance UAV communication confidentiality. They maximize the disparity between legitimate and eavesdropping channels through resource allocation and noise generation, making it difficult for eavesdroppers to decode information accurately [12,13]. Yet, these methods alone cannot fully protect against eavesdropping. In certain critical scenarios, concealing the transmission itself is essential.
As a high-level security requirement, covert communication not only prevents the message content from being deciphered by adversaries, but also ensures that the existence of wireless transmission remains undetected by keeping the detection probability of the warden below a set threshold [14]. Chen et al. [15] demonstrated that a maximum of $\mathcal{O}(\sqrt{n})$ bits can be securely and covertly transmitted from Alice to Bob over AWGN channels in n channel uses. Following this square-root scaling law, Yan et al. [16] investigated the optimality of Gaussian signaling for covert communications under Kullback–Leibler (KL) divergence constraints, demonstrating that Gaussian signaling maximizes mutual information when constrained by an upper bound on $D(p_1 \| p_0)$. Building on these principles, it was found that despite the high detectability caused by dominant LoS links, the flexible 3-D mobility of UAV base stations (UAV-BSs) can improve system performance [17]. Consequently, studies on covert communications involving UAVs have been conducted in the past few years [18,19,20,21,22]. In [18], UAV positioning and transmit power were jointly optimized to maximize communication capacity within a covertness-constrained three-node model. Chen et al. [19] proposed an optimal design of the transmit power and IRS reflection matrix in an IRS-assisted system to maximize detection at a legitimate receiver while ensuring low detection at a warden. Moreover, Mao et al. [20] presented a framework for energy-efficient UAV-assisted short-packet transmission systems that addresses security against cooperative detection and eavesdropping by maximizing the Minimum Covert Secrecy Energy Efficiency (MCSEE) through a dual-loop iterative optimization algorithm. In the studies [21,22], a scenario involving covert transmissions from a UAV-BS to multiple users was examined, and the average covert performance was enhanced by co-optimizing the user association, the transmit power, and the UAV trajectory. In addition, artificial noise is an effective strategy for impairing the detection capabilities of wardens while complementing the flexibility of UAV-BSs. By fine-tuning the transmit power of a cooperative jammer, it is possible to introduce uncertainty into the power received by the warden while keeping the negative effects on legitimate users to a minimum, thereby enhancing covert performance. In [23], Arzykulov et al. proposed a virtual partitioning of reconfigurable intelligent surface (RIS) elements to improve physical layer security by enhancing the signal for legitimate users and increasing artificial noise (AN) for illegitimate users. A UAV with two antennas, one for receiving covert data and the other for transmitting jamming signals, was studied by Zhou et al. [24]. Du et al. [25] formulated the problem of achieving covert communication and high data rates as a Nash bargaining game, proposed two algorithms, and demonstrated their effectiveness. However, the existing research remains incomplete. For example, covert transmission from a single UAV-BS to multiple end devices has received insufficient exploration. Additionally, few research efforts have addressed task offloading in covert communication scenarios, and deep reinforcement learning has rarely been employed to tackle these challenges.
It is worth noting that recent studies have further advanced the field of covert communication. For instance, Wang et al. [26] explored the trade-offs among reliability, covertness, latency, and transmission rate in wireless short-packet communications, proposing optimization frameworks to maximize throughput under proactive warden scenarios. Meanwhile, Ji et al. [27] proposed an OOK scheme with phase deflection for covert communication, leveraging Willie’s phase uncertainty to enhance transmission rates while ensuring covertness. Although these works provide valuable insights into the design and optimization of covert communication systems, their research focus is distinct from the scope of this study, which centers on the integration of covert communication into UAV-assisted MEC systems.
In this research, we consider an MEC system assisted by UAVs and covert communications, covering both air and ground network layers, in which a fixed-position warden and slow-moving UEs are located on the ground, while the UAVs in the air play the roles of the jammer and the UAV-BS, respectively. The UEs offload excessively processing-demanding tasks to the UAV-BS for computation, and to prevent data transmission activities from being detected by the warden, the jammer transmits interference signals toward the warden. We divide the system period equally into time slots and assume orthogonal frequency division multiple access between the UAV-BS and the UEs. By collaboratively optimizing the movement of the UAV-BS, the resource allocation of the system, and the transmit power of the jammer, the system delay and the detection accuracy of the warden are minimized, which ensures the flexibility, real-time performance, and security of the system. Unlike previous studies, our work addresses the joint optimization of task offloading, resource allocation, and covert communication in UAV-assisted MEC systems, leveraging deep reinforcement learning to dynamically adapt to the changing environment. Specifically, we tackle the challenge of task offloading from UEs to a UAV-BS under covert communication constraints, a scenario that has been largely overlooked in existing research. This approach not only enhances the security and efficiency of MEC systems but also opens new avenues for research in UAV-assisted covert communication. The main contributions of this paper are summarized below.
  • We establish a system that employs a UAV-BS and a UAV-Jammer to assist UEs with demanding computational tasks while remaining undetected by a warden. In this system, both the UEs and the UAV-BS can perform calculations while simultaneously transmitting and receiving information. After developing transmission and computation models for the UEs and UAV-BS, as well as formulating binary hypothesis testing to assess the surveillance mechanisms of the warden, we frame the overall task offloading strategy of the system as an optimization problem.
  • The optimization problem is reformulated as a Markov decision process (MDP), and we employ a Deep Deterministic Policy Gradient (DDPG) algorithm to obtain the best policy. To improve the efficiency of experience replay in DDPG and accelerate the learning process, we integrate a prioritized experience replay (PER) mechanism into the DDPG framework and propose a novel PER-DDPG algorithm, which means we switch from the previous uniform sampling method to priority sampling. By adopting this approach, we accelerate the algorithm’s convergence and significantly enhance its performance.
  • To evaluate the performance of our proposed method, extensive simulations are performed. Initially, we verify the impact of various parameters on the convergence of the proposed algorithm. Subsequently, ablation experiments are carried out to confirm the effectiveness of the proposed algorithm. Ultimately, simulation results from our comparisons demonstrate that our proposed solution outperforms existing methods regarding reductions in both system delay and the detection accuracy of the warden.
The remainder of the paper is organized as follows: Section 2 presents the system model and formulates the optimization problem of the secrecy transmission and task offloading. In Section 3, we construct the MDP and propose the PER-DDPG algorithm. Finally, performance simulations and conclusions are provided in Section 4 and Section 5.

2. System Model

A system model of UAV-assisted covert MEC is shown in Figure 1, composed of a UAV-BS, a UAV jammer, a warden, and K UEs. Let $\mathcal{K} = \{1, 2, \ldots, K\}$ represent the set of UEs, and let the system duration be discretized into I equal time slots, denoted as $\mathcal{T} = \{1, 2, \ldots, I\}$. At every time slot $i \in \mathcal{T}$, the UE $k \in \mathcal{K}$ generates a large number of computational tasks while moving at a low speed. Based on environmental factors such as distance, task size, and other relevant parameters, the UAV-BS flies to the vicinity of the selected UE and receives a portion of the task offloaded by the UE. During this time, a warden is stationed on the ground to monitor UEs while the jammer hovers at a fixed location and secures the communication link between the UAV-BS and UEs by emitting interference toward the warden. In the 3D Cartesian coordinate system, let $h_1$ and $h_2$ be the altitudes of the UAV-BS and jammer; then, the locations of the UAV-BS, jammer, warden, and UE k can be denoted by $L_{BS}(i) = (x_{BS}(i), y_{BS}(i), h_1)$, $L_J(i) = (x_J(i), y_J(i), h_2)$, $L_W(i) = (x_W(i), y_W(i), 0)$, and $L_k^{UE}(i) = (x_k^{UE}(i), y_k^{UE}(i), 0)$, respectively.

2.1. Communications Model

For convenience, $\mathcal{U} = \{\text{UAV-BS}, \text{jammer}\}$ denotes the group of aerial nodes consisting of the UAV-BS and jammer, and $\mathcal{V} = \{\text{UEs}, \text{warden}\}$ denotes the group of ground nodes including the UEs and the warden. Assuming that the air-to-ground link in this system is a combination of line-of-sight (LoS) and non-line-of-sight (NLoS) channels [28], in time slot i, the LoS probability between nodes $u \in \mathcal{U}$ and $v \in \mathcal{V}$ can be approximated as follows:

$$p_{u,v}^{LoS}(i) = \frac{1}{1 + a \exp\left(-b\left(\theta_{u,v}(i) - a\right)\right)}, \quad u \in \mathcal{U}, \; v \in \mathcal{V},$$

where $\theta_{u,v}(i) = (180/\pi)\arctan\left(h_u / r_{u,v}(i)\right)$ is the elevation angle from u to v, $r_{u,v}(i) = \sqrt{(x_u - x_v)^2 + (y_u - y_v)^2}$ is the horizontal distance between u and v, $h_u$ is the altitude of node u, and a and b are environmental constants associated with the concentration of buildings, the distribution of building heights, etc. Using $f_c$ to denote the carrier frequency, let $\mu_{LoS}$ and $\mu_{NLoS}$ denote the loss factors for LoS and NLoS links, and c denote the speed of light; the average path loss between u and v can then be calculated by the following:
$$L_{u,v}(i) = \left(p_{u,v}^{LoS}(i)\,\mu_{LoS} + p_{u,v}^{NLoS}(i)\,\mu_{NLoS}\right)\left(\frac{4\pi f_c d_{u,v}(i)}{c}\right)^2,$$

where $d_{u,v}(i) = \sqrt{(x_u - x_v)^2 + (y_u - y_v)^2 + h_u^2}$ is the Euclidean distance between u and v, and $p_{u,v}^{NLoS}(i) = 1 - p_{u,v}^{LoS}(i)$ is the NLoS probability. Then, the reciprocal of the average path loss gives the average channel gain between u and v:

$$G_{u,v}(i) = \frac{1}{L_{u,v}(i)}.$$
To prevent communication interference, the frequency bands allocated by the UAV-BS to each UE are orthogonal, as are those assigned by the warden. The total bandwidth B of the UAV-BS is equally distributed among the K UEs, so that each UE has a communication bandwidth of $b_k = B/K$. Therefore, the wireless transmission rate from UE k to the UAV-BS is

$$r_k(i) = b_k \log_2\left(1 + \frac{P_k G_{u,v}(i)}{\sigma^2}\right),$$

where $\sigma^2$ is the Gaussian white noise power of the communication link, and $P_k$ takes the maximum value of the transmit power of UE k in order to reduce the system delay.
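To make the channel model concrete, the short sketch below evaluates Equations (1)–(4) for a single UE-to-UAV-BS link in Python. The environment constants a and b, the helper name channel_and_rate, and the example coordinates are illustrative assumptions; the loss factors, per-UE bandwidth, transmit power, and noise power follow Table 1, converted to linear units.

```python
import numpy as np

def channel_and_rate(uav_xyh, ue_xy, a=9.61, b=0.16, mu_los=2.0, mu_nlos=200.0,
                     f_c=2e9, bandwidth_hz=1e6 / 4, p_tx=0.1, noise_w=1e-13):
    """Average channel gain and uplink rate for one air-to-ground link, Eqs. (1)-(4).

    uav_xyh: (x, y, h) of the aerial node; ue_xy: (x, y) of the ground node.
    a, b are assumed environment constants; mu_los/mu_nlos are the 3 dB / 23 dB
    loss factors of Table 1 in linear form; noise_w is -100 dBm in watts.
    """
    c = 3e8
    x_u, y_u, h_u = uav_xyh
    x_v, y_v = ue_xy
    r = np.hypot(x_u - x_v, y_u - y_v)                 # horizontal distance r_{u,v}
    d = np.sqrt(r**2 + h_u**2)                         # 3-D distance d_{u,v}
    theta = np.degrees(np.arctan2(h_u, r))             # elevation angle in degrees
    p_los = 1.0 / (1.0 + a * np.exp(-b * (theta - a)))           # Eq. (1)
    p_nlos = 1.0 - p_los
    fspl = (4.0 * np.pi * f_c * d / c) ** 2                      # free-space term
    path_loss = (p_los * mu_los + p_nlos * mu_nlos) * fspl       # Eq. (2)
    gain = 1.0 / path_loss                                       # Eq. (3)
    rate = bandwidth_hz * np.log2(1.0 + p_tx * gain / noise_w)   # Eq. (4), bits/s
    return gain, rate

# Example: UAV-BS at (50, 50, 100) m serving a UE at (20, 80) m.
g, r_k = channel_and_rate((50.0, 50.0, 100.0), (20.0, 80.0))
```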

2.2. Computational Model

At each time slot i, the UE k generates tasks to be computed, denoted by the two-tuple $Task_k = \{D_k(i), s\}$, with $D_k(i)$ representing the size of the computation task and s the number of CPU cycles required per bit. Assuming that $R_k(i) \in [0, 1]$ is the fraction of the task delegated to the UAV-BS, the fraction retained at UE k is $(1 - R_k(i))$. Let $f_{UE}$ represent the computational capability of the UE; consequently, the local processing latency of UE k is

$$t_k^{local}(i) = \frac{\left(1 - R_k(i)\right) D_k(i)\, s}{f_{UE}}.$$
In time slot i, the UAV-BS moves from $L_{BS}(i)$ to the new location $L_{BS}(i+1) = \left(x_{BS}(i) + v(i)\, t_{fly} \cos\beta(i),\; y_{BS}(i) + v(i)\, t_{fly} \sin\beta(i),\; h_1\right)$, with speed $v(i) \in [0, v_{max}]$ and angle $\beta(i) \in [0, 2\pi]$, and the energy used for flying can be represented as follows:

$$E_{fly}(i) = 0.5\, M_{BS}\, t_{fly}\, v(i)^2,$$
where $M_{BS}$ and $t_{fly}$ are the payload mass and fixed flight time of the UAV-BS, respectively. Although UAV-BS hovering also consumes energy, it is negligible compared to the flight energy and can therefore be ignored. Similarly, the downlink transmission delay is disregarded due to the minimal size of the computation results returned by the UAV-BS. In this case, the processing delay caused by the UAV-BS consists of two parts, i.e., the transmission delay $t_k^{tr}(i)$ and the computational delay $t_{BS}^{com}(i)$:
$$t_k^{tr}(i) = \frac{R_k(i) D_k(i)}{r_k(i)},$$

$$t_{BS}^{com}(i) = \frac{R_k(i) D_k(i)\, s}{f_{BS}},$$
where f B S denotes the computing frequency of the UAV-BS CPU. Correspondingly, the energy consumption for offloading tasks is also determined by both communication and computation. Ref. [29] provides a detailed computational model, but since the downlink energy consumption in this paper is negligible, we omit the communication energy consumption from the UAV-BS to the UEs and simplify the computational energy consumption as follows:
$$E_{BS}^{com}(i) = \eta f_{BS}^3\, t_{BS}^{com}(i) = \eta f_{BS}^2\, R_k(i) D_k(i)\, s,$$
where η is the energy consumption factor related to the UAV-BS CPU model.
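As a quick illustration, the per-slot delay and energy terms of Equations (5)–(9) can be evaluated as in the sketch below. The function name, the uplink rate, and the CPU energy factor η are assumed placeholders; the CPU frequencies, cycles per bit, UAV mass, and flight time follow Table 1.

```python
def slot_delay_and_energy(D_bits, R_off, f_ue=0.2e9, f_bs=1.2e9, s=1000,
                          rate_bps=1e6, v=20.0, m_bs=9.65, t_fly=1.0, eta=1e-28):
    """Per-slot delays (Eqs. (5), (7), (8)) and UAV-BS energy (Eqs. (6), (9)).

    D_bits: task size in bits; R_off: offloading ratio R_k(i) in [0, 1].
    rate_bps, v, and eta are illustrative assumptions, not values from the paper.
    """
    t_local = (1.0 - R_off) * D_bits * s / f_ue          # Eq. (5): local computing delay
    t_tr = R_off * D_bits / rate_bps                     # Eq. (7): uplink transmission delay
    t_com = R_off * D_bits * s / f_bs                    # Eq. (8): UAV-BS computing delay
    e_fly = 0.5 * m_bs * t_fly * v**2                    # Eq. (6): flight energy
    e_com = eta * f_bs**2 * R_off * D_bits * s           # Eq. (9): computing energy
    # The per-slot processing delay is the slower of the two parallel branches.
    slot_delay = max(t_local, t_tr + t_com)
    return slot_delay, e_fly + e_com
```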

2.3. Binary Hypothesis Testing at Warden

The warden adopts binary hypothesis testing based on received signal observations to detect UE data transmission in covert communications. Assuming that the warden obtains the precise locations of all UEs and the jammer through radar or UAV position observation [22], the average receive power of the warden in time slot i is
$$T(i) = \begin{cases} P_J(i)\, G_{J,W}(i) + \sigma^2, & H_0, \\ \sum_{k=1}^{K} P_k G_{k,W}(i) + P_J(i)\, G_{J,W}(i) + \sigma^2, & H_1, \end{cases}$$
where P J ( i ) represents the transmit power of the jammer, while H 0 and H 1 denote the null hypothesis (UE is silent) and the alternative hypothesis (UE is transmitting), respectively. Set the detection threshold to τ t h , and use D 0 and D 1 to denote decisions in support of H 0 and H 1 , respectively; then, the false alarm rate φ ( i ) and miss detection rate ψ ( i ) of the warden can be calculated as follows:
$$\varphi(i) = \Pr\left\{D_1 \mid H_0\right\} = \Pr\left\{P_J(i)\, G_{J,W}(i) + \sigma^2 \geq \tau_{th}\right\},$$

$$\psi(i) = \Pr\left\{D_0 \mid H_1\right\} = \Pr\left\{\sum_{k=1}^{K} P_k G_{k,W}(i) + P_J(i)\, G_{J,W}(i) + \sigma^2 < \tau_{th}\right\}.$$
Suppose $P_J(i)$ follows a uniform distribution within $[0, P_J^{max}]$; its probability density function (PDF) is

$$f_{P_J(i)}(x) = \begin{cases} \frac{1}{P_J^{max}}, & 0 \leq x \leq P_J^{max}, \\ 0, & \text{otherwise}. \end{cases}$$
Applying (13) to Equations (11) and (12), we obtain
$$\varphi(i) = \Pr\left\{P_J(i) \geq \frac{\tau_{th} - \sigma^2}{G_{J,W}(i)}\right\} = \int_{\max\left(0,\, \frac{\tau_{th} - \sigma^2}{G_{J,W}(i)}\right)}^{P_J^{max}} f_{P_J(i)}(x)\, dx = \begin{cases} 1, & \tau_{th} \leq \sigma^2, \\ 1 - \frac{\tau_{th} - \sigma^2}{P_J(i)\, G_{J,W}(i)}, & \sigma^2 < \tau_{th} \leq \mu_1, \\ 0, & \mu_1 < \tau_{th}, \end{cases}$$

$$\psi(i) = \Pr\left\{P_J(i) < \frac{\tau_{th} - \sum_{k=1}^{K} P_k G_{k,W}(i) - \sigma^2}{G_{J,W}(i)}\right\} = \int_{0}^{\min\left(P_J^{max},\, \frac{\tau_{th} - \sum_{k=1}^{K} P_k G_{k,W}(i) - \sigma^2}{G_{J,W}(i)}\right)} f_{P_J(i)}(x)\, dx = \begin{cases} 0, & \tau_{th} \leq \mu_2, \\ \frac{\tau_{th} - \sum_{k=1}^{K} P_k G_{k,W}(i) - \sigma^2}{P_J(i)\, G_{J,W}(i)}, & \mu_2 < \tau_{th} \leq \mu_3, \\ 1, & \mu_3 < \tau_{th}, \end{cases}$$

where $\mu_1 \triangleq P_J(i)\, G_{J,W}(i) + \sigma^2$, $\mu_2 \triangleq \sum_{k=1}^{K} P_k G_{k,W}(i) + \sigma^2$, and $\mu_3 \triangleq \sum_{k=1}^{K} P_k G_{k,W}(i) + P_J(i)\, G_{J,W}(i) + \sigma^2$.
The detection performance of the warden can be assessed by the total error rate $\xi(i) = \varphi(i) + \psi(i)$. If $\mu_1 \leq \mu_2$, i.e., $P_J(i)\, G_{J,W}(i) \leq \sum_{k=1}^{K} P_k G_{k,W}(i)$, then by setting $\tau_{th} = \mu_2$ it is possible to achieve a detection error rate of zero at the warden. Consequently, it is imperative to examine the broader scenario where $\mu_2 \leq \mu_1 \leq \mu_3$, for which the detection error rate can be derived as follows:
$$\xi(i) = \begin{cases} 1, & \tau_{th} \leq \sigma^2, \\ 1 - \frac{\tau_{th} - \sigma^2}{P_J(i)\, G_{J,W}(i)}, & \sigma^2 < \tau_{th} \leq \mu_2, \\ 1 - \frac{\sum_{k=1}^{K} P_k G_{k,W}(i)}{P_J(i)\, G_{J,W}(i)}, & \mu_2 < \tau_{th} \leq \mu_1, \\ \frac{\tau_{th} - \sum_{k=1}^{K} P_k G_{k,W}(i) - \sigma^2}{P_J(i)\, G_{J,W}(i)}, & \mu_1 < \tau_{th} \leq \mu_3, \\ 1, & \mu_3 < \tau_{th}. \end{cases}$$
With the goal of minimizing the total detection error rate, the warden sets an appropriate threshold. From (16), it can be seen that in order to ensure $\xi(i) < 1$, we must have $\sigma^2 < \tau_{th} \leq \mu_3$. Moreover, $\xi(i)$ is a continuous function of $\tau_{th}$ that monotonically decreases with $\tau_{th}$ for $\sigma^2 < \tau_{th} \leq \mu_2$ and monotonically increases for $\mu_1 < \tau_{th} \leq \mu_3$. Thus, by setting the threshold $\mu_2 < \tau_{th}^* \leq \mu_1$, the warden can obtain the minimum detection error rate $\xi^*(i)$ as follows:
$$\xi^*(i) = 1 - \frac{\sum_{k=1}^{K} P_k G_{k,W}(i)}{P_J(i)\, G_{J,W}(i)}.$$
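Under this analysis, the minimum detection error rate of Equation (17) reduces to one minus the ratio of the aggregate UE power received by the warden to the jamming power it receives. A minimal sketch, assuming placeholder channel gains and the helper name min_detection_error, is given below; it also checks the covertness requirement $\xi^*(i) \geq 1 - \epsilon$ that appears later as constraint (18e).

```python
def min_detection_error(p_ue, g_ue_w, p_j, g_j_w, epsilon=0.05):
    """Warden's minimum detection error rate xi*(i), Eq. (17).

    p_ue, g_ue_w: lists of UE transmit powers and UE-to-warden channel gains.
    p_j, g_j_w  : jammer transmit power and jammer-to-warden channel gain.
    Valid in the regime where the jamming power dominates (mu_2 <= mu_1).
    """
    signal_at_warden = sum(p * g for p, g in zip(p_ue, g_ue_w))
    jamming_at_warden = p_j * g_j_w
    xi_star = 1.0 - signal_at_warden / jamming_at_warden
    covert = xi_star >= 1.0 - epsilon   # covertness constraint (18e)
    return xi_star, covert
```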

2.4. Problem Formulation

By collaboratively optimizing resource allocation, UAV-BS movement, and jammer transmission power, the processing delay of the system and the correct detection rate of the warden can be minimized, thereby ensuring the effective utilization of limited computing resources. The specific problems are expressed as follows:
$$\min_{\alpha_k(i),\, L(i),\, R_k(i),\, \xi^*(i)} \; \sum_{i=1}^{I} \sum_{k=1}^{K} \left[ \alpha_k(i) \max\left(t_k^{local}(i),\; t_k^{tr}(i) + t_{BS}^{com}(i)\right) + \left(1 - \xi^*(i)\right) \right] \tag{18a}$$

$$\text{s.t.} \quad \alpha_k(i) \in \{0, 1\}, \quad \forall i \in \mathcal{T},\; k \in \mathcal{K}, \tag{18b}$$

$$\sum_{k=1}^{K} \alpha_k(i) = 1, \quad \forall i, \tag{18c}$$

$$0 \leq R_k(i) \leq 1, \quad \forall i, k, \tag{18d}$$

$$\xi^*(i) \geq 1 - \epsilon, \quad \forall i, \tag{18e}$$

$$L(i) \in \mathcal{R}, \quad \forall i, \tag{18f}$$

$$0 \leq v(i) \leq v_{max}, \quad \forall i, \tag{18g}$$

$$\sum_{i=1}^{I} \left(E_{fly}(i) + E_{BS}^{com}(i)\right) \leq E_{BS}, \tag{18h}$$

$$0 \leq P_J(i) \leq P_J^{max}, \quad \forall i, \tag{18i}$$

$$\sum_{i=1}^{I} \sum_{k=1}^{K} \alpha_k(i) D_k(i) = D, \tag{18j}$$
where constraints (18b) and (18c) ensure that the UAV-BS serves only one UE per time slot, and constraint (18d) specifies the range of values for the offloaded portion of the computation task. In constraint (18e), $\epsilon$ is the permitted correct detection rate of the warden and is typically set to a small value; to ensure covert transmission from the UE to the UAV-BS, $\xi^*(i)$ must be no less than $(1 - \epsilon)$. Furthermore, the movement area of the UAV-BS, its flight speed, and the transmit power of the jammer are limited by constraints (18f), (18g), and (18i), respectively. Simultaneously, constraints (18h) and (18j) indicate that the energy consumption of the UAV-BS within the relevant duration must not exceed its battery capacity and that all computational tasks must be accomplished within the specified time frame.
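For clarity, the sketch below tallies the objective (18a) from per-slot records, assuming each record already holds the delays of the served UE and the slot's minimum detection error rate; the record layout and function name are illustrative, not part of the paper.

```python
def total_objective(slot_records):
    """Objective (18a): accumulated worst-case processing delay plus the warden's
    correct detection rate (1 - xi*) over all time slots.

    slot_records: one dict per slot for the UE served in that slot, with keys
    't_local', 't_tr', 't_com', and 'xi_star' (an assumed layout for illustration).
    """
    total = 0.0
    for rec in slot_records:
        delay = max(rec["t_local"], rec["t_tr"] + rec["t_com"])  # parallel local/offloaded branches
        total += delay + (1.0 - rec["xi_star"])                  # covertness penalty term
    return total
```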

3. PER-DDPG-Based Task Offloading Optimization

Problem (18) belongs to the class of non-convex mixed-integer non-linear programs (MINLPs). Because the environment during the offloading decision process is complex and time-varying, the problem must support a continuous action space, resulting in an excessively large state-space dimension that traditional optimization methods struggle to handle. Therefore, we transform it into a Markov Decision Process (MDP) model and then solve it by applying DRL techniques, with the objective of lowering the total system delay and the correct detection rate of the warden through a dynamic decision-making approach. Moreover, the DDPG algorithm is refined for the purpose of cumulative reward maximization.

3.1. Construction of MDP

The MDP is represented by a quintuple ( S , A , P , R , γ ) consisting of the state space S , the action space A , the state transition probability P , the reward function R , and the discount factor γ . Continuous time is discretized into multiple time moments i, with each state s stored in the set of states S and each action a stored in the set of actions A . The variable r denotes the reward at the current moment. By observing the environment and experimenting with different actions, the agent trains on maximizing the reward of the system until it identifies the action that yields the highest reward.

3.1.1. State Space

The state space of our UAV-assisted covert MEC system is co-determined by K UEs, the UAV-BS, and the jammer, denoted as follows:
$$S_i = \left(E_{BS}(i),\, L_{BS}(i),\, L_W(i),\, L_1^{UE}(i), \ldots, L_K^{UE}(i),\, E_J(i),\, D_{remain}(i),\, D_1(i), \ldots, D_K(i)\right),$$

where $E_{BS}(i)$ and $E_J(i)$ represent the remaining battery energy of the UAV-BS and the jammer, respectively, and $D_{remain}(i)$ denotes the amount of computational tasks remaining for the entire period.

3.1.2. Action Space

The agent selects an action based on the detected environment and the system’s current state. The action space includes the selection of the UE k to be served, the determination of the flight speed and angle of the UAV-BS, the task offload rate, and the transmit power of the jammer, expressed as follows:
$$A_i = \left(k(i),\, \beta(i),\, v(i),\, R_k(i),\, P_J(i)\right),$$

where $k(i)$ denotes that the UAV-BS serves UE k at time slot i, $\beta(i)$ and $v(i)$ represent the flight angle and velocity of the UAV-BS, respectively, $R_k(i)$ indicates the task offloading ratio of UE k, and $P_J(i)$ represents the jamming power of the jammer. It is worth noting that $k(i)$ is a discrete variable, whereas the other variables are continuous.
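As an illustration of how such a state and action could be fed to the learning agent, the sketch below flattens Equation (19) into an observation vector and maps a raw actor output onto the action tuple of Equation (20). The field ordering, the [-1, 1] output convention, the recovery of the discrete index k(i) from a continuous component, and the value of P_J^max are implementation assumptions.

```python
import numpy as np

def build_state(e_bs, l_bs, l_w, l_ues, e_j, d_remain, d_tasks):
    """Flatten the state S_i of Eq. (19) into a single observation vector."""
    return np.concatenate([[e_bs], l_bs, l_w, np.ravel(l_ues),
                           [e_j, d_remain], d_tasks]).astype(np.float32)

def split_action(raw, num_ues, v_max=50.0, p_j_max=1.0):
    """Map a raw actor output in [-1, 1]^5 to the action A_i of Eq. (20).

    The served-UE index is recovered from a continuous component (an assumption,
    since k(i) is discrete in the paper); the other components are rescaled to
    their physical ranges. p_j_max is an assumed jamming-power limit.
    """
    k = int(np.clip((raw[0] + 1) / 2 * num_ues, 0, num_ues - 1))  # served UE k(i)
    beta = (raw[1] + 1) * np.pi                                   # flight angle in [0, 2*pi]
    v = (raw[2] + 1) / 2 * v_max                                  # speed in [0, v_max]
    r_off = (raw[3] + 1) / 2                                      # offloading ratio in [0, 1]
    p_j = (raw[4] + 1) / 2 * p_j_max                              # jamming power in [0, P_J^max]
    return k, beta, v, r_off, p_j
```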

3.1.3. Reward Function

Since the behavior of the agent depends on the rewards it receives, selecting an appropriate reward function is essential. As defined in (18a), this paper aims to maximize the reward while keeping the system's processing delay and the warden's detection accuracy at a minimum. Hence, the reward function is formulated as follows:
$$R_i = r(s_i, a_i) = -\sum_{i=1}^{I} \left[\omega_1 \left(t_k^{tr}(i) + t_{BS}^{com}(i)\right) + \omega_2 \left(1 - \xi^*(i)\right)\right],$$

where $\omega_1$ and $\omega_2$ are weighting coefficients, $\left(t_k^{tr}(i) + t_{BS}^{com}(i)\right)$ is the processing delay of the system, and $\left(1 - \xi^*(i)\right)$ is the maximum correct detection rate of the warden.
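A per-slot version of this reward, written as the negative weighted sum of the processing delay and the warden's correct detection rate, might look as follows; the weights ω1 and ω2 are illustrative values, not ones reported in the paper.

```python
def slot_reward(t_tr, t_com, xi_star, w1=1.0, w2=1.0):
    """Per-slot reward following Eq. (21): penalize the processing delay and
    the warden's correct detection rate (1 - xi*).  w1, w2 are assumed weights."""
    return -(w1 * (t_tr + t_com) + w2 * (1.0 - xi_star))
```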
Due to the highly continuous nature of the state and action space in the system, obtaining an accurate state transition probability matrix becomes challenging. Therefore, to explore the best policy in the presence of uncertain state transition probabilities, a DDPG-based algorithm is employed.

3.2. PER-DDPG-Based Solution

The DDPG algorithm is a type of deep reinforcement learning designed for policy spaces in continuous time, based on the actor–critic (AC) framework [30]. DDPG employs two distinct Deep Neural Networks (DNNs): the actor network $\mu(s|\theta^\mu)$, which describes the policy function, and the critic network $Q(s,a|\theta^Q)$, which describes the Q-value function. Furthermore, to enhance learning stability, each network has a corresponding target network with the same structure: the actor target network $\mu'$, parameterized by $\theta^{\mu'}$, and the critic target network $Q'$, with parameters $\theta^{Q'}$.
The critic network $Q(s,a|\theta^Q)$ is trained to minimize the loss function $L(\theta^Q)$, which quantifies the error between the predicted Q-value and the target value $y_i$, ensuring accurate value estimation for policy optimization:

$$L(\theta^Q) = \frac{1}{N} \sum_{i=1}^{N} \left(y_i - Q(s_i, a_i \mid \theta^Q)\right)^2,$$

$$y_i = r_i + \gamma\, Q'\!\left(s_{i+1},\, \mu'(s_{i+1} \mid \theta^{\mu'}) \,\middle|\, \theta^{Q'}\right).$$
The policy gradient $\nabla_{\theta^\mu} J$ is updated using the chain rule [31], which adjusts the actor network parameters $\theta^\mu$ to maximize the expected Q-value:

$$\nabla_{\theta^\mu} J \approx \frac{1}{N} \sum_{i=1}^{N} \nabla_{\theta^\mu} Q(s, a \mid \theta^Q)\Big|_{s = s_i,\, a = \mu(s_i | \theta^\mu)} = \frac{1}{N} \sum_{i=1}^{N} \nabla_a Q(s, a \mid \theta^Q)\Big|_{s = s_i,\, a = \mu(s_i)} \, \nabla_{\theta^\mu} \mu(s \mid \theta^\mu)\Big|_{s = s_i}.$$
The whole training process of DDPG is summarized in Figure 2. Firstly, after the previous training step, the actor network $\mu$ outputs $\mu(s_i)$. Since DDPG is an off-policy algorithm, the set of states can be explored adequately by extending the action space. This is achieved by adding behavioral noise $n_i$ to produce an action $a_i = \mu(s_i) + n_i$, where $n_i$ follows a Gaussian distribution $n_i \sim \mathcal{N}(\mu_e, \sigma_{e,i}^2)$ with mean $\mu_e$ and standard deviation $\sigma_{e,i}$. The agent observes the subsequent state $s_{i+1}$ and the associated reward $r_i$ after executing $a_i$ in the environment, and the transition $(s_i, a_i, r_i, s_{i+1})$ is stored in the experience replay buffer. Secondly, the algorithm randomly chooses N transitions $(s_j, a_j, r_j, s_{j+1})$ from the buffer to construct a minibatch, which is used to update both the critic and actor networks. Using the minibatch, the action $\mu'(s_{j+1})$ is generated by the actor target network $\mu'$ and passed to the critic target network $Q'$, and the target value $y_j$ is computed according to (22b). Thirdly, an optimizer updates the critic network $Q$ to reduce the loss function. Then, the actor network $\mu$ passes the minibatch action $a = \mu(s_j)$ to the critic network to calculate the gradient of the action, $\nabla_a Q(s, a \mid \theta^Q)\big|_{s = s_j,\, a = \mu(s_j)}$, while the gradient $\nabla_{\theta^\mu} \mu(s \mid \theta^\mu)\big|_{s = s_j}$ is computed by its respective optimizer. These two gradients are then used to update the actor network:
$$\nabla_{\theta^\mu} J \approx \frac{1}{N} \sum_{j=1}^{N} \nabla_a Q(s, a \mid \theta^Q)\Big|_{s = s_j,\, a = \mu(s_j | \theta^\mu)} \, \nabla_{\theta^\mu} \mu(s \mid \theta^\mu)\Big|_{s = s_j}.$$
Finally, the actor target network and the critic target network are softly updated by the agent with a small constant $\iota$:

$$\theta^{Q'} \leftarrow \iota\, \theta^{Q} + (1 - \iota)\, \theta^{Q'}, \qquad \theta^{\mu'} \leftarrow \iota\, \theta^{\mu} + (1 - \iota)\, \theta^{\mu'}.$$
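These update rules map directly onto a few lines of PyTorch. The sketch below performs one generic DDPG update step under common assumptions (pre-built actor/critic modules that accept the shapes shown, Adam optimizers, a caller-supplied discount factor); it is not the authors' exact implementation.

```python
import torch
import torch.nn.functional as F

def ddpg_update(batch, actor, critic, actor_t, critic_t,
                actor_opt, critic_opt, gamma, iota=0.005):
    """One DDPG update on a minibatch, following Eqs. (22)-(25).

    batch: tensors (s, a, r, s_next); actor/critic and their target copies are
    assumed to be torch.nn.Module instances; gamma is the discount factor
    (see Section 4.2 for the value tuned in the paper), iota the soft-update rate.
    """
    s, a, r, s_next = batch

    # Critic step: minimize (y_i - Q(s_i, a_i))^2 with targets from the target networks, Eq. (22).
    with torch.no_grad():
        y = r + gamma * critic_t(s_next, actor_t(s_next))
    critic_loss = F.mse_loss(critic(s, a), y)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Actor step: ascend the deterministic policy gradient of Eq. (24)
    # (implemented as gradient descent on the negated Q-value).
    actor_loss = -critic(s, actor(s)).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()

    # Soft target updates with rate iota, Eq. (25).
    for net, net_t in ((critic, critic_t), (actor, actor_t)):
        for p, p_t in zip(net.parameters(), net_t.parameters()):
            p_t.data.mul_(1.0 - iota).add_(iota * p.data)
```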
The PER mechanism improves an agent's learning efficiency by prioritizing important samples with larger TD-errors during the training phase [32]. A greater TD-error suggests that the sample is more significant for the agent's learning. In the training process, however, samples with large TD-errors are seldom encountered. By elevating the likelihood of selecting these samples with high TD-errors, we can enhance the overall sample utilization, thereby refining the efficacy of the learning approach. The TD-error $T_x$ quantifies the difference between the predicted Q-value and the target value, serving as a measure of the sample's importance for learning:

$$T_x \triangleq r_i + \gamma\, Q'\!\left(s_{i+1}, \mu'(s_{i+1}) \,\middle|\, \theta^{Q'}\right) - Q(s_i, a_i \mid \theta^Q).$$
To address the limitations of relying solely on TD-error, including the reduction in diversity and the issue of bias, we introduce two methods: stochastic sampling and importance sampling. Stochastic sampling guarantees that every sample in the experience replay buffer has a chance of being selected, which in turn improves the diversity of the training data. Specifically, the priority $p(x)$ of the sample x determines the sampling probability of each transition, with higher priorities assigned to transitions that have larger TD-errors:

$$p(x) = \frac{p_x^{\alpha}}{\sum_k p_k^{\alpha}},$$

$$p_x = |T_x| + \varepsilon,$$

where $\alpha$ determines the level of prioritization applied, $p_x$ represents the priority of sample x, and $\varepsilon$ is a small positive constant.
The bias problem refers to the tendency of the agent to update predominantly on empirical samples with high temporal-difference (TD) error, which alters the original probability distribution and introduces errors into the model. As a result, this may prevent the model from converging during neural network training. Therefore, to correct the bias introduced by prioritized sampling, importance sampling weights $w_{IS\_x}$ are incorporated:

$$w_{IS\_x} = \left(\frac{1}{N} \cdot \frac{1}{p(x)}\right)^{\varrho},$$

where N denotes the overall number of samples, and the exponent $\varrho$ is typically annealed from its initial value to one over the course of training.
In traditional DDPG algorithms, uniform sampling from the experience replay buffer often results in inefficient use of samples, as all samples are treated equally regardless of their importance to the learning process. This can slow down convergence and reduce sample efficiency. To address this limitation, we propose an improved PER-DDPG algorithm that integrates the DDPG algorithm with the PER mechanism. The PER mechanism assigns higher sampling probabilities to transitions with larger TD-errors, as these transitions are typically more informative for learning. By focusing on the most critical samples, PER-DDPG significantly improves sample efficiency and accelerates convergence [33,34]. The loss function (22a) is adjusted to incorporate the importance sampling weights $w_{IS\_x}$, which balance the prioritization of high TD-error transitions and the stability of the learning process:

$$L(\theta^Q) = \frac{1}{N} \sum_{i=1}^{N} \left[ w_{IS\_x} \left(y_i - Q(s_i, a_i \mid \theta^Q)\right)^2 \right].$$
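A minimal proportional PER buffer implementing the priority, sampling, and importance-weighting rules above is sketched below. The class name and the linear-scan storage are assumptions for illustration; a production implementation would typically use a sum-tree for O(log N) sampling, and here the exponent beta plays the role of ϱ.

```python
import numpy as np

class PrioritizedReplayBuffer:
    """Minimal proportional PER buffer, Eqs. (26)-(28)."""

    def __init__(self, capacity, alpha=0.6, eps=1e-6):
        self.capacity, self.alpha, self.eps = capacity, alpha, eps
        self.data, self.priorities = [], []

    def add(self, transition, td_error=None):
        # New transitions get the current maximum priority so they are replayed at least once.
        p = (abs(td_error) + self.eps) if td_error is not None else \
            (max(self.priorities) if self.priorities else 1.0)
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(p)
        else:                                   # replace the lowest-priority transition
            idx = int(np.argmin(self.priorities))
            self.data[idx], self.priorities[idx] = transition, p

    def sample(self, batch_size, beta=0.4):
        prios = np.asarray(self.priorities) ** self.alpha
        probs = prios / prios.sum()                          # sampling probability, Eq. (27a)
        idx = np.random.choice(len(self.data), batch_size, p=probs)
        weights = (len(self.data) * probs[idx]) ** (-beta)   # IS weights, Eq. (28); beta annealed to 1
        weights /= weights.max()                             # normalize for stability
        return idx, [self.data[i] for i in idx], weights

    def update_priorities(self, idx, td_errors):
        for i, td in zip(idx, td_errors):
            self.priorities[i] = abs(td) + self.eps          # priority update, Eq. (27b)
```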
The training process of the proposed PER-DDPG method is shown in Algorithm 1.
Algorithm 1 PER-DDPG-based task offloading algorithm.
1: Initialize the actor network $\mu(s|\theta^\mu)$ and the critic network $Q(s,a|\theta^Q)$ with parameters $\theta^\mu$ and $\theta^Q$, respectively.
2: Initialize the target network parameters $\theta^{\mu'} = \theta^\mu$ and $\theta^{Q'} = \theta^Q$, respectively.
3: Initialize the experience replay buffer M.
4: for each episode $e = 1, 2, \ldots, E$ do
5:    Reset the system simulation parameters and obtain the first observation state $s_1$.
6:    for $i = 1, 2, \ldots, I$ do
7:       Normalize $s_i$ to $\hat{s}_i$.
8:       Obtain the action $a_i$ using the actor network $\theta^\mu$ with the noise $n_i$.
9:       Execute $a_i$, obtain the reward $r_i$ from Equation (21), and observe the next state $s_{i+1}$.
10:      Normalize $s_{i+1}$ to $\hat{s}_{i+1}$.
11:      if the buffer M is not full then
12:         Save the transition $(\hat{s}_i, a_i, r_i, \hat{s}_{i+1})$ into M and calculate its priority $p_i$ with (27b).
13:      else
14:         Replace the transition with the lowest priority in buffer M with $(\hat{s}_i, a_i, r_i, \hat{s}_{i+1})$, and set $p_i = \max_{j < i} p_j$.
15:         for $j = 1, 2, \ldots, N$ do
16:            Sample the j-th transition from M with probability $p(j)$ using (27a) and (27b).
17:            Update the priority $p_j$ of transition $(\hat{s}_j, a_j, r_j, \hat{s}_{j+1})$.
18:         end for
19:         Update $\theta^Q$ of the critic network by minimizing the loss (29).
20:         Update $\theta^\mu$ of the actor network with the sampled policy gradient (24).
21:         Softly update the critic target network and the actor target network with (25).
22:      end if
23:   end for
24: end for

4. Simulation Results

This section uses numerical simulations to confirm the computational offloading performance of the UAV-assisted MEC system operating under covert communication conditions based on the improved PER-DDPG approach. The experiments were executed on a computer equipped with a 2.20 GHz Intel Core i9-13900HX CPU, an NVIDIA GeForce RTX 4060 graphics card, 16 GB of RAM, and a Windows operating system.

4.1. Parameters Setting

We set the ground area of the system as a two-dimensional square of size $100 \times 100\ \text{m}^2$, in which the UEs and a warden were randomly distributed. In addition, Table 1 lists the other parameters involved in the experiments. These values were incorporated into the algorithm for multiple runs, and performance comparisons were conducted using the obtained average values.
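For reproducibility, the values of Table 1 and the hyperparameters selected in Section 4.2 can be collected into a single configuration object, as in the sketch below; entries marked as assumed (e.g., replay buffer size, batch size) are not specified in the paper.

```python
SIM_CONFIG = {
    # Values from Table 1
    "num_ues": 4,
    "period_s": 320, "num_slots": 40,
    "h_uav_bs_m": 100, "h_jammer_m": 110,
    "uav_mass_kg": 9.65, "v_max_mps": 50, "t_fly_s": 1.0,
    "carrier_hz": 2e9, "mu_los_db": 3, "mu_nlos_db": 23,
    "bandwidth_hz": 1e6, "noise_dbm": -100,
    "epsilon": 0.05, "p_ue_w": 0.1,
    "e_bs_kj": 500, "cycles_per_bit": 1000,
    "f_ue_hz": 0.2e9, "f_bs_hz": 1.2e9,
    # Training hyperparameters reported in Section 4.2
    "lr_actor": 1e-3, "lr_critic": 2e-3, "gamma": 0.01, "sigma_e": 0.1,
    # Assumed values (not specified in the paper)
    "buffer_size": 100_000, "batch_size": 64,
}
```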

4.2. Convergence Analysis

To find the optimal values of the key hyperparameters involved in the algorithm comparison, a series of experiments was first carried out. Figure 3 shows the convergence performance of the PER-DDPG algorithm with various learning rates. With the actor network and critic network assigned different learning rates, it is clear that PER-DDPG attains optimal performance with learning rates of $\alpha_{Actor} = 0.001$ and $\alpha_{Critic} = 0.002$. Conversely, because of their large update steps, the higher learning rates of $\alpha_{Actor} = 0.1$ and $\alpha_{Critic} = 0.2$ lead to convergence to a local optimum. Furthermore, convergence slows when the learning rates decrease to $\alpha_{Actor} = 0.00001$ and $\alpha_{Critic} = 0.00002$, because the DNN updates are more sluggish and require more iterations.
The discount factor γ is another parameter that influences the convergence of the proposed PER-DDPG algorithm. When γ is too large, the model may place excessive emphasis on long-term rewards while neglecting short-term gains, leading to slower convergence and diminished generalization ability. Conversely, if γ is too small, the model may overly focus on short-term rewards, resulting in short-sighted behavior and unstable learning. As illustrated in Figure 4, the PER-DDPG algorithm achieves optimal performance when γ = 0.01 . Therefore, in subsequent experiments, the discount factor γ will be kept at 0.01.
Figure 5 shows the effect of the exploration parameter σ e on the PER-DDPG algorithm. When σ e = 0.001 and σ e = 0.01 , PER-DDPG falls into a local optimum. As σ e increases, the space of the random noise distribution generated and the action space that the agent can explore become larger, so the PER-DDPG algorithm performs best when σ e = 0.1 .
Figure 6 evaluates how the PER-DDPG algorithm performs without the PER mechanism or state normalization. On the one hand, without the PER module, the convergence of the training algorithm is slowed down, which leads to an inability to find the best offloading policy. On the other hand, without state normalization, the large values in the state space produce large deep neural network (DNN) outputs, whereupon the algorithm requires more training iterations to converge and the training process becomes highly unstable.

4.3. Performance Comparison

In order to verify the superiority of the proposed PER-DDPG algorithm, three task offloading algorithms are selected for simulation and comparison under the same application scenarios. The first is the DQN algorithm, which addresses discrete action problems; the second is the DDPG algorithm, which handles a continuous action space; and the third is the PER-DDPG algorithm. Figure 7 illustrates the variation in delay and the correct detection rate of the warden over different algorithms with the number of training sessions, when the computational task size is 100 Mb. As shown in the figure, while all three algorithms are able to converge eventually, DQN cannot accurately determine the best offloading strategy. This limitation arises from DQN’s inability to effectively explore the space between discrete actions. In contrast, the other two algorithms, which are capable of exploring a continuous action space, can more effectively identify the optimal strategy. However, the PER-DDPG algorithm outperforms the DDPG algorithm when it comes to convergence. The addition of the PER mechanism in DDPG enables more frequent replay of both very successful and very unsuccessful experiences, and these experience samples carry higher learning value, resulting in faster convergence and improved performance. After convergence, the average reward obtained by PER-DDPG was −49.32, while the average reward for DDPG was −58.87, and for DQN, it was −106.1. Compared to DDPG, the average reward obtained by PER-DDPG increased by more than 16.22%, and when compared to DQN, the average reward obtained by PER-DDPG also increased by more than 53.52%. Additionally, the convergence speed of PER-DDPG was the fastest among the three algorithms.
The detection probabilities of the warden obtained by the three algorithms are illustrated in Figure 8. After convergence, we observe that the PER-DDPG and DDPG algorithms achieve warden average detection error rates of 98.24% and 96.04%, respectively, both exceeding the 95% threshold. In contrast, the DQN algorithm shows a lower average detection error rate of 93.55%, which does not meet the 95% threshold. Additionally, consistent with Figure 7, PER-DDPG converges the fastest among the three algorithms.
Figure 9 illustrates the changes in various evaluation indicators of the three algorithms as $f_{UE}$ varies. Figure 9a shows the differences in convergence performance among the three algorithms at different values of $f_{UE}$. Previous experiments have demonstrated that the convergence performance of the proposed PER-DDPG algorithm is superior to DDPG and DQN at $f_{UE} = 0.2$ GHz, while all three algorithms gradually degrade. Notably, as $f_{UE}$ increases, the reward values for all three algorithms also rise, although the overall downward trend remains unchanged. This phenomenon can be attributed to the ability of PER-DDPG and DDPG to explore continuous action sets, a capability that DQN lacks. However, it is evident that the performance gap between the three algorithms is narrowing, with DQN showing the largest increase. Figure 9b provides further insight into this trend. As $f_{UE}$ increases, the task offloading rate for all three algorithms decreases. This decline is attributed to the enhanced computing power of the UEs, which leads to a greater tendency to perform task calculations locally. Consequently, this reduces the delay difference between local task execution and offloading tasks to the UAV-BS, ultimately resulting in a decrease in the overall system delay. Among the three algorithms, DQN exhibits the smallest change in offloading rate. This is due to DQN's difficulty in navigating discrete action spaces, preventing it from identifying better offloading solutions. The primary factor influencing DQN's final convergence performance in Figure 9a is the increase in $f_{UE}$, which contributes to an overall increase in reward value, even if DQN's performance is suboptimal. Additionally, Figure 9c shows that the detection error rate of the warden remains relatively stable as $f_{UE}$ increases. This stability is due to the fact that, although the proportion of tasks executed locally by the UE has risen, the UE transmit power has not changed, which does not significantly impact the warden's detection behavior.
Figure 10 and Figure 11 show the performance changes of the three algorithms under different computational task sizes and the number of UEs, respectively. The results show that the proposed PER-DDPG algorithm consistently outperforms DDPG and DQN in different scenarios. As the state space becomes more complex with the increase in tasks or the increase in UEs, DQN has difficulty in exploring and optimizing effectively, resulting in either a significant drop in reward value (for task size increases) or strong instability (for UE number increases). In contrast, PER-DDPG and DDPG, with their continuous action space design and more efficient exploration strategies, can better adapt to the increasing complexity and maintain more stable performance. Both figures highlight a common limitation of DQN: it cannot effectively handle the growth of state dimension. Whether due to the increase of computational task size or the increase of UE number, DQN’s discrete action space and low sample efficiency make it difficult to explore and optimize in high-dimensional environments. Compared to the more stable performance of PER-DDPG and DDPG, DQN’s reward values fluctuate or drop significantly, which fully illustrates this limitation. In addition, the PER mechanism of PER-DDPG further enhances its ability to adapt to different task scales and UE numbers, demonstrating excellent performance and stability.
In summary, the proposed PER-DDPG algorithm outperforms the other two algorithms. Consequently, after integrating optimization of resource allocation, UAV-BS movement, and jammer transmit power, this algorithm achieves a relatively small maximum processing delay for the system and a low correct detection rate by the warden. At the same time, while deploying the proposed system with current UAV and MEC technologies is feasible, it still faces challenges such as limited UAV battery life, computational resource constraints, and the need for robust communication protocols in dynamic environments. Additionally, while covert communication enhances privacy and security by preventing eavesdropping, it also raises ethical concerns regarding potential misuse. Advances in energy-efficient hardware, lightweight algorithms, and adaptive communication strategies, along with strict regulatory frameworks, will be critical for the real-world deployment of such systems.

5. Conclusions

In this research, we investigated a strategy for task offloading in UAV-assisted covert edge computing. Firstly, we established a UAV-assisted MEC task offloading model that adheres to covert communication requirements. An optimization problem was formulated with the objective to minimize system delay and detection accuracy of the warden by jointly optimizing resource allocation, UAV motion, and jammer signal strength. Secondly, we introduced a novel PER-DDPG algorithm, which enables UAVs to efficiently perform task offloading calculations while ensuring communication security. Finally, we perform a set of experiments to validate the feasibility of the proposed algorithm and demonstrate its superior performance in comparison to several benchmark algorithms. For future work, we plan to extend our approach to more complex and dynamic scenarios, such as multi-UAV systems with collaborative task offloading, the incorporation of adversarial learning techniques to address intelligent adversaries, and the integration of advanced MEC frameworks, including federated learning and blockchain technology, to enhance security, privacy, and scalability in UAV-assisted covert edge computing.

Author Contributions

Conceptualization, Z.H. and D.Z.; methodology, Z.H. and D.Z.; software, D.Z.; validation, Z.H., D.Z., and C.S.; formal analysis, Z.H. and D.Z.; investigation, Z.H. and D.Z.; resources, Z.H. and D.Z.; data curation, D.Z. and C.S.; writing—original draft preparation, D.Z.; writing—review and editing, Z.H. and T.W.; visualization, D.Z. and L.L.; supervision, Z.H.; project administration, Z.H. and T.W.; funding acquisition, C.S. and L.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded in part by the National Natural Science Foundation of China under Grant 52302505, the Shaanxi Key Research and Development Program of China under Grant 2023-YBGY-027, the Special scientific research plan project of Shaanxi Provincial department of education Project Grants 23JK0477, and the Shaanxi Provincial Youth Natural Science Foundation Grants 2024JC-YBQN-0660.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the authors.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Angel, N.A.; Ravindran, D.; Vincent, P.M.D.R.; Srinivasan, K.; Hu, Y.C. Recent Advances in Evolving Computing Paradigms: Cloud, Edge, and Fog Technologies. Sensors 2022, 22, 196. [Google Scholar] [CrossRef] [PubMed]
  2. Feng, C.; Han, P.; Zhang, X.; Yang, B.; Liu, Y.; Guo, L. Computation Offloading in Mobile Edge Computing Networks: A Survey. J. Netw. Comput. Appl. 2022, 202, 103366. [Google Scholar] [CrossRef]
  3. Huda, S.A.; Moh, S. Survey on Computation Offloading in UAV-Enabled Mobile Edge Computing. J. Netw. Comput. Appl. 2022, 201, 103341. [Google Scholar] [CrossRef]
  4. Li, M.; Cheng, N.; Gao, J.; Wang, Y.; Zhao, L.; Shen, X. Energy-Efficient UAV-Assisted Mobile Edge Computing: Resource Allocation and Trajectory Optimization. IEEE Trans. Veh. Technol. 2020, 69, 3424–3438. [Google Scholar] [CrossRef]
  5. Diao, X.; Zheng, J.; Cai, Y.; Wu, Y.; Anpalagan, A. Fair Data Allocation and Trajectory Optimization for UAV-Assisted Mobile Edge Computing. IEEE Commun. Lett. 2019, 23, 2357–2361. [Google Scholar] [CrossRef]
  6. Lin, N.; Tang, H.; Zhao, L.; Wan, S.; Hawbani, A.; Guizani, M. A PDDQNLP Algorithm for Energy Efficient Computation Offloading in UAV-Assisted MEC. IEEE Trans. Wirel. Commun. 2023, 22, 8876–8890. [Google Scholar] [CrossRef]
  7. Selim, M.M.; Rihan, M.; Yang, Y.; Ma, J. Optimal Task Partitioning, Bit Allocation and Trajectory for D2D-assisted UAV-MEC Systems. Peer-Peer Netw. Appl. 2021, 14, 215–224. [Google Scholar] [CrossRef]
  8. Ouyang, J.; Pan, Y.; Xu, B.; Lin, M.; Zhu, W.P. Achieving Secrecy Energy Efficiency Fairness in UAV-Enabled Multi-User Communication Systems. IEEE Wirel. Commun. Lett. 2022, 11, 918–922. [Google Scholar] [CrossRef]
  9. Tsao, K.Y.; Girdler, T.; Vassilakis, V.G. A Survey of Cyber Security Threats and Solutions for UAV Communications and Flying Ad-Hoc Networks. Ad Hoc Netw. 2022, 133, 102894. [Google Scholar] [CrossRef]
  10. Yoon, K.; Park, D.; Yim, Y.; Kim, K.; Yang, S.K.; Robinson, M. Security Authentication System Using Encrypted Channel on UAV Network. In Proceedings of the 2017 First IEEE International Conference on Robotic Computing (IRC), Taichung, Taiwan, 10–12 April 2017; pp. 393–398. [Google Scholar] [CrossRef]
  11. Wang, H.M.; Zhang, X.; Jiang, J.C. UAV-Involved Wireless Physical-Layer Secure Communications: Overview and Research Directions. IEEE Wirel. Commun. 2019, 26, 32–39. [Google Scholar] [CrossRef]
  12. Zhang, G.; Wu, Q.; Cui, M.; Zhang, R. Securing UAV Communications via Joint Trajectory and Power Control. IEEE Trans. Wirel. Commun. 2019, 18, 1376–1389. [Google Scholar] [CrossRef]
  13. Hua, M.; Wang, Y.; Wu, Q.; Dai, H.; Huang, Y.; Yang, L. Energy-Efficient Cooperative Secure Transmission in Multi-UAV-Enabled Wireless Networks. IEEE Trans. Veh. Technol. 2019, 68, 7761–7775. [Google Scholar] [CrossRef]
  14. Jiang, X.; Chen, X.; Tang, J.; Zhao, N.; Zhang, X.Y.; Niyato, D.; Wong, K.K. Covert Communication in UAV-Assisted Air-Ground Networks. IEEE Wirel. Commun. 2021, 28, 190–197. [Google Scholar] [CrossRef]
  15. Chen, X.; An, J.; Xiong, Z.; Xing, C.; Zhao, N.; Yu, F.R.; Nallanathan, A. Covert Communications: A Comprehensive Survey. IEEE Commun. Surv. Tutor. 2023, 25, 1173–1198. [Google Scholar] [CrossRef]
  16. Yan, S.; Cong, Y.; Hanly, S.V.; Zhou, X. Gaussian Signalling for Covert Communications. IEEE Trans. Wirel. Commun. 2019, 18, 3542–3553. [Google Scholar] [CrossRef]
  17. Zeng, Y.; Wu, Q.; Zhang, R. Accessing From the Sky: A Tutorial on UAV Communications for 5G and Beyond. Proc. IEEE 2019, 107, 2327–2375. [Google Scholar] [CrossRef]
  18. Yan, S.; Hanly, S.V.; Collings, I.B. Optimal Transmit Power and Flying Location for UAV Covert Wireless Communications. IEEE J. Sel. Areas Commun. 2021, 39, 3321–3333. [Google Scholar] [CrossRef]
  19. Chen, Z.; Yan, S.; Zhou, X.; Shu, F.; Ng, D.W.K. Intelligent Reflecting Surface-Assisted Passive Covert Wireless Detection. IEEE Trans. Veh. Technol. 2024, 73, 2954–2959. [Google Scholar] [CrossRef]
  20. Mao, H.; Liu, Y.; Xiao, Z.; Han, Z.; Xia, X.G. Energy Efficient Defense Against Cooperative Hostile Detection and Eavesdropping Attacks for UAV-Aided Short-Packet Transmissions. IEEE Trans. Veh. Technol. 2024, 1–14. [Google Scholar] [CrossRef]
  21. Jiang, X.; Yang, Z.; Zhao, N.; Chen, Y.; Ding, Z.; Wang, X. Resource Allocation and Trajectory Optimization for UAV-Enabled Multi-User Covert Communications. IEEE Trans. Veh. Technol. 2021, 70, 1989–1994. [Google Scholar] [CrossRef]
  22. Zhou, X.; Yan, S.; Hu, J.; Sun, J.; Li, J.; Shu, F. Joint Optimization of a UAV’s Trajectory and Transmit Power for Covert Communications. IEEE Trans. Signal Process. 2019, 67, 4276–4290. [Google Scholar] [CrossRef]
  23. Arzykulov, S.; Celik, A.; Nauryzbayev, G.; Eltawil, A.M. Artificial Noise and RIS-Aided Physical Layer Security: Optimal RIS Partitioning and Power Control. IEEE Wirel. Commun. Lett. 2023, 12, 992–996. [Google Scholar] [CrossRef]
  24. Zhou, X.; Yan, S.; Shu, F.; Chen, R.; Li, J. UAV-Enabled Covert Wireless Data Collection. IEEE J. Sel. Areas Commun. 2021, 39, 3348–3362. [Google Scholar] [CrossRef]
  25. Du, H.; Niyato, D.; Xie, Y.A.; Cheng, Y.; Kang, J.; Kim, D.I. Performance Analysis and Optimization for Jammer-Aided Multiantenna UAV Covert Communication. IEEE J. Sel. Areas Commun. 2022, 40, 2962–2979. [Google Scholar] [CrossRef]
  26. Wang, M.; Yao, Y.; Xia, B.; Chen, Z.; Wang, J. Covert and Reliable Short-Packet Communications Over Fading Channels Against a Proactive Warder: Analysis and Optimization. IEEE Trans. Wirel. Commun. 2024, 23, 3932–3945. [Google Scholar] [CrossRef]
  27. Ji, X.; Zhu, R.; Zhang, Q.; Li, C.; Cao, D. Enhancing Covert Communication in OOK Schemes by Phase Deflection. IEEE Trans. Inf. Forensics Secur. 2024, 19, 9775–9788. [Google Scholar] [CrossRef]
  28. Al-Hourani, A.; Kandeepan, S.; Lardner, S. Optimal LAP Altitude for Maximum Coverage. IEEE Wirel. Commun. Lett. 2014, 3, 569–572. [Google Scholar] [CrossRef]
  29. Liu, Z.; Li, Z.; Gong, Y.; Wu, Y.C. RIS-Aided Cooperative Mobile Edge Computing: Computation Efficiency Maximization via Joint Uplink and Downlink Resource Allocation. IEEE Trans. Wirel. Commun. 2024, 23, 11535–11550. [Google Scholar] [CrossRef]
  30. Lillicrap, T. Continuous control with deep reinforcement learning. arXiv 2015, arXiv:1509.02971. [Google Scholar]
  31. Silver, D.; Lever, G.; Heess, N.; Degris, T.; Wierstra, D.; Riedmiller, M. Deterministic Policy Gradient Algorithms. In Proceedings of the 31st International Conference on Machine Learning, Beijing, China, 22–24 June 2014; Xing, E.P., Jebara, T., Eds.; Volume 32, Proceedings of Machine Learning Research. pp. 387–395. [Google Scholar]
  32. Zhu, M.; Tian, K.; Wen, Y.Q.; Cao, J.N.; Huang, L. Improved PER-DDPG based nonparametric modeling of ship dynamics with uncertainty. Ocean. Eng. 2023, 286, 115513. [Google Scholar] [CrossRef]
  33. Tang, X.; Zhou, H.; Wang, F.; Wang, W.; Lin, X. Longevity-Conscious Energy Management Strategy of Fuel Cell Hybrid Electric Vehicle Based on Deep Reinforcement Learning. Energy 2022, 238, 121593. [Google Scholar] [CrossRef]
  34. Kong, X.; Lu, W.; Wu, J.; Wang, C.; Zhao, X.; Hu, W.; Shen, Y. Real-Time Pricing Method for VPP Demand Response Based on PER-DDPG Algorithm. Energy 2023, 271, 127036. [Google Scholar] [CrossRef]
Figure 1. Unmanned aerial vehicle (UAV)-assisted mobile edge computing (MEC) scenario.
Figure 2. Deep Deterministic Policy Gradient (DDPG) algorithm structure.
Figure 3. Convergence performance with varying learning rates of the PER-DDPG algorithm.
Figure 4. Convergence result using varying discount factors.
Figure 5. Convergence performance with varying exploration parameters.
Figure 6. Performance without PER mechanism or state normalization.
Figure 7. Performance of various algorithms with task size of D = 100 Mb.
Figure 8. Detection error rate of warden in different algorithms.
Figure 9. Various indicators of three algorithms under different computing capabilities of UEs. (a) Convergence performance. (b) Offloading ratio. (c) Detection error rate of warden.
Figure 10. Performance of various algorithms with different task sizes.
Figure 11. Performance of various algorithms as number of UEs varies from 1 to 10.
Table 1. Main parameters and assumptions.

Parameter | Description | Value
K | Number of UEs | 4
T | Entire time period | 320 s
I | Number of time slots | 40
h_1 | UAV-BS flight height | 100 m
h_2 | Jammer flight height | 110 m
M_UAV | UAV weight | 9.65 kg
v_max | Maximum flight speed | 50 m/s
t_fly | Flight time | 1 s
f_c | Carrier frequency | 2 GHz
μ_LoS | Path loss factor for LoS links | 3 dB
μ_NLoS | Path loss factor for NLoS links | 23 dB
B | Communication bandwidth | 1 MHz
σ² | Variance of AWGN | −100 dBm
ϵ | Allowed correct detection rate | 0.05
P_k | UE transmit power | 0.1 W
E_BS | UAV-BS battery | 500 kJ
s | Required CPU cycles per bit | 1000 cycles/bit
f_UE | UE computing capability | 0.2 GHz
f_UAV | UAV-BS computing capability | 1.2 GHz
