Article

Security-Aware Task Offloading Using Deep Reinforcement Learning in Mobile Edge Computing Systems

1 College of Telecommunications and Information Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
2 School of Internet of Things, Nanjing University of Posts and Telecommunications, Nanjing 210003, China
* Author to whom correspondence should be addressed.
Electronics 2024, 13(15), 2933; https://doi.org/10.3390/electronics13152933
Submission received: 21 June 2024 / Revised: 18 July 2024 / Accepted: 22 July 2024 / Published: 25 July 2024
(This article belongs to the Special Issue Network Security Management in Heterogeneous Networks)

Abstract

With the proliferation of intelligent applications, mobile devices increasingly handle computation-intensive tasks but often struggle with limited computing power and energy resources. Mobile Edge Computing (MEC) offers a solution by enabling these devices to offload computation-intensive tasks to resource-rich edge servers, thus reducing processing latency and energy consumption. However, existing task-offloading strategies often neglect critical security concerns. In this paper, we propose a security-aware task-offloading framework that utilizes Deep Reinforcement Learning (DRL) to address these challenges. Our framework is designed to minimize task completion latency and energy consumption while ensuring data security. We define the system utility as a weighted sum of latency and energy consumption, formulate the offloading problem as a Markov Decision Process (MDP), and design a Proximal Policy Optimization (PPO)-based algorithm to derive near-optimal offloading strategies. Experimental results demonstrate that the proposed algorithm outperforms traditional methods in terms of task execution latency and energy consumption.

1. Introduction

The rapid advancement of Artificial Intelligence (AI) has facilitated the widespread proliferation of intelligent applications, including personalized recommendations [1], face recognition [2], and keyboard Emoji prediction [3]. These applications, primarily based on Deep Learning (DL), require substantial computational resources. Although mobile devices have become more powerful, they still lack sufficient capacity to execute complex DL models locally. To address these challenges, Mobile Edge Computing (MEC), also known as multi-access edge computing, has become a promising solution [4,5]. MEC allows mobile devices to offload computing tasks to nearby edge servers, significantly reducing task processing latency and energy consumption [6].
In MEC, two critical issues related to task offloading need to be resolved. The first issue is determining whether each mobile device should offload its tasks to an edge server. Once the decision to offload is made, the next step is selecting the appropriate edge server. Various algorithms have been proposed to optimize these decisions. Xu et al. [7] introduced an algorithm to optimize task offloading in MEC by balancing latency and risk management. Ding et al. [8] explored a Non-Orthogonal Multiple Access (NOMA)-assisted MEC scenario, jointly optimizing power and time allocation to reduce energy consumption. Additionally, Bi et al. [9] proposed a strategy for a wireless-powered MEC scenario that optimizes offloading decisions and power transfer.
Despite these advancements, the security aspects of task offloading have not been sufficiently addressed [10,11,12,13]. The data transmitted between mobile devices and edge servers are susceptible to various security threats, such as interception [14] and unauthorized access [15]. For example, without robust security measures, cyber attackers could potentially intercept sensitive data or manipulate the task execution process [16]. Most existing offloading algorithms focus primarily on performance metrics like latency and energy consumption, often overlooking crucial security considerations. This gap underscores the need for a comprehensive approach that integrates robust security measures into the offloading process to ensure data integrity and privacy [17].
In this paper, we explore security-aware task offloading in MEC systems using a DRL-based approach. Our primary objective is to minimize the task execution time and energy consumption while ensuring data security. To achieve this, we model system utility as a weighted sum of task execution latency and energy consumption. We have designed a task-offloading algorithm utilizing Proximal Policy Optimization (PPO) to achieve a near-optimal computational offloading strategy that minimizes system utility and incorporates security considerations into the decision-making process. Through this integration, our approach provides a more robust and efficient MEC system. The main contributions of our work can be summarized as follows:
  • Task Offloading for MEC Systems: We undertake a thorough investigation of task offloading within MEC systems, focusing on the security aspects of data transmission between servers and mobile devices.
  • DRL-based Task-Offloading Algorithm: We formulate the task-offloading process as a Markov Decision Process (MDP) and introduce a novel task-offloading algorithm using a DRL approach. This algorithm dynamically learns and adapts to the MEC environment to optimize task-offloading decisions.
  • Performance Evaluation: Our results indicate that our proposal significantly outperforms traditional methods in minimizing task execution latency and energy consumption while maintaining high levels of data security.
The remainder of this paper is structured as follows. Section 2 reviews related work on task offloading in MEC systems. Section 3 details the system model and problem formulation. Section 4 describes the proposed DRL-based algorithm for security-aware task offloading. Section 5 validates the performance of the proposed offloading algorithm. Finally, the conclusion is presented in Section 6.

2. Related Work

2.1. Task Offloading in MEC

Task offloading in MEC systems has recently drawn significant attention from industry and academia. Lyu et al. [18] proposed an asymptotically optimal task-offloading approach for MEC employing a quantized dynamic programming algorithm to enhance scalability with minimal extra energy cost. Eshraghi et al. [19] investigated joint offloading decisions and resource allocation in mobile cloud networks and proposed the TORAUC algorithm, which optimizes offloading decisions and resource allocation to minimize system costs. Tang et al. [20] introduced a model-free DRL-based distributed algorithm for task offloading in MEC, incorporating LSTM, dueling DQN, and double-DQN techniques to minimize long-term costs, significantly reducing task drop rates and average latency compared to existing algorithms.
Wang et al. [21] developed a decentralized multi-user offloading framework, DEBO, for MEC. This framework optimizes user rewards under network latency by addressing unknown stochastic system-side information, achieving near-optimal performance with sub-linear regret across various scenarios. Liu et al. [22] proposed COFE, a dependent task-offloading framework for MEC and cloud systems, which adaptively assigns computation-intensive tasks with dependent constraints to improve the user experience, using a heuristic ranking-based algorithm to minimize the average makespan and reduce deadline violations. Wang et al. [23] explored multiobjective optimization in a multi-user and multi-server MEC scenario, focusing on joint task offloading, power assignment, and resource allocation. They developed an evolutionary algorithm to minimize response latency, energy consumption, and cost, significantly enhancing user offloading benefits. Fang et al. [24] introduced a dynamic offloading decision algorithm, named DODA-DT, for MEC that employs a DRL-based algorithm to reduce the task execution time and energy consumption across multiple devices under varying wireless conditions. Tan et al. [25] optimized the task offloading and allocation of physical resources in collaborative MEC networks using OFDMA, proposing a two-level alternation method that combines a heuristic algorithm for offloading and collaboration decisions with DRL for optimizing resource allocation.

2.2. Security-Aware Task Offloading in MEC

Task requests often involve sensitive data, making security and privacy concerns critical when offloading such data to edge servers for processing [26]. To address these challenges, Samy et al. [27] developed a blockchain-based architecture to enhance security in task offloading within MEC systems and implemented a DRL-based algorithm to optimize both energy and time costs in scenarios involving multiple users and tasks. Elgendy et al. [28] developed a multi-user resource allocation and task-offloading model that incorporates AES encryption for data security, optimizing system efficiency in terms of time and energy consumption. Wu et al. [29] investigated secure offloading for a wireless-powered MEC system, proposing a physical layer security-assisted scheme where a power beacon also acts as a cooperative jammer. This scheme maximizes secrecy energy efficiency by optimizing transmit power, time allocation, and task partitioning while satisfying secrecy and energy constraints. Asheralieva et al. [30] employed Lagrange coded computing to facilitate fast and secure offloading of request tasks in MEC systems. This method ensures efficient load and bandwidth allocation while promoting timely task completion. For a detailed comparison of our work with existing studies, please refer to Table 1.
While the above studies have employed DRL [31], blockchain [32], and other methods to optimize task offloading and protect data security, there remains significant potential for further exploration in addressing the challenges of secure task offloading in MEC systems.

3. System Model and Problem Formulation

This section introduces the MEC system model for task offloading. Specifically, the system consists of multiple mobile devices, denoted by M = {1, 2, …, m, …, M}, and multiple edge servers, represented by N = {1, 2, …, n, …, N}. The set of tasks to be executed is indicated by X = {1, 2, …, x, …, X}. The MEC system operates in episodes, each of which is subdivided into time slots T = {1, 2, …, t, …, T} with a duration of Δ seconds [33]. Our focus is on the computational tasks from mobile devices, each of which is characterized as indivisible and capable of being processed either locally on the mobile device or offloaded to one of the edge servers. For offloaded tasks, data encryption is implemented to secure the data during transmission. Subsequent sections will detail the specific system models for the mobile device and edge server, as illustrated in Figure 1.

3.1. Communication Model

This subsection introduces the communication model used during task offloading. We assume that each device can only offload tasks to a single edge server that falls within its wireless coverage area in a given time slot [20]. The connectivity between a mobile device m and an edge server n at time t is represented by ζ_{m,n}(t), where ζ_{m,n}(t) = 1 indicates that mobile device m is within the communication range of edge server n, and ζ_{m,n}(t) = 0 otherwise. Task transmission utilizes Orthogonal Frequency Division Multiple Access (OFDMA). Accordingly, the transmission rate, denoted by r_{m,n}(t), is defined as follows:
r_{m,n}(t) = b_{mn} \log_2 \left( 1 + \frac{p_m g_m}{b_{mn} \sigma^2} \right),   (1)
where b_{mn} is the channel bandwidth, g_m is the channel gain, p_m is the uplink transmission power of mobile device m, and σ^2 denotes the noise power of the wireless link.
During the task-offloading process, a mobile device consumes communication bandwidth b_{mn} when offloading a task to an edge server. If the required bandwidth is less than the currently available bandwidth B_n^{ava}(t), the task is offloaded immediately. Otherwise, the task waits until sufficient bandwidth becomes available.
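To make the communication model concrete, the short Python sketch below evaluates Equation (1) for a single device–server link. It is only an illustration: the bandwidth and transmit power follow Table 2, σ^2 is treated as a per-Hz noise level, and the channel gain value is a hypothetical placeholder rather than a value from the paper.

```python
import math

def transmission_rate(bandwidth_hz, tx_power_w, channel_gain, noise_w_per_hz):
    """Uplink rate from Equation (1): b * log2(1 + p*g / (b * sigma^2))."""
    snr = tx_power_w * channel_gain / (bandwidth_hz * noise_w_per_hz)
    return bandwidth_hz * math.log2(1.0 + snr)

# Illustrative values: 2 MHz bandwidth and 0.25 W uplink power (Table 2);
# sigma^2 is interpreted here as a per-Hz noise level, and g_m is a placeholder gain.
b_mn, p_m = 2e6, 0.25
sigma2 = 10 ** ((-100 - 30) / 10)   # -100 dBm converted to watts
g_m = 1e-6                          # hypothetical channel gain
print(f"r_mn = {transmission_rate(b_mn, p_m, g_m, sigma2) / 1e6:.2f} Mbit/s")
```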

3.2. Security Model

When offloading computational tasks to the edge server, the offloading data may be susceptible to various types of network attacks [34]. This paper proposes encrypting the offloaded data to ensure data security in edge task offloading. The Advanced Encryption Standard (AES) is utilized to encrypt the transmission of task data [35]. The AES is chosen for its robust security features, efficiency, and widespread acceptance as a standard for data encryption. Its symmetric encryption mechanism ensures fast encryption and decryption processes, which is critical for real-time task-offloading scenarios where low latency is essential.
For mobile devices requiring task offloading, a 128-bit AES key is first generated to encrypt the tasks before offloading them to the server. The edge server then uses the same key to decrypt the received encrypted data and execute the tasks. Upon completion, the server returns the task results to the mobile device. To formalize the encryption decision for the offloaded task, we introduce the variable α_m ∈ {0, 1}. Specifically, α_m = 0 indicates that the offloaded task does not require encryption, while α_m = 1 indicates that the offloaded task must be encrypted before transmission to the edge server.
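As an illustration of this security model, the following sketch encrypts a task payload with a 128-bit AES key on the device side and decrypts it on the server side. The paper does not specify the AES mode of operation or the key-distribution mechanism, so the use of AES-GCM from the Python cryptography package and a pre-shared key is an assumption made purely for demonstration.

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

# Device side: generate a 128-bit key and encrypt the task data (alpha_m = 1).
key = AESGCM.generate_key(bit_length=128)
aesgcm = AESGCM(key)
nonce = os.urandom(12)                        # 96-bit nonce, recommended for GCM
task_data = b"model input tensor bytes ..."   # placeholder task payload
ciphertext = aesgcm.encrypt(nonce, task_data, None)

# Edge-server side: the same (pre-shared) key decrypts the received data before
# the task is executed; how the key is distributed is outside the paper's scope.
plaintext = AESGCM(key).decrypt(nonce, ciphertext, None)
assert plaintext == task_data
```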

3.3. Computing Model

Based on the communication and security models, we introduce the computation model that governs task-offloading requests on mobile devices within the MEC system. An arriving task x at time slot t is represented by Γ_m^x(t) = ⟨I_m^x(t), λ_m^x(t), τ_m^x(t)⟩, where I_m^x(t) denotes the data size of the offloading task, λ_m^x(t) specifies the CPU cycles required to complete the task, and τ_m^x(t) defines the task's execution deadline. The parameters I_m^x(t) and λ_m^x(t) are determined by specific application needs and are typically provided by the program vendor. Each mobile device selects the optimal execution destination for an arriving computation task, choosing either local processing or offloading to the edge server. We formulate two computational modes based on these operational dynamics: mobile device computing for local processing and edge server computing for offloaded tasks.

3.3.1. Mobile Device Computing

Transmission latency is negligible when a task request Γ_m^x is processed locally. Therefore, the focus in this scenario is only on the local execution latency and energy consumption. These metrics for mobile device m can be calculated as follows:
D_{m,x}^{local}(t) = \frac{\lambda_m^x(t)}{f_m},   (2)
E_{m,x}^{local}(t) = \xi_m \lambda_m^x(t),   (3)
where f_m represents the CPU frequency of the mobile device, and ξ_m denotes the energy consumption per CPU cycle.

3.3.2. Edge Server Computing

Edge server computing involves offloading the task entirely to the servers. To this end, the mobile device m first transmits the task request to the edge server n. Once the task is received, the edge server begins processing it; when processing is complete, the results are sent back to the mobile device. Since the time required to return results is considerably shorter than the time needed for uploading tasks, we exclude the return time from our calculations [7]. In this context, task execution latency comprises both transmission latency and processing latency. The processing latency includes the computational latency at the edge server side, as well as the latency for data encryption and decryption, expressed as
D_{m,x}^{comp}(t) = \frac{\eta_{m,x}}{f_m} + \frac{\delta_{n,x}}{f_n} + \frac{\lambda_m^x(t)}{f_n},   (4)
where η_{m,x} and δ_{n,x} denote the CPU cycles required for encrypting and decrypting the data, respectively, and f_n represents the CPU frequency of the edge server.
Based on the data transmission rate defined in Equation (1), the transmission latency is calculated as follows:
D_{m,x}^{comm}(t) = \frac{I_m^x(t)}{r_{m,n}(t)}.   (5)
Combining Equations (4) and (5), the total execution latency for task offloading is expressed as
D_{m,x}^{edge}(t) = D_{m,x}^{comp}(t) + D_{m,x}^{comm}(t).   (6)
For the offloading strategy, energy consumption primarily comprises the energy used for task transmission and the energy required for data encryption. These components are formulated as follows:
E_{m,x}^{comp}(t) = \xi_m \eta_{m,x},   (7)
E_{m,x}^{comm}(t) = \frac{p_m I_m^x(t)}{r_{m,n}(t)}.   (8)
By integrating Equations (7) and (8), the total energy consumed for task offloading can be represented as follows:
E_{m,x}^{edge}(t) = E_{m,x}^{comp}(t) + E_{m,x}^{comm}(t).   (9)
In summary, the total latency and energy consumption for task x on mobile device m are defined as follows:
D_{m,x}^{total}(t) = (1 - \beta_{m,x}) D_{m,x}^{local}(t) + \beta_{m,x} D_{m,x}^{edge}(t),   (10)
E_{m,x}^{total}(t) = (1 - \beta_{m,x}) E_{m,x}^{local}(t) + \beta_{m,x} E_{m,x}^{edge}(t),   (11)
where D_{m,x}^{local} and D_{m,x}^{edge} represent the local and remote execution latencies, and E_{m,x}^{local} and E_{m,x}^{edge} represent the local and remote energy consumption, respectively. β_{m,x} is a binary indicator, where β_{m,x} = 0 indicates local execution and β_{m,x} = 1 indicates remote execution.
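Equations (2)–(11) reduce to simple arithmetic once the task and system parameters are fixed. The hedged Python sketch below evaluates both execution modes for a single task; the parameter names mirror the paper's notation, while the numeric values (e.g., ξ_m, the weighting coefficient, and the uplink rate) are illustrative placeholders, not values reported by the authors.

```python
from dataclasses import dataclass

@dataclass
class Task:
    data_bits: float    # I_m^x(t), task data size
    cpu_cycles: float   # lambda_m^x(t), cycles to execute the task
    enc_cycles: float   # eta_{m,x}, encryption cycles (0 if alpha_m = 0)
    dec_cycles: float   # delta_{n,x}, decryption cycles

def local_cost(task, f_m, xi_m):
    """Equations (2)-(3): local execution latency and energy."""
    return task.cpu_cycles / f_m, xi_m * task.cpu_cycles

def edge_cost(task, f_m, f_n, xi_m, p_m, rate):
    """Equations (4)-(9): offloading latency and energy (result return ignored)."""
    d_comp = task.enc_cycles / f_m + task.dec_cycles / f_n + task.cpu_cycles / f_n
    d_comm = task.data_bits / rate
    e_comp = xi_m * task.enc_cycles
    e_comm = p_m * task.data_bits / rate
    return d_comp + d_comm, e_comp + e_comm

# Placeholder parameters loosely based on Table 2 (xi_m and rate are assumptions).
task = Task(data_bits=10e6 * 8, cpu_cycles=1e9, enc_cycles=1e8, dec_cycles=1e8)
d_loc, e_loc = local_cost(task, f_m=0.6e9, xi_m=1e-9)
d_edge, e_edge = edge_cost(task, f_m=0.6e9, f_n=10e9, xi_m=1e-9, p_m=0.25, rate=2.3e6)
beta = 1 if d_edge + 0.5 * e_edge < d_loc + 0.5 * e_loc else 0  # Eqs. (10)-(12), lambda = 0.5 assumed
print(d_loc, e_loc, d_edge, e_edge, beta)
```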

3.4. Problem Formulation

In this paper, we aim to minimize the system costs related to task offloading in MEC systems by taking into account both the task completion time and energy consumption. To achieve this, we define the total system costs as follows:
\min \; \frac{1}{|T|} \sum_{t=1}^{T} \sum_{m=1}^{M} \left[ D_{m,x}^{total}(t) + \lambda E_{m,x}^{total}(t) \right]   (12)
s.t.  0 < f_n(t) \le F_n^{ava}(t),   (C1)
      0 < f_m(t) \le F_m^{ava}(t),   (C2)
      0 < b_{mn} \le B_n^{ava}(t),   (C3)
      D_{m,x}(t) \le \tau_m^x(t),   (C4)
      \beta_{m,x} \in \{0, 1\}, \; \forall m,   (C5)
where λ represents the weight coefficient of energy consumption, indicating the relative importance of execution latency and energy consumption across different tasks. Constraint (C1) ensures that the computing resources f_n(t) allocated to each offloaded task do not exceed the edge server's available resources. Constraint (C2) specifies that the local computation f_m(t) for each task remains within the mobile device's capabilities. Constraint (C3) guarantees that the bandwidth utilized for offloaded tasks does not surpass the edge server's available bandwidth. Constraint (C4) imposes time limits on task processing to ensure timely completion. Finally, Constraint (C5) ensures that the offloading decision β_{m,x} is binary, distinctly classifying tasks as either offloaded or executed locally.

4. DRL-Based Offloading Algorithm

To address the optimization challenge outlined in Section 3.4, efficient task offloading from mobile devices to edge servers is essential within MEC systems. Task offloading, however, has been proven to be NP-hard [34]. Recent advancements in Deep Reinforcement Learning (DRL) have demonstrated its superior capabilities in various model-free control problems, making it well suited to our task-offloading scenario. Motivated by these developments, we adopt the PPO algorithm [36] to handle the dynamic and complex decisions involved in task offloading. This section defines the task-offloading procedure as an MDP and explains the PPO algorithm and its implementation.

4.1. MDP Formulation

The results of task offloading are influenced by multiple factors, including local computing resources, the number of edge servers, and the current available resources on edge servers. In addition, the current offloading status is affected by the actions taken in the previous step. Therefore, the task-offloading process is typically considered to have MDP properties [7]. This subsection defines a discrete-time MDP to describe the edge-assisted task scheduling system. The three main elements of the MDP, i.e., state, action, and reward, are defined as follows.
State: The state of the system captures the characteristics of the network environment within the MEC system, including detailed information about mobile devices and edge servers. Specifically, the state is defined as
s_t = \left\{ t, \Gamma_m^x(t), F_n^{ava}(t), F_m^{ava}(t) \right\},   (13)
where t denotes the current time slot, and Γ_m^x(t) includes task request details such as the data volume I_m^x(t) to be processed, the required computational resources λ_m^x(t), and the maximum time constraint τ_m^x(t). Additionally, F_n^{ava}(t) and F_m^{ava}(t) indicate the currently available computational resources of the edge servers and mobile devices, respectively. This state formulation ensures that decision-making accounts for both detailed task requests and the availability of resources.
Action: The agent selects an action from a set of possible options according to the current system state. The action space is defined as
a_t = \left\{ MD, ES_1, \ldots, ES_n, \ldots, ES_N \right\},   (14)
where MD denotes processing the task locally, and ES_n refers to offloading the task to the n-th edge server.
Reward: At each time step, executing an action yields an immediate reward. The agent aims to maximize the cumulative rewards by adjusting its behavior based on these reward signals. This iterative learning approach continuously refines the agent’s strategy for optimal task performance. The reward function is derived from the system cost in Equation (12) and is expressed as
r_t = -D_{m,x}^{total}(t) - \lambda E_{m,x}^{total}(t).   (15)
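These MDP elements map naturally onto a Gym-style environment interface. The following skeleton is a simplified, self-contained sketch of how the state, action, and reward in Equations (13)–(15) could be wired together; the task generator, resource values, and fixed uplink rate are placeholders and do not reproduce the authors' simulation.

```python
import numpy as np

class OffloadingEnv:
    """Sketch of the offloading MDP. Action 0 = local execution (MD), 1..N = edge server ES_n."""

    def __init__(self, num_servers=4, horizon=100, lam=0.5, seed=0):
        self.N, self.horizon, self.lam = num_servers, horizon, lam
        self.rng = np.random.default_rng(seed)
        self.reset()

    def reset(self):
        self.t = 0
        self.server_cpu = np.full(self.N, 10e9)   # 10 GHz servers (Table 2)
        self.device_cpu = 0.6e9                   # placeholder device CPU frequency
        return self._observe()

    def _observe(self):
        # s_t = {t, task request Gamma, available resources}, Equation (13)
        self.task = self.rng.uniform([5e6, 0.5e9], [30e6, 2e9])   # [data bits, cycles]
        return np.concatenate(([self.t], self.task, self.server_cpu, [self.device_cpu]))

    def step(self, action):
        data, cycles = self.task
        rate = 2.3e6                              # placeholder uplink rate, Equation (1)
        if action == 0:                           # local execution: Equations (2)-(3)
            delay, energy = cycles / self.device_cpu, 1e-9 * cycles
        else:                                     # offloading: Equations (4)-(9), encryption omitted
            delay = data / rate + cycles / self.server_cpu[action - 1]
            energy = 0.25 * data / rate
        reward = -(delay + self.lam * energy)     # Equation (15)
        self.t += 1
        return self._observe(), reward, self.t >= self.horizon, {}
```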

4.2. Preliminaries of DRL

DRL trains agents to make decisions by performing actions within an environment to maximize cumulative rewards. In DRL, decision-making is typically modeled as an MDP, where each current state depends only on its preceding state. Within this framework, the agent observes the environment, selects an action, transitions to a new state, and receives a corresponding reward. The cumulative reward, denoted by G t , is the sum of discounted future rewards, calculated as
G_t = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1},   (16)
where γ denotes the discount factor, ranging from 0 to 1, and R_{t+k+1} represents the reward received at time step t + k + 1.
The expected cumulative reward from a given state s, known as the state value V ( s ) , is defined as
V(s) = \mathbb{E}\left[ \sum_{k=0}^{\infty} \gamma^k R_{t+k+1} \,\middle|\, S_t = s \right].   (17)
Furthermore, the value of taking a specific action a in state s, known as the action-value function Q ( s , a ) , is expressed as
Q(s, a) = \mathbb{E}\left[ \sum_{k=0}^{\infty} \gamma^k R_{t+k+1} \,\middle|\, S_t = s, A_t = a \right].   (18)
The Bellman optimality equation [7], which connects the values of state and state–action pairs, is given by
Q(S_t, A_t) = \mathbb{E}\left[ R_{t+1} + \gamma V(S_{t+1}) \right].   (19)
Finally, the advantage function A ( s , a ) is defined as
A(s, a) = Q(s, a) - V(s).   (20)
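The quantities in Equations (16)–(20) can be estimated directly from sampled trajectories. The sketch below computes discounted returns and a simple Monte Carlo advantage estimate (return minus the critic's value prediction); it illustrates the definitions only and is not necessarily the estimator used in the paper.

```python
import numpy as np

def discounted_returns(rewards, gamma=0.9):
    """G_t = sum_k gamma^k * R_{t+k+1}, Equation (16), accumulated backwards."""
    returns = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

def advantages(rewards, values, gamma=0.9):
    """A(s_t, a_t) ~= G_t - V(s_t), a Monte Carlo form of Equation (20)."""
    return discounted_returns(rewards, gamma) - np.asarray(values)

# Toy trajectory: per-step rewards and critic value predictions (illustrative only).
rewards = [-1.2, -0.8, -1.5, -0.6]
values = [-3.0, -2.1, -1.9, -0.7]
print(advantages(rewards, values))
```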
Policy gradient (PG) methods, such as REINFORCE, are policy-based DRL algorithms [37] that optimize a loss function to update policy parameters θ to maximize expected cumulative rewards. The policy gradient is defined by the following equation:
L(\theta) = \mathbb{E}_{s \sim d^{\pi},\, a \sim \pi_\theta} \left[ \nabla_\theta \log \pi_\theta(a \mid s) \, A^{\pi}(s, a) \right],   (21)
where θ represents the parameters of the policy π, s ∼ d^π denotes states sampled from the state distribution induced by policy π, a ∼ π_θ indicates actions sampled from the policy, and ∇_θ log π_θ(a | s) represents the gradient of the log-probability of selecting action a in state s.
Despite their effectiveness, PG methods face challenges such as high variance and inefficiency due to their reliance on complete state sequences via Monte Carlo sampling. These issues led to the development of more robust algorithms like Proximal Policy Optimization (PPO) [36], which builds on PG principles but incorporates advanced strategies to improve learning stability and efficiency.
PPO is an evolution of the Actor–Critic (AC) architecture, a sophisticated form of PG that employs two neural networks: the actor that dictates the policy and the critic that evaluates the action outcome based on the state value. This dual-network structure enables the continuous learning and adjustment of the policy using more stable and lower-variance feedback from the critic. In an AC framework, the actor updates its policy based on
L_{actor} = \frac{\pi_\theta(A_t \mid S_t)}{\pi_{\theta_{old}}(A_t \mid S_t)} A^{\pi_{\theta_{old}}},   (22)
where π_θ(A_t | S_t) represents the new policy and π_{θ_{old}}(A_t | S_t) the old policy used to collect the data.
To further enhance the efficacy and stability of policy updates, PPO introduces an innovative clipping mechanism in the policy update step, known as PPO-clip. This mechanism ensures that adjustments to the policy do not deviate excessively from the previous policy, thus maintaining a balance between rapid learning and stability.
The PPO-clip algorithm adjusts the policy parameters θ to maximize the expected return while ensuring that the new policy remains close to the previous policy π_{θ_{old}}. The update is formulated as follows:
\theta_{new} = \arg\max_{\theta} \; \mathbb{E}_{s, a \sim \pi_{\theta_{old}}} \left[ L\left(s, a, \theta_{old}, \theta\right) \right].   (23)
The objective function L is defined by
L(s, a, \theta_{old}, \theta) = \min\left( \frac{\pi_\theta(a \mid s)}{\pi_{\theta_{old}}(a \mid s)} A^{\pi_{\theta_{old}}}(s, a), \; \mathrm{clip}\!\left( \frac{\pi_\theta(a \mid s)}{\pi_{\theta_{old}}(a \mid s)}, 1 - \varsigma, 1 + \varsigma \right) A^{\pi_{\theta_{old}}}(s, a) \right),   (24)
where ς is a hyperparameter that limits the extent of policy updates. For ease of representation, we denote the ratio of the new policy π_θ to the old policy π_{θ_{old}} for taking an action a in state s by ρ(s, a) = π_θ(a | s) / π_{θ_{old}}(a | s). The clipping function is defined as
\mathrm{clip}(x, 1 - \varsigma, 1 + \varsigma) = \begin{cases} 1 - \varsigma & \text{if } x < 1 - \varsigma, \\ x & \text{if } 1 - \varsigma \le x \le 1 + \varsigma, \\ 1 + \varsigma & \text{if } x > 1 + \varsigma. \end{cases}   (25)
The advantage function A π θ o l d ( s , a ) is calculated as
A^{\pi_{old}}(s, a) = \mathbb{E}_{\pi_{old}}\left[ \sum_{k=0}^{\infty} \gamma^k r_{t+k+1} \,\middle|\, s_t = s, a_t = a \right] - V^{\pi_{old}}(s).   (26)
The DRL agent is trained using an AC approach, which has been effectively applied in various domains. The PPO algorithm optimizes the actor network. During training, the critic network is updated by minimizing the Mean Squared Error (MSE) between its prediction and the target value function, defined by the following loss function:
L(\phi) = \left( r_t + \gamma V_\phi(s_{t+1}) - V_\phi(s_t) \right)^2.   (27)
The loss function for the policy network includes a clipped objective to ensure that updates to the policy remain within an acceptable range. This is formally defined as
L(\theta) = \min\left( \rho \cdot A^{\pi_{old}}(s_t, a_t), \; \mathrm{clip}(\rho, 1 - \varsigma, 1 + \varsigma) \cdot A^{\pi_{old}}(s_t, a_t) \right).   (28)
The advantage function, used for policy updates, is calculated as follows:
A^{\pi_{\theta_{old}}}(s_t, a_t) = Q^{\pi_{\theta_{old}}}(s_t, a_t) - V_\phi(s_t).   (29)
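Putting Equations (24)–(29) together, one PPO update can be expressed in a few lines of PyTorch. The snippet below is a hedged sketch of the clipped surrogate loss for the actor and the squared-error loss for the critic; the clipping parameter ς = 0.2 and the use of sampled returns as the critic target are assumptions, since the paper does not report these details.

```python
import torch

def ppo_losses(actor, critic, states, actions, old_log_probs, returns, clip_eps=0.2):
    """Clipped surrogate objective (Equation (28)) and critic regression loss (Equation (27))."""
    values = critic(states).squeeze(-1)
    advantages = (returns - values).detach()              # A ~ Q - V, Equation (29)

    dist = torch.distributions.Categorical(logits=actor(states))
    log_probs = dist.log_prob(actions)
    ratio = torch.exp(log_probs - old_log_probs)          # rho(s, a) = pi_theta / pi_theta_old
    surr1 = ratio * advantages
    surr2 = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    actor_loss = -torch.min(surr1, surr2).mean()          # negated because optimizers minimize

    # The critic target here is the sampled return; the TD target
    # r_t + gamma * V(s_{t+1}) from Equation (27) is a common alternative.
    critic_loss = torch.nn.functional.mse_loss(values, returns)
    return actor_loss, critic_loss
```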

4.3. Complexity Analysis

In this section, we analyze the complexity of the PPO algorithm. This paper adopts an AC architecture to improve the stability of the training process. The complexity of the algorithm stems from the calculation of model parameters [38]. Since scheduling tasks are represented as vectors, fully connected networks are primarily used for model construction. Therefore, the computational complexity of these fully connected networks can be represented as O\left( \sum_{l=1}^{L-1} n_l \cdot n_{l-1} \right), where n_l denotes the number of neurons in the l-th hidden layer.
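As a quick sanity check of this complexity expression, the dominant cost of a fully connected network is the sum of products of consecutive layer widths. The snippet below tallies this count for a two-hidden-layer network with 128 neurons per layer (the configuration reported in Section 5.1); the input and output dimensions are illustrative placeholders.

```python
def fc_complexity(layer_sizes):
    """Sum over consecutive layers of n_{l-1} * n_l, matching O(sum_l n_l * n_{l-1})."""
    return sum(a * b for a, b in zip(layer_sizes[:-1], layer_sizes[1:]))

# Example: an assumed 8-dimensional state, two hidden layers of 128 units, N + 1 = 5 actions.
print(fc_complexity([8, 128, 128, 5]))   # 8*128 + 128*128 + 128*5 multiply-accumulates
```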

4.4. Task Offloading Using PPO

Figure 2 illustrates the proposed PPO-based task-offloading framework, which operates in two alternating phases: interaction and training. During the interaction phase, the system initializes the actor and critic networks and begins gathering experience data. At each time slot t, the agent selects mobile devices sequentially, utilizing observations to generate policies via the actor network. Each mobile device, guided by these policies, interacts with the environment and transitions to subsequent states. The experience data, i.e., states, actions, and rewards, are preserved in the replay buffer. The interaction phase terminates once the buffer reaches capacity.
The training phase begins by sampling batches of data, denoted by b ∈ D, from the replay buffer. In the initial learning round, these batches feed directly into the primary actor and critic networks without importance sampling. In subsequent rounds within the same learning episode, the data are processed by both the updated and the original networks within the importance sampling module, supporting the training of the new network configuration. Once an episode is complete, the buffer is cleared, and the interaction phase is re-initiated to refill the replay buffer with fresh experience data. This cyclical approach ensures the continuous learning and adaptation of the networks, optimizing the task-offloading process in MEC.
Training Workflow: Algorithms 1 and 2 further detail the algorithm update and data collection processes, respectively. Algorithm 1 begins with the initialization of the task scheduling environment, scheduling algorithm parameters, and the experience buffer D (Line 3 and Line 4). The agent interacts with the environment within each episode to generate experience data, which are then stored in buffer D (Line 6). Once the data in D reaches a preset threshold, the agent proceeds with the model update. This includes extracting a batch of experiences with sample size b from D for parameter updates (Line 7). The agent then calculates the advantage function and state value and uses this information to update the critic and actor networks using the SGD algorithm (Lines 8–11). At the end of each episode, the experience buffer D is cleared (Line 13). The above training process will be repeated until the model converges. Subsequently, the policy network π θ can be deployed in the actual offloading system.
Algorithm 2 outlines the complete process by which mobile devices interact with the environment to generate training data. During the data collection phase, the experience buffer is initialized (Line 1). At each t, the edge server sorts the task requests from the mobile devices (Line 3). For each task request, the environment state s t related to the current task offloading is constructed, and the policy network generates an ( N + 1 ) -dimensional offloading decision (Lines 5 and 6). The mobile device then executes the task based on this decision and receives the corresponding reward (Line 7). This information, including the state transition and reward, is compiled into a complete sample and stored in the experience buffer D (Line 8).
Algorithm 1 DRL-based task offloading.
1:  Input: Task-offloading environment;
2:  Output: Offloading strategy π_θ;
3:  Initialize: Parameters θ and ϕ in actor network π_θ and critic network V_ϕ;
4:  Initialize: Replay buffer D;
5:  for Episode = 1, 2, …, Episode_max do
6:      Collect data via Algorithm 2 and store them in D;
7:      for each batch b ∈ D do
8:          Compute advantage A according to Equation (29);
9:          Obtain state values V(s_{t+1}) and V(s_t) from the critic network;
10:         Update parameters ϕ according to Equation (27);
11:         Update parameters θ according to Equation (28);
12:     end for
13:     Empty the replay buffer D.
14: end for
Algorithm 2 Data collection for DRL-based task offloading.
1:  Initialize: Replay buffer D;
2:  for each time slot t do
3:      Sort the order of task requests;
4:      for each mobile device do
5:          Observe the current environment state s_t;
6:          Compute action a_t using policy network π_θ with input s_t;
7:          Perform action a_t, transition to state s_{t+1}, and collect reward r_t;
8:          Record the transition (s_t, a_t, r_t, s_{t+1}) in buffer D.
9:      end for
10: end for
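For readers who prefer code, the two algorithms can be combined into a compact training loop. The sketch below follows the structure of Algorithms 1 and 2: collect a buffer of transitions, run clipped PPO updates over mini-batches, and then empty the buffer. It reuses the OffloadingEnv, discounted_returns, and ppo_losses sketches given earlier, and the hyperparameters simply echo those listed in Section 5.1; it is an outline, not the authors' implementation.

```python
import torch
import torch.nn as nn

def mlp(in_dim, out_dim, hidden=128):
    """Two hidden layers of 128 neurons, as in the actor/critic configuration of Section 5.1."""
    return nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                         nn.Linear(hidden, hidden), nn.ReLU(),
                         nn.Linear(hidden, out_dim))

def train(env, episodes=1000, batch_size=32, gamma=0.9):
    obs_dim = env.reset().shape[0]
    actor, critic = mlp(obs_dim, env.N + 1), mlp(obs_dim, 1)
    opt_a = torch.optim.Adam(actor.parameters(), lr=3e-3)
    opt_c = torch.optim.Adam(critic.parameters(), lr=1e-3)

    for _ in range(episodes):
        # --- Algorithm 2: interact with the environment and fill the buffer ---
        buf, s, done = [], env.reset(), False
        while not done:
            st = torch.as_tensor(s, dtype=torch.float32)
            dist = torch.distributions.Categorical(logits=actor(st))
            a = dist.sample()
            s2, r, done, _ = env.step(int(a))
            buf.append((st, a, dist.log_prob(a).detach(), r))
            s = s2

        # --- Algorithm 1: PPO updates over mini-batches, then empty the buffer ---
        states = torch.stack([b[0] for b in buf])
        actions = torch.stack([b[1] for b in buf])
        old_logp = torch.stack([b[2] for b in buf])
        returns = torch.as_tensor(
            discounted_returns([b[3] for b in buf], gamma), dtype=torch.float32)
        for i in range(0, len(buf), batch_size):
            sl = slice(i, i + batch_size)
            loss_a, loss_c = ppo_losses(actor, critic, states[sl], actions[sl],
                                        old_logp[sl], returns[sl])
            opt_a.zero_grad(); loss_a.backward(); opt_a.step()
            opt_c.zero_grad(); loss_c.backward(); opt_c.step()
    return actor
```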

5. Performance Evaluation

In this section, we first outline the configuration of the MEC systems and the parameters of the algorithm. We then proceed to compare the proposed offloading algorithm against other approaches to validate its performance across various scenarios.

5.1. Experiment Settings

(1) Training Setup: We consider an experimental scenario with 30 mobile devices and 4 edge servers. Each mobile device is assigned a CPU capacity selected from the set {0.2, 0.4, 0.6, …, 1.4} GHz to simulate computational heterogeneity. In contrast, the CPU capacity for each MEC server is fixed at 10 GHz. Depending on the task, it is randomly determined whether the offloading process requires encrypted data transmission. For tasks requiring encryption, 100 megacycles are allocated for encryption and decryption processes. For the wireless transmission model, we set the communication bandwidth b_{mn} to 2 MHz and the uplink transmission power p_m to 0.25 W. The Rayleigh fading channel gain g_m is modeled according to the method described in [9], expressed as g_m = A \left( \frac{3 \times 10^8}{4 \pi f d} \right)^2, where A is the antenna gain, f is the carrier frequency set at 915 MHz, and d represents the distance between the user and the server. Table 2 outlines the main system parameters.
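For reference, the channel gain expression above can be evaluated as follows; the antenna gain A and the device–server distance d are placeholders, since their values are not listed in the paper.

```python
import math

def channel_gain(distance_m, antenna_gain=4.11, carrier_hz=915e6):
    """g_m = A * (3e8 / (4 * pi * f * d))**2; antenna_gain default is a placeholder."""
    return antenna_gain * (3e8 / (4 * math.pi * carrier_hz * distance_m)) ** 2

print(channel_gain(distance_m=100.0))   # gain at an assumed 100 m separation
```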
For the PPO-based offloading algorithm, the convergence of neural networks is highly dependent on the selection of hyperparameters. To identify the most appropriate hyperparameters, we employ Neural Network Intelligence (NNI) (https://github.com/microsoft/nni/ (accessed on 15 March 2024)), an automated learning tool, to conduct an exhaustive search. The optimal parameters identified through this method are then used for the final training of our algorithm. Specifically, the neural network configuration for the PPO algorithm includes two hidden layers, each with 128 neurons, in both the actor and critic networks. The batch size is set to 32. The learning rates are set at 0.003 for the actor and 0.001 for the critic, with a discount factor of 0.9. The buffer size is 10,000. We implement the algorithm using the PyTorch framework, updating the model with the Adam optimizer [39]. To compare the performance of the offloading algorithm under different encryption requirements, we denote the offloading process with data encryption as PPO-E and the offloading process without data encryption as PPO-WE.
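The NNI-based hyperparameter search described above can be sketched as follows. The search space shown in the comment and the reported metric are illustrative assumptions; only the final values quoted in the text (learning rates of 0.003 and 0.001, batch size 32, discount factor 0.9) come from the paper.

```python
import nni

# Hypothetical NNI search space (search_space.json); the ranges are illustrative only:
# {
#   "actor_lr":   {"_type": "loguniform", "_value": [1e-4, 1e-2]},
#   "critic_lr":  {"_type": "loguniform", "_value": [1e-4, 1e-2]},
#   "batch_size": {"_type": "choice",     "_value": [16, 32, 64]},
#   "gamma":      {"_type": "uniform",    "_value": [0.85, 0.99]}
# }
params = {"actor_lr": 3e-3, "critic_lr": 1e-3, "batch_size": 32, "gamma": 0.9}
params.update(nni.get_next_parameter())   # overridden by the tuner during an NNI trial

# ... build the actor/critic with `params` and train as sketched earlier ...
average_reward = -100.0                   # placeholder: mean episode reward after training
nni.report_final_result(average_reward)   # metric the tuner optimizes
```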
(2) Baselines: We compare the offloading performance of the proposed algorithm against four methods, described as follows:
  • Local Execution: All tasks are executed locally on the device without offloading or data transmission, i.e., β_{m,x} = 0.
  • Full Offloading: All tasks are offloaded to edge servers for execution, i.e., β_{m,x} = 1.
  • Offloading based on DQN without encryption (DQN-WE): This approach utilizes the DQN algorithm for task offloading but does not incorporate security measures for data transmission.
  • Offloading based on DQN with encryption (DQN-E): Similar to DQN-WE, this method employs the DQN algorithm but includes task encryption to secure data transmission.
For the DQN algorithm, we construct a two-layer neural network with 64 units in the first layer and 128 in the second. We utilize an experience replay buffer of size 10,000 and set the learning rate at 0.001 [40]. DRL-based algorithms, such as DQN and PPO, mainly use a trial-and-error learning method, continuously interacting with the environment to generate reward signals. These signals guide the agent in refining its decision-making model parameters, thereby enhancing performance.

5.2. Algorithm Convergence Comparison

We first assess the convergence of DRL-based models for task offloading, considering different data encryption conditions. The encryption status is controlled by the parameter α . Data transmission encryption is not used when α = 0 , making the process standard task offloading. When α = 1 , data transmission encryption is enforced. We evaluate the average rewards of the agents over 1000 episodes.
As shown in Figure 3, the PPO-based offloading algorithm outperforms the DQN algorithm in both encrypted and non-encrypted scenarios. Notably, in scenarios involving data transmission encryption, both DQN-E and PPO-E exhibit lower overall rewards compared to their non-encrypted counterparts, i.e., DQN-WE and PPO-WE, confirming that encryption imposes a performance penalty. Despite this, PPO-E achieves a reward nearly equivalent to its performance in the non-encrypted state. Specifically, by episode 100, PPO-E reaches a reward value of about −100, while DQN-E achieves only about −190. This result highlights the PPO algorithm’s superior sample efficiency, enabling it to learn and adapt more effectively with limited interaction samples. Moreover, the PPO algorithm’s mechanism of limiting the magnitude of policy updates ensures stability and consistency in the learning process, reducing efficiency losses due to policy fluctuations. Overall, the performance superiority of the PPO algorithm in this task scheduling scenario, especially its significant advantage in sample efficiency, establishes it as a preferred solution for managing encrypted-task-offloading challenges in edge computing contexts.

5.3. Average System Performance Analysis

After offline training, the converged DRL network is saved for subsequent online task offloading. During online offloading, only the actor network is utilized for model inference to generate a specific offloading decision. Upon receiving these decisions, the terminal device transitions to the next offloading state s t + 1 . To evaluate the actual performance of the algorithm, the actor network is adopted to infer multiple offloading tasks, and the average of these inferences is taken as the final performance metric. This assessment includes comparing average system cost, average latency, and average energy consumption across different offloading methods.
As shown in Figure 4, the proposed PPO-based offloading algorithm demonstrates superior performance. In non-encrypted scenarios, the average system cost using PPO-WE is 85, whereas the average system costs for local execution, full offloading, and DQN-WE are 238, 193, and 129, respectively. In encrypted scenarios, PPO-E reduces the average system cost by 31.9% compared to DQN-E. Note that in our experiments, the tasks are mainly compute-intensive. Therefore, the benefits of fully offloading tasks to edge servers far outweigh those of local processing, resulting in the average overhead of local computation exceeding that of the full offloading strategy. Figure 4 also compares the average execution latency and energy consumption, demonstrating that the proposed offloading algorithm achieves the lowest average latency and energy consumption compared to other methods. Specifically, PPO-WE reduces the average latency by 77.5%, 58.9%, and 28.1% compared to local execution, full offloading, and DQN-WE, respectively. In encrypted data transmission, PPO-E reduces average energy consumption by 38.2% compared to DQN-E. In summary, the proposed offloading algorithm outperforms other methods in the overall system cost, effectively enhancing the MEC performance.

5.4. Impact of Number of Edge Servers

In MEC systems, mobile devices offload task requests to edge servers, which utilize their computing resources to process these tasks and return the results. However, a high volume of task requests can deplete the limited computing resources of edge servers, potentially leading to increased latency in task offloading. One way to address this issue is by increasing the number of edge servers, which provides additional computing resources to mobile devices and improves offloading performance. Therefore, the number of edge servers is critical to the overall offloading efficiency.
Figure 5 shows the average system cost for different numbers of edge servers while keeping the number of mobile devices fixed at 30. It can be observed that with only two edge servers, the limited resources result in the highest average system cost. With its efficient offloading strategy, the PPO algorithm effectively improves system performance under resource-constrained conditions. In both encrypted and non-encrypted scenarios, PPO-E and PPO-WE reduce the average system cost by 38.6% and 41.1%, respectively, compared to DQN-E and DQN-WE. Notably, when the number of edge servers increases from six to eight, the overall performance improvement of the different methods is relatively small. This is because the computing resources of edge servers are no longer a bottleneck for task offloading, and mobile devices have sufficient resources to offload tasks. With eight servers, PPO-WE achieves the lowest average system cost, reducing it by 65.1% compared to DQN-WE. In summary, as the number of edge servers increases, the overall offloading performance of tasks improves significantly. The proposed algorithm effectively optimizes the decision-making process of task offloading, achieving a lower average system cost.

6. Conclusions

This paper investigates the critical issue of secure task offloading in MEC systems, highlighting the limitations of current strategies that often neglect fundamental security aspects. To this end, we propose a security-aware task-offloading framework utilizing DRL. Specifically, we employ the AES encryption method to ensure the security of data transmission during task offloading. We formulate task offloading as an MDP and adopt the PPO algorithm to optimize task execution latency and energy consumption, thereby minimizing system utility while ensuring data security. Comprehensive performance evaluations demonstrate that the proposed framework effectively balances computational efficiency with security, providing a robust solution for MEC systems.
In future work, we will expand on the following points: (1) Security-Aware Collaborative Offloading for Multiple Mobile Devices: Given the heterogeneity in computational resources among different mobile devices, it is possible to offload computational tasks to devices with idle or stronger computational resources while ensuring security. This collaborative offloading strategy can enhance overall system efficiency and task processing capabilities. (2) Federated Reinforcement Learning-Based Task Offloading: To further enhance the security of the task-offloading process, we can leverage the privacy-preserving characteristics of federated learning. By deploying decision models across different devices, federated learning can improve the response time of the decision-making process while maintaining high levels of data security.

Author Contributions

Conceptualization, H.L. and D.Z.; data curation, H.L.; funding acquisition, D.Z.; investigation, H.L. and X.H.; methodology, H.L.; project administration, D.Z.; software, H.L.; supervision, D.Z.; validation, X.H.; writing—original draft, H.L.; writing—review and editing, X.H. and D.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Natural Science Foundation of China under Grant 61872423 and in part by the Postgraduate Research and Practice Innovation Program of Jiangsu Province under Grant KYCX22_0956.

Data Availability Statement

Data are contained within the paper.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following notations are used in this paper:
M: Number of mobile devices.
N: Number of edge servers.
T: Time slots in each episode.
Δ: Length of each time slot.
ζ_{m,n}(t): Connectivity status between mobile device m and edge server n at time t.
b_{mn}: Channel bandwidth between mobile device m and edge server n.
g_m: Channel gain between the mobile device and the edge server.
p_m: Uplink transmission power of mobile device m.
σ^2: Noise power of the wireless link.
Γ_m(t): Computational task request at time t for mobile device m.
I_m(t): Data size of the offloading task at time t for mobile device m.
λ_m(t): CPU cycles required for the task requested by mobile device m at time t.
τ_m(t): Execution deadline for the task requested by mobile device m at time t.
f_m: CPU frequency of mobile device m.
f_n: CPU frequency of edge server n.
ξ_m: Energy consumption per CPU cycle for mobile device m.
η: CPU cycles required to encrypt the data.
δ: CPU cycles required to decrypt the data.
β: Binary indicator of execution.

References

  1. Wu, Q.; Chen, X.; Zhou, Z.; Chen, L. Mobile Social Data Learning for User-Centric Location Prediction with Application in Mobile Edge Service Migration. IEEE Internet Things J. 2019, 6, 7737–7747.
  2. Yin, X.; Liu, X. Multi-Task Convolutional Neural Network for Pose-Invariant Face Recognition. IEEE Trans. Image Process. 2018, 27, 964–975.
  3. Tang, Y.; Hou, J.; Huang, X.; Shao, Z.; Yang, Y. Green Edge Intelligence Scheme for Mobile Keyboard Emoji Prediction. IEEE Trans. Mob. Comput. 2024, 23, 1888–1901.
  4. Wang, J.; Du, H.; Niyato, D.; Kang, J.; Xiong, Z.; Rajan, D.; Mao, S.; Shen, X. A Unified Framework for Guiding Generative AI with Wireless Perception in Resource Constrained Mobile Edge Networks. IEEE Trans. Mob. Comput. 2024.
  5. Wang, J.; Du, H.; Niyato, D.; Xiong, Z.; Kang, J.; Mao, S.; Shen, X.S. Guiding AI-Generated Digital Content with Wireless Perception. IEEE Wirel. Commun. 2024.
  6. Mao, Y.; You, C.; Zhang, J.; Huang, K.; Letaief, K.B. A Survey on Mobile Edge Computing: The Communication Perspective. IEEE Commun. Surv. Tutorials 2017, 19, 2322–2358.
  7. Xu, D.; Su, X.; Wang, H.; Tarkoma, S.; Hui, P. Towards Risk-Averse Edge Computing with Deep Reinforcement Learning. IEEE Trans. Mob. Comput. 2024, 23, 7030–7047.
  8. Ding, Z.; Xu, J.; Dobre, O.A.; Poor, H.V. Joint Power and Time Allocation for NOMA–MEC Offloading. IEEE Trans. Veh. Technol. 2019, 68, 6207–6211.
  9. Bi, S.; Zhang, Y.J. Computation Rate Maximization for Wireless Powered Mobile-Edge Computing with Binary Computation Offloading. IEEE Trans. Wirel. Commun. 2018, 17, 4177–4190.
  10. Shirazi, S.N.; Gouglidis, A.; Farshad, A.; Hutchison, D. The Extended Cloud: Review and Analysis of Mobile Edge Computing and Fog from a Security and Resilience Perspective. IEEE J. Sel. Areas Commun. 2017, 35, 2586–2595.
  11. Zhang, T.; Xu, C.; Lian, Y.; Tian, H.; Kang, J.; Kuang, X.; Niyato, D. When Moving Target Defense Meets Attack Prediction in Digital Twins: A Convolutional and Hierarchical Reinforcement Learning Approach. IEEE J. Sel. Areas Commun. 2023, 41, 3293–3305.
  12. Zhang, T.; Xu, C.; Shen, J.; Kuang, X.; Grieco, L.A. How to Disturb Network Reconnaissance: A Moving Target Defense Approach Based on Deep Reinforcement Learning. IEEE Trans. Inf. Forensics Secur. 2023, 18, 5735–5748.
  13. Zhang, T.; Xu, C.; Zou, P.; Tian, H.; Kuang, X.; Yang, S.; Zhong, L.; Niyato, D. How to Mitigate DDoS Intelligently in SD-IoV: A Moving Target Defense Approach. IEEE Trans. Ind. Inform. 2023, 19, 1097–1106.
  14. Ranaweera, P.; Yadav, A.K.; Liyanage, M.; Jurcut, A.D. Service Migration Authentication Protocol for MEC. In Proceedings of the IEEE Global Communications Conference (GLOBECOM), Rio de Janeiro, Brazil, 4–8 December 2022; pp. 5493–5498.
  15. Singh, J.; Bello, Y.; Hussein, A.R.; Erbad, A.; Mohamed, A. Hierarchical Security Paradigm for IoT Multiaccess Edge Computing. IEEE Internet Things J. 2021, 8, 5794–5805.
  16. Feng, S.; Xiong, Z.; Niyato, D.; Wang, P. Dynamic Resource Management to Defend Against Advanced Persistent Threats in Fog Computing: A Game Theoretic Approach. IEEE Trans. Cloud Comput. 2021, 9, 995–1007.
  17. Liu, Y.; Du, H.; Niyato, D.; Kang, J.; Xiong, Z.; Jamalipour, A.; Shen, X. ProSecutor: Protecting Mobile AIGC Services on Two-Layer Blockchain via Reputation and Contract Theoretic Approaches. IEEE Trans. Mob. Comput. 2024.
  18. Lyu, X.; Tian, H.; Ni, W.; Zhang, Y.; Zhang, P.; Liu, R.P. Energy-Efficient Admission of Delay-Sensitive Tasks for Mobile Edge Computing. IEEE Trans. Commun. 2018, 66, 2603–2616.
  19. Eshraghi, N.; Liang, B. Joint Offloading Decision and Resource Allocation with Uncertain Task Computing Requirement. In Proceedings of the IEEE Conference on Computer Communications (INFOCOM), Paris, France, 29 April–2 May 2019; pp. 1414–1422.
  20. Tang, M.; Wong, V.W. Deep Reinforcement Learning for Task Offloading in Mobile Edge Computing Systems. IEEE Trans. Mob. Comput. 2022, 21, 1985–1997.
  21. Wang, X.; Ye, J.; Lui, J.C. Online Learning Aided Decentralized Multi-User Task Offloading for Mobile Edge Computing. IEEE Trans. Mob. Comput. 2024, 23, 3328–3342.
  22. Liu, J.; Ren, J.; Zhang, Y.; Peng, X.; Zhang, Y.; Yang, Y. Efficient Dependent Task Offloading for Multiple Applications in MEC-Cloud System. IEEE Trans. Mob. Comput. 2023, 22, 2147–2162.
  23. Wang, P.; Li, K.; Xiao, B.; Li, K. Multiobjective Optimization for Joint Task Offloading, Power Assignment, and Resource Allocation in Mobile Edge Computing. IEEE Internet Things J. 2022, 9, 11737–11748.
  24. Fang, J.; Qu, D.; Chen, H.; Liu, Y. Dependency-Aware Dynamic Task Offloading Based on Deep Reinforcement Learning in Mobile-Edge Computing. IEEE Trans. Netw. Serv. Manag. 2024, 21, 1403–1415.
  25. Tan, L.; Kuang, Z.; Zhao, L.; Liu, A. Energy-Efficient Joint Task Offloading and Resource Allocation in OFDMA-Based Collaborative Edge Computing. IEEE Trans. Wirel. Commun. 2022, 21, 1960–1972.
  26. Wang, J.; Du, H.; Tian, Z.; Niyato, D.; Kang, J.; Shen, X. Semantic-Aware Sensing Information Transmission for Metaverse: A Contest Theoretic Approach. IEEE Trans. Wirel. Commun. 2023, 22, 5214–5228.
  27. Samy, A.; Elgendy, I.A.; Yu, H.; Zhang, W.; Zhang, H. Secure Task Offloading in Blockchain-Enabled Mobile Edge Computing with Deep Reinforcement Learning. IEEE Trans. Netw. Serv. Manag. 2022, 19, 4872–4887.
  28. Elgendy, I.A.; Zhang, W.; Tian, Y.C.; Li, K. Resource allocation and computation offloading with data security for mobile edge computing. Future Gener. Comput. Syst. 2019, 100, 531–541.
  29. Wu, M.; Song, Q.; Guo, L.; Lee, I. Energy-Efficient Secure Computation Offloading in Wireless Powered Mobile Edge Computing Systems. IEEE Trans. Veh. Technol. 2023, 72, 6907–6912.
  30. Asheralieva, A.; Niyato, D. Fast and Secure Computational Offloading with Lagrange Coded Mobile Edge Computing. IEEE Trans. Veh. Technol. 2021, 70, 4924–4942.
  31. Li, Y.; Aghvami, A.H.; Dong, D. Intelligent Trajectory Planning in UAV-Mounted Wireless Networks: A Quantum-Inspired Reinforcement Learning Perspective. IEEE Wirel. Commun. Lett. 2021, 10, 1994–1998.
  32. Liu, Y.; Wang, K.; Lin, Y.; Xu, W. LightChain: A Lightweight Blockchain System for Industrial Internet of Things. IEEE Trans. Ind. Inform. 2019, 15, 3571–3581.
  33. Gao, Z.; Yang, L.; Dai, Y. Fast Adaptive Task Offloading and Resource Allocation in Large-Scale MEC Systems via Multiagent Graph Reinforcement Learning. IEEE Internet Things J. 2024, 11, 758–776.
  34. Peng, K.; Xiao, P.; Wang, S.; Leung, V.C. SCOF: Security-Aware Computation Offloading Using Federated Reinforcement Learning in Industrial Internet of Things with Edge Computing. IEEE Trans. Serv. Comput. 2024.
  35. Zhang, W.Z.; Elgendy, I.A.; Hammad, M.; Iliyasu, A.M.; Du, X.; Guizani, M.; El-Latif, A.A.A. Secure and Optimized Load Balancing for Multitier IoT and Edge-Cloud Computing Systems. IEEE Internet Things J. 2021, 8, 8119–8132.
  36. Zhan, Y.; Li, P.; Wu, L.; Guo, S. L4L: Experience-Driven Computational Resource Control in Federated Learning. IEEE Trans. Comput. 2022, 71, 971–983.
  37. Sutton, R.S.; McAllester, D.; Singh, S.; Mansour, Y. Policy Gradient Methods for Reinforcement Learning with Function Approximation. In Advances in Neural Information Processing Systems (NIPS); MIT Press: Cambridge, MA, USA, 1999; Volume 12.
  38. Samir, M.; Assi, C.; Sharafeddine, S.; Ghrayeb, A. Online Altitude Control and Scheduling Policy for Minimizing AoI in UAV-Assisted IoT Wireless Networks. IEEE Trans. Mob. Comput. 2022, 21, 2493–2505.
  39. Lu, H.; He, X.; Du, M.; Ruan, X.; Sun, Y.; Wang, K. Edge QoE: Computation Offloading with Deep Reinforcement Learning for Internet of Things. IEEE Internet Things J. 2020, 7, 9255–9265.
  40. Li, Y.; Aghvami, A.H.; Dong, D. Path Planning for Cellular-Connected UAV: A DRL Solution with Quantum-Inspired Experience Replay. IEEE Trans. Wirel. Commun. 2022, 21, 7897–7912.
Figure 1. An illustration of MEC systems over wireless connections. Tasks are encrypted before offloading and decrypted upon reaching the edge server to ensure data security.
Figure 2. An illustration of PPO-based task offloading in the MEC system. The algorithm is divided into two main modules. The interaction module uses the actor network to interact with the environment, making specific offloading decisions, and collects experience data for storage in the experience buffer. The training module then samples these data from the experience buffer to update both the actor and critic networks. These modules operate alternately, continuing until the agent achieves convergence.
Figure 3. Training convergence of DRL agent in MEC systems.
Figure 4. Performance comparison of different algorithms.
Figure 5. Average system cost with different numbers of edge servers.
Table 1. Comparison of existing works on task offloading.
Reference | Optimization | DRL-Based | Security | No. of Servers
[18] | Latency and energy | No | No | Single Server
[19] | Energy | No | No | Single Server
[20] | Latency | Yes | No | Multiple Servers
[21] | Latency | Yes | No | Multiple Servers
[22] | Latency | No | No | Multiple Servers
[23] | Latency and energy | No | No | Multiple Servers
[24] | Latency and energy | Yes | No | Single Server
[25] | Energy | Yes | No | Single Server
[27] | Latency and energy | Yes | Yes | Single Server
[28] | Latency and energy | No | Yes | Single Server
[29] | Energy | No | Yes | Single Server
[30] | Latency | Yes | Yes | Single Server
Our Work | Latency and energy | Yes | Yes | Multiple Servers
Table 2. System parameter configurations.
Parameter | Value
Number of mobile devices | 30
Number of edge servers | 4
Task data size | {5, 10, 15, …, 30} MB
System bandwidth | 15 MHz
Background noise | −100 dBm
Computation capacity of device | {0.2, 0.4, 0.6, …, 1.4} GHz
MEC server capacity | 10 GHz
Transmission power of device | 250 mW
Communication bandwidth | 2 MHz
Carrier frequency | 915 MHz