1. Introduction
With the continuous development of mobile communication technology, mobile edge computing (MEC) has emerged as a new computing paradigm and has gained significant attention and application. MEC has been successfully applied in major scenarios such as autonomous driving, smart cities, smart homes, virtual reality, and facial recognition [1,2]. However, in edge computing applications, the leakage of user location privacy poses a potentially significant risk [3].
Traditional task offloading strategies typically focus on reducing latency and network resource consumption but often overlook the protection of user location privacy. For instance, attackers can infer a user's location by monitoring the status of edge servers or wireless channels [4,5,6]. Therefore, when users offload computation tasks to edge servers in the context of MEC, it is important not only to minimize latency and network resource consumption but also to ensure user privacy protection.
To protect user location privacy while meeting the performance requirements of MEC applications, such as response time and energy consumption, appropriate technical measures are necessary, including location perturbation, anonymization, and differential privacy [7,8,9]. However, current research on location privacy protection in MEC applications has the following major shortcomings:
The system lacks universality and flexibility and is suitable only for static scenarios or single-application demands [10,11], making it difficult to adapt to ever-changing MEC scenarios.
The system falls short of providing adequate security guarantees, leaving it vulnerable to internal and third-party attacks that may compromise the accuracy of location data [12,13]. The lack of a trusted identity verification mechanism in the authentication system increases the risk of user location information leakage.
The system lacks a dynamic balance between the computational cost of MEC systems and users' privacy requirements. Notably, protecting user location privacy can increase the computational burden on MEC servers [14].
To address the aforementioned shortcomings, the privacy protection mechanisms in MEC applications must be made more comprehensive. This includes establishing secure communication channels, employing secure computing techniques, and taking measures to prevent untrusted third-party attackers from leaking user location privacy [15,16]. In MEC applications, the distance between a user and an edge server is closely related to the wireless channel conditions: the shorter the distance, the better the channel conditions, and the longer the distance, the worse the channel conditions [17]. If the edge server is untrusted or compromised, an attacker can infer wireless channel information by monitoring the user's task offloading rate and thereby deduce the user's location. In this context, location perturbation is crucial in preserving user privacy by adding noise to or modifying sensitive location information, and differential-privacy-based location perturbation techniques have been studied to protect users' location privacy [18]. Thus, this paper investigates location privacy protection in MEC systems based on differential privacy.
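As a concrete illustration of this idea, the minimal sketch below perturbs the user–server distance with the classic Laplace mechanism and clips the result to the coverage area. The function name, sensitivity, and clipping range are illustrative assumptions; they are not the perturbed distance probability density function designed later in this paper.

```python
import numpy as np

def perturb_distance(true_distance: float, epsilon: float,
                     sensitivity: float = 1.0,
                     d_min: float = 0.0, d_max: float = 500.0) -> float:
    """Perturb a user-server distance with the Laplace mechanism.

    A smaller privacy budget epsilon injects more noise, which lowers the
    privacy leakage level but degrades the apparent channel condition used
    for offloading decisions.
    """
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
    # Clip so the fake distance stays inside the server's coverage area.
    return float(np.clip(true_distance + noise, d_min, d_max))

# Example: strong privacy (small epsilon) yields a widely scattered fake distance.
fake = perturb_distance(true_distance=120.0, epsilon=0.1)
```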
This paper investigates the issue of location privacy protection in MEC applications. Different tasks have varying sensitivities to energy consumption and computation delay, resulting in different energy and latency requirements, so it is also significant to consider the dynamic trade-off between computation cost and users' privacy requirements. By tackling the challenges of location privacy protection while enhancing MEC performance, we can attain sustainable development of location services and optimize MEC applications. To address the trade-off between the privacy protection level and energy/latency performance in MEC, we design an objective function that considers both location privacy and computation cost, aiming to maximize the overall performance of the MEC system.
Traditional research on balancing privacy protection, energy consumption, and latency has often relied on rule-based approaches that require predefined rules and models [7]. These approaches neglect the dynamic network environment and fail to adapt to complex and changing protection requirements. In contrast, reinforcement learning (RL) learns through iterative interaction with the environment, facilitating adjustments to environmental changes and uncertainties. Unlike traditional decision-making approaches that often require manual strategy adjustments, RL achieves adaptive strategy refinement through trial-and-error learning. Moreover, deep RL (DRL) combines the strengths of deep learning and RL, employing neural networks to extract insights from the environment and optimize decision-making strategies. As a result, the integration of DRL enhances decision effectiveness, providing a more robust framework for addressing complicated challenges in dynamic environments. Notably, a safe learning mechanism enables the learning agent to avoid selecting high-risk state–action pairs [19]. Thus, we develop a safe DRL algorithm to solve the designed problem.
This paper proposes a safe deep Q-network (DQN)-based location-privacy-preserved task offloading (SDLPTO) scheme to dynamically balance computational cost and privacy protection. This scheme utilizes differential privacy techniques to protect user location privacy while considering the trade-off between energy consumption, latency, and privacy protection. Simulation results demonstrate the performance advantage of our proposed scheme compared to benchmarks. The main innovations of this paper can be summarized as follows:
We propose a location-privacy-aware task offloading framework that utilizes differential privacy technology to design a perturbed distance probability density function, making it difficult for attackers to infer the user’s actual location from a fake one.
We model the privacy-preserving location perturbation problem as a Markov decision process (MDP). We use the DRL method to adaptively select location-privacy-preserved task offloading (LPTO) policies to avoid location privacy leakage while ensuring computational performance in a dynamic MEC system. This solution can jointly consider location privacy and computational offloading performance, enabling a balance between them.
We develop an SDLPTO scheme to find the optimal location-privacy-preserved task offloading policy. We utilize the DQN algorithm to capture the system state and accelerate policy selection in a dynamic environment. Meanwhile, we implement a safe exploration algorithm for location perturbation and offloading decisions, mitigating potential losses from high-risk state–action pairs.
Simulation results demonstrate that our proposed SDLPTO better balances location privacy and offloading costs, consistently outperforming benchmark schemes across various task sizes and perturbation distance ranges in preserving location privacy while minimizing offloading overhead.
The remainder of the paper is organized as follows: Section 2 discusses related work. Section 3 presents the proposed system model, location perturbation model, and problem formulation. Section 4 introduces the safe DQN-based location-privacy-preserved task offloading scheme. Section 5 gives simulation results and performance analysis, and Section 6 concludes the paper.
2. Related Work
As a distributed computing model that pushes data processing and storage to the network edge, edge computing has been expanding its application scope, but privacy issues have become increasingly prominent. Data encryption, K-anonymity, blockchain, and location perturbation techniques [7,20,21,22] have been studied for privacy protection in MEC. More specifically, location perturbation protects privacy by introducing modifications or perturbations to the original location data. Various technologies, such as differential privacy, path cloaking, temporal clustering, and location truncation, can be used for location perturbation [3,23,24].
In recent years, many works have used differential-privacy-based location perturbation to protect users' location privacy [14,18]. Differential privacy offers strong privacy protection and can prevent attackers from re-identifying data based on known background knowledge [18,25]. In [18], Wang et al. propose a location-privacy-aware task offloading framework (LPA-Offload) that protects user location privacy with a location perturbation mechanism based on differential privacy; the scheme derives the optimal offloading strategy with an iterative method and then calculates the computation cost and privacy leakage. In [25], Miao et al. propose MEPA, a privacy-aware framework for MEC that uses differential privacy to protect location privacy in the dataset domain. A privacy-preserving computation offloading scheme based on the whale optimization algorithm is proposed in [7]; it uses differential privacy to perturb users' locations and makes offloading decisions based on the perturbed distance. However, this scheme struggles to adapt to dynamic environments: although it solves the convex optimization problem of computation offloading under a given privacy budget, it fails to derive an effective location perturbation strategy. In summary, some of these studies do not consider preserving privacy while optimizing delay and energy consumption in edge computing, while others consider privacy preservation but fail to optimize these factors simultaneously. Furthermore, the aforementioned methods are designed for static scenarios and cannot effectively address optimization challenges in dynamic environments.
RL technology has been widely used in dynamic MEC systems [26,27,28], and one of its most important applications is protecting user privacy. RL adapts its learning strategy automatically to changes in data and the environment; combined with distributed storage that spreads user data across multiple nodes, it can prevent user data from being stolen or tampered with by attackers [29]. In [17], Min et al. propose a scheme that protects both user location privacy and user pattern privacy, studying an RL-based privacy-aware offloading scheme that reduces computation latency and energy consumption while improving the privacy level of medical IoT devices. In [29], Liu et al. propose a privacy-preserving distributed deep deterministic policy gradient (P2D3PG) algorithm that maximizes the distributed edge caching (EC) hit rate under privacy protection constraints in a wireless communication system with MEC. In [14], Zhang et al. study a differential privacy and RL task transfer strategy, establish an MEC system model, and design a four-layer policy network as the RL agent, but their approach lacks a balance between privacy and computation offloading performance. To address these problems, we propose an RL-based algorithm that balances privacy protection and computation offloading performance by combining differential privacy and RL technology.
4. Safe DQN-Based Location-Privacy-Preserved Task Offloading
It is typically difficult to employ traditional optimization techniques to obtain the optimal location-privacy-preserved task offloading policy in a dynamic MEC system. In this section, we show how to utilize a safe DRL method to protect the user's location privacy while ensuring MEC performance. In detail, we first model the privacy-preserving location perturbation problem as an MDP [33]. Then, we propose the SDLPTO scheme, in which risk assessment is performed on state–action pairs to avoid selecting high-risk perturbation policies, as shown in Figure 3.
The system's next state depends only on the current state and the policy selected in the current time slot. Hence, the location-privacy-preserved task offloading process can be modeled as an MDP, and we can use RL technology to dynamically explore the optimal location-privacy-preserved task offloading policy [34]. We define the state, action, reward, and risk level function of the SDLPTO scheme, which can be represented by the tuple $\langle \mathcal{S}, \mathcal{A}, u, \phi \rangle$.

State: $s^{(k)} = [n^{(k)}, h^{(k)}]$ is the system state at time slot $k$, with $s^{(k)} \in \mathcal{S}$, where $\mathcal{S}$ is the state set. To jointly optimize the performance of the edge computing system and the degree of privacy leakage, we take the number of tasks $n^{(k)}$ generated by the user device and the wireless channel condition $h^{(k)}$ between the user and the edge server as the environment state.

Action: $a^{(k)} = [x^{(k)}, \epsilon^{(k)}]$ is the system action at time slot $k$, with $a^{(k)} \in \mathcal{A}$, where $\mathcal{A}$ is the action set. We use the task offloading ratio $x^{(k)}$ and the privacy budget $\epsilon^{(k)}$ as actions, which determine the computational offloading decision and the privacy leakage situation together with the perturbed location, respectively.

Reward: Considering several factors and long-term optimization, we define the utility as a weighted sum of the energy consumption $E_{\mathrm{c}}^{(k)}$, the latency $T^{(k)}$, and the privacy leakage level $P^{(k)}$, which can be expressed as $u^{(k)} = -\left(\beta_1 E_{\mathrm{c}}^{(k)} + \beta_2 T^{(k)} + \beta_3 P^{(k)}\right)$, where $\beta_1$, $\beta_2$, and $\beta_3$ are the corresponding weights.

Risk level function: The risk level of taking action $a^{(k)}$ in state $s^{(k)}$ at time slot $k$ is denoted by $\phi^{(k)}$.
The risk level of the current state–action pair $\left(s^{(k)}, a^{(k)}\right)$ is evaluated by the user based on the privacy leakage level $P^{(k)}$, which is computed with Equation (10). It represents the extent of privacy leakage caused by the perturbation policy $a^{(k)}$ in state $s^{(k)}$. We assume that there are $L$ risk levels, with the highest risk level $L$ representing the most dangerous behavior state. Conversely, zero represents the lowest risk level. We define $\{\delta_1, \delta_2, \dots, \delta_L\}$ as the safety performance indicators with $L$ risk thresholds. Consequently, the risk level $\phi^{(k)}$ can be evaluated by

$$\phi^{(k)} = \sum_{l=1}^{L} \mathbb{1}\left(P^{(k)} \geq \delta_l\right),$$

where $\mathbb{1}(\cdot)$ is an indicator function.
Although the user evaluates the current state–action pair's risk level by $\phi^{(k)}$, the location-privacy-preserved task offloading policy may still result in severe privacy leakage. Therefore, the user also estimates the long-term risk level $E^{(k)}$ of the previous location-privacy-preserved task offloading policies to estimate their impact on the future by tracing back the prior $W$ experienced state–action pairs, which is given by

$$E^{(k)} = \sum_{j=0}^{W-1} \eta^{j}\, \phi^{(k-j)},$$

where $\eta \in (0, 1)$ is the decay factor.
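The following minimal sketch shows how the immediate risk level and its decayed long-term accumulation could be computed; the threshold values, the number of levels, and the decay factor are illustrative placeholders rather than the values used in our experiments.

```python
import numpy as np

RISK_THRESHOLDS = np.array([0.2, 0.4, 0.6, 0.8])  # illustrative thresholds, L = 4

def risk_level(privacy_leakage: float) -> int:
    """Map the privacy leakage level to a discrete risk level in {0, ..., L}.

    Zero is the safest level; counting the exceeded thresholds plays the
    role of the indicator-function sum in the risk-level definition.
    """
    return int(np.sum(privacy_leakage >= RISK_THRESHOLDS))

def long_term_risk(past_risks, decay: float = 0.9) -> float:
    """Decay-weighted sum of the last W immediate risk levels (most recent first)."""
    return sum((decay ** j) * r for j, r in enumerate(past_risks))

# Example: trace back the W = 3 most recent state-action pairs.
e_value = long_term_risk([2, 1, 3], decay=0.9)
```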
The long-term expected reward ($Q$-value) of the user that adopts the perturbation policy $a^{(k)}$ at state $s^{(k)}$ is updated as follows:

$$Q\left(s^{(k)}, a^{(k)}\right) \leftarrow (1 - \alpha)\, Q\left(s^{(k)}, a^{(k)}\right) + \alpha \left[u^{(k)} + \gamma \max_{a' \in \mathcal{A}} Q\left(s^{(k+1)}, a'\right)\right],$$

where $\alpha$ is the learning rate and $\gamma$ is the discount factor.
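A minimal tabular sketch of this update is shown below; states and actions are assumed to be hashable keys, and the default learning rate and discount factor mirror the values stated in Section 5.

```python
from collections import defaultdict

Q = defaultdict(float)  # Q-table keyed by (state, action); missing entries are 0.0

def q_update(state, action, reward, next_state, actions,
             alpha: float = 0.004, gamma: float = 0.99) -> None:
    """One Q-learning update: move Q(s, a) toward the bootstrapped target."""
    best_next = max(Q[(next_state, a)] for a in actions)
    target = reward + gamma * best_next
    Q[(state, action)] = (1 - alpha) * Q[(state, action)] + alpha * target
```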
The user takes account of both the $Q$-value and the $E$-value when selecting the location-privacy-preserved task offloading policy. The policy function $\pi\left(a^{(k)} \mid s^{(k)}\right)$ is the probability distribution of selecting the offloading policy and location perturbation policy $a^{(k)}$ in the current state $s^{(k)}$, which is given by

$$\pi\left(a \mid s^{(k)}\right) = \frac{\exp\left[Q\left(s^{(k)}, a\right) - \rho\, E\left(s^{(k)}, a\right)\right]}{\sum_{a' \in \mathcal{A}} \exp\left[Q\left(s^{(k)}, a'\right) - \rho\, E\left(s^{(k)}, a'\right)\right]}, \tag{17}$$

where $\rho > 0$ weights the estimated long-term risk against the expected reward.
At time slot $k$, based on the current state $s^{(k)}$, the user selects the location-privacy-preserved task offloading policy $a^{(k)}$ according to Equation (17). Then, the user executes the action $a^{(k)} = [x^{(k)}, \epsilon^{(k)}]$ and obtains the system reward $u^{(k)}$ after evaluating the energy consumption, latency, and privacy leakage level. The system then transfers to the next state $s^{(k+1)}$.
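A sketch of this safe action selection is given below, assuming the softmax form reconstructed in Equation (17); the risk-aversion weight rho is an illustrative parameter.

```python
import numpy as np

def select_safe_action(q_values: np.ndarray, e_values: np.ndarray,
                       rho: float = 1.0) -> int:
    """Sample an action from a softmax over Q penalized by the risk estimate E.

    rho trades expected reward against estimated long-term risk; actions
    with high E-values become exponentially less likely to be chosen.
    """
    scores = q_values - rho * e_values
    probs = np.exp(scores - scores.max())  # subtract max for numerical stability
    probs /= probs.sum()
    return int(np.random.choice(len(q_values), p=probs))
```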
The experience replay technique is an important part of the DQN algorithm. The transitions $\left(s^{(k)}, a^{(k)}, u^{(k)}, s^{(k+1)}\right)$ are stored in a storage pool $\mathcal{D}$, and some experiences are then randomly selected from $\mathcal{D}$ to train on a small batch $\mathcal{B}$. The system state $s^{(k)}$ is extended to the location-privacy-preserved task offloading experience sequence denoted by $\varphi^{(k)}$, consisting of the state $s^{(k)}$ and the previous $H$ action–state pairs, i.e., $\varphi^{(k)} = \left[s^{(k-H)}, a^{(k-H)}, \cdots, s^{(k-1)}, a^{(k-1)}, s^{(k)}\right]$. The experience sequence $\varphi^{(k)}$ is input to the E-network and the Q-network to estimate $E\left(\varphi^{(k)}, a\right)$ and $Q\left(\varphi^{(k)}, a\right)$, respectively. Then, the policy $a^{(k)}$ is selected based on Equation (17).
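A minimal sketch of such a storage pool is shown below; the capacity and history length are illustrative defaults, not the values used in our simulations.

```python
import random
from collections import deque

class SequenceReplay:
    """Experience pool storing transitions and building H-step sequences."""

    def __init__(self, capacity: int = 10_000, history: int = 4):
        self.pool = deque(maxlen=capacity)   # transition storage pool
        self.history = history               # H previous action-state pairs

    def push(self, state, action, reward, next_state) -> None:
        self.pool.append((state, action, reward, next_state))

    def sample(self, batch_size: int) -> list:
        """Draw a random minibatch of stored transitions for training."""
        return random.sample(list(self.pool), batch_size)

    def sequence(self, current_state) -> list:
        """Experience sequence: the last H (state, action) pairs followed by
        the current state, mirroring the sequence fed to the two networks."""
        recent = list(self.pool)[-self.history:]
        seq = [(s, a) for (s, a, _, _) in recent]
        seq.append(current_state)
        return seq
```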
The current state–action pair is fed into the E-network to obtain the network's estimate of the $E$-value. Then, the target $E$-value is calculated, and the difference between the estimated and target $E$-values is used to update the weights $\theta_E$ of the E-network. The loss function of the $E$-value, $L\left(\theta_E\right)$, is defined as follows:

$$L\left(\theta_E\right) = \mathbb{E}\left[\left(\phi^{(k)} + \eta \max_{a'} E\left(\varphi^{(k+1)}, a'; \theta_E^{-}\right) - E\left(\varphi^{(k)}, a^{(k)}; \theta_E\right)\right)^{2}\right], \tag{18}$$

where $\theta_E^{-}$ denotes the target E-network weights.
During training, we use a stochastic gradient descent algorithm to update the weights of the convolutional neural network (CNN). The CNN evaluates each strategy as a $Q$-value so that the agent can choose the optimal action based on the current state. By minimizing the mean square error between the network's estimated $Q$-value and the optimal target $Q$-value, the agent updates the Q-network weights $\theta_Q$ and improves its performance in the environment, with the loss function $L\left(\theta_Q\right)$ given by

$$L\left(\theta_Q\right) = \mathbb{E}\left[\left(u^{(k)} + \gamma \max_{a'} Q\left(\varphi^{(k+1)}, a'; \theta_Q^{-}\right) - Q\left(\varphi^{(k)}, a^{(k)}; \theta_Q\right)\right)^{2}\right], \tag{19}$$

where $\theta_Q^{-}$ denotes the target Q-network weights.
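The PyTorch sketch below shows the shared form of these two losses, under the assumption that each network maps an experience sequence to one value per action; the use of a frozen target network is a standard DQN stabilization detail, not a design choice spelled out in the text.

```python
import torch
import torch.nn.functional as F

def td_loss(net, target_net, seq, action, signal, next_seq, decay: float):
    """Mean-squared TD error shared by the Q- and E-networks.

    For the Q-network, `signal` is the utility u and `decay` the discount
    factor; for the E-network, `signal` is the immediate risk level and
    `decay` the risk decay factor.
    """
    estimate = net(seq).gather(1, action.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = signal + decay * target_net(next_seq).max(dim=1).values
    return F.mse_loss(estimate, target)

# One SGD step on a sampled minibatch (networks and optimizer created elsewhere):
# loss = td_loss(q_net, q_target, seq, a, u, next_seq, decay=0.99)
# optimizer.zero_grad(); loss.backward(); optimizer.step()
```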
The process is repeated $H$ times to update $\theta_Q$ and $\theta_E$. We also adopt transfer learning to initialize the weights of the two deep CNNs, which improves training efficiency and avoids random exploration at the beginning of learning. In the traditional DQN algorithm, the usual approach is to calculate the target $Q$-value and select the action with the highest $Q$-value as the current policy choice. In the SDLPTO algorithm, by contrast, a risk assessment is performed during action selection to avoid choosing high-risk actions. The detailed safe DQN-based location-privacy-preserved task offloading procedure is described in Algorithm 1.
Algorithm 1 Safe DQN-based LPTO (SDLPTO)

1: Initialize the real distance, $\theta_Q$, $\theta_E$, $\mathcal{D}$, and $\varphi^{(0)}$ according to transfer learning
2: for $k = 1, 2, \dots$ do
3:   Observe the system state $s^{(k)} = [n^{(k)}, h^{(k)}]$
4:   Input the experience sequence $\varphi^{(k)}$ to the Q-network and E-network to estimate the Q-values and E-values
5:   Select $a^{(k)} = [x^{(k)}, \epsilon^{(k)}]$ based on the offloading policy and location perturbation policy obtained from the networks
6:   Obtain the privacy leakage level $P^{(k)}$ based on (8) at the current privacy budget
7:   Obtain the channel condition $h^{(k)}$ based on (3) at the current perturbation distance
8:   Calculate the average cost and privacy leakage to obtain the utility $u^{(k)}$
9:   Update the weights of the CNNs for $\theta_Q$ and $\theta_E$ by applying minibatch updates via (18) and (19)
10: end for
5. Simulation Setup and Results
In this section, we evaluate the performance advantage of our proposed scheme through simulation experiments. In the task offloading scenario, we assume one user and one edge server, with an MEC server coverage range of 500 m [7]. The mobile user is randomly distributed within this area, and the path loss exponent is set to 0.2. All experiments are implemented in Python 3.8 on the same machine, with 16 GB RAM and an Intel(R) Core(TM) i5-12500 processor.
The learning rate of the agent is set to 0.004, the discount factor is set to 0.99, and we train the agent for 4000 time slots, each with a duration of 1 s, similar to [35]. These parameters were determined through multiple experiments, enabling us to fine-tune them for the best simulation performance. The remaining parameter settings are shown in Table 1. Adjusting the system environment parameters might change the numerical results, but it does not alter the overall trends and advantages of our approach.
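For reference, the parameters quoted above can be collected in a single configuration object; this is only a convenience sketch, and the remaining values are given in Table 1.

```python
# Simulation parameters stated in the text; the rest are listed in Table 1.
CONFIG = {
    "learning_rate": 0.004,
    "discount_factor": 0.99,
    "training_slots": 4000,
    "slot_duration_s": 1,
    "coverage_radius_m": 500,
    "path_loss_exponent": 0.2,
}
```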
When the privacy parameter set by the user is larger, the level of privacy leakage is lower, so the user tends to choose a larger privacy parameter to protect location privacy. On the other hand, the perturbed distance between the user and the server may then increase, which means that the average cost incurred by the user will be higher.
As shown in Figure 4a, the computational cost increases with the privacy weight parameter. When this parameter is high, a larger perturbation range is required to perturb the user's location. Because the perturbed distance between the user and the server may deviate from, and in particular exceed, the real distance, more computation is needed when offloading tasks, which increases the computational cost. As the parameter increases, the privacy leakage level of the user's location decreases. This parameter thus balances the trade-off between the user's privacy leakage level and the computational cost, reflecting the user's concern about location privacy protection. A higher value indicates that the user pays more attention to location privacy protection and requires a larger perturbation range. When the perturbation area becomes larger, the distance between the perturbed pseudo-location and the actual location may grow, making it difficult for attackers to infer the user's actual location and thus protecting the user's location privacy.
Figure 4b shows that the computational cost also increases with the perturbation distance parameter. As this distance grows, the perturbation region becomes larger, and the user's perturbed location is more likely to be farther from the true location. When the perturbed distance between the user and the server becomes greater, the wireless channel state may worsen, the user may execute more tasks locally, and the offloading strategy may be suboptimal, all of which increase the computational cost. The figure also shows that the user's location privacy leakage level decreases as the distance parameter increases: with a larger perturbation region, the attacker must search a broader area to determine the user's real location, which greatly increases the difficulty, and lowers the probability, of the attacker finding the user's real location. In this way, the user's location privacy is protected.
Figure 5 and Figure 6 illustrate the performance of the proposed mechanism over time. As the figures show, at time slot 2000 our proposed SDLPTO mechanism outperforms the No DP and DPRL mechanisms, reducing the privacy leakage level by 18.2% and 11.2%, reducing the computational cost by 33.1% and 35.2%, and improving the utility by 33% and 27.2%, respectively. This is because No DP does not consider location privacy and DPRL handles location privacy protection and offloading optimization separately, whereas our proposed method jointly optimizes user privacy and computational offloading cost, effectively improving the overall user benefit. Moreover, compared with LPTO, SDLPTO reduces the privacy leakage level and computational cost by 7.7% and 26.7%, respectively, and improves the utility by 9.1%, because safe exploration avoids selecting actions with higher risk levels, thus reducing both privacy leakage and computational cost.
Figure 7a,b illustrate the relationship between average computational cost, privacy leakage, and task size for the four mechanisms. As the task size increases from 50 Kbit to 300 Kbit [35], SDLPTO exhibits a 6.0% increase in privacy leakage level and a 5.2-fold increase in computation cost. This indicates that as the task scale grows, more computational resources are required to execute these tasks, significantly raising the computation cost. Moreover, as the task size increases, users tend to offload more tasks to edge servers, which entails greater collection and processing of location data and potentially increases the risk of location privacy leakage.
In contrast, for the No DP and DPRL mechanisms, only the computational cost is affected by the increased task size, while their privacy leakage level remains consistently high. When the task size reaches 300 Kbit, SDLPTO achieves a 3.6% lower privacy leakage level and an 11.2% lower computational cost than LPTO. This indicates that even with increased task size, our proposed approach can still effectively balance the trade-off between user privacy requirements and computational cost, reducing both the privacy leakage level and the average computational cost.
Figure 7c,d illustrate the average performance of the four mechanisms for different perturbation range sizes. As the perturbation range increases, the user's real location may be perturbed to a more distant position. With a greater perturbation distance, according to Equation (3), the signal undergoes attenuation, interference, and other effects during transmission, resulting in degraded channel conditions and a higher average computational cost. For example, when the perturbation range increases from 50 to 250, LPTO and SDLPTO experience increases of 39.8% and 40.5% in average computational cost, respectively. Simultaneously, as the range increases, the users' privacy leakage level decreases; the No DP mechanism, however, does not consider location privacy protection, so its privacy leakage level remains consistently high. Despite the higher costs at larger ranges, our proposed mechanism still outperforms the others. For example, at a perturbation range of 250, SDLPTO achieves a 15.2% reduction in privacy leakage level and a 5.9% decrease in average cost compared to DPRL.