Effective Route Recommendation Leveraging Differentially Private Location Data

Kim, Jongwook

doi:10.3390/math12192977


by

Jongwook Kim

Department of Computer Science, Sangmyung University, Seoul 03016, Republic of Korea

Mathematics 2024, 12(19), 2977; https://doi.org/10.3390/math12192977

Submission received: 12 September 2024 / Revised: 22 September 2024 / Accepted: 24 September 2024 / Published: 25 September 2024

(This article belongs to the Special Issue Applied Cryptography and Blockchain)


Abstract

The proliferation of GPS-enabled devices and advances in positioning technologies have greatly facilitated the collection of user location data, making them valuable across various domains. One of the most common and practical uses of these location datasets is to recommend the most probable route between two locations to users. Traditional algorithms for route recommendation rely on true trajectory data collected from users, which raises significant privacy concerns due to the personal information often contained in location data. Therefore, in this paper, we propose a novel framework for computing optimal routes using location data collected through differential privacy (DP)-based privacy-preserving methods. The proposed framework introduces a method for accurately extracting transitional probabilities from perturbed trajectory datasets, addressing the challenge of low data utility caused by DP-based methods. Specifically, to effectively compute transitional probabilities, we present a density-adjusted sampling method that enables the collection of representative data across all areas. In addition, we introduce an effective scheme to approximately estimate transitional probabilities based on sampled datasets. Experimental results on real-world data demonstrate the practical applicability and effectiveness of our framework in computing optimal routes while preserving user privacy.

Keywords:

route recommendation; transitional probability; density-adjusted sampling; differential privacy

MSC:

68P27

1. Introduction

The proliferation of GPS-enabled devices and advances in positioning technologies have made it easier to collect user location data, making them valuable in a variety of fields. GPS technology is now ubiquitous, embedded in smartphones, wearables, and vehicles, generating vast amounts of location data every day. These data have diverse applications, ranging from personalized services such as targeted advertising and location-based recommendations to broader applications in urban planning, traffic management, and public safety [1,2,3]. One of the most common and practical uses of these location datasets is route recommendation, which aims to recommend the optimal route between two points (i.e., start and end) to users, thereby improving the navigation experience and helping individuals find the most efficient paths [4,5].

Route recommendation has become a major focus in the field of location-based services. Numerous algorithms have been developed to process trajectory data and identify optimal routes using techniques from data mining, machine learning, and graph theory [6,7,8]. Most of these approaches rely on the availability of true trajectory data collected from a large number of users. In these approaches, accurate and comprehensive data are essential to compute optimal routes and generate highly effective recommendations. However, such data are not always available for all applications because collecting true location information from users raises significant privacy concerns. This is because location data often contain personal information, such as home or work addresses, hospital visits, and even political affiliations [9,10]. For example, analysis of location data could reveal a user’s frequent visits to a medical clinic, indicating specific health problems. As a result, users are becoming increasingly aware of the privacy implications of sharing their location data with application providers. This concern has led to a reluctance to provide true location information, limiting the availability of high-quality datasets for application purposes.

In response to these privacy concerns, considerable research has been conducted to develop methods for protecting location privacy when applications collect, process, and utilize user location information. Anonymity-based approaches, which aim to de-identify user data by transforming personal location information into more generalized data, were among the early solutions [11,12]. However, these methods have limitations regarding their effectiveness and the trade-offs between privacy and data utility. Among the various privacy-preserving techniques, differential privacy (DP) has emerged as a de facto standard. DP ensures that an arbitrary attacker, regardless of their background information, cannot determine whether a particular individual’s data are included in the dataset [13]. This provides strong privacy guarantees while still allowing for meaningful data analysis. As DP has become a popular solution, there have been extensive efforts to apply the concept of DP to the protection of location data [14,15,16].

A more practical solution for computing the optimal route is to utilize location data collected through privacy-preserving methods. In this scenario, data contributors who agree to share their location data provide obfuscated trajectories obtained using DP-based methods. The application then analyzes these privacy-preserved trajectories to extract the necessary information and computes the optimal route based on this extracted information. By employing such privacy-preserving techniques, the true location information of data contributors remains confidential, protecting their privacy while still enabling the computation of the optimal route.

However, a significant challenge arises from the fact that datasets collected under DP-based privacy-preserving methods often have low utility. DP techniques introduce noise to the true data, which diminishes their accuracy and makes it difficult to derive precise and reliable information. This trade-off between privacy and utility necessitates the development of innovative methods that can effectively extract necessary information from low-utility datasets. Thus, in this paper, we propose a method to effectively compute the optimal route using location data collected with DP-based privacy-preserving techniques. In particular, the contributions of this paper are summarized as follows:

We present a novel framework for effectively computing and recommending an optimal route in a privacy-preserving manner. In particular, we use metric DP [15,17,18], which extends standard DP to handle data with inherent metric or distance measures, such as location data. To the best of our knowledge, this is the first work to compute an optimal route from location datasets collected under metric DP.
We present a method for effectively extracting transitional probabilities from perturbed trajectory datasets. Directly computing transitional probabilities from perturbed data, which often has low utility, can lead to inaccurate results. Therefore, we propose a novel approach to compute transitional probabilities more effectively from the perturbed trajectory data. Specifically, we first present a density-adjusted sampling method to collect representative data from users. This approach ensures that the collected data comprehensively cover all areas. We then introduce a scheme to approximately estimate transitional probabilities based on sampled datasets.
Finally, through experiments on real data, we demonstrate that the proposed framework can effectively compute the optimal route with perturbed trajectory datasets, which highlights its practical applicability and effectiveness.

The remainder of this paper is organized as follows: Section 2 presents related work. Section 3 provides the necessary background information and formally defines the problem addressed in this paper. Section 4 presents our proposed framework for computing optimal routes. In Section 5, we experimentally evaluate the proposed approach using real datasets. The conclusion is presented in Section 6.

2. Related Work

There has been significant research focused on computing and recommending optimal routes between two locations using trajectory data collected from numerous users. Chen et al. [19] proposed the Coherence Expanding algorithm, which models a transfer network using historical trajectory data. This network is then used to compute transfer probabilities. In addition, they introduced the Maximum Probability Product algorithm to identify the most popular route from the constructed transfer network. Shafique and Ali [20] developed a technique to recommend the most popular path within a region of interest, which is defined by the areas most frequently visited by users around points of interest (POIs). Their method processes historical trajectory data by segmenting trajectories into smaller parts and removing noisy data points to enhance accuracy. Yochum et al. [21] proposed an itinerary recommendation system that suggests optimal routes between POIs. For itinerary recommendation, they developed an adaptive genetic algorithm that incorporates various factors such as the popularity of POIs, time constraints, and the length of the trip to optimize itinerary planning. Liu et al. [22] developed a method for computing the most probable routes with travel cost estimation. They introduced a popular traverse graph to summarize historical trajectories independent of road network information. The authors then created a self-adaptive method to model travel costs for each route over different time intervals, and proposed an efficient algorithm to find the optimal route between source and destination locations based on this graph. Wang et al. [23] proposed a route planning method that recommends optimal routes based on driver preferences. Their proposed method considers various attributes that influence a driver’s choice, such as scenery, congestion, and traffic flow, to recommend popular routes.

There has been extensive research on the efficient computation of optimal routes. Classical methods often use search algorithms, such as dynamic programming, to reduce the search space [24]. For instance, Huang et al. [25] introduced a dynamic programming method for providing mobile sequential recommendations to taxi drivers. Teng et al. [26] addressed a joint optimization problem for POI recommendations that maximizes the diversity of POIs within a given travel budget, such as distance or cost constraints. More recently, advancements in deep learning have led to the development of methods that utilize modern techniques like graph neural networks and generative models to compute optimal routes [6,27,28,29]. Huang et al. [6] proposed a multi-task route recommendation framework that leverages deep learning techniques to extract path representations. They utilized beam search algorithms to provide recommendations for multiple related tasks. Wang et al. [27] proposed a novel approach to personalized route recommendation by utilizing neural networks to automatically learn the cost functions in the A* algorithm. Their method leverages attention-based recurrent neural networks to effectively model the travel cost from the source to the candidate location. By incorporating contextual information, the recommended routes are adjusted to accommodate the individual’s specific needs and preferences.

We note that the existing approaches for calculating and recommending optimal and popular routes rely on users’ true location datasets, which inherently raise significant privacy concerns. However, the method proposed in this paper takes a fundamentally different approach by utilizing privacy-preserving datasets collected under the DP-based method, thereby ensuring the protection of users’ location privacy. By leveraging DP-based techniques, the proposed algorithm effectively mitigates the risks associated with sharing sensitive location data while still providing effective route recommendations.

Several efforts have been made to ensure the privacy of the user’s location through the use of DP in the context of trajectory data in various applications. Kim et al. [30] proposed a method for recommending top-n routes in indoor spaces using local differential privacy (LDP), which is a localized version of DP, to ensure privacy preservation. Du et al. [16] developed a privacy-preserving framework for generating synthetic trajectory data, employing LDP to collect necessary information such as transitional probabilities and the distribution of start and end locations. SPIREL is a privacy-preserving POI recommendation framework that identifies POIs of potential interest to users [31]. To generate these recommendations, SPIREL uses trajectory data along with user check-in history collected using LDP to ensure user privacy. RNN-DP [32] is a dynamic data publishing framework that protects user privacy when publishing sensitive trajectory data. RNN-DP combines recurrent neural networks with DP to protect user privacy while publishing trajectory data.

Recently, with the development and widespread use of generative AI techniques, several efforts have been made to apply these methods to route and trajectory data. Among these, synthetic route data generation has emerged as a particularly popular area, where generative AI is used to generate large amounts of synthetic route data [33,34]. These data can be used in various applications, including optimal route recommendations. Moreover, some approaches in this domain incorporate DP to ensure privacy protection when generating synthetic routes using generative AI. For example, DP-Loc [35] is a DP-based approach that utilizes a variational autoencoder and DP to generate synthetic routes with time information. Similarly, DP-TrajGAN [36] is a privacy-preserving trajectory generation model that combines DP with a generative adversarial network.

3. Background and Problem Definition

3.1. Background

In this subsection, we present the background information for this paper. Recently, DP has become the standard approach for privacy-preserving data processing. DP is based on a formal mathematical framework that provides a probabilistic privacy guarantee, even against attackers with arbitrary background knowledge [13]. It ensures that an attacker cannot confidently determine whether a given individual is included in the published dataset. DP is formally defined as follows:

Definition 1.

($\epsilon$-DP) A randomized algorithm $A$ satisfies $\epsilon$-DP if and only if the following condition is satisfied for any two neighboring datasets, $D_1$ and $D_2$, and any possible output $O$ of $A$:

$$\Pr[A(D_1) = O] \leq e^{\epsilon} \times \Pr[A(D_2) = O]. \tag{1}$$

Here, two datasets, $D_1$ and $D_2$, are considered neighbors if they differ by only one record. This definition implies that for any output of $A$, an adversary, regardless of his background knowledge, cannot confidently determine whether $D_1$ or $D_2$ was used as input. The parameter $\epsilon$, known as the privacy budget, controls the level of privacy: smaller $\epsilon$ values provide stronger privacy protection by adding more noise to the result, while larger $\epsilon$ values provide weaker privacy protection with less noise.
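To make Equation (1) concrete, the following is a minimal sketch that checks the inequality exhaustively for binary randomized response. This textbook mechanism is not used in this paper; it is chosen here only because its output distribution is small enough to enumerate. The function names are hypothetical, and each "dataset" is assumed to be a single bit, so the two neighboring datasets are the bit values 0 and 1.

```python
import math

def randomized_response_probs(epsilon: float) -> dict:
    """Return Pr[output | true bit] keyed by (true_bit, output)."""
    # The true bit is reported with probability e^eps / (1 + e^eps).
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return {
        (0, 0): p_truth, (0, 1): 1.0 - p_truth,   # true bit is 0
        (1, 0): 1.0 - p_truth, (1, 1): p_truth,   # true bit is 1
    }

def max_privacy_ratio(probs: dict) -> float:
    """Largest ratio Pr[A(D1) = O] / Pr[A(D2) = O] over all outputs and input pairs."""
    return max(
        probs[(b1, o)] / probs[(b2, o)]
        for o in (0, 1) for b1 in (0, 1) for b2 in (0, 1)
    )

epsilon = 1.0  # assumed privacy budget
ratio = max_privacy_ratio(randomized_response_probs(epsilon))
print(ratio <= math.exp(epsilon) + 1e-12)
```

In this sketch the worst-case ratio equals $e^{\epsilon}$, so a smaller budget pushes the two output distributions closer together, which matches the role of $\epsilon$ described above.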

There have been several proposals to extend standard DP to handle data with inherent metric or distance measures [15,37,38]. Among them, geo-indistinguishability (Geo-Ind) is widely recognized as the standard privacy definition for protecting location data in location-based services [14,15]. In this paper, we use Geo-Ind to collect the location data of the users in a privacy-preserving manner. Geo-Ind is formally defined as follows:

Definition 2.

($\epsilon$-Geo-Ind) Consider $X$ as the set of possible user locations and $Y$ as the set of reported locations, which are typically assumed to be equal. Let $K$ be a randomized mechanism that generates a perturbed location from a user’s true location. A randomized mechanism $K$ satisfies $\epsilon$-Geo-Ind if and only if the following condition holds for all $x_1, x_2 \in X$ and any output location $y \in Y$:

$$K(x_1)(y) \leq e^{\epsilon \cdot d(x_1, x_2)} \times K(x_2)(y), \tag{2}$$

where $d(x_1, x_2)$ corresponds to the distance between $x_1$ and $x_2$.

There are two primary methods for implementing Geo-Ind: the Laplace mechanism and the matrix-based mechanism. It is well known that the matrix-based mechanism is more effective than the Laplace method [15]. In the matrix-based mechanism, the space is first divided into a set of disjoint grids and the data aggregator computes an obfuscation matrix $M$ that satisfies $\epsilon$-Geo-Ind. This matrix is then distributed to users, who use it to perturb their location data according to the probabilities specified in $M$. Users report the perturbed location to the aggregator instead of their true location. Several approaches have been proposed in the literature to compute the obfuscation matrix that satisfies $\epsilon$-Geo-Ind.
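The user-side step of the matrix-based mechanism amounts to sampling a reported grid from the row of $M$ that corresponds to the true grid. The sketch below illustrates only that sampling step. The function name `perturb_location` and the matrix `M_example` are hypothetical, and the matrix entries are illustrative values that were not computed to satisfy any particular $\epsilon$.

```python
import random

def perturb_location(true_index, M, rng=random):
    """Sample a reported grid index from row `true_index` of the obfuscation matrix M."""
    row = M[true_index]
    # random.choices draws one index with probability proportional to the row entries.
    return rng.choices(range(len(row)), weights=row, k=1)[0]

# Hypothetical 3-grid matrix; row u holds Pr[report v | true location u] and sums to 1.
M_example = [
    [0.6, 0.3, 0.1],
    [0.2, 0.6, 0.2],
    [0.1, 0.3, 0.6],
]

rng = random.Random(0)  # seeded only to make the illustration repeatable
reported = [perturb_location(0, M_example, rng) for _ in range(5)]
print(reported)
```

Only the sampled index leaves the device in this sketch, which is the property the aggregator relies on when it later works with reported locations instead of true ones.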

3.2. Problem Definition

In this subsection, we formally define the problem and introduce key terminology necessary to describe the algorithm proposed in this paper. Our focus is to identify and recommend an optimal route between two points, which is a common service provided by location-based services, in a privacy-preserving manner.

Assume that the entire area is partitioned into $m_1 \times m_2$ grids. Let $G = \{g_1, g_2, \dots, g_m\}$ be a set of grids where $m = m_1 \times m_2$. Users’ locations are represented by the grid in which they are located. Let $N(g_i) \subset G$ denote the set of neighboring grids for $g_i$. Without loss of generality, in this paper, we assume that users can move only between neighboring grids.
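The grid model can be sketched in a few lines. The text does not specify how grids are numbered or which cells count as neighbors, so the row-major numbering and the 8-cell neighborhood below are assumptions made for illustration, and both function names are hypothetical.

```python
def grid_index(x, y, m2):
    """Map a 1-based (row x, column y) pair to a 1-based grid number i for g_i.

    Row-major numbering is an assumption, not something the paper specifies.
    """
    return (x - 1) * m2 + y

def neighbors(x, y, m1, m2):
    """Return N(g) as (row, column) pairs under an assumed 8-cell neighborhood."""
    result = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            if dx == 0 and dy == 0:
                continue  # the cell itself is excluded in this sketch
            nx, ny = x + dx, y + dy
            if 1 <= nx <= m1 and 1 <= ny <= m2:
                result.append((nx, ny))
    return result

print(grid_index(2, 1, 3))           # grid number of row 2, column 1 in a 3-column layout
print(len(neighbors(1, 1, 3, 3)))    # a corner cell has fewer neighbors than an interior cell
```

Switching to a 4-cell neighborhood, or allowing a user to stay in the same grid, would only change the offsets that the two loops enumerate.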

Let $R$ be a route (or trajectory) of length $|R|$, and let $R[l]$ denote the $l$-th location on the route $R$. Given a starting point $g_s \in G$ and an ending point $g_e \in G$ specified by a user query, let $SR_{s,e}$ represent the set of all possible routes from $g_s$ to $g_e$. That is, each route $R \in SR_{s,e}$ satisfies $R[0] = g_s$ and $R[|R|] = g_e$. For each route $R \in SR_{s,e}$, let $\Pr(t_{s,e} \mid R)$ denote the likelihood that users located at $g_s$ will move to $g_e$ via route $R$. Here, $t_{s,e}$ represents the transition state from grid $g_s$ to grid $g_e$. The problem addressed in this paper is to find an optimal route $R^{*}$ such that the following condition is satisfied:

$$R^{*} = \underset{R \in SR_{s,e}}{\arg\max} \Pr(t_{s,e} \mid R) \tag{3}$$

In other words, the objective of this paper is to compute the most probable route $R^{*}$ from $g_s$ to $g_e$ that maximizes the likelihood of users traveling between these points. This computation is based on the transitional probabilities derived from location data collected under the Geo-Ind framework.

4. Privacy-Preserving Computation of Optimal Route

In this section, we introduce the proposed method for effectively computing an optimal route that satisfies user queries in a privacy-preserving manner. Figure 1 provides an overview of our approach:

The server first computes the obfuscation matrix, $M$, which satisfies $\epsilon$-Geo-Ind, and distributes it to the users (Section 4.1).
Each user selects a subtrajectory from their entire trajectory, perturbs each location along the subtrajectory using M, and sends the perturbed subtrajectory to the server (Section 4.2).
The server estimates the transitional probabilities between neighboring points based on the perturbed subtrajectory datasets collected from users (Section 4.3).
Based on the estimated transitional probabilities, the server computes the optimal route in response to a user query and recommends this route to the user (Section 4.4).

4.1. Computing Obfuscation Matrix

A user’s trajectory consists of multiple locations. Therefore, in order to collect trajectory information from users in a privacy-preserving manner using Geo-Ind, it is essential to apply the composition property of DP [13]. This requires dividing the privacy budget, $\epsilon$, into multiple sub-privacy budgets, each of which is then used to perturb an individual location along the trajectory.

One straightforward scheme is to divide $\epsilon$ by the length of the longest trajectory among users. For example, if the longest trajectory has a length of $L_{max}$, each user first divides $\epsilon$ into $L_{max} + 1$ sub-privacy budgets (i.e., $\frac{\epsilon}{L_{max} + 1}$), and then uses each sub-privacy budget to perturb the respective locations along the user’s trajectory. However, because most users’ trajectories are significantly shorter than $L_{max}$, this scheme can lead to an ineffective use of the privacy budget, which leads to low data utility for the collected data. Thus, to address this issue, this paper adopts a method in which, instead of collecting all location information along a trajectory, the aggregation server collects subtrajectories of length $L$ (where $L < L_{max}$). This approach is a common solution in DP-based methods to effectively utilize the limited privacy budget when collecting multiple datasets [16].
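As a worked illustration with hypothetical values (not taken from the experiments in this paper), take $\epsilon = 1$, a longest trajectory of length $L_{max} = 99$, and a subtrajectory length $L = 4$. The per-location budgets under the two schemes would then be:

```latex
\frac{\epsilon}{L_{max} + 1} = \frac{1}{100} = 0.01,
\qquad
\frac{\epsilon}{L + 1} = \frac{1}{5} = 0.2
```

Under these assumed numbers, each reported location receives a budget 20 times larger when only a subtrajectory is collected, which is the utility gain that motivates the subtrajectory scheme.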

As explained in Section 3.1, there are two approaches to implement Geo-Ind. In this paper, we utilize the matrix-based mechanism to collect users’ trajectory information in a privacy-preserving manner, especially by utilizing the optimization mechanism proposed in [15]. However, we note that the method proposed in this paper is general enough to be applied to any matrix-based mechanism. In the optimization mechanism, the server first computes the obfuscation matrix, $M$, by solving a linear programming problem. Let us assume that $\pi_G$ is the prior probability distribution of the possible locations of users. The obfuscation matrix, $M$, which satisfies $\frac{\epsilon}{L + 1}$-Geo-Ind, is then obtained by solving the following linear programming problem [15]:

$$\begin{aligned} \min: \quad & \sum_{g_u, g'_v \in G} \pi_G(g_u) \cdot M[u, v] \cdot d(g_u, g'_v) \\ \text{s.t.}: \quad & M[u, v] \leq e^{\frac{\epsilon}{L + 1} \cdot d(g_u, g_w)} \times M[w, v], && g_u, g_w, g'_v \in G \\ & \sum_{g'_v \in G} M[u, v] = 1, && g_u \in G \\ & M[u, v] \geq 0, && g_u, g'_v \in G \end{aligned} \tag{4}$$

Here, $M[u, v]$ represents the probability that a perturbed location $g'_v$ is randomly generated from the user’s true location $g_u$ by Geo-Ind (in this paper, we denote the true location by $g$ and the perturbed location by $g'$ to distinguish between the user’s true location and the perturbed location). The prior probability distribution of the user’s possible locations, $\pi_G$, can be computed using available historical data. If such data are not available, a uniform distribution can be used to define the prior. Once $M$ is computed, it is distributed to each user.
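The constraints and objective of Equation (4) can be checked directly for any candidate matrix. The sketch below does not solve the linear program; solving it would need an LP solver such as `scipy.optimize.linprog`. Instead, it builds a feasible but generally suboptimal matrix from a row-normalized exponential kernel, a swapped-in construction that is not the optimization mechanism of [15], and then verifies it against the three constraints. The grid size, the values of $\epsilon$ and $L$, the uniform prior, and all function names are assumptions.

```python
import math
from itertools import product

def euclidean(a, b):
    """Distance d between two grid cells given as (row, column) pairs."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def exponential_matrix(cells, eps_loc):
    """Row-normalized kernel exp(-eps_loc * d / 2); feasible for Eq. (4), not its optimum."""
    M = []
    for u in cells:
        row = [math.exp(-eps_loc * euclidean(u, v) / 2.0) for v in cells]
        total = sum(row)
        M.append([p / total for p in row])
    return M

def is_feasible(M, cells, eps_loc, tol=1e-9):
    """Check the three constraints of Eq. (4) for every (u, w, v) triple."""
    n = len(cells)
    rows_ok = all(abs(sum(M[u]) - 1.0) <= tol and min(M[u]) >= 0.0 for u in range(n))
    geo_ok = all(
        M[u][v] <= math.exp(eps_loc * euclidean(cells[u], cells[w])) * M[w][v] + tol
        for u, w, v in product(range(n), repeat=3)
    )
    return rows_ok and geo_ok

def expected_distortion(M, cells, prior):
    """Objective of Eq. (4): prior-weighted expected distance between true and reported cells."""
    n = len(cells)
    return sum(prior[u] * M[u][v] * euclidean(cells[u], cells[v])
               for u in range(n) for v in range(n))

cells = [(x, y) for x in range(1, 4) for y in range(1, 4)]  # assumed 3 x 3 grid
eps, L = 2.0, 4                                              # assumed budget and subtrajectory length
eps_loc = eps / (L + 1)                                      # per-location budget eps / (L + 1)
M = exponential_matrix(cells, eps_loc)
prior = [1.0 / len(cells)] * len(cells)                      # uniform prior
print(is_feasible(M, cells, eps_loc), expected_distortion(M, cells, prior))
```

The kernel uses half of the per-location budget in the exponent because the row normalization can contribute a second factor of up to $e^{\frac{\epsilon}{2(L+1)} \cdot d}$ to the ratio in the first constraint. An LP solution would typically achieve a lower expected distortion for the same budget.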

4.2. Density-Adjusted Sampling for Collecting User’s Trajectory Information

After receiving the obfuscation matrix $M$ from the server, each user first selects a subtrajectory of length $L$ from their entire trajectory, and then perturbs each location along the selected subtrajectory according to the probabilities embedded in $M$. A straightforward approach is either to select the first $(L + 1)$ consecutive locations and ignore the rest [16], or to randomly select $(L + 1)$ consecutive locations from the entire trajectory. However, such strategies often fail to collect representative data for computing transitional probabilities across all areas because they do not account for the distribution of users. Typically, users are densely concentrated in the central region and sparsely distributed in areas further from the center [39]. As a result, these simple selection strategies may collect too little data to accurately compute transitional probabilities in regions further from the central area.

Figure 2 shows an illustrative example where the original data are densely concentrated near the center of the region and become sparsely distributed as the distance from the center increases. In this scenario, using random sampling may result in insufficient data for regions farther from the center, making it difficult to compute accurate transition probabilities. Therefore, in order to collect samples that are representative of the entire region, it is necessary to consider the distribution of the original data and adjust the sampling probabilities accordingly.

Thus, in this paper, we introduce a density-adjusted sampling method to collect sufficient representative data for computing transitional probabilities across all areas. The proposed technique assumes that the distribution of user locations follows a Gaussian distribution. We first use the inverse of the value of the Gaussian probability density function as the weight for each location along the trajectory, and then randomly select a subtrajectory based on the weight of each location. This strategy assigns higher weights to locations farther away from the central region, preventing the sampled location data from being overly concentrated in the central region and resulting in a more evenly distributed dataset over the entire area.

Let $g_{\langle x, y \rangle}$ be a two-dimensional grid representation corresponding to the grid at the $x$-th row and the $y$-th column, where $1 \leq x \leq m_1$ and $1 \leq y \leq m_2$. Since the entire domain is divided into $m_1 \times m_2$ grids, the grid corresponding to the center of the area is represented as $g_{\langle \frac{m_1}{2}, \frac{m_2}{2} \rangle}$. Then, the Gaussian probability density function centered at $g_{\langle \frac{m_1}{2}, \frac{m_2}{2} \rangle}$ is defined as follows:

$$\Pr(g_{\langle x, y \rangle}) = \frac{1}{2 \pi \sigma_a \sigma_b} \exp\left(-\frac{1}{2}\left[\left(\frac{x - \frac{m_1}{2}}{\sigma_a}\right)^{2} + \left(\frac{y - \frac{m_2}{2}}{\sigma_b}\right)^{2}\right]\right) \tag{5}$$

Here, $\sigma_a$ and $\sigma_b$ represent the standard deviations in the row and column directions, respectively.

Given the user’s trajectory $R$, a weight $w_i$ for the $i$-th location $R[i]$ is defined using the Gaussian probability density function as follows:

$$w_i = \frac{\alpha}{\Pr(\mathrm{grid}(R[i]))} \tag{6}$$

Here, $\mathrm{grid}()$ represents the function that maps the location of the trajectory to a two-dimensional grid representation such as $g_{\langle x, y \rangle}$, and $\alpha$ denotes a predefined parameter. In other words, this weighting scheme adjusts the importance of each location along the trajectory based on its distance from the center of the area, with locations further from the center receiving higher weights.

The parameters $\sigma_a$ and $\sigma_b$ in Equation (5) control the spread of the Gaussian distribution, determining how quickly the density falls off as the distance from the center increases. Larger values of $\sigma_a$ and $\sigma_b$ produce a flatter, more uniform distribution, while smaller values produce a steeper distribution that concentrates more probability mass near the center. Since the inverse of the Gaussian probability density function is used as the weight for each location in Equation (6), setting higher values for $\sigma_a$ and $\sigma_b$ flattens the distribution of the weights $w_i$. On the other hand, lower values for $\sigma_a$ and $\sigma_b$ produce a steeper weight distribution, assigning significantly higher weights to locations farther from the center. By adjusting these parameters, we can control how much more frequently peripheral locations are sampled. We also note that the choice of these parameter values can be empirically learned from available historical data.
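The effect of $\sigma_a$ and $\sigma_b$ on the weights can be seen numerically. The sketch below evaluates Equations (5) and (6) on an assumed $10 \times 10$ grid with $\alpha = 1$ and compares the weight of a corner grid with that of the central grid for two assumed spreads. The function names and all parameter values are illustrative.

```python
import math

def gaussian_density(x, y, m1, m2, sigma_a, sigma_b):
    """Equation (5): Gaussian density centered at (m1 / 2, m2 / 2)."""
    zx = (x - m1 / 2.0) / sigma_a
    zy = (y - m2 / 2.0) / sigma_b
    return math.exp(-0.5 * (zx * zx + zy * zy)) / (2.0 * math.pi * sigma_a * sigma_b)

def location_weight(x, y, m1, m2, sigma_a, sigma_b, alpha=1.0):
    """Equation (6): weight is alpha divided by the density at the location's grid."""
    return alpha / gaussian_density(x, y, m1, m2, sigma_a, sigma_b)

m1 = m2 = 10  # assumed grid size, so the center is at (5, 5)
for sigma in (2.0, 5.0):
    center = location_weight(5, 5, m1, m2, sigma, sigma)
    corner = location_weight(1, 1, m1, m2, sigma, sigma)
    print(f"sigma={sigma}: corner/center weight ratio = {corner / center:.1f}")
```

With these assumed values the corner-to-center ratio is $e^{16/\sigma^2}$, roughly 55 for $\sigma = 2$ and roughly 1.9 for $\sigma = 5$, which shows how a smaller spread pushes the sampling much harder toward peripheral locations.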

Then, the weight of a subtrajectory of length $L$ starting at $R[i]$ (i.e., the subtrajectory consisting of locations $R[i], R[i + 1], \dots, R[i + L]$) is defined as follows:

$$sw_i = \sum_{k = 0}^{L} w_{i + k}. \tag{7}$$

In other words, by summing the weights of the locations along each subtrajectory, those consisting of locations farther from the center of the area receive higher overall weights than those closer to the center. As a result, when subtrajectories are randomly selected based on these weights, those farther from the center are more likely to be selected. Thus, even though the density of user locations may be higher in the central area, the proposed scheme ensures a more representative dataset across the entire area by giving a higher chance of selection to subtrajectories in less densely populated regions.

Algorithm 1 presents the pseudocode for density-adjusted sampling, which is performed on the user side. The input to Algorithm 1 includes the obfuscation matrix $M$ (which satisfies $\frac{\epsilon}{L + 1}$-Geo-Ind), received from the server, a trajectory $R$, and the length of the subtrajectory $L$. In line 1, the list $W$, which will store the weights of the subtrajectories, is initialized as an empty list. In lines 2 through 4, subtrajectories of length $L$ are extracted from $R$ by moving a sliding window on $R$ from index 0 to $|R| - L$. The weight of each subtrajectory is computed based on Equation (7), and the computed weights are stored in the list $W$. In line 6, a subtrajectory is randomly sampled based on the weights of each subtrajectory. Then, in line 7, the sampled subtrajectory is perturbed by applying the obfuscation matrix $M$ to each location along the subtrajectory. Finally, the perturbed subtrajectory is sent to the server.

The time complexity of Algorithm 1 is dominated by the for loop, which iterates $|R| - L + 1$ times (once for each window start index from 0 to $|R| - L$). For each iteration, the weight of a subtrajectory of length $L$ is computed, resulting in a total complexity of $O((|R| - L + 1) \times L)$. If $L$ is much smaller than $|R|$, this simplifies to $O(|R| \times L)$, which means that the complexity of the algorithm grows linearly with the size of the trajectory and the length of the subtrajectory.

Algorithm 1: Pseudocode for density-adjusted sampling (user side processing)
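The following is a minimal Python sketch of the user-side procedure, reconstructed from the prose description of Algorithm 1 and from Equations (5) to (7). It is not the author's pseudocode. The function names are hypothetical, `perturb_fn` stands in for sampling from the obfuscation matrix $M$, and the trajectory, grid size, and $\sigma$ values in the usage lines are made up. Note that the Python list `R` holds $|R| + 1$ locations, so `len(R) - L` equals the number of windows.

```python
import math
import random

def inverse_gaussian_weight(cell, m1, m2, sigma_a, sigma_b, alpha=1.0):
    """Equations (5) and (6): alpha divided by the Gaussian density at `cell`."""
    x, y = cell
    zx = (x - m1 / 2.0) / sigma_a
    zy = (y - m2 / 2.0) / sigma_b
    density = math.exp(-0.5 * (zx * zx + zy * zy)) / (2.0 * math.pi * sigma_a * sigma_b)
    return alpha / density

def density_adjusted_sample(R, L, weight_fn, perturb_fn, rng=random):
    """Pick one window R[i..i+L] with probability proportional to sw_i, then perturb it."""
    if len(R) < L + 1:
        raise ValueError("trajectory must contain at least L + 1 locations")
    w = [weight_fn(cell) for cell in R]
    # Sliding window over start indices 0 .. |R| - L; each entry is sw_i from Equation (7).
    W = [sum(w[i:i + L + 1]) for i in range(len(R) - L)]
    start = rng.choices(range(len(W)), weights=W, k=1)[0]
    sub = R[start:start + L + 1]
    # Each location of the chosen subtrajectory is perturbed independently.
    return [perturb_fn(cell) for cell in sub]

# Hypothetical trajectory of (row, column) cells on an assumed 10 x 10 grid.
R = [(5, 5), (5, 6), (6, 6), (7, 7), (8, 8), (9, 9)]
weight = lambda c: inverse_gaussian_weight(c, 10, 10, 3.0, 3.0)
# The identity function below is a placeholder for the real perturbation with M.
reported = density_adjusted_sample(R, 2, weight, perturb_fn=lambda c: c, rng=random.Random(0))
print(reported)
```

In a real deployment `perturb_fn` would draw from the row of $M$ for each true grid, so that only the perturbed subtrajectory is sent to the server.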

4.3. Computing Transitional Probabilities

To effectively compute the optimal route, it is essential to model the intra-route mobility of real-world users accurately. For this purpose, we rely on a first-order Markov chain. A first-order Markov chain models the probability of a user moving from one location to another based only on their current location, without considering the history of their previous locations. According to this model, the $(l + 1)$-th location, $R[l + 1]$, depends only on the previous location, $R[l]$, rather than all previous locations:

$$\Pr(R[l + 1] \mid R[0], \dots, R[l]) = \Pr(R[l + 1] \mid R[l]) \tag{8}$$

This simplification makes it easier to compute the likelihood of routes when identifying the most probable route between two points.
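Under a first-order Markov chain, the likelihood of a route factorizes into one step probability per consecutive pair of locations. The sketch below assumes exactly that factorization and sums logarithms to avoid numerical underflow on long routes. The function name, the grid labels, and the probability values are hypothetical.

```python
import math

def route_log_likelihood(route, trans_prob):
    """Sum of log step probabilities over consecutive pairs; -inf if any step is impossible."""
    total = 0.0
    for g_i, g_j in zip(route, route[1:]):
        p = trans_prob.get((g_i, g_j), 0.0)
        if p <= 0.0:
            return float("-inf")
        total += math.log(p)
    return total

# Hypothetical step probabilities between four grids labeled a, b, c, d.
trans_prob = {("a", "b"): 0.5, ("b", "d"): 0.4, ("a", "c"): 0.5, ("c", "d"): 0.9}
routes = [["a", "b", "d"], ["a", "c", "d"]]
best = max(routes, key=lambda r: route_log_likelihood(r, trans_prob))
print(best)  # ['a', 'c', 'd'], since 0.5 * 0.9 > 0.5 * 0.4
```

Enumerating candidate routes as done here is only practical for tiny examples; it is meant to show the scoring rule, not a search strategy.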

Based on a first-order Markov chain, we define the transitional probability $\Pr(t_{i,j})$, which captures the probability that a user whose $l$-th location is $g_i$ will move to $g_j$ at the next $(l + 1)$-th location, as follows:

$$\Pr(t_{i,j}) = \begin{cases} \Pr(R[l + 1] = g_j \mid R[l] = g_i), & \text{if } g_j \in N(g_i) \\ 0, & \text{otherwise} \end{cases} \tag{9}$$

Here, $t_{i,j}$ represents the transition state from grid $g_i$ to grid $g_j$. It is obvious that the sum of the transitional probabilities to all neighboring grids $g_j \in N(g_i)$ equals 1 (i.e., $\sum_{g_j \in N(g_i)} \Pr(t_{i,j}) = 1$). Computing the transitional probability, $\Pr(t_{i,j})$, is straightforward for the true dataset. The challenge, however, is to compute the transitional probabilities from the perturbed dataset collected under Geo-Ind. Thus, in this subsection, we present a method to effectively estimate the transitional probabilities on the basis of the perturbed datasets.

Let $PR_{set}$ be the set of perturbed subtrajectories collected from users in the previous step. For each perturbed subtrajectory $PR \in PR_{set}$, the perturbed transitions can be obtained by capturing all pairs of consecutive locations within $PR$. Specifically, for a given perturbed subtrajectory $PR$ of length $L$, we can extract $L$ transitions (i.e., consecutive location pairs): $(PR[0], PR[1]), (PR[1], PR[2]), \dots, (PR[L - 1], PR[L])$. These perturbed transitions are then used to estimate the true transitions.
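The extraction of consecutive location pairs can be sketched as follows. This is a minimal illustration, assuming that each perturbed subtrajectory is a list of integer grid indices in $0, \dots, m - 1$; the function names `count_transitions` and `to_matrix` are hypothetical and do not come from the paper.

```python
from collections import Counter

def count_transitions(subtrajectories):
    """Count consecutive (from_cell, to_cell) pairs over all subtrajectories."""
    counts = Counter()
    for pr in subtrajectories:
        # A subtrajectory with L + 1 locations yields L consecutive pairs.
        for src, dst in zip(pr, pr[1:]):
            counts[(src, dst)] += 1
    return counts

def to_matrix(counts, m):
    """Arrange the pair counts as an m x m nested list t_hat[i][j]."""
    t_hat = [[0] * m for _ in range(m)]
    for (i, j), c in counts.items():
        t_hat[i][j] = c
    return t_hat

# Toy input (made up for illustration): three perturbed subtrajectories on 3 cells.
pr_set = [[0, 1, 2], [1, 2], [0, 1]]
t_hat = to_matrix(count_transitions(pr_set), m=3)
print(t_hat)
```

The resulting nested list plays the role of the observed frequency matrix built from the perturbed pairs.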

Let $\hat{T}$ be the $m \times m$ observed (perturbed) transition matrix, where $\hat{T}[i][j]$ represents the frequency of observing a transition from location $g_{i}$ to location $g_{j}$ in $PR_{set}$. Similarly, let $T$ be the true transition matrix that we aim to estimate, where $T[i][j]$ represents the true frequency of transitions from location $g_{i}$ to location $g_{j}$. The relationship between $T$ and $\hat{T}$ can be described as follows:

$\hat{T} = M^{\top} \cdot T \cdot M$

(10)

In other words, multiplying $T$ by $M$ on the right applies the perturbation to the current location (i.e., $R[l + 1]$), while multiplying $T$ by $M^{\top}$ on the left applies the perturbation to the previous location (i.e., $R[l]$). Given $\hat{T}$ and $M$, the true transition matrix $T$ can be estimated as follows:

$\begin{array}{ll} \min_{T} & \| \hat{T} - M^{\top} T M \|_{F}^{2} \\ \text{subject to} & T[i][j] \geq 0 \quad \forall g_{i}, g_{j} \in G, \\ & T[i][j] = 0 \quad \forall g_{j} \notin N(g_{i}) \end{array}$

(11)

Here, $\| \cdot \|_{F}^{2}$ denotes the squared Frobenius norm. The second constraint ensures that transitions are only possible between neighboring grids. We note that directly solving the above optimization problem can be challenging due to the complexity of the objective function and constraints. Therefore, optimization techniques such as sequential least squares programming can be used to find an approximate solution that minimizes the objective function while satisfying the constraints.
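As an illustration of how the constrained problem in Equation (11) might be solved numerically, the sketch below uses projected gradient descent. Note that this is a simpler technique swapped in for illustration, not the sequential least squares programming mentioned above. It assumes NumPy is available, that $M$ is an $m \times m$ obfuscation matrix, and that neighborhoods are given as a 0/1 mask; the function name `estimate_true_transitions` and the toy data are hypothetical.

```python
import numpy as np

def estimate_true_transitions(t_hat, M, neighbor_mask, n_iter=500):
    """Projected gradient descent for min ||t_hat - M^T T M||_F^2
    subject to T >= 0 and T[i][j] = 0 whenever g_j is not a neighbor of g_i."""
    t_hat = np.asarray(t_hat, dtype=float)
    M = np.asarray(M, dtype=float)
    mask = np.asarray(neighbor_mask, dtype=float)
    # The gradient is Lipschitz with constant 2 * ||M||_2^4, so its inverse is a safe step size.
    step = 1.0 / (2.0 * np.linalg.norm(M, 2) ** 4)
    T = np.zeros_like(t_hat)
    for _ in range(n_iter):
        residual = M.T @ T @ M - t_hat
        grad = 2.0 * M @ residual @ M.T
        # Projection onto the feasible set: clip negatives, zero out non-neighbor entries.
        T = np.maximum(T - step * grad, 0.0) * mask
    return T

# Toy setup (made up): 3 cells on a line; each cell neighbors itself and adjacent cells.
M = np.array([[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]])
mask = np.array([[1, 1, 0], [1, 1, 1], [0, 1, 1]])
T_true = np.array([[5.0, 3.0, 0.0], [2.0, 4.0, 6.0], [0.0, 1.0, 7.0]])
T_hat = M.T @ T_true @ M  # noiseless observation following Equation (10)
T_est = estimate_true_transitions(T_hat, M, mask)
print(np.round(T_est, 3))
```

In this noiseless toy case the estimate recovers the true matrix; with real perturbed counts, $\hat{T}$ contains sampling noise and the estimate is only approximate.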

After estimating the true transition matrix $T$, the transitional probability $Pr(t_{i, j})$ can be obtained as follows:

$Pr(t_{i, j}) = \begin{cases} \frac{T[i][j]}{\sum_{g_{k} \in N(g_{i})} T[i][k]}, & \text{if } g_{j} \in N(g_{i}) \\ 0, & \text{otherwise} \end{cases}$

(12)

The above equation ensures that the sum of the transitional probabilities across all neighboring grids equals 1, since the probabilities are normalized by the total number of transitions to all neighboring grids.
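The normalization in Equation (12) can be sketched in a few lines. This is a minimal illustration, assuming the estimated matrix is a nested list and neighborhoods are given as a dictionary from a cell index to its neighbor indices. Leaving a row at zero when it has no estimated transitions is an assumption made here, since that case is not discussed above.

```python
def transition_probabilities(T, neighbors):
    """Row-normalize estimated transition frequencies over each cell's neighbors."""
    m = len(T)
    probs = [[0.0] * m for _ in range(m)]
    for i in range(m):
        total = sum(T[i][k] for k in neighbors[i])
        if total <= 0:
            # Assumption: keep the row at zero if no transitions were estimated.
            continue
        for j in neighbors[i]:
            probs[i][j] = T[i][j] / total
    return probs

# Toy values (made up for illustration).
T = [[5.0, 3.0, 0.0], [2.0, 4.0, 6.0], [0.0, 1.0, 7.0]]
neighbors = {0: [0, 1], 1: [0, 1, 2], 2: [1, 2]}
probs = transition_probabilities(T, neighbors)
print(probs[0])
```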

4.4. Recommending Optimal Route

Once the transitional probabilities have been computed, the next step is to determine and recommend the optimal route between two points based on these probabilities. As shown in Figure 1, this recommendation process is initiated whenever a user requests a route suggestion. Given the start and end points requested by a user, the probability $Pr(t_{s, e} \mid R)$, which denotes the probability that users located at $g_{s}$ will travel to $g_{e}$ via route $R \in SR_{s, e}$, is then computed as follows:

$Pr(t_{s, e} \mid R) = \prod_{h = 0}^{|R|} Pr(t_{R[h], R[h + 1]})$

(13)

This approach assumes a first-order Markov chain in which the probability of transitioning to the next location is determined solely by the current location, independent of previous locations. As a result, the overall probability of traveling from the start to the end point via a specific route is computed as the product of the transitional probabilities between consecutive locations along that route, which is a method commonly used in the literature [24]. Finally, the server returns the route $R$ in $SR_{s, e}$ that maximizes this probability, ensuring that the recommended route is the most likely path that users would take based on the observed data.
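The route scoring in Equation (13) and the final selection step can be sketched as follows. This is a minimal illustration, assuming the candidate set is given explicitly as lists of cell indices and that transitional probabilities are stored as a dictionary of dictionaries; the names and toy values are hypothetical. The sketch multiplies over all consecutive pairs in the route, and `math.prod` requires Python 3.8 or later.

```python
import math

def route_probability(route, probs):
    """Product of transitional probabilities over consecutive cells of a route."""
    return math.prod(
        probs.get(a, {}).get(b, 0.0) for a, b in zip(route, route[1:])
    )

def most_probable_route(candidates, probs):
    """Return the candidate route with the highest probability."""
    return max(candidates, key=lambda r: route_probability(r, probs))

# Toy transitional probabilities (made up): cell -> {next cell: probability}.
probs = {0: {1: 0.6, 2: 0.4}, 1: {3: 0.5, 0: 0.5}, 2: {3: 0.9, 0: 0.1}}
candidates = [[0, 1, 3], [0, 2, 3]]
best = most_probable_route(candidates, probs)
print(best, route_probability(best, probs))
```

Here the route through cell 2 wins (0.4 × 0.9 = 0.36 against 0.6 × 0.5 = 0.30), even though its first transition is the less likely one.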

The first-order Markov chain model used in this paper simplifies the computation of the optimal route by considering only the current and next locations. However, the process can still be computationally expensive for large datasets. As the grid size increases, the number of possible transitions increases significantly, resulting in higher computational cost for determining the optimal route.

To address this, efficient graph search algorithms such as Dijkstra’s or A* can be incorporated into the route recommendation step to determine the optimal route based on transition probabilities, eliminating the need to enumerate all possible routes. We also note that several proposals have introduced computationally efficient methods for determining the optimal route based on transition probabilities without enumerating all possible routes [24]. The approach proposed in this paper is general enough to incorporate such methods, allowing for efficient route computation even for large grid sets. In addition, parallel or distributed computing techniques can be utilized to further enhance computational efficiency in computing the optimal route.
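One way to apply Dijkstra’s algorithm here is to use $-\log Pr(t_{i, j})$ as the edge weight, since maximizing a product of probabilities is equivalent to minimizing the sum of their negative logarithms, and these weights are non-negative because probabilities are at most 1. The sketch below is an illustration under that assumption, not the paper’s implementation; the function name and toy graph are hypothetical.

```python
import heapq
import math

def most_probable_route_dijkstra(probs, start, end):
    """Dijkstra over -log(probability) weights; returns (route, probability)."""
    best = {start: 0.0}   # smallest accumulated -log probability found so far
    parent = {}
    heap = [(0.0, start)]
    done = set()
    while heap:
        cost, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == end:
            break
        for v, p in probs.get(u, {}).items():
            if p <= 0.0:
                continue  # impossible transition, no edge
            new_cost = cost - math.log(p)
            if new_cost < best.get(v, math.inf):
                best[v] = new_cost
                parent[v] = u
                heapq.heappush(heap, (new_cost, v))
    if end not in done:
        return None, 0.0  # end cell is unreachable from start
    route = [end]
    while route[-1] != start:
        route.append(parent[route[-1]])
    route.reverse()
    return route, math.exp(-best[end])

# Toy transitional probabilities (made up): cell -> {next cell: probability}.
probs = {0: {1: 0.6, 2: 0.4}, 1: {3: 0.5, 0: 0.5}, 2: {3: 0.9, 0: 0.1}}
route, prob = most_probable_route_dijkstra(probs, start=0, end=3)
print(route, round(prob, 4))
```

Because every edge weight is non-negative, revisiting a cell can never lower the accumulated cost, so the search returns a loop-free route without enumerating all candidates.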

5. Experiment

In this section, we present an experimental evaluation of the proposed scheme on real-world datasets.

5.1. Experimental Setup

We evaluated the effectiveness of the proposed method using the Porto taxi trajectories dataset [40], which consists of GPS coordinates collected from 442 taxis operating in Porto, Portugal. For the experiments, we first partitioned the geographic region using three grid sizes: $15 \times 20$, $30 \times 40$, and $45 \times 60$. Next, for each grid configuration, we randomly selected 100,000 trajectories. In order to represent trajectories within the grid space, each location was mapped to its corresponding grid cell based on geographic coordinates.
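The mapping from coordinates to grid cells can be sketched as follows. This is a minimal illustration, assuming a rectangular bounding box, uniform cell sizes, and row-major cell indices. The bounding box values, the orientation of rows and columns, and the function name are all hypothetical, since these details are not specified above.

```python
def to_grid_cell(lat, lon, bbox, n_rows, n_cols):
    """Map a coordinate to a row-major cell index, or None if outside the box."""
    min_lat, min_lon, max_lat, max_lon = bbox
    if not (min_lat <= lat <= max_lat and min_lon <= lon <= max_lon):
        return None
    # Scale into [0, n_rows] / [0, n_cols]; clamp so the upper border maps to the last cell.
    row = min(int((lat - min_lat) / (max_lat - min_lat) * n_rows), n_rows - 1)
    col = min(int((lon - min_lon) / (max_lon - min_lon) * n_cols), n_cols - 1)
    return row * n_cols + col

# Hypothetical bounding box in abstract units, with a 15 x 20 grid.
bbox = (0.0, 0.0, 15.0, 20.0)
trajectory = [(0.5, 0.5), (1.5, 0.5), (1.5, 2.5)]
cells = [to_grid_cell(lat, lon, bbox, 15, 20) for lat, lon in trajectory]
print(cells)
```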

In the experiment, the privacy budget, $ϵ$, was varied between 1.0 and 5.0. The methods used in the experiments were implemented using Python 3.8, and all experiments were performed on a system equipped with Intel Xeon 5220R CPUs and 64 GB of memory. We first present the experimental results on estimating the transitional probabilities, followed by the results of computing the optimal route.

5.2. Evaluation Results on the Estimation of Transitional Probabilities

In this subsection, we evaluate the performance of the proposed method for computing transitional probabilities using perturbed trajectory datasets. In the experiment, the length of sampled subtrajectories varies from 1 to 8. In addition, $σ_{a}$ and $σ_{b}$ in Equation (5) are set to $\frac{m_{1}}{8}$ and $\frac{m_{2}}{8}$, respectively, while the value of $α$ in Equation (6) is set to 1. For comparison purposes, we report results for the proposed method ($DA$), which relies on density-adjusted sampling, and results from a method ($BS$) based on simple (non-density-adjusted) sampling. The following metrics are used for evaluation:

The mean absolute error (MAE) quantifies the difference between the actual transitional probability, $Pr_{true}(t_{i, j})$, and the estimated transitional probability from the perturbed dataset, $Pr_{est}(t_{i, j})$. MAE is defined as

$MAE = \frac{1}{\sum_{g_{i} \in G} |N(g_{i})|} \sum_{g_{i} \in G,\, g_{j} \in N(g_{i})} |Pr_{true}(t_{i, j}) - Pr_{est}(t_{i, j})|$

(14)

The Jensen–Shannon Divergence (JSD) measures the difference between the actual transitional probability distribution and the estimated distribution computed from the perturbed dataset. JSD is defined as

$\text{Density Error} = JSD(D(Pr_{true}), D(Pr_{est}))$

(15)

Here, $D(Pr_{true})$ represents the distribution of actual transitional probabilities, and $D(Pr_{est})$ denotes the distribution of estimated transitional probabilities computed from datasets collected under $ϵ$-Geo-Ind.
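Both metrics can be sketched in plain Python. This is a minimal illustration under stated assumptions: probabilities are stored as nested lists, neighborhoods as a dictionary of index lists, and the JSD uses base-2 logarithms so that it lies in $[0, 1]$. How the distributions $D(\cdot)$ are constructed is not spelled out above, so the JSD function simply takes two discrete distributions of equal length.

```python
import math

def mean_absolute_error(p_true, p_est, neighbors):
    """Average |p_true - p_est| over all (cell, neighbor) pairs, as in Equation (14)."""
    total, count = 0.0, 0
    for i, cells in neighbors.items():
        for j in cells:
            total += abs(p_true[i][j] - p_est[i][j])
            count += 1
    return total / count

def jensen_shannon_divergence(p, q):
    """JSD (base 2) between two discrete distributions given as equal-length lists."""
    def kl(a, b):
        # Terms with a zero first argument contribute nothing to the KL divergence.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)
    mid = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, mid) + 0.5 * kl(q, mid)

# Toy values (made up for illustration).
p_true = [[0.5, 0.5], [0.2, 0.8]]
p_est = [[0.6, 0.4], [0.2, 0.8]]
neighbors = {0: [0, 1], 1: [0, 1]}
mae = mean_absolute_error(p_true, p_est, neighbors)
jsd = jensen_shannon_divergence([0.5, 0.5], [0.6, 0.4])
print(mae, jsd)
```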

Figure 3 and Figure 4 show the effect of subtrajectory length on MAE and JSD for the proposed method with density-adjusted sampling ($DA$) and the method with simple (non-density-adjusted) sampling ($BS$). In these experiments, the privacy budget $ϵ$ is set to 1.0, 3.0, and 5.0, while the grid size is fixed at $45 \times 60$. Key observations based on these figures can be summarized as follows. First, as shown in the figures, MAE and JSD generally increase as the privacy budget decreases, which is a common characteristic observed in DP-based methods. This occurs because a lower privacy budget $ϵ$ corresponds to stronger privacy protection, which is achieved by introducing more noise into the data. While this enhances privacy, it also results in reduced data utility, as the added noise reduces the accuracy of the data collected.

More importantly, across all scenarios, the proposed method using density-adjusted sampling ($DA$) consistently outperforms the simple sampling method ($BS$). This superior performance is evident in both MAE and JSD metrics, confirming that $DA$ provides more accurate estimates of transitional probabilities even under varying levels of privacy. These results validate the effectiveness of the density-adjusted sampling approach in enhancing the accuracy of transitional probability estimation while ensuring privacy.

When the length of the sampled subtrajectory varies from 1 to 8, the best performance is observed at a length of 1. This finding suggests that shorter subtrajectories allow for a more concentrated allocation of the privacy budget to perturb each individual location within the subtrajectory. Specifically, for a subtrajectory of length 1, the privacy budget $ϵ$ is divided into two $\frac{ϵ}{2}$ sub-privacy budgets, one for each of the two locations within the subtrajectory. As a result, using a subtrajectory of length 1 allows for the collection of perturbed transition data with higher utility compared to longer subtrajectories, which in turn improves the accuracy of computing transitional probabilities. These experimental results suggest that in situations where privacy and accuracy must be carefully balanced, prioritizing shorter subtrajectories can be a more effective strategy. By focusing on shorter sequences, it is possible to maximize data utility while still meeting privacy requirements, thereby achieving better overall performance in privacy-preserving applications.

5.3. Evaluation Results on Estimating the Optimal Route

In this subsection, we evaluate the performance of the proposed method for estimating the optimal route using transitional probabilities computed from perturbed trajectory datasets collected under Geo-Ind. Since there is no existing method directly comparable to the proposed approach, we compare it with the most similar solution available, as proposed in [30], which computes the top-n routes in indoor environments based on location data collected under LDP, which is a localized version of DP. We adapt the method in [30] to compute top-1 routes instead of top-n routes. However, it is important to note that LDP-based approaches are not directly comparable to Geo-Ind-based approaches because Geo-Ind extends DP by incorporating a distance metric, whereas LDP does not. Despite this difference, this comparison can still provide valuable insight into the relative performance of the proposed approach.

For performance evaluation, we employ the following metrics, which are commonly used in the literature [24]:

Precision quantifies the accuracy of the estimated route by measuring the proportion of locations in the computed route $R_{est}$ that overlap with the true optimal route $R_{true}$. Precision is defined as

$Precision\ (P) = \frac{|R_{true} \cap R_{est}|}{|R_{est}|}$

(16)

Recall evaluates the coverage of the true optimal route $R_{true}$ by determining the proportion of its locations that are correctly identified in the estimated route $R_{est}$. It is defined as

$Recall\ (R) = \frac{|R_{true} \cap R_{est}|}{|R_{true}|}$

(17)

The $F_{1}$ score provides a harmonic mean of precision and recall. It is given by

$F_{1} = \frac{2 \times P \times R}{P + R}$

(18)
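The three route metrics can be computed as in the sketch below. This is a minimal illustration, assuming routes are lists of cell indices that are compared as sets, i.e., a route does not visit the same cell twice; the function name and the toy routes are hypothetical.

```python
def route_scores(r_true, r_est):
    """Precision, recall, and F1 of an estimated route against the true route."""
    true_cells, est_cells = set(r_true), set(r_est)
    overlap = len(true_cells & est_cells)
    precision = overlap / len(est_cells) if est_cells else 0.0
    recall = overlap / len(true_cells) if true_cells else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Toy routes (made up): three of the five estimated cells lie on the true route.
r_true = [0, 1, 2, 3]
r_est = [0, 4, 2, 3, 5]
p, r, f1 = route_scores(r_true, r_est)
print(p, r, round(f1, 4))
```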

In the experiment, we randomly generate 50 user queries, each consisting of a pair of start and end locations. We then evaluate the performance of the proposed method by calculating the average value of the relevant metrics across all queries.

Figure 5 shows the effect of varying privacy budgets on precision, recall, and $F_{1}$ score for the proposed approach ($PA$) and the existing approach ($EA$) described in [30]. In these experiments, the grid size is fixed at $15 \times 20$, and the length of the sampled subtrajectories collected using $ϵ$-Geo-Ind is set to 1, while $ϵ$ varies from 1 to 5. As expected, precision, recall, and $F_{1}$ score decrease as the privacy budget decreases. This is because a lower privacy budget introduces more noise into the original data, leading to reduced accuracy in the computation of transition probabilities, as observed in the previous experimental results. As a result, this reduction in accuracy negatively impacts the computation of the optimal route, as the underlying transition probabilities become less reliable.

In systems that rely on DP, there is an inherent trade-off between privacy and data utility. When the privacy budget is small, more noise must be introduced to protect user privacy, which can significantly reduce the utility of the data. In extreme privacy conditions, where $ϵ$ is very small, the added noise may overwhelm the data, making it difficult to distinguish between meaningful user movements and random noise. Even in regions with frequent user activity, a very small $ϵ$ can introduce sufficient noise to reduce the accuracy of transition probabilities, leading to inaccurate route recommendations.

The results shown in the figure also confirm that the proposed approach ($PA$) consistently outperforms the existing approach ($EA$) across all privacy budgets. This consistent performance advantage highlights the effectiveness of the proposed method in balancing privacy and utility. In some cases, the proposed approach can achieve a high level of accuracy, such as 80%. This achievement is particularly noteworthy given the inherent challenges posed by the reduced data utility resulting from the perturbation mechanism of Geo-Ind. The ability of the proposed method to achieve such high accuracy despite these challenges highlights its robustness and reliability in privacy-preserving route recommendation.

Figure 6 shows the effect of varying grid sizes on precision, recall, and $F_{1}$ score for the proposed approach ($PA$) and the existing approach ($EA$). In these experiments, the grid size varies among $15 \times 20$, $30 \times 40$, and $45 \times 60$, while the length of the sampled subtrajectories is set to 1, and $ϵ$ is fixed at 5. As shown in the figure, precision, recall, and $F_{1}$ score all decrease as the grid size increases. This trend occurs because, as the grid size increases, the number of possible routes connecting the start and end points increases proportionally. As the number of possible routes increases, the likelihood of accurately estimating the true optimal route decreases due to the increasing complexity of the route estimation problem, which is a challenge commonly observed in route recommendation systems. Despite these challenges, the proposed approach ($PA$) continues to perform better than the existing approach ($EA$), demonstrating its resilience and effectiveness in managing the trade-off between privacy and utility across varying grid sizes.

Table 1 presents the precision, recall, and $F_{1}$ scores corresponding to the results shown in Figure 5. In this table, the improvement of the proposed approach ($PA$) compared to the existing approach ($EA$) is highlighted in the “Imp” column. As shown in the table, for privacy budget settings of $ϵ = 1$ and $ϵ = 3$, the proposed method achieves an improvement of about 18–20%. For $ϵ = 5$, an improvement of 15–16% is observed. These results verify that the proposed approach consistently outperforms the existing method across different privacy levels, providing better accuracy in estimating the optimal route.

6. Conclusions and Future Work

In this paper, we presented a novel privacy-preserving framework for computing optimal routes using location data collected by Geo-Ind. The proposed framework effectively addresses the critical trade-off between data utility and user privacy in privacy-preserving route recommendation systems. By incorporating a density-adjusted sampling method and an innovative scheme for estimating transitional probabilities from perturbed datasets, the proposed framework significantly improves the accuracy of optimal route computation. Experimental results demonstrate the practical effectiveness and reliability of the proposed approach, consistently outperforming existing methods across various privacy budgets and grid sizes. These results highlight the potential of the framework to set a new standard for recommending routes while preserving privacy.

Although the proposed algorithm shows promising results, it has some limitations. First, the computational complexity increases significantly with increasing dataset size and grid size, which can be a challenge for large-scale real-time applications. This can lead to higher processing times, making it difficult to efficiently scale the algorithm in environments where real-time processing is critical. Second, the trade-off between privacy and accuracy inherent in DP-based techniques can result in reduced data utility, especially when stronger privacy guarantees are required. As the level of privacy increases, more noise is introduced into the data, which can negatively impact the accuracy of the computed transition probabilities. This trade-off can be particularly problematic in scenarios where accurate recommendations are critical.

Thus, an important future research direction is to improve the efficiency of optimal route computation, especially for large grid sizes. This can be achieved by parallelizing the computational process to reduce the associated overhead. We plan to explore various parallel processing techniques that distribute both data and computation across a cluster of machines, enabling faster and more efficient route recommendations, especially for large-scale applications. Another important research direction is to mitigate the privacy–utility trade-off, possibly through adaptive noise addition methods that adjust the level of noise based on the sensitivity of different regions. In highly sensitive areas, stronger privacy protection would be applied by adding more noise. On the other hand, in less sensitive areas, lower privacy requirements would allow for the collection of higher utility data. This approach would ensure that the framework maintains privacy where it is most needed, while maximizing data utility in regions where privacy concerns are lower. Finally, future work will focus on evaluating the system’s performance under varying privacy budgets and conducting a theoretical analysis of the privacy–utility trade-off. This will help to better understand the behavior of the system under DP settings and explore ways to optimize the balance between privacy and utility.

Funding

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF-2023R1A2C1004919).

Data Availability Statement

The original data presented in the study are openly available in Kaggle at https://www.kaggle.com/c/pkdd-15-predict-taxi-service-trajectory-i (accessed on 20 June 2024).

Conflicts of Interest

The author declares no conflicts of interest.

References

1. Tong, Y.; Zhou, Z.; Zeng, Y.; Chen, L.; Shahabi, C. Spatial crowdsourcing: A survey. Int. J. Very Large Data Bases 2020, 29, 217–250.
2. Shi, D.; Ding, J.; Errapotu, S.M.; Yue, H.; Xu, W.; Zhou, X.; Pan, M. Deep Q-network-based route scheduling for TNC vehicles with passengers’ location differential privacy. IEEE Internet Things J. 2019, 6, 7681–7692.
3. Zhang, P.; Zheng, J.; Lin, H.; Liu, C.; Zhao, Z.; Li, C. Vehicle trajectory data mining for artificial intelligence and real-time traffic information extraction. IEEE Trans. Intell. Transp. Syst. 2023, 24, 13088–13098.
4. Xu, X.; Wang, X.; Ye, Z.; Zhang, A.; Liu, J.; Xia, L.; Li, Z.; Feng, B. Route recommendation method for frequent passengers in subway based on passenger preference ranking. Expert Syst. Appl. 2024, 252, 124216.
5. Chaudhari, K.; Thakkar, A. A comprehensive survey on travel recommender systems. Arch. Comput. Methods Eng. 2020, 27, 1545–1571.
6. Huang, F.; Xu, J.; Weng, J. Multi-task travel route planning with a flexible deep learning framework. IEEE Trans. Intell. Transp. Syst. 2021, 22, 3907–3918.
7. Cui, G.; Luo, J.; Wang, X. Personalized travel route recommendation using collaborative filtering based on GPS trajectories. Int. J. Digit. Earth 2018, 11, 284–307.
8. Silva, R.A.d.O.e.; Cui, G.; Rahimi, S.M.; Wang, X. Personalized route recommendation through historical travel behavior analysis. GeoInformatica 2022, 26, 505–540.
9. Primault, V.; Boutet, A.; Mokhtar, S.B.; Brunie, L. The long road to computational location privacy: A survey. IEEE Commun. Surv. Tutor. 2019, 21, 2772–2793.
10. Kim, J.W.; Edemacu, K.; Kim, J.S.; Chung, Y.D.; Jang, B. A survey of differential privacy-based techniques and their applicability to location-based services. Comput. Secur. 2021, 111, 102464.
11. Gruteser, M.O.; Grunwald, D. Anonymous usage of location-based services through spatial and temporal cloaking. In Proceedings of the International Conference on Mobile Systems, Applications and Services, San Francisco, CA, USA, 5–8 May 2003; pp. 31–42.
12. Beresford, A.R.; Stajano, F. Location privacy in pervasive computing. IEEE Pervasive Comput. 2003, 2, 46–55.
13. Dwork, C. Differential privacy. In Proceedings of the EATCS International Colloquium on Automata Languages, and Programming, Venice, Italy, 10–14 July 2006; pp. 1–12.
14. Andres, M.E.; Bordenabe, N.E.; Chatzikokolakis, K.; Palamidessi, C. Geo-indistinguishability: Differential privacy for location-based systems. In Proceedings of the ACM SIGSAC Conference on Computer and Communications Security, Berlin, Germany, 4–8 November 2013; pp. 901–914.
15. Bordenabe, N.E.; Chatzikokolakis, K.; Palamidessi, C. Optimal geo-indistinguishable mechanisms for location privacy. In Proceedings of the ACM SIGSAC Conference on Computer and Communications Security, New York, NY, USA, 3–7 November 2014; pp. 251–262.
16. Du, Y.; Hu, Y.; Zhang, Z.; Fang, Z.; Chen, L.; Zheng, B.; Gao, Y. LDPTrace: Locally differentially private trajectory synthesis. Proc. VLDB Endow. 2023, 16, 1897–1909.
17. Carvalho, R.S.; Vasiloudis, T.; Feyisetan, O.; Wang, K. TEM: High utility metric differential privacy on text. In Proceedings of the SIAM International Conference on Data Mining, Minneapolis, MN, USA, 27–29 April 2023; pp. 883–890.
18. Imola, J.; Kasiviswanathan, S.; White, S.; Aggarwal, A.; Teissier, N. Balancing utility and scalability in metric differential privacy. In Proceedings of the Conference on Uncertainty in Artificial Intelligence, Eindhoven, The Netherlands, 1–5 August 2022; pp. 885–894.
19. Chen, Z.; Shen, H.T.; Zhou, X. Discovering popular routes from trajectories. In Proceedings of the IEEE International Conference on Data Engineering, Hannover, Germany, 11–16 April 2011.
20. Shafique, S.; Ali, M.E. Recommending most popular travel path within a region of interest from historical trajectory data. In Proceedings of the ACM SIGSPATIAL International Workshop on Mobile Geographic Information Systems, Burlingame, CA, USA, 31 October 2016; pp. 2–11.
21. Yochum, P.; Chang, L.; Gu, T.; Zhu, M. An adaptive genetic algorithm for personalized itinerary planning. IEEE Access 2020, 8, 88147–88157.
22. Liu, H.; Jin, C.; Zhou, A. Popular route planning with travel cost estimation from trajectories. Front. Comput. Sci. 2020, 14, 191–207.
23. Wang, R.; Zhou, M.; Gao, K.; Alabdulwahab, A.; Rawa, M.J. Personalized route planning system based on driver preference. Sensors 2021, 22, 11.
24. Zhang, S.; Luo, Z.; Yang, L.; Teng, F.; Li, T. A survey of route recommendations: Methods, applications, and opportunities. Inf. Fusion 2024, 108, 102413.
25. Huang, J.; Huangfu, X.; Sun, H.; Li, H.; Zhao, P.; Cheng, H.; Song, Q. Backward path growth for efficient mobile sequential recommendation. IEEE Trans. Knowl. Data Eng. 2015, 27, 46–60.
26. Teng, X.; Trajcevski, G.; Zufle, A. Semantically diverse paths with range and origin constraints. In Proceedings of the 29th International Conference on Advances in Geographic Information Systems, Beijing, China, 2–5 November 2021; pp. 375–378.
27. Wang, J.; Wu, N.; Zhao, W.X. Personalized route recommendation with neural network enhanced search algorithm. IEEE Trans. Knowl. Data Eng. 2022, 34, 5910–5924.
28. Wen, H.; Lin, Y.; Mao, X.; Wu, F.; Zhao, Y.; Wang, H.; Zheng, J.; Wu, L.; Hu, H.; Wan, H. Graph2Route: A dynamic spatial-temporal graph neural network for pick-up and delivery route prediction. In Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, 14–18 August 2022; pp. 4143–4152.
29. Wang, C.; Li, C.; Huang, H.; Qiu, J.; Qu, J.; Yin, L. ASNN-FRR: A traffic-aware neural network for fastest route recommendation. GeoInformatica 2023, 27, 39–60.
30. Kim, D.-H.; Jang, B.; Kim, J.W. Privacy-Preserving Top-k Route Computation in Indoor Environments. IEEE Access 2018, 6, 56109–56121.
31. Kim, J.S.; Kim, J.W.; Chung, Y.D. Successive Point-of-Interest Recommendation With Local Differential Privacy. IEEE Access 2021, 9, 66371–66386.
32. Chen, S.; Fu, A.; Shen, J.; Yu, S.; Wang, H.; Sun, H. RNN-DP: A new differential privacy scheme base on recurrent neural network for dynamic trajectory privacy protection. J. Netw. Comput. Appl. 2020, 168, 102736.
33. Jiang, W.; Zhao, W.X.; Wang, J.; Jiang, J. Continuous trajectory generation based on two-stage GAN. In Proceedings of the AAAI Conference on Artificial Intelligence, Washington, DC, USA, 7–14 February 2023; pp. 4374–4382.
34. Demetriou, A.; Alfsvag, H.; Rahrovani, S.; Chehreghani, M.H. A deep learning framework for generation and analysis of driving scenario trajectories. IEEE Trans. Syst. Sci. Cybern. 2023, 4, 1–14.
35. Lestyan, S.; Acs, G.; Biczok, G. In search of lost utility: Private location data. Proc. Priv. Enhancing Technol. 2022, 354–372.
36. Zhang, J.; Huang, Q.; Huang, Y.; Ding, Q.; Tsai, P.-W. DP-TrajGAN: A privacy-aware trajectory generation model with differential privacy. Future Gener. Comput. Syst. 2023, 142, 25–40.
37. Alvim, M.; Chatzikokolakis, K.; Palamidessi, C.; Pazii, A. Local differential privacy on metric spaces: Optimizing the trade-off with utility. In Proceedings of the Computer Security Foundations Symposium, Oxford, UK, 9–12 July 2018.
38. Acharya, J.; Bonawitz, K.; Kairouz, P.; Ramage, D.; Sun, Z. Context-aware local differential privacy. In Proceedings of the International Conference on Machine Learning, Virtual, 13–18 July 2020; pp. 52–62.
39. Guzman, L.A.; Camacho, R.; Herrera, A.R.; Beltran, C. Modeling population density guided by land use-cover change model: A case study of Bogota. Popul. Environ. 2023, 43, 553–575.
40. Moreira-Matias, L.; Gama, J.; Ferreira, M.; Mendes-Moreira, J.; Damas, L. Predicting taxi–passenger demand using streaming data. IEEE Trans. Intell. Transp. Syst. 2013, 14, 1393–1402.

Figure 1. An overview of the proposed privacy-preserving framework for recommending optimal routes.

Figure 2. Density-adjusted sampling effectively captures representative data across all areas, while random sampling often fails to achieve such comprehensive coverage.

Figure 3. Effect of subtrajectory length on MAE for the proposed method with density-adjusted sampling ($DA$) and the method with simple (non-density-adjusted) sampling ($BS$).

Figure 4. Effect of subtrajectory length on JSD for the proposed method with density-adjusted sampling ($DA$) and the method with simple (non-density-adjusted) sampling ($BS$).

Figure 5. Effect of varying privacy budgets on (a) precision, (b) recall, and (c) $F_{1}$ score for the proposed approach ($PA$) and the existing approach ($EA$).

Figure 6. Effect of varying grid sizes on (a) precision, (b) recall, and (c) $F_{1}$ score for the proposed approach ($PA$) and the existing approach ($EA$).

Table 1. Precision, recall, and $F_{1}$ scores corresponding to the results shown in Figure 5.

	Precision			Recall			$F_{1}$
$ϵ$	PA	EA	Imp (%)	PA	EA	Imp (%)	PA	EA	Imp (%)
1	0.791	0.595	19.56	0.680	0.495	18.52	0.728	0.537	19.06
3	0.857	0.655	20.24	0.724	0.543	18.07	0.781	0.590	19.18
5	0.869	0.701	16.73	0.739	0.581	15.80	0.795	0.631	16.38

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Kim, J. Effective Route Recommendation Leveraging Differentially Private Location Data. Mathematics 2024, 12, 2977. https://doi.org/10.3390/math12192977

