Article

DGS-HSA: A Dummy Generation Scheme Adopting Hierarchical Structure of the Address

1
National Engineering Laboratory for Disaster Backup and Recovery, Information Security Center, School of Cyberspace Security, Beijing University of Posts and Telecommunications, Beijing 100876, China
2
School of Computer and Information Engineering, Hechi University, Guangxi 546300, China
3
Guizhou Provincial Key Laboratory of Public Big Data, Guizhou University, Guizhou 550025, China
*
Author to whom correspondence should be addressed.
Current address: National Engineering Laboratory for Disaster Backup and Recovery, Information Security Center, School of Cyberspace Security, Beijing University of Posts and Telecommunications, Beijing 100876, China.
Appl. Sci. 2020, 10(2), 548; https://doi.org/10.3390/app10020548
Submission received: 23 November 2019 / Revised: 31 December 2019 / Accepted: 8 January 2020 / Published: 11 January 2020

Abstract

With the increasing convenience of location-based services (LBSs), there have been growing concerns about the risk of privacy leakage. We show that existing techniques fail to defend against a statistical attack that infers the user’s location privacy and query privacy. The attack exploits continuous queries that the same user sends from the same location within a short time, which cause the user’s real location to appear repeatedly and the query content to be the same or similar in neighboring queries. Existing techniques also fail to consider the hierarchical structure of the address, so the locations in an anonymous group may lie in the same organization, which leaks the user’s organization information and weakens the privacy protection. This paper presents a dummy generation scheme that considers the hierarchical structure of the address (DGS-HSA). In our scheme, we introduce a novel meshing method, which divides the historical location dataset according to the administrative region division. We also choose dummies from the historical location dataset with the two-level grid structure to protect the user’s location, organization information, and query privacy. Moreover, we prove the feasibility of the presented scheme by solving a multi-objective optimization problem and give recommended settings for the user’s privacy protection parameters, which balance the privacy protection level and system overhead. Finally, we evaluate the effectiveness and the correctness of the DGS-HSA through theoretical analysis and extensive simulations.

1. Introduction

With the rapid development of smart terminal technology, positioning technology, and mobile Internet technology, location-based services (LBSs) are increasingly penetrating all aspects of life, such as inquiring about nearby restaurants and hotels, bringing great convenience to people. In practice, people use location-based services over the Internet. Given the current network security situation, they inevitably face serious cyber risks, e.g., malware [1], spyware [2], encryption for malicious purposes [3], and zero-day attacks [4]. Privacy leakage, as one of these cyber risks, is drawing more and more attention.
At present, the intrinsic privacy leakage of LBSs is becoming a major challenge for protecting a user’s privacy. When using an LBS, a user submits a request, which contains locations and query content, to the LBS server. However, users’ locations and query content are often leaked, because the LBS provider (LP) may be compromised by, or collude with, attackers. As a result, users are exposed to privacy leakage when using an LBS. For example, attackers can infer a user’s private information (e.g., where the user works, dines, or stays overnight) from locations and query content (e.g., “which hospitals are nearby”). Even so, users still need to use LBSs in many cases. Thus, a location privacy-preserving technology must ensure that neither locations nor query content are disclosed. In this work, we investigate how to protect users’ privacy in the scenario of a continuous query.
In this scenario, to protect users’ privacy, researchers have presented a variety of methods such as privacy policy [5,6], distortion, encryption [7,8,9,10,11], and so on. Among these methods, distortion techniques, such as pseudonym, obfuscation, and dummy, are the most widely used. Pseudonym techniques [12,13] protect users’ privacy by replacing users’ identity identifiers with a false or temporary pseudonym or by deleting them directly. Although users can use pseudonyms for privacy protection, adversaries can still infer the user’s real identity by analyzing the spatial-temporal correlation of continuous queries [13]. Obfuscation techniques [14,15,16] protect users’ privacy by generalizing or perturbing the time and locations in LBS queries, so that the users’ precise locations are not recognized. However, submitting inaccurate locations reduces the quality of the service. An effective approach should maintain the quality of service while ensuring that users’ privacy is not leaked. Dummy techniques [13,17,18,19,20,21,22,23,24,25,26,27] protect a user’s privacy by adding dummy queries to the real query and do not reduce the quality of the service, so they are suitable for our work. In this paper, we focus on how to hide a user’s locations or query content among dummies in the scenario of a continuous query.
While existing dummy techniques address privacy leakage in LBS queries, they mainly focus on the neighbor-query scenario in which a user sends queries from different locations, and they neglect the scenario in which the same user sends short-time continuous queries from the same location (called the ‘scenario of this paper’ for simplicity). Figure 1 shows an example of privacy leakage in the scenario of this paper.
In the example, the real user sends three queries in location L to query nearby points of interest (POIs). For each query, three dummy users are selected to form an anonymous group. However, because existing techniques, e.g., distortion techniques, neglect the scenario of this paper, they hide the location and query content in only one of the three queries (we assume the first). We therefore assume that no user other than the real user $u_1$ appears in all three queries. As for the four pieces of query content sent by the users in each query, we assume that the first query covers three query categories, while the second and third queries each cover only one.
As shown in Figure 1, by intersecting the users in different anonymous groups and combining background knowledge [26], the attacker can find that only $u_1$ appears in all three queries and then infer that $u_1$ is the real user who sends queries in L (only the same user is able to send queries continuously within a short period of time; see [13]). Moreover, attackers can deduce the user’s sensitive locations, e.g., home address or workplace, by collecting and analyzing the user’s long-term historical query data [13]. As for the query content, although the attacker cannot distinguish which one is the query content of $u_1$ in the first query, he can identify it in the second and third queries. We call this kind of attack, which aims to obtain sensitive location and query content privacy through intersection inference (see [27]), the intersection inference attack for location and query content (I2LC).
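The intersection step of this attack can be illustrated with a minimal Python sketch. The user identifiers, the category names, and the helper names `intersect_groups` and `exposed_queries` are hypothetical and chosen only to mirror the Figure 1 example; this is not code from the paper.

```python
from functools import reduce


def intersect_groups(groups):
    """Return the users that appear in every anonymous group."""
    return reduce(lambda a, b: a & b, (set(g) for g in groups))


def exposed_queries(categories_per_query):
    """Return the 1-based indices of queries whose content has one category."""
    return [
        idx
        for idx, cats in enumerate(categories_per_query, start=1)
        if len(set(cats)) == 1
    ]


# Hypothetical anonymous groups for three consecutive queries sent from L.
# u1 is the real user; every other identifier is an illustrative dummy user.
groups = [
    ["u1", "u2", "u3", "u4"],
    ["u1", "u5", "u6", "u7"],
    ["u1", "u8", "u9", "u10"],
]

# Hypothetical query categories: three categories in the first query,
# a single category in the second and third queries.
categories = [
    ["restaurant", "hospital", "hotel", "restaurant"],
    ["restaurant", "restaurant", "restaurant", "restaurant"],
    ["restaurant", "restaurant", "restaurant", "restaurant"],
]

suspects = intersect_groups(groups)
leaky = exposed_queries(categories)
print(suspects)  # {'u1'}
print(leaky)     # [2, 3]
```

Only `u1` survives the intersection, and the second and third queries reveal the query category because every member of the group asks for the same thing.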
Furthermore, existing methods do not consider the hierarchical structure of the address (HSA), which may reveal a user’s organization information. A region is composed of regions at different levels, and these levels are called the HSA. For example, the HSA of Beijing is composed of four levels of region: city (Beijing), district (Haidian District), street (Xi Tucheng Road), and organization (Beijing University of Posts and Telecommunications (BUPT)). If we do not consider the HSA, we may select $k-1$ dummy locations that lie in the same organization as the real location in the anonymous group. For example, the $k$ locations in the anonymous group may all be located in BUPT. Although this does not directly expose the user’s exact location, it reveals that the user is located in BUPT. For some particular users, such as confidential personnel and anti-drug police, the organization information also needs to be kept secret; once leaked, it can be used by criminals to infer their occupation and can threaten their lives.
In this paper, we present a dummy generation scheme considering the HSA, called DGS-HSA, to generate dummy locations and query content. In this scheme, for each query in a continuous query, we select $k$ locations from historical locations as dummy locations (in our scheme, the user does not send her real location to the LBS server; see Section 4.4). In each dummy location, a user sends a dummy query selected from the basic service set. In the DGS-HSA, we first divide the city area where users are located into grids with a two-level structure. Each grid represents an organization, and all historical locations are clustered into these grids according to their organizations. Based on these grids, we then propose the degree of privacy protection $\langle L(k,s), Q_l \rangle$. Using $\langle L(k,s), Q_l \rangle$, we select dummy locations and query content to protect users’ location privacy, organization information, and query privacy in the scenario of this paper. For each query in a continuous query, we select $k$ locations distributed across $s$ different organizations from historical locations, and we select query content such that all queries in the continuous query together cover at least $l$ query categories. This protects a user’s organization information and query privacy, since it ensures that the $k$ locations do not all lie in the same organization and that the query content across the continuous query covers at least $l$ query categories. Across all queries in a continuous query, we ensure that the locations selected from historical locations are evenly distributed across the different organizations. Our scheme can therefore resist I2LC and protect users’ location privacy. Using theoretical analysis, we also prove that our scheme meets $\langle L(k,s), Q_l \rangle$ and resists I2LC. However, for the scheme to be feasible, the privacy protection level and the system overhead must reach an equilibrium. So, we evaluate the feasibility and effectiveness of our scheme.
Results show that our scheme can reach an equilibrium between privacy protection level and system overhead and can better protect users’ location privacy, organization information, and query privacy than the scheme in [13,27].
The DGS-HSA can protect users’ organization information and query privacy. Compared with [13,27], the DGS-HSA considers the HSA and divides the city area where users are located into grids with a two-level structure. Based on this, for each query in a continuous query, the DGS-HSA selects $k$ dummy locations distributed across $s$ different organizations and $k$ pieces of query content such that the continuous query covers at least $l$ query categories. As a result, attackers cannot distinguish the user’s real organization or real query content.
Compared with [13,27], the DGS-HSA can resist I2LC and protect users’ location privacy. For a continuous query, our scheme selects dummy locations evenly from different organizations, which ensures that all dummy locations are evenly distributed across these organizations. In other words, each location is equally likely to fall in any of the organizations, so attackers cannot distinguish the user’s real location by launching I2LC. Specifically, the major contributions of this paper are as follows:
  • To protect a user’s location privacy, organization information and query privacy in the scenario of this paper, we present the DGS-HSA to generate dummy locations and query content. The DGS-HSA considers the HSA and can resist I2LC.
  • Considering the HSA, we introduce a novel meshing method to divide the city area where users are located into grids with a two-level structure. Using this, our scheme ensures that, for each query in a continuous query, the $k$ dummy locations are not all selected from the same organization. In addition, for the query content, the probability of occurrence of each query category is the same across all queries in a continuous query. Thus, our scheme can better protect users’ organization information and query privacy than the schemes in [13,27].
  • To protect a user’s location privacy, we propose a method to resist I2LC. Using this method, our scheme ensures that the selected dummy locations are evenly distributed across different organizations in a continuous query. In addition, we replace the real user’s location with historical locations that are in the same organization as the real user’s location. This prevents the user’s exact location from being sent to the server. Thus, our scheme can resist I2LC and protects users’ location privacy better than the schemes in [13,27].
  • We evaluate the feasibility and effectiveness of our scheme. Results show that our scheme can reach an equilibrium between privacy protection level and system overhead and we give the recommended configuration of system parameters. They also show that our scheme can better protect users’ location privacy, organization information and query privacy than the scheme in [13,27].
The rest of the paper is organized as follows. We first review the related research in Section 2. Then, in Section 3, we introduce the adversary model, the motivation and basic idea, the privacy protection model, and the system architecture of this paper. We describe our algorithms in detail in Section 4. Section 5 and Section 6 evaluate the effectiveness of the presented scheme from theoretical and experimental aspects, respectively, including a feasibility analysis of the scheme. Finally, in Section 7, we summarize the research work and its results and point out open problems and future work.

2. Related Work

Location privacy issues have attracted more and more attention since Beresford and Stajano first raised them [28]. In this section, we mainly review some existing research on the privacy-preserving technique, the query request scenario, system architecture, and the attack model.

2.1. Privacy-Preserving Technique

Researchers have presented a variety of methods such as privacy policy [5,6], distortion, and encryption [7,8,9,10,11]. Among these methods, distortion has been the most widely used for protecting users’ privacy. It comprises three main techniques: pseudonym, obfuscation, and dummy. Pseudonym techniques [12,13,28] protect users’ privacy by replacing users’ identity identifiers with a false or temporary pseudonym or by deleting them directly. In [12,28], users did not communicate with the LBS server directly but with an anonymizer proxy, which replaced the user’s real identity with a pseudonym before the user’s message was submitted to the LBS server. In [13], for MNAME and REGP, a user stored several user names and randomly selected one from the name set as the current user name for every LBS query; for SNAME, the anonymizer changed the user name to the same one and sent the query to the LBS server for every LBS query. However, pseudonyms alone are not enough to protect users’ privacy, and adversaries can still infer the user’s real identity by analyzing the spatial-temporal correlation of continuous queries [13].
Obfuscation techniques [14,15,16] protect users’ privacy by generalizing or perturbing the time and locations in LBS queries, so that the users’ precise locations are not recognized. In those methods, the anonymizer submitted an area containing at least $k-1$ other subjects for every LBS query, instead of submitting the user’s exact location; this is called spatial cloaking. During this process, the LBS query would be delayed until $k$ vehicles had visited the area chosen for the requestor, which reduces the accuracy in time; this is called temporal cloaking. However, submitting inaccurate locations or times reduces the quality of the service for some specific services. Dummy techniques [13,17,18,19,20,21,22,23,24,25,26,27] protect the privacy of a user by adding dummy queries to the real query and do not reduce the quality of the service.
In this paper, we focus on how to hide a user’s locations or query content among dummies in the scenario of a continuous query. Different from existing dummy techniques, such as [13,27], in our method, each selected dummy location evenly distributes across different organizations in a continuous query and locations in each query are distributed across different organizations. In addition, we also replace the real user’s location with a historical location that is in the same organization as the real user’s location.

2.2. Scenario

Query request scenarios include the single query and the continuous query. In the single query, a user usually sends a single query request in a period of time. Many methods such as pseudonyms, perturbations, and dummies are used to protect the identity, location, and query content of the user. In [14], the authors first introduced k-anonymity to protect location privacy. Based on [14], the authors in [15,16] pointed out the equilibrium between personalized privacy protection and quality of service. The methods in [12,21,28] proposed an anonymous group that is composed of the user’s real location and $k-1$ dummies selected from historical queries. Inspired by [12,15,16,21,28], we consider the equilibrium between privacy protection level and system overhead and select dummy locations from historical queries.
In the continuous query, a user continuously sends multiple query requests in a period of time. Most methods achieve privacy protection by randomly adding dummies, using historical data to generate dummies, etc. In [19], the authors first proposed a technique of randomly generating dummies to achieve anonymity, but the generated dummies are not realistic and can be easily identified. The methods in [20,21,22,23,24,25] study how to generate more realistic dummy trajectories. In [20], the authors presented trajectory k-anonymity, using the users’ historical trajectory data to generate $k-1$ dummy trajectories. The methods in [16,17,29,30,31] separately consider a user’s motion patterns, physical constraints in the real world, and spatiotemporal correlation awareness to generate dummy trajectories with higher realism.
A basic assumption commonly used in the above two scenarios is that users send query requests in different locations. However, in reality, a user often sends a single or short-time continuous query in the same location. For example, a user at her home would continuously query nearby restaurants within one minute. In this paper, we mainly study the privacy protection of the same user submitting a single or continuous query scenario in the same location.

2.3. System Architecture

In existing works, there are three main system architectures: Centralized architecture, distributed architecture, and independent architecture. In centralized architecture [14,15,16,28], a trusted third party (a server, called TTP) is introduced into the system to complete anonymity. However, once the TTP is untrusted or compromised, a user’s information may completely leak, since the TTP has all her information. Besides, the anonymous server is a bottleneck of system performance.
In the distributed architecture [29,30,31], the system achieves anonymity without the TTP; this is called TTP-free. In [29,30], $k$ peer users communicate over a short distance and form a collaborative group to achieve anonymity. However, these users are required to exchange query information and to be honest and trustworthy. In addition, it may sometimes take a long time to wait for $k-1$ users to join before anonymity can be achieved. In [31], the authors introduced an anonymous group re-clustering method to address the long waiting time for achieving anonymity and the anonymity failures caused by insufficient users, and to improve the success rate of clustering.
To overcome the drawbacks of the TTP and the TTP-free methods, the methods in [17,18] presented the independent architecture. In [17], the authors put forward DLS and enhanced DLS to achieve anonymity. This method adopted the independent architecture and can effectively address the shortcomings of the above two system architectures. In [18], the authors presented TTcloak, which uses dummy techniques to achieve k-anonymity of location and l-diversity of query content.

2.4. Attack Model

Existing work focuses on how to deal with attacks in different scenarios. In [26], the authors proposed reciprocity of k-clusters to achieve strong k-anonymity. In this method, $k$ users remain unchanged over several queries in a short time so that an attacker cannot identify users by intersecting several k-clusters of different queries. In [27], the authors studied the scenario in which different users sent query requests in the same location and proposed L2P2. L2P2 adopted a distributed architecture and introduced location labels to classify the locations of mobile users into sensitive and ordinary locations. When users’ locations in the anonymous group were the same and were all sensitive locations, the users exchanged IDs with each other to achieve anonymity. The method in [13] pointed out that, for long-term cumulative sporadic queries and continuous queries, attackers could deduce users’ privacy by collecting and analyzing users’ historical data, called LSA and RSA attacks. The method in [20] proposed MNAME and SNAME to resist LSA attacks and REGP to resist the RSA attack. Among these methods, [13,20,26] mainly focused on scenarios where a user sent queries in different locations, while [27] mainly focused on scenarios where users sent queries in the same location, without considering short-time continuous queries, and it has difficulty resisting LSA and RSA attacks. However, for continuous queries in the same location over a long time, the users in the anonymous group are not always identical, so the real user can be inferred by comparing multiple queries. Hence, in this paper, we mainly study the attack (I2LC) that a user might suffer and how to resist it in the scenario of this paper.
There are also several works related to the HSA and to meshing methods for dividing a city area. The method in [32] pointed out that privacy-related attributes have a hierarchical structure and studied achieving k-anonymity by clustering in attribute hierarchical structures. The method in [33] pointed out that the level of location privacy relies on the HSA and that location privacy protection needs to consider the HSA. For example, for $k-1$ dummy locations and a real location that all lie in BUPT, although an attacker cannot distinguish the real location, he knows that the user is in BUPT. In addition, the method in [27] divided an area based on sensitive locations and ordinary locations, and the method in [13] divided an area based on the probability of locations. However, they do not consider the HSA. In our paper, we consider the HSA and divide an area into grids with a two-level structure based on the organization.

3. Preliminary

In this section, we first analyze the privacy that an attacker intends to obtain (adversary model). Next, we formulate the problem of how user privacy leaks (motivation) and give the basic idea for solving this problem in the scenario of this paper (basic idea). We also give the criteria for measuring whether a user’s privacy has been leaked (privacy protection model) and the structure needed to protect the user’s privacy (system architecture).

3.1. Adversary Model

In this paper, we assume that adversaries are honest but curious and simply collect all the users’ data that they can access. Their goal is to infer the user’s privacy, including identity, location, organization, and query content, by analyzing these data. In fact, an adversary may be an LBS provider, an LBS user, or a malicious hacker. We mainly consider the LBS provider owning users’ full information as an active adversary. To achieve the goal, he can obtain global information of all users’ current and historical data. He can also use statistical inference methods to infer a specific user’s privacy by combining with side information. Here, side information mainly refers to a city’s region information, e.g., administrative region division, street, organization distribution, and so on. In this paper, we only consider protecting the user’s location, organization, and query content. The reason for not protecting the user’s identity is that the LBSs only providing the query service do not involve the user’s identity. However, for the other LBSs needing to log in to the account, because real name Internet access is used in China, once the user logs in to the account, the service provider will know the user’s identity. In both cases, protecting the user’s identity is of little significance.

3.2. Motivation and Basic Idea

To protect users’ privacy in the scenario of this paper, an effective approach is to add dummies. Considering the scenario of this paper, we assume a user sends $m$ queries in a continuous query. Using dummy techniques, there will be $m \times (k-1)$ selected dummy locations (some dummy locations may be the same) and $m$ real locations. Assume there are $c$ kinds of locations among the $m \times k$ locations and $n_i$ is the number of occurrences of the $i$th kind of location. Then the following relationships hold. From the perspective of adversaries, the probability $p_i$ that the $i$th kind of location is estimated as the real location is as follows:
$$p_i = \frac{n_i}{m \times k} \tag{1}$$
In particular, for the real location $r$ (assume $r$ is the $j$th kind of location), the probability $p_j$ that it is estimated as the real location is given by Equation (2).
$$p_j = \frac{m}{m \times k} \tag{2}$$
Considering the I2LC, we assume each dummy location appears only once in the $m$ queries, whereas the real location $r$ appears in every query. That is, $n_i = 1$ for each kind of dummy location, and $n_j = m$ for the real location. Then we get:
$$\lim_{m \to +\infty} \frac{n_i}{m \times k} = \begin{cases} 0, & i \neq j \\ \dfrac{1}{k}, & i = j \end{cases} \tag{3}$$
That is, when the user sends an unlimited number of queries, the probability that the real location is estimated as the real location is far larger than that of any dummy location. This means that the real location can be easily distinguished.
Hence, if a user uses dummy techniques for privacy protection, the attacker can infer her real identity by launching the I2LC. As shown in Figure 1, in the scenario of this paper, the attacker could infer that a user who appears most frequently may be the real user by intersecting users in different anonymous groups.
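The frequency argument above can be checked numerically with a short Python sketch. It assumes, as in the derivation, that every query uses fresh dummy locations while the real location appears in all $m$ queries; the function names and location labels are illustrative and do not come from the paper.

```python
from collections import Counter
from fractions import Fraction


def location_probabilities(queries):
    """Estimate p_i = n_i / (m * k) for each distinct location.

    Assumes every anonymous group in `queries` has the same size k.
    """
    m = len(queries)
    k = len(queries[0])
    counts = Counter(loc for query in queries for loc in query)
    return {loc: Fraction(n, m * k) for loc, n in counts.items()}


def build_naive_queries(m, k, real="r"):
    """Build m anonymous groups: the real location plus k-1 fresh dummies."""
    return [
        [real] + [f"d{q}_{j}" for j in range(k - 1)]
        for q in range(m)
    ]


queries = build_naive_queries(m=5, k=4)
probs = location_probabilities(queries)

print(probs["r"])     # 1/4, i.e. 1/k, independent of m
print(probs["d0_0"])  # 1/20, i.e. 1/(m*k), shrinking as m grows
```

As $m$ grows, the estimate for each dummy tends to 0 while the estimate for the real location stays at $1/k$, which is the gap that the I2LC exploits.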
In addition, a basic assumption commonly used in existing methods is that all locations are geographic coordinates. For example, we assume user Alice is at the location $L_i = (t_i, lo_i, la_i)$, where $t_i$, $lo_i$, and $la_i$ are the time, longitude, and latitude, respectively. Under this assumption, the attacker cannot obtain Alice’s location privacy if he cannot infer that Alice is at $L_i$. However, existing methods ignore the HSA. In this paper, we divide the HSA into six levels based on the common administrative division and use $Le_i$, $i = 0, 1, \ldots, 5$, to denote its levels, where $Le_4$ corresponds to the organization name and $Le_5$ represents a location, as shown in Figure 2. If we do not consider the HSA, we may select $k-1$ dummies that lie in the same organization as the real location. In particular, in TTP-free methods that collaboratively achieve anonymity through a P2P network, the $k$ users are close to each other and easily belong to the same organization, which reveals the users’ organization.
Assume that the attacker obtains $k$ locations, which are distributed across $s$ ($k \geq s$) different organizations and come from one of the $m$ queries in a continuous query. Then the probability $p_o$ that the attacker can distinguish the user’s real organization is as follows:
$$p_o = \frac{1}{s} \tag{4}$$
If the $k$ users ($k$ locations) are close enough that they are almost all in the same organization, $s$ will be close to 1. Then we get:
$$\lim_{s \to 1} \frac{1}{s} = 1 \tag{5}$$
That is to say, if we select $k-1$ dummies that lie in the same organization as the real location, the attacker can easily distinguish the user’s real organization. In fact, the attacker can easily infer that different anonymous groups lie in the same organization, as shown in Figure 3. In Figure 3, the rectangle represents an area containing four organizations: $R_1$, $R_2$, $R_3$, and $R_4$. Different red circles represent different anonymous groups: $A_1$, $A_2$, and $A_3$. In the example, although the users in $A_1$ and $A_3$ are different, the attacker can easily infer that the users in $A_1$ and $A_3$ are in the same organization. The problem is more serious in densely populated and larger organizations. According to actual need, this paper only considers location privacy protection at the $Le_4$ and $Le_5$ levels.
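As a small illustration of $p_o = 1/s$, the following Python sketch counts the distinct organizations in an anonymous group and returns the attacker’s chance of guessing the real organization. The location identifiers, the organization labels, and the `org_of` lookup table are hypothetical; the paper does not prescribe this representation.

```python
from fractions import Fraction


def organization_guess_probability(group, org_of):
    """Return p_o = 1/s, where s is the number of distinct organizations."""
    organizations = {org_of[loc] for loc in group}
    return Fraction(1, len(organizations))


# Hypothetical mapping from location identifiers to organizations.
org_of = {
    "a1": "BUPT", "a2": "BUPT", "a3": "BUPT", "a4": "BUPT",
    "b1": "R2", "c1": "R3", "d1": "R4",
}

clustered = ["a1", "a2", "a3", "a4"]  # all k locations in one organization
spread = ["a1", "b1", "c1", "d1"]     # k locations across s = 4 organizations

print(organization_guess_probability(clustered, org_of))  # 1
print(organization_guess_probability(spread, org_of))     # 1/4
```

When the whole group sits in one organization, the organization is revealed with certainty even though the exact location is still hidden among $k$ candidates.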
The above formulas reveal the mechanism by which the attacker obtains users’ privacy by using the I2LC and the HSA: (1) when the frequency with which each location appears in a continuous query differs, location privacy leaks; (2) when the users (locations) in an anonymous group are concentrated in the same organization, organization information leaks. Hence, the basic idea of our scheme is that each location appears evenly in a continuous query and the locations in an anonymous group are distributed across two or more different organizations. In addition, we use l-diversity to protect the query content. That is, for all queries in a continuous query, the probability of occurrence of each query category is the same. This means that each query category is equally likely to be taken as the user’s real query content.
To implement our basic idea, we consider two main k-anonymity methods: cloaking and adding dummies. The cloaking method submits a cloaking area containing $k$ users to the LBS server (the user’s real location is also in the cloaking area). However, it is difficult for cloaking to guarantee the basic idea. One reason is that users do not store the information of users who have participated in anonymity, so it is difficult to determine whether a user appears too frequently in a continuous query. For example, in the cloaking method, users are randomly selected to participate in anonymity, so their participation cannot be controlled. Another reason is that users cannot know the areas of all organizations. For example, in cloaking, randomly selecting users to participate in anonymity causes the users’ locations to be randomly distributed across organizations. In this paper, adding dummies is an ideal solution because it is not constrained by the real users’ locations, and the dummy locations can be generated flexibly according to given conditions. Some methods, such as randomly generating dummies or selecting dummies from historical queries or historical data, can guarantee to a certain extent that the users in an anonymous group are distributed across specified organizations, but they cannot guarantee that the users are not all in the same organization.
In this paper, we add dummies to protect the user’s location, organization information, and query privacy in the scenario of this paper. All dummy locations and query content are selected from historical locations and the basic service set. We first divide the city area where users are located into grids with a two-level structure, in which different grids represent different organizations, as shown in Figure 4. Each historical location is located in a grid. Then, we can select $k$ dummy locations evenly from $s$ different grids. That is, we can ensure that the $k$ locations in each query are different from each other and distributed across $s$ different organizations. This protects users’ privacy at the organization level. However, since the LBS provider has all historical locations, he can identify the real user by comparing the submitted locations with the historical locations (the real user’s location differs from the historical locations). Therefore, we replace the real user’s location with historical locations that are in the same organization as the real user’s location. This ensures that the locations in each query are different. In addition, for the query content, our idea is to ensure that the probability of occurrence of each query category is the same in all queries. This means that each query category is equally likely to be taken as the user’s real query content.
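One way to read the query-content condition is as a check over the categories used in a whole continuous query: at least $l$ distinct categories, each occurring equally often. The Python sketch below implements that reading; the function name `satisfies_query_diversity` and the category names are assumptions for illustration, and the paper’s formal definition may differ in detail.

```python
from collections import Counter


def satisfies_query_diversity(queries, l):
    """Check that the continuous query covers at least l categories
    and that every category occurs the same number of times."""
    counts = Counter(cat for query in queries for cat in query)
    return len(counts) >= l and len(set(counts.values())) == 1


# Hypothetical continuous query of three requests with k = 3 entries each.
balanced = [
    ["restaurant", "hospital", "hotel"],
    ["hospital", "hotel", "restaurant"],
    ["hotel", "restaurant", "hospital"],
]

# Skewed case in the spirit of Figure 1: later queries use one category.
skewed = [
    ["restaurant", "hospital", "hotel"],
    ["restaurant", "restaurant", "restaurant"],
    ["restaurant", "restaurant", "restaurant"],
]

print(satisfies_query_diversity(balanced, l=3))  # True
print(satisfies_query_diversity(skewed, l=3))    # False
```

In the skewed case one category dominates, so an attacker can attribute it to the real user with high probability.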
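The selection idea described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the authors’ algorithm (which is given later in the paper): the two-level grid is flattened into a dictionary from organization to historical locations, the real user’s organization is always one of the $s$ chosen organizations, and `select_locations`, `grid`, and all location labels are hypothetical.

```python
import random


def select_locations(grid, real_org, k, s, rng=None):
    """Pick k distinct historical locations spread over s organizations.

    grid maps an organization id to its list of historical locations.
    The real organization is always included, so the real location can be
    replaced by a historical location from the same organization.
    """
    rng = rng or random.Random()
    if not 1 <= s <= k:
        raise ValueError("need 1 <= s <= k")
    others = [org for org in grid if org != real_org]
    if len(others) < s - 1:
        raise ValueError("not enough organizations in the grid")

    chosen_orgs = [real_org] + rng.sample(others, s - 1)
    base, extra = divmod(k, s)  # spread k as evenly as possible over s

    picked = []
    for idx, org in enumerate(chosen_orgs):
        quota = base + (1 if idx < extra else 0)
        if len(grid[org]) < quota:
            raise ValueError(f"organization {org} has too few locations")
        picked.extend((org, loc) for loc in rng.sample(grid[org], quota))

    rng.shuffle(picked)  # hide which entry stands in for the real user
    return picked


# Hypothetical historical dataset: four organizations, five locations each.
grid = {
    "BUPT": [f"bupt_{i}" for i in range(5)],
    "R2": [f"r2_{i}" for i in range(5)],
    "R3": [f"r3_{i}" for i in range(5)],
    "R4": [f"r4_{i}" for i in range(5)],
}

group = select_locations(grid, real_org="BUPT", k=6, s=3, rng=random.Random(0))
print(group)
```

The sketch only covers a single query; making each location appear evenly across all queries of a continuous query would need additional bookkeeping of past selections.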

3.3. Privacy Protection Model

To implement the above basic idea, the dummy method in our paper needs to meet two conditions. (1) The LBS server needs to store a historical location dataset. By controlling which locations participate, this guarantees that each location appears evenly in a continuous query and that the k dummy locations are not all located in the same organization. (2) The LBS server also needs to provide a service category set. This guarantees that every query category occurs with the same probability across all queries.
Considering the above conditions, the LBS system needs to provide $M$ different service categories, e.g., restaurant and hospital. Let these service categories be $C = \{C_1, C_2, \ldots, C_M\}$; this is called the basic service set. In addition, the LBS server needs to store a historical location dataset $G$, which users use to generate dummy locations. The area covered by the historical locations in $G$ is divided into $N$ different first-level grids, so we write $G = \bigcup_{i=1}^{N} R_i$, where $R_i$ denotes the $i$th first-level grid. Each $R_i$ is further divided into $U_i$ different second-level grids (organizations), so we write $R_i = \bigcup_{j=1}^{U_i} U_{ij} = \{L \mid L \in U_{i1}, L \in U_{i2}, \ldots, \text{or } L \in U_{iU_i}\}$, where $U_{ij}$ denotes the $j$th second-level grid in $R_i$ and $L$ is a location in $U_{ij}$. To protect her privacy, when a user sends the $i$th query from location $L_0$ to the LBS server, she submits a query request $Req^{(i)} = ((L_1^{(i)}, C_1^{(i)}), (L_2^{(i)}, C_2^{(i)}), \ldots, (L_k^{(i)}, C_k^{(i)}))$, with $L_j^{(i)} \in G$, $C_j^{(i)} \in C$, $i = 1, 2, \ldots, n$, $j = 1, 2, \ldots, k$. After receiving the query request, the LBS server computes and returns a response message $Res^{(i)} = ((L_1^{(i)}, S_1^{(i)}), (L_2^{(i)}, S_2^{(i)}), \ldots, (L_k^{(i)}, S_k^{(i)}))$. To implement the basic idea, $Req^{(i)}$ and $Res^{(i)}$ in a continuous query must be constrained. To this end, we define the privacy protection model $\langle L(k,s), Q_l \rangle$, called the ’degree of privacy protection’, as follows:
Definition 1:
In a continuous query, if $\langle L(k,s), Q_l \rangle$ is to protect a user’s location privacy, organization information, and query privacy, then $Req^{(i)}$ and $Res^{(i)}$ must satisfy the following conditions:
(1) $\forall p, q$, $p \neq q$: $L_p^{(i)} \neq L_q^{(i)}$, where $p, q \in [1, k]$ and $i = 1, 2, \ldots, n$;
(2) The locations $L_1^{(i)}, L_2^{(i)}, \ldots, L_k^{(i)}$ are located in $s$ different second-level grids;
(3) The locations in $Req^{(i)}$ contain at least $l$ ($l \le M$) different service categories. If a user sends $n$ queries from location $L_0$, the number of times each service category in $C$ is selected is approximately equal. That is, each service category in $C$ is selected with equal probability.
The privacy protection model means that, to protect her location, organization information, and query content, a user must send $k$ different locations to the LBS server in each query request; these locations must lie in $s$ different second-level grids, and the $k$ query contents attached to them must cover at least $l$ ($l \le M$) different service categories. The first two conditions ensure that the $k$ locations in $Req^{(i)}$ are different from each other and located in $s$ different second-level grids, which is why the $k$ dummies in a query request must be selected from $s$ different grids. Condition (3) guarantees that the $k$ query contents in $Req^{(i)}$ are not all the same, which is necessary to protect the query content. Thus, $\langle L(k,s), Q_l \rangle$ can protect a user’s location privacy, organization information, and query privacy.
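The per-request part of Definition 1 can be checked mechanically. The following is a minimal sketch, not the authors' implementation: it assumes each entry of a request is a hypothetical triple (location id, second-level grid id, service category), and it tests only the conditions that can be verified on a single request (the equal-frequency part of condition (3) concerns the whole query sequence).

```python
from typing import Hashable, Sequence, Tuple

# Hypothetical entry layout (an assumption for illustration):
# (location_id, second_level_grid_id, service_category)
Entry = Tuple[Hashable, Hashable, Hashable]


def satisfies_model(req: Sequence[Entry], k: int, s: int, l: int) -> bool:
    """Check the single-request conditions of the <L(k,s), Q_l> model."""
    locations = [loc for loc, _, _ in req]
    grids = {grid for _, grid, _ in req}
    categories = {cat for _, _, cat in req}
    return (
        len(req) == k
        and len(set(locations)) == k   # condition (1): pairwise distinct locations
        and len(grids) == s            # condition (2): s different second-level grids
        and len(categories) >= l       # condition (3): at least l service categories
    )
```

For example, a request with four distinct locations spread over two grids and three categories passes for $k = 4$, $s = 2$, $l = 3$ and fails for $l = 4$.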

3.4. System Architecture

Given the adversary model, the basic idea, and the privacy protection model, the remaining problem is to choose a system architecture for selecting dummy locations and query content. Considering the drawbacks of the TTP and the TTP-free method, in this paper, we adopt a dummy technique that uses historical locations to generate dummy locations to protect the user’s privacy, including the organization information and query content. As shown in Figure 5, the system architecture of the dummy technique is a typical independent architecture that avoids the disadvantages of centralized and distributed architectures.
In the LBS system, there are three participants: the user, the mobile terminal, and the LBS server. The privacy protection module of the mobile terminal is the core module of the entire system and consists of an $Area$ generation module, a dummy location generation module, and a dummy content generation module. The LBS server stores the historical location dataset $G$, provides the historical location sub-dataset $G_{sub}$ for the user to generate dummy locations, and calculates the query results $Res$ for the user. The whole process is described as follows:
  • System initialization: This mainly refers to obtaining the user’s current location $L_0$ and initializing the historical location dataset $G$ on the server.
  • Initiating a service request: In this stage, the user inputs the initial parameters $\langle k, s, l \rangle$ as the input of the privacy protection module.
  • $Area$ generation: $Area$ is a range of latitude and longitude that the user sends to the LBS server to indicate which part of the historical location dataset she downloads. $Area$ is determined by the user’s current location $L_0$ and $L(k,s)$. After the user inputs $\langle k, s, l \rangle$, the $Area$ generation module computes $Area$ from $L_0$ and $s$, and sends $Area$ to the LBS server to download $G_{sub}$.
  • Historical location data acquisition: The LBS server sends the user the $G_{sub}$ that meets the requirements of $Area$.
  • Query request construction: After generating dummy locations and dummy query content, the privacy protection module constructs the query request message $Req$ in a specific format.
  • Service response: According to the dummy locations and dummy query content in $Req$, the server calculates the query results, constructs the response message $Res$, and returns it to the user.

4. Dummy Generation Scheme

In this section, we describe our dummy generation scheme in two parts: the algorithm framework and several key algorithms.

4.1. Algorithm Framework

In our scheme, generating dummy locations and query content requires consideration of two conditions: (1) the LBS provider, which owns users’ full information, acts as an active adversary; (2) in our independent architecture, dummy locations and query content are selected from the historical location dataset and the basic service set stored on the mobile terminal, which has limited storage. Together, these conditions mean that the client cannot submit its exact location to the server and can store only a small amount of data. The process of generating dummy locations and query content is shown in Figure 6. The main steps are as follows:
  • Step 1: The user supplies the privacy protection parameters $\langle k, s, l \rangle$ and her current location $L_0$. To ensure that the LBS provider cannot identify $L_0$, the mobile terminal uses the $Area$ generation algorithm to generate an $Area$ containing $L_0$ and submits $Area$ to the server instead of the exact location $L_0$.
  • Step 2: After receiving $Area$, the server generates $G_{sub}$ (a sub-dataset of $G$) and sends it to the mobile terminal. The server generates $G_{sub}$ from $Area$ using the historical location sub-dataset generation algorithm. $G_{sub}$ is a small dataset that can be stored on the mobile terminal.
  • Step 3: The mobile terminal constructs a query request $Req$ and submits it to the server. In this step, the mobile terminal uses the dummy location generation algorithm to select dummy locations and the dummy query content generation algorithm to select dummy query content. The two algorithms ensure that the constructed $Req$ satisfies the privacy protection model.
  • Step 4: The server calculates the query results $Res$ and sends them to the mobile terminal. So that the user obtains the real query result while the server cannot tell which result it is, $Req$ is built with the user query request construction algorithm and $Res$ is computed from it.

4.2. $Area$ Generation Algorithm

To ensure that the LBS provider cannot identify $L_0$, we need to blur $L_0$ into an area that consists of different organizations. Here, the area is denoted as $Area = (s, lo_{min}, lo_{max}, la_{min}, la_{max})$, where $s$ indicates the number of different second-level grids, and $[lo_{min}, lo_{max}]$ and $[la_{min}, la_{max}]$ give the longitude and latitude ranges of $G_{sub}$, respectively. $Area$ is determined by $L(k,s)$. If the user’s current location is $L_0 = (t_0, lo_0, la_0)$, $Area$ can also be written as $Area = (s, lo_0 + \alpha, lo_0 + \beta, la_0 + \alpha', la_0 + \beta')$, where $\alpha$, $\beta$, $\alpha'$, and $\beta'$ are given by the user. So, we design an algorithm to generate $Area$; Algorithm 1 gives its formal description. After $Area$ is generated, the server checks whether it contains at least $s$ second-level grids. If so, the server provides $G_{sub}$. Otherwise, the server increases the values of the initial parameters $\alpha$, $\beta$, $\alpha'$, and $\beta'$ and repeats the check until $Area$ contains no fewer than $s$ second-level grids.
Generally speaking, the higher the user’s location privacy protection requirements are and the more grids and historical location data meeting the requirements, the better the privacy protection effect is, and of course the higher the storage and communication overhead are when generating the dummy locations.
Algorithm 1: $Area$ generation.
Input: $L_0$, $s$, $\alpha$, $\beta$, $\alpha'$, $\beta'$
Output: $Area$
1. $Area \leftarrow \emptyset$;
2. let $lo_{min} \leftarrow lo_0 + \alpha$, $lo_{max} \leftarrow lo_0 + \beta$, $la_{min} \leftarrow la_0 + \alpha'$, $la_{max} \leftarrow la_0 + \beta'$;
3. let $Area \leftarrow \{s, lo_{min}, lo_{max}, la_{min}, la_{max}\}$;
4. Return $Area$
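Algorithm 1 translates almost directly into code. The sketch below is illustrative only: the `Area` tuple type and the parameter names `alpha_lat` and `beta_lat` (standing for the two latitude offsets) are assumptions, and the caller is assumed to pass non-positive lower offsets and non-negative upper offsets so that $L_0$ lies inside the box.

```python
from typing import NamedTuple, Tuple


class Area(NamedTuple):
    """Bounding box sent to the server in place of the exact location."""
    s: int
    lo_min: float
    lo_max: float
    la_min: float
    la_max: float


def generate_area(l0: Tuple[float, float, float], s: int,
                  alpha: float, beta: float,
                  alpha_lat: float, beta_lat: float) -> Area:
    """Blur L0 = (t0, lo0, la0) into a longitude/latitude range.

    alpha/beta offset the longitude, alpha_lat/beta_lat the latitude.
    """
    _, lo0, la0 = l0
    return Area(s, lo0 + alpha, lo0 + beta, la0 + alpha_lat, la0 + beta_lat)
```

The timestamp component of $L_0$ is ignored here because Algorithm 1 uses only the longitude and latitude.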

4.3. Historical Location Sub-Dataset Generation Algorithm

In our scheme, the mobile terminal selects the dummy locations and query content, so it needs to store the historical location dataset and the basic service set. Because its storage is limited, the server must send a smaller historical location sub-dataset $G_{sub}$ to the mobile terminal. Here, we only consider $G_{sub}$, since the basic service set is itself small and can be stored on the mobile terminal. So, we design an algorithm to generate $G_{sub}$; Algorithm 2 gives its formal description. First, Algorithm 2 determines whether the historical locations within the range of $Area$ are distributed in $s$ different second-level grids. If so, the requirement is met and the algorithm returns $G_{sub}$, the set of historical locations in the given range. Otherwise, the range of $Area$ is extended by $10^{-3}$ degrees on each side, giving a new area $Area_1 = (s, lo_{min} - 10^{-3}, lo_{max} + 10^{-3}, la_{min} - 10^{-3}, la_{max} + 10^{-3})$. This process is repeated until the given range contains $s$ different second-level grids. In general, one iteration is enough to meet the requirement.
Algorithm 2: Historical location sub-dataset generation.
Input: $L_0$, $Area$, $G$
Output: $G_{sub}$
[Algorithm 2 listing: image applsci-10-00548-i001 in the original article]
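Since the listing of Algorithm 2 is not available as text, the following sketch reconstructs it from the prose description only and should not be read as the authors' exact procedure. The record layout (longitude, latitude, second-level grid id) and the `max_rounds` safeguard are assumptions added for illustration.

```python
from typing import Iterable, List, Tuple

# Hypothetical record layout: (longitude, latitude, second_level_grid_id)
Record = Tuple[float, float, str]


def generate_subdataset(area: Tuple[int, float, float, float, float],
                        records: Iterable[Record],
                        step: float = 1e-3,
                        max_rounds: int = 1000) -> List[Record]:
    """Return the records inside Area, widening it until s grids are covered."""
    s, lo_min, lo_max, la_min, la_max = area
    records = list(records)
    for _ in range(max_rounds):
        inside = [r for r in records
                  if lo_min <= r[0] <= lo_max and la_min <= r[1] <= la_max]
        if len({r[2] for r in inside}) >= s:
            return inside
        # Extend every side of the range by `step` degrees and try again.
        lo_min -= step
        lo_max += step
        la_min -= step
        la_max += step
    raise ValueError("could not cover s second-level grids within max_rounds")
```

The bound on the number of rounds is a defensive choice for the sketch; the text states that a single extension is usually sufficient.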

4.4. Dummy Location Generation Algorithm

To meet $\langle L(k,s), Q_l \rangle$, we need to achieve two goals: (1) each location appears evenly in a continuous query; (2) in each query, the $k$ dummy locations are not all located in the same organization. We take two measures to achieve them. First, the user’s real location is not submitted to the LBS server but is replaced by a historical location in the same organization. Second, we add an identifier $Ide$ to each location in $U_{ij}$ to ensure that the selected dummy locations are different in each query request $Req$. We set $Ide$ to 1 each time a location is selected. When the identifiers of all locations in $U_{ij}$ are 1, meaning that all locations in $U_{ij}$ have been traversed, they are all reset to 0 to start a new round of selection. Because the historical location that replaces the user’s real location lies in the same organization as the real one, the real location is protected from being identified while the loss of service quality remains small.
Based on the above goals and measures, we design an algorithm to generate dummy locations; Algorithm 3 gives its formal description. First, we determine the second-level grid $U^{sub}_{i_0,j}$ to which the current location $L_0$ belongs and then, using $U^{sub}_{i_0,j}$ as the starting point, randomly select $s - 1$ further second-level grids from near to far. The $s$ second-level grids are expressed as $SG = \{U^{sub}_{i_0,j}, U^{sub}_{i_1,j+1}, U^{sub}_{i_2,j+2}, \ldots, U^{sub}_{i_{s-1},j+s-1}\}$. Then, we randomly select one location from each grid of $SG$ and obtain $s$ locations $L_1, L_2, \ldots, L_s$, which are located in $s$ different second-level grids, and set the identifier $Ide$ of each selected historical location to 1. Second, we randomly select the next $s$ locations $L_{s+1}, L_{s+2}, \ldots, L_{2s}$ from the historical locations with $Ide = 0$ in the $s$ grids of $SG$, and set the identifier $Ide$ of the selected historical locations to 1. We repeat this procedure until the remaining $k - 2s$ locations are selected. Finally, $k$ locations $L_1, L_2, \ldots, L_k$ are selected and evenly distributed in $s$ second-level grids. Among them, $L_1$ replaces the user’s real location $L_0$, which prevents the server from distinguishing $L_0$ from the dummy locations by comparison with the historical location data it owns. For simplicity, we assume that the historical location dataset contains enough historical locations (in practice, enough can be obtained) for users to choose different historical locations each time. The specific algorithm is described as follows.
Algorithm 3: Dummy locations generation.
Input: $L_0$, $s$, $k$, $G_{sub}$
Output: anonymous group $DL$
[Algorithm 3 listing: image applsci-10-00548-i002 in the original article]
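The selection procedure described above can be sketched as follows. This is an illustration based on the prose, not the published listing: grids are modelled as a hypothetical dictionary from grid id to a list of location ids, the $Ide$ flags are kept as a set of already-used locations per grid, and the "near to far" choice of neighbouring grids is simplified to uniform random sampling.

```python
import random
from typing import Dict, Hashable, List, Optional, Set


def generate_dummies(home_grid: Hashable,
                     grids: Dict[Hashable, List[Hashable]],
                     used: Dict[Hashable, Set[Hashable]],
                     s: int, k: int,
                     rng: Optional[random.Random] = None) -> List[Hashable]:
    """Pick k distinct historical locations spread evenly over s grids.

    `used` plays the role of the Ide flags and persists across queries.
    The first returned location lies in the user's own grid and stands in
    for the real location L0.
    """
    rng = rng or random.Random()
    others = [g for g in grids if g != home_grid]
    if home_grid not in grids or len(others) < s - 1:
        raise ValueError("G_sub does not cover s second-level grids")
    # Simplification: neighbours are sampled uniformly, not by distance.
    selected = [home_grid] + rng.sample(others, s - 1)

    chosen: List[Hashable] = []
    turn = 0
    while len(chosen) < k:
        grid = selected[turn % s]          # round-robin over the s grids
        turn += 1
        flags = used.setdefault(grid, set())
        fresh = [loc for loc in grids[grid]
                 if loc not in flags and loc not in chosen]
        if not fresh:
            flags.clear()                  # all Ide = 1: start a new round
            fresh = [loc for loc in grids[grid] if loc not in chosen]
            if not fresh:
                raise ValueError("grid has too few historical locations")
        pick = rng.choice(fresh)
        flags.add(pick)                    # set Ide = 1
        chosen.append(pick)
    return chosen
```

Passing the same `used` dictionary to consecutive calls is what keeps locations from repeating across queries until a grid has been fully traversed.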

4.5. Dummy Query Content Generation Algorithm

The number of locations participating in anonymity is $k$, so the number of corresponding anonymous query contents is also $k$. If the user has queried $n$ times in a row, the numbers of locations and of query contents participating in anonymity are both $nk$. Over $n$ continuous queries, the idea for protecting the user’s query content is that each service category in the basic service set $C$ is selected with equal probability. In other words, the number of times each service category is selected is approximately equal, so the attacker cannot guess the user’s query privacy from the distribution characteristics. We denote $C$ as $C = \{C_1, C_2, \ldots, C_M\}$ and use $n_i$ to represent the number of times $C_i$ is selected. $n_i$ should meet the following constraints:
  • $\forall i, j$, $i \neq j$, $i, j \in [1, M]$: $|n_i - n_j| \approx 0$;
  • each query contains at least $l$ different service categories.
Then, $n_i = \frac{nk}{M}$, with $l \le M \le k$.
The specific algorithm is described as follows (Algorithm 4).
Algorithm 4: Query content generation.
Input: $k$, $n$, $l$, $C$
Output: $C$
[Algorithm 4 listing: image applsci-10-00548-i003 in the original article]
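One simple way to obtain near-equal category counts is to walk cyclically through the basic service set and hand each query the next $k$ categories. The sketch below uses that round-robin idea as a stand-in, since the published listing is not available as text; it assumes $l \le M \le k$, which also guarantees that the user's real category (the hypothetical parameter `real`) is present in every query.

```python
import random
from typing import List, Optional, Sequence


def generate_query_contents(categories: Sequence[str], real: str,
                            k: int, n: int, l: int,
                            rng: Optional[random.Random] = None) -> List[List[str]]:
    """Return n lists of k categories with near-uniform overall counts."""
    m = len(categories)
    if real not in categories or not (l <= m <= k):
        raise ValueError("requires real in C and l <= M <= k")
    rng = rng or random.Random()
    queries: List[List[str]] = []
    pos = 0
    for _ in range(n):
        # Take the next k categories from the cyclic sequence C1, C2, ..., CM.
        batch = [categories[(pos + t) % m] for t in range(k)]
        pos = (pos + k) % m
        rng.shuffle(batch)  # hide the cyclic order inside the request
        queries.append(batch)
    return queries
```

Because the $nk$ selections are consecutive positions of one cycle, any two category counts differ by at most one, and they are exactly equal when $M$ divides $nk$.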

4.6. User Query Request Construction Algorithm

After generating $k$ dummy locations and $k$ dummy query contents that satisfy the privacy protection requirement $\langle L(k,s), Q_l \rangle$, the user’s query request $Req$ can be expressed as $Req = ((L_1, C_1), (L_2, C_2), \ldots, (L_k, C_k))$, with $C_1, C_2, \ldots, C_k \in C$.

4.7. Response Message Generation Algorithm

After receiving the user query request $Req = ((L_1, C_1), (L_2, C_2), \ldots, (L_k, C_k))$, the server calculates and generates a response message $Res = ((L_1, S_1), (L_2, S_2), \ldots, (L_k, S_k))$. Finally, the user receives the response message $Res$ and obtains the query results.
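The request and response messages of Sections 4.6 and 4.7 are plain sequences of pairs, as the sketch below illustrates. The `lookup` callable is a hypothetical stand-in for the server's POI search and is not part of the paper.

```python
from typing import Callable, Hashable, List, Sequence, Tuple


def build_request(locations: Sequence[Hashable],
                  categories: Sequence[str]) -> List[Tuple[Hashable, str]]:
    """Pair the k locations with the k query contents: Req = ((L1, C1), ...)."""
    if len(locations) != len(categories):
        raise ValueError("need exactly one query content per location")
    return list(zip(locations, categories))


def build_response(req: Sequence[Tuple[Hashable, str]],
                   lookup: Callable[[Hashable, str], object]
                   ) -> List[Tuple[Hashable, object]]:
    """Server side: answer every pair, Res = ((L1, S1), ..., (Lk, Sk))."""
    return [(loc, lookup(loc, cat)) for loc, cat in req]
```

The server answers all $k$ pairs in the same way; only the client knows which pair carries the real query content.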

5. System Analysis

In this section, we analyze the security and feasibility of our scheme. Specifically, following the scheme, we prove the feasibility of the scheme and solve optimal solution problems. We also examine whether our scheme can achieve the desirable security and privacy requirements.

5.1. Existence of Solutions and Optimal Solution Problems

The ideal privacy protection scheme maximizes the privacy protection level, minimizes system overhead, and balances the two in specific scenarios. This section proves the feasibility of the scheme by proving that the multi-objective optimization problem has a solution, and solves the equilibrium problem between privacy protection level and system overhead.
In this paper, the security goal is to achieve the privacy protection level $\langle L(k,s), Q_l \rangle$. We use the probability $p_A$ that the organization information is identified to measure $L(k,s)$ and the probability $p_S$ that the query content is identified to measure $Q_l$. The smaller $p_A$ and $p_S$ are, the higher the privacy protection level is. To achieve the expected security goal, we select $k$ dummy locations from $G_{sub}$ such that they are located in $s$ second-level grids (i.e., organizations). $G_{sub}$ is related to $|U_j^{sub}|$ (the number of locations in $U_j^{sub}$) and $s$. The larger $|U_j^{sub}|$ and $s$ are, the larger $G_{sub}$ is, and the larger the corresponding communication and storage overheads are. The equilibrium problem of privacy protection level and system overhead can be described as minimizing $p_A$, $p_S$, and $G_{sub}$.
The objective functions of the above problem are denoted as $\min\{p_A(k,s)\}$, $\min\{p_S(k,s)\}$, and $\min\{c(s, G_{sub})\}$, where $c(s, G_{sub})$ signifies the system overhead. Here, we consider the storage overhead and communication overhead related to $G_{sub}$ for the following two reasons. On the one hand, compared with other methods, the increase in the cost of our method chiefly comes from the increase of $G_{sub}$; the cost of the other data involved in the service process is much smaller than that of $G_{sub}$. On the other hand, $G_{sub}$ mostly affects the computational cost of selecting $s$ organizations. Once the $s$ organizations are determined, the computational cost of selecting $k$ dummy locations is basically unchanged.
The constraints are described as follows. The $\frac{k}{s}$ locations of each organization correspond to $\frac{k}{s}$ query contents, each query content corresponds to a different service category, and $\frac{k}{s}$ is no greater than $M$ (in this paper, we set $M = 21$ (see Section 6) and $s \ge 2$; requiring $\frac{k}{s} \le M$ is reasonable here because otherwise $k$ would be too large, which would greatly increase the system overhead). There is at least one solution that makes the $\frac{k}{s}$ query contents corresponding to the $\frac{k}{s}$ locations in each organization different and makes the number of times each query category is selected after $n$ queries equal.
We use $x_{ij}^{(z)}$ to indicate whether the category corresponding to the $j$th location in the $i$th organization in the $z$th query is $C_j$. If $x_{ij}^{(z)} = 1$, $C_j$ is selected; otherwise, $x_{ij}^{(z)} = 0$ indicates that $C_j$ is not selected. The constraints are expressed as follows.
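$$
\begin{cases}
\sum_{j=1}^{M} x_{ij}^{(z)} = \left\lceil \frac{k}{s} \right\rceil - \eta_i \\
\sum_{i=1}^{s} x_{ij}^{(z)} = \left\lceil \frac{k}{M} \right\rceil - \mu_j \\
\sum_{i=1}^{s} \left( \left\lceil \frac{k}{s} \right\rceil - \eta_i \right) = \sum_{j=1}^{M} \left( \left\lceil \frac{k}{M} \right\rceil - \mu_j \right) = k \\
\sum_{z=1}^{n} \sum_{j=1}^{M} x_{ij}^{(z)} - \sum_{z=1}^{n} \sum_{j=1}^{M} x_{i'j}^{(z)} = 0 \\
\eta_i, \mu_j \in \{0, 1\} \\
i, i' = 1, 2, \ldots, s, \text{ and } i \neq i'; \quad j = 1, 2, \ldots, M; \quad 2 \le s \le k; \quad z = 1, 2, \ldots, n
\end{cases}
\tag{6}
$$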
In System of Linear equations (6), $\frac{k}{s}$ and $\frac{k}{M}$ are rounded up for the rigor of logic; $\sum_{i=1}^{s} (\lceil \frac{k}{s} \rceil - \eta_i) = \sum_{j=1}^{M} (\lceil \frac{k}{M} \rceil - \mu_j) = k$ ensures that the numbers of dummy locations and dummy query contents selected are both $k$; $\sum_{j=1}^{M} x_{ij}^{(z)} = \lceil \frac{k}{s} \rceil - \eta_i$ gives the number of locations belonging to the $i$th organization in the $z$th query; $\sum_{i=1}^{s} x_{ij}^{(z)} = \lceil \frac{k}{M} \rceil - \mu_j$ gives the number of times that the service category $C_j$ is selected in the $z$th query. $\sum_{z=1}^{n} \sum_{j=1}^{M} x_{ij}^{(z)} - \sum_{z=1}^{n} \sum_{j=1}^{M} x_{i'j}^{(z)} = 0$ denotes that the numbers of times that any of the basic service categories is selected in the $n$ queries are equal, which can be derived from $\sum_{i=1}^{s} x_{ij}^{(z)} = \lceil \frac{k}{M} \rceil - \mu_j$.
The objective function having a solution means that, given $k$, when $s$ changes from 2 to $k$, there is at least one scheme for each value of $s$ that satisfies the constraints. Then $k$ and $s$ can be regarded as fixed values in a specific solving process, and the question of whether the objective function has a solution is transformed into whether the equations in the constraint conditions have a solution.
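The role of the 0/1 corrections can be seen by computing the per-organization and per-category targets directly. The helper below is an illustrative sketch (the function name is an assumption): it splits $k$ into a given number of parts, each equal to the rounded-up quotient minus 0 or 1, which is exactly the form of the right-hand sides in the constraints.

```python
from typing import List


def balanced_counts(k: int, parts: int) -> List[int]:
    """Split k into `parts` integers of the form ceil(k/parts) - {0, 1}."""
    if parts <= 0 or k < 0:
        raise ValueError("k must be non-negative and parts positive")
    ceil_q = (k + parts - 1) // parts      # ceil(k / parts) in integers
    slack = parts * ceil_q - k             # how many parts get correction 1
    return [ceil_q - 1] * slack + [ceil_q] * (parts - slack)


# Per-organization targets for k = 45 locations over s = 4 organizations
print(balanced_counts(45, 4))
# Per-category targets for k = 45 query contents over M = 21 categories
print(sum(balanced_counts(45, 21)))
```

With $k = 45$ and $s = 4$ the targets are 11, 11, 11, and 12; with $M = 21$, eighteen categories are used twice and three are used three times, so both sides of the third constraint sum to $k$.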
Conclusion 1: The System of Linear equations (6) with integer coefficient has multiple integer solutions.
Proof. 
In a given solving process, the number of variables is $s \times M$, and the coefficient matrix and the augmented matrix of System of Linear equations (6) are denoted as $B$ and $\bar{B}$, respectively. Let $b_i = \lceil \frac{k}{s} \rceil - \eta_i$ and $b'_j = \lceil \frac{k}{M} \rceil - \mu_j$; then $\bar{B}$ is expressed as follows.
[Matrix $\bar{B}$: image applsci-10-00548-i004 in the original article]
$B$ is composed of the first $s \times M$ columns of $\bar{B}$. We can work out the invariant factors of the matrix $B$ and the augmented matrix $\bar{B}$ through elementary matrix transformations. The invariant factors of $B$ and $\bar{B}$ are both $(\underbrace{1, 1, \ldots, 1}_{M+s-1})$. In addition, the ranks of $B$ and $\bar{B}$ are both $M + s - 1$. The number of effective equations in the constraint conditions is $s + M$, and $s + M - 1 < s + M \le sM$. According to references [34,35], System of Linear equations (6) has multiple integer solutions. Hence, there must be a scheme that makes the objective function reach the local or global optimum. Furthermore, the presented scheme is feasible, and the objective function can reach the local or global optimum. □

5.2. Security Analysis

5.2.1. User’s Privacy Protection Requirements $\langle L(k,s), Q_l \rangle$

User’s privacy protection requirements in a single query: In a single query, the user submits a query request $Req = ((L_1, C_1), (L_2, C_2), \ldots, (L_k, C_k))$ to the LBS server. If the user wants the attacker to identify the user’s real location with a probability no greater than $\frac{1}{k}$ and the user’s organization name with a probability no greater than $\frac{1}{s}$, the user’s location privacy protection requirement is called $L(k,s)$. If the user wants the attacker to identify the user’s actual query content with a probability no greater than $\frac{1}{l}$, the user’s query privacy protection requirement is called $Q_l$.
User’s location privacy protection requirements in continuous queries in the same location: The user submits queries continuously in the same location. If the following conditions are met:
(i) $\Pr\{L_{real} = L_{i1} \mid Req_i, i = 1, 2, \ldots, n\} \le \frac{1}{k}$;
(ii) $\Pr\{L_{real}^{s} = L_{i1} \mid Req_i, i = 1, 2, \ldots, n\} \le \frac{1}{s}$;
(iii) $\Pr\{C_{real} = C_{i1} \mid Req_i, i = 1, 2, \ldots, n\} \le \frac{1}{l}$,
then the proposed scheme meets the user’s privacy requirements $\langle L(k,s), Q_l \rangle$ in a continuous query request scenario in the same location.

5.2.2. Security Analysis of the Presented Scheme

Conclusion 2: The presented scheme can achieve privacy protection requirements $\langle L(k,s), Q_l \rangle$.
Proof. 
In a single query, the presented scheme is clearly able to meet the user’s privacy protection requirements $\langle L(k,s), Q_l \rangle$. In $n$ continuous queries, the $k$ dummy locations included in each $Req$ are selected from $s$ different second-level grids, so the $nk$ dummy locations belong to $s$ organizations, and the probability that the user’s real organization is recognized is no greater than $\frac{1}{s}$. Thus, the presented scheme satisfies $L(k,s)$. Moreover, each basic service category is selected $n \times \frac{k}{M}$ times, so the probability of each basic service category being recognized is $\left(\frac{nk}{M}\right) / nk = \frac{1}{M} \le \frac{1}{l}$, which satisfies $Q_l$. □
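As a numerical illustration of the last step, take the values used in the experimental setup of Section 6.2 ($M = 21$, $k = 45$, $n = 100$). The split into 214 and 215 selections is our own arithmetic under the assumption that the counts are balanced as evenly as integers allow.

$$
nk = 100 \times 45 = 4500, \qquad \frac{nk}{M} = \frac{4500}{21} \approx 214.3, \qquad 6 \times 215 + 15 \times 214 = 4500,
$$

$$
\frac{nk/M}{nk} = \frac{1}{M} = \frac{1}{21} \approx 0.048 \le \frac{1}{l} \quad \text{for every } l \le 21 .
$$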
Conclusion 3: The presented scheme can resist LSA.
Proof. 
In this paper, the core idea for location privacy protection is to make the locations submitted to the LBS server as scattered as possible. The locations in each $Req$ are different. As the number of queries $n$ increases, the $nk$ locations involved in the $n$ query requests remain different from each other and are dispersed over no fewer than $s$ different second-level grids. The distribution characteristics of the $nk$ locations are consistent with those of the locations in $G_{sub}$, so the user’s locations are never concentrated in a few specific places. Therefore, the attacker cannot identify where the user often appears. In addition, because the user’s real location is replaced by another location in the same second-level grid, the probability that the real location is recognized is 0. As to the query content protection, the idea of homogenization is adopted to ensure that each basic service category is selected the same number of times, so the probability of each service category being selected is $\frac{1}{M}$. From the above, it can be proved that the presented scheme can resist LSA attacks. □
Conclusion 4: The presented scheme can resist RSA.
Proof. 
It can be seen from the proof of Conclusion 3 that the $nk$ mutually different locations in the $n$ queries are scattered in $s$ different second-level grids, and there is no case where some specific locations are highly concentrated. No query submits the user’s real location, and the query content protection reaches l-diversity. So the attacker cannot obtain the user’s real location and query content through association analysis. In a word, the presented scheme can resist RSA attacks. □

5.3. Performance Analysis

  • Utility: In the same LBS, the accuracy of the query results is determined by whether or not the user’s real location and real query content are submitted to the LBS server. Both REGP and L2P2 submit the user’s real location and real query content, and the query results are unaffected. In this paper, although we do not submit the user’s real location, we replace the real location with the other historical location in the same second-level grid where the real location is located. The loss in utility is acceptable, and is caused by this replacement. The reason is as follows. In the actual application, the result of the neighbor query is basically around the organization where the real location is located, and the historical location that replaces the real location is located in the same organization as the real location. So there is a small loss in utility. In addition, each query submits the real query content and the dummy query content, so the obtained query results contain the query results of the real query content.
  • Communication overhead: The communication overhead of the presented method primarily includes four aspects: (i) The user submits $Area$ to the LBS server. The communication overhead of submitting $Area$ is $O(1)$. (ii) The LBS server sends $G_{sub}$ to the user. $G_{sub}$ contains no fewer than $s$ different second-level grids, and each second-level grid contains approximately $n$ records. The communication overhead of sending $G_{sub}$ is $O(s \times n)$. (iii) As in the existing methods, each query request submitted to the LBS server by the user contains $k$ locations and $k$ query contents. This part of the communication overhead is $O(k)$. (iv) Assuming that each query returns $m$ POIs, the LBS server needs to return $m \times k$ query results to the user, so the communication overhead of the query results is $O(k^a \times m^b)$. Compared with REGP and L2P2, our method has additional communication overhead for $Area$ and for $G_{sub}$, which provides the historical locations for dummy generation.
  • Computational cost: The computational cost mostly includes three aspects: (i) After receiving $Area$, the LBS server generates a historical location sub-dataset $G_{sub}$, and the computational cost of this part is $O(l)$; (ii) the user generates $k - 1$ dummy locations and $k - 1$ dummy query contents according to the parameters $\langle k, s, l \rangle$, and the computational cost of generating dummies is $O(k + m)$; (iii) the LBS server calculates and returns $m \times k$ query results to the user, and the computational cost of the query results is $O(k^a \times m^b)$. In terms of computational cost, our method is approximately equal to REGP and L2P2.
  • Storage overhead: In this paper, the client stores a historical location sub-dataset $G_{sub}$ to generate dummies, which contains approximately $s \times n$ records, so the storage overhead is $O(s \times n)$. We mainly consider the storage overhead of $G_{sub}$ on the client side. Besides, the LBS server stores the historical location dataset $G$, which is about 1.6 GB and is negligible relative to the storage space of the server. Similarly, to defend against RSA, REGP needs to obtain and store the PLs on the map, with a storage overhead of $O(r \times r)$, whereas L2P2 does not need additional storage overhead.

6. Experiment

L2P2 studies privacy protection for users who stay in the same location, in both single and continuous requests, a scenario similar to ours. Besides, REGP aims at resisting RSA and LSA attacks, which is the same problem we try to solve. So we compare DGS-HSA with L2P2 and REGP in terms of privacy protection effect and system overhead to evaluate the effectiveness of the presented scheme. The privacy protection level is measured by the probability $P_A$ that the user’s real organization is recognized and the probability $P_S$ that the user’s real query content is recognized. $P_A$ reflects that, by considering the hierarchical structure of the address, the scheme protects not only the specific location but also the organization information corresponding to it. $P_S$ reflects that, by adopting l-diversity, the scheme protects the user’s query content. The smaller $P_A$ and $P_S$ are, the better the privacy protection effect is. The system overhead $C(G_{sub})$ primarily refers to the storage overhead and communication overhead, which are related to the historical location sub-dataset $G_{sub}$.
Below, we first describe the dataset and the experimental setup and then give out the experiment results and analysis after conducting extensive simulations.

6.1. Dataset

We choose the Geolife Trajectories 1.3 dataset of Microsoft Research Asia [36,37,38] as the historical location dataset and use the POI dataset of Amap [39] to provide the query results. Amap is a free map product in China, and also a very comprehensive and informative map application based on location.
REGP divided the map into grids of 1000 × 1000 . We preprocess and mesh the original Geolife Trajectories 1.3 dataset according to the administrative region division. Firstly, we delete the sequence number and time field in the dataset and retain the longitude and latitude of the location. Then we add the district-level administrative region names (called district names for short) and organization names to each location of the dataset by using the open developer interface of Amap. At last, we obtain G which contains records corresponding to the historical locations and is used to generate dummies. Each record of G has four fields of longitude, latitude, district name, and organization name. More specifically, we sort G according to the district names, and the records that have the same district name field are regarded as the same first-level grid R i . Then we sort each first-level grid R i by organization name, and the records that have the same organization name are regarded as the same second-level grid U i j . After twice sorting and meshing, we obtain a grid dataset with a two-level structure. The grid dataset is stored on the LBS server and maintained by the LBS provider, using a quadtree for data indexing.
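The two-level meshing amounts to grouping the records first by district name and then by organization name. The sketch below illustrates this with nested dictionaries; the four-field record layout follows the description above, while the function name and the sample values are made up for illustration (the quadtree index is not shown).

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Record layout described in the text: (longitude, latitude, district, organization)
Record = Tuple[float, float, str, str]
Point = Tuple[float, float]


def build_two_level_grid(records: Iterable[Record]) -> Dict[str, Dict[str, List[Point]]]:
    """Group locations into first-level grids (districts) and
    second-level grids (organizations within a district)."""
    grid: Dict[str, Dict[str, List[Point]]] = defaultdict(lambda: defaultdict(list))
    for lon, lat, district, organization in records:
        grid[district][organization].append((lon, lat))
    # Convert to plain dicts so that missing keys raise instead of being created.
    return {d: dict(orgs) for d, orgs in grid.items()}


# Toy records with made-up names and coordinates
sample = [
    (116.35, 39.96, "district_1", "org_a"),
    (116.36, 39.96, "district_1", "org_a"),
    (116.37, 39.97, "district_1", "org_b"),
    (116.40, 39.90, "district_2", "org_c"),
]
grid = build_two_level_grid(sample)
print(len(grid), len(grid["district_1"]))
```

Grouping by dictionary key gives the same partition as sorting twice, without depending on the sort order of the names.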

6.2. Experimental Setup

In the experiment of DGS-HSA, we select the trajectory data within three kilometers of BUPT in the Geolife Trajectories 1.3 dataset as the historical location dataset $G$. The processed historical location dataset has three first-level grids ($|R^{sub}| = 3$) and approximately 1300 second-level grids ($|U^{sub}| = 1300$). One hundred users are randomly distributed in different second-level grids. The basic service categories of Amap are used as the basic service set of the experiments, which has about 21 categories ($M = 21$). We set $k = 45$, so $s = 1, 2, \ldots, k$. Each user sends 100 query requests at a frequency of once per minute.
In the experiment of L2P2, the second-level grids where the residential quarters and hospitals are located are set as sensitive areas. In order to ensure the privacy protection effect, we set the PID exchange probability $\rho = 0.5$ and $l = \frac{k}{2}$.
In the experiment of REGP, we divide all second-level grids into four PLs. The second-level grids where confidential organizations, such as military and scientific research departments, are located are set as the first-level privacy zone $w_1$. The second-level grids where hospitals and residential communities are located are set as the second-level privacy zone $w_2$, in which personal privacy is easily leaked. The second-level grids where parks are located have relatively fewer people and are set as the third-level privacy zone $w_3$. The second-level grids where malls and schools are located are densely populated and are set as the fourth-level privacy zone $w_4$. We also set the parameter $\lambda = 4$ and $w_1 : w_2 : w_3 : w_4 = 8 : 6 : 4 : 1$.

6.3. Experimental Results

6.3.1. The Probability $P_A$ that the User’s Real Organization is Recognized

The presented scheme does not submit the user’s real location, so the probability that the real location is recognized is zero. Therefore, this section discusses the relationship between $P_A$ and $k$. The results are plotted in Figure 7. As $k$ increases, the $P_A$ of all three methods shows a downward trend: the larger $k$ is, the more distinct organizations the $k$ locations fall in, and the smaller $P_A$ is. Among the three methods, DGS-HSA has the best privacy protection effect, and L2P2 has the worst. This is because DGS-HSA considers the hierarchical structure of the address: when we select the dummies, the $k$ locations must be located in $s$ different organizations. L2P2, by contrast, realizes anonymity through an ad hoc network in which the communication distance is limited, so the distribution of the $k$ locations is relatively concentrated. In other words, $s$ is smaller, and in extreme cases (densely populated areas) $s = 1$. REGP selects dummy locations based on the probability of the historical queries and scatters the locations as much as possible, so $s$ can increase to some extent. Compared with L2P2, REGP has a larger $s$, so it has a better privacy protection effect. However, when the same historical query probability occurs frequently in some areas, there is no guarantee that the $k$ locations will not fall in only a few organizations (i.e., $s$ is relatively small), in which case the privacy protection effect is weakened.
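The dependence of $P_A$ on how the $k$ locations spread over organizations can be illustrated with a small sketch. It assumes, in line with the relation $P_A = 1/s$ used later in Section 6.3.3, that the attacker guesses uniformly among the distinct organizations in the anonymity set; the function name and the sample sets are hypothetical.

```python
def organization_recognition_probability(anonymity_set):
    """Return 1/s, where s is the number of distinct organizations.

    anonymity_set is a list of (location_id, organization) pairs.
    Assumption: the attacker picks one of the distinct organizations
    uniformly at random.
    """
    organizations = {organization for _, organization in anonymity_set}
    if not organizations:
        raise ValueError("anonymity set must not be empty")
    return 1.0 / len(organizations)


# Hypothetical anonymity sets with k = 4 locations each.
concentrated = [("p1", "Hospital A"), ("p2", "Hospital A"),
                ("p3", "Hospital A"), ("p4", "Hospital A")]
spread = [("p1", "Hospital A"), ("p2", "School B"),
          ("p3", "Mall C"), ("p4", "Park D")]

p_concentrated = organization_recognition_probability(concentrated)
p_spread = organization_recognition_probability(spread)
```

The concentrated set corresponds to the extreme case $s = 1$, where the organization is revealed with certainty even though $k = 4$.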

6.3.2. The Probability $P_S$ that the User’s Real Query Content is Recognized

It can be seen from Figure 8 that, for different $k$, the $P_S$ of REGP is larger than that of L2P2 and DGS-HSA, indicating that the privacy protection effect of REGP is the worst. This is because both L2P2 and DGS-HSA use $l$-diversity to protect the query content, while REGP does not. In addition, DGS-HSA requires the $k$ locations to be evenly distributed among $s$ different organizations, with the categories of query content in each organization as different as possible, so DGS-HSA has a better privacy protection effect than L2P2.
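One way to picture the two requirements above (even distribution over $s$ organizations and differing query categories inside each organization) is the following sketch. It is an illustration under stated assumptions, not the selection algorithm of DGS-HSA: `even_split` and `assign_categories` are hypothetical helpers, and the simple rotation over category indices is a stand-in for whatever rule the scheme actually applies.

```python
def even_split(k, s):
    """Split k locations over s organizations as evenly as possible."""
    if not 1 <= s <= k:
        raise ValueError("require 1 <= s <= k")
    base, extra = divmod(k, s)
    # The first `extra` organizations receive one additional location.
    return [base + 1 if i < extra else base for i in range(s)]


def assign_categories(counts, num_categories):
    """Assign a query category index to every location.

    Categories are taken in rotation, so the categories inside one
    organization are all different as long as that organization holds
    at most num_categories locations.
    """
    assignment = []
    next_category = 0
    for count in counts:
        categories = [(next_category + j) % num_categories
                      for j in range(count)]
        assignment.append(categories)
        next_category = (next_category + count) % num_categories
    return assignment


# k = 45 and M = 21 follow the experimental setup; s = 7 is the value
# discussed in Section 6.3.3.
counts = even_split(45, 7)
assignment = assign_categories(counts, 21)
```

With these values `counts` is `[7, 7, 7, 6, 6, 6, 6]`, and no category index repeats inside any single organization.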

6.3.3. The Equilibrium Problem of Privacy Protection and System Overhead

Next, we discuss the equilibrium problem of privacy protection and system overhead for DGS-HSA. From the attacker’s point of view, when the attacker receives a query request $Req$, $P_A = \frac{1}{s}$ and $P_S = P_A / \frac{k}{s} = \frac{1}{k}$. Given $k$, $P_S$ is therefore also determined. Hence, we mainly discuss how, for $k \in [2, 42]$, the value of $s$ can balance privacy protection and system overhead as $k$ changes. When we calculate system overhead, we assume for simplicity that each second-level grid contains about 3000 historical locations.
After normalization, $P_A$ and $C(G_{sub})$ can be represented in the same coordinate system. As shown in Figure 9, the two curves intersect at $s = 7$, i.e., the proposed scheme balances privacy protection and system overhead at this point. There, the storage overhead and communication overhead generated by each query submission are about 1.6 Mb. Moreover, Figure 10 shows how the value of $s$ at the equilibrium point of privacy protection and system overhead changes with $k$: when $k \geq 7$, the equilibrium is reached at $s = 7$; when $k < 7$, the local optimum is reached at $s = k$. These results show that the presented scheme is feasible.
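The two probabilities above can be checked with exact arithmetic. The sketch below simply evaluates the stated relations $P_A = 1/s$ and $P_S = P_A / (k/s)$; the function name `attack_probabilities` is hypothetical.

```python
from fractions import Fraction


def attack_probabilities(k, s):
    """Return (P_A, P_S) for k locations spread over s organizations."""
    if not 1 <= s <= k:
        raise ValueError("require 1 <= s <= k")
    p_a = Fraction(1, s)          # P_A = 1/s
    p_s = p_a / Fraction(k, s)    # P_S = P_A / (k/s) = 1/k
    return p_a, p_s


p_a, p_s = attack_probabilities(42, 7)
```

For $k = 42$ and $s = 7$ this gives $P_A = 1/7$ and $P_S = 1/42$, and changing $s$ alone leaves $P_S$ unchanged, which is why the discussion concentrates on the choice of $s$.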
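The idea of reading the balance point off the crossing of two normalized curves can be sketched with a toy model. Everything beyond $P_A = 1/s$ is an assumption made for illustration: the overhead is taken to grow linearly with $s$ (each second-level grid contributing the same number of locations), both curves are min-max normalized over $s \in [1, s_{max}]$, and `balance_point` and `s_max` are hypothetical names. The formulation actually solved in the paper is given in its earlier sections and may differ.

```python
def balance_point(k, s_max=42):
    """Smallest s where normalized overhead reaches normalized P_A.

    Toy model: P_A(s) = 1/s, overhead proportional to s, both min-max
    normalized over 1 <= s <= s_max. The result is capped at k, since
    s cannot exceed the number of locations.
    """
    if k < 1 or s_max < 2:
        raise ValueError("require k >= 1 and s_max >= 2")

    def normalized_p_a(s):
        low, high = 1.0 / s_max, 1.0
        return (1.0 / s - low) / (high - low)

    def normalized_cost(s):
        return (s - 1) / (s_max - 1)

    for s in range(1, s_max + 1):
        if normalized_cost(s) >= normalized_p_a(s):
            return min(s, k)
    return min(s_max, k)


s_large_k = balance_point(42)
s_small_k = balance_point(5)
```

In this toy model the crossing lies where $s^2 = s_{max}$, so with `s_max = 42` the first integer past the crossing is 7, and for $k < 7$ the cap yields $s = k$. That the numbers line up with Figures 9 and 10 should be read as a consistency check on the toy model only, not as a derivation of the reported results.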

7. Conclusions

In this paper, we point out two problems of the existing location privacy protection schemes. They consider neither the scenario in which the same user sends continuous query requests from the same location nor the hierarchical structure of the address corresponding to the location, which makes it difficult for them to resist statistical attacks and to achieve the ideal privacy protection effect. To protect location privacy and privacy preferences, and to improve anti-attack capabilities, we present the privacy protection scheme DGS-HSA. The scheme divides the historical location dataset into a grid with a two-level structure according to the hierarchical structure of the address. Then we select the dummy locations from the structured historical location dataset to achieve the privacy protection level $\langle L(k, s), Q_l \rangle$. In addition, we formalize the equilibrium problem of privacy protection and system overhead. By solving the multi-objective optimization problem, we give the recommended values of the system parameters. The effectiveness of the presented scheme is evaluated from both theoretical and experimental aspects. The results show that the presented scheme achieves the expected goals.
Nevertheless, the effect of the presented scheme depends largely on whether the historical location dataset covers enough areas, especially special areas such as sparsely populated or inaccessible ones, where an insufficient number of organizations easily leads to anonymization failure. How to process, update, and distribute the historical location dataset more effectively is the focus of future research efforts.

Author Contributions

Conceptualization, M.L.; Data curation, Y.W., G.Y. and Y.X.; Formal analysis, M.L.; Funding acquisition, Y.X.; Investigation, M.L.; Methodology, M.L. and Y.Y.; Supervision, S.L.; Validation, S.L., H.Z. and Y.Y.; Writing—original draft, M.L.; Writing – review & editing, Y.W., G.Y., Y.C. and F.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Key R&D Program of China under Grant 2017YFB0802300, in part by the Major Scientific and Technological Special Project of Guizhou Province under Grant 20183001, in part by the Foundation of Guizhou Provincial Key Laboratory of Public Big Data under Grant 2018BDKFJJ021, and in part by the Research Project of Hechi University under Grants XJ2017ZD08 and XJ2016ZD007.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Demertzis, K.; Iliadis, L. A Computational Intelligence System Identifying Cyber-Attacks on Smart Energy Grids. Mod. Discret. Math. Anal. 2018, 131, 97–116.
  2. Demertzis, K.; Iliadis, L.; Spartalis, S. A spiking one-class anomaly detection framework for cyber-security on industrial control systems. In Proceedings of the International Conference on Engineering Applications of Neural Networks, Athens, Greece, 25–27 August 2017; pp. 122–134.
  3. Luo, S.F.; Seideman, J.D.; Dietrich, S. Fingerprinting Cryptographic Protocols with Key Exchange using an Entropy Measure. In Proceedings of the IEEE Security and Privacy Workshops (SPW), San Francisco, CA, USA, 24 May 2018; pp. 170–179.
  4. Duessel, P.; Gehl, C.; Flegel, U.; Dietrich, S.; Meier, M. Detecting zero-day attacks using context-aware anomaly detection at the application-layer. Int. J. Inf. Secur. 2017, 16, 475–490.
  5. IETF. RFC 5870. A Uniform Resource Identifier for Geographic Locations (‘geo’ URI). Available online: https://datatracker.ietf.org/doc/rfc5870/ (accessed on 14 October 2015).
  6. W3C. Platform for Privacy Preferences (P3P) Project. Enabling Smarter Privacy Tools for the Web. Available online: https://www.w3.org/P3P/ (accessed on 2 February 2018).
  7. Ghinita, G.; Kalnis, P.; Khoshgozaran, A.; Shahabi, C.; Tan, K.L. Private queries in location based services: Anonymizers are not necessary. In Proceedings of the 2008 ACM SIGMOD international conference on Management of Data, Vancouver, BC, Canada, 9–12 June 2008; pp. 121–132.
  8. Khoshgozaran, A.; Shahabi, C.; Shirani-Mehr, H. Location privacy: Going beyond K-anonymity, cloaking and anonymizers. Knowl. Inf. Syst. 2011, 26, 435–465.
  9. Li, L.; Lu, R.; Huang, C. EPLQ: Efficient privacy-preserving location-based query over outsourced encrypted data. IEEE Internet Things J. 2015, 3, 206–218.
  10. Zhu, H.; Liu, F.; Li, H. Efficient and privacy-preserving polygons spatial query framework for location-based services. IEEE Internet Things J. 2016, 4, 536–545.
  11. Lu, R.; Lin, X.; Shi, Z.; Shao, J. PLAM: A privacy-preserving framework for local-area mobile social networks. In Proceedings of the IEEE INFOCOM 2014-IEEE Conference on Computer Communications, Toronto, ON, Canada, 27 April–2 May 2014.
  12. Beresford, A.R.; Stajano, F. Mix zones: User privacy in location-aware services. In Proceedings of the IEEE Annual Conference on Pervasive Computing and Communications Workshops, Budapest, Hungary, 24–28 March 2004; pp. 127–131.
  13. Sun, Y.M.; Chen, M.; Hu, L.; Qian, Y.F.; Hassan, M.M. ASA: Against statistical attacks for privacy-aware users in Location Based Service. Future Gener. Comput. Syst. 2017, 70, 48–58.
  14. Gruteser, M.; Grunwald, D. Anonymous usage of location-based services through spatial and temporal cloaking. In Proceedings of the 1st International Conference on Mobile Systems, Applications and Services, San Francisco, CA, USA, 5–8 May 2003; pp. 31–42.
  15. Gedik, B.; Liu, L. Location privacy in mobile systems: A personalized anonymization model. In Proceedings of the 25th IEEE International Conference on Distributed Computing Systems (ICDCS’05), Columbus, OH, USA, 6–10 June 2005; pp. 620–629.
  16. Gedik, B.; Liu, L. Protecting location privacy with personalized k-anonymity: Architecture and algorithms. IEEE Trans. Mob. Comput. 2007, 7, 1–18.
  17. Niu, B.; Li, Q.; Zhu, X.Y.; Cao, G.; Li, H. Achieving k-anonymity in privacy-aware location-based services. In Proceedings of the IEEE INFOCOM 2014-IEEE Conference on Computer Communications, Toronto, ON, Canada, 27 April–2 May 2014; pp. 754–762.
  18. Niu, B.; Zhu, X.; Li, W.H. A personalized two-tier cloaking scheme for privacy-aware location-based services. In Proceedings of the 2015 International Conference on Computing, Networking and Communications (ICNC), Garden Grove, CA, USA, 16–19 February 2015; pp. 94–98.
  19. Kido, H.; Yanagisawa, Y.; Satoh, T. An anonymous communication technique using dummies for location-based services. In Proceedings of the ICPS’05. Proceedings. International Conference on Pervasive Services, Santorini, Greece, 11–14 July 2005; pp. 88–97.
  20. Xu, T.; Cai, Y. Exploring historical location data for anonymity preservation in location-based services. In Proceedings of the IEEE INFOCOM 2008-The 27th Conference on Computer Communications, Phoenix, AZ, USA, 13–18 April 2008; pp. 547–555.
  21. Suzuki, A.; Iwata, M.; Arase, Y.; Hare, T.; Xie, X.; Nishio, S. A user location anonymization method for location based services in a real environment. In Proceedings of the 18th SIGSPATIAL International Conference on Advances in Geographic Information Systems, San Jose, CA, USA, 3–5 November 2010; pp. 398–401.
  22. Kato, R.; Iwata, M.; Hara, T.; Suzuki, A. A dummy-based anonymization method based on user trajectory with pauses. In Proceedings of the 20th International Conference on Advances in Geographic Information Systems, Redondo Beach, CA, USA, 7–9 November 2012; pp. 249–258.
  23. Hara, T.; Suzuki, A.; Iwata, M. Dummy-based user location anonymization under real-world constraints. IEEE Access 2016, 4, 673–687.
  24. Liu, H.; Li, X.; Li, H. Spatiotemporal correlation-aware dummy-based privacy protection scheme for location-based services. In Proceedings of the IEEE INFOCOM 2017-IEEE Conference on Computer Communications, Atlanta, GA, USA, 1–4 May 2017.
  25. Hayashida, S.; Amagata, D.; Hara, T.; Xie, X. Dummy generation based on user-movement estimation for location privacy protection. IEEE Access 2018, 6, 22958–22969.
  26. Zhang, C.; Huang, Y. Cloaking locations for anonymous location based services: A hybrid approach. GeoInformatica 2009, 13, 159–182.
  27. Sun, G.; Liao, D.; Li, H.; Yu, H.F.; Chang, V. L2P2: A location-label based approach for privacy preserving in LBS. Future Gener. Comput. Syst. 2017, 74, 375–384.
  28. Beresford, A.R.; Stajano, F. Location privacy in pervasive computing. IEEE Pervasive Comput. 2003, 1, 46–55.
  29. Chow, C.Y.; Mokbel, M.F.; Liu, X. A peer-to-peer spatial cloaking algorithm for anonymous location-based service. In Proceedings of the 14th Annual ACM International Symposium on Advances in Geographic Information Systems, Arlington, VA, USA, 10–11 November 2006; pp. 171–178.
  30. Chow, C.Y.; Mokbel, M.F.; Liu, X. Spatial cloaking for anonymous location-based services in mobile peer-to-peer environments. GeoInformatica 2011, 15, 351–380.
  31. Yang, G.; Luo, S.; Zhu, H. An Efficient Approach for LBS Privacy Preservation in Mobile Social Networks. Appl. Sci. 2019, 9, 316.
  32. Li, J.; Wong, R.C.W.; Fu, A.W.C. Achieving k-anonymity by clustering in attribute hierarchical structures. In International Conference on Data Warehousing and Knowledge Discovery; Springer: Berlin/Heidelberg, Germany, 2006; pp. 405–416.
  33. Andrés, M.E.; Bordenabe, N.E.; Chatzikokolakis, K.; Palamidessi, C. Geo-indistinguishability: Differential privacy for location-based systems. arXiv 2017, 1984, 901–914.
  34. Gomory, R.E. Outline of an algorithm for integer solutions to linear programs. Bull. Am. Math. Soc. 1958, 64, 275–278.
  35. Mangasarian, O.L.; Recht, B. Probability of unique integer solution to a system of linear equations. Eur. J. Oper. Res. 2011, 214, 27–30.
  36. Zheng, Y. GeoLife GPS Trajectories 1.3. Available online: https://www.microsoft.com/enus/download/confirmation.aspx?id=52367 (accessed on 13 November 2018).
  37. Zheng, Y.; Zhang, L.; Xie, X.; Ma, W.Y. Mining interesting locations and travel sequences from GPS trajectories. In Proceedings of the 18th International Conference on World Wide Web, Madrid, Spain, 20–24 April 2009; pp. 791–800.
  38. Zheng, Y.; Xie, X.; Ma, W.Y. Geolife: A collaborative social networking service among user, location and trajectory. IEEE Data Eng. Bull 2010, 33, 32–39.
  39. Peking University Opens Research Data. Amap POI Data. Available online: http://opendata.pku.edu.cn/dataset.xhtml?persistentId=doi:10.18170/DVN/WSXCNM (accessed on 13 November 2018).
Figure 1. The same user’s continuous query request privacy in the same location.
Figure 2. Hierarchical structure of the address.
Figure 3. An example of users’ organization information leakage.
Figure 4. Two levels of hierarchical structure grid.
Figure 5. System architecture.
Figure 6. Algorithm framework.
Figure 7. The probability $P_A$ that the user’s real organization is recognized varies with $k$.
Figure 8. The probability $P_S$ that the user’s real query content is recognized varies with $k$.
Figure 9. The balance of privacy protection level and system overhead.
Figure 10. The value of $s$ that balances the privacy protection and system overhead varies with $k$.

Share and Cite

MDPI and ACS Style

Li, M.; Wang, Y.; Yang, G.; Luo, S.; Xin, Y.; Zhu, H.; Yang, Y.; Chen, Y.; Luo, F. DGS-HSA: A Dummy Generation Scheme Adopting Hierarchical Structure of the Address. Appl. Sci. 2020, 10, 548. https://doi.org/10.3390/app10020548
