Geomasking to Safeguard Geoprivacy in Geospatial Health Data

Wang, Jue

doi:10.3390/encyclopedia4040103

by Jue Wang ¹,²

¹

Department of Geography, Geomatics and Environment, University of Toronto—Mississauga, 3359 Mississauga Road, Mississauga, ON L5L 1C6, Canada

²

Department of Geography and Planning, University of Toronto, 100 St. George Street, Toronto, ON M5S 3G3, Canada

Encyclopedia 2024, 4(4), 1581-1589; https://doi.org/10.3390/encyclopedia4040103

Submission received: 20 July 2024 / Revised: 17 September 2024 / Accepted: 8 October 2024 / Published: 21 October 2024

(This article belongs to the Section Mathematics & Computer Science)

Definition:

Geomasking is a set of techniques that introduces noise or intentional errors into geospatial data to minimize the risk of identifying exact location information related to individuals while preserving the utility of the data to a controlled extent. It protects the geoprivacy of the data contributor and mitigates potential harm from data breaches while promoting safer data sharing. The development of digital health technologies and the extensive use of individual geospatial data in health studies have raised concerns about geoprivacy. The individual tracking data and health information, if accessed by unauthorized parties, may lead to privacy invasions, criminal activities, and discrimination. These risks underscore the importance of robust protective measures in the collection, management, and sharing of sensitive data. Geomasking techniques have been developed to safeguard geoprivacy in geospatial health data, addressing the risks and challenges associated with data sharing. This entry paper discusses the importance of geoprivacy in geospatial health data and introduces various kinds of geomasking methods and their applications in balancing the protection of individual privacy with the need for data sharing to ensure scientific reproducibility, highlighting the urgent need for more effective geomasking techniques and their applications.

Keywords:

geomasking; geoprivacy; geospatial health data; GIS

1. Geoprivacy in Geospatial Health Data

Advances in geospatial data collection techniques and analytical methods have led to the collection of high-resolution individual geospatial data for public health studies. From detailed datasets of subjects’ residential locations and health information to GPS tracking and wearable environmental sensors that record daily movements and real-time environmental exposures, geospatial health data contain a vast amount of detail. Analyzing these individual-level geospatial data helps in understanding the nuanced relationship between environmental exposure and health outcomes. Further integration with high-resolution geospatial context data (e.g., fine-scale census data, geographic information data, and remote sensing data) promotes new research findings in health geography studies.

As geospatial data analytics has grown in popularity in health studies, so have the risks of privacy breaches through decoding individual-level data or map products. These risks are of particular concern when sensitive health information is involved, such as a person’s medical conditions, prescriptions, or genetic information. If high-resolution individual geospatial data are overlaid with geographic context information via geospatial intelligence, the dataset contributors’ individual privacy is also at high risk [1,2]. Detailed geospatial information can expose the dataset contributors’ important activity locations, including home, workplace, school, and the locations of family or friends. Further details can be derived from GPS tracking data, which can reveal the subjects’ daily movement patterns, such as where and when they stay at home or go to work.

The potential exposure of subjects’ location information raises the concern of geoprivacy in geospatial health data. Geoprivacy refers to the confidentiality of dataset contributors’ personal information about activity locations, movement patterns, and any other geospatial information that may expose their location privacy. The geoprivacy concern is especially important in geospatial health data, where subjects’ health information can be easily linked to their locations and the subjects re-identified [3].

Concerns about geoprivacy mean that geospatial health data collected in one study cannot be easily shared. This limits the reusability of the scientific data and results in a significant waste of resources due to repetitive data collection [4]. Furthermore, this impediment to data sharing undermines reproducibility, which is one of the cornerstones of the scientific paradigm [2,5]. In the field of public health and health geography, geospatial health data are regulated by government policies, such as the Health Insurance Portability and Accountability Act [6] in the United States, the General Data Protection Regulation in the European Union [7], and the Personal Information Protection and Electronic Documents Act [8] in Canada. The sharing or publishing of geospatial health data is not allowed if it does not meet the geoprivacy standard [9,10]. In this context, sharing sensitive geospatial health data and promoting reproducibility of research findings in health geography studies are constrained by the efforts to protect geoprivacy. Scholars in the field are looking for solutions to balance the accessibility and the confidentiality of geospatial health data.

2. Geomasking to Safeguard Geoprivacy

The concept of geomasking has emerged to safeguard the geoprivacy of data contributors by balancing the accessibility and usability of geospatial data with the protection of geoprivacy. Geomasking involves purposely manipulating geospatial data by adding controlled levels of noise to reduce spatial accuracy to an acceptable degree, thereby de-identifying individuals and masking personal location information in the geospatial dataset [11]. This added noise can be random or adhere to predefined rules. With this additional layer of protection, the geomasked data may be shared with other scholars or published for the general good. Applied before sharing or publishing geospatial health data, geomasking helps ensure that individuals in the dataset remain anonymous, thereby maintaining a designated level of confidentiality [12,13]. However, the geomasking process inevitably decreases data resolution, which can affect the accuracy of analysis results. With careful experimentation and calibration, the added noise can be controlled so that the geospatial health data remain usable. The application of geomasking is therefore a balancing act between data accuracy (or usability) and confidentiality [14,15,16].

3. Geomasking Methods

Geomasking methods mask sensitive geographic locations by relocating them with controlled settings. This process obscures personal geospatial information and reduces the risk of re-identification by decreasing the accuracy of location data while still maintaining its usability for research purposes. According to Cassa et al. [17], the aim is to provide assurance to individuals, institutions, and public health authorities by enabling the sharing of anonymized data instead of raw, fully identifiable data. Since the emergence of geomasking, different methods have been proposed, which will be introduced and discussed in the following sections.

3.1. Affine Transformation

Affine transformation adjusts the geographic coordinates of spatial datasets according to predefined transformation formulas. The original set of locations in the dataset is deterministically relocated to a new set of locations by three types of transformation: translation (displacement), change in scale, and rotation [13]. Figure 1 illustrates these three types of affine transformation geomasking. Each transformation can be applied on its own to mask the original location information, or several can be combined to achieve a higher level of confidentiality.

Affine transformations are easy to apply and have a low computational cost. However, they are not widely used because the original coordinates can be easily restored from the geomasked locations if the transformation algorithm is known [2].
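The three transformations, and the ease of reversing them, can be illustrated with a minimal Python sketch. It assumes planar (projected) coordinates in metres; the function names, sample points, and parameter values are illustrative assumptions and are not taken from the cited works.

```python
import math

def translate(points, dx, dy):
    """Shift every point by the same offset (dx, dy)."""
    return [(x + dx, y + dy) for x, y in points]

def scale(points, factor, origin=(0.0, 0.0)):
    """Scale distances from a chosen origin by a constant factor."""
    ox, oy = origin
    return [(ox + factor * (x - ox), oy + factor * (y - oy)) for x, y in points]

def rotate(points, angle_deg, origin=(0.0, 0.0)):
    """Rotate points counter-clockwise around a chosen origin."""
    ox, oy = origin
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    return [
        (ox + cos_a * (x - ox) - sin_a * (y - oy),
         oy + sin_a * (x - ox) + cos_a * (y - oy))
        for x, y in points
    ]

# Hypothetical original locations (metres in a projected coordinate system).
original = [(100.0, 200.0), (150.0, 260.0), (90.0, 310.0)]

# Combine the three transformations into one mask.
masked = rotate(scale(translate(original, 500.0, -300.0), 1.2), 30.0)

# Anyone who knows the parameters can undo the steps in reverse order.
restored = translate(scale(rotate(masked, -30.0), 1 / 1.2), -500.0, 300.0)
```

Because every point receives the same deterministic transformation, `restored` matches `original` up to floating-point error, which is the weakness noted above.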

3.2. Aggregation

Aggregation methods come in three forms. Point aggregation groups geospatial locations into a smaller number of points, with each point representing multiple incidents. Area aggregation summarizes information within selected geographical area units [13]. Density-based aggregation groups data points according to the density of their distribution within given geographical area units or grid cells [16]. Figure 2 illustrates these aggregation geomasking methods. The geographical area units can be administrative regions or census areas at different spatial scales, such as census tracts, communities, counties, or states/provinces, or grid cells defined at a specific spatial resolution, such as 1 km by 1 km. The grouping can summarize the number of location points within a region, the point density relative to the size of a region, or the kernel density with a defined spatial resolution and search radius.

Aggregation methods effectively protect privacy by obscuring precise locations and are particularly useful for regional or large-scale spatial analysis. However, aggregation reduces the dataset’s granularity and assumes uniformity within the aggregated units, which significantly decreases the precision and usability of the spatial data for spatial pattern discovery.
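As a minimal sketch of grid-based aggregation, the Python example below counts points per square cell and then represents each occupied cell by its centre, weighted by the count. It assumes planar coordinates in metres; the function names, the 1 km cell size, and the sample points are illustrative assumptions.

```python
import math
from collections import Counter

def aggregate_to_grid(points, cell_size):
    """Area aggregation: count points per square cell, indexed by (column, row)."""
    counts = Counter()
    for x, y in points:
        cell = (math.floor(x / cell_size), math.floor(y / cell_size))
        counts[cell] += 1
    return counts

def cell_centres(counts, cell_size):
    """Point aggregation: one representative point per cell, weighted by count."""
    return {
        ((col + 0.5) * cell_size, (row + 0.5) * cell_size): n
        for (col, row), n in counts.items()
    }

# Hypothetical point locations (metres).
points = [(120.0, 80.0), (900.0, 450.0), (1500.0, 300.0),
          (1999.0, 999.0), (2500.0, 1200.0)]

counts = aggregate_to_grid(points, cell_size=1000.0)
centres = cell_centres(counts, cell_size=1000.0)

# Point density per cell: count divided by cell area (points per square km here).
density = {cell: n / (1.0 * 1.0) for cell, n in counts.items()}
```

Only the cell-level totals leave the function, so the exact positions inside each cell are no longer recoverable; the price is the loss of within-cell variation described above.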

3.3. Random Perturbation

Random perturbation is a technique that alters the position of each geospatial record in a dataset by introducing a random spatial shift [13]. This shift is computed separately for each record according to functions integrated with random components. There are three major types of random perturbation: naïve random perturbation, random perturbation with distribution functions, and random perturbation with preset potential locations.

Naïve random perturbation is a basic geomasking method that applies random noise within a spatial range to the original geographic coordinates, relocating them within a designated area. Depending on how the spatial range is defined, this method can be further classified into random perturbation within a circle [13], displacement in a random direction by a fixed radius [18], and random perturbation within an annulus or donut-shaped area [19,20] centered on the original coordinates. Figure 3 illustrates these naïve random perturbation geomasking methods.
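The three naïve variants can be sketched in Python as follows. The sketch assumes planar coordinates in metres and assumes that points should be uniformly distributed over the area of the circle or annulus (hence the square roots); the cited works may sample the distance differently, and all function names are illustrative.

```python
import math
import random

def _offset(x, y, distance, rng):
    """Move (x, y) by `distance` in a uniformly random direction."""
    angle = rng.uniform(0.0, 2.0 * math.pi)
    return x + distance * math.cos(angle), y + distance * math.sin(angle)

def perturb_in_circle(x, y, r_max, rng):
    """Uniform over the disc area; sqrt avoids clustering near the centre."""
    return _offset(x, y, r_max * math.sqrt(rng.random()), rng)

def perturb_fixed_radius(x, y, radius, rng):
    """Random direction, constant displacement distance."""
    return _offset(x, y, radius, rng)

def perturb_in_donut(x, y, r_min, r_max, rng):
    """Uniform over the annulus area between r_min and r_max."""
    distance = math.sqrt(rng.uniform(r_min ** 2, r_max ** 2))
    return _offset(x, y, distance, rng)

rng = random.Random(2024)      # seeded only to make the example repeatable
home = (1000.0, 2000.0)        # hypothetical original location

print(perturb_in_circle(*home, 500.0, rng))
print(perturb_fixed_radius(*home, 500.0, rng))
print(perturb_in_donut(*home, 100.0, 500.0, rng))
```

The inner radius of the donut guarantees a minimum displacement, so a masked point can never fall back onto, or very close to, its original location.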

Random perturbation with distribution functions involves determining random spatial shifts based on specific probability distributions rather than using the uniform probability approach of naïve random perturbation. Examples of this method include Gaussian displacement and bimodal Gaussian displacement [18]. Figure 4 illustrates geomasking methods involving random perturbation with distribution functions.
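A minimal Python sketch of the two examples follows. The Gaussian version adds independent zero-mean normal noise to each coordinate. The bimodal version is one plausible reading, assumed here for illustration: a uniformly random direction with a distance drawn from a normal distribution centred on a chosen mean distance, so that displacements concentrate in a ring around the original point. The function names and parameter values are assumptions, not the formulation of the cited work.

```python
import math
import random

def gaussian_displacement(x, y, sigma, rng):
    """Shift each coordinate by independent zero-mean Gaussian noise."""
    return x + rng.gauss(0.0, sigma), y + rng.gauss(0.0, sigma)

def bimodal_gaussian_displacement(x, y, mean_distance, sigma, rng):
    """Random direction; distance drawn from a Gaussian centred on mean_distance."""
    distance = rng.gauss(mean_distance, sigma)
    angle = rng.uniform(0.0, 2.0 * math.pi)
    return x + distance * math.cos(angle), y + distance * math.sin(angle)

rng = random.Random(42)        # seeded only to make the example repeatable
home = (1000.0, 2000.0)        # hypothetical original location (metres)

print(gaussian_displacement(*home, sigma=100.0, rng=rng))
print(bimodal_gaussian_displacement(*home, mean_distance=500.0, sigma=50.0, rng=rng))
```

With plain Gaussian displacement, small shifts are the most likely outcome, so many records stay close to their true position; centring the distance away from zero reduces that risk in a similar way to the donut method.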

Random perturbation with preset potential locations, also known as location swapping, takes into account the context information of the study area [21]. Unlike the previously mentioned perturbation methods, which might relocate original records to unreasonable locations, such as a home location being relocated into a lake, location swapping ensures that records are moved to surrounding potential locations with similar geographic characteristics. In short, location swapping relocates a point from its original location to a random location among the set of potential locations within a defined spatial range (such as a circle or annulus centered on its original location). Figure 5 illustrates geomasking methods involving random perturbation with preset potential locations.
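Location swapping can be sketched as a random choice among candidate locations that fall inside the search range. In the Python example below, the candidate list stands in for a set of plausible sites (for example, residential address points); the function name, the handling of the empty case, and the sample coordinates are illustrative assumptions.

```python
import math
import random

def location_swap(point, candidates, r_max, rng, r_min=0.0):
    """Return a random candidate within [r_min, r_max] of `point`, or None."""
    x, y = point
    eligible = [
        c for c in candidates
        if c != point and r_min <= math.hypot(c[0] - x, c[1] - y) <= r_max
    ]
    if not eligible:
        # No plausible site in range: the caller must decide what to do,
        # e.g. widen the search range or withhold the record.
        return None
    return rng.choice(eligible)

rng = random.Random(11)        # seeded only to make the example repeatable
home = (0.0, 0.0)              # hypothetical original location (metres)
residential_sites = [(40.0, 30.0), (-80.0, 10.0), (15.0, -5.0), (400.0, 400.0)]

# Annulus-style search: at least 20 m and at most 150 m from the original.
masked = location_swap(home, residential_sites, r_max=150.0, rng=rng, r_min=20.0)
```

Because the result is always drawn from the candidate set, a home can never be relocated into a lake or another implausible place, provided the candidate set itself contains only plausible sites.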

3.4. Synthetic Data Generation

Different from previously discussed techniques that directly manipulate the raw data for geomasking, synthetic data generation creates new simulated datasets that mimic the statistical attributes of the original spatial dataset without revealing the sensitive location information [22]. This method ensures that the synthetic or simulated data align with the spatial patterns and statistical attributes observed in the raw dataset [23]. This technique retains the statistical properties of the original data while providing high privacy protection since it does not contain any real locations. However, it requires sophisticated modeling and careful validation to accurately reflect the original data’s attributes [22] and may produce misleading results for advanced and complicated spatial analysis or fine-scale geographic analysis [24].

This technique can be further classified into fully synthetic, where all the data points are generated artificially and none of the original data points are retained in the synthetic dataset, and partially synthetic, where only a portion of the original data points is replaced by synthetic ones [25]. Fully synthetic datasets are ideal for scenarios requiring a high level of confidentiality, while partially synthetic datasets are better suited for applications where retaining high data utility is more important.
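The difference between fully and partially synthetic data can be illustrated with a deliberately simple generative model. The Python sketch below assumes that a grid histogram of the original points is an adequate model of the spatial pattern, which is a strong simplification; real synthesizers use far richer statistical models. All names and values are illustrative assumptions.

```python
import math
import random
from collections import Counter

def fit_grid_model(points, cell_size):
    """Toy model: the number of original points in each grid cell."""
    return Counter(
        (math.floor(x / cell_size), math.floor(y / cell_size)) for x, y in points
    )

def sample_synthetic(model, cell_size, n, rng):
    """Draw n points: pick a cell in proportion to its count, then a uniform spot in it."""
    cells = list(model.keys())
    weights = [model[c] for c in cells]
    return [
        ((col + rng.random()) * cell_size, (row + rng.random()) * cell_size)
        for col, row in rng.choices(cells, weights=weights, k=n)
    ]

def partially_synthetic(points, model, cell_size, fraction, rng):
    """Replace a random fraction of the original points with synthetic ones."""
    n_replace = round(fraction * len(points))
    replace_idx = set(rng.sample(range(len(points)), n_replace))
    synthetic = iter(sample_synthetic(model, cell_size, n_replace, rng))
    return [next(synthetic) if i in replace_idx else p for i, p in enumerate(points)]

rng = random.Random(5)         # seeded only to make the example repeatable
original = [(120.0, 80.0), (300.0, 450.0), (1500.0, 300.0),
            (1700.0, 900.0), (2500.0, 1200.0), (2600.0, 1900.0)]

model = fit_grid_model(original, cell_size=1000.0)
fully = sample_synthetic(model, 1000.0, len(original), rng)
partial = partially_synthetic(original, model, 1000.0, fraction=0.5, rng=rng)
```

In this sketch `fully` contains no original coordinates, whereas `partial` keeps half of them unchanged, which mirrors the confidentiality versus utility trade-off described above.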

3.5. Differential Privacy

The differential privacy approach protects individual privacy by adding carefully calibrated noise to, or otherwise transforming, geospatial health data while maintaining the utility of the data for accurate statistical analysis [26]. The noise is usually drawn from a probability distribution, such as the Laplace distribution. In geomasking, the planar Laplace distribution, which is suited to geographic coordinates, may be used. The level of privacy is measured and controlled mathematically by the parameter epsilon (ε), which governs the trade-off between data utility and privacy. The smaller the epsilon value, the more noise is added and the stronger the privacy; the larger the epsilon, the more accurate the data are, but the lower the level of privacy. This technique preserves the overall statistical properties while preventing the re-identification of sensitive locations or individuals. Differential privacy can be applied on two levels: global and local. At the global level, noise is added to the results of statistical analysis of all individual data, while at the local level, the noise is added to each individual record before statistical analysis [27].
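As an illustration of local-level noise on coordinates, the following Python sketch draws a displacement from a planar Laplace distribution. It relies on the fact that, for a planar density proportional to exp(-ε·r), the radial distance r follows a gamma distribution with shape 2 and scale 1/ε, so the noise scale is inversely proportional to ε. The function names and the ε values (expressed per metre) are illustrative assumptions.

```python
import math
import random

def planar_laplace_mask(x, y, epsilon, rng):
    """Displace (x, y) with planar Laplace noise; epsilon is in 1/metre."""
    # Radial distance: gamma distribution with shape 2 and scale 1/epsilon.
    distance = rng.gammavariate(2.0, 1.0 / epsilon)
    # Direction: uniform over the full circle.
    angle = rng.uniform(0.0, 2.0 * math.pi)
    return x + distance * math.cos(angle), y + distance * math.sin(angle)

def mean_displacement(epsilon, n, rng):
    """Average displacement distance over n masked copies of the origin."""
    total = 0.0
    for _ in range(n):
        mx, my = planar_laplace_mask(0.0, 0.0, epsilon, rng)
        total += math.hypot(mx, my)
    return total / n

rng = random.Random(7)         # seeded only to make the example repeatable

# Theoretical mean displacement is 2 / epsilon.
strong_privacy = mean_displacement(epsilon=0.001, n=5000, rng=rng)  # about 2000 m
weak_privacy = mean_displacement(epsilon=0.01, n=5000, rng=rng)     # about 200 m
```

A tenfold smaller ε produces displacements roughly ten times larger, which is the utility versus privacy trade-off that ε controls.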

The method offers a framework to quantify privacy levels and minimizes the impact of including or excluding any single data point on the outcomes of statistical analysis [28]. Differential privacy is widely used across various fields for privacy-preserving data analysis [29], including geospatial analysis of health data [30]. The U.S. Census Bureau applied differential privacy to the geospatial data from the 2020 census before release, allowing public access for research and policymaking while protecting sensitive individual or household data [31]. The technique was also used during the COVID-19 pandemic, in contact tracing apps and in the sharing of spatial data on disease spread, to support analysis of disease transmission patterns while preventing re-identification of individuals [32].

Differential privacy provides a reliable approach to protecting sensitive geospatial health data, with privacy levels configurable through the epsilon value. Its mathematical foundation guarantees the expected level of privacy while minimizing the noise added. It also provides protection against auxiliary information (such as the spatial context of locations), which other geomasking techniques may not offer. Despite these benefits, choosing an epsilon value that balances privacy against the accuracy and utility of the data remains a challenge. Additionally, the method is more complicated to implement than other geomasking methods, requiring expertise in both mathematics and statistics as well as specific knowledge of the targeted geospatial data [33].
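To make the role of ε concrete at the global level, the sketch below applies the standard Laplace mechanism to per-area case counts before release. It assumes that adding or removing one person changes exactly one count by at most 1 (sensitivity 1), and it generates Laplace noise as the difference of two exponential draws. The area names, counts, and function names are hypothetical.

```python
import random

def laplace_noise(scale, rng):
    """Laplace(0, scale) sample as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_counts(counts, epsilon, rng, sensitivity=1.0):
    """Laplace mechanism: add noise with scale sensitivity / epsilon to each count."""
    scale = sensitivity / epsilon
    return {area: n + laplace_noise(scale, rng) for area, n in counts.items()}

rng = random.Random(13)        # seeded only to make the example repeatable

# Hypothetical case counts per census tract.
cases = {"tract_A": 10, "tract_B": 3, "tract_C": 0}

released_strict = noisy_counts(cases, epsilon=0.1, rng=rng)  # noise scale 10
released_loose = noisy_counts(cases, epsilon=2.0, rng=rng)   # noise scale 0.5
```

With ε = 0.1 the released counts can differ from the true counts by tens of cases, while with ε = 2.0 they typically stay within about one case, which shows why the choice of ε is the central practical difficulty.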

3.6. Other Cryptographic Techniques and Hybrid Approaches

Beyond the geomasking methods discussed above, there are approaches that apply cryptographic techniques, using specific algorithms to encrypt spatial information without compromising the accuracy of the original spatial data. These methods allow direct analysis of the encrypted location data without revealing the actual locations. Homomorphic encryption [34,35] and secure multi-party computation [36] are representative examples. Homomorphic encryption has been applied in geoprivacy protection, where it ensures privacy by keeping data encrypted throughout the analysis process while preserving data accuracy [37]. Secure multi-party computation allows multiple parties to collaboratively analyze data without revealing their individual datasets, ensuring privacy in shared environments [38]. However, both methods are computationally demanding and necessitate advanced cryptographic expertise for effective implementation [35,39].
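The idea behind secure multi-party computation can be illustrated with additive secret sharing, one of its simplest building blocks. The toy Python sketch below simulates three data holders computing a joint total without any of them disclosing its own count. It runs in a single process, assumes honest participants, and is not a secure protocol implementation or the method of the cited works; the modulus, function names, and counts are illustrative assumptions.

```python
import secrets

MODULUS = 2 ** 61 - 1   # all arithmetic is done modulo this value

def make_shares(value, n_parties):
    """Split `value` into n random-looking shares that sum to it modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]

def secure_sum(private_values):
    """Simulate a secure sum: only shares and partial sums are ever exchanged."""
    n = len(private_values)
    # Each party splits its value and sends one share to every party.
    all_shares = [make_shares(v, n) for v in private_values]
    # Party j adds the shares it received and publishes only that partial sum.
    partial_sums = [
        sum(all_shares[i][j] for i in range(n)) % MODULUS for j in range(n)
    ]
    # The published partial sums add up to the true total.
    return sum(partial_sums) % MODULUS

# Hypothetical case counts held by three hospitals for the same neighbourhood.
hospital_counts = [12, 7, 30]
total = secure_sum(hospital_counts)   # 49
```

Each individual share is a uniformly random number that reveals nothing about the count it came from, yet the total is exact, which reflects the accuracy-preserving property described above.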

Recently, a decentralized approach has been developed, in contrast to the centralized implementation of other geomasking methods. It employs cryptographic techniques and peer-to-peer trust models to mask the precise locations in the raw geospatial data [40]. It adds a layer of security by distributing the control of privacy among multiple parties, mitigating the risks associated with centralized data storage, masking, and processing.

Additionally, various geomasking techniques can be combined to increase the level of confidentiality. For example, affine transformation can be applied after random perturbation to achieve a higher level of geoprivacy. Rather than a simple combination, it is also possible to integrate different methods, such as the differentially private synthetic data generation [41,42], which integrates the differential privacy technique with synthetic data generation. This hybrid approach ensures that the synthetic data generated not only mimics the original dataset’s statistical properties but also adheres to differential privacy guarantees [43].
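A rough Python sketch of one such integration is given below: a grid histogram is perturbed with Laplace noise and synthetic points are then sampled from the noisy histogram. It assumes a fixed grid whose extent is public and does not depend on the data, a sensitivity of 1 per record, and an output size chosen independently of the data. It is a simplified illustration, not the algorithms of the cited works, and all names are assumptions.

```python
import random

def laplace_noise(scale, rng):
    """Laplace(0, scale) sample as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def dp_synthetic_points(points, cell_size, n_cols, n_rows, epsilon, n_out, rng):
    """Sample synthetic points from a Laplace-noised grid histogram."""
    # Step 1: histogram over a fixed grid (empty cells are included).
    counts = {(c, r): 0 for c in range(n_cols) for r in range(n_rows)}
    for x, y in points:
        cell = (int(x // cell_size), int(y // cell_size))
        if cell in counts:            # points outside the grid are dropped
            counts[cell] += 1
    # Step 2: add noise with scale 1/epsilon to every cell; clip negatives to 0.
    cells = list(counts)
    weights = [max(0.0, counts[c] + laplace_noise(1.0 / epsilon, rng)) for c in cells]
    if sum(weights) == 0.0:
        return []
    # Step 3: sample cells in proportion to the noisy counts, then a uniform
    # position inside each chosen cell. Only the noisy counts are used here.
    return [
        ((col + rng.random()) * cell_size, (row + rng.random()) * cell_size)
        for col, row in rng.choices(cells, weights=weights, k=n_out)
    ]

rng = random.Random(99)        # seeded only to make the example repeatable
original = [(120.0, 80.0), (300.0, 450.0), (350.0, 420.0), (1500.0, 300.0)]
synthetic = dp_synthetic_points(original, cell_size=500.0, n_cols=4, n_rows=4,
                                epsilon=1.0, n_out=50, rng=rng)
```

Because the sampling step touches only the noisy histogram, the synthetic points inherit the privacy guarantee of the noise-adding step, while their spatial pattern follows the original data only as closely as the grid resolution and ε allow.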

4. Conclusions and Prospects

With the continuing development of advanced data collection techniques and the proliferation of smart devices, high-resolution individual-based geospatial health data are increasingly being utilized in research. This shift has brought significant benefits to public health studies, enabling more precise and personalized analyses. However, it also raises pressing concerns regarding geoprivacy. Protecting the privacy of individuals while promoting data sharing for the common good has become a critical challenge.

The need for effective geomasking techniques is more urgent than ever. These techniques must balance the dual objectives of safeguarding individual privacy and maintaining data utility. Although various geomasking methods have been proposed and developed, as introduced above, selecting the appropriate method and configuring its parameters for specific geospatial health data, so as to achieve the desired level of geoprivacy while preserving data utility, remains a significant challenge for practitioners. Recently, an exploratory study compared and assessed the effectiveness of different kinds of geomasking methods for geospatial health data [2]. It offers a set of general guidelines for the effective application of geomasking, depending on the spatial patterns of the sensitive data and the intended spatial analysis. However, this assessment is not exhaustive, and further studies are needed to evaluate the effectiveness of geomasking methods more comprehensively. Practical guidance is also necessary for a more effective application of geomasking in geospatial health data.

As geospatial health data become more diverse, ranging from simple home location data to continuous 24 h location tracking, the limitations of current geomasking methods are becoming evident [2]. Traditional methods may not be sufficient to address the privacy risks associated with such detailed and dynamic datasets.

One future direction is the development of more customized geomasking methods tailored to the specific types of geospatial health data. For instance, there is a significant gap in effective methods for masking individual location-tracking datasets despite substantial research efforts [4,38]. Innovative solutions that can adequately protect the privacy of individuals in these highly detailed datasets are urgently needed.

Moreover, integrating different geomasking techniques to create more effective hybrid approaches holds promise. By combining the strengths of various methods, it may be possible to enhance privacy protection while minimizing the impact on data utility [44]. Such hybrid approaches could offer more flexible and robust solutions for diverse types of geospatial data.

With the growing demand for geomasking to protect sensitive geospatial health data, tools have been developed to assist researchers and policymakers in processing or sharing the data. Examples include the GeoPriv Plugin [45], which offers an integrated environment in QGIS for implementing geomasking, and Geomask [46], an R package hosted on GitHub that provides functions for geomasking spatial data. Additionally, Diffprivlib [47] is a general-purpose Python library for applying differential privacy.

Regardless of how advanced geomasking techniques become, the core issue will always be balancing data accuracy (or usability) and confidentiality (or privacy protection). As the demand for high-resolution geospatial data continues to grow, researchers and policymakers must continually strive to achieve this balance. Effective geomasking techniques will play a crucial role in ensuring that valuable geospatial health data can be shared and utilized for research and public well-being while not sacrificing individual geoprivacy.

Funding

This research received no external funding.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Acknowledgments

The author would like to thank the editors and anonymous reviewers for their valuable comments on the manuscript. The author acknowledges the use of ChatGPT as a tool for assisting in the editing and refinement of this manuscript. The tools were utilized only to enhance the clarity, coherence, and overall quality of the text, ensuring that the manuscript effectively communicates the research findings.

Conflicts of Interest

The author declares no conflicts of interest.

References

1. Kim, J.; Kwan, M.P.; Levenstein, M.C.; Richardson, D.B. How Do People Perceive the Disclosure Risk of Maps? Examining the Perceived Disclosure Risk of Maps and Its Implications for Geoprivacy Protection. Cart. Geogr. Inf. Sci. 2021, 48, 2–20.
2. Wang, J.; Kim, J.; Kwan, M.P. An Exploratory Assessment of the Effectiveness of Geomasking Methods on Privacy Protection and Analytical Accuracy for Individual-Level Geospatial Data. Cart. Geogr. Inf. Sci. 2022, 49, 385–406.
3. Ribeiro, A.I.; Dias, V.; Ribeiro, S.; Silva, J.P.; Barros, H. Geoprivacy in Neighbourhoods and Health Research: A Mini-Review of the Challenges and Best Practices in Epidemiological Studies. Public Health Rev. 2022, 43, 1605105.
4. Wang, J.; Kwan, M.P. Daily Activity Locations K-Anonymity for the Evaluation of Disclosure Risk of Individual GPS Datasets. Int. J. Health Geogr. 2020, 19, 7.
5. McNutt, M. Reproducibility. Science 2014, 343, 229.
6. U.S. Department of Health and Human Services. Health Insurance Portability and Accountability Act. In Public Law; U.S. Department of Health and Human Services: Washington, DC, USA, 1996; pp. 104–191.
7. European Union. Regulation (EU) 2016/679 of the European Parliament and of the Council. Off. J. Eur. Union 2016, 679, 1–88.
8. Canada Department of Justice. Personal Information Protection and Electronic Documents Act; Canada Department of Justice: Ottawa, ON, Canada, 2000; pp. 4356–4364.
9. Delmelle, E.M.; Desjardins, M.R.; Jung, P.; Owusu, C.; Lan, Y.; Hohl, A.; Dony, C. Uncertainty in Geospatial Health: Challenges and Opportunities Ahead. Ann. Epidemiol. 2022, 65, 15–30.
10. Tellman, N.; Litt, E.R.; Knapp, C.; Eagan, A.; Cheng, J.; Radonovich, L.J. The Effects of the Health Insurance Portability and Accountability Act Privacy Rule on Influenza Research Using Geographical Information Systems. Geospat. Health 2010, 5, 3–9.
11. Seidl, D.E. Geoprivacy: Location Masking Strategies and Personal Identification Risk; San Diego State University: San Diego, CA, USA, 2018.
12. Allshouse, W.B.; Fitch, M.K.; Hampton, K.H.; Gesink, D.C.; Doherty, I.A.; Leone, P.A.; Serre, M.L.; Miller, W.C. Geomasking Sensitive Health Data and Privacy Protection: An Evaluation Using an E911 Database. Geocarto Int. 2010, 25, 443–452.
13. Armstrong, M.P.; Rushton, G.; Zimmerman, D.L. Geographically Masking Health Data to Preserve Confidentiality. Stat. Med. 1999, 18, 497–525.
14. Carr, J.; Vallor, S.; Freundschuh, S.; Gannon, W.L.; Zandbergen, P. Hitting the Moving Target: Challenges of Creating a Dynamic Curriculum Addressing the Ethical Dimensions of Geospatial Data. J. Geogr. High. Educ. 2014, 38, 444–454.
15. Kwan, M.-P.; Casas, I.; Schmitz, B. Protection of Geoprivacy and Accuracy of Spatial Information: How Effective Are Geographical Masks? Cartogr. Int. J. Geogr. Inf. Geovisualization 2004, 39, 15–28.
16. Nissenbaum, H. Privacy in Context: Technology, Policy, and the Integrity of Social Life. In Privacy in Context; Stanford University Press: Stanford, CA, USA, 2009; ISBN 0804772894.
17. Cassa, C.A.; Grannis, S.J.; Overhage, J.M.; Mandl, K.D. A Context-Sensitive Approach to Anonymizing Spatial Surveillance Data: Impact on Outbreak Detection. J. Am. Med. Inform. Assoc. 2006, 13, 160–165.
18. Zandbergen, P.A. Ensuring Confidentiality of Geocoded Health Data: Assessing Geographic Masking Strategies for Individual-level Data. Adv. Med. 2014, 2014, 567049.
19. Hampton, K.H.; Fitch, M.K.; Allshouse, W.B.; Doherty, I.A.; Gesink, D.C.; Leone, P.A.; Serre, M.L.; Miller, W.C. Mapping Health Data: Improved Privacy Protection with Donut Method Geomasking. Am. J. Epidemiol. 2010, 172, 1062–1069.
20. Stinchcomb, D. Procedures for Geomasking to Protect Patient Confidentiality. In Proceedings of the ESRI International Health GIS Conference, Washington, DC, USA, 17–20 October 2004; pp. 1–17.
21. Zhang, S.; Freundschuh, S.M.; Lenzer, K.; Zandbergen, P.A. The Location Swapping Method for Geomasking. Cart. Geogr. Inf. Sci. 2017, 44, 22–34.
22. Rubin, D.B. Statistical Disclosure Limitation. J. Off. Stat. 1993, 9, 461–468.
23. Beckman, R.J.; Baggerly, K.A.; McKay, M.D. Creating Synthetic Baseline Populations. Transp. Res. Part A Policy Pract. 1996, 30, 415–429.
24. Sakshaug, J.W.; Raghunathan, T.E. Synthetic Data for Small Area Estimation. In Privacy in Statistical Databases; Springer: Berlin/Heidelberg, Germany, 2010; Volume 6344, pp. 162–173.
25. Drechsler, J. Synthetic Datasets for Statistical Disclosure Control: Theory and Implementation; Springer Science & Business Media: New York, NY, USA; Heidelberg, Germany; Dordrecht, The Netherlands; London, UK, 2011.
26. Dwork, C. Differential Privacy. In International Colloquium on Automata, Languages, and Programming; Springer: Berlin/Heidelberg, Germany, 2006; Volume 4052, pp. 1–12.
27. Duchi, J.C.; Jordan, M.I.; Wainwright, M.J. Local Privacy and Statistical Minimax Rates. In Proceedings of the 2013 IEEE 54th Annual Symposium on Foundations of Computer Science, Berkeley, CA, USA, 26–29 October 2013.
28. Mironov, I. Rényi Differential Privacy. In Proceedings of the 2017 IEEE 30th Computer Security Foundations Symposium (CSF), Santa Barbara, CA, USA, 21–25 August 2017; pp. 263–275.
29. Xiao, Y.; Xiong, L. Protecting Locations with Differential Privacy under Temporal Correlations. Proc. ACM Conf. Comput. Commun. Secur. 2015, 2015, 1298–1309.
30. Harris, D.R. Leveraging Differential Privacy in Geospatial Analyses of Standardized Healthcare Data. In Proceedings of the 2020 IEEE International Conference on Big Data (Big Data), Atlanta, GA, USA, 10–13 December 2020; pp. 3119–3122.
31. Abowd, J.M. The US Census Bureau Adopts Differential Privacy. In Proceedings of the 2008 IEEE 24th International Conference on Data Engineering, Cancun, Mexico, 7–12 April 2008; Cornell University ILR School: Geneva, Switzerland, 2008; p. 2867.
32. Troncoso, C.; Payer, M.; Hubaux, J.-P.; Salathé, M.; Larus, J.; Bugnion, E.; Lueks, W.; Stadler, T.; Pyrgelis, A.; Antonioli, D.; et al. Decentralized Privacy-Preserving Proximity Tracing. Commun. ACM 2022, 65, 48–57.
33. Yan, Y.; Sun, Z.; Mahmood, A.; Xu, F.; Dong, Z.; Sheng, Q.Z. Achieving Differential Privacy Publishing of Location-Based Statistical Data Using Grid Clustering. ISPRS Int. J. Geo-Inf. 2022, 11, 404.
34. Rivest, R.L.; Adleman, L.; Dertouzos, M.L. On Data Banks and Privacy Homomorphisms. Found. Secur. Comput. 1978, 4, 169–180.
35. Gentry, C. Fully Homomorphic Encryption Using Ideal Lattices. In Proceedings of the Annual ACM Symposium on Theory of Computing, Bethesda, MD, USA, 31 May–2 June 2009; pp. 169–178.
36. Goldreich, O.; Micali, S.; Wigderson, A. How to Play Any Mental Game, or a Completeness Theorem for Protocols with Honest Majority. In Providing Sound Foundations for Cryptography: On the Work of Shafi Goldwasser and Silvio Micali; Association for Computing Machinery: New York, NY, USA, 2019.
37. Zhu, X.; Ayday, E.; Vitenberg, R. A Privacy-Preserving Framework for Outsourcing Location-Based Services to the Cloud. IEEE Trans. Dependable Secur. Comput. 2021, 18, 384–399.
38. Ren, Y.; Li, X.; Miao, Y.; Luo, B.; Weng, J.; Choo, K.K.R.; Deng, R.H. Towards Privacy-Preserving Spatial Distribution Crowdsensing: A Game Theoretic Approach. IEEE Trans. Inf. Forensics Secur. 2022, 17, 804–818.
39. Goldreich, O. Foundations of Cryptography: Volume 2, Basic Applications; Cambridge University Press: Cambridge, UK, 2001; Volume 2, ISBN 0521830842.
40. Hojati, M.; Farmer, C.; Feick, R.; Robertson, C. Decentralized Geoprivacy: Leveraging Social Trust on the Distributed Web. Int. J. Geogr. Inf. Sci. 2021, 35, 2540–2566.
41. Rosenblatt, L.; Liu, X.; Pouyanfar, S.; de Leon, E.; Desai, A.; Allen, J.; Development, M.A.; Program, A. Differentially Private Synthetic Data: Applied Evaluations and Enhancements. arXiv 2020, arXiv:2011.05537.
42. Bowen, C.M.; Liu, F. Differentially Private Data Synthesis Methods. arXiv 2016, arXiv:1602.01063.
43. Bowen, C.M.; Snoke, J. Comparative Study of Differentially Private Synthetic Data Algorithms from the NIST PSCR Differential Privacy Synthetic Data Challenge. arXiv 2019, arXiv:1911.12704.
44. Raghunathan, T.E.; Reiter, J.P.; Rubin, D.B. Multiple Imputation for Statistical Disclosure Limitation. J. Off. Stat. 2003, 19, 1.
45. GeoPriv. Available online: https://diuke.github.io/GeoPrivPlugin/ (accessed on 15 September 2024).
46. GitHub—Claudiofronterre/Geomask: Geomask. Available online: https://github.com/claudiofronterre/geomask?tab=readme-ov-file (accessed on 15 September 2024).
47. GitHub—IBM/Differential-Privacy-Library: Diffprivlib: The IBM Differential Privacy Library. Available online: https://github.com/IBM/differential-privacy-library (accessed on 15 September 2024).

Figure 1. Affine transformation geomasking methods (Orange dots: original locations, green dots: geomasked locations).

Figure 2. Aggregation geomasking methods (Hollow circles represent original locations; numbers within each region indicate the aggregated value; the shading of solid circles and regions represents the associated weight after aggregation).

Figure 3. Naïve random perturbation geomasking methods (Orange dots: original locations, green dots: geomasked locations; dotted lines: the radius of the designated geomasking area; solid lines: the relocation of records from their original locations to the geomasked locations).

Figure 4. Geomasking methods involving random perturbation with distribution functions (Orange dots: original locations, green dots: geomasked locations; solid lines: the relocation of records from their original locations to the geomasked locations).

Figure 5. Geomasking methods involving random perturbation with preset potential locations (Dotted lines: the radius of the designated geomasking area; solid lines: the relocation of records from their original locations to the geomasked locations).

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
