### *3.2. Perturbation Metrics*

Two perturbation metrics are proposed and investigated, as defined in Equations (1) and (2).

$$M\_{\text{argmin}}(\mathbf{x}\_u) = \arg\min\_{\mathbf{b} \in \mathbf{B}} ||\mathbf{b} - \mathbf{x}\_u|| + \xi \tag{1}$$

where || · || is the distance between the vectors **b** and **x***u*, and *ξ* is a multivariate (3D) noise vector of zero mean (to be explained later in this section). Also,

$$M\_{\text{argmax}}(\mathbf{x}\_u) = \arg\max\_{\mathbf{b}\in\mathbf{B}} ||\mathbf{b} - \mathbf{x}\_u|| + \xi \tag{2}$$

The argmin operator is rather intuitive: the user location is only slightly perturbed by mapping it to the nearest grid point and then adding random noise to it. The argmax operator may seem less intuitive at first glance. Indeed, with the argmax operator, all users located, for example, at the extreme north-west of the building will be mapped as being close to the extreme south-east of the building. As we focus here only on proximity-detection applications that rely on the relative distance between users, such as digital contact tracing or find-a-friend, this mapping does not decrease the service utility: nearby users (which were, for example, at the extreme north-west of the building) will still appear as nearby users after the mapping to the other side of the building.
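The proximity-preserving effect of the argmax mapping can be illustrated with a minimal sketch. It assumes a single rectangular 100 m × 200 m floor with a 5 m grid (the values of the Figure 2 example), Euclidean distance, and no added noise; the function names and user coordinates are hypothetical and chosen only for illustration.

```python
import math

def grid_points(width, length, step):
    """2D grid over a width x length floor (metres) with spacing `step`."""
    xs = [i * step for i in range(int(width // step) + 1)]
    ys = [j * step for j in range(int(length // step) + 1)]
    return [(x, y) for x in xs for y in ys]

def farthest_grid_point(pos, grid):
    """argmax over the grid points of the Euclidean distance to `pos`."""
    return max(grid, key=lambda b: math.dist(b, pos))

grid = grid_points(100.0, 200.0, 5.0)
user_a = (2.0, 198.0)     # two users 1 m apart near one corner
user_b = (3.0, 198.0)
far_user = (97.0, 3.0)    # a user near the opposite corner

print(farthest_grid_point(user_a, grid))    # (100.0, 0.0)
print(farthest_grid_point(user_b, grid))    # (100.0, 0.0)
print(farthest_grid_point(far_user, grid))  # (0.0, 200.0)
```

In this sketch the two nearby users are mapped to the same far corner, so they remain close to each other after the mapping, while the distant user is mapped to the opposite corner.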

**Figure 2.** Example of mapping the whole building space **B** into grid points **b**, Δ*s* = 5 m for a 100 × 200 m<sup>2</sup> building with 4 floors and 4 m floor height.

In order for the *Margmin*(**x***u*) and *Margmax*(**x***u*) metrics to remain inside the building space **B** and to offer plausible perturbed locations, an additional correction is applied after the mappings in Equations (1) and (2): points that would fall outside the building edges are re-mapped to the nearest point inside the building. In addition, if the perturbed *z*-coordinate does not match any of the floor heights in the building, it is mapped to the nearest floor level. Examples will be provided in Section 5.
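The mapping, noise addition, and correction steps can be put together in a short sketch. It is an illustration under stated assumptions, not the authors' implementation: the building is taken as an axis-aligned box with the Figure 2 dimensions, floor levels are assumed to lie at *z* = 0, 4, 8, 12 m, the noise is Gaussian with standard deviation 1/ε per axis, and all names (`build_grid`, `correct`, `perturb`) are hypothetical.

```python
import math
import random

WIDTH, LENGTH = 100.0, 200.0   # building footprint in metres (Figure 2 example)
STEP = 5.0                     # grid spacing
FLOOR_HEIGHT = 4.0             # metres
N_FLOORS = 4                   # floors assumed at z = 0, 4, 8, 12 m

def build_grid():
    """All grid points b of the building space B."""
    xs = [i * STEP for i in range(int(WIDTH // STEP) + 1)]
    ys = [j * STEP for j in range(int(LENGTH // STEP) + 1)]
    zs = [k * FLOOR_HEIGHT for k in range(N_FLOORS)]
    return [(x, y, z) for x in xs for y in ys for z in zs]

def correct(p):
    """Clamp x and y to the building edges and snap z to the nearest floor."""
    x = min(max(p[0], 0.0), WIDTH)
    y = min(max(p[1], 0.0), LENGTH)
    floor = min(max(round(p[2] / FLOOR_HEIGHT), 0), N_FLOORS - 1)
    return (x, y, floor * FLOOR_HEIGHT)

def perturb(x_u, grid, eps, mode="argmax", rng=random):
    """Map x_u to the farthest (argmax) or nearest (argmin) grid point,
    add zero-mean Gaussian noise of standard deviation 1/eps, then correct."""
    pick = max if mode == "argmax" else min
    b = pick(grid, key=lambda g: math.dist(g, x_u))
    noisy = tuple(c + rng.gauss(0.0, 1.0 / eps) for c in b)
    return correct(noisy)

grid = build_grid()
rng = random.Random(42)          # fixed seed so the sketch is repeatable
x_u = (2.0, 198.0, 0.0)          # hypothetical true position
print(perturb(x_u, grid, eps=1.0, mode="argmax", rng=rng))
print(perturb(x_u, grid, eps=1.0, mode="argmin", rng=rng))
```

For a box-shaped building, clamping each horizontal coordinate gives the nearest point inside the building; a building with a different footprint would need a different re-mapping step.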

The *argmin* metric in Equation (1) maps the true position to the nearest grid point in the building and then applies a noise factor to it, while the *argmax* metric in Equation (2) maps the true position to the furthest grid point in the building and then applies a noise factor to it. On the one hand, the mapping in Equation (1) keeps the distance between the perturbed location and the true location small, enabling various location-based services that require absolute user-location knowledge, but it performs quite poorly in terms of privacy preservation, as an attacker could still identify the approximate location of a user with an accuracy depending on the inverse of the standard deviation 1/*ε* of the added multivariate noise *ξ*. On the other hand, the second proposed metric from Equation (2) is able to protect the user-location privacy to a great extent (as the privacy increases when the distance between the perturbed location and the original location increases), with an increased privacy level for larger/wider buildings. As we will show in Section 5, it does so without destroying the usefulness of the services, meaning that accurate contact tracing can also be achieved under heavy protection of the user's location privacy.

Regarding the added noise vector *ξ*, two multivariate noise distributions are considered, namely a Gaussian distribution with equal standard deviation 1/*ε* in the *x*, *y*, *z* dimensions, see Equation (3), and a Laplacian distribution with equal scale factor 1/*ε* in the *x*, *y*, *z* dimensions, see Equation (4). The zero-mean multivariate (3D) Gaussian noise is:

$$f\_{\text{Gauss}}(\xi) = \frac{1}{(2\pi)^{1.5} |\Sigma|^{0.5}} \exp\left(-0.5 \xi^T \Sigma^{-1} \xi\right) \tag{3}$$

with Σ = *diag*([1/*ε* 1/*ε* 1/*ε*]) = (1/*ε*) **I**3 being a diagonal covariance matrix, **I**3 the identity matrix of dimension 3 × 3, and |Σ| = *ε*<sup>−3</sup> the determinant of Σ.

The zero-mean multivariate (3D) Laplacian noise is:

$$f\_{\text{Laplace}}(\xi) = \frac{2}{(2\pi)^{1.5} |\Sigma|^{0.5}} (0.5 \xi^T \Sigma^{-1} \xi)^{-0.5} K\_v(\sqrt{2\xi^T \Sigma^{-1} \xi}) \tag{4}$$

where *Kv* is the modified Bessel function of the second kind.
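Both noise vectors can be drawn with the Python standard library, as the sketch below suggests. It follows the text in using a standard deviation (or scale) of 1/ε per axis. The Laplacian sampler relies on the known representation of a symmetric multivariate Laplacian vector as a Gaussian vector multiplied by the square root of a unit-mean exponential variable; this sampling route and the function names are assumptions of the sketch, not something stated in the text.

```python
import math
import random

def gaussian_noise(eps, rng):
    """Zero-mean 3D Gaussian noise with standard deviation 1/eps per axis."""
    return [rng.gauss(0.0, 1.0 / eps) for _ in range(3)]

def laplacian_noise(eps, rng):
    """Zero-mean symmetric 3D Laplacian noise with scale 1/eps per axis,
    drawn as sqrt(W) * Gaussian vector, with W ~ Exponential(mean 1)."""
    w = rng.expovariate(1.0)
    return [math.sqrt(w) * g for g in gaussian_noise(eps, rng)]

rng = random.Random(7)   # fixed seed so the sketch is repeatable
print(gaussian_noise(0.1, rng))
print(laplacian_noise(0.1, rng))
```

Because the exponential variable has unit mean, both samplers give the same per-axis variance for a given ε; the Laplacian draws simply have heavier tails.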

### *3.3. Private Proximity-Detection Architecture with the Proposed Mechanism*

The wireless communication process between user/edge devices and the proximity-detection service is depicted in Figure 3. Users are assumed to be spread across a multi-floor space of commercial or commuting interest (e.g., shopping mall, commuting hall/airport/train station, etc.). Users' devices are assumed to be equipped with a localization engine, such as GNSS, WiFi, BLE, or a combination of several localization methods. A proximity service provider operates in the building of interest, has access to the building floor plans, and is able to send the floor-map coordinates **b** to all users interested in the proximity-based service or application. The coordinates can be provided as Earth Centered Earth Fixed (ECEF) coordinates, as (latitude, longitude, altitude) coordinates, or as local coordinates (*x*, *y*, *z*), and the mapping between any of these coordinate systems is assumed known both at the user side and at the server side. Each user device performs the location perturbation locally and sends the perturbed location to the server; the server processes, in aggregate form, all the data based on the perturbed locations of the users inside the building and offers the proximity-based service to the users.

**Figure 3.** An illustration of the considered scenario: a building (e.g., a shopping mall) with users willing to use the digital contact-tracing and/or 'find-a-friend' applications. The 'Adversary' entity refers to any third party which aims to access the information about devices' whereabouts.

### **4. Theoretical Analysis of the Proposed Argmax Perturbed Location Mechanism**

For simplicity, in this section we focus on the argmax metric from Equation (2) and denote *M*(·) = *Margmax*(·), noting that similar derivations can be obtained in a straightforward manner for the argmin metric. Let *pu* denote the probability that an adversary finds out **x***u* by listening to **y***u* = *M*(**x***u*). Then

$$\begin{split} p\_u = \operatorname{prob} (M(\mathbf{x}\_u) = \mathbf{x}\_u) &= \operatorname{prob} (\arg\max\_{\mathbf{b} \in \mathbf{B}} ||\mathbf{b} - \mathbf{x}\_u|| + \xi = \mathbf{x}\_u) \\ &= \operatorname{prob} (\xi = \mathbf{x}\_u - \arg\max\_{\mathbf{b} \in \mathbf{B}} ||\mathbf{b} - \mathbf{x}\_u||) \end{split} \tag{5}$$

If we denote **a***u* ≜ argmax**b**∈**B** ||**b** − **x***u*||, then, under the Gaussian-noise assumption, the above formula is determined by the Gaussian noise probability distribution function (PDF) from Equation (3) and becomes equal to

$$p\_u = \frac{\epsilon^3}{(2\pi)^{1.5}} \exp(-0.5\epsilon ||\mathbf{x}\_u - \mathbf{a}\_u||^2) \tag{6}$$

Similarly, if *pv* is the probability that an adversary intercepts the perturbed location of user *v*, namely *Margmax*(**x***v*), and maps it to the location of user *u*, then, after straightforward derivations (as above) and following the Gaussian-noise assumption, we get

$$p\_v = \frac{\epsilon^3}{(2\pi)^{1.5}} \exp(-0.5\epsilon ||\mathbf{x}\_v - \mathbf{a}\_v||^2) \tag{7}$$

with **a***v* ≜ argmax**b**∈**B** ||**b** − **x***v*||.

By dividing Equation (6) by Equation (7) and using the Cauchy-Schwarz inequality, one gets

$$\frac{p\_u}{p\_v} = \exp\left(0.5\varepsilon \left(||\mathbf{x}\_v - \mathbf{a}\_v||^2 - ||\mathbf{x}\_u - \mathbf{a}\_u||^2\right)\right)$$

$$\leq \varepsilon \exp\left(0.5\varepsilon ||\mathbf{a}\_u - \mathbf{a}\_v||^2\right)$$

$$\leq \varepsilon \exp\left(0.5\varepsilon ||\mathbf{x}\_u - \mathbf{x}\_v||^2\right) \tag{8}$$

Thus, the proposed mechanism *M*(·) offers GeoInd-type user-location privacy.

### **5. Simulation-Based Results**

### *5.1. Simulation Scenarios and Performance Metrics*

A 4-floor scenario was considered, with *Nu* users spread within the building and most of them located within a couple of pre-defined hotspot areas. Table 2 shows the main parameters used in the simulation model (additional parameters were investigated in some scenarios and are specified in the figures' captions when different from those in Table 2). The users are assumed to transmit their perturbed location *M*(**x***u*) to a service provider offering a proximity-based service with a proximity threshold *γ* (i.e., the service is offered if the users are determined to be at a distance less than *γ*, based on their perturbed locations transmitted to the server).

At each Monte Carlo run, another realization of the users' random positions within the building is generated. Two examples of the user distribution in the building during two Monte Carlo runs are shown in Figure 4.

Examples of perturbed locations during one Monte Carlo run with the *argmin* metric (left plot) and the *argmax* metric (right plot) are shown in Figure 5, for *ε* = 0.1 and Laplacian noise.

A zoomed version of perturbed locations for one floor and with only 4 users is illustrated in Figure 6, this time showing both the scenario with no hotspots (left plot) and with hotspots (right plot). The squares show the perturbed location via *argmin* metric and the circles show the perturbed location via *argmax* metric.

The utility functions are defined as the probability *Pd* of correctly detecting two users to be in close proximity to each other, as well as the complement of the false-alarm probability *Pfa*, where *Pfa* is the probability of detecting that two users are in close proximity to each other when in fact they are not. Mathematically, *Pd* and *Pfa* are defined via

$$P\_d = \frac{|\left\{ (u, v) \in \mathcal{N}\_u \times \mathcal{N}\_u, u \neq v \; | \; ||M(\mathbf{x}\_u) - M(\mathbf{x}\_v)|| \le \gamma \; \text{and} \; ||\mathbf{x}\_u - \mathbf{x}\_v|| \le \gamma \right\}|}{|\left\{ (u, v) \in \mathcal{N}\_u \times \mathcal{N}\_u, u \neq v \; | \; ||\mathbf{x}\_u - \mathbf{x}\_v|| \le \gamma \right\}|}\tag{9}$$

and, respectively,

$$P\_{fa} = \frac{|\left\{(u, v) \in \mathcal{N}\_u \times \mathcal{N}\_u, u \neq v \; | \; \left\| M(\mathbf{x}\_u) - M(\mathbf{x}\_v) \right\| \le \gamma \; \text{and} \; \left\| \mathbf{x}\_u - \mathbf{x}\_v \right\| \ge \gamma \right\}|}{|\left\{(u, v) \in \mathcal{N}\_u \times \mathcal{N}\_u, u \neq v \; | \; \left\| \mathbf{x}\_u - \mathbf{x}\_v \right\| \ge \gamma \right\}|} \tag{10}$$

where |·| is the cardinality operator, *Nu* is the number of users inside the building, and *Pd* and *Pfa* correspond to the detection probability (here also the sensitivity) and the false-positive rate in confusion-matrix terminology, respectively. Clearly, the utility of the proximity-based service increases when *Pd* increases and when *Pfa* decreases.
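A minimal sketch of how the two rates in Equations (9) and (10) could be computed from lists of true and perturbed positions is given below. The function name and the example coordinates are hypothetical. The sketch counts each unordered pair once (which leaves both ratios unchanged) and treats a pair at a true distance of exactly *γ* as close, an assumption made to keep the two classes disjoint.

```python
import math
from itertools import combinations

def detection_rates(true_pos, perturbed_pos, gamma):
    """Return (P_d, P_fa) over all pairs of distinct users."""
    hits = close = false_alarms = far = 0
    for u, v in combinations(range(len(true_pos)), 2):
        truly_close = math.dist(true_pos[u], true_pos[v]) <= gamma
        seems_close = math.dist(perturbed_pos[u], perturbed_pos[v]) <= gamma
        if truly_close:
            close += 1
            hits += seems_close          # correct detection
        else:
            far += 1
            false_alarms += seems_close  # reported close, but truly far
    p_d = hits / close if close else float("nan")
    p_fa = false_alarms / far if far else float("nan")
    return p_d, p_fa

# Hypothetical positions in metres: users 0 and 1 are truly within 2 m.
true_pos = [(0.0, 200.0, 0.0), (1.0, 200.0, 0.0),
            (40.0, 150.0, 0.0), (100.0, 0.0, 12.0)]
perturbed_pos = [(100.0, 0.0, 12.0), (99.5, 0.0, 12.0),
                 (100.0, 1.0, 12.0), (0.0, 200.0, 0.0)]

print(detection_rates(true_pos, perturbed_pos, gamma=2.0))  # (1.0, 0.4)
```

In this toy case the single truly close pair is detected, while two of the five distant pairs appear close after perturbation, which gives a false-alarm rate of 0.4.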

**Table 2.** Main simulation parameters (unless otherwise specified in plots' titles).


**Figure 4.** Two examples of user distribution within a 4-floor building during two Monte Carlo runs. (**a**) Monte Carlo run 1; (**b**) Monte Carlo run 2. In these runs, 80% of the users are placed in hotspot areas and 20% of the users are outside hotspot areas, uniformly distributed within the building.

**Figure 5.** Examples of perturbed locations based on (**a**) *Margmin*(·) and (**b**) *Margmax*(·) metrics. *ε* = 0.1 m, Laplace perturbation.

The ensured privacy level is proportional to the distance between the perturbed location and the true location, or the RMSE between *M*(**x***u*) and **x***u*, namely

$$RMSE = \sqrt{\frac{1}{N\_u} \sum\_{u=1}^{N\_u} ||M(\mathbf{x}\_u) - \mathbf{x}\_u||^2} \tag{11}$$

Clearly, the ensured privacy level is better when RMSE from Equation (11) is higher.
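As a small worked illustration of Equation (11), the sketch below computes the RMSE for two hypothetical users, one displaced by 5 m and one not displaced at all. The function name and the coordinates are assumptions made for the example.

```python
import math

def rmse(true_pos, perturbed_pos):
    """Root mean square distance between perturbed and true locations."""
    total = sum(math.dist(m, x) ** 2 for m, x in zip(perturbed_pos, true_pos))
    return math.sqrt(total / len(true_pos))

true_pos = [(0.0, 0.0, 0.0), (10.0, 10.0, 4.0)]
perturbed_pos = [(3.0, 4.0, 0.0), (10.0, 10.0, 4.0)]   # 5 m and 0 m displacement

print(round(rmse(true_pos, perturbed_pos), 4))  # 3.5355
```

The result is the square root of (25 + 0)/2, i.e., about 3.54 m.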

**Figure 6.** Two examples of perturbed location via argmin + Laplacian noise and via argmax + Laplacian noise. (**a**) users uniformly distributed over one floor; (**b**) users uniformly distributed within a circular hotspot of radius 5 m.

### *5.2. Comparison with State-of-the-Art Perturbation Mechanisms*

Several obfuscation models have been proposed in the literature to protect location information, as described in Section 2. Three of the most common ones, selected here as benchmarks, are the uniform obfuscation [31], the Laplacian perturbation [47], and the Gaussian perturbation [48]. The uniform perturbation model from [31] was given for the 2D case and is based on applying to the user location a random vector shift with a certain radius. The model from [31], extended to 3D scenarios, can be written as

$$M\_{\text{uniform}}(\mathbf{x}\_u) = \mathbf{x}\_u + \xi\_u \tag{12}$$

where *ξu* is a 3D vector with elements [*ξu,x*, *ξu,y*, *ξu,z*] given by

$$\xi\_{u,x} = \mu \cos(\theta) \tag{13}$$

$$\xi\_{u,y} = \mu \sin(\theta) \tag{14}$$

$$\xi\_{u,z} = \mu \tan(\alpha) \tag{15}$$

and *μ*, *θ*, and *α* are the random radius, azimuth angle, and elevation angle, respectively, drawn from the following three uniform distributions: *μ* ∼ *U*(0, 1/*ε*), *θ* ∼ *U*(0, 2*π*), and *α* ∼ *U*(0, 2*π*), where *U*(*a*, *b*) stands for a uniform distribution in the interval [*a*, *b*].
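The uniform benchmark of Equations (12) to (15) can be sketched as follows, taking the equations exactly as written. The function name and the example position are hypothetical.

```python
import math
import random

def uniform_obfuscation(x_u, eps, rng):
    """Shift x_u by the random vector of Equations (13)-(15)."""
    mu = rng.uniform(0.0, 1.0 / eps)          # random radius
    theta = rng.uniform(0.0, 2.0 * math.pi)   # azimuth angle
    alpha = rng.uniform(0.0, 2.0 * math.pi)   # elevation angle
    shift = (mu * math.cos(theta),
             mu * math.sin(theta),
             mu * math.tan(alpha))
    return tuple(c + s for c, s in zip(x_u, shift))

rng = random.Random(3)   # fixed seed so the sketch is repeatable
print(uniform_obfuscation((10.0, 20.0, 4.0), eps=0.5, rng=rng))
```

The horizontal shift is bounded by 1/ε, whereas the tangent in Equation (15) is unbounded near *α* = *π*/2 and 3*π*/2, so the vertical shift in this sketch can occasionally be large.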

The Laplacian [47] and Gaussian [48] perturbations can be modeled as

$$M\_{\text{Laplace,Gaussian}}(\mathbf{x}\_u) = \mathbf{x}\_u + \xi \tag{16}$$

where *ξ* is a Laplacian or a Gaussian noise, as given in Equations (4) and (3), respectively. The comparison with the three state-of-the-art algorithms described above, namely uniform obfuscation [31], Laplacian perturbation [47], and Gaussian perturbation [48] is shown in Figure 7.

**Figure 7.** Comparison with state-of-the-art algorithms: (**a**) *Pd* versus the noise perturbation level; (**b**) *Pfa* versus the noise perturbation level; (**c**) RMSE between the perturbed location and original location versus the noise perturbation level; (**d**) utility versus privacy.

As seen in Figure 7, the argmax-based metric offers the best detection probability (Figure 7a) and the best privacy level (Figure 7c), but slightly worse false-alarm probabilities (Figure 7b) than the other four investigated algorithms, namely the argmin-based one and the three benchmark ones. The most important plot, however, is Figure 7d, which illustrates the utility-privacy tradeoff. For a fairer comparison, the utility here is the average of *Pd* and 1 − *Pfa*; the closer this value is to 100%, the higher the utility, and an ideal service would have *Pd* = 1 and *Pfa* = 0. The privacy level is given by the RMSE; the higher the RMSE between the perturbed and true locations, the higher the privacy. The argmax-based perturbation is a clear winner among all considered algorithms, as it simultaneously reaches high levels of privacy and high levels of utility for a proximity service relying on inter-user distance. It is to be emphasized that this utility pertains only to proximity-based services relying on inter-user distances; other location-based services needing absolute location information would have a different utility, for which our argmax-based algorithm would most likely perform worse than the other approaches. Between the argmin-based approach and the three considered benchmarks, there is very little difference in the utility-privacy tradeoff. For this reason, and in order to keep the subsequent plots clear, we focus from now on only on the comparison between argmin- and argmax-based perturbations and on a deeper analysis of the argmax-based operator.
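The combined utility used for the utility-privacy tradeoff can be written as a one-line function. The sketch below uses purely illustrative input values, not results taken from the figures, and the function name is hypothetical.

```python
def utility(p_d, p_fa):
    """Average of the detection probability and the complement
    of the false-alarm probability."""
    return 0.5 * (p_d + (1.0 - p_fa))

print(utility(1.0, 0.0))               # 1.0 (ideal service)
print(round(utility(0.98, 0.16), 2))   # 0.91 (illustrative values only)
```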

### *5.3. Privacy Level as a Function of Parameter ε*

The RMSE between the transmitted perturbed location and the original location, as defined in Equation (11), is shown in Figure 8. A higher RMSE value means a higher user privacy level. There is no significant difference between the two noise types *ξ* used in the perturbation mechanism, with the Laplacian noise giving slightly better privacy than the Gaussian one for the *argmax* metric, and the Gaussian noise giving slightly better privacy for the *argmin* metric.

A very interesting finding is that the *argmax* metric not only achieves a significantly higher privacy level than the *argmin* metric (i.e., higher RMSE values), but the noise level 1/*ε* also acts on it in the opposite manner: a higher noise level 1/*ε* gives more obfuscation in the argmin-based approach, but less obfuscation in the argmax-based approach. In other words, high levels of *ε* (or, equivalently, low levels of the noise standard deviation) give better privacy with the *argmax* metric than low levels of *ε*. This happens because the user's location is already mapped far away from its initial location through the *argmax* operator, and a small additional random perturbation is enough to make it difficult to guess the true user location **x***u* from the disclosed perturbed location *M*(**x***u*) in case an attacker or eavesdropper gets access to the perturbed location.

**Figure 8.** RMSE between the perturbed location and original location versus the noise perturbation level for two noise types (Laplacian and Gaussian) and two mapping metrics (argmin and argmax).

### *5.4. Utility Level as a Function of Parameter ε*

Figure 9 shows the utility (i.e., the detection probability) as well as the false alarm probabilities in the presence of various perturbations (*argmin* versus *argmax* and Gaussian versus Laplacian noises).

Clearly, the *argmax* metric has higher utility than the *argmin* metric, at the expense of a moderately higher false-alarm probability. The differences between Gaussian and Laplacian noises are minor, and therefore the Gaussian perturbation is recommended for simplicity. The best detection probabilities for a proximity-based application are achieved with *ε* values above 1 (or, equivalently, a noise standard deviation below 1 m). We can see from Figure 9a that detection probabilities close to 100% are achievable with the proposed *argmax* metric, with moderate false alarms of about 16%. As the user privacy is highly preserved with an *argmax* metric and high enough *ε* values (see also Figure 8), the price to pay in terms of false-alarm probabilities of up to 16% may seem reasonable for users desiring high location privacy. Indeed, the cost of a false alarm may be quite low for the user (e.g., the user is incorrectly informed that a friend is nearby, or is incorrectly informed that he or she might have been a close contact of a person confirmed with COVID-19 and would thus take unnecessary, but harmless, additional protection measures). However, the utility of a correct proximity detection in a proximity-based service is high and, as shown in Figure 9a, it is preserved with the *Margmax* metric and an *ε* value above 1.

**Figure 9.** (**a**) Detection and (**b**) false-alarm probabilities versus the noise perturbation level for two noise types (Laplacian and Gaussian) and two mapping metrics (argmin and argmax). The proximity threshold *γ* was set to 2 m (e.g., for a digital contact-tracing application). A 4-floor building with 1000 users, 80% of them placed in hotspot areas.
