1. Introduction
Differential privacy (DP) [1] has emerged as a de facto standard for privacy-preserving technologies in research and practice due to the quantifiable privacy guarantee it provides. DP involves randomizing the outputs of an algorithm in such a way that the presence or absence of a single individual's information within a database does not significantly affect the outcome of the algorithm. DP typically introduces randomness in the form of additive noise, ensuring that an adversary cannot infer any information about a particular record with high confidence. The key challenge is to keep the performance, or utility, of the noisy algorithm close enough to that of the unperturbed one to be useful in practice [2].
In its pure form, DP measures privacy risk by a parameter ε, which can be interpreted as the privacy budget, that bounds the log-likelihood ratio of the output of a private algorithm under two datasets differing in a single individual's data. The smaller the ε used, the greater the privacy ensured, but at the cost of worse performance. In privacy-preserving machine learning models, higher values of ε are generally chosen to achieve acceptable utility. However, setting ε to arbitrarily large values severely undermines privacy, although there are no hard threshold values for ε above which the formal guarantees provided by DP become meaningless in practice [3]. In order to improve utility for a given privacy budget, a relaxed definition of differential privacy, referred to as (ε, δ)-DP, was proposed [4]. Under this privacy notion, a randomized algorithm is considered privacy-preserving if the privacy loss of the output is smaller than ε with high probability (i.e., with probability at least 1 − δ) [5].
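For reference, the standard statement of this guarantee [4,5] requires that, for every pair of neighboring datasets D and D′ and every measurable set of outputs S, a randomized mechanism M satisfies
$$ \Pr[\mathcal{M}(\mathbb{D}) \in \mathcal{S}] \leq e^{\epsilon}\,\Pr[\mathcal{M}(\mathbb{D}') \in \mathcal{S}] + \delta, $$
so that pure ε-DP is recovered as the special case δ = 0.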
Our current work is motivated by the necessity of a decentralized differentially private algorithm to efficiently solve practical signal estimation and learning problems that (i) offers a better privacy–utility trade-off compared to existing approaches, and (ii) offers similar utility to the pooled-data (or centralized) scenario. Some noteworthy real-world examples of systems that may need such differentially private decentralized solutions include [6]: (i) medical research consortia of healthcare centers and labs, (ii) decentralized speech processing systems for learning model parameters for speaker recognition, and (iii) multi-party cyber-physical systems. To this end, we first focus on improving the privacy–utility trade-off of a well-known DP mechanism, the functional mechanism (FM) [7]. The FM approach is more general and requires fewer assumptions on the objective function than other objective perturbation approaches [8,9].
The functional mechanism was originally proposed for "pure" ε-DP. However, it involves additive noise with very large variance for datasets with even moderate ambient dimension, leading to a severe degradation in utility. We propose a natural "approximate" (ε, δ)-DP variant using Gaussian noise and show that the proposed Gaussian FM scheme significantly reduces the additive noise variance. A recent work by Ding et al. [10] proposed relaxed FM using the Extended Gaussian mechanism [11], which also guarantees approximate (ε, δ)-DP instead of pure DP. However, we show analytically and empirically that, just like the original FM, relaxed FM also suffers from prohibitively large noise variance even for moderate ambient dimensions. Our tighter sensitivity analysis for the Gaussian FM, which differs from the technique used in [10], allows us to achieve much better utility for the same privacy guarantee. We further extend the proposed Gaussian FM framework to the decentralized or "federated" learning setting using the CAPE protocol [6]. Our capeFM algorithm can offer the same level of utility as the centralized case over a range of parameters. Our empirical evaluation of the proposed algorithms on synthetic and real datasets demonstrates their superiority over existing methods. We now review the relevant existing research in this area before summarizing our contributions.
Related Works. There is a vast literature on perturbation techniques to ensure DP in machine learning algorithms. The simplest method for ensuring that an algorithm satisfies DP is input perturbation, where noise is introduced to the input of the algorithm [2]. Another common approach is output perturbation, which obtains DP by adding noise to the output of the problem. In many machine learning algorithms, the underlying objective function is minimized with gradient descent. As the gradient depends on the privacy-sensitive data, randomization is introduced at each step of the gradient descent [9,12]. The amount of noise we need to add at each step depends on the sensitivity of the function to changes in its input [4].
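As an illustration of this gradient perturbation idea, the following is a minimal sketch (not the specific algorithm of [9,12]); the gradient function, clipping bound, and noise multiplier are assumptions of this example.

```python
import numpy as np

def noisy_gradient_step(w, X, y, grad_fn, lr, clip_norm, noise_mult, rng):
    """One gradient-descent step with per-sample gradient clipping, so the
    L2-sensitivity of the averaged gradient scales as clip_norm / N, and
    Gaussian noise calibrated to that scale."""
    N = X.shape[0]
    grads = np.stack([grad_fn(w, X[i], y[i]) for i in range(N)])
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    grads = grads * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    noisy_grad = grads.mean(axis=0) + rng.normal(
        0.0, noise_mult * clip_norm / N, size=w.shape)
    return w - lr * noisy_grad
```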
Objective perturbation [8,9,13] is another state-of-the-art method to obtain DP, where noise is added to the underlying objective function of the machine learning algorithm, rather than to its solution. A newly proposed take on output perturbation [14] injects noise after model convergence, which imposes some additional constraints. In addition to optimization problems, Smith [15] proposed a general approach for computing summary statistics using the sample-and-aggregate framework and both the Laplace and Exponential mechanisms [16].
Zhang et al. originally proposed the functional mechanism (FM) [7] as an extension of the Laplace mechanism. FM has been used in numerous studies to ensure DP in practical settings. Jorgensen et al. applied FM in personalized differential privacy (PDP) [17], where the privacy requirements are specified at the user level, rather than by a single, global privacy parameter. FM has also been combined with homomorphic encryption [18] to obtain both data secrecy and output privacy, as well as with fairness-aware learning [10,19] in classification models. The work of Fredrikson et al. [20], which demonstrated privacy in pharmacogenetics using FM and other DP mechanisms, is of particular interest to us. Pharmacogenetic models [21,22,23,24] contain sensitive clinical and genomic data that need to be protected. However, poor utility of differentially private pharmacogenetic models can expose patients to increased risk of disease. Fredrikson et al. [20] tested the efficacy of such models against attribute inference by using a model inversion technique. Their study shows that, although not explicitly designed to protect attribute privacy, DP can prevent attackers from accurately predicting genetic markers if ε is sufficiently small (≤1). However, such small values of ε result in poor utility of the models due to excessive noise addition, leading them to conclude that, when utility cannot be compromised much, the existing methods do not give an ε for which state-of-the-art DP mechanisms can be reasonably employed. As mentioned before, Ding et al. [10] recently proposed relaxed FM in an attempt to improve upon the original FM using the Extended Gaussian mechanism [11], which offers an approximate DP guarantee.
DP algorithms provide different guarantees than Secure Multi-party Computation (SMC)-based methods. Several studies [25,26,27] applied a combination of SMC and DP for distributed learning. Gade and Vaidya [25] demonstrated one such method, in which each site adds and subtracts arbitrary functions to confuse the adversary. Heikkilä et al. [26] also studied the relationship between additive noise and sample size in a distributed setting. In their model, S data holders communicate their data to M computation nodes to compute a function. Tajeddine et al. [27] used DP-SMC on vertically partitioned data, i.e., where data of the same participants are distributed across multiple parties or data holders. Bonawitz et al. [28] proposed a communication-efficient method for federated learning over a large number of mobile devices. More recently, Heikkilä et al. [29] considered DP in a cross-silo federated learning setting by combining it with additive homomorphic secure summation protocols. Xu et al. [30] investigated DP for multiparty learning in the vertically partitioned data setting. Their proposed framework dissects the objective function into single-party and cross-party sub-functions, and applies the functional mechanism and secure aggregation to achieve the same utility as the centralized DP model. Inspired by the seminal work of Dwork et al. [31] that proposed distributed noise generation for preserving privacy, Imtiaz et al. [6] proposed the Correlation Assisted Private Estimation (CAPE) protocol. CAPE employs a similar principle as Anandan and Clifton [32] to reduce the noise added for DP in decentralized-data settings.
Our Contributions. As mentioned before, we are motivated by the necessity of a decentralized differentially private algorithm that injects a smaller amount of noise (compared to existing approaches) to efficiently solve practical signal estimation and learning problems. To that end, we first propose an improvement to the existing functional mechanism. We achieve this by performing a tighter sensitivity analysis, which significantly reduces the additive noise variance. As we utilize the Gaussian mechanism [33] to ensure (ε, δ)-DP, we call our improved functional mechanism Gaussian FM. Using our novel sensitivity analysis, we show that the proposed Gaussian FM injects a much smaller amount of additive noise compared to the original FM [7] and relaxed FM [10] algorithms. We empirically show the superiority of Gaussian FM in terms of privacy guarantee and utility by comparing it with the corresponding non-private algorithm, the original FM [7], the relaxed FM [10], the objective perturbation [8], and the noisy gradient descent [12] methods. Note that the original FM [7] and objective perturbation [8] methods guarantee pure DP, whereas the other methods guarantee approximate DP. We compare our (ε, δ)-DP Gaussian FM with the pure DP algorithms as a means of investigating how much performance/utility gain one can achieve by trading off the pure DP guarantee for an approximate DP guarantee. Additionally, the noisy gradient descent method is a multi-round algorithm. Due to the composition theorem of differential privacy [33], the privacy budgets in multi-round algorithms accumulate across the iterations during training. In order to better account for the total privacy loss in the noisy gradient descent algorithm, we use Rényi differential privacy [34].
Considering the fact that machine learning algorithms are often used in decentralized/federated data settings, we adapt our proposed Gaussian FM algorithm to such settings following the Correlation Assisted Private Estimation (CAPE) [6] protocol, and propose capeFM. In many signal processing and machine learning applications where privacy regulations prevent sites from sharing local raw data, joint learning across datasets can yield discoveries that are impossible to obtain from a single site. Motivated by scientific collaborations that are common in human health research, capeFM improves upon conventional decentralized DP schemes and achieves the same level of utility as the pooled-data scenario in certain regimes. It has been shown [6] that CAPE can benefit computations with sensitivities satisfying certain conditions, and many functions of interest in machine learning and deep neural networks have sensitivities that satisfy these conditions. Our proposed capeFM algorithm utilizes the Stone–Weierstrass theorem [35] to approximate a cost function in the decentralized-data setting and employs the CAPE protocol.
To summarize, the goal of our work is to improve the privacy–utility trade-off and reduce the amount of noise in the functional mechanism, at the expense of an approximate DP guarantee, for applications of machine learning in decentralized/federated data settings, similar to those found in research consortia. Our main contributions are:
We propose Gaussian FM as an improvement over the existing functional mechanism by performing a tighter sensitivity analysis. Our novel analysis has two major features: (i) the sensitivity parameters of the data-dependent (hence, privacy-sensitive) polynomial coefficients of the Stone–Weierstrass decomposition of the objective function are free of the dataset dimensionality; and (ii) the additive noise for privacy is tailored to the order j of the polynomial coefficient of the Stone–Weierstrass decomposition, rather than being the same for all coefficients. These features give our proposed Gaussian FM a significant advantage by offering much less noisy function computation compared to both the original FM [7] and the relaxed FM [10], as shown for linear and logistic regression problems. We also empirically validate this on real and synthetic data.
We extend our Gaussian FM to decentralized/federated data settings to propose capeFM, a novel extension of the functional mechanism for decentralized data. To this end, we note another significant advantage of our proposed Gaussian FM over the original FM: the Gaussian FM can be readily extended to decentralized/federated data settings by exploiting the fact that the sum of independent Gaussian random variables is another Gaussian random variable, which is not true for Laplace random variables. We show that the proposed capeFM can achieve the same utility as the pooled-data scenario for some parameter choices. To the best of our knowledge, our work is the first functional mechanism for decentralized-data settings.
We demonstrate the effectiveness of our algorithms with varying privacy and dataset parameters. Our privacy analysis and empirical results on real and synthetic datasets show that the proposed algorithms can achieve much better utility than the existing state-of-the-art algorithms.
3. Functional Mechanism with Approximate Differential Privacy: Gaussian FM
Zhang et al. [7] computed the L1-sensitivity Δ of the data-dependent terms for the linear regression and logistic regression problems. For both problems, the resulting Δ grows quadratically with the ambient dimension D of the data samples, resulting in an excessively large amount of noise being injected into the objective function. Additionally, Ding et al. [10] proposed relaxed FM, a "utility-enhancement scheme", by replacing the Laplace mechanism with the Extended Gaussian mechanism [11], and thus achieving slightly better utility than the original FM at the expense of an approximate DP guarantee instead of a pure DP guarantee. However, Ding et al. [10] showed that the L2-sensitivity of the data-dependent terms for the logistic regression problem grows linearly with D; using the technique outlined in [10], it can be shown that the L2-sensitivity of the data-dependent terms for the linear regression problem grows linearly with D as well (please see Appendix A for details). Therefore, the privacy-preserving additive noise variances in both the original FM and the relaxed FM schemes depend on the data dimensionality and can be prohibitively large even for moderate D. Moreover, both the FM and relaxed FM schemes add the same amount of noise to each polynomial coefficient irrespective of its order j. With a tighter characterization, we show in Section 4 that the sensitivities of these coefficients are different for different orders j. We reduce the amount of added noise by addressing these issues and performing a novel sensitivity analysis. The key points are as follows:
Instead of computing an ε-DP approximation of the objective function f(w) using the Laplace mechanism, we use the Gaussian mechanism to compute an (ε, δ)-DP approximation of f(w). This gives a weaker privacy guarantee than pure differential privacy, but provides much better utility.
Recall that the original FM achieves ε-DP by adding Laplace noise scaled to the L1-sensitivity of the data-dependent terms of the objective function f(w) in (2). As we use the Gaussian mechanism, we require an L2-sensitivity analysis instead. To compute the L2-sensitivity of the data-dependent terms of f(w) in (2), we first define, for each order j, an array Λ_j that contains the coefficients λ_φ as its entries for all φ ∈ Φ_j. The term "array" is used because the dimension of Λ_j depends on the cardinality of Φ_j. For example, for j = 0, Λ_0 is a scalar because Φ_0 contains only the constant monomial; for j = 1, Λ_1 can be expressed as a D-dimensional vector because Φ_1 contains the D first-order monomials of w; and for j = 2, Λ_2 can be expressed as a D × D matrix because the second-order monomials in Φ_2 are indexed by pairs of coordinates of w.
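To make the array structure concrete, the following sketch (an illustrative example, not the paper's implementation) builds Λ_0, Λ_1, and Λ_2 for the squared-error cost of linear regression, written here as a plain sum over samples; dividing by N gives the empirical average form used in (3).

```python
import numpy as np

def linear_regression_coefficient_arrays(X, y):
    """Coefficient arrays of the polynomial expansion
    sum_i (y_i - x_i^T w)^2 = Lambda0 + Lambda1^T w + w^T Lambda2 w.
    X: (N, D) feature matrix, y: (N,) response vector."""
    Lambda0 = np.sum(y ** 2)      # scalar,       order j = 0
    Lambda1 = -2.0 * X.T @ y      # D-vector,     order j = 1
    Lambda2 = X.T @ X             # D x D matrix, order j = 2
    return Lambda0, Lambda1, Lambda2
```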
We rewrite the objective function as
$$ f(\mathbf{w}) = \sum_{j=0}^{J} \sum_{\phi \in \Phi_j} \lambda_\phi\, \phi(\mathbf{w}) = \sum_{j=0}^{J} \left\langle \Lambda_j,\ \boldsymbol{\phi}_j(\mathbf{w}) \right\rangle, $$
where φ_j(w) is the array containing all φ(w), φ ∈ Φ_j, as its entries, and ⟨·,·⟩ denotes the entry-wise inner product. Note that Λ_j and φ_j(w) have the same dimensions and number of elements. We define the L2-sensitivity of Λ_j as
$$ \Delta_j = \max_{\mathbb{D}, \mathbb{D}'} \left\| \Lambda_j - \Lambda_j' \right\|_2, $$
where Λ_j and Λ_j′ are computed on neighboring datasets D and D′, respectively. Following the Gaussian mechanism [33], we can calculate the (ε, δ) differentially private estimate of Λ_j, denoted Λ̂_j, as
$$ \hat{\Lambda}_j = \Lambda_j + \mathbf{e}_j, $$
where the noise array e_j has the same dimension as Λ_j, and contains entries drawn i.i.d. from N(0, σ_j²) with σ_j = (Δ_j/ε)·√(2 log(1.25/δ)). Finally, we have
Theorem 2 (Privacy of the Gaussian FM (Algorithm 1)). Consider Algorithm 1 with privacy parameters (ε, δ) and the empirical average cost function f(w) represented as in (3). Then Algorithm 1 computes an (ε, δ) differentially private approximation f̂(w) to f(w). Consequently, the minimizer ŵ = arg min_w f̂(w) satisfies (ε, δ)-differential privacy.

Algorithm 1 Gaussian FM
Require: Data samples (x_n, y_n) for n ∈ {1, …, N}; cost function f(w) represented as in (3); privacy parameters (ε, δ).
1: for j = 0, 1, …, J do
2:    Construct the coefficient array Λ_j
3:    Compute the L2-sensitivity Δ_j
4:    Compute σ_j = (Δ_j/ε)·√(2 log(1.25/δ))
5:    Compute the noise array e_j with the same dimension as Λ_j, with entries drawn i.i.d. from N(0, σ_j²)
6:    Release Λ̂_j = Λ_j + e_j
7: end for
8: Compute f̂(w) = Σ_{j=0}^{J} ⟨Λ̂_j, φ_j(w)⟩
9: return Perturbed objective function f̂(w)
Proof. The proof of Theorem 2 follows from the fact that the function f̂(w) depends on the data samples only through the perturbed coefficient arrays {Λ̂_j}. The computation of {Λ̂_j} is (ε, δ)-differentially private by the Gaussian mechanism [4,33]. Therefore, the release of f̂(w) satisfies (ε, δ)-differential privacy. One way to rationalize this is to consider that the probability of the event of selecting a particular set of {Λ̂_j} is the same as the probability of the event of formulating a function f̂(w) with that set of {Λ̂_j}. Therefore, it suffices to consider the joint density of the {Λ̂_j} and find an upper bound on the ratio of the joint densities of the {Λ̂_j} under two neighboring datasets D and D′. As we employ the Gaussian mechanism to compute the {Λ̂_j}, the ratio is upper bounded by exp(ε) with probability at least 1 − δ. Therefore, the release of f̂(w) satisfies (ε, δ)-differential privacy. Furthermore, differential privacy is post-processing invariant. Therefore, the computation of the minimizer ŵ = arg min_w f̂(w) also satisfies (ε, δ)-differential privacy. □
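Putting the pieces together, the following is a minimal sketch of the Gaussian FM recipe for the linear regression cost. It is not the paper's reference implementation: the per-order sensitivities are passed in as placeholders (the actual values follow from the analysis in Section 4), the noise calibration assumes the standard Gaussian-mechanism formula used above, and the closed-form minimizer is only meaningful when the perturbed quadratic term remains positive definite.

```python
import numpy as np

def gaussian_fm_linear_regression(X, y, sensitivities, eps, delta, rng=None):
    """Perturb the polynomial coefficients of sum_i (y_i - x_i^T w)^2 with
    per-order Gaussian noise, then minimize the perturbed objective.
    `sensitivities` = (Delta0, Delta1, Delta2) are placeholders for the
    per-order L2-sensitivities derived in Section 4."""
    rng = np.random.default_rng() if rng is None else rng
    D = X.shape[1]
    Lam0, Lam1, Lam2 = np.sum(y ** 2), -2.0 * X.T @ y, X.T @ X
    c = np.sqrt(2.0 * np.log(1.25 / delta)) / eps     # sigma_j = c * Delta_j
    s0, s1, s2 = (c * d for d in sensitivities)
    Lam0_hat = Lam0 + rng.normal(0.0, s0)             # perturbed constant term
    Lam1_hat = Lam1 + rng.normal(0.0, s1, size=D)
    Lam2_hat = Lam2 + rng.normal(0.0, s2, size=(D, D))
    # Stationary point of Lam1_hat^T w + w^T Lam2_hat w (gradient set to zero);
    # it is a minimum only if the symmetrized quadratic term is positive definite.
    w_hat = np.linalg.solve(Lam2_hat + Lam2_hat.T, -Lam1_hat)
    return w_hat
```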
Privacy Analysis of Noisy Gradient Descent [12] Using Rényi Differential Privacy. One of the most crucial qualitative properties of DP is that it allows us to evaluate the cumulative privacy loss over multiple computations [33]. This cumulative, or total, privacy loss is different from the per-iteration (ε, δ)-DP guarantee in multi-round machine learning algorithms. In order to demonstrate the superior privacy guarantee of the proposed Gaussian FM, we compare it to the existing functional mechanism [7], the relaxed functional mechanism [10], objective perturbation [8], and the noisy gradient descent [12] method. Note that, similar to objective perturbation, FM, and relaxed FM, the proposed Gaussian FM injects randomness in a single round, and therefore does not require privacy accounting. However, the noisy gradient descent method adds noise in each step in which the gradient is computed. That is, noise is added to the computed gradients of the parameters of the objective function during optimization. Since it is a multi-round algorithm, the overall ε used during optimization is different from the ε of each iteration. We follow the analysis procedure outlined in [6] for the privacy accounting of the noisy gradient descent algorithm. Note that Proposition 3 described in Section 2.1 is stated for functions with unit L2-sensitivity. Therefore, if noise drawn from N(0, σ²) is added to a function with sensitivity Δ, then the resulting mechanism satisfies (α, αΔ²/(2σ²))-RDP. Now, according to Proposition 3, the T-fold composition of such Gaussian mechanisms satisfies (α, TαΔ²/(2σ²))-RDP. Finally, according to Proposition 1, it also satisfies (TαΔ²/(2σ²) + log(1/δ)/(α − 1), δ)-differential privacy for any 0 < δ < 1. For a given value of δ, we can express the optimal overall ε as a function of T, Δ, σ, and δ:
$$ \epsilon_{\text{overall}} = \min_{\alpha > 1} \left( \frac{T \alpha \Delta^2}{2\sigma^2} + \frac{\log(1/\delta)}{\alpha - 1} \right), \qquad (7) $$
where the minimizing order α* is given by
$$ \alpha^{*} = 1 + \sqrt{\frac{2\sigma^2 \log(1/\delta)}{T \Delta^2}}. $$
We compute the overall ε following this procedure for the noisy gradient descent algorithm [12] in our experiments in Section 6.
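A minimal sketch of this accounting step is shown below, assuming the per-step noise standard deviation σ and per-step L2-sensitivity Δ are known; the function name and the example parameter values are illustrative only.

```python
import numpy as np

def rdp_overall_epsilon(sigma, sensitivity, T, delta):
    """Overall epsilon of T Gaussian-mechanism releases with noise std `sigma`
    and L2-sensitivity `sensitivity`, obtained by minimizing
    T*alpha*Delta^2/(2*sigma^2) + log(1/delta)/(alpha - 1) over alpha > 1."""
    rate = T * sensitivity ** 2 / (2.0 * sigma ** 2)   # RDP epsilon = rate * alpha
    alpha_star = 1.0 + np.sqrt(np.log(1.0 / delta) / rate)
    return rate * alpha_star + np.log(1.0 / delta) / (alpha_star - 1.0)

# Example: 1000 noisy gradient steps with illustrative parameters
print(rdp_overall_epsilon(sigma=1.0, sensitivity=0.01, T=1000, delta=1e-5))
```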
6. Experimental Results
In this section, we empirically compare the performance of our proposed Gaussian FM algorithm (gauss-fm) with those of some state-of-the-art differentially private linear and logistic regression algorithms, namely noisy gradient descent (noisy-gd) [12], objective perturbation (obj-pert) [8], the original functional mechanism (fm) [7], and the relaxed functional mechanism (rlx-fm) [10]. We also compare the performance of these algorithms with non-private linear and logistic regression (non-priv). As mentioned before, we compute the overall ε using RDP for the multi-round noisy-gd algorithm. Additionally, we show how our proposed decentralized functional mechanism (cape-fm) can improve a decentralized computation if the target function has a sensitivity satisfying the conditions of Proposition 5 in Section 2.1. We show the variation in performance with the privacy parameters and the number of training samples. For the decentralized setting, we further show the empirical performance comparison by varying the number of sites.
Performance Indices. For the linear regression task, we use the mean squared error (MSE) as the performance index. Let the test dataset contain N_test samples (x_n, y_n). Then the MSE is defined as (1/N_test) Σ_n (y_n − ŷ_n)², where ŷ_n is the prediction from the algorithm. For the classification task, we use accuracy as the performance index. The accuracy is defined as (1/N_test) Σ_n 1(ŷ_n = y_n), where 1(·) is the indicator function and ŷ_n is the prediction from the algorithm. Note that, in addition to a small MSE or large accuracy, we want to attain a strict privacy guarantee, i.e., small overall ε values. Recall from Section 3 that the overall ε for multi-shot algorithms is a function of the number of iterations, the target δ, the additive noise variance σ², and the L2-sensitivity Δ. To demonstrate the overall ε guarantee for a fixed target δ, we plot the overall ε (with dotted red lines on the right y-axis) along with the MSE/accuracy (with solid blue lines on the left y-axis) as a means of visualizing how the privacy–utility trade-off varies with different parameters. For a given privacy budget (or performance requirement), the user can use the overall ε plot on the right y-axis, shown with dotted lines (or the MSE/accuracy plot on the left y-axis, shown with solid lines), to find the required noise standard deviation σ on the x-axis and, thereby, find the corresponding performance (or overall ε). We compute the overall ε for the noisy-gd algorithm using the RDP technique shown in Section 3.
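The two performance indices amount to the following (a minimal sketch; the array names are illustrative):

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error over the test set."""
    return np.mean((y_true - y_pred) ** 2)

def accuracy(y_true, y_pred):
    """Fraction of correctly classified test samples."""
    return np.mean(y_true == y_pred)
```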
6.1. Linear Regression
For the linear regression problem, we perform experiments on three real datasets (and a synthetic dataset, as shown in Appendix B). The pharmacogenetic dataset was collected by the International Warfarin Pharmacogenetics Consortium (IWPC) [23] for the purpose of estimating a personalized warfarin dose based on the clinical and genotype information of a patient. The data used for this study comprise clinical and genotype features collected from patients at multiple sites. Out of the wide variety of numerical modeling methods used in [23], linear regression provided the most accurate dose estimates. Fredrikson et al. [20] later implemented an attack model assuming an adversary who employed an inference algorithm to discover the genotype of a target individual, and showed that the existing functional mechanism (fm) failed to provide a meaningful privacy guarantee against such attacks. We perform privacy-preserving linear regression on the IWPC dataset (Figure 1a–c) to show the effectiveness of our proposed gauss-fm over fm, rlx-fm, and other existing approaches. Additionally, we use the Communities and Crime dataset (crime) [45], which has a larger dimensionality (Figure 1d–f), and the Buzz in Social Media dataset (twitter) [46], which has a large sample size (Figure 1g–i). We refer the reader to [47] for a detailed description of these real datasets. For all the experiments, we pre-process the data so that the samples satisfy ||x_n||_2 ≤ 1 and |y_n| ≤ 1 for all n. We divide each dataset into train and test partitions with a ratio of 90:10. We show the average performance over 10 independent runs.
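The pre-processing amounts to the following sketch (one simple way to enforce the bounds above; the paper's exact pre-processing pipeline may differ, and the helper name is illustrative):

```python
import numpy as np
from sklearn.model_selection import train_test_split

def preprocess(X, y):
    """Scale each feature vector to at most unit L2-norm and each response to
    [-1, 1], then make a 90:10 train/test split."""
    X = X / np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1.0)
    y = y / np.max(np.abs(y))
    return train_test_split(X, y, test_size=0.1)

# X_train, X_test, y_train, y_test = preprocess(X, y)
```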
Performance Comparison with Varying σ. We first investigate the variation of the MSE with the DP additive noise standard deviation σ. We plot the MSE against σ in Figure 1a,d,g. Recall from Definition 3 that, in the Gaussian mechanism, the noise is drawn from a Gaussian distribution with standard deviation σ = (Δ/ε)·√(2 log(1.25/δ)). We keep δ fixed. Note that one can vary ε to vary σ: since the noise standard deviation is inversely proportional to ε, increasing ε means decreasing σ, i.e., a smaller noise variance. We observe from the plots that smaller σ leads to smaller MSE for all DP algorithms, indicating better utility at the expense of higher privacy loss. It is evident from these MSE vs. σ plots that our proposed gauss-fm has much smaller MSE than all the other methods for the same σ values on all datasets. The obj-pert and fm algorithms offer pure DP by trading off utility, whereas the gauss-fm and rlx-fm algorithms offer approximate DP. Although rlx-fm improves upon fm, the excess noise due to the linear dependence of its sensitivity on the data dimension D leads to higher MSE than gauss-fm. Our proposed gauss-fm outperforms all of these methods by reducing the additive noise with the novel sensitivity analysis shown in Section 4. We recall that the overall privacy loss for noisy-gd is calculated using the RDP approach with a fixed target δ, since noise is injected into the gradients in every iteration during optimization. On the other hand, gauss-fm, rlx-fm, and fm add noise to the polynomial coefficients of the cost function before optimization, and obj-pert injects noise into the regularized cost function [8]. We plot the total privacy loss for all of the algorithms against σ. We observe from the y-axis on the right that the total privacy loss of the multi-round noisy-gd is considerably higher than that of the single-shot algorithms.
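For reference, inverting the Gaussian-mechanism calibration used above relates the chosen noise level to the corresponding privacy risk (for a computation with L2-sensitivity Δ):
$$ \sigma = \frac{\Delta}{\epsilon}\sqrt{2\log\frac{1.25}{\delta}} \quad \Longleftrightarrow \quad \epsilon = \frac{\Delta}{\sigma}\sqrt{2\log\frac{1.25}{\delta}}. $$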
Performance Comparison with Varying N. Next, we investigate the variation of the MSE with the number of training samples N. For this task, we shuffle and divide the total number of samples N into smaller partitions and perform the same pre-processing steps, while keeping the test partition untouched. We keep the values of the privacy parameters fixed. We plot the MSE against N in Figure 1b,e,h. We observe that the performance generally improves with increasing N, which indicates that it is easier to ensure the same level of privacy when the training dataset cardinality is higher. We also observe from the MSE vs. N plots that our proposed gauss-fm offers MSE very close to that of non-priv even for moderate sample sizes, outperforming fm, rlx-fm, noisy-gd, and obj-pert. Again, we compute the overall ε spent using RDP for noisy-gd, and show that the multi-round algorithm suffers from a larger privacy loss. Recall from (7) in Section 3 that the overall ε depends on the sensitivity Δ, the noise standard deviation σ, and the number of iterations T. Because the noise is calibrated to the sensitivity, the number of training samples N cancels out in the ratio Δ/σ, and thus the overall ε depends only on T for noisy-gd. We keep T fixed at 1000 iterations for noisy-gd and observe that the overall privacy risk exceeds 20. Note that we set the value of the target δ in (7) equal to δ in our computations.
Performance Comparison with Varying δ. Recall that we can interpret the privacy parameter δ as the probability that an algorithm fails to provide privacy risk ε. The obj-pert and fm algorithms offer pure ε-DP, where the additional privacy parameter δ is zero. Hence, we compare our proposed gauss-fm method with the rlx-fm and noisy-gd methods, which also guarantee (ε, δ)-DP. In the Gaussian mechanism, δ appears in the denominator of the logarithmic term within the square root in the expression for σ. Therefore, the noise variance is not changed significantly by varying δ. We keep the privacy parameter ε fixed, and the MSE vs. δ plots in Figure 1c,f,i show that the performance of our algorithm does not degrade much for smaller δ. For the IWPC dataset in Figure 1c, even for very small values of δ (indicating a very small probability of the algorithm failing to provide ε-differential privacy), the MSE of gauss-fm is almost the same as that of the non-priv case. For the other datasets, our proposed method also gives better performance and overall ε, and thus a better privacy–utility trade-off, than rlx-fm and noisy-gd.
6.2. Logistic Regression
For the logistic regression problem, we again perform experiments on three real datasets (and a synthetic dataset, as shown in Appendix B): the Phishing Websites dataset (phishing) [47] (Figure 2a–c), the Census Income dataset (adult) [47] (Figure 2d–f), and the KDD Cup '99 dataset (kdd) [47] (Figure 2g–i). As before, we pre-process the data so that the feature vectors satisfy ||x_n||_2 ≤ 1 and the labels satisfy y_n ∈ {0, 1} for all n. Note that for obj-pert the cost function is regularized and the labels are assumed to be in {−1, +1} in [8]. We divide each dataset into train and test partitions with a ratio of 90:10. We use percent accuracy on the test dataset as the performance index for logistic regression, and show the average performance over 10 independent runs.
Performance Comparison with Varying σ. We plot accuracy against the DP additive noise standard deviation σ in Figure 2a,d,g. We observe that accuracy degrades as σ increases, indicating a stronger privacy guarantee at the cost of performance. When the noise is too high, privacy-preserving logistic regression may not learn a meaningful model at all and may produce essentially random predictions. Depending on the class distribution, this may not be obvious, and the accuracy score may be misleading. We observe this for the kdd dataset in Figure 2g, where the classes are highly imbalanced, with ∼80% positive labels. Although the existing fm performs poorly on this dataset, our proposed gauss-fm provides significantly higher accuracy on all datasets, outperforming fm as well as rlx-fm, obj-pert, and noisy-gd. As before, we show the total privacy loss, i.e., the overall ε spent, on the y-axis on the right.
Performance Comparison with Varying N. We perform the same steps described in Section 6.1 and observe the variation in performance with the number of training samples N, while keeping the privacy parameters fixed, in Figure 2b,e,h. Accuracy generally improves with increasing N. We observe that the same DP algorithm does not perform equally well on different datasets. For example, obj-pert performs better than noisy-gd on the adult dataset (Figure 2e), whereas noisy-gd performs better than obj-pert on the phishing dataset (Figure 2b). In general, fm and rlx-fm suffer from too much noise due to the quadratic and linear dependence of their sensitivities on D, respectively. However, our proposed gauss-fm overcomes this issue and consistently achieves accuracy close to the non-priv case even for moderate sample sizes. We also show the overall privacy guarantee, as before.
Performance Comparison with Varying δ. Similar to the linear regression experiments in Section 6.1, we keep the other privacy and dataset parameters fixed for this task and vary the privacy parameter δ. Figure 2c,f,i show that percent accuracy improves with increasing δ. For sufficiently large δ (indicating a 1–5% probability of the algorithm failing to provide privacy risk ε), the accuracy of gauss-fm can reach that of the non-priv algorithm on some datasets (e.g., Figure 2i). Although the accuracy of noisy-gd also improves, it comes at the cost of additional privacy risk, as shown in the overall ε vs. δ plots along the y-axes on the right. Due to its higher noise variance, rlx-fm achieves much lower accuracy than both gauss-fm and noisy-gd.
6.3. Decentralized Functional Mechanism (capeFM)
In this section, we empirically show the effectiveness of capeFM, our proposed decentralized Gaussian FM, which utilizes the CAPE [6] protocol. We implement differentially private linear and logistic regression for the decentralized-data setting using the same datasets described in Section 6.1 and Section 6.2, respectively. Note that the IWPC [23] data were collected from 21 sites across 9 countries. After obtaining informed consent to use de-identified data from patients prior to the study, the Pharmacogenetics Knowledge Base has since made the dataset publicly available for research purposes. As mentioned before, the type of data contained in the IWPC dataset is similar to that of many other medical datasets containing private information [20].
We implement our proposed cape-fm according to Algorithm 3, along with fm, rlx-fm, obj-pert, and noisy-gd according to the conventional decentralized DP approach. We compare the performance of these methods in Figure 3 and Figure 4. As in the pooled-data scenario, we also compare the performance of these algorithms with non-private linear and logistic regression (non-priv). For these experiments, we assume the symmetric setting and a fixed number of colluding sites. Recall that the CAPE scheme achieves the same noise variance as the pooled-data scenario in the symmetric setting (see Lemma 1 [6] in Section 2.1). As our proposed capeFM algorithm follows the CAPE scheme, we attain the same advantage. When varying the privacy parameters and the number of samples, we keep the number of sites S fixed. Additionally, we show the variation in performance due to a change in the number of sites in Figure 5. We pre-process each dataset as before, and use the MSE and percent accuracy on the test dataset as the performance indices of the decentralized linear and logistic regression problems, respectively.
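To illustrate the correlated-noise principle that cape-fm inherits from CAPE [6], the following is a schematic sketch for the symmetric setting; it is not the protocol in Algorithm 3. It omits the secure, distributed generation of the zero-sum noise shares (which the actual protocol requires) and uses illustrative variable names; it only shows why the zero-sum shares cancel at the aggregator so that the aggregate estimate carries pooled-data-level noise.

```python
import numpy as np

def cape_style_aggregate(local_stats, sigma_local, rng=None):
    """Schematic of the CAPE correlated-noise idea in the symmetric setting.
    Each of the S sites would need noise of std `sigma_local` for site-level DP.
    The noise is split into a zero-sum correlated part e_s (which cancels at the
    aggregator) and an independent part g_s of variance sigma_local^2 / S, so the
    averaged estimate carries noise of std roughly sigma_local / S (pooled-data
    scale) instead of sigma_local / sqrt(S)."""
    rng = np.random.default_rng() if rng is None else rng
    S = len(local_stats)
    shape = np.shape(local_stats[0])
    e = rng.normal(0.0, sigma_local * np.sqrt(1.0 - 1.0 / S), size=(S,) + shape)
    e -= e.mean(axis=0)                          # enforce sum_s e_s = 0
    g = rng.normal(0.0, sigma_local / np.sqrt(S), size=(S,) + shape)
    releases = [local_stats[s] + e[s] + g[s] for s in range(S)]
    return np.mean(releases, axis=0)             # e_s cancel at the aggregator
```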
Performance Comparison by Varying σ. For this experiment, we keep the total number of samples N, the privacy parameter δ, and the number of sites S fixed. We observe from plots (a), (d), and (g) in both Figure 3 and Figure 4 that the performance degrades as σ increases. The proposed cape-fm outperforms the conventional decentralized noisy-gd, obj-pert, fm, and rlx-fm by a larger margin than in the pooled-data case. The reason for this is that we can achieve a much smaller noise variance at the aggregator due to the correlated noise scheme detailed in Section 5.3. The utility of cape-fm thus stays the same as in the centralized case, whereas the conventional scheme's utility always degrades by a factor of S (see Section 5.1). The overall ε vs. σ plots on the right y-axes for each site show that noisy-gd suffers from much higher privacy loss.
Performance Comparison by Varying N. We keep the privacy parameters and the number of sites S fixed while investigating the variation in performance with respect to the number of samples N. As the sensitivities we computed in Section 4.1 and Section 4.2 are inversely proportional to the sample size, it is straightforward to infer that guaranteeing a smaller privacy risk and higher utility is much easier when the sample size is large. Similar to the pooled-data cases in Section 6.1 and Section 6.2, we again observe from plots (b), (e), and (h) in both Figure 3 and Figure 4 that, for sufficiently large N, the utility of cape-fm can reach that of the non-priv case. Note that the non-priv algorithms are the same as in the pooled-data scenario, because if privacy were not a concern, all sites could simply send their data to the aggregator for learning.
Performance Comparison by Varying δ. For this task, we keep the remaining privacy and dataset parameters, as well as the number of sites S, fixed. Note that, according to the CAPE scheme, the proposed cape-fm algorithm guarantees (ε, δ)-DP, where ε and δ satisfy the relation dictated by the CAPE guarantee in Section 2.1. Recall that δ is the probability that the algorithm fails to provide privacy risk ε, and that we assume a fixed number of colluding sites. From plots (c), (f), and (i) in both Figure 3 and Figure 4, we observe that even for moderate values of δ, cape-fm easily outperforms rlx-fm and noisy-gd. Moreover, as seen from the overall ε plots, noisy-gd provides a much weaker privacy guarantee. Thus, our proposed cape-fm algorithm offers superior performance and a better privacy–utility trade-off in the decentralized setting.
Performance Comparison by Varying S. Finally, we investigate the performance variation with the number of sites S, keeping the privacy and dataset parameters fixed. This automatically varies the number of samples per site, as we consider the symmetric setting. Figure 5a–c shows the results for decentralized linear regression, and Figure 5d–f shows the results for decentralized logistic regression. We observe that the variation in S does not affect the utility of cape-fm, as long as the number of colluding sites satisfies the required condition. However, increasing S leads to a significant degradation in performance for the conventional decentralized DP mechanisms, since the additive noise variance increases as the per-site sample size N/S decreases. We show additional experimental results on synthetic datasets in Appendix B.