A Node Differential Privacy-Based Method to Preserve Directed Graphs in Wireless Mobile Networks

Yan, Jun; Zhou, Yihui; Lu, Laifeng

doi:10.3390/app13148089


by Jun Yan ¹,²,³, Yihui Zhou ¹ and Laifeng Lu ⁴,*

¹ School of Computer Science, Shaanxi Normal University, Xi’an 710119, China

² School of Mathematics and Computer Applications, Shangluo College, Shangluo 726000, China

³ Engineering Research Center of Qinling Health Welfare Big Data, Universities of Shaanxi Province, Shangluo 726000, China

⁴ School of Mathematics and Statistics, Shaanxi Normal University, Xi’an 710119, China

* Author to whom correspondence should be addressed.

Appl. Sci. 2023, 13(14), 8089; https://doi.org/10.3390/app13148089

Submission received: 10 June 2023 / Revised: 2 July 2023 / Accepted: 3 July 2023 / Published: 11 July 2023

(This article belongs to the Special Issue Privacy-Preserving Methods and Applications in Big Data Sharing)


Abstract: With the widespread popularity of Wireless Mobile Networks (WMNs) in daily life, the risk of disclosing personal privacy through the massive graph-structured data in WMNs is receiving more and more attention. In particular, as a special type of graph data in WMNs, the directed graph contains a large amount of sensitive personal information. To provide secure and reliable privacy preservation for directed graphs in WMNs, we develop a node differential privacy-based method, which combines differential privacy with graph modification. In the method, the original directed graph is first transformed into a weighted graph and then divided into several sub-graphs. Then, in each sub-graph, the node degree sequences are obtained by using an exponential mechanism, and micro-aggregation is adopted to get the noised node degree sequences, which are used to generate a synthetic directed sub-graph through edge modification. Finally, all synthetic sub-graphs are merged into a synthetic directed graph that can preserve the original directed graph. The theoretical analysis proves that the proposed method satisfies differential privacy. The experimental results demonstrate the effectiveness of the presented method in privacy preservation and data utility.

Keywords: wireless mobile networks; directed graph; differential privacy; graph modification

1. Introduction

Over the past several years, wide applications of 4G mobile wireless networks have brought us tremendous convenience. For example, through the 4G mobile wireless networks, we can enjoy a large number of online services, including mobile shopping/payment, mobile office, mobile gaming, etc. [1]. Nowadays, with the wide popularity of various innovative applications, such as Vehicle-to-Everything (V2X), AR, holographic communications, etc. [2], mobile wireless networks make our daily life more convenient. However, mobile wireless networks also present us with a great challenge while providing tremendous convenience for us. For instance, as a large amount of data including sensitive information is published or shared in mobile wireless networks without privacy preservation, a lot of individual privacy is leaked, which results in many social security problems [3]. In particular, [4] points out that data leakage is one of the most frequent mobile security threats. Therefore, it is crucial to pay close attention to individual privacy in mobile wireless networks.

More importantly, the graph structure data in mobile wireless networks contain a large amount of personal privacy information, including identity privacy, semantic attribute privacy, and link privacy [5]. To address the privacy issue in graph structure data, many graph modification methods have been proposed, which are divided into three categories: edge/node modification, clustering, and uncertain graph [6]. Among the edge/node modification methods, the edge randomization method randomly adds or deletes edges in the original graph while retaining the characteristics of the original graph as much as possible [7]. To overcome the shortcomings of this method, many k-anonymity methods that can resist attacks based on the structure of the graph have been devised, such as (k,l)-anonymity [8], k2-anonymity [9] and k-neighborhood sub-graph anonymity [10]. Clustering-based methods, also called generalization methods, usually group nodes and edges into super-nodes and super-edges, which hide the details of nodes and edges in the graph [11]. Furthermore, a method combining k-anonymity with node clustering has been designed, which can provide sufficient privacy preservation while retaining data utility [12]. Compared with the two kinds of methods mentioned above, the uncertain graph method, which injects uncertainty into the edges of a graph to generate an uncertain graph, can achieve better data utility. Although graph modification can preserve graph structure data, it is not able to resist attacks based on background knowledge.

As a gold-standard notion of privacy that can provide a strict privacy guarantee [13], differential privacy has been adopted to preserve graph structure data [14]. For instance, differential privacy has been extensively applied to preserve various statistical values of the graph, such as the degree distribution [15], frequent graph patterns [16], and triangle count [17]. In addition, it can also be used to generate a synthetic graph to preserve the original graph. In [18], a synthetic graph is released by using a differentially private estimator of the parameters of a special model, which is an exponential family model with the degree sequence as a sufficient statistic. To improve the data utility of a differentially private synthetic graph, [19] devises a differentially private graph generator based on the dK-graph model. Different from graph modification, differential privacy usually employs noise to achieve privacy preservation, which results in insufficient data utility.

However, the methods introduced above mainly concentrate on undirected graphs. As a special kind of graph, directed graphs, such as the who-follows-whom social graph on Twitter, not only possess the relations in graphs but also carry direction information. Therefore, it is hard to adopt these methods to preserve directed graphs when they are published or shared. By considering the direction information of edges, a few k-anonymity methods have been designed in [20,21]. However, because the k-anonymity method cannot resist attacks based on background knowledge and only withstands some special attacks, these methods cannot provide sufficient privacy preservation for directed graphs. As a result, it is a great challenge to preserve directed graphs.

To solve this problem, we propose a method that combines differential privacy with graph modification to preserve directed graphs. Compared with edge differential privacy, node differential privacy can provide stronger privacy preservation. Thus, node differential privacy is used to add noise to the degree sequences, and edge modification utilizes the noised degree sequences to generate a synthetic directed graph, which provides strong privacy preservation for the original directed graph. Additionally, to improve data utility, the original directed graph is divided into many sub-graphs, and the perturbations are only added within each sub-graph, which helps to maintain the whole graph structure. In particular, the exponential mechanism is adopted to truncate the degree sequences, so that as little noise as possible is added to them. Moreover, the ranking micro-aggregation effectively reduces the noise added to the degree sequences. According to the noised degree sequences, the relationship between two nodes is utilized to modify the edges of nodes, which can retain the original graph structure. Therefore, the designed method not only provides strong privacy preservation but also maintains the data utility.

In this paper, our contributions can be summarized as follows:

We propose a method based on node differential privacy to preserve directed graphs in wireless mobile networks. Particularly, node differential privacy and edge modification are combined to generate a synthetic directed graph that provides strong privacy preservation for the original directed graph.

We present four algorithms to maintain data utility in the proposed method. First, the Louvain algorithm is used to divide the original directed graph into several sub-graphs. Then, the node degree sequences in each sub-graph are generated by the GSEM (generating degree sequence based on exponent mechanism) algorithm, and the ADPRA (adding noise based on differential privacy with the ranking micro-aggregation) algorithm adds less noise to these node degree sequences. In the end, the GGM (generating synthetic graphs based on graph modification) algorithm generates a synthetic directed sub-graph that maintains the properties of the original sub-graph.

We demonstrate the performances of the proposed method on several different real data sets, and the experimental results show that the proposed method is effective in privacy preservation and data utility.

The rest of this paper is organized as follows. Section 2 reviews the related methods to preserve the graph structure data. In Section 3, the preliminaries are introduced. Then, the proposed method is described in detail in Section 4. Section 5 demonstrates the performance of the proposed method in privacy preservation and data utility. In Section 6, the existing challenges and promising future directions are discussed.

2. Related Works

With individual privacy on WMNs attracting more and more attention, various techniques have been proposed to provide privacy preservation. In this section, we focus on two categories of methods: graph modification and differential privacy.

In graph modification, there are three important classes of methods: edge and node modification methods, generalization or clustering methods, and uncertain graph methods. Among edge and node modification methods, Ying X. [7] proposed two algorithms that preserve the original graph while keeping its spectral properties as unchanged as possible, which improves data utility. In [22], Casas-Roma designed a method to protect the most important edges, which obtained a better trade-off between privacy preservation and data utility. Among generalization methods, which focus on how to generate so-called super-nodes and super-edges, Yu F. [23] developed a clustering perturbation algorithm that adopted perturbations to maintain the overall structure of the social network and reduce privacy leakage. Among uncertain graph methods, Boldi [24] designed a (k, ε)-obfuscation method that injects uncertainty to obtain an uncertain graph similar to the original graph. To prevent link attacks based on background knowledge, Hu J. [25] developed an uncertain graph method based on edge differential privacy, which also achieved better data utility.

In addition, k-anonymity [5] has been widely used to generate anonymous graphs to preserve graph data. Considering the number of mutual friends (NMF) between two users, [26] developed a k-anonymity method that made use of the mutual friend sequence to ensure the existence of at least k elements holding the same value for better data utility. In [27], a new (k, l)-degree anonymity algorithm was devised to modify the original networks based on a sequence of edge editing operations. In this algorithm, a location entropy metric was considered to select the important edges, so that the number of edge modifications was minimized and data utility increased. Meanwhile, to resist insider attacks in collaborative social networks, [28] developed a clustering-based k-anonymity method, in which a scalable non-deterministic clustering was utilized to prevent structure attacks.

Since C. Dwork introduced differential privacy, many differential privacy-based methods have been presented for graph data. They are classified into two kinds: preserving specific sensitive statistics of graphs and generating differentially private graphs. For publishing higher-order network statistics, i.e., the joint degree distribution, Iftikhar [29] designed a general framework for releasing dK-distributions under node differential privacy, in which sensitivity was regulated by a graph projection algorithm that transformed graphs into bounded graphs. To accurately estimate sub-graph counts, [30] proposed a novel multi-phase framework under DDP (decentralized differential privacy), which was able to control the minimum local noise scale to preserve the sub-graph counts. Furthermore, some statistics of graph data, such as triangle counts, centrality and shortest paths, were preserved by differential privacy before they were released [31,32]. Apart from preserving statistical data, differential privacy is also applied to generate a synthetic graph. In [33], Vishesh Karwa developed an algorithm to obtain a graphical degree partition of a graph preserved by differential privacy, which could also be used to construct synthetic graphs. Ref. [34] proposed LDPGen, which could generate a synthetic graph after structurally similar users were clustered together according to optimal parameters.

3. Preliminary Knowledge

In this paper, a directed network is regarded as a simple directed graph G = (V, E), where V = ($v_{1}$, $v_{2}$, ..., $v_{n}$) is the set of nodes and E is the table of links; each link (i, j) denotes a relationship from $v_{i}$ to $v_{j}$.
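The out-degree and in-degree sequences used throughout the rest of the paper follow directly from this representation. The following is a minimal sketch, assuming the graph is stored as a list of node labels and a list of (i, j) link pairs; the function name `degree_sequences` and the example links are illustrative and not taken from the paper.

```python
def degree_sequences(nodes, links):
    """Return (out_degree, in_degree) dicts for a simple directed graph."""
    out_deg = {v: 0 for v in nodes}
    in_deg = {v: 0 for v in nodes}
    for i, j in set(links):      # a simple graph has no parallel links
        if i == j:
            continue             # and no self-loops
        out_deg[i] += 1          # link (i, j) leaves node i
        in_deg[j] += 1           # and enters node j
    return out_deg, in_deg


# Hypothetical four-node example.
nodes = ["v1", "v2", "v3", "v4"]
links = [("v1", "v4"), ("v2", "v1"), ("v3", "v1"), ("v4", "v2")]
out_deg, in_deg = degree_sequences(nodes, links)
```

Here the link ("v1", "v4") contributes one unit to the out-degree of v1 and one unit to the in-degree of v4, which is exactly the direction information that undirected methods discard.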

Definition 1

(The undirected graph and the directed graph). As shown in Figure 1, Figure 1a is an undirected graph, while Figure 1b represents a directed graph, where each edge denotes a relationship from one node to another node. In Figure 1b, the edge ($v_{1}$, $v_{4}$) denotes a link relation from node $v_{1}$ to node $v_{4}$.

Definition 2

(Neighboring directed graph). For two directed graphs $G_{1}$ = ($V_{1}$, $E_{1}$) and $G_{2}$ = ($V_{2}$, $E_{2}$), if |$V_{1}$ ⊕ $V_{2}$| + |$E_{1}$ ⊕ $E_{2}$| = 1, where ⊕ is the exclusive-OR operation, we say that $G_{1}$ and $G_{2}$ are neighbors.

As shown in Figure 2, compared with Figure 2b, Figure 2a has one additional node together with its three directed edges. So the graphs in Figure 2a,b are neighboring graphs.

Definition 3

(Differential Privacy). An algorithm Z satisfies ϵ-differential privacy if, for all outputs S ⊆ Range(Z) and all neighboring graphs $G_{a}$ and $G_{b}$, the following holds:

P_{r} [Z (G_{a}) \in S] \leq e^{ϵ} \times P_{r} [Z (G_{b}) \in S]

(1)

where ϵ is the privacy budget that sets the privacy preservation level.

To achieve ϵ-differential privacy, the outputs of queries are perturbed by one of two mechanisms: the Laplace mechanism or the exponential mechanism.

Definition 4

(Laplace Mechanism). For a sequence of queries F: G→G, the Laplace mechanism is the algorithm Z that satisfies ϵ-differential privacy by adding Laplace noise to the query output:

Z (G) = F (G) + Lap (Δ f / ϵ)

(2)

where Lap(Δf/ϵ) represents Laplace noise with μ = 0 and b = Δf/ϵ.

In the Laplace mechanism, the Laplace noise distribution is shown in Equation (3).

n (x) = \frac{1}{2 b} \exp (- \frac{| x - μ |}{b})

(3)

where μ is a position parameter, b denotes a scale parameter, and x is a random variable.
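As an illustration of Definition 4, the sketch below adds zero-mean Laplace noise with scale b = Δf/ϵ to each entry of a query result. It is a minimal sketch using only the Python standard library; the function names and the example values are assumptions made for illustration, and the noise is drawn as the difference of two exponential variables, which is one standard way to sample a Laplace variable.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from Laplace(mu=0, b=scale).

    The difference of two i.i.d. exponential variables with mean `scale`
    follows a Laplace distribution with location 0 and scale `scale`.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def laplace_mechanism(values, sensitivity, epsilon, rng=None):
    """Return values[i] + Lap(sensitivity / epsilon) for every entry."""
    rng = rng or random.Random()
    scale = sensitivity / epsilon
    return [v + laplace_noise(scale, rng) for v in values]


# Hypothetical query answer with sensitivity 1 and privacy budget 0.5.
rng = random.Random(0)
noisy = laplace_mechanism([3, 1, 4], sensitivity=1.0, epsilon=0.5, rng=rng)
```

A smaller ϵ or a larger Δf gives a larger scale b and therefore noisier output, which is why the later sections try to reduce the sensitivity before noise is added.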

Definition 5

(Exponential Mechanism). Given a dataset D, an output range T, a privacy budget ϵ, and a utility function U: (D, t) → R, a mechanism M that selects an output t ∈ T with probability proportional to exp($\frac{ϵ \cdot U (D, t)}{2 Δ U}$) satisfies ϵ-differential privacy.
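The selection rule in Definition 5 can be sketched as weighted sampling over a finite candidate set. This is a minimal sketch under stated assumptions: the candidate set is a finite list, `utility` is a caller-supplied function of one candidate, and the function name `exponential_mechanism` is illustrative. Subtracting the largest exponent before calling `exp` only rescales all weights by the same factor, so the sampling distribution is unchanged.

```python
import math
import random


def exponential_mechanism(candidates, utility, sensitivity, epsilon, rng=None):
    """Sample t with probability proportional to exp(eps * U(t) / (2 * dU))."""
    rng = rng or random.Random()
    exponents = [epsilon * utility(t) / (2.0 * sensitivity) for t in candidates]
    top = max(exponents)                      # shift for numerical stability
    weights = [math.exp(e - top) for e in exponents]
    return rng.choices(candidates, weights=weights, k=1)[0]


# Hypothetical example: the utility peaks at t = 1.
rng = random.Random(0)
pick = exponential_mechanism([0, 1, 2], lambda t: -abs(t - 1),
                             sensitivity=1.0, epsilon=1.0, rng=rng)
```

With a small ϵ the three candidates are chosen almost uniformly, while a large ϵ concentrates nearly all probability on the candidate with the highest utility.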

Definition 6

(Parallel composition property). Given a sequence of algorithms {$A_{1}$, $A_{2}$, ..., $A_{n}$} in which each algorithm $A_{i}$ satisfies $ϵ_{i}$-differential privacy, if these algorithms are applied independently to disjoint subsets of the input database D, the combined process satisfies (max $ϵ_{i}$)-differential privacy. This is called the parallel composition property of differential privacy.

4. The Proposed Method

4.1. The Framework of Method

To preserve the directed graph in wireless mobile networks, we propose a novel method based on differential privacy, which combines node differential privacy and graph modification to provide sufficient privacy preservation while retaining data utility. In addition, we assume that the original directed graph is a simple connected static directed graph without node attributes.

As shown in Figure 3, the model of the developed method consists of four steps. In step 1, after the original directed graph is converted into a weighted graph, the weighted graph is divided into sub-graphs according to the optimal modularity [35]. Then, in step 2, node differential privacy is utilized to generate two differentially private degree sequences (an in-degree sequence and an out-degree sequence) from each sub-graph. In particular, the exponential mechanism is used to get the degree sequences of nodes in each sub-graph, while the ranking micro-aggregation is applied when noise is added to them. Next, in step 3, each sub-graph is modified by adding or deleting edges according to the noised degrees, and the relationship between nodes is considered during edge modification. Finally, in step 4, all modified sub-graphs are merged into a differentially private directed graph, which provides privacy preservation for the original directed graph.

In particular, as node differential privacy provides stronger privacy preservation than graph modification and edge differential privacy, it is adopted to provide better privacy preservation for the directed graphs in mobile wireless networks. At the same time, to maintain data utility, the Louvain algorithm is used to divide the original directed graph into many sub-graphs, and graph modifications are confined to them. Then, the exponential mechanism truncates the degree sequences so that as little noise as possible is added to them under a given privacy budget. Next, the ranking micro-aggregation effectively reduces the noise added to the degree sequences, which is proved by the mathematical analysis. Finally, the relationship between two nodes is utilized to modify the edges of nodes, which can retain the original graph structure. Therefore, the proposed method achieves a trade-off between privacy and data utility.

To summarize, we develop a novel method that can achieve privacy preservation for the directed graph while maintaining the data utility.

4.2. DGNDP (Synthetic Directed Graph Based on Node Differential Privacy) Algorithm

In Algorithm 1, the goal is to generate a differential private directed graph. At first, to better gather the nodes of the directed graph, we convert this directed graph into a weighted graph and utilize the Louvain algorithm to divide the weighted graph into several sub-graphs. In each sub-graph, the GSEM algorithm is used to get an out-degree sequence and an in-degree sequence. Then, the ADPRA algorithm adds noise to these degree sequences. After that, the GGM algorithm generates a synthetic directed sub-graph according to the noised degrees. Finally, all synthetic directed sub-graphs are merged into a differential private directed graph.

Algorithm 1 DGNDP algorithm

Input: an original directed graph G, privacy budgets $ϵ_{1}$ and $ϵ_{2}$
Output: a synthetic directed graph G’
1: a weighted graph $G_{w}$ ← converting an original directed graph G
2: a set of directed sub-graphs $S_{sub}$ ← decomposing the weighted graph $G_{w}$
3: a set $S_{Gn}$ = { }
4: for $S_{Gi}$ in $S_{sub}$ :
5: $S_{dout}$ = GSEM algorithm ( $S_{Gi}$ , $ϵ_{1}$ )
6: $S_{din}$ = GSEM algorithm ( $S_{Gi}$ , $ϵ_{1}$ )
7: an out-degree sequence $S_{doutn}$ ← ADPRA algorithm ( $S_{dout}$ , $ϵ_{2}$ )
8: an in-degree sequence $S_{dinn}$ ← ADPRA algorithm ( $S_{din}$ , $ϵ_{2}$ )
9: $Sn_{Gi}$ ← GGM algorithm ( $S_{Gi}$ , $S_{doutn}$ , $S_{dinn}$ )
10: $S_{Gn}$ adding $Sn_{Gi}$
11: G’ ← merging $S_{Gn}$
12: Return a synthetic directed graph G’
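The control flow of Algorithm 1 can be sketched as a loop over sub-graphs in which every stage is passed in as a callable. This is only a skeleton under stated assumptions: `partition`, `gsem`, `adpra` and `ggm` are hypothetical placeholders for the Louvain decomposition and the three sub-algorithms, their signatures are invented for illustration, and links that cross sub-graph boundaries are simply dropped here because this section does not describe how the merge step treats them.

```python
def dgndp(links, partition, gsem, adpra, ggm, eps1, eps2):
    """Skeleton of the DGNDP loop; each stage is a caller-supplied callable."""
    synthetic = []
    for sub_nodes in partition(links):                  # sub-graph node sets
        sub_links = [(i, j) for (i, j) in links
                     if i in sub_nodes and j in sub_nodes]
        s_dout = gsem(sub_nodes, sub_links, "out", eps1)   # truncated out-degrees
        s_din = gsem(sub_nodes, sub_links, "in", eps1)     # truncated in-degrees
        ns_dout = adpra(s_dout, eps2)                      # noised out-degrees
        ns_din = adpra(s_din, eps2)                        # noised in-degrees
        synthetic.extend(ggm(sub_nodes, sub_links, ns_dout, ns_din))
    return synthetic                                    # merged synthetic links


# Trivial stand-ins that leave every sub-graph unchanged.
links = [("a", "b"), ("b", "a"), ("c", "d")]
result = dgndp(
    links,
    partition=lambda ls: [{"a", "b"}, {"c", "d"}],
    gsem=lambda nodes, ls, direction, eps: [],
    adpra=lambda seq, eps: seq,
    ggm=lambda nodes, ls, ns_out, ns_in: ls,
    eps1=0.5,
    eps2=0.5,
)
```

Passing the stages as arguments keeps the skeleton independent of any particular community-detection library or noise implementation.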

4.2.1. GSEM (Generating Degree Sequence Based on Exponent Mechanism) Algorithm

For an in-degree sequence or an out-degree sequence in a directed sub-graph, when the Laplace noise is added to this degree sequence, the smaller the degree of the node, the greater the damage caused by the added noise. To reduce noise added to the degree sequence, the noise is only added on nodes with large degrees in this sequence and nodes with small degrees are deleted from this sequence.

In particular, after the degree sequence of a sub-graph is sorted from small to large and truncated according to a threshold t, the Laplace noise is only added to the rest of the degree sequence. Thus, there are two kinds of error: one is the reconstruction error caused by the truncated part of the degree sequence, and the other is the Laplace noise added to the rest of the degree sequence. The larger the threshold t is, the more of the sequence is deleted, which results in a larger reconstruction error. Conversely, the smaller the threshold t is, the more Laplace noise is added. Therefore, the exponential mechanism is applied to obtain an optimal t, so that the minimum noise is added to the degree sequence.

Given a directed sub-graph $S_{Ga}$ = ($V_{a}$, $E_{a}$), the out-degree sequence of $S_{Ga}$ is $Seq_{out}(S_{Ga})$: [$d_{1}$, $d_{2}$, ..., $d_{n}$], where n is the number of nodes. A query function f is then given:

f \to Seq_{out} (S_{Ga})

After $Seq_{out}(S_{Ga})$ is sorted from small to large and truncated by t, there are two kinds of error in $Seq_{out}(S_{Ga})$: the reconstruction error and the Laplace noise.

Error (Seq_{out} (S_{Ga})) = RE (Seq_{out} (S_{Ga})) + LE (Seq_{out} (S_{Ga}))

where RE($Seq_{out}(S_{Ga})$) is the error caused by the truncation and LE($Seq_{out}(S_{Ga})$) represents the error brought by the Laplace noise.

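RE (Seq_{out} (S_{Ga})) = \sqrt{\sum_{i = 1}^{t} {| d_{i} |}^{2}}

LE (Seq_{out} (S_{Ga})) = \sqrt{E (\sum_{i = t + 1}^{n} {Lap (\frac{Δ f}{ε})}^{2})}

where the sum runs over the m = n - t entries that remain after truncation. Since a zero-mean Laplace variable with scale b has second moment $2 b^{2}$, each retained entry contributes $2 {(Δ f / ε)}^{2}$ to the expectation, which gives:

\begin{matrix} RE (Seq_{out} (S_{Ga})) + LE (Seq_{out} (S_{Ga})) \\ = \sqrt{\sum_{i = 1}^{t} {| d_{i} |}^{2}} + \sqrt{2 (n - t)} \cdot \frac{Δ f}{ε} \end{matrix}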

To minimize Error($Seq_{out}(S_{Ga})$), the exponential mechanism is utilized to select the best threshold t, with the following scoring function:

U (S_{Ga}, t) = \sqrt{\sum_{i = 1}^{t} {| d_{i} |}^{2}} + \sqrt{2 (n - t)} \cdot \frac{Δ f}{ε}

As node differential privacy is employed to achieve differential privacy, ΔU is:

Δ U = U (S_{Ga}, t) - U (S_{Ga}^{'}, t) = Δ RE + Δ LE

where there is only one node difference between $S_{Ga}$ and $S_{Ga}^{'}$.

\begin{matrix} Δ RE \leq \max (\sqrt{\sum_{i = 1}^{t} {| d {(S_{Ga})}_{i} |}^{2}} - \sqrt{\sum_{i = 1}^{t} {| d {(S_{Ga}^{'})}_{i} |}^{2}}) \\ \leq \max (\sum_{i = 1}^{t} | d {(S_{Ga})}_{i} | - \sum_{i = 1}^{t} | d {(S_{Ga}^{'})}_{i} |) \\ \leq d_{max} \end{matrix}

Δ LE = Δ f

Δ f_{t} = | f_{t} (S_{Ga}) - f_{t} (S_{Ga}^{'}) | = \max (degree (S_{Ga})) = d_{max}

where Δf is the sensitivity of the query function f.

Therefore, ΔU is:

Δ U = Δ RE + Δ LE \leq 2 d_{max}

The probability that the threshold t is selected is

p_{r} (t) = \frac{\exp (- \frac{ε_{1} U (S_{Ga}, t)}{2 Δ U})}{\sum_{i = 1}^{d_{max} - 1} \exp (- \frac{ε_{1} U (S_{Ga}, i)}{2 Δ U})}

Because U measures an error, the exponent carries a negative sign, so thresholds with a smaller error are selected with a higher probability. Finally, a threshold t is obtained through the exponential mechanism and used to generate an out-degree sequence $S_{dout}$. In the same way, an in-degree sequence $S_{din}$ is also obtained.

Thus, to ensure that the minimal noise is added to a degree sequence when the privacy budget is given, the noise is added to the truncated degree sequence.

As shown in Algorithm 2, line 1 generates an out-degree sequence $Seq_{out}$ from the directed sub-graph $S_{Ga}$. Then, $d_{max}$ is obtained from the out-degree sequence $Seq_{out}$ in line 2. From line 3 to line 5, the exponential mechanism is used to get a threshold t. In line 6, the out-degree sequence $Seq_{out}$ is truncated to get the out-degree sequence $S_{dout}$, which is returned in line 7.

Algorithm 2 GSEM algorithm

Input: a directed sub-graph $S_{Ga}$ = ( $V_{a}$ , $E_{a}$ ), a privacy budget $ϵ_{1}$
Output: an out-degree sequence $S_{dout}$
1: an out-degree sequence $Seq_{out}$ ← a directed sub-graph $S_{Ga}$ = ( $V_{a}$ , $E_{a}$ )
2: $d_{max}$ = the maximum degree of $Seq_{out}$
3: for each candidate threshold t:
4: computing the scoring function $U (S_{Ga}, t) = \sqrt{\sum_{i = 1}^{t} {| d_{i} |}^{2}} + \sqrt{2 (n - t)} \cdot \frac{Δ f}{ϵ}$
5: selecting t with probability $p_{r} (t) \propto \exp (- ϵ_{1} U (S_{Ga}, t) / (2 Δ U))$
6: an out-degree sequence $S_{dout}$ ← truncating the out-degree sequence $Seq_{out}$ by t
7: Return an out-degree sequence $S_{dout}$
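The threshold selection in Algorithm 2 can be sketched as follows. This is a minimal sketch under several assumptions that are not fixed by the text: the sequence is sorted in ascending order so that the first t entries are the truncated ones, the candidate thresholds are t = 1, ..., d_max - 1 (clamped to the sequence length) as in the normalizing sum of p_r(t), the ε in the Laplace term of U is passed separately as `noise_epsilon`, and Δf = d_max and ΔU = 2·d_max are taken from the sensitivity analysis above. The names `score` and `gsem` are illustrative.

```python
import math
import random


def score(seq, t, delta_f, noise_epsilon):
    """U(S, t): truncation error of the first t entries plus expected Laplace error."""
    n = len(seq)
    truncation = math.sqrt(sum(d * d for d in seq[:t]))
    laplace = math.sqrt(2 * (n - t)) * delta_f / noise_epsilon
    return truncation + laplace


def gsem(degrees, epsilon1, noise_epsilon, rng=None):
    """Pick a threshold t with the exponential mechanism and truncate the sequence."""
    rng = rng or random.Random()
    seq = sorted(degrees)                       # ascending: small degrees first
    d_max = seq[-1] if seq else 0
    candidates = list(range(1, min(d_max, len(seq))))   # t = 1 .. d_max - 1
    if not candidates:
        return 0, seq                           # nothing to truncate
    delta_f = d_max                             # query sensitivity from the text
    delta_u = 2 * d_max                         # bound on the score sensitivity
    scores = [score(seq, t, delta_f, noise_epsilon) for t in candidates]
    best = min(scores)                          # shift for numerical stability
    weights = [math.exp(-epsilon1 * (u - best) / (2 * delta_u)) for u in scores]
    t = rng.choices(candidates, weights=weights, k=1)[0]
    return t, seq[t:]                           # drop the t smallest degrees


# Hypothetical integer out-degree sequence of one sub-graph.
t, kept = gsem([0, 1, 1, 2, 3, 3], epsilon1=1.0, noise_epsilon=1.0,
               rng=random.Random(0))
```

Because a lower score means a smaller total error, the weights use a negative exponent, matching the sign in p_r(t).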

4.2.2. ADPRA (Adding Noise Based on Differential Privacy with the Ranking Micro-Aggregation) Algorithm

Given an ordered degree sequence d = [$d_{1}$, $d_{2}$, ..., $d_{n}$], it is aggregated into $\frac{n}{k}$ clusters. Each cluster contains k consecutive degree values, except perhaps one cluster that contains up to $2 k - 1$ consecutive values. The centroids of these clusters form a sequence denoted by [$dc_{1}$, $dc_{2}$, ..., $dc_{n / k}$]. If any single $d_{i}$ in this ordered degree sequence is replaced by $\bar{d_{i}}$ with |$d_{i}$ − $\bar{d_{i}}$| ≤ Δ, the centroids of the new clusters form a new sequence [$\bar{dc_{1}}$, $\bar{dc_{2}}$, ..., $\bar{dc_{n / k}}$]. As a result, it holds that

\sum_{m = 1}^{[n / k]} |\bar{dc_{m}} - dc_{m}| \leq Δ / k

Compared with edge differential privacy, node differential privacy can provide stronger privacy preservation. Nevertheless, node differential privacy results in insufficient data utility. To mitigate this problem, the ranking micro-aggregation is introduced to improve data utility. In particular, with the help of the ranking micro-aggregation, this algorithm generates two differential private degree sequences with effective data utility, which are useful for the graph modification in the next step.

Without loss of generality, assume $\bar{d_{i}}$ > $d_{i}$ and that n is divisible by k. Thus, there are n/k clusters, with each cluster m holding the consecutive values from $d_{(m - 1) k + 1}$ to $d_{m k}$. In particular, each $d_{i}$ belongs to a cluster $m_{i}$.

Then two cases are discussed.

Case 1: if $\bar{d_{i}}$ ≤ $d_{m_{i} k + 1}$, then $\bar{d_{i}}$ is still in cluster $m_{i}$. The centroids of all clusters other than $m_{i}$ are unchanged, and the centroid of cluster $m_{i}$ increases by $\frac{Δ}{k}$ because $\bar{d_{i}}$ = $d_{i}$ + Δ. Therefore, this case satisfies the bound of the ranking micro-aggregation.

Case 2: if $\bar{d_{i}}$ > $d_{m_{i} k + 1}$, then $\bar{d_{i}}$ is not in cluster $m_{i}$. Therefore, replacing $d_{i}$ with $\bar{d_{i}}$ changes two or more clusters: cluster $m_{i}$ loses $d_{i}$ and a cluster $\bar{m_{i}}$ (with $\bar{m_{i}}$ > $m_{i}$) obtains $\bar{d_{i}}$. To keep the size of cluster $m_{i}$ unchanged, cluster $m_{i}$ gains $d_{m_{i} k + 1}$; in return, cluster $m_{i}$ + 1 loses $d_{m_{i} k + 1}$ and obtains $d_{(m_{i} + 1) k + 1}$, and so on, until cluster $\bar{m_{i}}$ gives its smallest value $d_{(\bar{m_{i}} - 1) k + 1}$ to cluster $\bar{m_{i}}$ − 1 and obtains $\bar{d_{i}}$. From cluster $\bar{m_{i}}$ + 1 to the last cluster, nothing changes. The change of the centroids is shown as follows:

\begin{matrix} \sum_{m = 1}^{[n / k]} |\bar{dc_{m}} - dc_{m}| \\ = \sum_{m = m_{i}}^{\bar{m_{i}}} |\bar{dc_{m}} - dc_{m}| \\ = \frac{d_{m_{i} k + 1} - d_{i}}{k} + \frac{d_{(m_{i} + 1) k + 1} - d_{m_{i} k + 1}}{k} + \dots + \frac{\bar{d_{i}} - d_{(\bar{m_{i}} - 1) k + 1}}{k} \\ = \frac{\bar{d_{i}} - d_{i}}{k} = \frac{Δ}{k} \end{matrix}

When n is not a multiple of k, there are $[n / k]$ clusters and one of them holds between $k + 1$ and $2 k - 1$ values. In case 1, if the larger cluster is cluster $m_{i}$, the change of the centroid of $m_{i}$ is less than $Δ / k$; if the larger cluster is another cluster, its centroid is unchanged and the bound still holds. In case 2, if one of the changed clusters is the larger cluster, one of the fractions in the third line of the expression above has a denominator greater than k, so the overall sum is less than $Δ / k$; if the larger cluster is not affected, the bound holds as before. Therefore, the lemma holds in all cases.
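The bound can be checked numerically on small sequences. The sketch below is only an illustration of the lemma, with assumed helper names `centroids` and `centroid_shift`: it groups a sorted sequence into consecutive clusters of k values (the last cluster absorbs any remainder, so it holds at most 2k − 1 values), replaces one value, re-sorts, and sums the absolute centroid changes.

```python
def centroids(sorted_values, k):
    """Centroids of consecutive groups of k values; the last group takes the remainder."""
    n = len(sorted_values)
    groups = max(n // k, 1)
    result = []
    for m in range(groups):
        start = m * k
        end = (m + 1) * k if m < groups - 1 else n
        cluster = sorted_values[start:end]
        result.append(sum(cluster) / len(cluster))
    return result


def centroid_shift(values, index, new_value, k):
    """Total absolute change of the centroids after replacing values[index]."""
    before = centroids(sorted(values), k)
    changed = list(values)
    changed[index] = new_value
    after = centroids(sorted(changed), k)
    return sum(abs(a - b) for a, b in zip(after, before))


# n divisible by k: replacing 1 by 5.5 gives delta = 4.5 and k = 2.
shift_even = centroid_shift([1, 2, 3, 4, 5, 6], 0, 5.5, 2)

# n not divisible by k: replacing 1 by 8 gives delta = 7 and k = 2.
shift_odd = centroid_shift([1, 2, 3, 4, 5, 6, 7], 0, 8, 2)
```

In the first example the shift equals Δ/k = 2.25, matching the telescoping sum in case 2; in the second the last cluster has three values, so the shift is 3, which is below Δ/k = 3.5.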

In the ADPRA algorithm, a query function is given:

f (S_{Gi}) \to AS_{dout}

Because node differential privacy is used in this algorithm, according to the analysis above, the sensitivity of the query function f is

Δ f = \max_{S_{Gi}, S_{Gi}^{'}} {∥ f (S_{Gi}) - f (S_{Gi}^{'}) ∥}_{1} = \frac{d_{max}}{k}

Therefore, due to the ranking micro-aggregation, the sensitivity of the query function decreases, so that the noise added to the degree sequence is reduced. To sum up, the ADPRA algorithm provides differential privacy for the degree sequence while maintaining data utility.

As illustrated in Algorithm 3, line 1 sorts the out-degree sequence from large to small. After this sequence is aggregated in line 2, line 3 adds the Laplace noise to obtain a differentially private out-degree sequence. In the same way, from line 4 to line 6, a differentially private in-degree sequence is also obtained.

Algorithm 3 ADPRA algorithm

Input: an out-degree sequence $S_{d o u t}$ , an in-degree sequence $S_{d i n}$ , a privacy budget $ϵ_{2}$ , the number of elements in a cluster, k
Output: a noised out-degree sequence $N S_{d o u t}$ , a noised in-degree sequence $N S_{d i n}$
1: Sorting $S_{d o u t}$ from large to small
2: $A S_{d o u t}$ ← micro-aggregating $S_{d o u t}$
3: $N S_{d o u t}$ ← adding the Laplace noise (( $Δ f$ )/k∗ $ϵ_{2}$ ) on $A S_{d o u t}$
4: Sorting $S_{d i n}$ from large to small
5: $A S_{d i n}$ ← micro-aggregating $S_{d i n}$
6: $N S_{d i n}$ ← adding a Laplace noise (( $Δ f$ )/k∗ $ϵ_{2}$ ) on $A S_{d i n}$
7: Return a noised out-degree sequence $N S_{d o u t}$ , a noised in-degree sequence $N S_{d i n}$
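The three steps applied to each sequence in Algorithm 3 (sort, micro-aggregate, add Laplace noise) can be sketched as follows. This is a hedged sketch, not the authors' implementation: it assumes the sensitivity Δf = d_max/k stated in the text and a noise scale of Δf/ϵ₂, it draws one noise value per cluster centroid and then assigns the noisy centroid to every member of the cluster, and the names `cluster_bounds` and `adpra` are illustrative.

```python
import random


def cluster_bounds(n, k):
    """Index ranges of consecutive clusters of size k; the last takes the remainder."""
    groups = max(n // k, 1)
    return [(m * k, (m + 1) * k if m < groups - 1 else n) for m in range(groups)]


def adpra(degrees, d_max, k, epsilon2, rng=None):
    """Sort, micro-aggregate, and add Laplace noise to one degree sequence."""
    if not degrees:
        return []
    rng = rng or random.Random()
    ordered = sorted(degrees, reverse=True)         # large to small, as in line 1
    scale = (d_max / k) / epsilon2                  # assumed scale: (d_max / k) / eps2
    noised = []
    for start, end in cluster_bounds(len(ordered), k):
        cluster = ordered[start:end]
        centroid = sum(cluster) / len(cluster)
        noise = 0.0
        if scale > 0:
            # Difference of two exponentials with mean `scale` is Laplace(0, scale).
            noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
        noised.extend([centroid + noise] * len(cluster))
    return noised


# Hypothetical truncated out-degree sequence with d_max = 5 and k = 2.
ns_dout = adpra([5, 3, 3, 1], d_max=5, k=2, epsilon2=1.0, rng=random.Random(1))
```

Members of the same cluster share one noisy value, so the output has as many entries as the input but only one independent noise draw per cluster.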

4.2.3. GGM (Generating Synthetic Graph Based on Graph Modification) Algorithm

To generate a synthetic directed graph from a noised degree sequence, the graph modification method is adopted in the GGM algorithm, which consists of three steps. Step 1 compares the two out-degree sequences $NS_{dout}$ and $S_{dout}$ as well as the two in-degree sequences $NS_{din}$ and $S_{din}$, and records the differences between them. In step 2, edges are added to a sub-graph $Sn_{Gi}$ for the nodes whose out-degree or in-degree increases. To reduce the perturbation caused by adding edges, the nodes with increased out-degree are paired with the nodes with increased in-degree, and edges are added between them. In the last step, either the out-degree or the in-degree sequence is selected, and edges between its nodes and their neighboring nodes are deleted. In particular, the relationships between nodes are considered when edges are added or deleted, which preserves the original structure as much as possible. In the end, the GGM algorithm generates a differentially private synthetic directed graph, which can effectively preserve the original directed graph.
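Step 1 of GGM, recording which nodes must gain or lose degree, can be sketched as a comparison of two per-node mappings. This is a minimal sketch under assumptions not stated in the text: degrees are stored as dicts keyed by node, and each noised degree is rounded to the nearest non-negative integer before the comparison. The function name `degree_changes` is illustrative.

```python
def degree_changes(original, noised):
    """Split nodes into those whose degree must grow and those whose degree must shrink."""
    increase, decrease = {}, {}
    for node, degree in original.items():
        target = max(0, round(noised[node]))    # assumed rounding of the noised degree
        if target > degree:
            increase[node] = target - degree    # edges to add for this node
        elif target < degree:
            decrease[node] = degree - target    # edges to delete for this node
    return increase, decrease


# Hypothetical original and noised out-degrees of three nodes.
do1, do2 = degree_changes({"a": 2, "b": 3, "c": 1},
                          {"a": 3.6, "b": 1.2, "c": 1.1})
```

The two returned mappings play the role of the increase and decrease records that the later steps use to decide where edges are added and where they are removed.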

The details of Algorithm 4 are as follows. In lines 2 and 3, $NS_{dout}$ and $S_{dout}$ are compared to obtain $Do_{1}$ and $Do_{2}$, which record the nodes whose out-degree increases or decreases together with the amount of change; $Di_{1}$ and $Di_{2}$, which record the nodes and their increased or decreased in-degree values, are obtained in the same way. Starting on line 4, as few edges as possible are added to the graph $Sn_{Gi}$. For each node i in the sorted $Do_{1}$, the k nodes in $Di_{1}$ that are closer to node i than the other nodes are selected, and edges are added between them in lines 5 to 8. After that, the values in $Di_{1}$ are updated in lines 9 to 11. To delete as few edges as possible, $Do_{2}$ and $Di_{2}$ are compared in line 12. If $Do_{2}$ is selected, then for each node i in $Do_{2}$, a set of nodes $IN_{a}$, which is the intersection of $N_{a}$ and $Di_{2}$, is obtained. If $IN_{a}$ is not empty, min{k, |$IN_{a}$|} edges are deleted from $Sn_{Gi}$; otherwise, min{k, |$N_{a}$|} edges are removed from $Sn_{Gi}$. If $Di_{2}$ is chosen, edges are deleted in the same way in lines 22 to 29. Therefore, through the graph modification method, a synthetic directed graph is generated.

Algorithm 4 GGM algorithm

Input: $N S_{d o u t}$ , $N S_{d i n}$ , $S_{d o u t}$ , $S_{d i n}$ , a directed sub-graph $S_{G i}$ = ( $V_{i}$ , $E_{i}$ )
Output: a synthesis directed sub-graph $S n_{G i}$
1: $S n_{G i}$ ← $S_{G i}$
2: ( $D o_{1}$ , $D o_{2}$ ) ← comparing $S_{d o u t}$ with $N S_{d o u t}$
3: ( $D i_{1}$ , $D i_{2}$ ) ← comparing $S_{d i n}$ with $N S_{d i n}$
4: Sorting $D o_{1}$ and $D i_{1}$ from large to small
5: for node i in $D o_{1}$ :
6: k = $D o_{1}$ [i]
7: if | $D i_{1}$ | > 0
8: selecting k nodes from $D i_{1}$ and adding edges from node i to those k nodes in $S n_{G i}$
9: modifying the values of those k nodes in $D i_{1}$
10: if the value of node j in $D i_{1}$ < 0:
11: abandoning node j from $D i_{1}$
12: if sum ( $D o_{2}$ ) > sum ( $D i_{2}$ )
13: for node i in $D o_{2}$ :
14: k = $D o_{2}$ [i]
15: a set of node $N_{a}$ = neighborhood nodes of node i
16: a set of node $I N_{a}$ = intersection of $N_{a}$ and nodes in $D i_{2}$
17: if | $I N_{a}$ | > 0
18: selecting min {k,| $I N_{a}$ |} nodes from $I N_{a}$ and deleting edges from node i to those nodes
19: else:
20: selecting min {k,| $N_{a}$ |} nodes from $N_{a}$ and deleting edges from node i to those nodes
21: else:
22: for node i in $D i_{2}$ :
23: k = $D i_{2}$ [i]
24: a set of node $N_{a}$ = predecessor nodes of node i
25: a set of node $I N_{a}$ = intersection of $N_{a}$ and nodes in $D o_{2}$
26: if | $I N_{a}$ | > 0
27: selecting min {k,| $I N_{a}$ |} nodes from $I N_{a}$ and deleting edges from those nodes to node i
28: else:
29: selecting min {k,| $N_{a}$ |} nodes from $N_{a}$ and deleting edges from those nodes to node i
30: Return $S n_{G i}$
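The comparison and edge-addition steps of Algorithm 4 (lines 2 to 11) can be illustrated with a small Python sketch. It rests on several assumptions that are not in the paper: the graph is a set of `(source, target)` tuples, noised degrees are rounded to the nearest integer, and target nodes are chosen by largest remaining in-degree deficit instead of by closeness to node i. The helper names `compare` and `add_edges` are hypothetical, and edge deletion is omitted.

```python
def compare(original, noised):
    """Split nodes into those whose degree must increase or decrease.

    `original` and `noised` map node -> degree; noised values are rounded.
    Returns (increase, decrease), each mapping node -> amount of change.
    """
    increase, decrease = {}, {}
    for node, degree in original.items():
        diff = int(round(noised[node])) - degree
        if diff > 0:
            increase[node] = diff
        elif diff < 0:
            decrease[node] = -diff
    return increase, decrease


def add_edges(edges, do1, di1):
    """Pair nodes that need out-degree with nodes that need in-degree."""
    new_edges = set(edges)
    remaining_in = dict(di1)  # copy so the caller's dict is not mutated
    for i in sorted(do1, key=do1.get, reverse=True):
        # Simplification: rank candidates by remaining in-degree deficit.
        candidates = [
            j
            for j in sorted(remaining_in, key=remaining_in.get, reverse=True)
            if j != i and (i, j) not in new_edges
        ]
        for j in candidates[: do1[i]]:
            new_edges.add((i, j))
            remaining_in[j] -= 1
            if remaining_in[j] <= 0:
                del remaining_in[j]  # node j needs no more incoming edges
    return new_edges
```

Pairing the two deficit lists means each added edge serves one out-degree increase and one in-degree increase at once, which is the reason the paper gives for adding edges in this way.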

4.2.4. Analysis of DGNDP Algorithm

Theorem 1.

The GSEM algorithm satisfies ε-differential privacy.

Proof of Theorem 1.

As discussed before, the probability of selecting the threshold t is

$$p_{r}(t) = \frac{\exp\left(-\frac{ε U(Sg, t)}{2 Δ U}\right)}{\sum_{t' \in O} \exp\left(-\frac{ε U(Sg, t')}{2 Δ U}\right)}$$

In this algorithm, we assume that $S g$ and $S g'$ are neighboring graphs that differ in one node. For any threshold t, the following result is obtained.

$$\begin{aligned}
\frac{p_{r}(E(Sg, t))}{p_{r}(E(Sg', t))}
&= \frac{\dfrac{\exp\left(-\frac{ε U(Sg, t)}{2 Δ U}\right)}{\sum_{t' \in O} \exp\left(-\frac{ε U(Sg, t')}{2 Δ U}\right)}}{\dfrac{\exp\left(-\frac{ε U(Sg', t)}{2 Δ U}\right)}{\sum_{t' \in O} \exp\left(-\frac{ε U(Sg', t')}{2 Δ U}\right)}} \\
&= \left(\frac{\exp\left(-\frac{ε U(Sg, t)}{2 Δ U}\right)}{\exp\left(-\frac{ε U(Sg', t)}{2 Δ U}\right)}\right) \times \left(\frac{\sum_{t' \in O} \exp\left(-\frac{ε U(Sg', t')}{2 Δ U}\right)}{\sum_{t' \in O} \exp\left(-\frac{ε U(Sg, t')}{2 Δ U}\right)}\right) \\
&\leq \exp\left(\frac{ε}{2}\right) \times \left(\frac{\sum_{t' \in O} \exp\left(\frac{ε}{2}\right) \exp\left(-\frac{ε U(Sg, t')}{2 Δ U}\right)}{\sum_{t' \in O} \exp\left(-\frac{ε U(Sg, t')}{2 Δ U}\right)}\right) \\
&\leq \exp\left(\frac{ε}{2}\right) \times \exp\left(\frac{ε}{2}\right) \times \left(\frac{\sum_{t' \in O} \exp\left(-\frac{ε U(Sg, t')}{2 Δ U}\right)}{\sum_{t' \in O} \exp\left(-\frac{ε U(Sg, t')}{2 Δ U}\right)}\right) \\
&= \exp(ε)
\end{aligned}$$

It is clear that the process of selecting the threshold t satisfies ε-differential privacy. In addition, by the post-processing property, obtaining an out-degree sequence and an in-degree sequence from the input graph through the exponential mechanism also satisfies differential privacy. Therefore, the GSEM algorithm satisfies ε-differential privacy. □
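The selection probability used in this proof can be checked numerically with a short Python sketch. It is only an illustration. The names `selection_probabilities` and `select_threshold` are hypothetical, and the candidate thresholds and utility values are supplied by the caller because the exact form of U is defined elsewhere in the paper. Following the negative sign in the formula, a smaller U gives a larger selection probability.

```python
import math
import random


def selection_probabilities(utilities, epsilon, delta_u):
    """Return p_r(t) proportional to exp(-epsilon * U(t) / (2 * delta_u))."""
    weights = [math.exp(-epsilon * u / (2.0 * delta_u)) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]


def select_threshold(candidates, utilities, epsilon, delta_u, rng=random):
    """Sample one candidate threshold with the exponential mechanism."""
    probabilities = selection_probabilities(utilities, epsilon, delta_u)
    return rng.choices(candidates, weights=probabilities, k=1)[0]
```

For two utility vectors that differ by at most ΔU in every entry, the ratio of the resulting probabilities stays within exp(ε), which is the bound derived above.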

Theorem 2.

The ADPRA algorithm satisfies ε-differential privacy.

Proof of Theorem 2.

In this algorithm, the main task is to add Laplace noise to the degree sequence by combining differential privacy with micro-aggregation. For a graph $S g$, let S be a query function S: $S g$ → $D s_{1}$, where $D s_{1}$ is a degree sequence produced by the GSEM algorithm. Given that $S g$ and $S g'$ differ in only one node, the sensitivity of S is

$$\begin{aligned}
Δ S &= \max_{Sg, Sg'} \left\| S(Sg) - S(Sg') \right\|_{1} \\
&= \max_{Sg, Sg'} \left\| Ds_{1} - Ds_{1}' \right\|_{1} \\
&= d_{max}
\end{aligned}$$

After the micro-aggregated degree sequence $D m s_{1}$ is obtained from $D s_{1}$, consider the query function $S m$: $S g$ → $D m s_{1}$. When $S g$ and $S g'$ differ in only one node, $S m (S g)$ = $D m s_{1}$ and $S m (S g')$ = $D m s_{2}$. The sensitivity of $S m$ is as follows.

$$\begin{aligned}
Δ Sm &= \max_{Sg, Sg'} \left\| Sm(Sg) - Sm(Sg') \right\|_{1} \\
&= \max_{Sg, Sg'} \left\| Dms_{1} - Dms_{2} \right\|_{1}
\end{aligned}$$

According to the previous analysis, if the difference between two degree sequences is $d_{max}$, the difference between the two micro-aggregated degree sequences is $\frac{d_{max}}{k}$, where k is the number of elements in a cluster. Thus, the sensitivity of $S m$ is $\frac{d_{max}}{k}$. Because the sensitivity is reduced, less noise is added to $D m s_{1}$, which improves the data utility of this algorithm. In summary, the ADPRA algorithm satisfies differential privacy, which is proved as follows.

Let $P r [S g]$ denote the probability density function of $L A (S g, S m, ε)$, and let $P r [S g']$ denote the probability density function of $L A (S g', S m, ε)$. For any output D,

$$\begin{aligned}
\frac{p_{r}[LA(Sg)]}{p_{r}[LA(Sg')]}
&= \frac{p_{r}[D - Sm(Sg)]}{p_{r}[D - Sm(Sg')]} \\
&= \frac{\frac{1}{2 \frac{Δ Sm}{ε}} \exp\left(-\frac{|D - Sm(Sg)|}{\frac{Δ Sm}{ε}}\right)}{\frac{1}{2 \frac{Δ Sm}{ε}} \exp\left(-\frac{|D - Sm(Sg')|}{\frac{Δ Sm}{ε}}\right)} \\
&= \frac{\exp\left(-\frac{ε |D - Sm(Sg)|}{Δ Sm}\right)}{\exp\left(-\frac{ε |D - Sm(Sg')|}{Δ Sm}\right)} \\
&= \exp\left(\frac{ε |D - Sm(Sg')|}{Δ Sm} - \frac{ε |D - Sm(Sg)|}{Δ Sm}\right) \\
&= \exp\left(\frac{ε \left(|D - Sm(Sg')| - |D - Sm(Sg)|\right)}{Δ Sm}\right) \\
&\leq \exp\left(\frac{ε |Sm(Sg) - Sm(Sg')|}{Δ Sm}\right) \\
&\leq \exp\left(\frac{ε \cdot Δ Sm}{Δ Sm}\right) = \exp(ε)
\end{aligned}$$

□
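The bound can be checked numerically for a single coordinate with the Python sketch below. It is an illustration under stated assumptions: the query outputs are scalars, the numbers in the usage note are invented for the example, and the function names `laplace_scale`, `laplace_pdf`, and `max_density_ratio` are hypothetical.

```python
import math


def laplace_scale(d_max, k, epsilon):
    """Noise scale when the sensitivity is d_max / k, as the paper argues."""
    return d_max / (k * epsilon)


def laplace_pdf(x, mu, b):
    """Density of the Laplace distribution with location mu and scale b."""
    return math.exp(-abs(x - mu) / b) / (2.0 * b)


def max_density_ratio(mu_a, mu_b, sensitivity, epsilon, points):
    """Largest ratio of the two output densities over the given points.

    mu_a and mu_b play the role of Sm(Sg) and Sm(Sg'); the bound holds
    when |mu_a - mu_b| <= sensitivity.
    """
    b = sensitivity / epsilon
    return max(laplace_pdf(d, mu_a, b) / laplace_pdf(d, mu_b, b) for d in points)
```

For example, with d_max = 12, k = 3, and ε = 1, `laplace_scale` gives a scale of 4 instead of 12 without micro-aggregation. Since the variance of Laplace noise is twice the squared scale, this corresponds to a ninefold reduction in noise variance.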

Theorem 3.

The DGNDP algorithm satisfies ε-differential privacy.

Proof of Theorem 3.

In this algorithm, each sub-graph is processed by the GSEM algorithm and the ADPRA algorithm, both of which satisfy differential privacy. By the sequential composition property of differential privacy, each sub-graph is therefore protected by differential privacy. After all the sub-graphs are merged into a complete directed graph, the parallel composition property of differential privacy implies that the DGNDP algorithm satisfies differential privacy. □
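The budget accounting behind this argument can be written as a small Python sketch. It assumes, as the proof does, that the sub-graphs are disjoint so that parallel composition applies. The function names are hypothetical and the budget values in the usage note are examples only.

```python
def sequential_budget(step_budgets):
    """Sequential composition: budgets spent on the same data add up."""
    return sum(step_budgets)


def parallel_budget(subgraph_budgets):
    """Parallel composition: over disjoint data, the largest budget counts."""
    return max(subgraph_budgets)


def total_budget(per_subgraph_steps):
    """Combine per-step budgets within each sub-graph, then across sub-graphs."""
    return parallel_budget([sequential_budget(steps) for steps in per_subgraph_steps])
```

With three sub-graphs that each spend 0.5 in GSEM and 0.5 in ADPRA, `total_budget([[0.5, 0.5]] * 3)` evaluates to 1.0, so the number of sub-graphs does not increase the overall budget under this assumption.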

5. Experiments and Results

In this paper, the proposed method focuses on a special kind of directed graph: a simple connected directed graph without self-loops or node attributes. In this section, five real-world data sets, each describing a directed graph, are used to demonstrate the effectiveness of the proposed method. For privacy preservation, the change rate of edges is used to evaluate the performance of the methods. For data utility, graph metrics are used to measure their effectiveness. In addition, we compare the proposed method with the methods in [5,20]. The experiments are conducted on a laptop with a 3.5 GHz Intel i7 processor and 8 GB of RAM, running Windows 10 and Python 2.6.

5.1. Data Sets

(1): Physicians: This directed network captures innovation spread among 246 physicians in four towns in Illinois: Peoria, Bloomington, Quincy, and Galesburg. A node represents a physician, and an edge from one physician to another shows that the first physician named the second as a friend or as someone to turn to for advice or discussion. There are 240 nodes and 1098 edges.
(2): Blogs: This directed network contains front-page hyperlinks between blogs in the context of the 2004 US election. A node represents a blog and an edge represents a hyperlink between two blogs. There are 1224 nodes and 19,025 edges.
(3): Wikipedia−link: This network consists of the wikilinks of Wikipedia in the Gagauz language (gag). Nodes are Wikipedia articles and directed edges are wikilinks, i.e., hyperlinks within one wiki. There are 2929 nodes and 118,603 edges.
(4): Gnutella: This is a network of Gnutella hosts from 2002. The nodes represent Gnutella hosts, and the directed edges represent connections between them. There are 12,717 nodes and 51,525 edges.
(5): Twitter lists: This directed network contains Twitter user–user following information. A node represents a user. An edge indicates that the user represented by the left node follows the user represented by the right node. There are 23,370 nodes and 1,231,177 edges.

5.2. Metrics and Parameters

5.2.1. Metrics and Parameters in Privacy Preservation

To evaluate privacy preservation, the change rate of edges (CRE) is defined as follows.

$$CRE = \frac{M_{e}}{S_{e}} \times 100\%$$

where $M_{e}$ denotes the total number of edges that are added and deleted by the method and $S_{e}$ represents the number of edges in the synthetic graph. This metric indicates how much the original graph has been modified to generate the synthetic graph. The larger the CRE, the better the privacy preservation.
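The metric can be computed from two edge sets as in the Python sketch below. One assumption should be noted: the sketch counts added and deleted edges as the net difference between the original and synthetic edge sets, whereas the paper counts the edges modified during the method, which could differ if an edge were added and later removed. The function name `change_rate_of_edges` is hypothetical.

```python
def change_rate_of_edges(original_edges, synthetic_edges):
    """Return CRE in percent for directed edges given as (source, target)."""
    original = set(original_edges)
    synthetic = set(synthetic_edges)
    added = synthetic - original      # edges present only in the synthetic graph
    deleted = original - synthetic    # edges present only in the original graph
    if not synthetic:
        raise ValueError("the synthetic graph has no edges")
    return 100.0 * (len(added) + len(deleted)) / len(synthetic)
```

For instance, reversing one of four edges counts as one deletion and one addition, so the CRE is 50%.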

Moreover, three methods are compared with the proposed method: the independent (ki, ko)-degree anonymity method in [5], the k-anonymity method in [20], and the DGNDP method without micro-aggregation. The method in reference [5] is a k-degree anonymity method that does not consider the direction of edges and minimizes changes to the degree sequences. In contrast, the method in reference [20] focuses on directed graphs and provides a k-degree anonymity method. As node differential privacy can provide stronger privacy preservation than k-anonymity methods, the proposed method achieves better privacy preservation for directed graphs.

Correspondingly, the privacy budget in the experiments is the sum of $ε_{1}$ and $ε_{2}$, where $ε_{1}$ = $ε_{2}$ = $ε$. Meanwhile, $ε$ takes values in {0.2, 0.5, 1.0, 1.5, 2}, and k, the number of elements in a cluster, is an integer between 2 and 5; the same k is used in k-anonymization. Because the noise is random, each method is run 10 times on every data set and the results are averaged.

5.2.2. Metrics and Parameters in Data Utility

For the graph structure, the edge intersection $E I$ is the percentage of edges shared by the original graph and the perturbed graph, as shown below.

$$EI = \frac{|E \cap E'|}{\max(|E|, |E'|)} \times 100\%$$
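A direct Python translation of the EI formula is sketched below, assuming that edges are given as `(source, target)` tuples so that direction is respected. The function name `edge_intersection` is hypothetical.

```python
def edge_intersection(original_edges, perturbed_edges):
    """Return EI in percent: shared edges over the larger edge count."""
    original = set(original_edges)
    perturbed = set(perturbed_edges)
    larger = max(len(original), len(perturbed))
    if larger == 0:
        raise ValueError("both graphs are empty")
    return 100.0 * len(original & perturbed) / larger
```

Because edges are directed, (1, 4) and (4, 1) are different edges and do not count as shared.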

For node properties, the betweenness centrality (Cb) is the fraction of shortest paths that pass through each node. The closeness centralities based on in-degree (in-Cc) and out-degree (out-Cc) measure how many steps are required to reach every other node from a given node.

5.3. Results and Discussion

5.3.1. Analysis of Privacy Preservation

First, the proposed method is run on the five data sets and the results are listed in Table 1. As shown in Table 1, when k is 3 and $ε$ is 1, the value of CRE in the Hamsterster friendships data set is 47.62, while that in the Gnutella data set is 49.76. In particular, the value of CRE increases as $ε$ decreases when k is fixed. For example, when k is 3, the value of CRE in the Wikipedia−link data set increases from 25.74 to 69.13 as $ε$ decreases from 2 to 0.2, while that in the Gnutella data set changes from 36.32 to 71.38. The results show that the smaller the $ε$, the larger the CRE, which indicates that the proposed method achieves better privacy preservation for the data sets. In Table 2, when $ε$ is 1 and k increases from 2 to 5, the value of CRE in Wikipedia−link decreases from 50.89 to 36.74, and the same trend holds in the other data sets, which indicates that the value of k affects privacy preservation: the smaller the k, the better the privacy preservation.

Then, the performance of the proposed method in Gnutella and Twitter lists is illustrated in Figure 4 and Figure 5. In Figure 4, as $ε$ increases from 0.2 to 2, the value of CRE decreases from about 70 to about 30 regardless of the value of k, which indicates that $ε$ controls the degree of privacy preservation. In addition, whatever the value of $ε$, the value of CRE increases as k decreases from 5 to 2, which shows that micro-aggregation can also control privacy preservation. Figure 5 shows the same trends as Figure 4, which implies that the proposed method can also be applied to large networks.

Finally, the proposed method is compared with other methods, and the results are illustrated in Figure 6 and Figure 7. In Figure 6, when $ε$ is 1, the value of CRE obtained by the proposed method is larger than that of the (ki, ko)-degree anonymity method and the k-anonymity method regardless of the value of k. Although the values for the (ki, ko)-degree anonymity method and the k-anonymity method increase as k rises, the proposed method provides better privacy preservation than these two methods. However, the value of CRE in the proposed method is smaller than that in the proposed method without micro-aggregation regardless of the value of k, which shows that micro-aggregation can weaken privacy preservation. For the Twitter lists data set, Figure 7 shows the same results as Figure 6. Therefore, the proposed method provides better privacy preservation than the two anonymity methods and has better data utility than the method without micro-aggregation.

To sum up, the experiment results show that the proposed method can preserve directed graphs. In addition, micro-aggregation can be applied to control privacy preservation in this method.

5.3.2. Analysis of Data Utility

As shown in Figure 8, when $ε$ is 1, the values of EI in all five data sets increase as k rises, which means that more edges of the original graph are retained in the synthetic directed graph generated by the proposed method. In Figure 9, when $ε$ is 1, the value of Δav-Cb decreases gradually as k increases, which indicates that the node properties of the synthetic directed graph stay close to those of the original directed graph. As illustrated in Figure 10 and Figure 11, Δav-in-Cc and Δav-out-Cc also decline as k rises. To sum up, the results show that the proposed method provides effective data utility.

In particular, as the size of the network and the amount of data increase, the computational overhead increases significantly because more edges and nodes are modified. To preserve a large directed graph, it is therefore divided into many sub-graphs, each of which is much smaller than the original directed graph, so the proposed algorithms can be applied efficiently to these sub-graphs. In real-world deployments, the scalability of the proposed method is mainly determined by the scalability of the Louvain algorithm.

6. Conclusions

In this paper, to preserve directed graphs in WMNs, the DGNDP method is designed, which combines node differential privacy and graph modification. In this method, node differential privacy is used to add noise to the degree sequences because it can provide stronger privacy preservation than edge differential privacy and graph modification. Then, edge modification uses the noised degree sequences to generate a synthetic directed graph, which can strongly preserve the original directed graph. Additionally, to improve data utility, the original directed graph is divided into many sub-graphs, and the perturbations are added only within each sub-graph. In particular, the exponential mechanism is adopted to truncate the degree sequences, which keeps the noise added to the degree sequences as small as possible. Moreover, the ranking micro-aggregation effectively reduces the noise added to the degree sequences. According to the noised degree sequences, the relationship between two nodes is used to modify the edges of nodes, which helps retain the original graph structure. Finally, the theoretical analysis and the experimental results show that the DGNDP method not only satisfies ε-differential privacy but also retains data utility.

In this paper, we only focus on the simple static directed graph without considering node attributes. However, node attributes play an important role in the directed graphs. Thus, in the future, we will concentrate on the application of node differential privacy in complex attribute graphs. In addition, there is still a demand to achieve privacy preservation for dynamic directed graphs.

Author Contributions

Conceptualization, J.Y. and Y.Z.; methodology, J.Y.; software, J.Y.; validation, J.Y., Y.Z. and L.L.; formal analysis, J.Y.; investigation, J.Y.; resources, J.Y.; data curation, J.Y.; writing—original draft preparation, J.Y.; writing—review and editing, Y.Z.; visualization, J.Y.; supervision, Y.Z.; project administration, L.L.; funding acquisition, L.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (62001273, 61962033), the Fundamental Research Funds for the Central Universities (No. 2019CBLY004 and No. GK201903091), the Scientific and Technological Project of Shangluo (No. 2021-C-0004) and Shangluo Universities Key Disciplines Project, Discipline name: Mathematic.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Letaief, K.B.; Chen, W.; Shi, Y.; Zhang, J.; Zhang, Y.J.A. The roadmap to 6G: AI empowered wireless networks. IEEE Commun. Mag. 2019, 57, 84–90.
2. Sharma, T.; Chehri, A.; Fortier, P. Review of optical and wireless backhaul networks and emerging trends of next generation 5G and 6G technologies. Trans. Emerg. Telecommun. Technol. 2021, 32, e4155.
3. Van Hoboken, J.; Fathaigh, R.O. Smartphone platforms as privacy regulators. Comput. Law Secur. Rev. 2021, 41, 105557.
4. Weichbroth, P.; Łysik, Ł. Mobile security: Threats and best practices. Mob. Inf. Syst. 2020, 2020, 8828078.
5. Liu, K.K.; Terzi, E. Towards Identity Anonymization on Graphs. In Proceedings of the ACM SIGMOD International Conference on Management of Data, Vancouver, BC, Canada, 9–12 June 2008.
6. Casas-Roma, J.; Herrera-Joancomartí, J.; Torra, V. A survey of graph-modification techniques for privacy-preserving on networks. Artif. Intell. Rev. 2017, 47, 341–366.
7. Ying, X.; Wu, X. Randomizing Social Networks: A Spectrum Preserving Approach. In Proceedings of the SIAM International Conference on Data Mining, SDM, Atlanta, GA, USA, 24–26 April 2008.
8. Mortazavi, R.; Erfani, S.H. GRAM: An efficient (k, l) graph anonymization method. Expert Syst. Appl. 2020, 153, 113454.
9. Tai, C.H.; Yu, P.S.; Yang, D.N.; Chen, M.S. Privacy-Preserving Social Network Publication Against Friendship Attacks. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Diego, CA, USA, 21–24 April 2011.
10. Zhou, B.; Pei, J.; Luk, W.S. A Brief Survey on Anonymization Techniques for Privacy Preserving Publishing of Social Network Data. ACM Sigkdd Explor. Newsl. 2008, 10, 12–22.
11. Tian, Y.L.; Zhang, Z.Y.; Xiong, J.; Chen, L.; Ma, J.F. Achieving graph clustering privacy preservation based on structure entropy in social IoT. IEEE Internet Things J. 2021, 9, 2761–2777.
12. Zhang, H.; Lin, L.; Xu, L.; Wang, X. Graph partition based privacy-preserving scheme in social networks. J. Netw. Comput. Appl. 2021, 195, 103214.
13. Dwork, C. Differential Privacy. In International Colloquium on Automata, Languages, and Programming; Springer: Berlin/Heidelberg, Germany, 2006.
14. Jiang, H.; Pei, J.; Yu, D.; Yu, J.; Gong, B.; Cheng, X. Applications of differential privacy in social network analysis: A survey. IEEE Trans. Knowl. Data Eng. 2021, 35, 108–127.
15. Lan, S.; Xin, H.; Yingjie, W.; Yongyi, G. Sensitivity reduction of degree histogram publication under node differential privacy via mean filtering. Concurr. Comput. Pract. Exp. 2021, 33, e5621.
16. Cheng, X.; Su, S.; Xu, S.; Xiong, L.; Xiao, K.; Zhao, M. A two-phase algorithm for differentially private frequent subgraph mining. IEEE Trans. Knowl. Data Eng. 2018, 30, 1411–1425.
17. Ding, X.; Zhang, X.; Bao, Z.; Jin, H. Privacy-Preserving Triangle Counting in Large Graphs. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management, Torino, Italy, 22–26 October 2018.
18. Karwa, V.; Slavković, A. Inference using noisy degrees: Differentially private β-model and synthetic graphs. Ann. Stat. 2016, 44, 87–122.
19. Iftikhar, M.; Wang, Q.; Lin, Y. dK-Microaggregation: Anonymizing Graphs with Differential Privacy Guarantees. In Pacific-Asia Conference on Knowledge Discovery and Data Mining; Springer: Cham, Switzerland, 2020.
20. Casas-Roma, J.; Salas, J.; Malliaros, F.D.; Vazirgiannis, M. k-Degree anonymity on directed networks. Knowl. Inf. Syst. 2019, 61, 1743–1768.
21. Zhang, X.L.; Liu, J.; Bi, H.J.; Li, J.; Wang, A.Y. Personalized K-InOut-Degree Anonymity Method for Large-scale Social Networks Based on Hierarchical Community Structure. Int. J. Netw. Secur. 2021, 23, 314–325.
22. Casas-Roma, J. Privacy-Preserving on Graphs Using Randomization and Edge-Relevance. In Proceedings of the Modeling Decisions for Artificial Intelligence, Tokyo, Japan, 29–31 October 2014.
23. Yu, F.; Chen, M.; Yu, B.; Li, W.; Ma, L.; Gao, H. Privacy preservation based on clustering perturbation algorithm for social network. Multimed. Tools Appl. 2018, 77, 11241–11258.
24. Boldi, P.; Bonchi, F.; Gionis, A.; Tassa, T. Injecting uncertainty in graphs for identity obfuscation. Proc. Vldb Endow. 2012, 5, 1376–1387.
25. Hu, J.; Yan, J.; Liu, Z.W.Z.H.; Zhou, Y.H. A Privacy-Preserving Approach in Friendly-Correlations of Graph Based on Edge-Differential Privacy. J. Inf. Sci. Eng. 2019, 35, 821–837.
26. Macwan, K.R.; Patel, S.J. k-NMF Anonymization in Social Network Data Publishing. Computer J. 2018, 61, 601–613.
27. Medková, J. Anonymization of Geosocial Network Data by the (k, l)-Degree Method with Location Entropy Edge Selection. In Proceedings of the 15th International Conference on Availability, Reliability and Security, Online, 25–28 August 2020.
28. Kadhiwala, B.; Patel, S.J. A Novel k-Anonymization Approach to Prevent Insider Attack in Collaborative Social Network Data Publishing. In Proceedings of the International Conference on ISS, Hyderabad, India, 16–20 December 2019.
29. Iftikhar, M.; Wang, Q. dK-Projection: Publishing Graph Joint Degree Distribution with Node Differential Privacy. In Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining, Delhi, India, 11–14 May 2021; Springer: Cham, Switzerland, 2021.
30. Sun, H.; Xiao, X.; Khalil, I.; Yang, Y.; Qin, Z.; Wang, H.; Yu, T. Analyzing Subgraph Statistics from Extended Local Views with Decentralized Differential Privacy. In Proceedings of the ACM SIGSAC Conference on Computer and Communications Security, London, UK, 11–15 November 2019.
31. Lv, T.; Li, H.; Tang, Z.; Fu, F.; Cao, J.; Zhang, J. Publishing Triangle Counting Histogram in Social Networks Based on Differential Privacy. Secur. Commun. Netw. 2021, 2021, 7206179.
32. Task, C.; Clifton, C. What Should We Protect? Defining Differential Privacy for Social Network Analysis. In State of the Art Applications of Social Network Analysis; Springer: Cham, Switzerland, 2014.
33. Karwa, V.; Slavković, A.B. Differentially Private Graphical Degree Sequences and Synthetic Graphs. In Proceedings of the International Conference on Privacy in Statistical Databases, Palermo, Italy, 26–28 September 2012.
34. Qin, Z.; Yu, T.; Yang, Y.; Khalil, I.; Xiao, X.; Ren, K. Generating Synthetic Decentralized Social Graphs with Local Differential Privacy. In Proceedings of the Conference on Computer and Communications Security, Dallas, TX, USA, 30 October–3 November 2017.
35. Xueqin, Z.; Qianru, Z.; Chunhua, G. Published Weighted Social Networks Privacy Preservation Based on Community Division. In Proceedings of the Conference on Communication and Network Security, Tokyo, Japan, 24–26 November 2017.

Figure 1. The undirected graph and the directed graph.

Figure 2. The example of neighboring graph of node.

Figure 3. The framework of method.

Figure 4. The comparison in the different privacy budgets in Wikipedia.

Figure 5. The comparison of the different privacy budgets in Twitter lists.

Figure 6. The comparison of the different methods in Wikipedia.

Figure 7. The comparison of the different methods in Twitter lists.

Figure 8. The comparison of EI in the different data sets.

Figure 9. The comparison of betweenness in the different data sets.

Figure 10. The comparison of the average of in-Cc in the different data sets.

Figure 11. The comparison of the average of out-Cc in the different data sets.

Table 1. The value of CRE in the proposed method when k = 3.

K	$ε$	Physicians	Hamsterster Friendships	Wikipedia−Link	Gnutella	Twitter Lists
3	0.2	72.32	74.38	69.13	71.38	68.18
3	0.5	61.25	58.23	57.79	60.32	56.67
3	1	50.13	47.62	57.21	49.76	45.23
3	1.5	43.62	41.87	38.11	43.98	40.62
3	2	38.78	36.05	25.74	36.32	34.17

Table 2. The value of CRE in the proposed method when $ε$ = 1.

K	$ε$	Physicians	Hamsterster Friendships	Wikipedia−Link	Gnutella	Twitter Lists
2	1	56.39	52.14	50.89	54.67	51.22
3	1	50.13	47.62	45.21	49.76	45.23
4	1	45.82	43.68	40.31	45.33	41.87
5	1	41.22	39.98	36.74	40.79	38.49


© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
