A Practical Multiparty Private Set Intersection Protocol Based on Bloom Filters for Unbalanced Scenarios

Ruan, Ou; Yan, Changwang; Zhou, Jing; Ai, Chaohao

doi:10.3390/app132413215

¹ School of Computer Science, Hubei University of Technology, Wuhan 430068, China

² Digital Art Industry Institute, Hubei University of Technology, Wuhan 430068, China

* Author to whom correspondence should be addressed.

† These authors contributed equally to this work.

Appl. Sci. 2023, 13(24), 13215; https://doi.org/10.3390/app132413215

Submission received: 27 October 2023 / Revised: 9 December 2023 / Accepted: 9 December 2023 / Published: 13 December 2023

Abstract:

Multiparty Private Set Intersection (MPSI) is dedicated to finding the intersection of datasets of multiple participants without disclosing any other information. Although many MPSI protocols have been presented, there are still some important practical scenarios that require in-depth consideration such as an unbalanced scenario, where the server’s dataset is much larger than the clients’ datasets, and in cases where the number of participants is large. This paper proposes a practical MPSI protocol for unbalanced scenarios. The protocol uses the Bloom filter, an efficient data structure, and the ElGamal encryption algorithm to reduce the computation of clients and the server; adopts randomization technology to solve the encryption problem of the 0s in the Bloom filter; and introduces the idea of the Shamir threshold secret-sharing scheme to adapt to multiple environments. A formal security proof and three detailed experiments are given. The results of the experiments showed that the new protocol is very suitable for unbalanced scenarios with a large number of participants, and it has a significant improvement in efficiency compared with the typical related protocol (TIFS 2022).

Keywords:

Multiparty Private Set Intersection; unbalanced scenario; ElGamal cryptography; Shamir threshold key-sharing scheme; Bloom filter

1. Introduction

In the era of Big Data, the intersection of large amounts of data becomes inevitable. Private Set Intersection (PSI) addresses the current need for a solution to compute the intersection of two or more datasets without disclosing information about the data of each party, except for the intersection itself. PSI has wide applications such as social discovery [1,2,3], document-like detection [4], joint learning in neural network models [5], suspect detection [6], privacy-preserving data mining [7], privacy-preserving retrieval systems [8], cloud-based applications [9], and so on. Many high-performance PSI protocols have been proposed recently [10,11,12,13,14,15,16]. Additionally, there are some PSI protocols [10,17,18,19,20,21] specifically designed for unbalanced scenarios, where the server has a large dataset and the client has a dataset much smaller than the server’s size.

Multiparty Private Set Intersection (MPSI) protocols are designed to solve the intersection problem among multiple participants. Traditional approaches for designing MPSI protocols mainly use public-key homomorphic encryption [22,23,24,25,26,27,28] and oblivious transfer methods [29,30,31,32,33,34,35]. For MPSI protocols, the unbalanced scenario is also a common real-life situation in which the server possesses a large dataset, while multiple clients have smaller datasets. For example, some WeChat (version 3.9.8) users want to find their mutual friends in order to create a friend circle or group chat. In this scenario, there is a server that holds information about all registered users and their phone numbers, and there are multiple clients, each with its own phone address book, who want to perform MPSI with the server in order to find the phone numbers common to all of them. We refer to this as Multiparty Private Set Intersection for an unbalanced scenario, where the clients’ address books are small datasets, while the server’s database is very large.

However, most of the existing MPSI protocols are not applicable to such unbalanced scenarios. For example, the protocol in [22] can only derive the approximate number of intersections; the protocols in [23,24] require the participants’ datasets to be of equal size; the protocol in [25] is only applicable to three participants and small datasets; the protocol in [28] can be applied to such scenarios, but the computation for the clients and the server is high.

This paper presents a practical MPSI protocol for unbalanced scenarios in which the server with high computational power handles the majority of the computations, while the clients only need to perform a small amount of computation. Moreover, the protocol has a minimal impact on the execution time of clients as the number of participants increases.

1.1. Related Works

Freedman et al. [22] proposed the first MPSI protocol based on the additive homomorphic encryption technique. This protocol represents the set elements as the roots of polynomials and implements two-party PSI under the semi-honest adversary model. Additionally, they introduced the idea of constructing a Multiparty PSI protocol for the case of malicious adversaries. In 2005, Kissner et al. [23] used additive homomorphic encryption and private key secret-sharing techniques to design an MPSI protocol, whose computational and communication complexity was twice the size of the set and the number of participants. The computational complexity of such algorithms does not meet the requirements of practical applications. After that, reducing the computational and communication overhead of MPSI protocols based on homomorphic cryptography has become an important research goal.

In 2017, Miyaji et al. [26] gave an MPSI protocol based on Bloom filters and homomorphic encryption. This protocol maps the set into a Bloom filter, encrypts it, and performs a homomorphic multiplication operation on the encrypted Bloom filter. The intersection set is then obtained by comparing the results based on the corresponding decryption operation. However, the protocol of Miyaji et al. [26] has a rather obvious drawback that the Bloom filter to be encrypted by each participant is equally large, even if the original set is small.

In the same year, Davidson et al. [27] also proposed a PSI protocol based on Bloom filters and Paillier homomorphic encryption, which overcomes the drawback of Miyaji et al. [26]. In the protocol [27], the operations of mapping the aggregate data and encrypting the Bloom filter are basically the same as in [26]. The difference is that the participant who wants to obtain the intersection maps his/her aggregate data into the encrypted Bloom filter using the same hash functions as those used in the mapping. The intersection is obtained by performing the interactive decryption operation and utilizing the homomorphic properties of the Paillier encryption scheme.

In 2022, Bay et al. [28] extended the protocol [27] to multiple participants by performing threshold Paillier additive homomorphic encryptions on the Bloom filters of each of them. They conducted interactive decryption operations, utilized homomorphic properties, and obtained the intersection as long as the number of decryptors exceeded the threshold. Although their protocol has a longer runtime, it is an open-source Multiparty PSI protocol, and it has a greater advantage in scenarios where multiple participants have small datasets.

1.2. Our Contribution

This paper presents a secure MPSI protocol based on the Bloom filter and Shamir threshold secret-sharing scheme for a large number of participants and unbalanced scenarios.

Our protocol follows the methods of [27,28], which were based on the Bloom filter and homomorphic encryption. The Bloom filter is an efficient data structure, which can be used to reduce the computation for both clients and server. The main difference is that we used the relatively efficient ElGamal multiplicative homomorphic encryption algorithm instead of the Paillier additive homomorphic encryption algorithm. The ElGamal encryption algorithm cannot encrypt 0. To address this issue, we randomized the 0s in the Bloom filter, i.e., we obtained the encrypted Bloom filter by following this process: if an item in the Bloom filter is 1, it is encrypted directly; otherwise, if the item is 0, we select a random value to represent it and, then, encrypt the random value. To ensure that participants can obtain the intersection properly, we adopted the idea of the Shamir t-threshold secret-key-sharing scheme, in which the private key is divided into n copies and at least t shares are needed for decryption.

The contributions of this paper are as follows:

(1) A secure MPSI protocol based on the Bloom filter and threshold ElGamal encryption scheme for unbalanced scenarios is proposed. In this protocol, the runtime of the client hardly varies with the number of participants, and it is linearly correlated with the data size of the server.

(2) We present three comprehensive experiments, and the results showed that, in our protocol, the number of participants $t$ has a minimal influence on the computation and runtime of the client. When $t \geq 2^{4}$, the client’s runtime is approximately $1 / (2^{\log (t)})$ of the server’s runtime. Therefore, our protocol is highly suitable for unbalanced scenarios with a large number of participants. Furthermore, our protocol exhibited a significant efficiency improvement compared to the related protocol [28]: the server’s and clients’ runtimes were approximately $1 / 6$ of those of [28].

(3) We provide a formal security proof of our protocol against semi-honest adversaries.

2. Materials and Methods

This section first describes the notations used in the paper and, then, provides an overview of the preliminaries about the protocol.

2.1. Notations

We show the notations used in the paper in Table 1.

2.2. Preliminaries

2.2.1. ElGamal Encryption Algorithm

In 1985, Taher ElGamal [36] proposed an asymmetric encryption algorithm based on the Diffie–Hellman key exchange. The security of the system primarily relies on the difficulty of solving the discrete logarithm problem in a finite field. The ElGamal encryption algorithm can be divided into three main parts: key generation, encryption, and decryption.

Key generation: First, randomly select a large prime number $p$ such that $p - 1$ has large prime factors. Next, choose a primitive element $g$ modulo $p$. Finally, choose $d$ ($2 \leq d \leq p - 2$) as the private key; then $y = g^{d} \bmod p$ is the public key.

Encryption: Suppose the plaintext is $x$. Calculate the ciphertext pair $C_{1} = g^{r} \bmod p$ and $C_{2} = x \cdot y^{r} \bmod p$, where $r$ is a random number with $2 \leq r \leq p - 2$.

Decryption: Compute the plaintext $x = \frac{C_{2}}{C_{1}^{d}} = \frac{x \cdot y^{r}}{g^{r \cdot d}} = \frac{x \cdot g^{r \cdot d}}{g^{r \cdot d}} \bmod p$.

Multiplicative homomorphism: Assume that the plaintexts are $m_{1}$ and $m_{2}$, that encrypting $m_{1}$ yields the ciphertext pair $(C_{1}^{1}, C_{2}^{1})$, and that encrypting $m_{2}$ yields the ciphertext pair $(C_{1}^{2}, C_{2}^{2})$, where $C_{1}^{1} = g^{r_{1}} \bmod p$, $C_{2}^{1} = m_{1} \cdot g^{d \cdot r_{1}} \bmod p$, $C_{1}^{2} = g^{r_{2}} \bmod p$, and $C_{2}^{2} = m_{2} \cdot g^{d \cdot r_{2}} \bmod p$. Then:

$$ C_{1}^{1} \times C_{1}^{2} = g^{r_{1}} \times g^{r_{2}} = g^{r_{1} + r_{2}} \bmod p \tag{1} $$

$$ C_{2}^{1} \times C_{2}^{2} = m_{1} \cdot g^{d \cdot r_{1}} \times m_{2} \cdot g^{d \cdot r_{2}} = m_{1} \cdot m_{2} \cdot g^{d \cdot (r_{1} + r_{2})} \bmod p \tag{2} $$

Therefore, the modular product of two ciphertexts encrypted with the random numbers $r_{1}$ and $r_{2}$ is equivalent to encrypting the product of the two corresponding plaintexts with the random number $r_{1} + r_{2}$, and both decrypt to the same result, as shown in Equation (3):

$$ Dec(Enc(m_{1} \cdot m_{2})) = Dec(Enc(m_{1}) \cdot Enc(m_{2})) \tag{3} $$
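The key generation, encryption, decryption, and homomorphic multiplication described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the paper: the modulus `P = 2039` is a toy prime, the base `G = 7` is not checked to be a primitive element, and the function names are hypothetical.

```python
import secrets

# Toy parameters (assumptions): a small prime modulus and an unchecked base.
# Real deployments need a much larger modulus and a verified generator.
P = 2039
G = 7

def keygen():
    d = secrets.randbelow(P - 3) + 2           # private key, 2 <= d <= p - 2
    y = pow(G, d, P)                           # public key y = g^d mod p
    return d, y

def encrypt(y, x):
    r = secrets.randbelow(P - 3) + 2           # fresh randomness, 2 <= r <= p - 2
    return pow(G, r, P), (x * pow(y, r, P)) % P

def decrypt(d, c):
    c1, c2 = c
    return (c2 * pow(pow(c1, d, P), -1, P)) % P    # x = C2 / C1^d mod p

def multiply(ca, cb):
    # Component-wise product of two ciphertexts, as in Equations (1) and (2).
    return (ca[0] * cb[0]) % P, (ca[1] * cb[1]) % P
```

Decrypting `multiply(encrypt(y, m1), encrypt(y, m2))` returns `m1 * m2 mod p`, which is the property stated in Equation (3). The modular inverse via `pow(x, -1, P)` requires Python 3.8 or later.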

2.2.2. Bloom Filters

A Bloom filter, $BF = \{BF[1], BF[2], \dots, BF[m]\}$, is an array of bits of length $m$ with $k$ random hash functions $\{h_{1}, h_{2}, \dots, h_{k}\}$, which was proposed by Bloom [37] in 1970. According to the analysis by Dong [38], when a set’s size is $n$ and the error rate is $P_{err}$, we can generally set $m$ and $k$ as follows:

$$ m \geq - n \cdot \log_{2} e \cdot \log_{2} P_{err}, \quad k \geq - \log_{2} P_{err} \tag{4} $$
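To make Equation (4) concrete, the following short sketch computes the smallest integers that satisfy both bounds. The function name `bloom_parameters` and the example values are assumptions chosen for illustration only.

```python
import math

def bloom_parameters(n, p_err):
    """Return the smallest integers (m, k) satisfying Equation (4)."""
    k = math.ceil(-math.log2(p_err))
    m = math.ceil(-n * math.log2(math.e) * math.log2(p_err))
    return m, k
```

For example, with $n = 1000$ and $P_{err} = 2^{-30}$, the sketch gives $k = 30$ hash functions and a filter of roughly $1.44 \cdot n \cdot 30$ bits.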

Bloom filter for dataset $X$: Each bit of the Bloom filter is initialized to 0. Each $x_{i}$ in the dataset $X = \{x_{1}, \dots, x_{n}\}$ is hashed $k$ times as ($h_{1}(x_{i}), \dots, h_{k}(x_{i})$), and $BF[h_{u}(x_{i})] = 1$ is set for $u \in \{1, 2, \dots, k\}$.

Verify whether an element $x$ belongs to a set $X$: If $BF[h_{u}(x)] = 1$ holds for each $u \in \{1, 2, \dots, k\}$, $x$ is considered to belong to the set $X$ at an acceptable error rate $P_{err}$.

Randomized Bloom filter ($RBF$): The $BF$ after performing the following operation is denoted as the $RBF$: for each element $BF[i]$ of the $BF$, $i \in \{1, 2, \dots, m\}$, set $BF[i] = rand()$ if $BF[i] = 0$; if $BF[i] = 1$, no operation is performed.

Encrypted randomized Bloom filter ($ERBF$): The encrypted randomized Bloom filter is denoted as $ERBF$, where $ERBF = \{Enc(RBF[1]), Enc(RBF[2]), \dots, Enc(RBF[m])\}$.
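The construction, membership test, and randomization of a Bloom filter can be sketched as follows. This is an illustrative sketch, not the paper's code: the hash family (SHA-256 salted with the index $u$), the 0-based indexing, and the choice to draw random replacements from $[2, \text{modulus} - 1]$ are all assumptions.

```python
import hashlib
import secrets

def positions(item, m, k):
    # Hypothetical hash family: SHA-256 salted with the hash index u.
    return [int.from_bytes(hashlib.sha256(f"{u}:{item}".encode()).digest(), "big") % m
            for u in range(k)]

def build_bf(items, m, k):
    bf = [0] * m                       # every bit starts at 0
    for x in items:
        for pos in positions(x, m, k):
            bf[pos] = 1                # BF[h_u(x)] = 1
    return bf

def contains(bf, item, k):
    # True if all k mapped bits are 1 (false positives are possible).
    return all(bf[pos] == 1 for pos in positions(item, len(bf), k))

def randomize(bf, modulus):
    # Keep every 1; replace every 0 by a random value in [2, modulus - 1]
    # so that it can be encrypted as a non-zero ElGamal plaintext.
    return [1 if bit == 1 else secrets.randbelow(modulus - 2) + 2 for bit in bf]
```

Encrypting each entry of the list returned by `randomize` with the public key would then give the $ERBF$.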

2.2.3. Lagrange Interpolation

Lagrange interpolation is a polynomial interpolation method named after Joseph Lagrange. Given $n$ points $(x_{1}, y_{1}), \dots, (x_{n}, y_{n})$, a polynomial passing through all points and having degree at most $n - 1$ can be expressed as follows:

$$ f(x) = \sum_{j = 1}^{n} y_{j} \cdot f_{j}(x) \tag{5} $$

where

$$ f_{j}(x) = \prod_{i = 1, i \neq j}^{n} \frac{x - x_{i}}{x_{j} - x_{i}} \tag{6} $$

then

$$ f(x) = \sum_{i = 1}^{n} \left( y_{i} \cdot \prod_{j = 1, j \neq i}^{n} \frac{x - x_{j}}{x_{i} - x_{j}} \right) \tag{7} $$
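As a small check of Equation (7), the sketch below evaluates the interpolating polynomial with exact rational arithmetic. The function name `lagrange_eval` and the sample points are assumptions for illustration.

```python
from fractions import Fraction

def lagrange_eval(points, x):
    """Evaluate the interpolating polynomial of Equation (7) at x."""
    total = Fraction(0)
    for i, (xi, yi) in enumerate(points):
        term = Fraction(yi)
        for j, (xj, _) in enumerate(points):
            if j != i:
                term *= Fraction(x - xj, xi - xj)   # factor (x - x_j) / (x_i - x_j)
        total += term
    return total
```

The three points $(1, 2)$, $(2, 3)$, $(3, 6)$ lie on $f(x) = x^{2} - 2x + 3$, so evaluating at $x = 0$ gives 3 and at $x = 4$ gives 11.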

2.2.4. Shamir Threshold Secret-Sharing Scheme

The Shamir threshold secret-sharing scheme [39], also known as Shamir secret sharing, is a threshold secret-sharing scheme based on the Lagrange interpolation formula.

There are two important parameters in the threshold secret-sharing scheme: t and n. In this scheme, n represents the number of parties involved in splitting the secret, and t is the minimum number of participants needed to recover the secret. The basic principle of the threshold secret-sharing scheme is as follows: a secret s is divided into n sub-secrets, and each party holds one sub-secret. When the secret s needs to be restored, at least t parties need to retrieve the sub-secret and restore the secret s.

The Shamir threshold secret-sharing scheme is as follows:

(i) Choose a large prime $p$, and assume that $s$ ($s \in Z_{p}^{*}$) is the secret that requires at least $t$ parties to recover.

(ii) Construct a random polynomial of degree $t - 1$, $f(x) = a_{0} + a_{1} \cdot x + \dots + a_{t - 1} \cdot x^{t - 1}$, where $a_{0} = s$ and $a_{1}, \dots, a_{t - 1} \in Z_{p}^{*}$.

(iii) Each participant $i$, $i \in \{1, \dots, n\}$, holds a sub-secret $(i, f(i))$, which can be viewed as a point $(x_{i}, y_{i})$ in the Lagrange interpolation formula.

(iv) Since $a_{0} = s$, calculating $f(0) = a_{0}$ restores the secret. Using Lagrange interpolation, $f(x)$ is restored with the sub-secrets of any $t$ parties (written here for parties $1, \dots, t$), and then $f(0)$ is computed as follows:

$$ f(x) = \sum_{i = 1}^{t} \left( f(i) \cdot \prod_{j = 1, j \neq i}^{t} \frac{x - j}{i - j} \right) \bmod p \tag{8} $$

$$ f(0) = \sum_{i = 1}^{t} \left( f(i) \cdot \prod_{j = 1, j \neq i}^{t} \frac{j}{j - i} \right) \bmod p \tag{9} $$
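Steps (i) to (iv) can be sketched as follows. The prime `P = 2039` is a toy value, the function names are hypothetical, and, unlike Equation (9), `reconstruct` accepts the shares of any $t$ parties rather than only parties $1, \dots, t$.

```python
import secrets

P = 2039  # toy prime modulus (assumption); real use needs a large prime

def split(secret, t, n):
    # f(x) = a_0 + a_1 x + ... + a_{t-1} x^{t-1} mod p, with a_0 = secret
    coeffs = [secret] + [secrets.randbelow(P - 1) + 1 for _ in range(t - 1)]
    return {i: sum(a * pow(i, e, P) for e, a in enumerate(coeffs)) % P
            for i in range(1, n + 1)}          # participant i holds (i, f(i))

def reconstruct(shares):
    # f(0) = sum_i f(i) * prod_{j != i} j / (j - i) mod p
    secret = 0
    for i, fi in shares.items():
        delta = 1
        for j in shares:
            if j != i:
                delta = delta * j * pow((j - i) % P, -1, P) % P
        secret = (secret + fi * delta) % P
    return secret
```

With a $(3, 5)$ sharing, any three of the five shares passed to `reconstruct` return the original secret, while fewer than three do not determine it.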

2.2.5. Threshold ElGamal Encryption Scheme

Based on the Shamir secret-sharing scheme [39] and the threshold Paillier encryption scheme proposed by Bay et al. [28], we present a threshold ElGamal encryption scheme. The basic idea of the threshold ElGamal encryption scheme is to divide the private key of the ElGamal encryption algorithm into n copies using the Shamir threshold secret-sharing scheme. Decryption can then be achieved when at least t copies are gathered together.

For a $(t, n)$ threshold ElGamal encryption algorithm, where $n$ is the total number of participants and $t$ is the minimum number of participants required for decryption, the basic scheme is described as follows:

Key generation: Randomly select a large prime $p$, and then choose a primitive element $g$ modulo $p$. Randomly pick $sk = d$ as the private key for the ElGamal algorithm such that $2 \leq d \leq p - 2$. Compute $y = g^{d} \bmod p$, and set $pk = y$ as the public key. The private key is shared as follows: let $a_{0} = d$, and generate the polynomial $f(x) = \sum_{i = 0}^{t - 1} a_{i} x^{i} \bmod p$; the key share of the $j$-th participant is $sk_{j} = f(j) \bmod p$, where $j \in \{1, \dots, n\}$.

Encryption: Assume that the message to be encrypted is $M$. Choose a random $r \in Z_{p}^{*}$, and compute the ciphertexts $c_{1} = g^{r}$ and $c_{2} = M y^{r} = M g^{d r}$.

Share decryption: Among the $t$ participants involved in decryption, the $i$-th participant calculates $c_{1, i} = (c_{1})^{\Delta_{i} \cdot sk_{i}}$, where $\Delta_{i} = \prod_{j = 1, j \neq i}^{t} \frac{j}{j - i}$.

Combination calculation (Comb): The results of decryption are obtained by performing ElGamal homomorphic multiplication operations on the shared decryption results of the $t$ participants:

$$ \prod_{i = 1}^{t} c_{1, i} = (c_{1})^{\sum_{i = 1}^{t} \Delta_{i} \cdot f(i)} \tag{10} $$

From the Shamir threshold secret-sharing scheme (Section 2.2.4), it is known that restoring the key $sk = d$ is calculated as follows:

$$ sk = f(0) = d = \sum_{i = 1}^{t} \left( f(i) \cdot \prod_{j = 1, j \neq i}^{t} \frac{j}{j - i} \right) = \sum_{i = 1}^{t} \Delta_{i} \cdot f(i) \tag{11} $$

where $\Delta_{i} = \prod_{j = 1, j \neq i}^{t} \frac{j}{j - i}$.

Thus, the decryption result can be obtained as follows:

$$ M = \frac{c_{2}}{c_{1}^{d}} = \frac{c_{2}}{(c_{1})^{\sum_{i = 1}^{t} \Delta_{i} \cdot f(i)}} = \frac{c_{2}}{\prod_{i = 1}^{t} c_{1, i}} \tag{12} $$
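The share decryption and combination steps of Equations (10) to (12) can be sketched as below. The sketch makes one assumption that differs from the notation above: it works in a subgroup of prime order $q$ (with the toy values $p = 2q + 1 = 2039$, $q = 1019$, $g = 4$) and reduces key shares and Lagrange coefficients modulo $q$, so that every $\Delta_{i}$ has a modular inverse in the exponent. All function names are hypothetical.

```python
import secrets

# Toy group (assumption): p = 2q + 1 with q prime; g = 4 has order q,
# so exponents, key shares, and Lagrange coefficients live modulo q.
P, Q, G = 2039, 1019, 4

def keygen(t, n):
    d = secrets.randbelow(Q - 1) + 1                       # private key in [1, q - 1]
    coeffs = [d] + [secrets.randbelow(Q) for _ in range(t - 1)]
    shares = {j: sum(a * pow(j, e, Q) for e, a in enumerate(coeffs)) % Q
              for j in range(1, n + 1)}                    # sk_j = f(j)
    return pow(G, d, P), shares

def encrypt(y, m):
    r = secrets.randbelow(Q - 1) + 1
    return pow(G, r, P), m * pow(y, r, P) % P              # (c1, c2)

def share_decrypt(c1, i, sk_i, parties):
    delta = 1                                              # Lagrange coefficient at 0
    for j in parties:
        if j != i:
            delta = delta * j * pow((j - i) % Q, -1, Q) % Q
    return pow(c1, delta * sk_i % Q, P)                    # c_{1,i} = c1^(delta_i * sk_i)

def combine(c2, partials):
    prod = 1
    for part in partials:
        prod = prod * part % P                             # equals c1^d, Equation (10)
    return c2 * pow(prod, -1, P) % P                       # M = c2 / prod, Equation (12)
```

In a $(3, 5)$ setting, any three participants can each call `share_decrypt` on $c_{1}$, and `combine` then recovers $M$ without the private key $d$ ever being reassembled.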

2.2.6. Security Model

Negligible function: For a security parameter $\lambda$, we call a function $negl(\lambda)$ negligible if it satisfies the following condition: $negl(\lambda) < \frac{1}{p(\lambda)}$ for all possible polynomials $p$ and sufficiently large $\lambda$.

Computational indistinguishability: For a sufficiently large $\lambda$, let $X = \{X_{\lambda}\}_{\lambda \in N}$ and $Y = \{Y_{\lambda}\}_{\lambda \in N}$ represent two distributions of length $\lambda$. $X$ and $Y$ are computationally indistinguishable if, for any possible polynomial-time algorithm $T$, the following condition is satisfied:

$$ | Pr[T(X_{\lambda}) \to 1] - Pr[T(Y_{\lambda}) \to 1] | \leq negl(\lambda) \tag{13} $$

where $Pr[E]$ represents the probability of the event $E$ occurring. We denote this by $X \simeq Y$.

Semi-honest model: Each participant strictly adheres to the normal operation of the protocol, including the inputs, outputs, and intermediate processes. In the semi-honest adversary model, however, an adversary attempts to obtain any information about other participants.

Security model under semi-honest adversaries: In a general multiparty computation, one of the participants is called the server $P_{t}$, and the others are called the clients $(P_{1}, \dots, P_{t - 1})$. Let $S = \{S_{1}, \dots, S_{t}\}$ represent the set of inputs from the clients and the server, and let $f(S) = (\bot, \cap)$ represent a function in which the server $P_{t}$ obtains the intersection and the clients obtain nothing. Assume a protocol $\prod_{t}$ with $t$ participants to compute the function $f(S)$. During the execution of the protocol $\prod_{t}$, the server’s view is $View_{P_{t}}^{\prod_{t}} = (S_{t}, r_{t}, \ell_{t}, out_{t})$, and the clients’ views are $View_{P_{i}}^{\prod_{t}} = (S_{i}, r_{i}, \ell_{i}, out_{i})$, $i \in \{1, \dots, t - 1\}$. Here, $r_{i}$ and $r_{t}$ represent the random numbers generated by each of the clients $P_{i}$ and the server $P_{t}$, $\ell_{i}$ and $\ell_{t}$ are the messages received by each of the clients $P_{i}$ and the server $P_{t}$, respectively, and $out_{i}$ and $out_{t}$ represent the outputs of each of the clients $P_{i}$ and the server $P_{t}$.

If there exist polynomial-time simulators $Sim_{t}(S_{t}, \cap)$ and $Sim_{i}(S_{i}, \bot)$ such that $Sim_{t}(S_{t}, \cap)$ is computationally indistinguishable from $View_{P_{t}}^{\prod_{t}}$ and $Sim_{i}(S_{i}, \bot)$ is computationally indistinguishable from $View_{P_{i}}^{\prod_{t}}$ for every $i \in \{1, \dots, t - 1\}$, then $\prod_{t}$ securely computes the function $f(S)$:

$$ \begin{aligned} Sim_{t}(S_{t}, \cap) &\simeq View_{P_{t}}^{\prod_{t}} \\ Sim_{i}(S_{i}, \bot) &\simeq View_{P_{i}}^{\prod_{t}} \end{aligned} \tag{14} $$

A multiparty computation protocol $\prod_{t}$ that satisfies Equation (14) is secure in the presence of a semi-honest adversary.

3. Our Multiparty Private Set Intersection Protocol

In the protocol, we used Bloom filters to reduce the computational complexity, ElGamal multiplicative homomorphic encryption to achieve message confidentiality, and Shamir secret sharing and Lagrange interpolation to apply to multiparty scenarios.

3.1. Protocol Description

This section presents our new MPSI protocol, which is based on the Bloom filter and a threshold ElGamal encryption scheme.

Firstly, the protocol uses an efficient data structure called the Bloom filter, which was also utilized in [27,28], to minimize the computation on both the clients and the server. Secondly, the protocol adopts a relatively efficient ElGamal encryption algorithm instead of the Paillier algorithm. The ElGamal encryption algorithm cannot be used to encrypt 0, but we can avoid this issue by randomizing the 0s in the Bloom filter. Thirdly, in order to be suitable for multiparty environments, a threshold ElGamal encryption scheme was designed by incorporating the concept of the Shamir threshold secret-key-sharing scheme. This scheme divides the private key into n copies for n participants and requires no less than t participants to decrypt the data accurately. Finally, we incorporated time-consuming encryptions into a preprocessing phase, which facilitates efficient execution during the online interaction stage.

The protocol is shown in Figure 1, which is described as follows:

Data input: Participant $P_{i}$’s dataset $S_{i}$, $i \in \{1, 2, \dots, t\}$.

Data output: The intersection set $S = \{S_{1} \cap S_{2} \cap \dots \cap S_{t}\}$.

Initialization:

(1) A trusted third party generates an ElGamal public–private key pair $(pk, sk)$ and a polynomial of degree $t - 1$ for $sk$, $f(x) = sk + a_{1} x + \dots + a_{t - 1} x^{t - 1}$, then distributes $pk$ to $P_{1}, \dots, P_{t}$ and $sk_{i} = f(i)$ to $P_{i}$, $i \in \{1, 2, \dots, t\}$.

(2) The trusted third party then picks $k$ hash functions $h_{1}, \dots, h_{k}$ and sends them to participants $P_{1}, \dots, P_{t}$.

Preprocessing stage:

(1) Each client $P_{i}$, $i \in \{1, 2, \dots, t - 1\}$, generates a hash-mapped Bloom filter $BF_{i}$ for its own private set $S_{i}$, which is then randomized to obtain $RBF_{i}$, and finally, $ERBF_{i}$ is computed by encrypting each value in the randomized Bloom filter $RBF_{i}$ using the public key $pk$.

(2) For each $j \in \{1, 2, \dots, n_{t}\}$, the server $P_{t}$ generates a random number $r_{j}$ and then calculates $w_{j} = Enc(r_{j}) \times Enc(S_{t}[j]) \bmod p$.

Online stage: The server $P_{t}$ receives the encrypted randomized Bloom filters $\{ERBF_{1}, ERBF_{2}, \dots, ERBF_{t - 1}\}$ and starts the intersection calculation.

(1) i. For each item $S_{t}[j]$ of the server $P_{t}$, the hash value $h_{u}(S_{t}[j])$ is computed, where $j \in \{1, \dots, n_{t}\}$, $u \in \{1, \dots, k\}$. Eventually, $n_{t}$ sets of data are obtained, each with $k$ values, as follows:

$$ \begin{matrix} \{h_{1}(S_{t}[1]) & \dots & h_{u}(S_{t}[1]) & \dots & h_{k}(S_{t}[1])\} \\ \vdots & & \vdots & & \vdots \\ \{h_{1}(S_{t}[j]) & \dots & h_{u}(S_{t}[j]) & \dots & h_{k}(S_{t}[j])\} \\ \vdots & & \vdots & & \vdots \\ \{h_{1}(S_{t}[n_{t}]) & \dots & h_{u}(S_{t}[n_{t}]) & \dots & h_{k}(S_{t}[n_{t}])\} \end{matrix} \tag{15} $$

ii. Use each $h_{u}(S_{t}[j])$ as an index into each client’s encrypted randomized Bloom filter $ERBF_{i}$ to obtain $c_{j, i}^{u} = ERBF_{i}[h_{u}(S_{t}[j])]$, where $i \in \{1, \dots, t - 1\}$, $j \in \{1, \dots, n_{t}\}$, $u \in \{1, \dots, k\}$, as follows:

$$ \begin{matrix} \{c_{1, 1}^{1}, \dots, c_{1, 1}^{k}\} & \dots & \{c_{1, i}^{1}, \dots, c_{1, i}^{k}\} & \dots & \{c_{1, t - 1}^{1}, \dots, c_{1, t - 1}^{k}\} \\ \vdots & & \vdots & & \vdots \\ \{c_{j, 1}^{1}, \dots, c_{j, 1}^{k}\} & \dots & \{c_{j, i}^{1}, \dots, c_{j, i}^{k}\} & \dots & \{c_{j, t - 1}^{1}, \dots, c_{j, t - 1}^{k}\} \\ \vdots & & \vdots & & \vdots \\ \{c_{n_{t}, 1}^{1}, \dots, c_{n_{t}, 1}^{k}\} & \dots & \{c_{n_{t}, i}^{1}, \dots, c_{n_{t}, i}^{k}\} & \dots & \{c_{n_{t}, t - 1}^{1}, \dots, c_{n_{t}, t - 1}^{k}\} \end{matrix} \tag{16} $$

iii. For each set of data in Equation (16), a homomorphic multiplication is performed to compute $c_{j, i} = c_{j, i}^{1} \times \dots \times c_{j, i}^{k}$, where $i \in \{1, \dots, t - 1\}$, $j \in \{1, \dots, n_{t}\}$. The resulting data are as follows:

$$ \begin{matrix} c_{1, 1} & \dots & c_{1, i} & \dots & c_{1, t - 1} \\ \vdots & & \vdots & & \vdots \\ c_{j, 1} & \dots & c_{j, i} & \dots & c_{j, t - 1} \\ \vdots & & \vdots & & \vdots \\ c_{n_{t}, 1} & \dots & c_{n_{t}, i} & \dots & c_{n_{t}, t - 1} \end{matrix} \tag{17} $$

iv. For each row of data in Equation (17), a homomorphic multiplication is performed to compute $c_{j} = c_{j, 1} \times \dots \times c_{j, t - 1} \times w_{j}$, where $j \in \{1, \dots, n_{t}\}$.

(2) Each participant $P_{i}$ performs the share decryption after obtaining the data $c_{j}$, which is computed as follows:

$$ sh_{j, i} = (c_{j})^{\Delta_{i} \cdot sk_{i}} \tag{18} $$

where $\Delta_{i} = \prod_{j' = 1, j' \neq i}^{t} \frac{j'}{j' - i}$, $i \in \{1, \dots, t\}$, $j \in \{1, 2, \dots, n_{t}\}$.

(3) $P_{t}$ performs the combination computation, denoted as $Comb(sh_{j, 1}, \dots, sh_{j, t})$, which yields $Dec(c_{j})$. If $Dec(c_{j}) = Dec(w_{j}) = S_{t}[j] \cdot r_{j}$, the server adds the corresponding $S_{t}[j]$ to the intersection, $S = \{S_{t}[j]\} \cup S$. The range of $j$ above is $\{1, 2, \dots, n_{t}\}$.
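The preprocessing and online stages can be illustrated with the toy sketch below. It is deliberately simplified and is not private: a single ElGamal key `d` held by the caller replaces the threshold share decryption and combination of steps (2) and (3). The modulus $2^{61} - 1$, the base `G = 3`, the filter size `M_BITS`, the hash count `K`, the SHA-256-based hash family, and all function names are assumptions, and set elements are assumed to be integers in $[1, p - 1]$.

```python
import hashlib
import secrets

P = 2**61 - 1          # Mersenne prime used as a toy modulus (assumption)
G = 3                  # assumed base; not checked to be a primitive element
M_BITS, K = 512, 8     # hypothetical Bloom filter length and hash count

def rand_unit():
    return secrets.randbelow(P - 2) + 2            # random value in [2, p - 1]

def enc(y, x):
    r = rand_unit()
    return pow(G, r, P), x * pow(y, r, P) % P

def dec(d, c):
    return c[1] * pow(pow(c[0], d, P), -1, P) % P

def mul(a, b):
    return a[0] * b[0] % P, a[1] * b[1] % P        # homomorphic multiplication

def positions(x):
    return [int.from_bytes(hashlib.sha256(f"{u}:{x}".encode()).digest(), "big") % M_BITS
            for u in range(K)]

def client_erbf(y, items):
    bf = [0] * M_BITS
    for x in items:
        for pos in positions(x):
            bf[pos] = 1
    rbf = [1 if b else rand_unit() for b in bf]    # randomize the 0s
    return [enc(y, v) for v in rbf]                # encrypt every entry

def server_intersection(d, y, server_items, erbfs):
    result = set()
    for s in server_items:
        r = rand_unit()
        c = mul(enc(y, r), enc(y, s))              # w_j from the preprocessing stage
        for erbf in erbfs:                         # online steps ii to iv
            for pos in positions(s):
                c = mul(c, erbf[pos])
        if dec(d, c) == r * s % P:                 # test of step (3)
            result.add(s)
    return result
```

For instance, with client sets `[101, 202, 303]` and `[202, 303, 404]` and server set `[202, 303, 505, 606]`, the sketch is expected to return the elements common to all three sets, up to the Bloom filter error rate. In the protocol itself, the final decryption would instead be assembled from the $t$ share decryptions of Equation (18).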

3.2. Protocol Correctness

In the protocol, the server obtains the final intersection result, from which we start to illustrate the correctness of our protocol.

For $j \in \{1, 2, \dots, n_{t}\}$:

$$ \begin{aligned} Dec(c_{j}) &= Dec(c_{j, 1} \times \dots \times c_{j, t - 1} \times w_{j}) \\ &= Dec((c_{j, 1}^{1} \times \dots \times c_{j, 1}^{k}) \times \dots \times (c_{j, t - 1}^{1} \times \dots \times c_{j, t - 1}^{k}) \\ &\quad \times Enc(r_{j}) \times Enc(S_{t}[j])) \\ &= Dec((ERBF_{1}(h_{1}(S_{t}[j])) \times \dots \times ERBF_{1}(h_{k}(S_{t}[j]))) \times \dots \\ &\quad \times (ERBF_{t - 1}(h_{1}(S_{t}[j])) \times \dots \times ERBF_{t - 1}(h_{k}(S_{t}[j]))) \\ &\quad \times Enc(r_{j}) \times Enc(S_{t}[j])) \\ &= (RBF_{1}(h_{1}(S_{t}[j])) \times \dots \times RBF_{1}(h_{k}(S_{t}[j]))) \times \dots \\ &\quad \times (RBF_{t - 1}(h_{1}(S_{t}[j])) \times \dots \times RBF_{t - 1}(h_{k}(S_{t}[j]))) \times r_{j} \times S_{t}[j] \end{aligned} $$

If $Dec(c_{j}) = r_{j} \times S_{t}[j] \bmod p$, then

$$ \forall i \in \{1, \dots, t - 1\}, \quad BF_{i}(h_{1}(S_{t}[j])) \times \dots \times BF_{i}(h_{k}(S_{t}[j])) = 1 \tag{19} $$

In this case, by the nature of Bloom filters, we know that $S_{t}[j]$ is in every set of $\{S_{1}, \dots, S_{t - 1}\}$, i.e., $S_{t}[j]$ is an intersection element of the participants $\{P_{1}, \dots, P_{t}\}$.

If $Dec(c_{j}) \neq r_{j} \times S_{t}[j] \bmod p$, then

$$ \exists i \in \{1, \dots, t - 1\}, \quad RBF_{i}(h_{1}(S_{t}[j])) \times \dots \times RBF_{i}(h_{k}(S_{t}[j])) \neq 1 \tag{20} $$

In this case, there exists $i \in \{1, \dots, t - 1\}$ such that the values in the randomized Bloom filter $RBF_{i}$ that $S_{t}[j]$ maps to are not all 1, i.e., $S_{t}[j]$ is not an intersection element of the participants $\{P_{1}, \dots, P_{t}\}$.

In summary, we can determine whether an element belongs to the intersection by using the nature of the Bloom filter. Therefore, the unbalanced MPSI protocol based on the threshold ElGamal encryption scheme is correct.
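As a small worked instance of this argument, consider an assumed toy setting with $t = 3$ participants (two clients and the server) and $k = 2$ hash functions, where $\rho$ denotes one of the random values placed in a randomized Bloom filter and the other three mapped entries equal 1 in the last line:

```latex
% Assumed toy instance: t = 3 (two clients), k = 2 hash functions.
\begin{aligned}
Dec(c_{j}) &= RBF_{1}(h_{1}(S_{t}[j])) \cdot RBF_{1}(h_{2}(S_{t}[j])) \cdot RBF_{2}(h_{1}(S_{t}[j])) \cdot RBF_{2}(h_{2}(S_{t}[j])) \cdot r_{j} \cdot S_{t}[j] \bmod p \\
\text{all four entries equal } 1: \quad Dec(c_{j}) &= r_{j} \cdot S_{t}[j] \bmod p \\
RBF_{2}(h_{2}(S_{t}[j])) = \rho \neq 1: \quad Dec(c_{j}) &= \rho \cdot r_{j} \cdot S_{t}[j] \bmod p \neq r_{j} \cdot S_{t}[j] \bmod p
\end{aligned}
```

The inequality in the last line holds because $r_{j} \cdot S_{t}[j]$ is invertible modulo $p$. When several mapped entries are random, the test fails unless their product happens to equal 1 modulo $p$.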

3.3. Security Analysis

We employed the widely accepted simulation paradigm [40] to demonstrate the security of the new protocol. The essence of the simulation paradigm is that an actual multiparty computation protocol is considered secure if the participants do not gain more information from it than they would from an ideal protocol.

We simulated the protocols for two scenarios: one where the adversary-controlled participant includes the server and another where the adversary-controlled participant does not include the server. We demonstrate that the information obtained from the actual protocol in both scenarios is computationally indistinguishable from that obtained from the ideal protocol, assuming a semi-honest adversary model.

Theorem 1.

If the threshold ElGamal encryption scheme in Section 2.2.5 is secure, then the unbalanced MPSI protocol ∏ in this paper is secure in the case of semi-honest adversaries.

Proof.

To analyze the security of our protocol, we assume that ℓ participants are controlled by an adversary, where $\ell < t$. The protocol is executed by $t$ participants, and we only consider the case $\ell < t$, because the analysis is meaningless if all participants are controlled by the adversary. We divide the analysis into two cases: in the first, the adversary controls ℓ clients, and the server $P_{t}$ is not among the ℓ participants controlled by the adversary; in the second, the server $P_{t}$ is among the ℓ participants controlled by the adversary, i.e., the adversary controls $\ell - 1$ clients and the server $P_{t}$.

Scenario 1: Server $P_{t}$ is not among the ℓ participants controlled by the adversary.

Suppose the ℓ participants $P_{1}, \dots, P_{\ell}$ are controlled by the adversary, which can access the inputs of these ℓ participants, as well as the intermediate values generated by the computation. From the Shamir threshold secret-sharing scheme, it is clear that, when $\ell < t$, the private key corresponding to the public key cannot be inferred from the information of the ℓ participants in our protocol. We denote the view of the participants $P_{1}, \dots, P_{\ell}$ in the real protocol as $View_{RN}^{\prod}$, shown in Equation (21), where $\{S_{1}, \dots, S_{\ell}\}$ denote the original datasets of the participants $\{P_{1}, \dots, P_{\ell}\}$, $\{ERBF_{1}, \dots, ERBF_{\ell}\}$ are the encrypted randomized Bloom filters, and $\{sh_{1}, \dots, sh_{\ell}\}$ denote the intermediate results of the computation.

$$ View_{RN}^{\prod} = (\{S_{1}, \dots, S_{\ell}\}, \{ERBF_{1}, \dots, ERBF_{\ell}\}, \{sh_{1}, \dots, sh_{\ell}\}) \tag{21} $$

Construct a simulator $Sim_{N}$ to simulate a situation where the adversary controls ℓ participants, with inputs from these ℓ participants, and performs the following steps:

(1) Create an empty view $Sim_{N}(\{S_{1}, \dots, S_{\ell}\}, \bot)$, and add the sets $\{S_{1}, \dots, S_{\ell}\}$ and the encrypted randomized Bloom filters $\{ERBF_{1}, \dots, ERBF_{\ell}\}$ to the view.

(2) Simulate the inputs of the participants $\{P_{\ell + 1}, \dots, P_{t}\}$, and create random sets $\{S_{\ell + 1}^{'}, \dots, S_{t - 1}^{'}\}$ and a random set $S_{t}^{'}$ containing $n_{t}$ elements.

(3) Use the sets $\{S_{\ell + 1}^{'}, \dots, S_{t - 1}^{'}\}$ to generate the Bloom filters $\{BF_{\ell + 1}^{'}, \dots, BF_{t - 1}^{'}\}$ and the encrypted randomized Bloom filters $\{ERBF_{\ell + 1}^{'}, \dots, ERBF_{t - 1}^{'}\}$.

(4) Generate random values $r_{j}'$, then compute $w_{j}'$, where $j \in \{1, 2, \dots, n_{t}\}$.

$$w_{j}' = Enc(r_{j}') \times Enc(S_{t}[j]') \bmod p \tag{22}$$

(5) For each $S_{t}[j]'$, $j \in \{1, 2, \dots, n_{t}\}$, each $ERBF_{i}$ ($i \in \{1, \dots, \ell\}$), and each $ERBF_{i}'$ ($i \in \{\ell + 1, \dots, t - 1\}$), compute $\{(c_{j, i}^{1})', \dots, (c_{j, i}^{k})'\}$ and $c_{j, i}'$:

$$c_{j, i}' = (c_{j, i}^{1})' \times \dots \times (c_{j, i}^{k})' \bmod p \tag{23}$$

(6) For $j \in \{1, 2, \dots, n_{t}\}$, obtain $c_{j}'$ by a homomorphic multiplication of all the mapped values in the encrypted Bloom filters corresponding to $S_{t}[j]'$.

$$c_{j}' = c_{j, 1}' \times \dots \times c_{j, t - 1}' \times w_{j}' \bmod p \tag{24}$$

(7) Compute $sh_{i}' = \{sh_{1, i}', \dots, sh_{n_{t}, i}'\}$ by the shared decryption of $c_{j}'$ with the private key share $sk_{i}$, as expressed in Equation (25), where $\Delta_{i} = \prod_{j' = 1, j' \neq i}^{t} \frac{j'}{j' - i}$, $i \in \{1, \dots, \ell\}$, and $j \in \{1, 2, \dots, n_{t}\}$.

$$sh_{j, i}' = (c_{j}')^{\Delta_{i} \cdot sk_{i}} \bmod p \tag{25}$$

(8) Insert $\{sh_{1}', \dots, sh_{\ell}'\}$ into the view. Thus, the view of the simulator $Sim_{N}$ is:

$$Sim_{N}(\{S_{1}, \dots, S_{\ell}\}, \perp) = (\{S_{1}, \dots, S_{\ell}\}, \{ERBF_{1}, \dots, ERBF_{\ell}\}, \{sh_{1}', \dots, sh_{\ell}'\}) \tag{26}$$

Because the simulator $Sim_{N}$ controls only $\ell$ participants, none of which is the server, it obtains only the key shares of these $\ell$ participants. Joint decryption in the threshold ElGamal encryption scheme requires the decryption shares of at least $t$ participants, so $Sim_{N}$ cannot carry out the joint computation and cannot obtain the final decryption result.

$View_{RN}^{\Pi} = (\{S_{1}, \dots, S_{\ell}\}, \{ERBF_{1}, \dots, ERBF_{\ell}\}, \{sh_{1}, \dots, sh_{\ell}\})$ is the view of the real protocol for the participants $P_{1}, \dots, P_{\ell}$, and $Sim_{N}(\{S_{1}, \dots, S_{\ell}\}, \perp) = (\{S_{1}, \dots, S_{\ell}\}, \{ERBF_{1}, \dots, ERBF_{\ell}\}, \{sh_{1}', \dots, sh_{\ell}'\})$ is the view of the simulator $Sim_{N}$. In both views, $\{S_{1}, \dots, S_{\ell}\}$ and $\{ERBF_{1}, \dots, ERBF_{\ell}\}$ are the same. Since the threshold ElGamal encryption algorithm and random values are used, $\{sh_{1}, \dots, sh_{\ell}\}$ and $\{sh_{1}', \dots, sh_{\ell}'\}$ are computationally indistinguishable. Thus,

$$View_{RN}^{\Pi} \simeq Sim_{N}(\{S_{1}, \dots, S_{\ell}\}, \perp) \tag{27}$$
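The homomorphic multiplication used in steps (4) to (6) can be illustrated with a minimal Python sketch of textbook ElGamal. It only shows that multiplying ciphertexts component-wise multiplies the underlying plaintexts, which is the property Equations (22) to (24) rely on. The tiny group parameters `P`, `Q`, `G`, the helper names, and the sample plaintexts are assumptions made for illustration; they are not taken from the authors' C++ implementation, and the toy modulus offers no security.

```python
import random

# Toy group parameters (assumption, for illustration only): P = 2*Q + 1 is a
# small safe prime and G = 4 generates the subgroup of prime order Q.
P, Q, G = 1019, 509, 4

def keygen(rng):
    """Return (sk, pk) with pk = G^sk mod P."""
    sk = rng.randrange(1, Q)
    return sk, pow(G, sk, P)

def enc(pk, m, rng):
    """ElGamal encryption of m in Z_P^*: (G^r, m * pk^r) mod P."""
    r = rng.randrange(1, Q)
    return pow(G, r, P), (m * pow(pk, r, P)) % P

def dec(sk, ct):
    """Recover m = c2 / c1^sk mod P (inverse via Fermat's little theorem)."""
    c1, c2 = ct
    return (c2 * pow(pow(c1, sk, P), P - 2, P)) % P

def hom_mul(*cts):
    """Component-wise product of ciphertexts, i.e. the 'x' operation."""
    c1, c2 = 1, 1
    for a, b in cts:
        c1, c2 = (c1 * a) % P, (c2 * b) % P
    return c1, c2

if __name__ == "__main__":
    rng = random.Random(0)
    sk, pk = keygen(rng)
    r_j, element = 5, 77   # hypothetical stand-ins for r_j' and S_t[j]'
    w_j = hom_mul(enc(pk, r_j, rng), enc(pk, element, rng))  # shape of Eq. (22)
    c_j = hom_mul(enc(pk, 1, rng), enc(pk, 1, rng), w_j)     # shape of Eq. (24)
    # The product decrypts to the product of all plaintexts.
    print(dec(sk, c_j) == (r_j * element) % P)
```

In this sketch a single key pair decrypts directly; in the protocol the private key is shared, so decryption goes through the shares of step (7) instead.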

Scenario 2: Server $P_{t}$ is among the $\ell$ participants controlled by the adversary.

Suppose that the clients $\{P_{1}, \dots, P_{\ell - 1}\}$ and the server $P_{t}$ are controlled by the adversary, which can access the inputs and outputs of these $\ell$ participants. As in Scenario 1, when $\ell < t$, the private key corresponding to the public key cannot be inferred from the information held by the $\ell$ participants. In the real protocol, the view of these $\ell$ participants is denoted by $View_{RY}^{\Pi}$, given in Equation (28), where $\{S_{1}, \dots, S_{\ell - 1}\}$ are the input sets of the clients $\{P_{1}, \dots, P_{\ell - 1}\}$, $S_{t}$ is the original set of the server $P_{t}$, $\{ERBF_{1}, \dots, ERBF_{\ell - 1}\}$ are the encrypted randomized Bloom filters, $\{sh_{1}, \dots, sh_{\ell - 1}, sh_{t}\}$ are the intermediate results of the computation, and $\cap$ is the final intersection obtained by the server $P_{t}$.

$$View_{RY}^{\Pi} = (\{S_{1}, \dots, S_{\ell - 1}, S_{t}\}, \{ERBF_{1}, \dots, ERBF_{\ell - 1}\}, \{sh_{1}, \dots, sh_{\ell - 1}, sh_{t}\}, \cap) \tag{28}$$

Construct a simulator $Sim_{Y}$ that, given the inputs and outputs of the $\ell$ participants controlled by the adversary (including the server), simulates their view by performing the following steps:

(1) Create an empty view $Sim_{Y}(\{S_{1}, \dots, S_{\ell - 1}, S_{t}\}, \cap)$, and add the sets $\{S_{1}, \dots, S_{\ell - 1}, S_{t}\}$, the encrypted randomized Bloom filters $\{ERBF_{1}, \dots, ERBF_{\ell - 1}, ERBF_{t}\}$, and the intersection $\cap$ obtained by the server to the view.

(2) Simulate the inputs of the clients $\{P_{\ell}, \dots, P_{t - 1}\}$, and create random sets $\{S_{\ell}', \dots, S_{t - 1}'\}$ whose intersection with the server's set equals $\cap$.

(3) Use the sets $\{S_{\ell}', \dots, S_{t - 1}'\}$ to generate the Bloom filters $\{BF_{\ell}', \dots, BF_{t - 1}'\}$ and the encrypted randomized Bloom filters $\{ERBF_{\ell}', \dots, ERBF_{t - 1}'\}$.

(4) Generate random values $r_{j}'$, and compute $w_{j}'$, where $j \in \{1, 2, \dots, n_{t}\}$.

$$w_{j}' = Enc(r_{j}') \times Enc(S_{t}[j]) \bmod p \tag{29}$$

(5) For each $S_{t}[j]$ ($j \in \{1, 2, \dots, n_{t}\}$), each $ERBF_{i}$ ($i \in \{1, \dots, \ell - 1\}$), and each $ERBF_{i}'$ ($i \in \{\ell, \dots, t - 1\}$), compute $\{(c_{j, i}^{1})', \dots, (c_{j, i}^{k})'\}$ and $c_{j, i}'$:

$$c_{j, i}' = (c_{j, i}^{1})' \times \dots \times (c_{j, i}^{k})' \bmod p \tag{30}$$

(6) For $j \in \{1, 2, \dots, n_{t}\}$, compute $c_{j}'$.

$$c_{j}' = c_{j, 1}' \times \dots \times c_{j, t - 1}' \times w_{j}' \bmod p \tag{31}$$

(7) Compute $sh_{i}' = \{sh_{1, i}', \dots, sh_{n_{t}, i}'\}$ by the shared decryption of $c_{j}'$ with the private key share $sk_{i}$, as expressed in Equation (32), where $\Delta_{i} = \prod_{j' = 1, j' \neq i}^{t} \frac{j'}{j' - i}$, $i \in \{1, \dots, \ell - 1, t\}$, and $j \in \{1, 2, \dots, n_{t}\}$.

$$sh_{j, i}' = (c_{j}')^{\Delta_{i} \cdot sk_{i}} \bmod p \tag{32}$$

(8) Insert $\{sh_{1}', \dots, sh_{\ell - 1}', sh_{t}'\}$ into the view. Thus, the view of the simulator $Sim_{Y}$ is:

$$Sim_{Y}(\{S_{1}, \dots, S_{\ell - 1}, S_{t}\}, \cap) = (\{S_{1}, \dots, S_{\ell - 1}, S_{t}\}, \{ERBF_{1}, \dots, ERBF_{\ell - 1}\}, \{sh_{1}', \dots, sh_{\ell - 1}', sh_{t}'\}) \tag{33}$$

Because the simulator controls only $\ell$ participants, it obtains only the key shares of these $\ell$ participants. Joint decryption in the threshold ElGamal encryption scheme requires the shares of at least $t$ participants, so the simulator cannot carry out the joint decryption itself and cannot obtain the final decryption results. In this scenario, however, the simulator still obtains the final intersection, because it controls the server $P_{t}$.

In the real view $View_{RY}^{\Pi}$ and in the simulator's view $Sim_{Y}(\{S_{1}, \dots, S_{\ell - 1}, S_{t}\}, \cap)$, the sets $\{S_{1}, \dots, S_{\ell - 1}, S_{t}\}$ and the encrypted randomized Bloom filters $\{ERBF_{1}, \dots, ERBF_{\ell - 1}\}$ are identical. Moreover, $\{sh_{1}, \dots, sh_{\ell - 1}, sh_{t}\}$ and $\{sh_{1}', \dots, sh_{\ell - 1}', sh_{t}'\}$ are computationally indistinguishable due to the security of the threshold ElGamal encryption algorithm. Thus,

$$View_{RY}^{\Pi} \simeq Sim_{Y}(\{S_{1}, \dots, S_{\ell - 1}, S_{t}\}, \cap) \tag{34}$$

From the above two scenarios, it follows that a semi-honest adversary who controls $\ell$ participants, with $\ell < t$, cannot infer the inputs of the honest participants from the intermediate values and the final result. The proposed protocol is therefore secure against semi-honest adversaries. □
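The argument that $\ell < t$ key shares are not enough rests on how the decryption shares of Equations (25) and (32) combine. The following Python sketch illustrates this under stated assumptions: the private key is shared with a $(t, t)$ Shamir polynomial over $Z_{Q}$, the exponent $\Delta_{i} \cdot sk_{i}$ is applied to the first ElGamal ciphertext component (the usual threshold ElGamal convention, assumed here), and the toy group `P`, `Q`, `G` and all function names are hypothetical choices for illustration rather than the authors' code.

```python
import random

# Toy group (assumption): P = 2*Q + 1 is a safe prime, G has prime order Q.
P, Q, G = 1019, 509, 4

def share_secret(sk, t, rng):
    """(t, t) Shamir sharing over Z_Q: random degree t-1 polynomial f, f(0) = sk."""
    coeffs = [sk] + [rng.randrange(Q) for _ in range(t - 1)]
    return {i: sum(a * pow(i, e, Q) for e, a in enumerate(coeffs)) % Q
            for i in range(1, t + 1)}

def lagrange_delta(i, ids):
    """Delta_i = prod_{j != i} j / (j - i) mod Q (Lagrange coefficient at 0)."""
    num, den = 1, 1
    for j in ids:
        if j != i:
            num = (num * j) % Q
            den = (den * (j - i)) % Q
    return (num * pow(den, Q - 2, Q)) % Q

def partial_dec(c1, i, sk_i, ids):
    """Decryption share sh_i = c1^(Delta_i * sk_i) mod P."""
    return pow(c1, (lagrange_delta(i, ids) * sk_i) % Q, P)

def combine(c2, shares):
    """Multiply the shares to obtain c1^sk, then remove that mask from c2."""
    mask = 1
    for s in shares:
        mask = (mask * s) % P
    return (c2 * pow(mask, P - 2, P)) % P

if __name__ == "__main__":
    rng = random.Random(7)
    t, sk, message = 5, 123, 321
    pk = pow(G, sk, P)
    key_shares = share_secret(sk, t, rng)
    r = rng.randrange(1, Q)
    c1, c2 = pow(G, r, P), (message * pow(pk, r, P)) % P
    ids = list(range(1, t + 1))
    dec_shares = [partial_dec(c1, i, key_shares[i], ids) for i in ids]
    print(combine(c2, dec_shares) == message)       # all t shares
    print(combine(c2, dec_shares[:-1]) == message)  # one share missing
```

Because $\sum_{i} \Delta_{i} \cdot sk_{i} \equiv sk \pmod{Q}$ holds only when all $t$ terms are present, dropping any non-trivial share leaves a mask that does not cancel.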

4. Evaluation and Results’ Discussion

This section presents the C++ implementation of the proposed protocol and a comparative analysis with the protocol of Bay et al. [28].

4.1. Protocol Implementation

We implemented our protocol in C++ on a Linux platform and compared it with the related protocol [28]. In our experiments, we set the false-positive probability $P_{err}$ of the Bloom filter to $2^{-30}$, the length $m$ of the Bloom filter to $-size \cdot \log_{2} e \cdot \log_{2} P_{err}$, and the number of hash functions $k$ to $-\log_{2} P_{err}$, where $size$ is the server's set size. For the ElGamal encryption used in the protocol, we used a 1024-bit security parameter.
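As a quick check of these parameter formulas, the short Python sketch below evaluates $m$ and $k$ for a given set size and error probability. Rounding both values up to integers, the small floating-point tolerance, and the function name `bloom_parameters` are assumptions of this sketch; the paper states only the formulas.

```python
import math

def bloom_parameters(size, p_err):
    """Return (m, k) with m = -size*log2(e)*log2(p_err) and k = -log2(p_err).

    Rounding both values up to integers is an assumption of this sketch.
    """
    bits = -math.log2(p_err)                       # 30 for p_err = 2^-30
    k = math.ceil(bits - 1e-9)                     # tolerance guards float noise
    m = math.ceil(size * math.log2(math.e) * bits - 1e-9)
    return m, k

if __name__ == "__main__":
    for exp in (8, 10, 14):                        # set sizes used in Section 4.2
        m, k = bloom_parameters(2 ** exp, 2 ** -30)
        print(f"size=2^{exp}: m={m}, k={k}, bits per element={m / 2 ** exp:.2f}")
```

With $P_{err} = 2^{-30}$ the filter needs $30 \cdot \log_{2} e \approx 43.3$ bits per element, so $m$ grows linearly with the set size while $k$ stays fixed at 30.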

4.2. Performance Analysis

We compared our MPSI protocol with the related protocol [28] by running each protocol multiple times in the same environment and averaging the results. We conducted three experiments. To ensure a fair comparison, we changed the multithreaded computation in the source code of [28] to single-threaded computation. Additionally, for both protocols, we did not measure the time the clients took to generate their encrypted Bloom filters.

4.2.1. Experiment 1

The main difference between our protocol and [28] is the encryption algorithm: [28] uses the Paillier encryption algorithm, whereas we adopted the ElGamal encryption algorithm. First, with the same security parameter, the ElGamal encryption algorithm produces a ciphertext consisting of two elements, while the Paillier encryption algorithm produces only one. However, a ciphertext element produced by the Paillier encryption algorithm is twice as long as one produced by the ElGamal encryption algorithm. Therefore, both algorithms have the same ciphertext size and hence the same communication complexity.

Second, we benchmarked the two algorithms using the NTL library [41]. We set the value of the encrypted data to $10^{7}$ and chose a 1024-bit security parameter for public key encryption. The size of the dataset was increased from $10^{2}$ to $10^{5}$, and the encryption process used the same random value. The experimental results are presented in Table 2 and show that the ElGamal encryption algorithm is more efficient than the Paillier encryption algorithm: it encrypts approximately twice as fast and decrypts about four times as fast.
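The stated speedups can be recomputed from the timings in Table 2. The Python sketch below only divides the published Paillier times by the corresponding ElGamal times; the variable and function names are illustrative.

```python
# Timings in seconds copied from Table 2 (data sizes 10^2 to 10^5).
SIZES = [10 ** 2, 10 ** 3, 10 ** 4, 10 ** 5]
PAILLIER = {"encryption": [0.502, 4.158, 39.910, 328.851],
            "decryption": [0.404, 3.735, 39.926, 366.583]}
ELGAMAL = {"encryption": [0.184, 1.957, 16.695, 160.772],
           "decryption": [0.105, 0.950, 9.656, 92.483]}

def speedups(operation):
    """Paillier time divided by ElGamal time, one value per data size."""
    return [p / e for p, e in zip(PAILLIER[operation], ELGAMAL[operation])]

if __name__ == "__main__":
    for operation in ("encryption", "decryption"):
        row = ", ".join(f"{n}: {s:.2f}x" for n, s in zip(SIZES, speedups(operation)))
        print(f"{operation}: {row}")
```

For these measurements the encryption speedup lies between roughly 2.0 and 2.7, and the decryption speedup stays close to 4 at every data size.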

4.2.2. Experiment 2

We set the dataset size of each client and of the server to $2^{8}$ and increased the number of participants from $2^{4}$ to $2^{9}$ to measure the runtime of an individual client and of the server. The experimental results are shown in Table 3 and Figure 2. The runtime of the clients remained essentially constant in both our protocol and the protocol of [28], regardless of the number $t$ of participants. Moreover, in our protocol, the client's runtime relative to the server's runtime decreased roughly in proportion to $1 / 2^{\log_{2} t}$, that is, $1 / t$, when $t \geq 2^{4}$. Our protocol ran more efficiently than the protocol of [28]. This is mainly because the clients in [28] perform the $ShDec0()$ operation, which involves exponentiations on a large integer $n_{t}$ with large random values. In our protocol, the client $P_{i}$ needs to compute $\Delta_{i} = \prod_{j' = 1, j' \neq i}^{t} \frac{j'}{j' - i}$ only once and $sh_{j, i} = (c_{j})^{\Delta_{i} \cdot sk_{i}}$ $n_{t}$ times, so the only computation that depends on the number of participants is the single evaluation of $\Delta_{i}$, which is negligible in the client's total runtime.
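The relation between the client and server runtimes can be checked against the "Ours" rows of Table 3. The Python sketch below reuses only the published timings; comparing the measured ratio with $t / 2^{4}$ is an observation about these specific numbers, not a cost model taken from the paper.

```python
# "Ours" rows of Table 3 (seconds); T holds the numbers of participants.
T = [2 ** e for e in range(4, 10)]
CLIENT = [0.356, 0.422, 0.387, 0.401, 0.390, 0.381]
SERVER = [0.389, 0.785, 1.504, 3.350, 6.845, 12.807]

def server_to_client_ratios():
    """Server runtime divided by client runtime for each t."""
    return [s / c for s, c in zip(SERVER, CLIENT)]

if __name__ == "__main__":
    for t, ratio in zip(T, server_to_client_ratios()):
        # The last column is the reference value t / 2^4 for comparison.
        print(f"t={t:4d}  server/client={ratio:6.2f}  t/16={t / 16:6.2f}")
```

The client time stays within a narrow band while the server time roughly doubles whenever $t$ doubles, so for these measurements the server-to-client ratio tracks $t / 2^{4}$ closely.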

4.2.3. Experiment 3

We fixed the size of the clients' datasets at $2^{8}$ and the number of participants at $2^{5}$, and increased the size of the server's set from $2^{10}$ to $2^{14}$ to measure the runtime of the clients and the server. The experimental results are shown in Table 4 and Figure 3. The client and server runtimes of both our protocol and [28] grew linearly with the size of the server's dataset. However, our protocol was significantly more efficient than the protocol of Bay et al. [28]: the runtimes of the server and the clients were approximately $1 / 6$ of those in [28].
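Both statements, linear growth and the factor of about $1 / 6$, can be recomputed from Table 4. The Python sketch below uses only the published timings; the helper names are illustrative.

```python
# Timings in seconds copied from Table 4 (server set sizes 2^10 to 2^14).
SIZES = [2 ** e for e in range(10, 15)]
BAY = {"client": [9.516, 18.893, 37.601, 76.489, 154.536],
       "server": [22.962, 43.629, 87.133, 166.917, 345.266]}
OURS = {"client": [1.486, 3.000, 5.903, 13.320, 28.393],
        "server": [3.096, 6.078, 13.614, 27.983, 56.617]}

def relative_runtime(role):
    """Our runtime divided by the runtime of [28] for each server set size."""
    return [o / b for o, b in zip(OURS[role], BAY[role])]

def doubling_factors(times):
    """Growth factor between consecutive sizes; about 2 means linear scaling."""
    return [b / a for a, b in zip(times, times[1:])]

if __name__ == "__main__":
    for role in ("client", "server"):
        rel = ", ".join(f"{r:.3f}" for r in relative_runtime(role))
        grow = ", ".join(f"{g:.2f}" for g in doubling_factors(OURS[role]))
        print(f"{role}: ours/[28] = {rel}; growth per doubling = {grow}")
```

A growth factor near 2 per doubling of the server set is what linear scaling predicts, and the relative runtimes cluster around $1 / 6 \approx 0.167$.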

4.3. Discussion

From the analysis and comparison of the above experiments, we can conclude that our protocol offers the following advantages:

(1) In our protocol, the number $t$ of participants has almost no impact on the computation and runtime of the client. When $t \geq 2^{4}$, the client's runtime relative to the server's runtime decreased roughly in proportion to $1 / 2^{\log_{2} t}$, that is, $1 / t$. Therefore, our protocol is very suitable for unbalanced scenarios with a large number of participants.

(2) Compared to the typical related protocol [28], our protocol demonstrated a significant improvement in efficiency. The server and client runtimes were approximately $1 / 6$ of those in [28].

5. Conclusions

This paper proposed an MPSI protocol based on the Bloom filter and the Shamir threshold secret-sharing scheme, which is highly suitable for unbalanced scenarios with a large number of participants. Compared to the typical related protocol [28], our protocol demonstrated a significant improvement in efficiency. The server's and clients' runtimes were approximately $1 / 6$ of those in [28]. Extending the approach to the model of malicious adversaries is our future work.

Author Contributions

Conceptualization, O.R., C.Y., J.Z. and C.A.; methodology, O.R., C.Y., J.Z. and C.A.; software, C.Y. and C.A.; validation, O.R., C.Y., J.Z. and C.A.; formal analysis, O.R., C.Y., J.Z. and C.A.; investigation, O.R., C.Y., J.Z. and C.A.; resources, O.R., C.Y., J.Z. and C.A.; data curation, O.R., C.Y., J.Z. and C.A.; writing—original draft preparation, C.Y. and C.A.; writing—review and editing, O.R. and J.Z.; visualization, O.R., C.Y., J.Z. and C.A.; supervision, O.R. and J.Z.; project administration, O.R.; funding acquisition, O.R. All authors have read and agreed to the published version of the manuscript.

Funding

This research is supported by the National Natural Science Foundation of China under Grant 62202146 and Enterprise Technology Innovation Development Project of Hubei Province of China Grant Number 2021BAB009.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data are contained within the article.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of the data; in the writing of the manuscript; nor in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:

MPSI	Multiparty Private Set Intersection
TIFS	IEEE Transactions on Information Forensics & Security

References

1. Demmler, D.; Rindal, P.; Rosulek, M.; Trieu, N. PIR-PSI: Scaling Private Contact Discovery. Proc. Priv. Enhancing Technol. 2018, 4, 159–178.
2. Nagy, M.; De Cristofaro, E.; Dmitrienko, A.; Asokan, N.; Sadeghi, A.-R. Do I know you? Efficient and privacy-preserving common friend-finder protocols and applications. In Proceedings of the 29th Annual Computer Security Applications Conference, New Orleans, LA, USA, 9–13 December 2013; pp. 159–168. Available online: https://ia.cr/2013/620 (accessed on 15 May 2023).
3. Yuan, X.; Wang, X.; Wang, C.; Squicciarini, A.; Ren, K. Enabling privacy-preserving image-centric social discovery. In Proceedings of the 2014 IEEE 34th International Conference on Distributed Computing Systems, Madrid, Spain, 30 June–3 July 2014; pp. 198–207.
4. Kim, S.P.; Gil, M.S.; Kim, H.; Choi, M.-J.; Moon, Y.-S.; Won, H.-S. Efficient two-step protocol and its discriminative feature selections in secure similar document detection. Secur. Commun. Netw. 2017, 2017, 6841216.
5. Phuong, T.T. Privacy-preserving deep learning via weight transmission. IEEE Trans. Inf. Forensics Secur. 2019, 14, 3003–3015.
6. Fischlin, M.; Pinkas, B.; Sadeghi, A.R.; Schneider, T.; Visconti, I. Secure set intersection with untrusted hardware tokens. In Proceedings of the CT-RSA 2011, LNCS, San Francisco, CA, USA, 14–18 February 2011; Volume 6558, pp. 1–16.
7. Bogdanov, D.; Niitsoo, M.; Toft, T.; Willemson, J. High-performance secure multi-party computation for data mining applications. Int. J. Inf. Secur. 2012, 11, 403–418.
8. Wang, Y.-W.; Wu, J.-L. A Privacy-Preserving Symptoms Retrieval System with the Aid of Homomorphic Encryption and Private Set Intersection Schemes. Algorithms 2023, 16, 244.
9. Fan, C.; Jia, P.; Lin, M.; Wei, L.; Guo, P.; Zhao, X.; Liu, X. Cloud-Assisted Private Set Intersection via Multi-Key Fully Homomorphic Encryption. Mathematics 2023, 11, 1784.
10. Resende, A.C.D.; de Freitas Aranha, D. Faster unbalanced Private Set Intersection in the semi-honest setting. J. Cryptogr. Eng. 2021, 11, 21–38.
11. Falk, B.H.; Noble, D.; Ostrovsky, R. Private set intersection with linear communication from general assumptions. In Proceedings of the 18th ACM Workshop on Privacy in the Electronic Society, London, UK, 11 November 2019; pp. 14–25.
12. Le, P.H.; Ranellucci, S.; Gordon, S.D. Two-party private set intersection with an untrusted third party. In Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security, London, UK, 11–15 November 2019; pp. 2403–2420.
13. Ciampi, M.; Orlandi, C. Combining private set-intersection with secure two-party computation. In Security and Cryptography for Networks (SCN 2018); Catalano, D., De Prisco, R., Eds.; Lecture Notes in Computer Science; Springer: Amalfi, Italy, 2018; Volume 11035, pp. 464–482.
14. Wang, Z.S.; Banawan, K.; Ulukus, S. Multi-party private set intersection: An information-theoretic approach. IEEE J. Sel. Areas Inf. Theory 2021, 2, 366–379.
15. Debnath, S.K.; Sakurai, K.; Dey, K.; Kundu, N. Secure outsourced private set intersection with linear complexity. In Proceedings of the 2021 IEEE Conference on Dependable and Secure Computing (DSC), Aizuwakamatsu, Japan, 30 January–2 February 2021; pp. 1–8.
16. Blanton, M.; Aguiar, E. Private and Oblivious Set and Multiset Operations; Springer: Berlin/Heidelberg, Germany, 2012.
17. Chen, H.; Huang, Z.; Laine, K.; Rindal, P. Labeled PSI from fully homomorphic encryption with malicious security. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, Toronto, ON, Canada, 15–19 October 2018; pp. 1223–1237.
18. Chen, H.; Laine, K.; Rindal, P. Fast private set intersection from homomorphic encryption. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, New York, NY, USA, 30 October–3 November 2017; pp. 1243–1255.
19. Lv, S.; Ye, J.; Yin, S.; Cheng, X.; Feng, C.; Liu, X.; Li, R.; Li, Z.; Liu, Z.; Zhou, L. Unbalanced private set intersection cardinality protocol with low communication cost. Future Gener. Comput. Syst. 2020, 102, 1054–1061.
20. Ma, J.P.K.; Chow, S.S.M. Secure-Computation-Friendly Private Set Intersection from Oblivious Compact Graph Evaluation. In Proceedings of the 2022 ACM on Asia Conference on Computer and Communications Security, Nagasaki, Japan, 30 May–3 June 2022; pp. 1086–1097.
21. Resende, A.C.D.; Aranha, D.F. Faster unbalanced private set intersection. In Proceedings of the International Conference on Financial Cryptography and Data Security, Nieuwpoort, Curaçao, 26 February–2 March 2018; pp. 203–221.
22. Freedman, M.J.; Nissim, K.; Pinkas, B. Efficient private matching and set intersection. In Proceedings of the International Conference on the Theory and Applications of Cryptographic Techniques, Interlaken, Switzerland, 2–6 May 2004; pp. 1–19.
23. Kissner, L.; Song, D. Privacy-preserving set operations. In Proceedings of the 25th Annual International Cryptology Conference on Advances in Cryptology, Santa Barbara, CA, USA, 14–18 August 2005; pp. 241–257.
24. Sang, Y.; Shen, H. Efficient and secure protocols for privacy-preserving set operations. ACM Trans. Inf. Syst. Secur. 2009, 13, 1–35.
25. Zhang, L.; He, C.; Wei, L. Efficient and malicious secure three-party private set intersection computation protocols for small sets. J. Comput. Res. Dev. 2022, 59, 2286–2298.
26. Miyaji, A.; Nakasho, K.; Nishida, S. Privacy-preserving integration of medical data: A practical Multiparty Private Set Intersection. J. Med. Syst. 2017, 41, 1–10.
27. Davidson, A.; Cid, C. An efficient toolkit for computing private set operations. In Proceedings of the Information Security and Privacy: 22nd Australasian Conference, ACISP 2017, Auckland, New Zealand, 3–5 July 2017; Proceedings, Part II 22. Springer International Publishing: Berlin/Heidelberg, Germany, 2017; pp. 261–278.
28. Bay, A.; Erkin, Z.; Hoepman, J.-H.; Samardjiska, S.; Vos, J. Practical Multi-Party Private Set Intersection Protocols. IEEE Trans. Inf. Forensics Secur. 2022, 17, 1–15.
29. Kolesnikov, V.; Matania, N.; Pinkas, B.; Rosulek, M.; Trieu, N. Practical multi-party private set intersection from symmetric-key techniques. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, Dallas, TX, USA, 30 October–3 November 2017; pp. 1257–1272.
30. Kavousi, A.; Mohajeri, J.; Salmasizadeh, M. Efficient scalable multi-party private set intersection using oblivious PRF. In Proceedings of the 17th International Workshop on Security and Trust Management, Darmstadt, Germany, 8 October 2021; pp. 81–99.
31. Inbar, R.; Omri, E.; Pinkas, B. Efficient scalable multiparty private set-intersection via garbled Bloom filters. In Proceedings of the 11th International Conference on Security and Cryptography for Networks, Amalfi, Italy, 5–7 September 2018; pp. 235–252.
32. Zhang, E.; Liu, F.; Lai, Q.; Jin, G.; Li, Y. Efficient multi-party private set intersection against malicious adversaries. In Proceedings of the 2019 ACM SIGSAC Conference on Cloud Computing Security Workshop, London, UK, 11–15 November 2019; pp. 93–104.
33. Ben-Efraim, A.; Nissenbaum, O.; Omri, E.; Paskin-Cherniavsky, A. PSImple: Practical multiparty maliciously-secure private set intersection. In Proceedings of the 2022 ACM on Asia Conference on Computer and Communications Security, Nagasaki, Japan, 30 May–3 June 2022; pp. 1098–1112.
34. Nevo, O.; Trieu, N.; Yanai, A. Simple, fast malicious Multiparty Private Set Intersection. In Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security, Seoul, Republic of Korea, 15–19 November 2021; pp. 1151–1165.
35. Gordon, S.D.; Hazay, C.; Le, P.H. Fully Secure PSI via MPC-in-the-Head [EB/OL]. 2022. Available online: https://eprint.iacr.org/2022/379 (accessed on 15 May 2023).
36. ElGamal, T. A public key cryptosystem and a signature scheme based on discrete logarithms. IEEE Trans. Inf. Theory 1985, 31, 469–472.
37. Bloom, B.H. Space/time trade-offs in hash coding with allowable errors. Commun. ACM 1970, 13, 422–426.
38. Dong, C.; Chen, L.; Wen, Z. When private set intersection meets big data: An efficient and scalable protocol. In Proceedings of the 2013 ACM SIGSAC Conference on Computer & Communications Security, Berlin, Germany, 4–8 November 2013; pp. 789–800.
39. Shamir, A. How to share a secret. Commun. ACM 1979, 22, 612–613.
40. Lindell, Y. How to simulate it – A tutorial on the simulation proof technique. In Tutorials on the Foundations of Cryptography; Lindell, Y., Ed.; Information Security and Cryptography; Springer: Berlin/Heidelberg, Germany, 2017; pp. 277–346.
41. Shoup, V. NTL: A Library for Doing Number Theory. [Online]. 2020. Available online: https://www.shoup.net/ntl/ (accessed on 15 May 2023).

Figure 1. Our Multiparty Private Set Intersection protocol.

Figure 2. Comparison for different MPSI protocols with different numbers of participants.

Figure 3. Comparison for different MPSI protocols with different volumes of the server’s dataset.

Table 1. Table of notations.

Notation	Meaning
$p$	$p$ is a large prime number.
$Z_{p}^{*}$	$Z_{p}^{*}$ is the multiplicative group modulo $p$.
$Enc(M)$	The result of encrypting the plaintext $M$.
$Dec(C)$	The result of decrypting the ciphertext $C$.
$t$	The number of participants in the protocol.
$P_{i}$	The $i$-th participant. $P_{1}, \dots, P_{t - 1}$ are the clients, and $P_{t}$ is the server.
$S_{i}$	Vector of datasets for participant $P_{i}$, $i \in \{1, 2, \dots, t\}$.
$n_{i}$	The dataset size of $S_{i}$, $i \in \{1, 2, \dots, t\}$.
$S_{i}[j]$	The $j$-th element in the dataset vector $S_{i}$ of participant $P_{i}$, $i \in \{1, 2, \dots, t\}$, $j \in \{1, 2, \dots, n_{i}\}$.
$m$	The length of the Bloom filter.
$k$	The number of hash functions used by the Bloom filter.
$h_{u}$	The $u$-th hash function, $u \in \{1, 2, \dots, k\}$.
$h_{u}(x)$	The hash value of $x$ under $h_{u}$, which lies in $\{1, 2, \dots, m\}$, where $u \in \{1, 2, \dots, k\}$.
$BF_{i}$	The Bloom filter obtained by mapping the dataset $S_{i}$, $i \in \{1, 2, \dots, t\}$.
$BF_{i}[l]$	The $l$-th bit of $BF_{i}$, $i \in \{1, 2, \dots, t\}$, $l \in \{1, 2, \dots, m\}$.
$RBF_{i}$	The randomized Bloom filter obtained by randomizing $BF_{i}$, $i \in \{1, 2, \dots, t\}$.
$RBF_{i}[l]$	The $l$-th element of $RBF_{i}$, $i \in \{1, 2, \dots, t\}$, $l \in \{1, 2, \dots, m\}$.
$ERBF_{i}$	The encrypted randomized Bloom filter obtained by encrypting $RBF_{i}$, $i \in \{1, 2, \dots, t\}$.
$ERBF_{i}[l]$	The $l$-th element of $ERBF_{i}$, $i \in \{1, 2, \dots, t\}$, $l \in \{1, 2, \dots, m\}$.
$rand()$	Generate a random number between 1 and $p - 1$.
$pk$	The public key.
$sk$	The private key.
$sk_{i}$	The share of the private key $sk$ distributed to the $i$-th participant.
$c_{j, i}^{u}$	$c_{j, i}^{u} = ERBF_{i}[h_{u}(S_{t}[j])]$, where $i \in \{1, 2, \dots, t\}$, $j \in \{1, 2, \dots, n_{t}\}$, $u \in \{1, 2, \dots, k\}$.
$Comb(sh_{j, 1}, \dots, sh_{j, t})$	For $j \in \{1, 2, \dots, n_{t}\}$, joint decryption on $(sh_{j, 1}, \dots, sh_{j, t})$.
$\Delta_{i}$	$\Delta_{i} = \prod_{j' = 1, j' \neq i}^{t} \frac{j'}{j' - i}$, where $i \in \{1, 2, \dots, t\}$.
$sh_{j, i}$	$sh_{j, i} = (c_{j})^{\Delta_{i} \cdot sk_{i}}$, where $i \in \{1, 2, \dots, t\}$, $j \in \{1, 2, \dots, n_{t}\}$.
$sh_{i}$	$sh_{i} = \{sh_{1, i}, \dots, sh_{j, i}, \dots, sh_{n_{t}, i}\}$, where $i \in \{1, 2, \dots, t\}$.
$\times$	Homomorphic multiplication. All computations between ciphertexts in the article are homomorphic multiplications.

Table 2. Comparison of Paillier encryption algorithm and ElGamal encryption algorithm (in seconds).

Algorithm	Operation	$10^{2}$	$10^{3}$	$10^{4}$	$10^{5}$
Paillier	Encryption	0.502	4.158	39.910	328.851
Paillier	Decryption	0.404	3.735	39.926	366.583
ElGamal	Encryption	0.184	1.957	16.695	160.772
ElGamal	Decryption	0.105	0.950	9.656	92.483

Note: Column headings give the data size, which increased from $10^{2}$ to $10^{5}$. A 1024-bit security parameter was selected for public key encryption. The value of the encrypted data was fixed at $10^{7}$. The encryption process used the same random number.

Table 3. Comparison for different MPSI protocols with different numbers of participants (in seconds).

Protocol	Role	$2^{4}$	$2^{5}$	$2^{6}$	$2^{7}$	$2^{8}$	$2^{9}$
Bay et al. [28]	Client	2.234	2.445	2.176	2.441	2.279	2.405
Bay et al. [28]	Server	3.881	5.913	10.202	15.242	25.371	41.615
Ours	Client	0.356	0.422	0.387	0.401	0.390	0.381
Ours	Server	0.389	0.785	1.504	3.350	6.845	12.807

Note: Column headings give the number of participants, which increased from $2^{4}$ to $2^{9}$. The server and client dataset size was fixed at $2^{8}$.

Table 4. Comparison for different MPSI protocols with different volumes of the server’s dataset (in seconds).

Protocol	Role	$2^{10}$	$2^{11}$	$2^{12}$	$2^{13}$	$2^{14}$
Bay et al. [28]	Client	9.516	18.893	37.601	76.489	154.536
Bay et al. [28]	Server	22.962	43.629	87.133	166.917	345.266
Ours	Client	1.486	3.000	5.903	13.320	28.393
Ours	Server	3.096	6.078	13.614	27.983	56.617

Note: Column headings give the size of the server's dataset, which increased from $2^{10}$ to $2^{14}$. The clients' dataset size was fixed at $2^{8}$. The number of participants was fixed at $2^{5}$.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
