Blockchain-Based Unbalanced PSI with Public Verification and Financial Security

Wang, Zhanshan; Ma, Xiaofeng

doi:10.3390/math12101544


by Zhanshan Wang and Xiaofeng Ma *

Department of Control Science and Engineering, Tongji University, Shanghai 201804, China

* Author to whom correspondence should be addressed.

Mathematics 2024, 12(10), 1544; https://doi.org/10.3390/math12101544

Submission received: 4 April 2024 / Revised: 8 May 2024 / Accepted: 13 May 2024 / Published: 15 May 2024

(This article belongs to the Special Issue Applied Mathematics in Blockchain and Intelligent Systems)


Abstract

Private set intersection (PSI) enables two parties to determine the intersection of their respective datasets without revealing any information beyond the intersection itself. This paper focuses on unbalanced PSI, where the sizes of the parties' datasets can differ significantly. Existing unbalanced PSI protocols that are secure in the malicious model are inefficient, which makes them impractical for real-world applications. Conversely, most efficient unbalanced PSI protocols fail to guarantee the correctness of the intersection against a malicious server and cannot even ensure the client's privacy. This study proposes a blockchain-based unbalanced PSI protocol with public verification and financial security that enables the client to detect malicious behavior by the server (if any) and then generate an irrefutable, publicly verifiable proof without compromising its secrets. The proof can be verified by smart contracts, and economic incentive and penalty measures are executed automatically to achieve financial security. Furthermore, we implement the proposed protocol, and the experimental results demonstrate that our scheme has low online communication complexity and low computational overhead for the client. In addition, both the size of the generated proof and its verification complexity are $O(\log n)$, enabling cost-effective validation on the blockchain.

Keywords:

blockchain; private set intersection; smart contract; RSA blind signature; public verification

MSC:

94A60

1. Introduction

Private set intersection (PSI) can be regarded as a special case of secure multi-party computation (SMPC) in which two parties, each holding a set of private data, wish to compute the intersection of their sets without disclosing any information beyond the intersection to each other. PSI has been widely adopted in various real-world applications, including private contact discovery [1], private location-based services in the Internet of Vehicles [2], privacy-aware inference of social network relationships [3], and privacy-preserving password checks [4].

PSI protocols can be broadly classified into two categories according to the relative sizes of the datasets held by the participating parties. When both parties have datasets of roughly equal size, the setting is termed 'balanced'. Conversely, when one party's dataset is considerably smaller than the other's, the setting is termed 'unbalanced'. This paper mainly focuses on unbalanced PSI, where the client, typically the party with the smaller dataset, often operates on a resource-constrained device with limited storage capacity and computational power compared with the server, which conventionally holds the larger dataset. Additionally, communication between the parties may be subject to bandwidth constraints, further complicating the PSI process.

While some PSI protocols in the literature [5,6,7,8,9,10,11] built on oblivious transfer (OT) extension [12] and the oblivious key-value store (OKVS) structure [11] achieve high computational efficiency, they typically involve substantial data transmission between the server and the client and may require multiple rounds of communication, which is ill-suited to unbalanced settings with bandwidth constraints. The protocol of Jarecki and Liu [13] has better communication efficiency but incurs a considerable computational workload, which is particularly challenging for clients with weak devices. On the other hand, most of the efficient unbalanced PSI protocols in the literature [1,14,15,16,17,18] are fully simulatable only in the presence of semi-honest (also known as passive) adversaries, or achieve security against malicious (also known as active) adversaries only with one-sided simulatability [19]. The former assumes that all parties follow the protocol, which is unrealistic. In the latter case, although all parties' privacy is preserved, the correctness of the result cannot be guaranteed, which may cause losses to the party receiving the intersection.

To strike a trade-off between security and efficiency, several publicly verifiable covert (PVC) protocols have been proposed [20,21]. PVC protocols are not only more efficient than protocols designed for the malicious setting but also provide a meaningful level of security: they detect an adversary's malicious behavior with a certain probability, which is a significant enhancement over the semi-honest model. Furthermore, these protocols generate publicly verifiable proofs of cheating. This feature serves as a powerful deterrent against rational adversaries, since it exposes them to the risk of their malicious actions being revealed and can thus dissuade them from such behavior in practice. However, current PVC protocols are designed mainly for general SMPC tasks rather than PSI and are inefficient when applied directly to PSI, particularly in unbalanced settings. Additionally, PVC protocols typically require a trusted third party to verify the generated proofs, which can entail considerable judicial costs. Although Zhu et al. [22] eliminated the need for a third party by leveraging smart contracts for proof verification and achieved financial security, the large size and high verification complexity of their proofs make on-chain validation costly in practice.

In this work, we propose a blockchain-based unbalanced PSI protocol that leverages the immutability of blockchain to enable public verification of the result and achieves financial security through smart contracts. In addition, our protocol maintains high efficiency even under bandwidth constraints and with weaker client devices. Unlike conventional general-purpose PVC protocols, our design produces proofs with smaller size and lower verification complexity, which allows them to be verified by smart contracts at low cost. To the best of our knowledge, our protocol is the first to introduce blockchain and smart contracts into unbalanced PSI to obtain public verification and financial security against a malicious server. More specifically, the primary contributions of this work are as follows:

On-chain anchoring of key data and generation of publicly verifiable proofs: in case of a dispute over the final intersection, the client can readily generate publicly verifiable proofs based on the tamper-proof Merkle roots of key data on the blockchain to accuse the server of dishonest behavior, thereby exerting a deterrent effect on the server.
Smart contract-based automatic verification and financial security: proofs are validated by smart contracts, and the corresponding economic rewards or penalties are executed automatically based on the verification result, achieving financial security. Meanwhile, all historical verification results concerning a server are permanently recorded on the blockchain and publicly accessible, which further strengthens the deterrent effect on the server.
Integration of Cuckoo filters: the client's storage overhead and final query time are reduced, and because Cuckoo filters inherently support deletion, the data stored on the client can be updated dynamically to avoid redundant transmissions.
Implementation and experimentation: the proposed unbalanced PSI protocol and the associated smart contracts are implemented, and the experimental results show that online communication grows linearly with the size of the client's dataset. Moreover, the transaction cost of executing verification on smart contracts is very low.

The paper is structured as follows: Section 2 reviews related work. Section 3 defines the required notation and introduces the fundamental concepts. Section 4 presents the basic protocol and details the enhancements we make to improve both its efficiency and its security, followed by a complete description of the full protocol and a theoretical analysis. Section 5 presents and analyzes our experimental findings. Finally, Section 6 concludes the paper, analyzes the remaining challenges, and proposes potential avenues for future research.

2. Related Work

Freedman et al. [23] first formally defined PSI and introduced a PSI protocol based on oblivious polynomial evaluation. Since then, many efficient PSI protocols have been proposed, among which protocols based on OT extension offer the most competitive performance [24]. Pinkas et al. [5] proposed the first OT-based PSI protocol in the semi-honest model, followed by a series of improvements [6,7,8,9]. In these protocols, the parties first hash their elements into a data structure and then evaluate an oblivious pseudorandom function (OPRF) for each bin through OT extension. Pinkas et al. [25] proposed a maliciously secure PSI protocol by combining a data structure called PaXoS with the actively secure OOS protocol [7]. However, an inherent property of OT-based PSI is that communication is linear in the size of the larger set and may require multiple rounds of interaction, which makes it unsuitable for unbalanced PSI settings, especially those with limited bandwidth. The OKVS-based PSI protocols in [10,11], which extend the PaXoS structure of [25], are likewise encumbered by significant communication overhead. This intrinsic limitation impedes their scalability and efficiency in unbalanced PSI settings.

With the advent of cloud computing, PSI protocols relying on a third party have been introduced in the literature [26,27]. In these protocols, participants encode their private data through a random function and send the encoded values to a third party, who computes the intersection and returns it to each party. Although these protocols are highly user-friendly and do not require all parties to be online simultaneously, their security largely depends on the trustworthiness of the third party.

As a special case of SMPC, PSI can also be achieved with generic SMPC. Huang et al. [28] proposed the first PSI protocol based on garbled circuits, which was later improved in [29]. The advantage of such protocols lies in the ability to perform privacy-preserving computations on the resulting intersection, such as computing its cardinality or the sum of its elements. Nonetheless, these methods require more communication than OT-based PSI protocols, and clients may consume substantial memory when evaluating circuits. Therefore, PSI based on garbled circuits offers no significant advantage in scenarios where client devices have limited capabilities and no subsequent computation on the intersection is required.

At present, PSI protocols with high communication efficiency are mainly based on public key and homomorphic encryption. Meadows [30] and Huberman et al. [31] constructed PSI protocols using Diffie–Hellman (DH) key exchange before PSI was formally defined in [23]. Resende et al. [14] later reduced communication and improved computational efficiency by applying Cuckoo filters and using elliptic curve groups instead of prime-order groups. Nevertheless, the number of public key operations the client must perform during the online phase is still linear in its set size. More importantly, zero-knowledge proofs are usually required when extending these protocols to malicious security [13], which further increases the client's workload. De Cristofaro et al. [15] proposed a PSI protocol based on RSA blind signatures, in which the expensive public key operations of the client can be completed offline, independently of the server. However, this protocol likewise remains vulnerable to malicious servers.

Chen et al. [1] introduced a leveled fully homomorphic encryption (FHE) scheme into PSI, employing a series of optimization techniques including batching, partitioning, and hashing. In this protocol, communication depends mainly on the size of the client's set, so only a small amount of data needs to be transmitted. Chen et al. [16] later extended this protocol to be secure against a malicious client by adding an OPRF preprocessing phase before FHE. However, their method cannot guarantee the correctness of the final intersection. To address this verification issue, Jiang et al. [32] introduced a homomorphic hash function to ensure output correctness; however, because verification relies extensively on pairing operations, the client bears a significant computational workload.

To the best of our knowledge, the protocol based on a hash proof system presented in [17] is currently the most communication-efficient unbalanced PSI protocol. However, its security is guaranteed only in the semi-honest model. Similarly, although the unbalanced PSI protocol proposed in [18] substantially reduces communication cost, it is vulnerable when the client behaves maliciously.

On another front, Aumann et al. [33] introduced the notion of covert adversaries, who are allowed to behave maliciously but face a certain probability (deterrence factor) of being caught by the other party. They show that SMPC protocols designed against covert adversaries can achieve better efficiency than those designed against malicious adversaries. At the same time, this security model is meaningful in many real-world scenarios, such as business, finance, and politics, where entities might have an incentive to cheat yet cannot afford the loss of reputation or negative publicity from being caught cheating.

While the protocol of Aumann et al. [33] can ensure catching cheaters with a certain probability, it encounters significant challenges in persuading a third party (e.g., a court) to lend credence to such allegations. To address this issue, Asharov et al. [20] proposed publicly verifiable covert (PVC) security to enable the honest party to generate publicly verifiable proofs upon catching cheating, which can be verified by any third party without revealing the honest party’s private information. Nonetheless, there exists the potential for collusion between one party and the designated third party responsible for verifying the proof. Zhu et al. [22] combined PVC with smart contracts, proposing a new notion called financial security. Although they optimized the size of proofs and verification algorithms, gas costs remain too high to be practical. Furthermore, the majority of existing PVC protocols are implemented based on garbled circuits. Although they can indeed be adapted for PSI, they suffer from the same drawbacks as PSI protocols based on garbled circuits [28,29] when it comes to unbalanced PSI scenarios, as discussed earlier.

3. Preliminaries

In this section, we introduce the notation and basic definitions used throughout this paper.

3.1. Summary of Notations

$X, Y \subseteq \{0, 1\}^{σ}$ are the server's and client's input sets, with sizes $v = |X|$ and $w = |Y|$, respectively.
$n_{1}$ is the number of elements sampled and validated by the client.
$r \xleftarrow{\$} S$ indicates that r is sampled uniformly at random from S.
$κ, λ$ are the computational and statistical security parameters, respectively.
$e, d, n$ denote an RSA key pair, where $(e, n)$ is the public key and $(d, n)$ is the private key. It is required that $e > 1$, $\gcd(e, φ(n)) = 1$, and $ed \equiv 1 \pmod{φ(n)}$, where $φ(n)$ is Euler's totient function of n.
$H_{1} : \{0, 1\}^{σ} \to Z_{n}^{*}$ and $H_{2} : Z_{n}^{*} \to \{0, 1\}^{l}$ are hash functions modeled as random oracles, where l is the output length of $H_{2}$.
$M_{1}, M_{2}$ are the Merkle roots uploaded to the blockchain by the client and the server, respectively.
$t_{1}, t_{2}$ denote the deadlines for the server to upload $M_{2}$ and for the client to submit the proof, respectively, with $t_{1} < t_{2}$.

3.2. Blockchain and Smart Contract

The core concept of blockchain was first introduced by Satoshi Nakamoto [34]; it is a technology that enables data storage, validation, and transmission without relying on a trusted third party. It functions as a decentralized, distributed ledger that embodies the principles of data integrity and trust. In a blockchain network, each node has equal status and rights, and data consistency is maintained through a distributed network architecture and consensus among nodes. Tamper evidence is achieved by linking each block to the hash value of the previous block (except for the initial genesis block) [35]. Any malicious attempt to modify a transaction would therefore require altering all subsequent blocks and having them accepted by the other nodes. The consensus mechanisms employed in blockchain systems, such as proof of work or proof of stake, impose significant costs on such manipulation attempts, thereby ensuring the immutability of the data recorded on the blockchain.
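To make the hash-linking idea concrete, the following minimal Python sketch chains toy blocks by storing each predecessor's SHA-256 hash. It is an illustrative assumption rather than a real blockchain: there is no consensus and no proof of work, and the block fields (`index`, `prev_hash`, `data`) and helper names are hypothetical.

```python
import hashlib
import json


def block_hash(block: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the block."""
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()


def build_chain(payloads):
    chain, prev = [], "0" * 64  # the genesis block has no predecessor
    for i, data in enumerate(payloads):
        block = {"index": i, "prev_hash": prev, "data": data}
        chain.append(block)
        prev = block_hash(block)
    return chain


def links_intact(chain) -> bool:
    """Every block must store the hash of the block right before it."""
    return all(cur["prev_hash"] == block_hash(prev) for prev, cur in zip(chain, chain[1:]))


chain = build_chain(["tx-a", "tx-b", "tx-c"])
print(links_intact(chain))   # True
chain[1]["data"] = "tx-b'"   # tamper with a middle block
print(links_intact(chain))   # False: block 2 no longer points at block 1's hash
```

Hiding such an edit would require recomputing every later block; this local check alone cannot flag an edit to the newest block, whose hash the nodes agree on through consensus.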

Blockchains have since evolved from basic distributed ledger databases into robust and reliable platforms. Ethereum, building on the foundation laid by Bitcoin, introduced general-purpose smart contracts, which are executed by the Ethereum Virtual Machine (EVM) and support Turing-complete computation. Unlike conventional programs, the execution outcome of a smart contract is validated and must reach consensus among all nodes before it is recorded on the blockchain. By leveraging smart contracts, traditional transaction endorsement can be moved from manual legal agreements to automated code: once the contract conditions are met, the contract's terms are enforced automatically, reducing the administrative costs of conventional contract execution.

Although the EVM theoretically supports the execution of highly complex smart contracts, its stack-based architecture inherently limits execution efficiency. Additionally, as mentioned earlier, each node must redundantly store the code of smart contracts and re-execute their computations to reach consensus. Consequently, overly intricate smart contracts can impose a substantial burden on the Ethereum network. To address this issue, Ethereum strictly prices instructions and storage in units of gas. For example, a single multiplication instruction consumes 5 gas, while an exponentiation instruction requires $10 + 50 \cdot len_{exp}$ gas, where $len_{exp}$ is the number of bytes occupied by the exponent on the stack. The transaction fee is determined by the cumulative cost of the operations involved: the more data are stored or updated on the blockchain and the higher the computational complexity, the more expensive the transaction. Therefore, in practical applications, it is crucial to minimize the complexity of smart contracts, both to avoid substantial transaction costs for users and to prevent failed invocations caused by Ethereum's block gas limit (30 million gas per block at the time of writing).
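As a quick worked example of the exponentiation pricing quoted above, the following Python sketch evaluates $10 + 50 \cdot len_{exp}$ for a few exponents. It assumes the formula applies to the EXP opcode on 256-bit EVM words; RSA-size operands exceed one word, so on-chain RSA checks would instead need multi-word arithmetic or the MODEXP precompile (EIP-198), which is priced differently and not modeled here. The function name is hypothetical.

```python
def exp_gas(exponent: int) -> int:
    """Gas charged by the EVM EXP opcode: 10 + 50 * (bytes in the exponent)."""
    if not 0 <= exponent < 2**256:
        raise ValueError("EXP operates on 256-bit words")
    exp_bytes = (exponent.bit_length() + 7) // 8
    return 10 + 50 * exp_bytes


print(exp_gas(0))            # 10
print(exp_gas(65537))        # 160 (0x010001 occupies 3 bytes)
print(exp_gas(2**256 - 1))   # 1610
```

The cost grows only with the exponent's byte length, so a small public exponent such as 65537 is cheap relative to a full-width one.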

3.3. RSA Blind Signature

In the standard signing process, signers are privy to the original message being signed. To safeguard user privacy, David Chaum first introduced the concept of blind signatures [36]. Blind signatures can be viewed as a distinctive variant of digital signatures for which the signer can endorse the original message without actually knowing its specific content. Presently, blind signatures are commonly employed in domains such as electronic cash and electronic voting to guarantee anonymity.

Definition 1 (Blind Signature Scheme). A blind signature scheme consists of the following probabilistic polynomial-time (PPT) algorithms:

The key-generation algorithm $KeyGen(1^{κ})$ takes as input a security parameter $1^{κ}$ and outputs a key pair $(sk, pk)$.
The blind signing algorithm $\langle User(pk, m), Signer(sk) \rangle$ is an interactive protocol between $User$ and $Signer$. $User$ is given a message m and a public key $pk$, and $Signer$ is given a secret key $sk$. At the end of this protocol, $User$ outputs either $σ$, a signature on m, or ⊥ if the interaction is not successful.
The verification algorithm $Verify(pk, m, σ)$ is deterministic and takes as input a public key $pk$, a message m, and a signature $σ$. It outputs 1 if $σ$ is a valid signature on m under $pk$ and 0 otherwise.

We require that for every $κ$, every $(sk, pk)$ output by $KeyGen(1^{κ})$, and every message m in the appropriate underlying plaintext space, it holds that

$$Verify(pk, m, \langle User(pk, m), Signer(sk) \rangle) = 1$$
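As a concrete instance of Definition 1, the following Python sketch implements Chaum-style RSA blind signatures, the primitive that Section 4 builds on. It is a sketch under stated assumptions: the toy key uses two well-known Mersenne primes ($2^{107}-1$ and $2^{127}-1$), so it is deliberately insecure; SHA-256 reduced mod n is an illustrative stand-in for a hash into $Z_{n}$; and all function names are hypothetical.

```python
import hashlib
import math
import secrets

# Toy primes (Mersenne primes M107 and M127). Their product is trivially
# factorable, so this key is for illustration only.
P, Q = 2**107 - 1, 2**127 - 1


def keygen(e: int = 65537):
    """KeyGen: return (sk, pk) = ((d, n), (e, n)) with e*d = 1 mod phi(n)."""
    n, phi = P * Q, (P - 1) * (Q - 1)
    if math.gcd(e, phi) != 1:
        raise ValueError("e must be coprime to phi(n)")
    d = pow(e, -1, phi)  # modular inverse (Python 3.8+)
    return (d, n), (e, n)


def hash_to_zn(m: bytes, n: int) -> int:
    """Stand-in hash into Z_n: SHA-256 reduced mod n."""
    return int.from_bytes(hashlib.sha256(m).digest(), "big") % n


def user_blind(pk, m: bytes):
    """User, step 1: blind H(m) with a random r in Z_n^*."""
    e, n = pk
    r = 0
    while math.gcd(r, n) != 1:
        r = secrets.randbelow(n)
    return hash_to_zn(m, n) * pow(r, e, n) % n, r


def signer_sign(sk, blinded: int) -> int:
    """Signer: raise the blinded value to d; it never sees H(m) itself."""
    d, n = sk
    return pow(blinded, d, n)


def user_unblind(pk, blind_sig: int, r: int) -> int:
    """User, step 2: divide out r to obtain sigma = H(m)^d mod n."""
    _, n = pk
    return blind_sig * pow(r, -1, n) % n


def verify(pk, m: bytes, sigma: int) -> int:
    """Verify: output 1 iff sigma^e = H(m) mod n."""
    e, n = pk
    return int(pow(sigma, e, n) == hash_to_zn(m, n))


sk, pk = keygen()
blinded, r = user_blind(pk, b"hello")
sigma = user_unblind(pk, signer_sign(sk, blinded), r)
print(verify(pk, b"hello", sigma), verify(pk, b"other", sigma))  # 1 0
```

Unblinding works because $(H(m) \cdot r^{e})^{d} = H(m)^{d} \cdot r \bmod n$, so dividing by r leaves an ordinary RSA signature on $H(m)$.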

Regarding security, a blind signature scheme must satisfy blindness and unforgeability. Let $BS = (KeyGen(1^{κ}), \langle User(pk, m), Signer(sk) \rangle, Verify(pk, m, σ))$ be a blind signature scheme and $\mathcal{A}$ be an adversary. We consider the following two experiments:

The experiment $Blind_{\mathcal{A}, BS}(κ)$:

$b \xleftarrow{\$} \{0, 1\}$, $(pk, m_{0}, m_{1}, st) \leftarrow \mathcal{A}(1^{κ})$.
$st \leftarrow \mathcal{A}^{\langle User(pk, m_{b}), \cdot \rangle, \langle User(pk, m_{1 - b}), \cdot \rangle}(st)$, $σ_{b} \leftarrow User(pk, m_{b})$, $σ_{1 - b} \leftarrow User(pk, m_{1 - b})$. If $σ_{0} = ⊥$ or $σ_{1} = ⊥$, then let $(σ_{0}, σ_{1}) = (⊥, ⊥)$.
$b^{*} \leftarrow \mathcal{A}(st, σ_{0}, σ_{1})$.

Definition 2 (Blindness). A blind signature scheme $BS$ is blind if for all PPT adversaries $\mathcal{A}$ with one-time access to two $User$ oracles, there exists a negligible function $negl$ such that

$$\left| \Pr[b^{*} = b] - \frac{1}{2} \right| < negl(κ)$$

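The experiment $Forge_{\mathcal{A}, BS}(κ)$:

$(pk, sk) \leftarrow KeyGen(1^{κ})$.
$\{(m_{i}, σ_{i})\}_{i = 1}^{k + 1} \leftarrow \mathcal{A}^{\langle \cdot, Signer(sk) \rangle}(pk)$, where k is the number of interactions with the $Signer$ oracle that $\mathcal{A}$ completed successfully.
Let event $Success$ be: $m_{i} \neq m_{j}$ for all $i, j \in [k + 1]$ with $i \neq j$, and $Verify(pk, m_{i}, σ_{i}) = 1$ for all $i \in [k + 1]$.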

Definition 3 (Unforgeability). A blind signature scheme $BS$ is unforgeable if for all PPT adversaries $\mathcal{A}$ with access to a $Signer$ oracle, there exists a negligible function $negl$ such that

$$\Pr[Success] < negl(κ)$$

4. Protocol

We start with a succinct presentation of the basic protocol upon which our work relies, followed by the optimizations implemented. Subsequently, we outline the complete procedure of our protocol and proceed to conduct a theoretical analysis of its security and efficiency.

4.1. The Basic Protocol

De Cristofaro et al. [37] proposed a PSI protocol that is secure under the one-more-RSA assumption. This protocol is depicted in Figure 1 and works as follows. For each element $x_{i} \in X$, the server uses its private key $(d, n)$ to sign the hash value of the element, obtaining $sx_{i} = H_{1}(x_{i})^{d} \bmod n$. To prevent the client from reconstructing $H_{1}(x_{i})$ from the signature, the server further applies a second hash function $H_{2}$ to each signed value to get $hx_{i} = H_{2}(sx_{i})$. The client begins by generating a random number $r_{j}$ for each element $y_{j} \in Y$, which is used to blind the original hash value $H_{1}(y_{j})$ as $b_{j} = H_{1}(y_{j}) \cdot r_{j}^{e} \bmod n$. The blinded value $b_{j}$ is then transmitted to the server. The server signs each blinded value to derive $sb_{j} = b_{j}^{d} \bmod n$ and sends both $sb_{j}$ and $hx_{i}$ back to the client. Upon receiving the server's signatures on the blinded values, the client first unblinds them to recover the signature on the original element's hash value as $sy_{j} = sb_{j} \cdot r_{j}^{-1} \bmod n$. Finally, the client determines the intersection of the two sets by comparing all $hx_{i}$ against the hashes of the recovered signatures, i.e., $hy_{j} = H_{2}(sy_{j})$.
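The message flow just described can be sketched end to end in Python as follows. This is a minimal, single-process illustration under stated assumptions, not the authors' implementation: the RSA key is an insecure toy built from two public Mersenne primes, $H_{1}$ and $H_{2}$ are instantiated with domain-separated SHA-256 (the protocol models them as random oracles), the function names are hypothetical, and the Merkle anchoring and Cuckoo filters of Section 4.2 are omitted.

```python
import hashlib
import math
import secrets

# Insecure toy RSA key: Mersenne primes with a publicly known factorization.
P, Q = 2**107 - 1, 2**127 - 1
N, E = P * Q, 65537
D = pow(E, -1, (P - 1) * (Q - 1))
NBYTES = (N.bit_length() + 7) // 8


def H1(x: bytes) -> int:
    """Stand-in for H1: {0,1}^sigma -> Z_n^* (SHA-256 reduced mod N)."""
    return int.from_bytes(hashlib.sha256(b"H1" + x).digest(), "big") % N


def H2(s: int) -> bytes:
    """Stand-in for H2: Z_n^* -> {0,1}^l with l = 256."""
    return hashlib.sha256(b"H2" + s.to_bytes(NBYTES, "big")).digest()


def server_tags(X):
    """Server: hx_i = H2(H1(x_i)^d mod n) for every x_i in X."""
    return {H2(pow(H1(x), D, N)) for x in X}


def client_blind(Y):
    """Client: b_j = H1(y_j) * r_j^e mod n, keeping r_j for unblinding."""
    state = []
    for y in Y:
        r = 0
        while math.gcd(r, N) != 1:
            r = secrets.randbelow(N)
        state.append((y, r, H1(y) * pow(r, E, N) % N))
    return state


def server_sign(blinded):
    """Server: sb_j = b_j^d mod n."""
    return [pow(b, D, N) for b in blinded]


def client_intersect(state, signed, HX):
    """Client: sy_j = sb_j / r_j mod n, then test H2(sy_j) against HX."""
    out = set()
    for (y, r, _), sb in zip(state, signed):
        sy = sb * pow(r, -1, N) % N
        if H2(sy) in HX:
            out.add(y)
    return out


X = [b"alice", b"bob", b"carol", b"dave"]  # server's (larger) set
Y = [b"bob", b"dave", b"eve"]              # client's set
HX = server_tags(X)
state = client_blind(Y)
SB = server_sign([b for _, _, b in state])
print(sorted(client_intersect(state, SB, HX)))  # [b'bob', b'dave']
```

Note that nothing in this flow lets the client tell a correct $sb_{j}$ from a random one: a server that returns garbage simply yields an empty intersection, which is the weakness addressed in Section 4.2.2.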

The correctness of the protocol is straightforward. The signature on the client's element $y_{j}$ can be derived by removing the corresponding random value $r_{j}$, as shown in Equation (1):

$$sy_{j} = \frac{sb_{j}}{r_{j}} \bmod n = \frac{\left(H_{1}(y_{j}) \cdot r_{j}^{e}\right)^{d}}{r_{j}} \bmod n = H_{1}(y_{j})^{d} \bmod n \tag{1}$$

Suppose there exist two elements $x_{i}^{*} \in X$ and $y_{j}^{*} \in Y$ such that $x_{i}^{*} = y_{j}^{*}$; then Equation (2) holds:

$$hx_{i}^{*} = H_{2}(sx_{i}^{*}) = H_{2}\left(H_{1}(x_{i}^{*})^{d} \bmod n\right) = H_{2}\left(H_{1}(y_{j}^{*})^{d} \bmod n\right) = H_{2}(sy_{j}^{*}) = hy_{j}^{*} \tag{2}$$

Therefore, the elements in the intersection of the sets $\{hx_{1}, hx_{2}, \dots, hx_{v}\}$ and $\{hy_{1}, hy_{2}, \dots, hy_{w}\}$ can be mapped back to the common elements in the intersection of sets X and Y.

In terms of security, this protocol is secure only against a malicious client. Informally, since the client only has access to hash values of the server's signatures, the only way for the client to learn the server's input is to first brute-force $sx_{i}$ from $hx_{i}$ and then use the public key $(e, n)$ to compute $H_{1}(x_{i}) = sx_{i}^{e} \bmod n$. The client could then recover the server's private input $x_{i}$ by enumerating the possible inputs to $H_{1}$. However, given that $sx_{i}$ ranges over $Z_{n}^{*}$, the probability that the client recovers $sx_{i}$ from $hx_{i}$ by brute force is negligible as long as $H_{2}$ is cryptographically secure. Regarding the client's privacy, since the only data it transmits to the server are the values $b_{j}$, each blinded by a random number, the client's privacy is guaranteed statistically. A formal proof of these security properties can be found in [15].

4.2. Optimizations

In this section, we present optimizations for the basic protocol in Section 4.1 by incorporating Cuckoo filters to reduce the storage at the client side and leveraging the immutability of the blockchain to enhance the security against a malicious server.

4.2.1. Reduce Storage

In the basic protocol, the client must store the entire set $HX = \{hx_{1}, hx_{2}, \dots, hx_{v}\}$ for comparison. As a result, the required storage grows with the size of the server's set. In unbalanced PSI, the server's set is typically much larger, which imposes significant storage overhead on the client. By the birthday paradox, the probability of a collision when mapping $v + w$ elements into a domain of size $2^{l}$ is approximately $(v + w)^{2} / 2^{l + 1}$. Therefore, if the probability of a hash collision is required to be at most $2^{-λ}$, the output length of $H_{2}$ should be at least $l = 2\log_{2}(v + w) + λ - 1$. Assuming $λ = 40$, $v = 2^{30}$, and $w = 2^{8}$, approximately 11.25 GB of space is required to store the set $HX$, which is prohibitive for clients with limited storage resources.
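For the parameters above, the lower bound on the output length of $H_{2}$ can be evaluated directly. A minimal sketch follows; the function name is hypothetical, and it simply rearranges the birthday bound $(v + w)^{2} / 2^{l + 1} \leq 2^{-λ}$.

```python
import math


def min_tag_bits(v: int, w: int, lam: int) -> float:
    """Smallest l with (v + w)^2 / 2^(l+1) <= 2^(-lam), i.e. 2*log2(v+w) + lam - 1."""
    return 2 * math.log2(v + w) + lam - 1


l = min_tag_bits(v=2**30, w=2**8, lam=40)
print(f"{l:.4f}")  # 99.0000: the w term barely matters when v >> w
```

In practice l is rounded up to a whole number of bits, and every stored $hx_{i}$ costs l bits, which is why the client-side storage of $HX$ scales with v.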

We introduce Cuckoo filters [38] to reduce the client's storage. A Cuckoo filter can be seen as a compact variant of Cuckoo hashing [39]: rather than storing a complete element, each entry holds a fingerprint of the element, i.e., a short bit string derived from hashing it. To insert a new element x with fingerprint f, two candidate bucket locations $i_{1} = hash(x)$ and $i_{2} = i_{1} \oplus hash(f)$ are computed. If either bucket has an empty entry, f is inserted into that vacant slot. Otherwise, a bucket $i \xleftarrow{\$} \{i_{1}, i_{2}\}$ is chosen at random, and one of its existing fingerprints $f'$ is evicted at random to make room for f. The displaced fingerprint $f'$ is moved to its alternate bucket $i' = i \oplus hash(f')$, which may in turn trigger further evictions. If no empty entry is found within a threshold number of attempts, the insertion fails. Experiments by Fan et al. [38] show that with a bucket size of four and fingerprints of six bits or longer, Cuckoo filters can achieve a load factor of 95% and accommodate up to 4 billion elements. In summary, Cuckoo filters maintain high occupancy while storing only short fingerprints, resulting in low average storage per element.

Following the basic configuration of Fan et al. [38], in which each element has two candidate buckets and the bucket size is four, the average space (in bits) occupied per element at maximum load is

$$C \leq 1.05 \times (3 - \log_{2} ϵ) \tag{3}$$

where $ϵ$ is the target false positive rate. With $ϵ = 0.001$, Cuckoo filters reduce the size of the set $HX$ from 11.25 GB to approximately 1.75 GB, substantially decreasing the storage demanded of the client.
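Equation (3) can be evaluated for the running example as a quick check of the storage estimate. A minimal sketch, assuming $v = 2^{30}$ stored elements as above; the exact figure shifts slightly depending on how the fingerprint length is rounded to whole bits, and the function name is hypothetical.

```python
import math


def cuckoo_bits_per_item(eps: float) -> float:
    """Upper bound of Eq. (3): 1.05 * (3 - log2(eps)) bits per stored element."""
    return 1.05 * (3 - math.log2(eps))


bits = cuckoo_bits_per_item(0.001)
v = 2**30                      # server set size used in the running example
gib = v * bits / 8 / 2**30     # total filter size in GiB
print(f"{bits:.2f} bits/item, about {gib:.2f} GiB")  # 13.61 bits/item, about 1.70 GiB
```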

Another significant advantage of the Cuckoo filter is its efficient lookup. To test whether a value $hy_{j}$ is present in the set $HX$, the client only needs to check for the fingerprint of $hy_{j}$ in its two candidate buckets. The Cuckoo filter therefore offers constant-time lookup, which also improves the client's computational efficiency.

Apart from insertion and lookup, Cuckoo filters support dynamic deletion. The deletion process is straightforward: use the lookup algorithm to locate the bucket containing the fingerprint of the element to be deleted, and remove that fingerprint from the bucket. Hence, deletion also takes $O(1)$ time. Leveraging this property, when the server's data undergo only minor changes, the client does not need to re-download the entire dataset. Instead, the server can send the updated data along with the corresponding instructions (add or remove) to the client, which then executes the appropriate operations to obtain an updated filter reflecting the server's new set.
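The insertion, lookup, and deletion procedures described above can be sketched as a small Python class. This is an illustrative toy, not the implementation of Fan et al. [38]: the number of buckets is assumed to be a power of two (so that $i \oplus hash(f)$ stays in range), SHA-256 stands in for the hash functions, and the parameter names (`num_buckets`, `fp_bits`, `max_kicks`) are hypothetical.

```python
import hashlib
import random


def _h(data: bytes) -> int:
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


class CuckooFilter:
    def __init__(self, num_buckets=1024, bucket_size=4, fp_bits=12, max_kicks=500, seed=0):
        assert num_buckets & (num_buckets - 1) == 0, "num_buckets must be a power of two"
        assert 1 <= fp_bits <= 32
        self.mask = num_buckets - 1
        self.bucket_size, self.fp_bits, self.max_kicks = bucket_size, fp_bits, max_kicks
        self.buckets = [[] for _ in range(num_buckets)]
        self.rng = random.Random(seed)

    def _alt(self, i: int, fp: int) -> int:
        # Partial-key cuckoo hashing: i' = i XOR hash(f), so _alt(_alt(i, f), f) == i.
        return (i ^ _h(fp.to_bytes(4, "big"))) & self.mask

    def _locate(self, item: bytes):
        fp = _h(b"fp" + item) & ((1 << self.fp_bits) - 1)
        i1 = _h(b"ix" + item) & self.mask
        return fp, i1, self._alt(i1, fp)

    def insert(self, item: bytes) -> bool:
        fp, i1, i2 = self._locate(item)
        for i in (i1, i2):
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        i = self.rng.choice((i1, i2))  # both buckets full: start evicting
        for _ in range(self.max_kicks):
            slot = self.rng.randrange(self.bucket_size)
            fp, self.buckets[i][slot] = self.buckets[i][slot], fp  # place f, evict f'
            i = self._alt(i, fp)  # the evicted fingerprint's other bucket
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        return False  # considered full; this toy simply drops the last evicted fingerprint

    def contains(self, item: bytes) -> bool:
        fp, i1, i2 = self._locate(item)
        return fp in self.buckets[i1] or fp in self.buckets[i2]

    def delete(self, item: bytes) -> bool:
        fp, i1, i2 = self._locate(item)
        for i in (i1, i2):
            if fp in self.buckets[i]:
                self.buckets[i].remove(fp)
                return True
        return False


cf = CuckooFilter()
items = [f"hx{i}".encode() for i in range(1000)]
print(all(cf.insert(x) for x in items))    # True at roughly 24% load
print(all(cf.contains(x) for x in items))  # True: no false negatives
print(cf.delete(items[0]))                 # True
```

Lookups and deletions touch only two buckets, matching the constant-time behavior described above; non-members are occasionally reported as present, at a rate governed by `fp_bits`.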

4.2.2. Improve Security

Although the server cannot obtain the client's private input in the basic protocol, it can cause the client's output to deviate from the correct result. For instance, when signing the client's blinded value $b_{j}$, the server might sign an arbitrary random value rather than the actual $b_{j}$ sent by the client. The server could even save computational resources by simply responding with a random value instead of signing with its private key.

We organize critical data in the intersection process into a Merkle tree structure and upload the corresponding Merkle root to the blockchain for anchoring. This enables the client to produce a publicly verifiable and non-repudiable proof.

Specifically, after obtaining the set $B = \{b_{1}, b_{2}, \dots, b_{w}\}$, the client first computes the Merkle root $M_{1}$ of B. Next, the client sends the items of B to the server in the order of their positions in the Merkle tree and uploads $M_{1}$ to the blockchain. Upon receiving B and confirming that $M_{1}$ has been recorded on the blockchain, the server computes the Merkle root of the received B and checks whether it equals $M_{1}$. If the two match, the server proceeds with the signing; otherwise, it aborts the protocol. Because the underlying hash function is collision-resistant, any difference between the data received by the server and the set B results, except with negligible probability, in a different Merkle root.
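A minimal sketch of the commitment to B and the server’s consistency check is shown below. It assumes SHA-256 in place of keccak-256 (which Python’s `hashlib` does not provide), domain-separated leaf and node hashes, and a convention in which an unpaired node is carried up unchanged to the next level; the function names are hypothetical. Carrying the odd node up, rather than duplicating it, avoids the known ambiguity in which a list and the same list with its last element repeated share a root.

```python
import hashlib
from typing import List


def _leaf(data: bytes) -> bytes:
    # SHA-256 stands in for keccak-256; the 0x00/0x01 prefixes separate
    # leaf hashes from internal-node hashes.
    return hashlib.sha256(b"\x00" + data).digest()


def _node(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()


def merkle_root(items: List[bytes]) -> bytes:
    """Root over items in the order given; an unpaired last node is carried up unchanged."""
    if not items:
        raise ValueError("cannot commit to an empty set")
    level = [_leaf(x) for x in items]
    while len(level) > 1:
        nxt = [_node(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])
        level = nxt
    return level[0]


def server_accepts(received_b: List[bytes], m1_on_chain: bytes) -> bool:
    """Server-side check before signing: the received B must reproduce the anchored M1."""
    return merkle_root(received_b) == m1_on_chain
```

Note that the root depends on the order of the items, which is why the client must send B in tree order: a reordered or modified B is rejected.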

In the same way, the server computes the Merkle root $M_{2}$ of the set $S_{B} = \{sb_{1}, sb_{2}, \dots, sb_{w}\}$ in the same order as B. It then sends the ordered set $S_{B}$ to the client and uploads $M_{2}$ to the blockchain. Upon receiving $S_{B}$, the client first checks whether the Merkle root of the received set equals $M_{2}$. If not, the client terminates the protocol; otherwise, it samples a fraction of the elements of $S_{B}$, with the sample size chosen according to the desired deterrence effect. For each sampled element $sb_{t} \in S_{B}$, the client verifies whether Equation (4) holds:

$$b_{t} = (sb_{t})^{e} \bmod n \tag{4}$$

If all sampled elements satisfy Equation (4), the result is considered correct. However, if some element $sb_{t} \in S_{B}$ satisfies $b_{t} \neq (sb_{t})^{e} \bmod n$, it can be inferred that the server did not faithfully execute the protocol. In this case, the client can generate a publicly verifiable proof accusing the server of malicious behavior during the intersection process. The proof consists of the following components:

The elements $b_{t} \in B$ and $sb_{t} \in S_{B}$ that do not satisfy Equation (4);
The index $t \in [1, w]$ of $b_{t}$ in set B (equivalently, of $sb_{t}$ in set $S_{B}$);
The Merkle paths $road_{b_{t}}$ and $road_{sb_{t}}$ of $b_{t}$ and $sb_{t}$ in the Merkle trees with roots $M_{1}$ and $M_{2}$, respectively.

The verification algorithm for $Proof = (t, b_{t}, sb_{t}, road_{b_{t}}, road_{sb_{t}})$ consists of the following three steps.

Using $road_{b_{t}}$, verify whether the element $b_{t}$ is located at index t in the Merkle tree with root $M_{1}$. If the verification succeeds, proceed to the next step; otherwise, return False.
Using $road_{sb_{t}}$, verify whether the element $sb_{t}$ is located at index t in the Merkle tree with root $M_{2}$. If the verification succeeds, proceed to the next step; otherwise, return False.
Using the server’s public key $(e, n)$, verify whether $b_{t} = (sb_{t})^{e} \bmod n$ holds. If it does, return False; otherwise, return True.

A validation result of True indicates that the server did not execute the protocol faithfully, whereas a result of False indicates that the proof is invalid.
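The three-step verification algorithm can be sketched as below. The sketch assumes SHA-256 as a stand-in for keccak-256, 0-based indices (the text uses $t \in [1, w]$), an unpaired Merkle node being carried up unchanged, and that the verifier knows $w$, for example because it is recorded alongside $M_{1}$. The left/right order of each hash is derived from t and w rather than supplied by the prover, so a valid path also binds the element to index t in both trees. All names are hypothetical.

```python
import hashlib
from typing import List, Tuple


def _leaf(data: bytes) -> bytes:
    return hashlib.sha256(b"\x00" + data).digest()  # stand-in for keccak-256


def _node(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()


def _next_level(level: List[bytes]) -> List[bytes]:
    nxt = [_node(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
    if len(level) % 2:
        nxt.append(level[-1])  # unpaired node is carried up unchanged
    return nxt


def merkle_root(items: List[bytes]) -> bytes:
    level = [_leaf(x) for x in items]
    while len(level) > 1:
        level = _next_level(level)
    return level[0]


def merkle_path(items: List[bytes], t: int) -> List[bytes]:
    """Sibling hashes from leaf t (0-based) up to the root."""
    level, idx, path = [_leaf(x) for x in items], t, []
    while len(level) > 1:
        if (idx ^ 1) < len(level):
            path.append(level[idx ^ 1])
        level, idx = _next_level(level), idx // 2
    return path


def verify_path(item: bytes, t: int, w: int, path: List[bytes], root: bytes) -> bool:
    """Hash order at each level is derived from t and w, binding the item to index t."""
    if not 0 <= t < w:
        return False
    node, idx, size, k = _leaf(item), t, w, 0
    while size > 1:
        if (idx ^ 1) < size:
            if k == len(path):
                return False
            node = _node(path[k], node) if idx & 1 else _node(node, path[k])
            k += 1
        idx, size = idx // 2, (size + 1) // 2
    return k == len(path) and node == root


def enc(x: int, n: int) -> bytes:
    """Fixed-width big-endian encoding of a residue mod n (the Merkle leaf format)."""
    return x.to_bytes((n.bit_length() + 7) // 8, "big")


Proof = Tuple[int, int, int, List[bytes], List[bytes]]


def verify_accusation(proof: Proof, w: int, m1: bytes, m2: bytes, e: int, n: int) -> bool:
    """True means the accusation is valid, i.e. the server mis-signed b_t."""
    t, b_t, sb_t, road_b, road_sb = proof
    if not (0 <= b_t < n and 0 <= sb_t < n):
        return False
    if not verify_path(enc(b_t, n), t, w, road_b, m1):      # step 1
        return False
    if not verify_path(enc(sb_t, n), t, w, road_sb, m2):    # step 2
        return False
    return pow(sb_t, e, n) != b_t                            # step 3
```

An accusation built from honestly signed values fails step 3, and an accusation that alters $b_{t}$ or $sb_{t}$ fails step 1 or step 2, which mirrors the accountability and defamation-free arguments in Section 4.4.1.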

4.3. Full Protocol

In this section, we present our full protocol, which builds on the basic protocol and incorporates the optimizations proposed in Section 4.2. An overview of the system architecture and interaction logic is depicted in Figure 2.

The blockchain-based PSI system consists of four types of entities: server, client, blockchain, and IPFS (optional). The server and client are the parties holding data. The blockchain records the Merkle roots, verifies proofs submitted by the client, and executes the corresponding economic measures according to the verification results. IPFS is optional and serves as storage for the server’s encrypted data; instead of IPFS, any cloud storage service can host the server’s encrypted set, and the client can also download the data directly from the server when bandwidth permits. For simplicity, the following protocol description assumes that the server transmits its encrypted data directly to the client.

Our full protocol is illustrated in Figure 3 and consists of four main processes:

Setup performs preprocessing and prepares for the subsequent computation.
(a)
The server publishes the public key $(e, n)$ on the blockchain and stakes the required deposit.
(b)
For each $x_{i} \in X$, the server computes $sx_{i} = H_{1}(x_{i})^{d} \bmod n$ and $hx_{i} = H_{2}(sx_{i})$. It then generates a Cuckoo filter $GF_{X}$ into which it inserts the set $HX = \{hx_{1}, hx_{2}, \dots, hx_{v}\}$.
(c)
The client generates w or more random numbers $r_{j}$ and encrypts them with the server’s public key to obtain $R_{j} = r_{j}^{e} \bmod n$.
(d)
The client deposits the amount required by the smart contract. Once the deposit meets the requirements, the blockchain emits a request event to the server.
(e)
Upon detecting the request event, the server can authorize the client to communicate with it.
Computing is the main process, whereby the client obtains blind signatures on its set through a single interaction with the server.
(a)
The client blinds set Y to obtain the set $B = \{b_{1}, b_{2}, \dots, b_{w}\}$, where $b_{j} = H_{1}(y_{j}) \cdot R_{j} \bmod n$.
(b)
The client uploads the Merkle root $M_{1}$ of B to the blockchain and sends B to the server.
(c)
The server checks whether the Merkle root of the received data is consistent with $M_{1}$ on the blockchain and exits the protocol if they do not match.
(d)
The server uses its private key $(d, n)$ to sign every element $b_{j}$ in B, generating the set $S_{B} = \{sb_{1}, sb_{2}, \dots, sb_{w}\}$, where $sb_{j} = (b_{j})^{d} \bmod n$. It then uploads the Merkle root $M_{2}$ of $S_{B}$ to the blockchain and sends $S_{B}$ and the Cuckoo filter $GF_{X}$ to the client.
(e)
The client checks whether the Merkle root of the received data is consistent with $M_{2}$ on the blockchain and exits the protocol if they do not match.
(f)
If the client fails to upload $M_{1}$ or the server fails to upload $M_{2}$ to the blockchain before time $t_{1}$ , both parties’ deposits will be unlocked and the protocol will be terminated.
Verifying checks whether the blind signatures received from the server are valid.
(a)
The client randomly samples $n_{1}$ elements from $S_{B}$, where the number of samples depends on the desired deterrence factor. It then checks whether each sampled pair satisfies $b_{t} = (sb_{t})^{e} \bmod n$. If all samples satisfy this equation, it proceeds to the output phase; otherwise, it generates a proof $Proof = (t, b_{t}, sb_{t}, road_{b_{t}}, road_{sb_{t}})$.
(b)
The client submits $Proof$ to the blockchain and invokes the verification function of the smart contract to check whether $Proof$ is valid. If the proof is valid, the smart contract automatically deducts the server’s deposit as a penalty and refunds the client’s deposit. Then, the protocol ends.
(c)
If the smart contract does not receive any valid proofs prior to time $t_{2}$ , the server’s deposit will be refunded. A portion of the client’s deposit may be transferred to the server’s account as a reward (subject to the specific business rules), while the remaining portion (if any) will be refunded to the client’s account. Then, the protocol terminates.
Output is the last process and is responsible for calculating the final intersection.
(a)
The client unblinds each element of $S_{B}$ using the random values from the blinding step, obtaining the signatures $sy_{j} = sb_{j} \cdot r_{j}^{-1} \bmod n$ of its original elements, and computes the hash values $hy_{j} = H_{2}(sy_{j})$.
(b)
The client initializes the final intersection I as empty. It then looks up each $hy_{j}$ in the Cuckoo filter $GF_{X}$ and adds the corresponding original element $y_{j}$ to I if $hy_{j}$ is found in $GF_{X}$.
(c)
Finally, the client outputs the intersection I.
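The Setup, Computing, Verifying, and Output processes can be exercised end to end with the sketch below. It is an illustration under explicit assumptions: a toy RSA modulus of about 60 bits, SHA-256-based stand-ins for $H_{1}$ and $H_{2}$, a Python `set` in place of the Cuckoo filter $GF_{X}$, and no Merkle commitments or blockchain interaction. The function names are hypothetical.

```python
import hashlib
import math
import secrets
from typing import Iterable, List, Set, Tuple

# Toy RSA key for illustration only (real deployments use far larger moduli).
P, Q = 1_000_000_007, 998_244_353      # primes with P, Q = 2 (mod 3), so e = 3 is valid
N = P * Q
E = 3
D = pow(E, -1, (P - 1) * (Q - 1))


def h1(item: str) -> int:
    """Hypothetical H1: SHA-256 reduced mod N (a stand-in for a full-domain hash)."""
    return int.from_bytes(hashlib.sha256(b"H1|" + item.encode()).digest(), "big") % N


def h2(sig: int) -> bytes:
    """Hypothetical H2, applied to a signature before insertion into / lookup in GF_X."""
    return hashlib.sha256(b"H2|" + sig.to_bytes(8, "big")).digest()


def server_setup(x_set: Iterable[str]) -> Set[bytes]:
    # sx_i = H1(x_i)^d mod n and hx_i = H2(sx_i); a Python set stands in for GF_X.
    return {h2(pow(h1(x), D, N)) for x in x_set}


def client_blind(y_list: List[str]) -> Tuple[List[int], List[int]]:
    rs, blinded = [], []
    for y in y_list:
        r = secrets.randbelow(N - 2) + 2
        while math.gcd(r, N) != 1:                  # r_j must be invertible mod n
            r = secrets.randbelow(N - 2) + 2
        rs.append(r)
        blinded.append(h1(y) * pow(r, E, N) % N)    # b_j = H1(y_j) * R_j mod n
    return rs, blinded


def server_sign(blinded: List[int]) -> List[int]:
    return [pow(b, D, N) for b in blinded]          # sb_j = b_j^d mod n


def client_spot_check(blinded: List[int], signed: List[int], n1: int) -> List[int]:
    """Indices among n1 random samples that violate Equation (4)."""
    sample = secrets.SystemRandom().sample(range(len(blinded)), n1)
    return [t for t in sample if pow(signed[t], E, N) != blinded[t]]


def client_output(y_list: List[str], rs: List[int], signed: List[int],
                  gf_x: Set[bytes]) -> List[str]:
    out = []
    for y, r, sb in zip(y_list, rs, signed):
        sy = sb * pow(r, -1, N) % N                 # sy_j = sb_j / r_j = H1(y_j)^d mod n
        if h2(sy) in gf_x:
            out.append(y)
    return out
```

For example, with X = ["alice", "bob", "carol", "dave"] and Y = ["bob", "eve", "dave"], an honest signing round makes `client_output` return ["bob", "dave"], and altering one signature in the server’s response makes `client_spot_check` report that index when every element is sampled.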

Compared to the basic protocol, our protocol introduces an additional step of computing the Merkle roots $M_{1}$ and $M_{2}$. However, the elements contained in B and $S_{B}$ remain unchanged, so this step does not affect the correctness of the result. Furthermore, in our protocol, the server transmits the Cuckoo filter $GF_{X}$ into which the set $HX$ has been inserted, instead of the set $HX$ itself. Nonetheless, because a properly configured Cuckoo filter has a low false positive rate, the probability that the client misjudges whether $hy_{j}$ is in $HX$ does not exceed the false positive rate of the filter. Therefore, if both parties adhere to the protocol, the correctness of the basic protocol, combined with the sufficiently low false positive rate of the Cuckoo filter, ensures that our protocol yields the correct intersection (with a per-element error rate not exceeding the false positive rate of the Cuckoo filter).
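To make "properly configured" concrete, the sketch below uses the standard cuckoo filter analysis (Fan et al., 2014), in which a lookup that probes two buckets of b slots with f-bit fingerprints has false positive probability at most $1 - (1 - 2^{-f})^{2b} \approx 2b/2^{f}$. A union bound over the client’s w lookups then bounds the probability that any spurious element enters I. The helper names are hypothetical, and the sketch assumes uniformly distributed fingerprints.

```python
import math


def cuckoo_fpr_bound(bucket_size: int, fp_bits: int) -> float:
    """Upper bound 1 - (1 - 2^-f)^(2b) on the false positive probability of one lookup."""
    return 1.0 - (1.0 - 2.0 ** -fp_bits) ** (2 * bucket_size)


def fingerprint_bits_for(bucket_size: int, target_fpr: float) -> int:
    """Smallest f with 2b / 2^f <= target, using the common approximation eps ~ 2b / 2^f."""
    return math.ceil(math.log2(2 * bucket_size / target_fpr))


def intersection_error_bound(w: int, bucket_size: int, fp_bits: int) -> float:
    """Union bound over the client's w lookups: probability that I contains a spurious element."""
    return min(1.0, w * cuckoo_fpr_bound(bucket_size, fp_bits))
```

With four-slot buckets, each additional fingerprint bit halves the per-lookup bound, so the error rate can be traded directly against the size of $GF_{X}$ that the client downloads.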

4.4. Theoretical Analysis

We conduct a theoretical analysis of our protocol from security and efficiency viewpoints in this section.

4.4.1. Security Analysis

In terms of security, our protocol does not weaken the security guarantees of the basic protocol. Informally, on the one hand, the uploaded Merkle roots $M_{1}$ and $M_{2}$ reveal no additional information about the sets B and $S_{B}$. On the other hand, encoding the set $HX$ with a Cuckoo filter does not give the client more information than transmitting $HX$ directly. Formally, if there exists a PPT algorithm $\mathcal{A}$ that breaks our protocol with a non-negligible advantage $\gamma$, we can use it as a subroutine to construct an algorithm $\mathcal{A}'$ that breaks the basic protocol as follows: first, obtain the sets $HX$ and $S_{B}$ (or B); then, insert $HX$ into a Cuckoo filter and compute the Merkle root of $S_{B}$ (or B) before invoking $\mathcal{A}$. The view of $\mathcal{A}$ running as a subroutine of $\mathcal{A}'$ is identical to its view in an execution of our protocol. Therefore, the advantage of $\mathcal{A}'$ in breaking the basic protocol is also at least $\gamma$.

Additionally, our protocol can detect malicious behavior of the server with a certain probability. Specifically, let $p$ $(0 \leq p \leq 1)$ denote the proportion of the client’s data incorrectly signed by the server, meaning that $wp$ elements in set B are signed incorrectly and the remaining $w(1 - p)$ are signed correctly. If the client randomly selects $n_{1}$ $(1 \leq n_{1} \leq w)$ elements from $S_{B}$ for verification, the probability $P_{cap}$ of capturing the server’s malicious behavior, that is, one minus the probability that all $n_{1}$ samples are correctly signed, is given by Equation (5).

$$P_{cap} = 1 - \frac{\binom{w(1 - p)}{n_{1}} \binom{wp}{0}}{\binom{w}{n_{1}}} = 1 - \frac{A_{w(1 - p)}^{n_{1}}}{A_{w}^{n_{1}}} \tag{5}$$

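It can be observed that $P_{cap} = 1$ when $n_{1} > w(1 - p)$. Otherwise, since $\frac{A_{w(1 - p)}^{n_{1}}}{A_{w}^{n_{1}}} = \prod_{i=0}^{n_{1}-1} \frac{w(1 - p) - i}{w - i}$ and each factor is at most $1 - p$, $P_{cap}$ satisfies Equation (6):

$$P_{cap} \geq 1 - (1 - p)^{n_{1}} \tag{6}$$

where $n_{1} \leq w(1 - p)$.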

By Equations (5) and (6), for a fixed level of malicious behavior by the server, the more elements the client verifies during the verification phase, the higher the probability of detecting that behavior. In the extreme case, if the client tolerates no error in the results, it can verify every element in $S_{B}$, at the cost of an increased workload. Our protocol therefore allows the client to adapt the number of verified elements so as to achieve the desired level of deterrence with limited computational resources.
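Equations (5) and (6) can be evaluated numerically as follows, reading $p$ as the fraction of incorrectly signed elements, so that $w(1 - p)$ elements are correct, as the equations require. The helper names are hypothetical. For instance, $p = 0.01$ and $n_{1} = 2^{7}$ give a lower bound of $1 - 0.99^{128} \approx 0.72$, whereas $p = 0.001$ with the same sample size gives only about $0.12$.

```python
import math


def p_cap_exact(w: int, bad: int, n1: int) -> float:
    """Equation (5): n1 samples without replacement hit at least one of `bad` wrong signatures."""
    good = w - bad
    if n1 > good:
        return 1.0
    return 1.0 - math.comb(good, n1) / math.comb(w, n1)


def p_cap_lower_bound(p: float, n1: int) -> float:
    """Equation (6): 1 - (1 - p)^n1, with p the fraction of incorrectly signed elements."""
    return 1.0 - (1.0 - p) ** n1


def samples_needed(p: float, target: float) -> int:
    """Smallest n1 whose Equation (6) bound reaches the target deterrence level."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p))
```

Because the bound in Equation (6) does not involve w, the sample size needed for a given deterrence level stays fixed as the client’s set grows.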

Moreover, if any sampled element fails verification, the client can generate a publicly verifiable and irrefutable proof to accuse the server. Our protocol ensures accountability, freedom from defamation, and privacy of the proof, as follows:

Accountability: If Equation (4) does not hold, an honest client can always produce a valid proof that makes the verification algorithm output True. Since $b_{t} \in B$ and its corresponding $sb_{t} \in S_{B}$ reside in the Merkle trees with roots $M_{1}$ and $M_{2}$, the client can provide valid paths $road_{b_{t}}$ and $road_{sb_{t}}$ that pass the first two steps of the verification algorithm. Therefore, as long as some element $b_{t} \in B$ satisfies $b_{t} \neq (sb_{t})^{e} \bmod n$, the verification algorithm outputs True.
Defamation-Free: If the server is honest, the probability that a client generates a proof for which the verification algorithm outputs True is negligible. Since the server is honest, every $sb_{t} \in S_{B}$ satisfies $b_{t} = (sb_{t})^{e} \bmod n$, except with negligible probability. Additionally, the Merkle roots of B and $S_{B}$ have been recorded on the blockchain, and because the blockchain is tamper-evident, the client cannot modify $M_{1}$ or $M_{2}$. Consequently, if the client alters $b_{t}$ or $sb_{t}$ so that they violate Equation (4), the integrity of the Merkle tree makes it infeasible to provide valid Merkle paths for the altered elements.
Privacy: Except with negligible probability, the proof generated by the client does not disclose its private information. In $Proof = (t, b_{t}, sb_{t}, road_{b_{t}}, road_{sb_{t}})$, the values t, $b_{t}$, and $sb_{t}$ are already known to the server from the intersection process, and the Merkle paths $road_{b_{t}}$ and $road_{sb_{t}}$ can be computed independently by the server. Therefore, $Proof$ reveals no additional information to the server, and since $b_{t}$ and $sb_{t}$ are blinded by the random value $r_{t}$, the client’s privacy is preserved against the server and others.

Our protocol further enhances security by automatically executing economic incentives through smart contracts. For the client, once the server’s malicious behavior is captured, the accountability property above implies that the client can generate a valid proof that passes the smart contract’s verification. Since both parties pledge deposits before the computation, the smart contract automatically deducts the server’s deposit when the verification result is True, without requiring any third party for enforcement. For an honest server, the defamation-free property implies that the client cannot generate a valid proof that passes the smart contract’s verification. Therefore, even if the client does not actively pay the server, a portion of the client’s deposit is automatically transferred to the server’s account after time $t_{2}$. Furthermore, in our design, once a proof accusing the server is successfully validated by the smart contract, it is recorded on the blockchain through the contract’s event mechanism. Similarly, the event that the smart contract receives no valid proof before time $t_{2}$ is also recorded on the blockchain. Together, these events form a publicly accessible and permanently stored record of the server’s PSI service, which further deters a malicious server.

Note that our protocol does not offer forward security for the server. In practice, the server’s data typically change very little, so a leakage at any point would compromise almost all of the data anyway. Therefore, forward security offers little benefit to the server. The client, on the other hand, can achieve forward security by using fresh random numbers in each execution, as its specific requirements dictate.

4.4.2. Efficiency Analysis

In this section, we theoretically analyze the efficiency of the proposed protocol and compare it with the protocols of Chen et al. [16], Jiang et al. [32], and Pinkas et al. [25], as summarized in Table 1. Chen et al. [16] introduced an OPRF phase on top of [1] to ensure security against a malicious client, resulting in two rounds of interaction. To address the inability to verify the intersection in [16], Jiang et al. [32] incorporated publicly verifiable inner product computations, which increases the communication complexity to $O(v)$ and requires a large number of bilinear pairing operations. The protocol of Pinkas et al. [25] is computationally efficient, but its communication is $O(v + w)$. Furthermore, it employs PaXoS structures to encode the client’s data and introduces extra checking operations during active OT extension, which further increases communication overhead.

As demonstrated in Section 4.4.1, our protocol allows the client to verify the correctness of the result with a detection probability that depends on the level of deterrence the client aims to achieve.

Our protocol completes in a single round of communication, in which the client sends B to the server and receives $S_{B}$ and $GF_{X}$ in return. Although the total communication appears to comprise B, $S_{B}$, and $GF_{X}$, the transmission of the Cuckoo filter $GF_{X}$ does not require both parties to be online simultaneously. The server can store $GF_{X}$ in IPFS or any third-party cloud storage, allowing the client to download $GF_{X}$ at any time before the output phase and mitigating potential bandwidth limitations during the online intersection computation. Therefore, the online communication consists solely of B and $S_{B}$, giving a communication complexity of $O(w)$.
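The $O(w)$ online volume can be checked with simple arithmetic: B and $S_{B}$ each contain w residues modulo n. Assuming a 1024-bit modulus (an assumption here, though it is the size commonly paired with 80-bit security), $w = 2^{12}$ gives $2 \cdot 2^{12} \cdot 128$ bytes, that is, 1 MiB, consistent with the online communication reported in Section 5. The helper name is hypothetical.

```python
def online_comm_bytes(w: int, modulus_bits: int) -> int:
    """B (client to server) plus S_B (server to client): 2w residues mod n, fixed width."""
    return 2 * w * ((modulus_bits + 7) // 8)
```

The Cuckoo filter and the Merkle roots are excluded because they are fetched offline or read from the blockchain.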

The server’s local dataset may change slightly after $GF_{X}$ is first uploaded, for example through the addition of new data or the removal of existing data (an update can be split into a removal and an addition). Instead of generating and re-uploading a new Cuckoo filter, the server can directly upload the changed elements together with the corresponding instructions (“ADD” or “REMOVE”). Each update yields a new version of $GF_{X}$. When the client later performs another PSI with the same server after its dataset has undergone minor changes, it can obtain the Cuckoo filter of the desired version by replaying each update between the version it currently holds and the target version: for an “ADD” instruction, the client uses the insertion algorithm to add the new elements to $GF_{X}$, and for a “REMOVE” instruction, it uses the deletion algorithm to remove the specified data from $GF_{X}$. However, the filter’s load cannot exceed its maximum (typically about 95% when the bucket size is four and the fingerprints are sufficiently large). If the maximum load would be exceeded, the server needs to enlarge the filter and upload a new Cuckoo filter.

In terms of computational efficiency, the server must sign each element of its own set and of the client’s blinded set. Since the server’s set is usually large, signing it involves a large number of modular exponentiations. However, this needs to be done only once, and its cost can be amortized over subsequent PSI executions with other clients. Consequently, during the online computation phase, the server only needs to sign each element of the set B received from the client, yielding a computational complexity of $O(w)$. Additionally, since the server knows the factorization of n, it can use the Chinese remainder theorem to substantially speed up signing.
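The CRT speed-up can be sketched as follows: with $d_{p} = d \bmod (p - 1)$, $d_{q} = d \bmod (q - 1)$, and $q^{-1} \bmod p$ precomputed once per key, each signature is obtained from two half-size exponentiations recombined with Garner’s formula. This is a textbook illustration with hypothetical function names, not the authors’ GMP code.

```python
from typing import Tuple


def crt_params(p: int, q: int, d: int) -> Tuple[int, int, int]:
    """Per-key precomputation: d_p = d mod (p-1), d_q = d mod (q-1), q^{-1} mod p."""
    return d % (p - 1), d % (q - 1), pow(q, -1, p)


def sign_crt(b: int, p: int, q: int, dp: int, dq: int, q_inv: int) -> int:
    """b^d mod pq from two half-size exponentiations plus Garner recombination."""
    sp = pow(b % p, dp, p)            # b^d mod p
    sq = pow(b % q, dq, q)            # b^d mod q
    h = (q_inv * (sp - sq)) % p
    return sq + h * q                 # unique value in [0, pq) congruent to sp and sq
```

Since both the exponents and the moduli are half the size, each half-size exponentiation is considerably cheaper than one full-size exponentiation, which is where the speed-up comes from.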

The client’s workload consists predominantly of three components: (1) raising the random values $r_{j}$ to the e-th power modulo n, (2) sampling to check the correctness of the result, and (3) generating a proof when the check fails. The following discussion shows that, for a given probability that the server misbehaves, the online computational cost of these three processes is essentially independent of the sizes of both parties’ sets and is low.

Although the computational complexity of the first process appears to be $O(w)$, this process can be completed offline because it does not depend on either party’s set. Moreover, as noted in [15], unlike the typical requirement in RSA encryption that the public exponent e should not be too small, we can use $e = 3$ owing to the random values $r_{j}$, which significantly reduces the cost of the exponentiations. In practical applications, the values $R_{j}$ and $(r_{j})^{-1}$ can be reused multiple times when the client does not require forward security, further amortizing the client’s computational overhead.

According to Equation (6), the number of samples required for spot checks depends only on the desired deterrence level $P_{cap}$ and the probability p that the server misbehaves, and sampling a small subset of elements can achieve a high level of deterrence. For example, assuming that the server computes incorrectly with probability 0.01, sampling $2^{7}$ elements for verification detects the server’s malicious behavior with probability at least $1 - 0.99^{128} \approx 72\%$. Considering that the protocol of Pinkas et al. [25] also requires a certain number of public key operations in the base OT phase, we believe that the client’s computational cost in our protocol is comparable to theirs. Moreover, as mentioned above, the exponentiation in each check is very cheap (with an exponent of three).

Generating the proof is trivial for the client. During the computing phase, the client has already computed the Merkle trees of B and $S_{B}$, so it can easily retrieve the paths $road_{b_{t}}$ and $road_{sb_{t}}$ from $b_{t}$ and $sb_{t}$ to $M_{1}$ and $M_{2}$, respectively.

Furthermore, since the depth of a Merkle tree grows logarithmically with the size of the set, the proof size is only $O(\log w)$. In addition, the verification algorithm consists of only two Merkle path validations and a single exponentiation with an exponent of three, which can be performed at very low cost in a smart contract by using Ethereum’s built-in precompiled contracts.

5. Experiments

To evaluate the performance of the proposed scheme, we design two experiments covering on-chain contract execution and off-chain computation. The off-chain computation is implemented in C++, using the GMP library for large-integer arithmetic and OpenSSL for random number generation and hash functions. Both the server and the client run on a Linux desktop with an Intel Xeon Gold 6278C CPU (base frequency 2.6 GHz), each using a single core and a single thread. The parameters are set as follows: $\kappa = 80$, $\lambda = 40$, and $n_{1} = 128$.

Table 2 lists the storage required by the client for different sizes of the server’s set when the size of the client’s set is fixed at $2^{12}$. The Cuckoo filter significantly reduces the client’s storage, especially when the server’s set is large.

Figure 4 shows that online communication grows linearly with the size of the client’s set; when $w = 2^{12}$, it is only 1 MB. Figure 5 shows that the computation time of both parties is roughly linear in the size of the client’s set. More specifically, the client’s computation time is much shorter than the server’s, and the online phase accounts for only a small fraction of the total computation time. Therefore, in our protocol, the main computational workload lies in the server’s online computing phase, while the client’s online computational workload is minimal.

The on-chain contract, which verifies proofs and automatically executes the economic measures, is written in Solidity and tested on the Ethereum testnet Sepolia. For compatibility with Ethereum’s hash function, we use keccak-256 to compute the Merkle roots in the off-chain computation. Keccak-256 is Ethereum’s prevalent cryptographic hash function; it differs from the standard SHA3-256 only in its padding and offers a high level of security. To reduce the transaction cost of verifying proofs in smart contracts, we employ Ethereum’s precompiled contracts RIPEMD160 for Merkle path verification and ModExp for modular exponentiation. Figure 6 presents the sizes of the generated publicly verifiable proofs and the gas consumed to validate them in smart contracts for various sizes of the client’s set.

Figure 6 shows that both the size of the generated proofs and the gas required to verify them grow logarithmically with the client’s set size. Currently, the contract call data in a single transaction is limited to 2048 bytes on Ethereum. Hence, our proofs can support w up to $2^{28}$, which is sufficient for practical unbalanced PSI. Note that even if this limit is exceeded, a proof can be split across two or more contract-call transactions. The transaction cost of verifying a proof in our smart contract is far below the current block gas limit (30 million gas). At the current Ethereum mainnet gas price (20 Gwei) and ETH price (USD 3035.93) [40], the verification cost is only USD 5.74 when the client’s set size is $2^{12}$.
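The reported fee follows from fee = gas used × gas price × ETH price, with 1 Gwei = $10^{-9}$ ETH. Working backwards from USD 5.74 at 20 Gwei and USD 3035.93 per ETH suggests a verification cost of roughly $9.45 \times 10^{4}$ gas for $w = 2^{12}$; this is a derived estimate rather than a measured value, and the helper names are hypothetical.

```python
def verification_cost_usd(gas_used: int, gas_price_gwei: float, eth_usd: float) -> float:
    """Fee in USD = gas used x gas price (1 Gwei = 1e-9 ETH) x ETH price."""
    return gas_used * gas_price_gwei * 1e-9 * eth_usd


def implied_gas(cost_usd: float, gas_price_gwei: float, eth_usd: float) -> float:
    """Invert the fee formula to estimate the gas behind a reported USD cost."""
    return cost_usd / (gas_price_gwei * 1e-9 * eth_usd)
```

Because the fee is linear in the gas price, the USD cost scales directly with network congestion, while the logarithmic growth of the gas itself comes from the Merkle path length.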

6. Conclusions

In this paper, we propose and implement a blockchain-based unbalanced PSI protocol with public verification and financial security. Our protocol allows the client to compute and verify the final intersection with efficient communication and client-side computation. The client can also generate publicly verifiable proofs to accuse the server of malicious behavior, which can be automatically validated by smart contracts at a low cost. By designing an appropriate economic incentive and penalty mechanism, our protocol provides financial security assurances on the foundation of public verification. We believe that our protocol enables clients to better protect their interests in unbalanced PSI scenarios without incurring significant costs.

Nevertheless, the server’s online computational cost, in terms of public key operations, grows linearly with the size of the client’s dataset. Consequently, reducing the server’s computational burden without compromising the public verifiability of the result or the client’s efficiency is an important direction for future research.

Author Contributions

Methodology, Z.W.; Supervision, X.M. All authors have read and agreed to the published version of the manuscript.

Funding

This paper was funded by the National Key R&D Program of China, grant number [2021YFC3340600].

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Chen, H.; Laine, K.; Rindal, P. Fast Private Set Intersection from Homomorphic Encryption. In Proceedings of the 24th ACM-SIGSAC Conference on Computer and Communications Security (ACM CCS), Dallas, TX, USA, 30 October–3 November 2017; pp. 1243–1255. [Google Scholar] [CrossRef]
Zhou, Q.; Zeng, Z.; Wang, K.; Chen, M. Privacy Protection Scheme for the Internet of Vehicles Based on Private Set Intersection. Cryptography 2022, 6, 64. [Google Scholar] [CrossRef]
Mezzour, G.; Perrig, A.; Gligor, V.; Papadimitratos, P. Privacy-Preserving Relationship Path Discovery in Social Networks. In Proceedings of the 8th International Conference on Cryptology and Network Security, Kanazawa, Japan, 12–14 December 2009; Volume 5888, pp. 189–208. [Google Scholar]
Li, J.; Liu, Y.M.; Wu, S. Pipa: Privacy-preserving Password Checkup via Homomorphic Encryption. In Proceedings of the 16th ACM ASIA Conference on Computer and Communications Security (ACM ASIACCS), Virtual Event, Hong Kong, 7–11 June 2021; pp. 242–251. [Google Scholar] [CrossRef]
Pinkas, B.; Schneider, T.; Zohner, M.; Assoc, U. Faster Private Set Intersection based on OT Extension. In Proceedings of the 23rd USENIX Security Symposium, San Diego, CA, USA, 20–22 August 2014; pp. 797–812. [Google Scholar]
Kolesnikov, V.; Kumaresan, R.; Rosulek, M.; Trieu, N. Efficient Batched Oblivious PRF with Applications to Private Set Intersection. In Proceedings of the 23rd ACM Conference on Computer and Communications Security (CCS), Vienna, Austria, 24–28 October 2016; pp. 818–829. [Google Scholar] [CrossRef]
Orrù, M.; Orsini, E.; Scholl, P. Actively Secure 1-out-of-N OT Extension with Application to Private Set Intersection. In Proceedings of the RSA Conference on Cryptographer’s Track (CT-RSA), San Francisco, CA, USA, 14–17 February 2017; Volume 10159, pp. 381–396. [Google Scholar] [CrossRef]
Pinkas, B.; Schneider, T.; Zohner, M. Scalable Private Set Intersection Based on OT Extension. Acm Trans. Priv. Secur. 2018, 21, 7. [Google Scholar] [CrossRef]
Pinkas, B.; Schneider, T.; Segev, G.; Zohner, M.; Assoc, U. Phasing: Private Set Intersection using Permutation-based Hashing. In Proceedings of the 24th USENIX Security Symposium, Washington, DC, USA, 12–14 August 2015; pp. 515–530. [Google Scholar]
Jiang, Z.; Guo, X.; Yu, T.; Zhou, H.; Wen, J.; Wu, Z. Private Set Intersection Based on Lightweight Oblivious Key-Value Storage Structure. Symmetry 2023, 15, 2083. [Google Scholar] [CrossRef]
Raghuraman, S.; Rindal, P. Blazing Fast PSI from Improved OKVS and Subfield VOLE. In Proceedings of the Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security, Los Angeles, CA, USA, 7–11 November 2022; pp. 2505–2517. [CrossRef]
Ishai, Y.; Kilian, J.; Nissim, K.; Petrank, E. Extending oblivious transfers efficiently. In Proceedings of the Annual International Cryptology Conference, Santa Barbara, CA, USA, 17–21 August 2003; pp. 145–161. [Google Scholar]
Jarecki, S.; Liu, X.M. Fast Secure Computation of Set Intersection. In Proceedings of the 7th Conference on Security and Cryptography for Networks, Amalfi, Italy, 13–15 September 2010; Volume 6280, pp. 418–435. [Google Scholar]
Resende, A.C.D.; Aranha, D.F. Faster Unbalanced Private Set Intersection. In Proceedings of the 22nd International Conference on Financial Cryptography and Data Security (FC), Nieuwpoort, Curaçao, 26 February–2 March 2018; Volume 10957, pp. 203–221. [Google Scholar] [CrossRef]
Cristofaro, E.D.; Tsudik, G. Practical private set intersection protocols with linear complexity. In Proceedings of the 14th Practical Private Set Intersection Protocols with Linear Complexity, Tenerife, Canary Islands, 25–28 January 2010. [Google Scholar] [CrossRef]
Chen, H.; Huang, Z.C.; Laine, K.; Rindal, P. Labeled PSI from Fully Homomorphic Encryption with Malicious Security. In Proceedings of the ACM SIGSAC Conference on Computer and Communications Security (CCS), Toronto, ON, Canada, 15–19 October 2018; pp. 1223–1237. [Google Scholar] [CrossRef]
Zhao, Q.; Jiang, B.; Zhang, Y.; Wang, H.; Mao, Y.; Zhong, S. Unbalanced private set intersection with linear communication complexity. Sci. China Inf. Sci. 2024, 67, 132105. [Google Scholar] [CrossRef]
Ning, J.; Tan, Z.; Zhang, K.; Ye, W. Low Communication-Cost PSI Protocol for Unbalanced Two-Party Private Sets. IET Inf. Secur. 2024, 2024, 6052651. [Google Scholar] [CrossRef]
Hazay, C.; Lindell, Y. Efficient Protocols for Set Intersection and Pattern Matching with Security Against Malicious and Covert Adversaries. J. Cryptol. 2010, 23, 422–456. [Google Scholar] [CrossRef]
Asharov, G.; Orlandi, C. Calling Out Cheaters: Covert Security with Public Verifiability. In Proceedings of the 18th International Conference on Theory and Application of Cryptology and Information Security (ASIACRYPT), Beijing, China, 2–6 December 2012; Volume 7658, pp. 681–698. [Google Scholar]
Hong, C.; Katz, J.; Kolesnikov, V.; Lu, W.j.; Wang, X. Covert Security with Public Verifiability: Faster, Leaner, and Simpler. In Proceedings of the Advances in Cryptology—EUROCRYPT 2019, Darmstadt, Germany, 19–23 May 2019; pp. 97–121. [Google Scholar]
Zhu, R.Y.; Ding, C.C.; Huang, Y. Efficient Publicly Verifiable 2PC over a Blockchain with Applications to Financially-Secure Computations. In Proceedings of the ACM SIGSAC Conference on Computer and Communications Security (CCS), London, UK, 11–15 November 2019; pp. 633–650.
Freedman, M.J.; Nissim, K.; Pinkas, B. Efficient Private Matching and Set Intersection. In Proceedings of the Advances in Cryptology—EUROCRYPT 2004, Interlaken, Switzerland, 2–6 May 2004; pp. 1–19.
Morales, D.; Agudo, I.; Lopez, J. Private set intersection: A systematic literature review. Comput. Sci. Rev. 2023, 49, 100567.
Pinkas, B.; Rosulek, M.; Trieu, N.; Yanai, A. PSI from PaXoS: Fast, Malicious Private Set Intersection. In Proceedings of the 39th Annual International Conference on the Theory and Applications of Cryptographic Techniques (EUROCRYPT), Zagreb, Croatia, 10–14 May 2020; Volume 12106, pp. 739–767.
Fan, C.; Jia, P.; Lin, M.; Wei, L.; Guo, P.; Zhao, X.; Liu, X. Cloud-Assisted Private Set Intersection via Multi-Key Fully Homomorphic Encryption. Mathematics 2023, 11, 1784.
Abadi, A.; Dong, C.; Murdoch, S.J.; Terzis, S. Multi-party Updatable Delegated Private Set Intersection. In Proceedings of the 26th International Conference on Financial Cryptography and Data Security, Grenada, 2–6 May 2022.
Huang, Y.; Evans, D.; Katz, J.; Malka, L. Faster secure two-party computation using garbled circuits. In Proceedings of the 20th USENIX Conference on Security, San Francisco, CA, USA, 8–12 August 2011.
Ciampi, M.; Orlandi, C. Combining Private Set-Intersection with Secure Two-Party Computation. In Proceedings of the 11th International Conference on Security and Cryptography for Networks (SCN), Amalfi, Italy, 5–7 September 2018; Volume 11035, pp. 464–482.
Meadows, C. A More Efficient Cryptographic Matchmaking Protocol for Use in the Absence of a Continuously Available Third Party. In Proceedings of the 1986 IEEE Symposium on Security and Privacy, Oakland, CA, USA, 7–9 April 1986; p. 134.
Huberman, B.A.; Franklin, M.; Hogg, T. Enhancing privacy and trust in electronic communities. In Proceedings of the 1st ACM Conference on Electronic Commerce, Denver, CO, USA, 3–5 November 1999.
Jiang, Y.; Wei, J.; Pan, J. Publicly Verifiable Private Set Intersection from Homomorphic Encryption. In Proceedings of the Security and Privacy in Social Networks and Big Data, Xi’an, China, 16–18 October 2022; pp. 117–137.
Aumann, Y.; Lindell, Y. Security Against Covert Adversaries: Efficient Protocols for Realistic Adversaries. J. Cryptol. 2010, 23, 281–343.
Nakamoto, S. Bitcoin: A Peer-to-Peer Electronic Cash System. 2008. Available online: https://bitcoin.org/bitcoin.pdf (accessed on 18 February 2024).
Martínez, V.G.; Hernández-Álvarez, L.; Encinas, L.H. Analysis of the Cryptographic Tools for Blockchain and Bitcoin. Mathematics 2020, 8, 131.
Chaum, D. Blind Signature System. In Advances in Cryptology: Proceedings of Crypto 83; Springer: Boston, MA, USA, 1984; p. 153.
Bellare, M.; Namprempre, C.; Pointcheval, D.; Semanko, M. The one-more-RSA-inversion problems and the security of Chaum’s blind signature scheme. J. Cryptol. 2003, 16, 185–215.
Fan, B.; Andersen, D.G.; Kaminsky, M.; Mitzenmacher, M.D. Cuckoo Filter: Practically Better Than Bloom. In Proceedings of the 10th ACM International Conference on Emerging Networking Experiments and Technologies (ACM CoNEXT), Sydney, Australia, 2–5 December 2014; pp. 75–87.
Pagh, R.; Rodler, F.F. Cuckoo hashing. J. Algorithms 2004, 51, 122–144.
Etherscan. Available online: https://etherscan.io/ (accessed on 25 March 2024).

Figure 1. Basic protocol proposed in [37].

Figure 2. Overall system architecture diagram.

Figure 3. Our full protocol.

Figure 4. Online communication costs for different sizes of the client’s set.

Figure 5. Computation time for different sizes of the client’s set.

Figure 6. Size of proofs and gas cost for verification for different sizes of the client’s set.
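The article's preliminaries cover RSA blind signatures, and the reference list cites Chaum's blind signature scheme and the RSA-based PSI of De Cristofaro and Tsudik. A toy Python sketch of the general blind-RSA approach to unbalanced PSI may help when reading the protocol figures: the server signs its large set once and publishes only hashes of those signatures, while the client obtains signatures on its few items blindly and tests membership locally. This is an illustration under stated assumptions, not the protocol of Figure 1 or Figure 3. The deliberately tiny key built from two Mersenne primes is insecure, SHA-256 with domain-separation prefixes stands in for both hash functions, and the function names `server_offline`, `client_blind`, `server_sign`, and `client_intersect` are hypothetical. Python 3.8 or later is assumed for modular inverses via `pow`.

```python
import hashlib
import math
import secrets

# Toy RSA key from two Mersenne primes (hypothetical and insecure;
# real deployments use moduli of 2048 bits or more).
P, Q = 2**31 - 1, 2**61 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent, held by the server
NBYTES = (N.bit_length() + 7) // 8


def h1(item: bytes) -> int:
    """Hash an item into Z_N (SHA-256 reduced mod N; an assumption)."""
    return int.from_bytes(hashlib.sha256(b"H1" + item).digest(), "big") % N


def h2(sig: int) -> bytes:
    """Second hash applied to a signature before it is published or compared."""
    return hashlib.sha256(b"H2" + sig.to_bytes(NBYTES, "big")).digest()


def server_offline(server_set):
    """Server, offline: sign every item of its large set, publish hashed signatures."""
    return {h2(pow(h1(y), D, N)) for y in server_set}


def client_blind(item: bytes):
    """Client: blind h1(item) with a fresh random r that is invertible mod N."""
    while True:
        r = secrets.randbelow(N - 2) + 2
        if math.gcd(r, N) == 1:
            return (h1(item) * pow(r, E, N)) % N, r


def server_sign(blinded: int) -> int:
    """Server, online: sign a blinded value without learning the underlying item."""
    return pow(blinded, D, N)


def client_intersect(client_set, published):
    """Client: unblind each signature, hash it, and test membership locally."""
    result = []
    for x in client_set:
        blinded, r = client_blind(x)
        # In a real run, server_sign is a network round trip to the server.
        sig = (server_sign(blinded) * pow(r, -1, N)) % N  # equals h1(x)^D mod N
        if h2(sig) in published:
            result.append(x)
    return result


server_items = [b"alice", b"bob", b"carol", b"dave"]
client_items = [b"bob", b"eve", b"dave"]
print(client_intersect(client_items, server_offline(server_items)))
# prints [b'bob', b'dave']
```

The server only ever sees blinded values, and without the private exponent `D` the client cannot compute signatures for items it does not hold, so the published hashes are only useful for items the client actually holds. Detection of a malicious server, proof generation, and the smart-contract logic described in the abstract are not modeled here.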

Table 1. Comparison of related PSI protocols.

Protocol	Correctness	Number of Rounds	Communication Complexity	Computation of Server	Computation of Client
Chen [16]	✘	2	$O(w \log v)$	$O(v^{2})$	$O(w \log v)$
Jiang [32]	✔	2	$O(v)$	$O(v^{2})$	$O(v + w)$
Pinkas [25]	✔	3	$O(v + w)$	$O(1)$	$O(1)$
Ours	✔	1	$O(w)$	$O(w)$	$O(1)$

Table 2. The client’s space cost of storing the server’s hashed set in MB in the basic protocol and our protocol.

$v$	Basic Protocol (MB)	Our Protocol (MB)
$2^{16}$	0.56	0.19
$2^{20}$	9.88	3
$2^{24}$	174	48
$2^{28}$	3040	768

Special note: $w = 2^{12}$; the Cuckoo filter used is configured with bucket size $b = 4$ and fingerprint length $f = 12$, and its false positive rate $\epsilon$ is at most 0.0496%.
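The "Our Protocol" column of Table 2 can be reproduced with simple arithmetic if one assumes that the Cuckoo filter rounds its bucket count up to a power of two and doubles it once more whenever the projected load factor would exceed 96%. This sizing rule is assumed here to mirror the reference implementation accompanying Fan et al.; it is not stated in the article. Under that assumption, with $b = 4$ and $f = 12$, a filter for the power-of-two set sizes in Table 2 holds $2v$ fingerprints, i.e. $24v$ bits, and reading the table's MB as MiB ($2^{20}$ bytes) gives exactly the listed values. The helper `cuckoo_filter_mib` and its `max_load` parameter are hypothetical.

```python
def cuckoo_filter_mib(num_items: int, bucket_size: int = 4,
                      fingerprint_bits: int = 12, max_load: float = 0.96) -> float:
    """Estimate Cuckoo filter storage in MiB under an assumed sizing rule."""
    # Smallest power-of-two bucket count whose slots can hold every item.
    buckets = 1
    while buckets * bucket_size < num_items:
        buckets *= 2
    # Double once more if the projected load factor would exceed max_load.
    if num_items / (buckets * bucket_size) > max_load:
        buckets *= 2
    total_bits = buckets * bucket_size * fingerprint_bits
    return total_bits / 8 / 2**20


for e in (16, 20, 24, 28):
    print(f"v = 2^{e}: {cuckoo_filter_mib(2**e):g} MiB")
# v = 2^16: 0.1875 MiB
# v = 2^20: 3 MiB
# v = 2^24: 48 MiB
# v = 2^28: 768 MiB
```

For $v = 2^{16}$, the 0.1875 MiB result rounds to the 0.19 listed in Table 2. Because every set size there is a power of two, the filter would be exactly full before the extra doubling, which is why the storage comes out at $24v$ bits rather than the $12v$ bits a fully loaded filter would need.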

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Wang, Z.; Ma, X. Blockchain-Based Unbalanced PSI with Public Verification and Financial Security. Mathematics 2024, 12, 1544. https://doi.org/10.3390/math12101544

AMA Style

Wang Z, Ma X. Blockchain-Based Unbalanced PSI with Public Verification and Financial Security. Mathematics. 2024; 12(10):1544. https://doi.org/10.3390/math12101544

Chicago/Turabian Style

Wang, Zhanshan, and Xiaofeng Ma. 2024. "Blockchain-Based Unbalanced PSI with Public Verification and Financial Security" Mathematics 12, no. 10: 1544. https://doi.org/10.3390/math12101544

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
