Article

Practical and Malicious Multiparty Private Set Intersection for Small Sets

1
College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
2
Research Center for Basic Theories of Intelligent Computing, Research Institute of Basic Theories, Zhejiang Lab, Hangzhou 311500, China
3
Quan Cheng Laboratory, Jinan 250103, China
*
Author to whom correspondence should be addressed.
Electronics 2023, 12(23), 4851; https://doi.org/10.3390/electronics12234851
Submission received: 27 October 2023 / Revised: 22 November 2023 / Accepted: 27 November 2023 / Published: 30 November 2023
(This article belongs to the Special Issue Security and Privacy Evaluation of Machine Learning in Networks)

Abstract

Private set intersection (PSI) is a pivotal subject in privacy-preserving computation. Much research has concentrated on large and unbalanced sets, whereas few existing PSI protocols are tailored to small sets. Those that exist are either restricted to two parties or require resource-intensive homomorphic operations. To provide practical multiparty PSI for small sets, we present two multiparty PSI protocols built on Oblivious Key–Value Stores (OKVSs), polynomials, and garbled cuckoo tables. Our security analysis shows that these protocols are secure in the malicious model and resist collusion attacks. Experimental evaluations show that, compared with related work, our protocols perform better in small-set settings, particularly over low-bandwidth wide area networks (WANs).

1. Introduction

With the successive promulgation of data protection laws such as the General Data Protection Regulation (GDPR), privacy preservation has garnered significant attention. Enabling data circulation and computation while safeguarding data privacy is a challenge, and Secure Multiparty Computation (MPC) has emerged as an important solution. MPC encompasses three key functionalities: element-to-element computations, such as basic MPC operations [1]; element-to-set computations, such as Private Information Retrieval (PIR) [2]; and set-to-set computations, such as private set intersection [3], which is the focus of our interest. PSI allows two or more parties to compute the intersection of their sets without revealing any element outside the intersection. PSI has practical applications in various scenarios, including evaluating the scale of incidental intelligence collection [4], safeguarding the plaintext privacy of users in biometric systems [5], privacy protection in vehicular networks [6], and data alignment in federated learning [7].
Data alignment is a crucial preliminary step in federated learning, which is divided into horizontal and vertical federated learning. Horizontal federated learning, also termed sample-aligned federated learning, requires each party to execute the PSI protocol on the samples, which are typically numerous. Vertical federated learning, also known as feature-aligned federated learning, requires parties to run the PSI protocol on features, which are usually far fewer. At present, most PSI protocols are tailored to large sets and are effective in horizontal federated learning scenarios. While large-set PSI can also achieve feature alignment, it does not account for scenarios with only a few features and so misses opportunities for further runtime optimization. Our primary focus is on developing PSI protocols tailored to vertical federated learning. Beyond vertical federated learning, there are numerous applications for small sets, such as identifying shared friends among hundreds of items or discovering social circles with common interests through a handful of shared tags. In such settings, an efficient protocol can significantly improve the end-user experience. The relationship between federated learning and private set intersection is shown in Figure 1.
To the best of our knowledge, the most efficient large-set PSI protocol is based on the Oblivious Transfer (OT) extension primitive [3,8,9]. The OT extension protocol involves a period of public key operations before formal symmetric cryptographic operations commence, and in the context of small-set scenarios, the pre-processing time for public key operations cannot be overlooked. Hence, directly applying large-set PSI protocols to small-set scenarios is not the optimal solution. Rosulek et al. proposed a PSI protocol based on key agreement, which is well-suited for small-set scenarios [10]. However, this protocol is limited to two parties, whereas parties in federated learning are not restricted to pairs. Therefore, its practicality needs further enhancement. Some scholars have introduced PSI protocols for unbalanced sets using fully homomorphic encryption. While theoretically applicable to small-set scenarios, as pointed out by Rosulek et al., these protocols exhibit computational complexities several orders of magnitude higher than key agreement and are also limited to two-party interactions. Although there are existing multiparty PSI protocols, such as OT-extension-based multiparty PSI protocols [11] and homomorphic multiparty PSI protocols [12], they are not optimal for multiparty small-set scenarios. OT-extension-based multiparty PSI protocols still require a warm-up period, and homomorphic multiparty PSI protocols involve expensive homomorphic operations. Incorporating the advantages of the key agreement into multiparty PSI scenarios constitutes a research direction that needs to be pursued.
We integrate the benefits of key agreement into multiparty PSI scenarios while also exploring other optimization strategies. In recent years, research on index structures has become a hot topic, and index structures can be effectively integrated into various fields. For example, incorporating optimized index structures in machine learning significantly improves efficiency [13,14]. Similarly, studies related to PSI have introduced index structures such as cuckoo hashing and Bloom filters [3,11,15]. To enhance security against malicious attacks, researchers have proposed more robust index structures such as garbled Bloom filters (GBFs) and garbled cuckoo tables (GCTs) [8,16]. Integrating these advanced index structures into new PSI protocols poses a significant challenge.
In this paper, we propose practical multiparty private set intersection protocols for small sets. In sum, the contributions of our work are shown below:
1. We introduce two multiparty private set intersection protocols designed for small sets, built on two distinct oblivious key–value store structures. These protocols employ key agreement and zero-sharing techniques.
2. We analyze the security of both protocols and show that they are correct and secure in the malicious security model against collusion attacks.
3. We implement the two protocols in Rust, and the experimental results demonstrate that, compared with related works, our protocols are better suited to small-set scenarios, especially in bandwidth-constrained simulations.

1.1. Related Work

The earliest PSI protocols were based on Diffie–Hellman (DH) key exchange. Meadows et al. [17] proposed the first DH-based PSI protocol. Current work on PSI aims to strengthen the security model, improve efficiency, and adapt to different scenarios.
Enhanced security model: De Cristofaro et al. [18] constructed a DH-based PSI protocol under the malicious model, while Orrù et al. [19] constructed an OT-extension-based PSI protocol under the malicious model. With the introduction of Oblivious Key–Value Store structures, the performance gap between the two approaches is gradually narrowing.
Improved efficiency: Ishai et al. [20] proposed an OT extension that reduces public key operations; Kolesnikov et al. [3,21] improved the OT extension to permit longer input elements; and Garimella and Pinkas et al. [8,16] proposed OT-extension-based PSI protocols under the malicious model. Garimella et al. [16] proposed OKVSs to improve computational efficiency. Theoretically, the 3H-GCT provides the best results, while the GBF incurs more communication and still leaves room for improvement. For example, Ben-Efra et al. [22] used the GBF to solve specific problems, Boyle et al. [23] proposed silent OT, and Rindal et al. [24] introduced silent OT into PSI to reduce the communication cost. Similarly, Rosulek et al. [10] halved the communication volume compared to the original DH-based PSI protocol.
Adaptation scenarios: Kolesnikov et al. [11] proposed a multiparty PSI protocol using the IKNP OT extension; Rosulek et al. [10] proposed a two-party PSI protocol for small sets; Nevo et al. [9] proposed a multiparty PSI protocol that tolerates arbitrary collusion; and Dui et al. [25] proposed a PSI protocol for small entries of approximately 32 bits per element. Related works [26,27] require the number of intersection elements to exceed a certain value. Additionally, Bay et al. [12] introduced a different threshold, which requires an element to appear in the sets of at least t parties. Wei et al. [28] also recently proposed a multiparty PSI protocol for small sets that is based on a zero-sharing structure different from ours and can be further optimized.

1.2. Organization

We introduce the prerequisite knowledge for the subsequent protocols in Section 2. Section 3 provides the definitions of functionality and security. Section 4 and Section 5 discuss two specific protocols. Section 6 presents the security analysis. Section 7 covers the experimental details. Finally, Section 8 discusses the findings, and Section 9 concludes the paper.

2. Preliminaries

2.1. Key Agreement

We focus on a type of one-round key agreement (KA), namely Diffie–Hellman key agreement (DHKA). Given the security parameter $\kappa$, the size of the output key space satisfies $|\mathsf{KA}.\mathcal{K}| \ge 2^{\kappa}$. The algorithms that make up the KA are as follows:
1. $\mathsf{KA}.\mathcal{R}$ is the space of private randomness.
2. $\mathsf{KA.msg}(a) = aG$, where $a$ is a secret key and $G$ is the base point.
3. $\mathsf{KA.key}(a, y) = ay$, where $y$ is a received message (a public key).
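The three KA algorithms above can be sketched in a few lines. This is a toy illustration only: it assumes the multiplicative group modulo the Mersenne prime $2^{61}-1$ in place of an elliptic-curve group, so $aG$ becomes $G^a \bmod P$, and the names `P`, `G`, `ka_random`, `ka_msg`, and `ka_key` are hypothetical. The parameters are far too small to be secure.

```python
import secrets

# Toy parameters (assumption): the multiplicative group modulo a Mersenne
# prime stands in for the elliptic-curve group. This is NOT secure.
P = 2**61 - 1
G = 3  # hypothetical base element playing the role of the base point G


def ka_random() -> int:
    """Sample a secret key from KA.R (here: a nonzero exponent)."""
    return secrets.randbelow(P - 2) + 1


def ka_msg(a: int) -> int:
    """KA.msg(a) = aG, written multiplicatively as G**a mod P."""
    return pow(G, a, P)


def ka_key(a: int, y: int) -> int:
    """KA.key(a, y) = a*y, written multiplicatively as y**a mod P."""
    return pow(y, a, P)


# Both sides derive the same key from their own secret and the peer's message.
a, b = ka_random(), ka_random()
shared_a = ka_key(a, ka_msg(b))
shared_b = ka_key(b, ka_msg(a))
assert shared_a == shared_b
```

Each party sends a single message and then derives the key locally, which is what makes the agreement one-round.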
Elliptic-curve points have a distinctive structure and can easily be distinguished from random strings. In certain scenarios, it is necessary to encode an elliptic-curve point as a uniformly distributed random string. This functionality was achieved in related work [29]. The algorithms for Elligator-DHKA are as follows:
1. $\mathsf{KA}.\mathcal{R}^{e} = \{ r \in \mathbb{Z}_q : rG \in \mathrm{im}(dec) \}$
2. $\mathsf{KA.msg}^{e}(a) = enc(aG)$
3. $\mathsf{KA.key}^{e}(a, y) = a \cdot dec(y)$
The function $enc(\cdot)$ encodes points on an elliptic curve as random strings, and $dec(\cdot)$ decodes random strings into points on the elliptic curve. Approximately 50% of the points can be successfully encoded. These points form the subset of the curve denoted $\mathrm{im}(dec)$, and $\mathsf{KA}.\mathcal{R}^{e}$ contains exactly the secret keys $r$ for which $rG$ lies in this subset.

2.2. Zero Sharing

We adopt an unconditional zero-sharing structure, which tolerates up to $m-2$ corrupt parties [11]. For any element $x$, the zero-sharing value of party $P_i$ is denoted $S_i(x)$ and satisfies $\sum_{i=0}^{m-1} S_i(x) = 0$, where $m$ is the number of parties.

2.3. Oblivious Key–Value Store

The Oblivious Key–Value Store (OKVS) is a structure abstracted by Garimella and Pinkas et al. [8,16]. It preserves the key–value mapping while hiding the actual keys. An OKVS consists of two steps: encoding and decoding. Following related work [16], we provide a step-by-step procedure for a three-hash garbled cuckoo table (3H-GCT).
The encoding of the 3H-GCT is shown in Algorithm 1, and the decoding is shown in Algorithm 2. The common parameters required for encoding and decoding are as follows:
The statistical security parameter $\lambda = 40$;
Random functions $H_0, H_1, H_2 : \{0,1\}^* \to \{0, 1, \ldots, \frac{1.3n}{3} - 1\}$;
Random function $R : \{0,1\}^* \to \{0,1\}^{\lambda + \log n}$;
Function $L(k) : k \mapsto 0^{H_0(k)}\,1\,0\cdots0 \,\|\, 0^{H_1(k)}\,1\,0\cdots0 \,\|\, 0^{H_2(k)}\,1\,0\cdots0$, that is, the concatenation of three blocks of length $\frac{1.3n}{3}$, where block $i$ contains a single 1 at position $H_i(k)$.
Algorithm 1: Encoding of 3H-GCT
Input: set size $n$, key–value pairs $\{(k_j, v_j) : 0 \le j < n\}$
Output: coefficient vector $D$
1. Initialize three tables $V_0, V_1, V_2$, each with a fixed number $\frac{1.3n}{3}$ of buckets.
2. Store each pair $(k_i, v_i)$ in bucket $H_0(k_i)$ of $V_0$, bucket $H_1(k_i)$ of $V_1$, and bucket $H_2(k_i)$ of $V_2$.
3. Repeatedly search the three tables for a bucket that contains only one element, peel that element onto the stack $S$, and remove it from the other two tables.
4. Solve the equations $\langle L(k_i) \,\|\, R(k_i), D \rangle = v_i$ for all $(k_i, v_i) \notin S$ using Gaussian elimination and obtain the vector $D$.
5. Pop the elements from the stack $S$, map each onto its row of the matrix, and modify the vector $D$ to satisfy $\langle L(k_i) \,\|\, R(k_i), D \rangle = v_i$. If three (or two) of the corresponding positions in $D$ are still undefined, fill two (or one) of them randomly and set the remaining one to satisfy the equation.
6. return $D$
Algorithm 2: Decoding of 3H-GCT
Input: set size $n$, coefficient vector $D$, key $k$
Output: value $v$
1. Determine the output lengths of $L(\cdot)$ and $R(\cdot)$ based on $n$.
2. Compute $v = \langle L(k) \,\|\, R(k), D \rangle$.
3. return $v$
Here, $\langle x, y \rangle$ denotes the inner product of the vectors $x$ and $y$.
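The decoding step and the "adjust the remaining position" idea from step 5 of Algorithm 1 can be illustrated with a minimal sketch. It assumes that values are 64-bit strings and that the inner product is taken over GF(2), so it reduces to XOR-ing selected entries of $D$. SHA-256 with a one-byte tag stands in for the random functions $H_0, H_1, H_2, R$, and the bucket count is rounded up. All function names are hypothetical, and the peeling and Gaussian-elimination steps of the encoder are not shown.

```python
import hashlib
import math
import secrets

LAMBDA = 40  # statistical security parameter from the text


def _h(tag: bytes, key: bytes) -> int:
    """Stand-in random function: SHA-256 with a domain-separation tag."""
    return int.from_bytes(hashlib.sha256(tag + key).digest(), "big")


def layout(n: int):
    """Return (buckets per table, length of the dense part R)."""
    buckets = math.ceil(1.3 * n / 3)           # rounding up is an assumption
    dense = LAMBDA + math.ceil(math.log2(n))
    return buckets, dense


def row_positions(n: int, key: bytes):
    """Indices where the row L(k) || R(k) has a 1."""
    buckets, dense = layout(n)
    # L(k): exactly one position in each of the three blocks.
    sparse = [i * buckets + _h(bytes([i]), key) % buckets for i in range(3)]
    # R(k): one position for every set bit of the dense part.
    r = _h(b"R", key) % (1 << dense)
    dense_pos = [3 * buckets + t for t in range(dense) if (r >> t) & 1]
    return sparse + dense_pos


def decode(n: int, D, key: bytes) -> int:
    """v = <L(k) || R(k), D>, computed as an XOR of the selected entries."""
    v = 0
    for pos in row_positions(n, key):
        v ^= D[pos]
    return v


# Program a single key into an otherwise random vector D by adjusting one
# of its three sparse positions, as in step 5 of Algorithm 1.
n = 8
buckets, dense = layout(n)
D = [secrets.randbits(64) for _ in range(3 * buckets + dense)]
key, target = b"alice", 0xC0FFEE
free_pos = row_positions(n, key)[0]
D[free_pos] ^= decode(n, D, key) ^ target
assert decode(n, D, key) == target
```

Because decoding is linear in $D$, changing a single entry that appears in the row of $k$ is enough to force the decoded value, which is why the encoder can fill the other undefined positions randomly.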
The 3H-GCT is essentially a hypergraph. Graphs have been extensively studied in various fields, such as graph decomposition [30] and research on symbolic networks [31]. In theory, optimizing graphs or hypergraphs could further enhance our protocol.

2.4. Oblivious Programmable PRF

Pseudorandom function (PRF): Given a PRF key $k$ and an input $x$, output the pseudorandom value $PRF_k(x)$.
Oblivious programmable pseudorandom function (OPPRF): The sender $\mathcal{S}$ inputs $P = \{(k_j, v_j) : 0 \le j < n\}$, and the receiver $\mathcal{R}$ inputs the queries $(q_0, q_1, \ldots, q_{t-1})$. Run $(k, hint) \leftarrow keyGen(P)$, give $(k, hint)$ to $\mathcal{S}$, and give $(hint, PRF_k(q_0), PRF_k(q_1), \ldots, PRF_k(q_{t-1}))$ to $\mathcal{R}$.

3. Security Definitions

Ideal Functionality of MPSI

Definition 1 (MPSI ideal functionality $\mathcal{F}_{MPSI}$). For $i \in [0, m)$, each party $P_i$ holds its own set $X_i = \{x_{i,j} : j \in [0, n_i)\}$ as the input of $\mathcal{F}_{MPSI}$, where $n_i = |X_i|$. Then, $\mathcal{F}_{MPSI}$ returns the output $\bigcap_{i=0}^{m-1} X_i$ to $P_{m-1}$ without leaking extra information.
MPSI is based on the malicious security model. In the malicious security model, corrupted parties can deviate from the protocol and try to reveal other parties' private inputs.
Let $Real_{\Pi}(input(\cdot), output(\cdot))$ be the view of the adversaries in the execution of the real protocol $\Pi$ with input $input(\cdot)$ and the real output $output(\cdot)$ of the corrupted parties $P_i \in C$, where $C$ is the set of corrupted parties. Let $Ideal_{\mathcal{F}}(input(\cdot), output(\cdot))$ be the view of the adversaries in the execution of the ideal protocol with input $input(\cdot)$, where the ideal functionality $\mathcal{F}$ is controlled by a simulator $Sim$.
Definition 2 (Malicious security model). Given the MPSI protocol $\Pi_{MPSI}$ and the corresponding ideal functionality $\mathcal{F}_{MPSI}$, suppose there exists a probabilistic polynomial time (PPT) simulator $Sim$ that uses the random input $R_i$ of any party and the output of $\mathcal{F}_{MPSI}$ to generate a simulated view that is computationally indistinguishable from the view of an arbitrary PPT adversary in the real world, that is,
$$Ideal_{\mathcal{F}_{MPSI}}\big(input(R_i), output(\mathcal{F}_{MPSI}(input(R_i)))\big) \stackrel{c}{\approx} Real_{\Pi_{MPSI}}\big(input(R_i), output(\Pi_{MPSI}(input(R_i), input(X_C)))\big),$$
where $input(X_C)$ is the set of inputs of the parties in $C$. Then the protocol $\Pi_{MPSI}$ securely realizes $\mathcal{F}_{MPSI}$ in the malicious security model.

4. Poly-DH MPSI Protocol

We were inspired by the works of Rosulek et al. [10] and Kolesnikov et al. [11] to describe an MPSI protocol based on key agreement and zero sharing, and we implemented the OPPRF functionality using KA and zero sharing. We coined the term “Poly-DH MPSI protocol” for this specific protocol, and we offer an encompassing framework diagram in Figure 2.
The protocol employs a star-shaped topology, with party $P_{m-1}$ designated as the central node. $P_{m-1}$ and the parties $P_i$ $(0 \le i \le m-2)$ execute the OPPRF protocol. $P_{m-1}$ collects the OPPRF results from the other parties and, ultimately, outputs the intersection. Our OPPRF protocol is based on a one-round key agreement construction. The process whereby $P_{m-1}$ sends messages to $P_i$ $(0 \le i \le m-2)$ is referred to as PSI Request, and the process whereby $P_i$ $(0 \le i \le m-2)$ sends messages to $P_{m-1}$ is called PSI Response. Before the request and response steps, there are the zero-sharing and preprocessing stages. We refer to the preprocessing stage as PSI Preparation.

4.1. Initialization

Determine the $m$ parties $P_0, P_1, \ldots, P_{m-1}$ and their input sets $X_0, X_1, \ldots, X_{m-1}$ for the PSI protocol. For the unconditional zero sharing, agree on a specific implementation of the PRF. Agree on the random oracle $H : \{0,1\}^* \to \mathbb{F}$, the ideal permutation $\Pi / \Pi^{-1} : \mathbb{F} \to \mathbb{F}$, and the specific KA implementation.
The ideal-permutation model is a weaker variant of the ideal-cipher model in which the cipher is fixed to a single key. An ideal permutation acts like a random oracle, but it is invertible.

4.2. Zero Sharing

We use an example to explain why we chose the zero-sharing technique to extend the two-party protocol of [10] to m parties.
Example 1 (Why did we choose zero sharing?). We write $[[x]]$ for the "ciphertext" obtained by a series of DHKA transformations of element $x$. Assume Alice has the set $X_A = \{a, d, e, g\}$, Bob has the set $X_B = \{b, d, f, g\}$, and Carol has the set $X_C = \{c, e, f, g\}$. The set relationship is shown in Figure 3.
Through the two-party PSI protocol, we can obtain the intersection $X_A \cap X_B = \{d, g\}$, because both parties derive the same ciphertexts $[[d]]$ and $[[g]]$. Likewise, we can obtain $X_A \cap X_C = \{e, g\}$ from matching $[[e]]$ and $[[g]]$, and $X_B \cap X_C = \{f, g\}$ from matching $[[f]]$ and $[[g]]$.
Although the intersection $X_A \cap X_B \cap X_C$ can be obtained through $(X_A \cap X_C) \cap (X_B \cap X_C)$, this leaks information: a multiparty PSI protocol should expose only $g$ to Alice, Bob, and Carol, whereas running the two-party PSI protocol twice in this way also exposes $e$ to Alice and Carol and $f$ to Bob and Carol.
If we apply zero sharing to each element, for example $[[\alpha_a]] + [[\beta_a]] + [[\gamma_a]] = 0$ for element $a$, then the "ciphertext" of each element $x$ of Alice becomes $[[\alpha_x]] - [[x]]$, that of Bob becomes $[[\beta_x]] - [[x]]$, and that of Carol becomes $[[\gamma_x]] - [[x]]$.
Therefore, $([[\alpha_g]] - [[g]]) + ([[\beta_g]] - [[g]]) + ([[\gamma_g]] - [[g]]) = -3[[g]]$. Because each of Alice, Bob, and Carol holds the element $g$, each can compute $3[[g]]$ and check whether this equation holds in order to identify $g$ as an intersection element. The elements $e$ and $f$ are not leaked to Carol, because Carol can only obtain $[[\alpha_e]] = ([[\alpha_e]] - [[e]]) + [[e]] \ne 0$ and $[[\beta_f]] = ([[\beta_f]] - [[f]]) + [[f]] \ne 0$ and cannot determine whether they are intersecting elements. If Carol is the central node, there is an equivalent check: $([[\alpha_g]] - [[g]]) + [[g]] + ([[\beta_g]] - [[g]]) + [[g]] + [[\gamma_g]] = 0$.
The parties $P_0, P_1, \ldots, P_{m-1}$ in the zero-sharing protocol are connected in a fully connected topology.
1. Party $P_i$ randomly generates a set of PRF keys $\{k_{ij} : i < j < m\}$ and sends the PRF key $k_{ij}$ to party $P_j$.
2. Party $P_j$ receives the PRF keys $k_{ij}$ from each $P_i$ $(0 \le i < j)$ and obtains the set of PRF keys $\{k_{ij} : 0 \le i < j\}$.
3. Party $P_i$ obtains the zero-sharing function handle $S_i(\cdot)$ through Equation (1).
$$S_i(\cdot) = \Big(\sum_{j=0}^{i-1} PRF_{k_{ji}}(\cdot)\Big) - \Big(\sum_{j=i+1}^{m-1} PRF_{k_{ij}}(\cdot)\Big). \qquad (1)$$
Here, $PRF_k(\cdot)$ denotes a pseudorandom function. Each key $k_{ij}$ enters $S_j$ with a positive sign and $S_i$ with a negative sign, so the shares of any element sum to zero.
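The pairwise-key construction of Equation (1) can be checked with a short sketch. It assumes an additive sharing over a toy prime field and uses HMAC-SHA256 as a stand-in PRF. The names `prf`, `setup_keys`, and `share` are hypothetical, and all keys sit in a single dictionary purely for brevity, whereas in the protocol each party only holds the keys it generated or received.

```python
import hashlib
import hmac
import secrets

P = 2**61 - 1  # toy prime field for the shares (assumption)


def prf(key: bytes, x: bytes) -> int:
    """Stand-in PRF: HMAC-SHA256 reduced into the field."""
    return int.from_bytes(hmac.new(key, x, hashlib.sha256).digest(), "big") % P


def setup_keys(m: int):
    """keys[(i, j)] for i < j: generated by P_i and sent to P_j."""
    return {(i, j): secrets.token_bytes(16)
            for i in range(m) for j in range(i + 1, m)}


def share(i: int, m: int, keys, x: bytes) -> int:
    """S_i(x): keys received from lower-indexed parties are added,
    keys sent to higher-indexed parties are subtracted."""
    plus = sum(prf(keys[(j, i)], x) for j in range(i))
    minus = sum(prf(keys[(i, j)], x) for j in range(i + 1, m))
    return (plus - minus) % P


m = 4
keys = setup_keys(m)
shares = [share(i, m, keys, b"g") for i in range(m)]
assert sum(shares) % P == 0   # every PRF term appears once with + and once with -
```

Since the keys do not depend on the elements, the same key material yields shares for any element, which is consistent with reusing one zero-sharing setup across several PSI runs.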

4.3. PSI Preparation

This stage is the preprocessing stage of PSI and can be performed in advance during the offline phase, such as generating the required DH public–private key pairs ( a , a G ) . Our zero-sharing protocol and PSI protocol are two independent protocols in real application scenarios, and multiple PSI tasks can reuse the same zero-sharing task, which is an advantage over the work of Wei et al. [28].
Execution for $P_i$ $(0 \le i < m-1)$:
1. Party $P_i$ randomly generates a secret key $a_i$ and obtains the public key $y_i$ through Equation (2). Then, $P_i$ sends $y_i$ to party $P_{m-1}$.
$$y_i = \mathsf{KA.msg}(a_i). \qquad (2)$$
Execution for $P_{m-1}$:
1. Party $P_{m-1}$ randomly generates a set of secret keys $\{a_{m-1,j} : 0 \le j < n_{m-1}\}$ in $\mathsf{KA}.\mathcal{R}^{e}$ and obtains the corresponding set of public keys $\{y_{m-1,j} : 0 \le j < n_{m-1}\}$. Here, $n_{m-1}$ is the size of $P_{m-1}$'s set $X_{m-1}$.
2. Party $P_{m-1}$ receives the set of public keys $\{y_i : 0 \le i < m-1\}$ from the parties $P_i$ $(0 \le i < m-1)$ and obtains the set of KA keys $\{key_{i,j} : 0 \le i < m-1, 0 \le j < n_{m-1}\}$ through Equation (3).
$$key_{i,j} = \mathsf{KA.key}(a_{m-1,j}, y_i). \qquad (3)$$

4.4. PSI Request Flow

This subsection mainly describes the relevant steps for P m 1 to send messages to the other parties.
Interpolation for $P_{m-1}$:
1. Party $P_{m-1}$ inputs the set $X_{m-1} = \{x_{m-1,j} : 0 \le j < n_{m-1}\}$ and obtains the interpolation polynomial $Pol_{m-1}$ through Equation (4). Then, $P_{m-1}$ sends $Pol_{m-1}$ to each party $P_i$ $(0 \le i < m-1)$. Here, $\mathrm{interpol}(\{(k, v)\})$ denotes polynomial interpolation over the field $\mathbb{F}$ of a set of key–value pairs $(k, v)$, $H$ is the random oracle, and $\Pi^{-1}$ is the inverse of the ideal permutation.
$$Pol_{m-1} = \mathrm{interpol}\big(\{(H(x_{m-1,j}), \Pi^{-1}(y_{m-1,j})) : 0 \le j < n_{m-1}\}\big). \qquad (4)$$
Evaluation for $P_i$ $(0 \le i < m-1)$:
1. Party $P_i$ receives the interpolation polynomial $Pol_{m-1}$ from party $P_{m-1}$ and obtains a set of interpolated values $\{y^{i}_{m-1,j} : 0 \le j < n_i\}$ through Equation (5). Here, $v = Pol(k)$ denotes evaluating the polynomial at $k$ to obtain $v$, $\Pi$ is the ideal permutation, and $n_i$ is the size of the set $X_i$ of party $P_i$.
2. Party $P_i$ obtains a set of KA keys $\{key^{i}_{m-1,j} : 0 \le j < n_i\}$ through Equation (6).
3. Party $P_i$ obtains a set $Z_i = \{z_{i,j} : 0 \le j < n_i\}$ through Equation (7). Here, $S_i(x)$ denotes the zero-sharing share of element $x$.
$$y^{i}_{m-1,j} = \Pi\big(Pol_{m-1}(H(x_{i,j}))\big). \qquad (5)$$
$$key^{i}_{m-1,j} = \mathsf{KA.key}^{e}(a_i, y^{i}_{m-1,j}). \qquad (6)$$
$$z_{i,j} = S_i(x_{i,j}) - key^{i}_{m-1,j}. \qquad (7)$$

4.5. PSI Response Flow

This subsection describes the steps by which $P_i$ $(0 \le i < m-1)$ sends messages to $P_{m-1}$.
Interpolation for $P_i$ $(0 \le i < m-1)$:
1. Party $P_i$ obtains the interpolation polynomial $Pol_i$ through Equation (8). Then, $P_i$ sends $Pol_i$ to party $P_{m-1}$.
$$Pol_i = \mathrm{interpol}\big(\{(H(x_{i,j}), z_{i,j}) : 0 \le j < n_i\}\big). \qquad (8)$$
Evaluation for $P_{m-1}$:
1. Party $P_{m-1}$ receives the interpolation polynomial $Pol_i$ from each party $P_i$ $(0 \le i < m-1)$ and obtains a set of interpolated values $\{z^{m-1}_{i,j} : 0 \le j < n_{m-1}\}$ through Equation (9).
2. Party $P_{m-1}$ obtains a set $\{t_j : 0 \le j < n_{m-1}\}$ through Equation (10).
3. Party $P_{m-1}$ outputs the set intersection $\{x_{m-1,j} : t_j = 0 \text{ and } 0 \le j < n_{m-1}\}$.
$$z^{m-1}_{i,j} = Pol_i(H(x_{m-1,j})). \qquad (9)$$
$$t_j = S_{m-1}(x_{m-1,j}) + \sum_{i=0}^{m-2} \big(z^{m-1}_{i,j} + key_{i,j}\big). \qquad (10)$$
We summarize the Poly-DH MPSI protocol in Figure 4.
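To make the message flow of Sections 4.2 to 4.5 concrete, the following is a minimal single-process sketch of the semi-honest skeleton of the protocol. It rests on several loudly stated assumptions: a toy multiplicative group modulo $2^{61}-1$ replaces the elliptic curve and doubles as the field $\mathbb{F}$, the ideal permutation $\Pi$ and the Elligator encoding are omitted, HMAC-SHA256 stands in for the PRF, and all parties run inside one function. Every name is hypothetical, and nothing here provides the malicious security analyzed in the paper.

```python
import hashlib
import hmac
import secrets

P = 2**61 - 1   # toy prime: DH group modulus and field F (assumption, NOT secure)
G = 3           # hypothetical base element


def H(x: str) -> int:                       # stand-in for the random oracle H
    return int.from_bytes(hashlib.sha256(x.encode()).digest(), "big") % P


def prf(k: bytes, x: str) -> int:           # stand-in PRF
    return int.from_bytes(hmac.new(k, x.encode(), hashlib.sha256).digest(), "big") % P


def interpolate(points):
    """Coefficients (low to high degree) of the polynomial through the points."""
    coeffs = [0] * len(points)
    for i, (xi, yi) in enumerate(points):
        basis, denom = [1], 1
        for j, (xj, _) in enumerate(points):
            if j == i:
                continue
            nxt = [0] * (len(basis) + 1)
            for d, c in enumerate(basis):   # multiply the basis by (X - xj)
                nxt[d] = (nxt[d] - c * xj) % P
                nxt[d + 1] = (nxt[d + 1] + c) % P
            basis, denom = nxt, denom * (xi - xj) % P
        scale = yi * pow(denom, -1, P) % P
        for d, c in enumerate(basis):
            coeffs[d] = (coeffs[d] + scale * c) % P
    return coeffs


def evaluate(coeffs, x):
    acc = 0
    for c in reversed(coeffs):              # Horner's rule
        acc = (acc * x + c) % P
    return acc


def share(i, m, keys, x):                   # Equation (1), additive form
    plus = sum(prf(keys[(j, i)], x) for j in range(i))
    minus = sum(prf(keys[(i, j)], x) for j in range(i + 1, m))
    return (plus - minus) % P


def poly_dh_mpsi(sets):
    """sets[-1] belongs to the central party P_{m-1}; elements must be distinct."""
    m, center = len(sets), sets[-1]
    keys = {(i, j): secrets.token_bytes(16)
            for i in range(m) for j in range(i + 1, m)}
    # PSI preparation: one key pair per leaf, one per element of the center.
    a = [secrets.randbelow(P - 2) + 1 for _ in range(m - 1)]
    y = [pow(G, ai, P) for ai in a]                                  # Eq. (2)
    a_c = [secrets.randbelow(P - 2) + 1 for _ in center]
    y_c = [pow(G, s, P) for s in a_c]
    # PSI request: the center maps H(x) to a fresh public key, Eq. (4) without Pi.
    pol_c = interpolate([(H(x), yc) for x, yc in zip(center, y_c)])
    t = [share(m - 1, m, keys, x) for x in center]
    for i in range(m - 1):
        # Leaf P_i: evaluate, derive KA keys, mask the zero shares, Eqs. (5)-(8).
        z = [(share(i, m, keys, x) - pow(evaluate(pol_c, H(x)), a[i], P)) % P
             for x in sets[i]]
        pol_i = interpolate([(H(x), zi) for x, zi in zip(sets[i], z)])
        # Center: accumulate z + key_{i,j}, Eqs. (3), (9), and (10).
        for j, x in enumerate(center):
            key_ij = pow(y[i], a_c[j], P)
            t[j] = (t[j] + evaluate(pol_i, H(x)) + key_ij) % P
    return {x for x, tj in zip(center, t) if tj == 0}


alice = ["a", "d", "e", "g"]
bob = ["b", "d", "f", "g"]
carol = ["c", "e", "f", "g"]
print(poly_dh_mpsi([alice, bob, carol]))
```

With the sets of Example 1 and Carol as the central party, the run reports only `g`. Elements such as `e` or `f`, which are shared by just two parties, yield a value $t_j$ that is nonzero except with probability of roughly $2^{-61}$ in this toy field.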

5. Cuckoo-DH MPSI Protocol

In Section 4, we implemented a small-set MPSI protocol using polynomial interpolation. In the OKVS structure, polynomial interpolation incurs the lowest communication overhead but has relatively high computational complexity. If we allow a slight increase in the communication overhead to reduce the computational costs, an alternative protocol construction approach is possible. This involves replacing the previous polynomial interpolation component with the structure of the 3H-GCT. The communication overhead of the 3H-GCT is roughly 1.3 times that of polynomial interpolation, but the computational complexity significantly decreases. Therefore, we propose a small-set MPSI protocol based on the 3H-GCT. We coined the term “Cuckoo-DH MPSI protocol” for this specific protocol, and we offer an encompassing framework diagram in Figure 5.
Compared to Poly-DH MPSI, Cuckoo-DH MPSI incurs a slight increase in communication overhead. Therefore, it is suitable for scenarios where the bandwidth is not extremely limited.

5.1. Initialization

Determine the $m$ parties $P_0, P_1, \ldots, P_{m-1}$ and their input sets $X_0, X_1, \ldots, X_{m-1}$ for the PSI protocol. For the unconditional zero sharing, agree on a specific implementation of the PRF. Agree on the random oracle $H : \{0,1\}^* \to \mathbb{F}$; the random functions $H_0, H_1, H_2 : \{0,1\}^* \to \{0, 1, \ldots, \frac{1.3n}{3} - 1\}$ and $R : \{0,1\}^* \to \{0,1\}^{\lambda + \log n}$; the ideal permutation $\Pi / \Pi^{-1} : \mathbb{F} \to \mathbb{F}$; and the specific KA implementation.

5.2. Zero Sharing and PSI Preparation

These two steps are consistent with the zero-sharing step and PSI preparation step of the Poly-DH MPSI protocol, and there is no need to modify any relevant parameters.

5.3. PSI Request Flow with 3H-GCT

We abstracted the interpolation and evaluation as encode and decode operations. Therefore, the improved steps are as follows.
Encode for $P_{m-1}$:
1. Party $P_{m-1}$ inputs the set $X_{m-1} = \{x_{m-1,j} : 0 \le j < n_{m-1}\}$ and obtains the coefficient vector $D_{m-1}$ through Equation (11). Then, $P_{m-1}$ sends $D_{m-1}$ to each party $P_i$ $(0 \le i < m-1)$. Here, $E(\{(k, v)\})$ denotes the OKVS encoding of a set of key–value pairs.
$$D_{m-1} = E\big(\{(H(x_{m-1,j}), \Pi^{-1}(y_{m-1,j})) : 0 \le j < n_{m-1}\}\big). \qquad (11)$$
Decode for $P_i$ $(0 \le i < m-1)$:
1. Party $P_i$ receives the coefficient vector $D_{m-1}$ from party $P_{m-1}$ and obtains a set of decoded values $\{y^{i}_{m-1,j} : 0 \le j < n_i\}$ through Equation (12). Here, $v = D(k)$ denotes decoding $k$ to obtain $v$.
2. Party $P_i$ obtains a set of KA keys $\{key^{i}_{m-1,j} : 0 \le j < n_i\}$ through Equation (6).
3. Party $P_i$ obtains a set $Z_i = \{z_{i,j} : 0 \le j < n_i\}$ through Equation (7).
$$y^{i}_{m-1,j} = \Pi\big(D_{m-1}(H(x_{i,j}))\big). \qquad (12)$$

5.4. PSI Response Flow with 3H-GCT

Similarly, in this step, the polynomial is replaced with the 3H-GCT.
Encode for $P_i$ $(0 \le i < m-1)$:
1. Party $P_i$ obtains the coefficient vector $D_i$ through Equation (13). Then, $P_i$ sends $D_i$ to party $P_{m-1}$.
$$D_i = E\big(\{(H(x_{i,j}), z_{i,j}) : 0 \le j < n_i\}\big). \qquad (13)$$
Decode for $P_{m-1}$:
1. Party $P_{m-1}$ receives the coefficient vector $D_i$ from each party $P_i$ $(0 \le i < m-1)$ and obtains a set of decoded values $\{z^{m-1}_{i,j} : 0 \le j < n_{m-1}\}$ through Equation (14).
2. Party $P_{m-1}$ obtains a set $\{t_j : 0 \le j < n_{m-1}\}$ through Equation (10).
3. Party $P_{m-1}$ outputs the set intersection $\{x_{m-1,j} : t_j = 0 \text{ and } 0 \le j < n_{m-1}\}$.
$$z^{m-1}_{i,j} = D_i(H(x_{m-1,j})). \qquad (14)$$
We summarize the Cuckoo-DH MPSI protocol in Figure 6.

6. Security Analysis

6.1. Correctness

Theorem 1.
The two protocols, Poly-DH MPSI and Cuckoo-DH MPSI, are correct with overwhelming probability.
Proof.
For each $x_{i,q} = x_{m-1,j} \in X_i \cap X_{m-1}$, $P_i$ can compute $Pol_{m-1}(x_{i,q})$ or $D_{m-1}(x_{i,q})$ and obtain $y^{i}_{m-1,q} = y_{m-1,j}$. Consequently, $key^{i}_{m-1,q} = a_i y^{i}_{m-1,q} = a_i y_{m-1,j}$ and $z_{i,q} = S_i(x_{i,q}) - key^{i}_{m-1,q} = S_i(x_{m-1,j}) - a_i y_{m-1,j}$. $P_{m-1}$ can compute $Pol_i(x_{m-1,j})$ or $D_i(x_{m-1,j})$ and obtain $z^{m-1}_{i,j} = z_{i,q} = S_i(x_{m-1,j}) - a_i y_{m-1,j}$. $P_{m-1}$ can also compute $key_{i,j} = a_{m-1,j} y_i = a_i y_{m-1,j}$; therefore, $z^{m-1}_{i,j} + key_{i,j} = S_i(x_{m-1,j})$.
For each $x_{i,q} \notin X_{m-1}$, $P_i$ can compute $Pol_{m-1}(x_{i,q})$ or $D_{m-1}(x_{i,q})$ and obtain $y^{i}_{m-1,q} = r$. Consequently, $key^{i}_{m-1,q} = a_i y^{i}_{m-1,q} = a_i r$ and $z_{i,q} = S_i(x_{i,q}) - key^{i}_{m-1,q} = S_i(x_{i,q}) - a_i r$. In this context, $r$ is indistinguishable from a random value.
For each $x_{m-1,j} \notin X_i$, $P_{m-1}$ can compute $Pol_i(x_{m-1,j})$ or $D_i(x_{m-1,j})$ and obtain $z^{m-1}_{i,j} = r'$. $P_{m-1}$ can also compute $key_{i,j} = a_{m-1,j} y_i = a_i y_{m-1,j}$; therefore, $z^{m-1}_{i,j} + key_{i,j} = r' + a_{m-1,j} y_i = r''$. In this context, $r'$ and $r''$ are indistinguishable from random values.
If $x_{m-1,j} \in \bigcap_{i=0}^{m-1} X_i$, then $\big(\sum_{i=0}^{m-2} (z^{m-1}_{i,j} + key_{i,j})\big) + S_{m-1}(x_{m-1,j}) = \big(\sum_{i=0}^{m-2} S_i(x_{m-1,j})\big) + S_{m-1}(x_{m-1,j}) = \sum_{i=0}^{m-1} S_i(x_{m-1,j}) = 0$. If $x_{m-1,j} \notin \bigcap_{i=0}^{m-1} X_i$, at least one random value is added, so the final sum is itself a random value and is nonzero except with negligible probability.
Therefore, Poly-DH MPSI and Cuckoo-DH MPSI are correct with overwhelming probability. □

6.2. Malicious Secure MPSI

Theorem 2.
Poly-DH MPSI and Cuckoo-DH MPSI are secure against collusion attacks by up to $m-2$ parties in the malicious model if KA is an Elligator-DHKA, $H$ is a secure hash function, and $\Pi, \Pi^{-1}$ are a pair of ideal permutations.
Proof. 
We notate the set of corrupted and colluding parties as C. We considered two collusion attacks, P m 1 C and P m 1 C , to perform the simulation experiment between the ideal world and the real world.
With P m 1 C in Poly-DH MPSI, we used a series of hybrid experiments to prove that the real-world protocol execution was indistinguishable from the ideal-world simulation.
H y b r i d 0 . The experiment comprises a realistic protocol execution.
H y b r i d 1 . In the zero-sharing step, S i m plays the role of the honest parties P i C to send k i j to P j C and generates S i ( · ) with k i j from P j C . In the other part of the protocol, S i m executes as H y b r i d 0 . Obviously, H y b r i d 1 is computationally indistinguishable from H y b r i d 0 since the zero sharing is unconditionally secure against up to m 2 collusion attacks.
H y b r i d 2 . In the PSI preparation step, S i m plays the role of the honest party P i C , chooses a i KA . R to compute y i = KA . msg e ( a i ) , and sends y i to the adversary P m 1 C . In the other part of the protocol, S i m executes as H y b r i d 1 . Obviously, H y b r i d 2 is computationally indistinguishable from H y b r i d 1 since y i is uniformly randomly chosen for a i KA . R .
H y b r i d 3 . In the PSI request flow, S i m runs the random oracle H, records every query H ( x , k ) made by C, and stores the input–output tuple ( x , k , H ( x , k ) ) in the list L 1 . S i m also runs the ideal permutation Π 1 , records Π 1 ( KA . msg e ( b ) ) , and stores KA . msg e ( b ) in the list L 2 . Then, the adversary P m 1 in C returns the polynomial p o l y m 1 to S i m . In the other part of the protocol, S i m executes as H y b r i d 2 . Obviously, H y b r i d 3 is computationally indistinguishable from H y b r i d 2 since H is a cryptographically secure hash modeled as a random oracle, and Π , Π 1 are a pair of ideal permutations.
H y b r i d 4 . The experiment is a complete ideal-world experiment similar to H y b r i d 3 except for the PSI response flow. In the PSI response flow, for the honest party P i C , S i m executes polynomial P o l P i C as the protocol. Using the records in L 1 and L 2 , S i m can compute the intersection of the adversaries’ sets
S = { x | x L 1 a n d Π ( H ( x ) ) L 2 } .
S i m uses S as the inputs of parties in C and sends them and X i simulated by S i m as the honest parties P i C to the MPSI ideal functionality. S i m obtains the intersection I = P i C X i S from the MPSI ideal functionality and computes for each honest party P i
$$z_{i,j} = S_i(x_{i,j}) \oplus \mathrm{KA.key}_e\big(a_i,\ \Pi(\mathit{Pol}_{m-1}(H(x_{i,j})))\big)$$
where $x_{i,j} \in X_i$ and $x_{i,j} \in I$. Thus, $\mathit{Sim}$ can construct the polynomial $\mathit{Pol}_i$ and send it to $P_{m-1}$. $\mathit{Hybrid}_4$ is computationally indistinguishable from $\mathit{Hybrid}_3$ since $H$ is a cryptographically secure hash modeled as a random oracle, and the KA is an Elligator-DHKA.
Therefore, $\mathit{Hybrid}_4$ is computationally indistinguishable from $\mathit{Hybrid}_0$; that is, the real-world protocol execution is computationally indistinguishable from the ideal-world simulation. In this way, we prove that Poly-DH MPSI is secure against up to $m-2$ collusion attacks in a malicious model if KA is an Elligator-DHKA, $H$ is a secure hash function, and $\Pi$, $\Pi^{-1}$ are a pair of ideal permutations when $P_{m-1} \in C$.
With $P_{m-1} \notin C$ in Poly-DH MPSI, we also used a series of hybrid experiments to prove that the real-world protocol execution is indistinguishable from the ideal-world simulation.
$\mathit{Hybrid}_{1,0}$. The experiment is the real protocol execution.
$\mathit{Hybrid}_{1,1}$. In the zero-sharing step, $\mathit{Sim}$ plays the role of the honest party $P_i \notin C$ to send $k_{ij}$ to $P_j \in C$ and generates $S_i(\cdot)$ with $k_{ij}$ from $P_j \in C$. In the other parts of the protocol, $\mathit{Sim}$ executes as in $\mathit{Hybrid}_{1,0}$. $\mathit{Hybrid}_{1,1}$ is computationally indistinguishable from $\mathit{Hybrid}_{1,0}$ since the zero sharing is unconditionally secure against up to $m-2$ collusion attacks.
$\mathit{Hybrid}_{1,2}$. In the PSI preparation step, $\mathit{Sim}$ plays the role of the honest party $P_i \notin C$, chooses $a_i \leftarrow \mathrm{KA.R}$ to compute $y_i = \mathrm{KA.msg}_e(a_i)$, and, acting as the honest party $P_{m-1}$, records $y_i$. In the other parts of the protocol, $\mathit{Sim}$ executes as in $\mathit{Hybrid}_{1,1}$. $\mathit{Hybrid}_{1,2}$ is computationally indistinguishable from $\mathit{Hybrid}_{1,1}$ since $y_i$ is uniformly random for $a_i \leftarrow \mathrm{KA.R}$.
$\mathit{Hybrid}_{1,3}$. In the PSI request flow, $\mathit{Sim}$ runs the random oracle $H$, records every query $H(x, k)$ made by $C$, and stores the input–output tuple $(x, k, H(x, k))$ in the list $L_1$. $\mathit{Sim}$ also runs the ideal permutation $\Pi^{-1}$, records $\Pi^{-1}(\mathrm{KA.msg}_e(b))$, and stores $\mathrm{KA.msg}_e(b)$ in the list $L_2$. Then, acting as the honest party $P_{m-1}$, $\mathit{Sim}$ records the polynomial $\mathit{poly}_{m-1}$ and sends it to $P_i \in C$. In the other parts of the protocol, $\mathit{Sim}$ executes as in $\mathit{Hybrid}_{1,2}$. $\mathit{Hybrid}_{1,3}$ is computationally indistinguishable from $\mathit{Hybrid}_{1,2}$ since $H$ is a cryptographically secure hash modeled as a random oracle, and $\Pi$, $\Pi^{-1}$ are a pair of ideal permutations.
$\mathit{Hybrid}_{1,4}$. The experiment is a complete ideal-world experiment, identical to $\mathit{Hybrid}_{1,3}$ except for the PSI response flow. In the PSI response flow, for the honest party $P_i \notin C$, $\mathit{Sim}$ computes the polynomial $\mathit{Pol}_{P_i \notin C}$ as in the protocol. Using the records in $L_1$ and $L_2$, $\mathit{Sim}$ can compute the intersection of the adversaries' sets
$$S = \{\, x \mid x \in L_1 \ \text{and}\ \Pi(H(x)) \in L_2 \,\}.$$
$\mathit{Sim}$ uses $S$ as the input of the parties in $C$ and sends it, together with the sets $X_i$ that $\mathit{Sim}$ simulates for the honest parties $P_i \notin C$, to the MPSI ideal functionality. $\mathit{Sim}$ obtains the intersection $I = \bigcap_{P_i \notin C} X_i \cap S$ from the MPSI ideal functionality, which is the same as the intersection computed based on the correctness proof in Theorem 1. $\mathit{Hybrid}_{1,4}$ is computationally indistinguishable from $\mathit{Hybrid}_{1,3}$ since $H$ is a cryptographically secure hash modeled as a random oracle, and the KA is an Elligator-DHKA.
Therefore, $\mathit{Hybrid}_{1,4}$ is computationally indistinguishable from $\mathit{Hybrid}_{1,0}$; that is, the real-world protocol execution is computationally indistinguishable from the ideal-world simulation. In this way, we prove that Poly-DH MPSI is secure against up to $m-2$ collusion attacks in a malicious model if KA is an Elligator-DHKA, $H$ is a secure hash function, and $\Pi$, $\Pi^{-1}$ are a pair of ideal permutations when $P_{m-1} \notin C$.
Similar to the security proof of Poly-DH MPSI, we can also use a simulation model based on the security analysis of the 3H-GCT [16] and prove that Cuckoo-DH MPSI is also secure against up to $m-2$ collusion attacks in a malicious model if KA is an Elligator-DHKA, $H$ is a secure hash function, and $\Pi$, $\Pi^{-1}$ are a pair of ideal permutations. □

7. Experimental Results

7.1. Implementation

This section describes how the components of the protocol were instantiated. All components operate on 256-bit values.
Zero Sharing: Each zero-sharing key was a uniformly distributed 256-bit random value generated with Rust's rand library; the PRF was instantiated with the keyed-hash mode of BLAKE3.
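The zero-sharing idea can be illustrated with a minimal Python sketch. It assumes the pairwise-key construction of the unconditional zero sharing in [11]: every pair of parties holds a common key, and each party XORs the PRF outputs under all of its keys. Keyed BLAKE2b from the Python standard library stands in for the BLAKE3 keyed hash, and the function names (`prf`, `setup_keys`, `share`, `xor_all`) are hypothetical, not taken from the authors' Rust code.

```python
import hashlib
import secrets


def prf(key: bytes, x: bytes) -> int:
    """256-bit PRF output; keyed BLAKE2b is a stand-in for keyed BLAKE3."""
    return int.from_bytes(hashlib.blake2b(x, key=key, digest_size=32).digest(), "big")


def setup_keys(m: int) -> dict:
    """Give every unordered pair of parties {i, j} one shared 256-bit key."""
    keys = {}
    for i in range(m):
        for j in range(i + 1, m):
            k = secrets.token_bytes(32)
            keys[(i, j)] = k
            keys[(j, i)] = k
    return keys


def share(i: int, m: int, keys: dict, x: bytes) -> int:
    """Party i's share S_i(x): XOR of the PRF under every key it holds."""
    s = 0
    for j in range(m):
        if j != i:
            s ^= prf(keys[(i, j)], x)
    return s


def xor_all(values) -> int:
    acc = 0
    for v in values:
        acc ^= v
    return acc


if __name__ == "__main__":
    m = 5
    keys = setup_keys(m)
    shares = [share(i, m, keys, b"element") for i in range(m)]
    print(xor_all(shares))
```

Each PRF value appears in exactly two shares, so the XOR over all parties cancels to zero, while any proper subset of the shares still looks random to parties that lack the remaining keys.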
Key Agreement: For one-round KA, we used Curve25519 [32], which is based on the Montgomery curve $By^2 = x^3 + Ax^2 + x$, where $B = 1$, $A = 486662$, and $x, y \in \mathbb{F}_{2^{255}-19}$. We followed the implementation in the work of Bernstein et al. [29]. The $enc(\cdot)$ and $dec(\cdot)$ functions satisfied the following definitions:
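1. $$enc(x) = \begin{cases} \sqrt{-\dfrac{x}{2(x + 486662)}} & \text{if } x \le \dfrac{q-1}{2} \\[2ex] \sqrt{-\dfrac{x + 486662}{2x}} & \text{otherwise} \end{cases}$$
2. $$dec(y) = \epsilon d - 243331(1 - \epsilon),$$ where $\epsilon = (d^3 + 486662 d^2 + d)^{\frac{q-1}{2}}$, $q = 2^{255} - 19$, and $d = -\dfrac{486662}{1 + 2y^2}$.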
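As a sanity check on the decoding map, the following Python sketch evaluates $dec(y)$ and verifies that the result is the $x$-coordinate of a point on the Montgomery curve, i.e., that $x^3 + Ax^2 + x$ is a square modulo $q$. It assumes the Elligator 2 sign conventions of [29], with $d = -A/(1 + 2y^2)$; the helper names are hypothetical, and the code is plain big-integer arithmetic rather than a constant-time implementation.

```python
Q = 2**255 - 19   # field prime of Curve25519
A = 486662        # Montgomery coefficient; A / 2 = 243331


def legendre(a: int) -> int:
    """Quadratic character of a modulo Q: 1 (square), -1 (non-square), or 0."""
    s = pow(a % Q, (Q - 1) // 2, Q)
    return -1 if s == Q - 1 else s


def dec(y: int) -> int:
    """Map a field element y to an x-coordinate on the curve."""
    # 1 + 2*y^2 is never zero mod Q because 2 is a non-square mod Q.
    d = (-A * pow(1 + 2 * y * y, -1, Q)) % Q
    eps = legendre(d**3 + A * d * d + d)
    return (eps * d - (A // 2) * (1 - eps)) % Q


def is_curve_x(x: int) -> bool:
    """True if some y satisfies y^2 = x^3 + A*x^2 + x over F_Q."""
    return legendre(x**3 + A * x * x + x) in (0, 1)


if __name__ == "__main__":
    for y in (1, 2, 3, 12345):
        print(y, is_curve_x(dec(y)))
```

Because the map depends on $y$ only through $y^2$, the inputs $y$ and $q - y$ decode to the same $x$-coordinate, which is why the encoding direction has to fix one of the two square roots.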
Random Oracle: We used the BLAKE3 hash function, which offers better performance than the SHA2 algorithm used by Rosulek et al. [10].
Ideal Permutation: The ideal permutation [10] is a reversible permutation function that was simulated using a fixed-key Rijndael-256. AES was derived from Rijndael, but the block size was fixed at 128 bits, while the ideal permutation here required a 256-bit block size. We followed the definition of Daemen et al. [33] for the implementation.
Polynomial Interpolation: We used Lagrange interpolation over $GF(2^{256})$. Rust currently has no library comparable to the C++ NTL and GMP libraries and lacks finite-field interpolation tools; because of time constraints and the engineering effort involved, we wrote the interpolation routine ourselves.
Garbled Cuckoo Graph: We used the 3H-GCT structure of Garimella et al. [16], in which three hash functions each act on one of three regions. Our implementation was based on the C++ code of the 2H-GCT of Pinkas et al. [8], extended to three hashes.
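A quadratic-time Lagrange interpolation over a binary field can be sketched as follows. To keep the example short and checkable, it works in $GF(2^8)$ with the AES reduction polynomial instead of $GF(2^{256})$; this substitution, and all function names, are assumptions for illustration only and do not reproduce the authors' Rust routine.

```python
IRRED = 0x11B  # x^8 + x^4 + x^3 + x + 1, a stand-in for a GF(2^256) modulus


def gf_mul(a: int, b: int) -> int:
    """Carry-less multiplication modulo IRRED."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= IRRED
        b >>= 1
    return r


def gf_inv(a: int) -> int:
    """Inverse of a non-zero element via a^(2^8 - 2)."""
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result


def poly_eval(coeffs, x: int) -> int:
    """Horner evaluation; coeffs are ordered from low to high degree."""
    acc = 0
    for c in reversed(coeffs):
        acc = gf_mul(acc, x) ^ c
    return acc


def interpolate(points):
    """Coefficients of the unique polynomial of degree < n through n points."""
    n = len(points)
    master = [1]  # product of (X + x_j); in characteristic 2, minus is XOR
    for xj, _ in points:
        nxt = [0] * (len(master) + 1)
        for k, c in enumerate(master):
            nxt[k] ^= gf_mul(c, xj)
            nxt[k + 1] ^= c
        master = nxt
    coeffs = [0] * n
    for xi, yi in points:
        # Synthetic division of master by (X + xi) gives the i-th basis numerator.
        quot, carry = [0] * n, 0
        for k in range(n, 0, -1):
            carry = master[k] ^ gf_mul(carry, xi)
            quot[k - 1] = carry
        scale = gf_mul(yi, gf_inv(poly_eval(quot, xi)))
        for k in range(n):
            coeffs[k] ^= gf_mul(scale, quot[k])
    return coeffs


if __name__ == "__main__":
    pts = [(1, 7), (2, 9), (3, 200), (10, 0)]
    print(interpolate(pts))
```

Each of the $n$ basis polynomials costs $O(n)$ field operations once the master product is known, which gives the $O(n^2)$ total that a hand-written routine typically achieves, in contrast to the $O(n \log^2 n)$ fast algorithms available in NTL.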
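The peeling idea behind a three-hash cuckoo-style key–value table can be sketched in Python as below. This is a simplified illustration under stated assumptions: it keeps only the sparse part of the table (three regions, one hash per region), omits the dense part that the 3H-GCT of [16] uses when peeling gets stuck, simply retries with a new seed on failure, and uses keyed BLAKE2b and 32-bit values for brevity. All names are hypothetical.

```python
import hashlib
import secrets


def positions(key: bytes, seed: int, region: int):
    """Three slots for `key`, one in each of the three table regions."""
    out = []
    for r in range(3):
        digest = hashlib.blake2b(key, digest_size=8, salt=bytes([r]),
                                 key=seed.to_bytes(8, "big")).digest()
        out.append(r * region + int.from_bytes(digest, "big") % region)
    return out


def encode(pairs: dict, seed: int, region: int):
    """Return a table with T[h1]^T[h2]^T[h3] == value, or None if peeling fails."""
    size = 3 * region
    pos = {k: positions(k, seed, region) for k in pairs}
    slot_keys = [set() for _ in range(size)]
    for k, ps in pos.items():
        for p in ps:
            slot_keys[p].add(k)
    stack = []
    queue = [p for p in range(size) if len(slot_keys[p]) == 1]
    while queue:  # peel keys that own a slot no other remaining key touches
        p = queue.pop()
        if len(slot_keys[p]) != 1:
            continue
        (k,) = slot_keys[p]
        stack.append((k, p))
        for q in pos[k]:
            slot_keys[q].discard(k)
            if len(slot_keys[q]) == 1:
                queue.append(q)
    if len(stack) != len(pairs):
        return None  # a cycle (2-core) remains; the real 3H-GCT handles this case
    table = [None] * size
    for k, p in reversed(stack):  # assign in reverse peeling order
        acc = pairs[k]
        for q in pos[k]:
            if q != p:
                if table[q] is None:
                    table[q] = secrets.randbits(32)
                acc ^= table[q]
        table[p] = acc
    return [secrets.randbits(32) if t is None else t for t in table]


def decode(table, key: bytes, seed: int, region: int) -> int:
    a, b, c = positions(key, seed, region)
    return table[a] ^ table[b] ^ table[c]


def build(pairs: dict, region: int, max_seeds: int = 64):
    """Try successive seeds until peeling succeeds."""
    for seed in range(max_seeds):
        table = encode(pairs, seed, region)
        if table is not None:
            return seed, table
    raise RuntimeError("peeling failed for every seed tried")
```

Decoding a key that was never encoded returns the XOR of three unrelated slots, which is the property an OKVS relies on to hide which keys are present.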

7.2. Experiments and Evaluation

We implemented our protocol in Rust and ran it on a 12th Gen Intel Core i7-12700H with 32 GB RAM and Ubuntu 22.04. We conducted experiments on the two protocols proposed in this article and compared them with the work of Kolesnikov et al. [11].
The MultipartyPSI (https://github.com/osu-crypto/MultipartyPSI (accessed on 6 April 2023)) library is also one of the few open-source libraries for multiparty PSI available at present. It provides PSI constructions based on the Poly, Table, and Bloom filter (BF) types, with the Table type utilizing cuckoo hashing for optimization. Our protocol was constructed based on the Poly and 3H-GCT designs. The Poly and 3H-GCT structures we used were advanced versions compared to the Poly and Table structures used in the work of Kolesnikov et al. [11], designed to be suitable for malicious models. However, we did not utilize the advanced version of the BF, i.e., the GBF, in constructing the PSI protocol due to its higher communication overhead.
We set the parameter of computational security κ = 128 and the parameter of statistical security λ = 40 for conducting experiments under both local area network (LAN) and wide area network (WAN) conditions.
Small sets with LAN setting: We considered PSI for small sets, taking the set size $n \in \{2^4, 2^5, \ldots, 2^{10}\}$ for the experiment. In the LAN setting, the main influencing factor was the computational cost. The work of Kolesnikov et al. [11] was based on the OT protocol, which had a certain startup cost, and we found that our protocol constructed based on DHKA had an advantage when the set size was smaller than $2^7$. Ref. [11] presented a semi-honest model protocol, which was more time-consuming to transform into a malicious model protocol; as a reference, the malicious OT-PSI protocol with PaXoS [8] consumed about 1.4 times more time than the semi-honest OT-PSI protocol [3]. Because the Rust ecosystem is currently limited, there is no efficient computational library similar to C++ NTL/GMP; thus, we wrote some code ourselves, without deep optimization, as a temporary replacement. The two-party PSI protocol [10] demonstrated that both DHKA-based protocols had advantages when the set was smaller than $2^{10}$, and thus we believe that our protocol has much room for engineering improvement. To explore the impact of the number of parties on the protocol, we selected $m \in \{5, 10, 20\}$ for our experiments. Since the protocol adopted a star topology, the running time should in theory grow linearly with the number of parties (similar to [11]). The experimental results are shown in Table 1.
Small sets with WAN setting: OT-based PSI protocols perform best without bandwidth limitations, but their relatively high communication costs, as highlighted by Google, have a more substantial impact in real-world deployments compared to computational costs [34]. Therefore, we limited the bandwidth and conducted simulation experiments for the cases of 20 Mbps, 10 Mbps, 5 Mbps, and 1 Mbps. The experimental results show that our DHKA-based protocol took less time than the OT-based protocol when bandwidth was limited. Under the settings of 5 Mbps to 20 Mbps, there was an efficiency improvement of about 3× to 7×, and under the worse network conditions of 1 Mbps, there was an efficiency improvement of more than 10×. Our proposed Cuckoo-DH MPSI protocol was able to balance the computation cost and communication cost and had the best performance in bandwidth-constrained situations. The experimental results are shown in Table 2.
The execution time of PSI is influenced by both computational and communication costs. In the LAN experiments, where messages were transmitted locally and communication time could be neglected, the execution time primarily hinged on the computational complexity of the protocol. As seen in Table 1, the running times of both the approach in reference [11] and our method based on polynomial interpolation increased notably as the number of set elements grew. This was primarily due to the time-consuming nature of polynomial interpolation, particularly over the non-prime field $GF(2^{256})$. The polynomial interpolation method used in reference [11] was derived from the C++ NTL library with a time complexity of $O(n \log^2 n)$, while our implementation using Rust code, currently not fully optimized, operated at a time complexity of $O(n^2)$. However, this does not imply that protocols based on polynomial interpolation lack value, a point that will be explained in subsequent experiments on a WAN. Table 1 demonstrates that in scenarios involving small sets, our DH-based protocol exhibited higher efficiency compared to the OT-based protocol in reference [11]. This discrepancy was attributed to not only the time-consuming nature of polynomial interpolation but also the significant time overhead from public key operations, as shown in Table 3.
The DH-based PSI protocol conducted approximately $m \cdot n_{m-1}$ public key operations, whereas the OT-based PSI protocol performed about $3.5(m+1)\kappa$ public key operations. Thus, when the set size satisfied $n_{m-1} < 3.5\kappa\left(1 + \frac{1}{m}\right)$, the DH-based PSI protocol had fewer public key operations. For example, when $m = 2$ and $\kappa = 128$, $n_{m-1} < 672 < 2^{9.4}$; similarly, when $m \to +\infty$ and $\kappa = 128$, $n_{m-1} < 448 < 2^{8.81}$. On the other hand, reference [11] explored the use of Bloom filters and cuckoo hash tables as alternative methods to polynomial interpolation. Bloom filters trade space for time, whereas cuckoo hash tables strike a balance between the two. However, these methods are suitable for semi-honest models and are not effectively applicable to malicious models. In contrast, our protocol introduced the 3H-GCT, an improved version of the cuckoo hash table suitable for malicious models but at the expense of increased time consumption.
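The crossover point between the two operation counts can be reproduced with a few lines of Python. The sketch uses only the two approximate counts stated above, $m \cdot n_{m-1}$ and $3.5(m+1)\kappa$; the function names are hypothetical.

```python
import math


def dh_public_key_ops(m: int, n: int) -> float:
    """Approximate public key operations of the DH-based protocol."""
    return m * n


def ot_public_key_ops(m: int, kappa: int) -> float:
    """Approximate public key operations of the OT-based protocol."""
    return 3.5 * (m + 1) * kappa


def crossover_set_size(m: int, kappa: int) -> float:
    """Set size below which the DH-based protocol needs fewer operations."""
    return 3.5 * kappa * (1 + 1 / m)


if __name__ == "__main__":
    for m in (2, 5, 10, 20):
        bound = crossover_set_size(m, 128)
        print(m, bound, round(math.log2(bound), 2))
```

As $m$ grows, the bound decreases toward $3.5\kappa$, so the advantage in public key operations holds for set sizes up to a few hundred elements regardless of the number of parties.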
Reference [10] employed polynomial interpolation to construct two-party PSI protocols, and experiments validated the significant advantage of this protocol in bandwidth-constrained scenarios. As is evident from Table 2, both the protocol based on polynomial interpolation in reference [11] and our own polynomial interpolation-based protocol performed well. The reason behind this was that polynomial interpolation incurs the lowest communication overhead. The DH-based PSI protocol also had lower communication overhead compared to the OT-based PSI protocol, resulting in strong performance for both of our proposed protocols in Table 2. Since reference [11] open-sourced its code and implemented various protocols based on polynomial interpolation, Bloom filters, cuckoo graphs, and other designs, it provided a valuable comparison for our two protocols based on polynomial interpolation and garbled cuckoo graphs. We conducted theoretical analyses of communication overhead, as shown in Table 4 and Table 5, which also cover several other protocols. For instance, reference [24] introduced the VOLE-PSI protocol based on silent OT, along with the design of OPPRF, enabling the construction of a multiparty PSI protocol based on OPPRF. Although the reference did not elaborate further on multiparty PSI or provide a code implementation for MPSI, we conducted theoretical analyses for its multiparty scenario. Table 4 and Table 5 reveal that our protocols theoretically exhibited lower communication overhead. In addition to this theoretical analysis, we also compared the protocols experimentally, as illustrated in Figure 7.

8. Discussion

In Section 6, we proved the security of our proposed protocol under a malicious model using unconditional zero sharing, withstanding attacks from $m-2$ colluding parties out of $m$ parties. To defend against attacks from $m-1$ colluding parties, conditional zero sharing is required, as demonstrated in [11]. Due to the lower efficiency of conditional zero sharing, we did not provide a detailed explanation of this type of protocol in the paper. However, conditional zero sharing can also be applied within our protocol, and its principles are similar to those outlined in [11].
In Section 7, we manually implemented interpolation with a time complexity of $O(n^2)$ and limited optimization, which was not as good as the work of Moenck et al. [35], whose method had a time complexity of $O(n \log^2 n)$; we hope to improve upon this in the future. If the structure of 2H-GCT is a graph, then the structure of 3H-GCT is a hypergraph. Solving the loop problem in a hypergraph is more complicated, and we adopted a polling-like approach for edge peeling, which could be optimized further by exploiting deeper knowledge of hypergraphs.
In addition, improving the encoding and decoding efficiency of the OKVS and reducing the space occupied by the coefficient vector after OKVS encoding are both ways to improve the efficiency of the PSI protocol. Our KA was implemented with Curve25519, but KA is not limited to Curve25519. New elliptic-curve constructions, or even new cryptographic structures, may improve the efficiency of the PSI protocol. Hardware acceleration with, for example, GPUs and FPGAs, without introducing new cryptographic theory, is also a focus of future research.
Our protocol can be effectively utilized for feature alignment in federated learning. However, our protocol is not limited to this. For example, in [10], a scenario was discussed wherein two parties aim to schedule a meeting that must occur during a time slot when both parties are available. This requires computing the intersection of available time slots without disclosing any additional schedule information beyond the intersection. We extended this scenario to encompass multiple parties agreeing on a meeting, a common situation, and our protocol could fulfill this requirement. Identifying common friends among multiple users and privacy-preserving intelligent recommendations are potential subsequent applications.

9. Conclusions

This paper introduced two multiparty private set intersection protocols for small sets: Poly-DH MPSI and Cuckoo-DH MPSI. These protocols were constructed using key agreement, zero sharing, and different OKVS structures. In small-set scenarios, both Poly-DH MPSI and Cuckoo-DH MPSI showed higher efficiency than previous approaches, even in LAN settings. Particularly in scenarios with bandwidth constraints, our proposed protocols demonstrated distinct advantages. The protocol based on the garbled cuckoo table incurred lower computational costs but slightly higher communication costs compared to the polynomial-based protocol.

Author Contributions

Conceptualization, J.Z.; methodology, Z.L. (Zhusen Liu) and J.Z.; software, J.Z.; formal analysis, Z.L. (Zhusen Liu) and J.Z.; writing—original draft preparation, J.Z.; writing—review and editing, Z.L. (Zhusen Liu) and L.W.; supervision, L.W. and C.Z.; project administration, Z.L. (Zhe Liu) and L.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Key R&D Program of China (2021YFB3100700); the National Natural Science Foundation of China (62076125, 62032025, U20B2049, U20B2050, U21A20467, 62272228, and U22B2029); the Shenzhen Science and Technology Program (JCYJ20210324134810028, JCYJ20210324134408023); the Key R&D Program of Guangdong Province (2020B0101090002); the Natural Science Foundation of Jiangsu Province (BK20200418); the Shenzhen Virtual University Park Support Scheme (YFJGJS1.0); and the China Postdoctoral Science Foundation (2023M733265).

Data Availability Statement

All data underlying the results are available as part of the article and no additional source data are required.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Keller, M.; Orsini, E.; Scholl, P. MASCOT: Faster Malicious Arithmetic Secure Computation with Oblivious Transfer. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, Vienna, Austria, 24–28 October 2016; pp. 830–842.
2. Angel, S.; Chen, H.; Laine, K.; Setty, S. PIR with Compressed Queries and Amortized Query Processing. In Proceedings of the 2018 IEEE Symposium on Security and Privacy (SP), San Francisco, CA, USA, 20–24 May 2018; pp. 962–979.
3. Kolesnikov, V.; Kumaresan, R.; Rosulek, M.; Trieu, N. Efficient Batched Oblivious PRF with Applications to Private Set Intersection. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, Vienna, Austria, 24–28 October 2016; pp. 818–829.
4. Kulshrestha, A.; Mayer, J. Estimating Incidental Collection in Foreign Intelligence Surveillance: Large-Scale Multiparty Private Set Intersection with Union and Sum. In Proceedings of the 31st USENIX Security Symposium (USENIX Security 22), Boston, MA, USA, 10–12 August 2022; pp. 1705–1722.
5. Uzun, E.; Chung, S.P.; Kolesnikov, V.; Boldyreva, A.; Lee, W. Fuzzy Labeled Private Set Intersection with Applications to Private Real-Time Biometric Search. In Proceedings of the 30th USENIX Security Symposium (USENIX Security 21), USENIX Association, Virtually, 11–13 August 2021; pp. 911–928.
6. Zhou, Q.; Zeng, Z.; Wang, K.; Chen, M.; Zheng, Y. Privacy Protection Scheme for the Internet of Vehicles Based on Collaborative Services. IEEE Internet Things J. 2023, 10, 13342–13353.
7. Wu, Y.; Cai, S.; Xiao, X.; Chen, G.; Ooi, B.C. Privacy preserving vertical federated learning for tree-based models. arXiv 2020, arXiv:2008.06170.
8. Pinkas, B.; Rosulek, M.; Trieu, N.; Yanai, A. PSI from PaXoS: Fast, Malicious Private Set Intersection. In Proceedings of the Advances in Cryptology—EUROCRYPT 2020, Zagreb, Croatia, 10–14 May 2020; Canteaut, A., Ishai, Y., Eds.; Springer International Publishing: Cham, Switzerland, 2020; pp. 739–767.
9. Nevo, O.; Trieu, N.; Yanai, A. Simple, Fast Malicious Multiparty Private Set Intersection. In Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security, Virtual Event, Republic of Korea, 15–19 November 2021; pp. 1151–1165.
10. Rosulek, M.; Trieu, N. Compact and Malicious Private Set Intersection for Small Sets. In Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security, Virtual Event, Republic of Korea, 15–19 November 2021; pp. 1166–1181.
11. Kolesnikov, V.; Matania, N.; Pinkas, B.; Rosulek, M.; Trieu, N. Practical Multi-Party Private Set Intersection from Symmetric-Key Techniques. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, Dallas, TX, USA, 30 October–3 November 2017; pp. 1257–1272.
12. Bay, A.; Erkin, Z.; Hoepman, J.H.; Samardjiska, S.; Vos, J. Practical Multi-Party Private Set Intersection Protocols. IEEE Trans. Inf. Forensics Secur. 2022, 17, 1–15.
13. Liu, B.; Yuan, L.; Lin, X.; Qin, L.; Zhang, W.; Zhou, J. Efficient (α, β)-core computation: An index-based approach. In Proceedings of the World Wide Web Conference, San Francisco, CA, USA, 13–17 May 2019; pp. 1130–1141.
14. Yuan, L.; Qin, L.; Zhang, W.; Chang, L.; Yang, J. Index-based densest clique percolation community search in networks. IEEE Trans. Knowl. Data Eng. 2017, 30, 922–935.
15. Chen, H.; Laine, K.; Rindal, P. Fast Private Set Intersection from Homomorphic Encryption. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, Dallas, TX, USA, 30 October–3 November 2017; pp. 1243–1255.
16. Garimella, G.; Pinkas, B.; Rosulek, M.; Trieu, N.; Yanai, A. Oblivious Key-Value Stores and Amplification for Private Set Intersection. In Proceedings of the Advances in Cryptology—CRYPTO 2021, Virtual Event, 16–20 August 2021; Malkin, T., Peikert, C., Eds.; Springer International Publishing: Cham, Switzerland, 2021; pp. 395–425.
17. Meadows, C. A More Efficient Cryptographic Matchmaking Protocol for Use in the Absence of a Continuously Available Third Party. In Proceedings of the 1986 IEEE Symposium on Security and Privacy, Oakland, CA, USA, 7–9 April 1986; p. 134.
18. De Cristofaro, E.; Kim, J.; Tsudik, G. Linear-Complexity Private Set Intersection Protocols Secure in Malicious Model. In Proceedings of the Advances in Cryptology—ASIACRYPT 2010, Singapore, 5–9 December 2010; Abe, M., Ed.; Springer: Berlin/Heidelberg, Germany, 2010; pp. 213–231.
19. Orrù, M.; Orsini, E.; Scholl, P. Actively Secure 1-out-of-N OT Extension with Application to Private Set Intersection. In Proceedings of the Topics in Cryptology—CT-RSA 2017, San Francisco, CA, USA, 14–17 February 2017; Handschuh, H., Ed.; Springer: Cham, Switzerland, 2017; pp. 381–396.
20. Ishai, Y.; Kilian, J.; Nissim, K.; Petrank, E. Extending Oblivious Transfers Efficiently. In Proceedings of the Advances in Cryptology—CRYPTO 2003, Santa Barbara, CA, USA, 17–21 August 2003; Boneh, D., Ed.; Springer: Berlin/Heidelberg, Germany, 2003; pp. 145–161.
21. Kolesnikov, V.; Kumaresan, R. Improved OT Extension for Transferring Short Secrets. In Proceedings of the Advances in Cryptology—CRYPTO 2013, Santa Barbara, CA, USA, 18–22 August 2013; Canetti, R., Garay, J.A., Eds.; Springer: Berlin/Heidelberg, Germany, 2013; pp. 54–70.
22. Ben-Efraim, A.; Nissenbaum, O.; Omri, E.; Paskin-Cherniavsky, A. PSImple: Practical Multiparty Maliciously-Secure Private Set Intersection. In Proceedings of the 2022 ACM on Asia Conference on Computer and Communications Security, Nagasaki, Japan, 30 May–3 June 2022; pp. 1098–1112.
23. Boyle, E.; Couteau, G.; Gilboa, N.; Ishai, Y.; Kohl, L.; Scholl, P. Efficient Pseudorandom Correlation Generators: Silent OT Extension and More. In Proceedings of the Advances in Cryptology—CRYPTO 2019, Santa Barbara, CA, USA, 18–22 August 2019; Boldyreva, A., Micciancio, D., Eds.; Springer: Cham, Switzerland, 2019; pp. 489–518.
24. Rindal, P.; Schoppmann, P. VOLE-PSI: Fast OPRF and Circuit-PSI from Vector-OLE. In Proceedings of the Advances in Cryptology—EUROCRYPT 2021, Zagreb, Croatia, 17–21 October 2021; Canteaut, A., Standaert, F.X., Eds.; Springer: Cham, Switzerland, 2021; pp. 901–930.
25. Bui, D.; Couteau, G. Improved Private Set Intersection for Sets with Small Entries. In Proceedings of the Public-Key Cryptography—PKC 2023, Atlanta, GA, USA, 7–10 May 2023; Boldyreva, A., Kolesnikov, V., Eds.; Springer: Cham, Switzerland, 2023; pp. 190–220.
26. Branco, P.; Döttling, N.; Pu, S. Multiparty Cardinality Testing for Threshold Private Intersection. In Proceedings of the Public-Key Cryptography—PKC 2021, Virtual Event, 10–13 May 2021; Garay, J.A., Ed.; Springer: Cham, Switzerland, 2021; pp. 32–60.
27. Badrinarayanan, S.; Miao, P.; Raghuraman, S.; Rindal, P. Multi-party Threshold Private Set Intersection with Sublinear Communication. In Proceedings of the Public-Key Cryptography—PKC 2021, Virtual Event, 10–13 May 2021; Garay, J.A., Ed.; Springer: Cham, Switzerland, 2021; pp. 349–379.
28. Wei, L.; Liu, J.; Zhang, L.; Wang, Q.; Zhang, W.; Qian, X. Efficient multi-party private set intersection protocols for large participants and small sets. Comput. Stand. Interfaces 2024, 87, 103764.
29. Bernstein, D.J.; Hamburg, M.; Krasnova, A.; Lange, T. Elligator: Elliptic-Curve Points Indistinguishable from Uniform Random Strings. In Proceedings of the 2013 ACM SIGSAC Conference on Computer & Communications Security, Berlin, Germany, 4–8 November 2013; pp. 967–980.
30. Chen, Z.; Yuan, L.; Han, L.; Qian, Z. Higher-Order Truss Decomposition in Graphs. IEEE Trans. Knowl. Data Eng. 2023, 35, 3966–3978.
31. Chen, Z.; Yuan, L.; Lin, X.; Qin, L.; Zhang, W. Balanced Clique Computation in Signed Networks: Concepts and Algorithms. IEEE Trans. Knowl. Data Eng. 2023, 35, 11079–11092.
32. Bernstein, D.J. Curve25519: New Diffie-Hellman Speed Records. In Proceedings of the Public Key Cryptography—PKC 2006, New York, NY, USA, 24–26 April 2006; Yung, M., Dodis, Y., Kiayias, A., Malkin, T., Eds.; Springer: Berlin/Heidelberg, Germany, 2006; pp. 207–228.
33. Daemen, J.; Rijmen, V. The Design of Rijndael; Springer: Berlin/Heidelberg, Germany, 2002; Volume 2.
34. Ion, M.; Kreuter, B.; Nergiz, A.E.; Patel, S.; Saxena, S.; Seth, K.; Raykova, M.; Shanahan, D.; Yung, M. On Deploying Secure Computing: Private Intersection-Sum-with-Cardinality. In Proceedings of the 2020 IEEE European Symposium on Security and Privacy (EuroS&P), Genoa, Italy, 7–11 September 2020; pp. 370–389.
35. Moenck, R.; Borodin, A. Fast modular transforms via division. In Proceedings of the 13th Annual Symposium on Switching and Automata Theory (Swat 1972); IEEE Computer Society: Washington, DC, USA, 1972; pp. 90–96.
Figure 1. Federated learning and private set intersection.
Figure 2. Poly-DH MPSI framework diagram.
Figure 3. Subset division under three-party PSI.
Figure 4. Poly-DH MPSI protocol.
Figure 5. Cuckoo-DH MPSI framework diagram.
Figure 6. Cuckoo-DH MPSI Protocol.
Figure 7. Running time (ms) of MPSI protocols. Our Poly-MPSI protocol did not use the C++ NTL API, whose interpolation has a time complexity of $O(n \log^2 n)$, but rather an interpolation function written manually in Rust. Due to the large engineering workload, its time complexity was $O(n^2)$, and the experimental results were not the theoretical optimal values, so they are not compared in the figure.
Table 1. Running time (ms) of MPSI protocols (LAN setting) for m parties on sets with size n. “SH” and “M” refer to semi-honest and malicious protocols, respectively.

| m | Protocol | Sec. | $n = 2^4$ | $n = 2^5$ | $n = 2^6$ | $n = 2^7$ | $n = 2^8$ | $n = 2^9$ | $n = 2^{10}$ |
|---|---|---|---|---|---|---|---|---|---|
| 5 | [11] (Poly) | SH | 55 | 62 | 79 | 145 | 421 | 404 | 1310 |
| 5 | [11] (Table) | SH | 57 | 56 | 57 | 61 | 59 | 58 | 60 |
| 5 | [11] (BF) | SH | 50 | 54 | 55 | 60 | 58 | 63 | 76 |
| 5 | Ours (Poly) | M | 23 | 46 | 108 | 203 | 541 | 1860 | 6794 |
| 5 | Ours (3H-GCT) | M | 18 | 21 | 39 | 59 | 132 | 194 | 382 |
| 10 | [11] (Poly) | SH | 88 | 102 | 118 | 202 | 537 | 592 | 1861 |
| 10 | [11] (Table) | SH | 77 | 94 | 90 | 94 | 92 | 99 | 117 |
| 10 | [11] (BF) | SH | 87 | 95 | 85 | 92 | 95 | 104 | 122 |
| 10 | Ours (Poly) | M | 31 | 70 | 153 | 365 | 1137 | 3751 | 13,900 |
| 10 | Ours (3H-GCT) | M | 24 | 41 | 56 | 99 | 199 | 350 | 698 |
| 20 | [11] (Poly) | SH | 194 | 207 | 261 | 375 | 900 | 1007 | 2948 |
| 20 | [11] (Table) | SH | 176 | 174 | 187 | 189 | 186 | 199 | 215 |
| 20 | [11] (BF) | SH | 191 | 198 | 203 | 207 | 205 | 213 | 272 |
| 20 | Ours (Poly) | M | 50 | 117 | 285 | 690 | 2122 | 7376 | 27,549 |
| 20 | Ours (3H-GCT) | M | 38 | 66 | 109 | 194 | 334 | 622 | 1286 |
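Table 2. Running time (ms) of MPSI protocols (WAN setting) for 5 parties on sets with size n. “SH” and “M” refer to semi-honest and malicious protocols, respectively.

| Bandwidth | Protocol | Sec. | $n = 2^4$ | $n = 2^5$ | $n = 2^6$ | $n = 2^7$ | $n = 2^8$ | $n = 2^9$ | $n = 2^{10}$ |
|---|---|---|---|---|---|---|---|---|---|
| 20 Mbps | [11] (Poly) | SH | 76 | 85 | 108 | 189 | 474 | 529 | 1524 |
| 20 Mbps | [11] (Table) | SH | 84 | 99 | 120 | 171 | 235 | 461 | 873 |
| 20 Mbps | [11] (BF) | SH | 106 | 171 | 280 | 507 | 954 | 2101 | 4111 |
| 20 Mbps | Ours (Poly) | M | 24 | 51 | 97 | 229 | 574 | 1905 | 6855 |
| 20 Mbps | Ours (3H-GCT) | M | 19 | 26 | 44 | 89 | 158 | 269 | 537 |
| 10 Mbps | [11] (Poly) | SH | 106 | 134 | 151 | 256 | 554 | 665 | 1811 |
| 10 Mbps | [11] (Table) | SH | 112 | 144 | 195 | 298 | 430 | 886 | 1728 |
| 10 Mbps | [11] (BF) | SH | 169 | 290 | 522 | 997 | 1911 | 4256 | 8176 |
| 10 Mbps | Ours (Poly) | M | 26 | 48 | 99 | 230 | 620 | 1906 | 6961 |
| 10 Mbps | Ours (3H-GCT) | M | 28 | 45 | 75 | 109 | 183 | 326 | 700 |
| 5 Mbps | [11] (Poly) | SH | 172 | 193 | 226 | 351 | 703 | 912 | 2229 |
| 5 Mbps | [11] (Table) | SH | 197 | 255 | 351 | 547 | 806 | 1821 | 3611 |
| 5 Mbps | [11] (BF) | SH | 296 | 544 | 1104 | 2062 | 3931 | 8433 | 16,729 |
| 5 Mbps | Ours (Poly) | M | 42 | 78 | 105 | 240 | 638 | 2015 | 7043 |
| 5 Mbps | Ours (3H-GCT) | M | 49 | 61 | 103 | 193 | 270 | 490 | 1173 |
| 1 Mbps | [11] (Poly) | SH | 1926 | 2011 | 2324 | 2701 | 3424 | 5142 | 8347 |
| 1 Mbps | [11] (Table) | SH | 1712 | 2076 | 2625 | 3969 | 5316 | 10,984 | 19,167 |
| 1 Mbps | [11] (BF) | SH | 2664 | 3848 | 7801 | 12,773 | 21,377 | 44,312 | 84,851 |
| 1 Mbps | Ours (Poly) | M | 109 | 152 | 247 | 448 | 994 | 2633 | 7974 |
| 1 Mbps | Ours (3H-GCT) | M | 217 | 272 | 437 | 672 | 1200 | 2280 | 5194 |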
Table 3. Theoretical computation costs of MPSI protocols. “SH” and “M” refer to semi-honest and malicious protocols, respectively. $\kappa$ is the parameter of computational security. $\lambda$ is the parameter of statistical security. $n_i$ is the set size of party $P_i$. “Fixed-base Mul”, “Variable-base Mul”, and “Add” in the table represent operations on points of an elliptic curve. “Encode” is an indexing operation (including cuckoo hashing) for reference [11], while others refer to OKVS encoding. When $\kappa = 128$, $L = 1023$.

| Protocol | Sec. | $P_i$ ($0 \le i < m-1$): Fixed-base Mul | $P_i$: Variable-base Mul | $P_i$: Add | $P_i$: Encode | $P_{m-1}$: Fixed-base Mul | $P_{m-1}$: Variable-base Mul | $P_{m-1}$: Add | $P_{m-1}$: Encode |
|---|---|---|---|---|---|---|---|---|---|
| [11] (Poly) | SH | $3.5\kappa$ | $3.5\kappa$ | $3.5\kappa$ | $O(n_i \log^2 n_i)$ | $7\kappa$ | $3.5(m-1)\kappa$ | $3.5(m-1)\kappa$ | $O(n_{m-1})$ |
| [11] (Table) | SH | $3.5\kappa$ | $3.5\kappa$ | $3.5\kappa$ | $O(n_{m-1})$ | $7\kappa$ | $3.5(m-1)\kappa$ | $3.5(m-1)\kappa$ | $O(n_{m-1})$ |
| [11] (BF) | SH | $3.5\kappa$ | $3.5\kappa$ | $3.5\kappa$ | $O(n_i)$ | $7\kappa$ | $3.5(m-1)\kappa$ | $3.5(m-1)\kappa$ | $O(n_{m-1})$ |
| COT (2H-GCT) * | M | $L$ | $L$ | $L$ | $O(\lambda n_i)$ | $2L$ | $(m-1)L$ | $(m-1)L$ | $O(\lambda n_{m-1})$ |
| [28] (Poly) | M | $1$ | $n_i$ | $m-1+n_i$ | $O(n_i \log^2 n_i)$ | $n_{m-1}$ | $(m-1)n_{m-1}$ | $m-1+n_{m-1}$ | $O(n_{m-1} \log^2 n_{m-1})$ |
| Ours (Poly) | M | $1$ | $n_i$ | $0$ | $O(n_i \log^2 n_i)$ | $n_{m-1}$ | $(m-1)n_{m-1}$ | $0$ | $O(n_{m-1} \log^2 n_{m-1})$ |
| Ours (3H-GCT) | M | $1$ | $n_i$ | $0$ | $O(\lambda n_i)$ | $n_{m-1}$ | $(m-1)n_{m-1}$ | $0$ | $O(\lambda n_{m-1})$ |
* “COT (2H-GCT)” is the multiparty extension of the two-party PSI protocol proposed in [8], where “2H-GCT” refers to the two-hashing garbled cuckoo table. The silent OT [23] used by VOLE-PSI [24] has a relatively complex structure, and its computation cost is amortized well only when the set is larger than $2^{20}$; VOLE-PSI is therefore not compared here.
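The variable-base multiplication column of Table 3 hints at where the proposed protocols stop being cheaper than [11]: party $P_i$ performs $n_i$ multiplications in the proposed protocols, against $3.5\kappa$ in [11]. The sketch below compares only these two counts for $\kappa = 128$; the function names are hypothetical, and the comparison deliberately ignores the Encode, Add, and fixed-base columns, so it is an illustration rather than a full cost model.

```python
KAPPA = 128  # computational security parameter used in the Table 3 caption


def ref11_variable_base_mul(kappa: int = KAPPA) -> float:
    """Variable-base multiplications of party P_i in [11]: 3.5 * kappa."""
    return 3.5 * kappa


def ours_variable_base_mul(n_i: int) -> int:
    """Variable-base multiplications of party P_i in the proposed protocols: n_i."""
    return n_i


def exponents_where_ours_is_cheaper(exponents=range(4, 11), kappa: int = KAPPA):
    """Return the exponents e for which n_i = 2^e needs fewer multiplications than [11]."""
    return [e for e in exponents
            if ours_variable_base_mul(2 ** e) < ref11_variable_base_mul(kappa)]


if __name__ == "__main__":
    print("3.5 * kappa =", ref11_variable_base_mul())
    print("cheaper for n = 2^e with e in", exponents_where_ours_is_cheaper())
```

Under this simplified count the break-even point is $n_i = 3.5\kappa = 448$, that is, between $2^{8}$ and $2^{9}$. The same threshold applies to party $P_{m-1}$, since both of its counts are multiplied by $m-1$.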
Table 4. Theoretical communication costs of zero sharing (in bits). “SH” and “M” refer to semi-honest and malicious protocols, respectively. $\kappa$ is the parameter of computational security.

| Protocol | Sec. | Party $P_i$ ($0 \le i < m-1$) | Party $P_{m-1}$ |
|---|---|---|---|
| [11] | SH | $2(m-1)\kappa$ | $2(m-1)\kappa$ |
| VOLE * | M | $2(m-1)\kappa$ | $2(m-1)\kappa$ |
| [28] * | M | - | - |
| Ours | M | $2(m-1)\kappa$ | $2(m-1)\kappa$ |
* VOLE-PSI is a two-party PSI protocol based on silent OT proposed in paper [24], which introduces the specific construction of OPPRF. OPPRF can be used to construct the MPSI protocol, so “VOLE” here refers to the protocol constructed based on paper [24] and combined with zero-sharing technology. There is a structural difference between the zero sharing mentioned in [28] and the unconditional zero sharing proposed in [11], and although the ideas are similar, the latter form of zero-sharing is heavily coupled with the subsequent DH-based MPSI; therefore, the zero sharing of the latter protocol is subsumed into the subsequent MPSI protocol.
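The $2(m-1)\kappa$ entries in Table 4 are consistent with each party exchanging one $\kappa$-bit key with every other party. The sketch below illustrates a generic pairwise-key zero sharing of that shape. It is an assumption made for illustration, not the construction specified in this paper or in [11]: HMAC-SHA256 stands in for a PRF, the keys are generated centrally instead of being exchanged, and all function names are hypothetical.

```python
import hashlib
import hmac
import secrets

KAPPA_BYTES = 16  # kappa = 128 bits


def zero_sharing_bits(m: int, kappa: int = 128) -> int:
    """Per-party zero-sharing communication from Table 4: 2 * (m - 1) * kappa bits."""
    return 2 * (m - 1) * kappa


def setup_pairwise_keys(m: int) -> dict:
    """One random kappa-bit key per unordered pair of parties (demo-only central setup)."""
    return {(i, j): secrets.token_bytes(KAPPA_BYTES)
            for i in range(m) for j in range(i + 1, m)}


def share(i: int, m: int, keys: dict, x: bytes) -> int:
    """Party i's share for item x: XOR of PRF(key_ij, x) over all other parties j."""
    acc = 0
    for j in range(m):
        if j == i:
            continue
        key = keys[(min(i, j), max(i, j))]
        prf_out = hmac.new(key, x, hashlib.sha256).digest()[:KAPPA_BYTES]
        acc ^= int.from_bytes(prf_out, "big")
    return acc


if __name__ == "__main__":
    m = 5
    keys = setup_pairwise_keys(m)
    total = 0
    for i in range(m):
        total ^= share(i, m, keys, b"item")
    print("XOR of all shares:", total)
    print("bits per party:", zero_sharing_bits(m))
```

Every pairwise key contributes to exactly two shares, so the XOR of all $m$ shares cancels to zero for any item.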
Table 5. Theoretical communication costs of MPSI protocols (in bits). “SH” and “M” refer to semi-honest and malicious protocols, respectively. $\kappa$ is the parameter of computational security. $\lambda$ is the parameter of statistical security. The cost of the base OTs is independent of the input size and equal to $5\kappa$. $n_i$ is the set size of party $P_i$, and $n = \max_{i=0}^{m-2} n_i$. $\phi$ is the size of elliptic curve group elements (256 was used here). $\beta_{i,1}$ and $\beta_{i,2}$ are the required bin sizes when mapping $n_i$ elements to $1.2 n_i$ and $0.2 n_i$ bins using simple hashing, and $\gamma_i = 3.6\beta_{i,1} + 0.4\beta_{i,2}$. When $n_i = 2^{14}$, $\beta_{i,1} = 28$ and $\beta_{i,2} = 63$. When $\kappa = 128$, $L = 1023$.

| Protocol | Sec. | Party $P_i$ ($0 \le i < m-1$) | Party $P_{m-1}$ |
|---|---|---|---|
| [11] (Poly) | SH | $5(\lambda + \log(n\,n_{m-1}))\,n_i + 4.9\kappa n_{m-1} + \lvert\mathrm{baseOT}\rvert$ | $\sum_{i=0}^{m-2} 5(\lambda + \log(n\,n_{m-1}))\,n_i + 4.9\kappa n_{m-1} + \lvert\mathrm{baseOT}\rvert$ |
| [11] (Table) | SH | $\gamma_i(\lambda + \log(n\,n_{m-1}))\,n_{m-1} + 4.9\kappa n_{m-1} + \lvert\mathrm{baseOT}\rvert$ | $\sum_{i=0}^{m-2} \gamma_i(\lambda + \log(n\,n_{m-1}))\,n_{m-1} + 4.9\kappa n_{m-1} + \lvert\mathrm{baseOT}\rvert$ |
| [11] (BF) | SH | $300(\lambda + \log(n\,n_{m-1}))\,n_i + 4.9\kappa n_{m-1} + \lvert\mathrm{baseOT}\rvert$ | $\sum_{i=0}^{m-2} 300(\lambda + \log(n\,n_{m-1}))\,n_i + 4.9\kappa n_{m-1} + \lvert\mathrm{baseOT}\rvert$ |
| COT (2H-GCT) * | M | $2.4\kappa n_i + 2.4 L n_{m-1} + \lvert\mathrm{baseOT}\rvert$ | $\sum_{i=0}^{m-2} 2.4\kappa n_i + 2.4 L n_{m-1} + \lvert\mathrm{baseOT}\rvert$ |
| VOLE (Poly) * | M | $\kappa n_i + 2^{17}\kappa n_{m-1}^{0.05} + \kappa n_{m-1} + \lvert\mathrm{baseOT}\rvert$ | $\sum_{i=0}^{m-2} \kappa n_i + 2^{17}\kappa n_{m-1}^{0.05} + \kappa n_{m-1} + \lvert\mathrm{baseOT}\rvert$ |
| VOLE (2H-GCT) * | M | $2.4\kappa n_i + 2^{17}\kappa n_{m-1}^{0.05} + 2.4\kappa n_{m-1} + \lvert\mathrm{baseOT}\rvert$ | $\sum_{i=0}^{m-2} 2.4\kappa n_i + 2^{17}\kappa n_{m-1}^{0.05} + 2.4\kappa n_{m-1} + \lvert\mathrm{baseOT}\rvert$ |
| VOLE (3H-GCT) * | M | $1.3\kappa n_i + 2^{17}\kappa n_{m-1}^{0.05} + 1.3\kappa n_{m-1} + \lvert\mathrm{baseOT}\rvert$ | $\sum_{i=0}^{m-2} 1.3\kappa n_i + 2^{17}\kappa n_{m-1}^{0.05} + 1.3\kappa n_{m-1} + \lvert\mathrm{baseOT}\rvert$ |
| [28] (Poly) | M | $2\kappa n_i + \phi n_{m-1} + (m-1)\phi$ | $\sum_{i=0}^{m-2} 2\kappa n_i + \phi n_{m-1} + \phi$ |
| Ours (Poly) | M | $2\kappa n_i + \phi n_{m-1} + \phi$ | $\sum_{i=0}^{m-2} 2\kappa n_i + \phi n_{m-1} + \phi$ |
| Ours (3H-GCT) | M | $2.6\kappa n_i + 1.3\phi n_{m-1} + \phi$ | $\sum_{i=0}^{m-2} 2.6\kappa n_i + 1.3\phi n_{m-1} + \phi$ |
* “COT (2H-GCT)” is the multiparty extension of the two-party PSI protocol proposed in [8], where “2H-GCT” refers to the two-hashing garbled cuckoo table. “VOLE (Poly)” and “VOLE (2H-GCT)” are multiparty extensions of the two-party PSI protocol proposed in [24], and “VOLE (3H-GCT)” is a multiparty extension of the combination of [24] and [16].
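As a worked example of the Table 5 formulas, the following sketch evaluates the party $P_i$ cost of the three malicious elliptic-curve protocols, together with the $\gamma_i$ definition from the caption. It assumes $\kappa = 128$ and $\phi = 256$ as stated in the caption; the function names are hypothetical, and the numbers are theoretical bit counts derived from the formulas, not measured traffic.

```python
KAPPA = 128  # computational security parameter (bits), from the caption
PHI = 256    # size of an elliptic-curve group element (bits), from the caption


def gamma(beta_1: int, beta_2: int) -> float:
    """gamma_i = 3.6 * beta_{i,1} + 0.4 * beta_{i,2} (Table 5 caption)."""
    return 3.6 * beta_1 + 0.4 * beta_2


def ref28_poly_bits(n_i: int, n_last: int, m: int) -> int:
    """[28] (Poly), party P_i: 2*kappa*n_i + phi*n_{m-1} + (m-1)*phi."""
    return 2 * KAPPA * n_i + PHI * n_last + (m - 1) * PHI


def ours_poly_bits(n_i: int, n_last: int) -> int:
    """Ours (Poly), party P_i: 2*kappa*n_i + phi*n_{m-1} + phi."""
    return 2 * KAPPA * n_i + PHI * n_last + PHI


def ours_3h_gct_bits(n_i: int, n_last: int) -> float:
    """Ours (3H-GCT), party P_i: 2.6*kappa*n_i + 1.3*phi*n_{m-1} + phi."""
    return 2.6 * KAPPA * n_i + 1.3 * PHI * n_last + PHI


if __name__ == "__main__":
    n, m = 2 ** 4, 5  # smallest set size and party count used in Table 2
    print("gamma for beta = (28, 63):", gamma(28, 63))
    print("[28] (Poly):  ", ref28_poly_bits(n, n, m), "bits")
    print("Ours (Poly):  ", ours_poly_bits(n, n), "bits")
    print("Ours (3H-GCT):", ours_3h_gct_bits(n, n), "bits")
```

For party $P_i$, the formulas for [28] (Poly) and Ours (Poly) differ only in the last term, $(m-1)\phi$ versus $\phi$, so the gap grows with the number of parties rather than with the set size.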
Share and Cite

MDPI and ACS Style

Zhou, J.; Liu, Z.; Wang, L.; Zhao, C.; Liu, Z.; Zhou, L. Practical and Malicious Multiparty Private Set Intersection for Small Sets. Electronics 2023, 12, 4851. https://doi.org/10.3390/electronics12234851
