Article

Research on Dynamic Searchable Encryption Method Based on Bloom Filter

1
School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai 201620, China
2
Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
*
Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(8), 3379; https://doi.org/10.3390/app14083379
Submission received: 29 February 2024 / Revised: 11 April 2024 / Accepted: 12 April 2024 / Published: 17 April 2024

Abstract

Data outsourcing has become increasingly popular due to its low cost and flexibility. However, the cloud server used to store the data is only partially trusted. Searchable encryption is an efficient technology that lets users conduct accurate searches without leaking information. Nonetheless, most existing schemes cannot support dynamic updates or meet the privacy requirements of all users. Some work has tried to solve these issues with dynamic searchable asymmetric encryption schemes. This paper proposes an efficient searchable encryption scheme based on the Authenticator Bloom Filter (ABF). The scheme supports dynamic updates and multiple users and satisfies forward and backward security. The ABF improves the efficiency of searches and updates and plays a significant role in dynamic updates. This paper also designs a new token encryption scheme and a new file set encryption scheme, which reduce the time users spend on searches and updates and support multi-user modes. Experiments show that the proposed scheme takes less time in the search and update algorithms, especially when the keyword does not exist. The scheme also takes into account the problem of history storage when updating, which reduces unnecessary memory consumption and avoids multiple storage states for the same file.

1. Introduction

In the age of big data, both individuals and businesses need to store large amounts of data. The identification information, preferences, and habits generated by users when using various applications are stored and analyzed. Cloud servers have come into view as the place to store these data, which makes protecting user privacy a pressing concern. As cloud servers are semi-trusted, unencrypted information stored on a server can be insecure in two ways. In the first case, malicious users access the server and copy the information from it, which compromises the user's information. In the second case, the cloud server is honest but curious. Chai and Gong [1] define an honest-but-curious server as one that (1) stores outsourced data without modifying it; (2) honestly performs all operations, such as searching and returning text data separately; and (3) attempts to learn the users' initial data.
Paverd et al. [2] likewise define an honest-but-curious adversary as a legitimate server that does not deviate from the set protocol in the communication channel but tries to find out all the useful information from the content it obtains. As a result, users need to encrypt their important information before storing it on cloud servers; otherwise, their information security will be at risk. For example, threat actors broke into Amazon's web servers and caused a breach of the sensitive information of 3.7 million users. The stolen data were then posted on various hacking forums for sale. In the same time frame, the FlexBooker cloud server was also compromised and the personal data of up to 19 million users were leaked. The investigation found that the company was using AWS S3 storage buckets to store data but had not implemented any security measures. It is therefore essential that data in cloud servers are kept encrypted.
While encrypting the data keeps them from being compromised, it also prevents the cloud server from being able to manipulate the data. The reality is that users do not want to download data and process them again; they want to be able to add, delete, change, and check their encrypted data directly on the cloud server. As a result, the concept of symmetric searchable encryption (SSE) has been introduced and investigated. SSE enables the execution of keyword searches in ciphertext, one of the most basic data operations [3].
Users encrypt their private data and outsource them to a semi-trusted server, after which they send a search token to the server to perform a keyword search without revealing sensitive information  [4,5,6,7,8]. Searchable encryption is divided into symmetric searchable encryption and asymmetric searchable encryption. In earlier research, symmetric searchable encryption was mainly studied in the static case. However, the static case is not applicable to practical work. Dynamic searchable encryption implements dynamic updates on the basis of the former. However, in order to improve the efficiency of the search, each of these options allows some information to be divulged within certain limits. The file injection attack was confirmed by Zhang et al. [9]. This attack is performed by injecting a relatively small number of files to learn a large portion of the keywords searched by the client. To resist this attack, forward security has received attention.
Bost et al. [10] proposed a definition of forward security for searchable encryption together with a forward-secure DSSE scheme. Later, Bost et al. [11] proposed a definition of backward security and gave several schemes. He et al. [12] proposed a searchable scheme that satisfies both backward and forward security. However, this scheme only lets individual users search their own data stored on cloud servers, which limits its use in practical applications.
As most practical application environments are not closed, the implementation of symmetric searchable encryption always falls short of the requirements. We therefore introduce asymmetric searchable encryption.
Our main contributions are summarized as follows.
(1) A new multi-user dynamic searchable scheme is proposed, building on prior work. A new Authenticator Bloom Filter (ABF) structure based on the existing Bloom filter is proposed. The ABF retains the original features of the Bloom filter and adds a counter module, which makes dynamic updates easy to implement and greatly reduces the error rate.
(2) A new file set encryption scheme is designed, which uses a lightweight algorithm to reduce the overhead of initialization and update. The ABF and the state op of encrypted files are used together to realize dynamic updates of data.
(3) The scheme satisfies forward and backward security. Forward security is satisfied by updating the search token, and backward security is realized by the new file set encryption scheme. Compared with other schemes, the scheme in this paper has a clear advantage in time cost and also accounts for the problem of historical storage, avoiding multiple different storage states for the same file.

2. Related Work

Asymmetric encryption utilizes a pair of keys, known as the public key and private key. The public key is made publicly available and is used for encrypting data, while the private key is kept secret and is used for decrypting data. Asymmetric searchable encryption was first proposed by Boneh et al. [13]. However, their scheme requires a secure communication channel to pass the trapdoor, and establishing such a channel is difficult and expensive. Therefore, Baek et al. devised a scheme that does not require a secure communication channel [14]. Tang and Chen et al. [15] designed a PKI-based asymmetric searchable encryption scheme. It fixes a flaw in the scheme of Baek et al. that lets an attacker obtain the relationship between the trapdoor and the ciphertext. Park et al. [16] proposed two constructions for conjunctive keyword search with public key encryption. However, this solution involves many connections between the user and the server and has a large storage overhead.
Guo et al. [17] analyze the scheme of Li et al. [18] and demonstrate that its trapdoor indistinguishability is not satisfied. A security scheme that satisfies the requirements of the test-specified server and provides stronger security guarantees for the confidentiality of keywords is also proposed. However, these two schemes do not address the encryption algorithm for files and only enhance the security of keywords, without much advantage in terms of practical application.
A spatial keyword query satisfying forward–backward security was proposed by Wang et al. [19]. The article uses Hilbert curves to simplify geometric range queries to range queries and uses prefix encoding to cover range queries. This solution allows other users to search for data, but this solution is not designed to update steps regarding data that already exist on the cloud server. Although this solution proposes a change to the bitmap when updating, the part of the update algorithm and storage file involved is not further described. Chen et al. [20] proposed a blockchain-based public-key searchable encryption scheme in the paper. The scheme, BSPEFB, not only makes use of smart contracts for searching, which can ensure the correctness and immutability of the returned results, but also satisfies backward and forward security. The solution reduces the number of computationally intensive operations and has a high search efficiency. However, each trapdoor in the scheme corresponds to a separate keyword, which causes a huge inconvenience to the data owner each time the data user requests a search token while giving the data owner an idea of the range of keywords the data user is interested in. If a malicious data owner uses this for analysis, it could easily compromise the data user’s privacy.

3. Preliminaries

3.1. Forward and Backward Privacy

The definition of forward–backward security was first proposed by Stefanov et al. [21] with the first scheme to support dynamic keywords. Then, forward–backward security was first formalized by Bost et al. [10,11]. As for the definition of forward security, Bost et al. argue that an update does not reveal information about the updated keywords. Importantly, the server does not know if the updated file matches the keyword that the user searched for. The definition is as follows:
Definition 1.
An $\mathcal{L}$-adaptively-secure SSE scheme is forward private if the update leakage function $\mathcal{L}^{Updt}$ can be written as:
$$\mathcal{L}^{Updt}(op, in) = \mathcal{L}'\left(op, \{(ind_i, \mu_i)\}\right),$$
where, for each pair $(op, in)$, $op$ is the update query and $in$ is its input. $\{(ind_i, \mu_i)\}$ is the collection of updated files, in which $ind_i$ is the updated document and $\mu_i$ is the key of the update.
Bost et al.’s [10] definition here extends forward privacy as proposed by Stefanov et al. [21]. Also, this definition focuses only on adding documents, not updating them.
In backward security, the server cannot associate the currently searched keyword with the results of a previous search. Suppose a user adds a document $ind$ corresponding to a keyword to the database and later removes it [21]. If a subsequent search for this keyword does not reveal the document $ind$, the SSE scheme is backward private. Bost et al. defined three types of backward security in their paper: backward privacy with insertion pattern, backward privacy with update pattern, and weak backward privacy. In this paper, backward security is judged according to the second one, backward privacy with update pattern. It leaks the documents currently matching $w$, when they were inserted, and when all the updates on $w$ happened (but not their content).

3.2. Bilinear Pairing

Let $q$ be a prime, and let $G_1$ and $G_2$ be two cyclic groups of prime order $q$, written additively and multiplicatively, respectively. The bilinear map $e: G_1 \times G_1 \rightarrow G_2$ satisfies the following properties:
  • Bilinearity: For all $P, Q, R \in G_1$ and $a, b \in \mathbb{Z}$, $e(aP, bQ) = e(P, Q)^{ab}$; equivalently, $e(P + Q, R) = e(P, R) \cdot e(Q, R)$ and $e(P, Q + R) = e(P, Q) \cdot e(P, R)$.
  • Non-degeneracy: If $P$ is a generator of $G_1$, then $e(P, P)$ is a generator of $G_2$.
  • Computability: For all $P, Q \in G_1$, there is an efficient algorithm for calculating $e(P, Q)$.
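The three properties can be checked on a deliberately tiny, insecure stand-in for a pairing. The sketch below is an illustration only, not the pairing used in the scheme: it assumes $G_1 = \mathbb{Z}_{11}$ under addition, takes $G_2$ to be the order-11 subgroup of $\mathbb{Z}_{23}^*$ generated by 2, and defines the hypothetical map $e(P, Q) = 2^{PQ} \bmod 23$. Real pairings are built on elliptic curves, where discrete logarithms in $G_1$ are hard.

```python
# Toy bilinear map for illustration only; it offers no security.
# G1 is the additive group Z_q; G2 is the order-q subgroup of Z_p^* generated by g.
q = 11  # prime group order
p = 23  # prime with q dividing p - 1
g = 2   # element of order q modulo p (2**11 % 23 == 1)


def e(P: int, Q: int) -> int:
    """Map (P, Q) in G1 x G1 to g^(P*Q) in G2."""
    return pow(g, (P * Q) % q, p)


P, Q, R = 3, 7, 5
a, b = 4, 9

# Bilinearity: e(aP, bQ) = e(P, Q)^(ab)
assert e(a * P % q, b * Q % q) == pow(e(P, Q), a * b, p)
# Additivity in each argument
assert e((P + Q) % q, R) == e(P, R) * e(Q, R) % p
assert e(P, (Q + R) % q) == e(P, Q) * e(P, R) % p
# Non-degeneracy: 1 generates G1, and e(1, 1) = g has order q in G2
assert e(1, 1) == g and pow(g, q, p) == 1
```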

3.3. Bloom Filter

The Bloom filter was proposed by Bloom in 1970 [22]. It is actually a very long binary vector and a series of random mapping functions. The Bloom filter can be used to search whether an element is in a set; its advantage is that the space efficiency and query time are much better than the general algorithm, and the disadvantage is that there is a certain misrecognition rate and deletion difficulty.
For each datum, the data owner hashes it into the Bloom filter through $k$ unbiased hash functions. The number of hash functions $k$ is calculated as follows:
$$k = \frac{m}{n} \ln 2,$$
where $m$ is the size of the filter's bit array, and $n$ is the number of elements expected to be inserted.
The false positive rate of the Bloom filter is defined as:
$$BFR = \left(1 - e^{-\frac{kn}{m}}\right)^{k},$$
where $k$ is the number of hash functions, $n$ is the number of elements to be stored, and $m$ is the size of the bit array. In this paper, a false positive rate of $10^{-6}$ is specified based on the size of the experimental data.
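As a rough sketch of how the two formulas interact, the snippet below sizes a filter for a target false positive rate and then evaluates $BFR$ for the resulting $(m, k)$. The sizing rule $m = -n \ln p / (\ln 2)^2$ is the standard consequence of the two formulas above rather than something stated in this paper, and `n = 10_000` is a hypothetical workload.

```python
import math


def bloom_parameters(n: int, target_fpr: float) -> tuple[int, int]:
    """Return (m, k): bit-array size and hash count for n items at target_fpr."""
    m = math.ceil(-n * math.log(target_fpr) / (math.log(2) ** 2))
    k = max(1, round((m / n) * math.log(2)))  # k = (m / n) ln 2
    return m, k


def false_positive_rate(m: int, n: int, k: int) -> float:
    """BFR = (1 - e^(-kn/m))^k."""
    return (1.0 - math.exp(-k * n / m)) ** k


n = 10_000  # hypothetical number of stored elements
m, k = bloom_parameters(n, 1e-6)
fpr = false_positive_rate(m, n, k)
print(f"m={m} bits, k={k} hashes, BFR={fpr:.3e}")
```

Because $k$ must be an integer, the achieved rate lands close to the target rather than exactly on it.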

4. Proposed Construction

We propose a new chain structure which includes two parts: keyword-security encryption and file-security encryption. Forward–backward security can be satisfied by performing these two parts.

4.1. System Model

In our design, there are three parts: data owner (DO), data user (DU) and cloud server (CS). The cloud server stores and manages the data owner’s ciphertext set and helps legitimate data users search for the corresponding data. The system model is shown in Figure 1. First, the data owner collects the public keys of all legitimate users to be used to compute the relevant data pp for the search token. In the second step, the data owner sends the encrypted EDB, ABF, and B to the cloud server for storage. The above is the initialization preparation. Next, if there is a user (legitimate or not), he/she can request the data about the search token from the data owner (Step 3). After that, the data owner sends pp to the data user (Step 4). In step 5, the data user uses the data pp to calculate the corresponding keyword search token. Here, only legitimate users can calculate the correct search token using their private key; otherwise, they will only obtain the wrong data. In the next step, the data user sends the search token to the cloud server to apply for the search. In the seventh step, the cloud server sends a collection of encrypted files from the search to the data user, and finally, in the eighth step, the data user decrypts the data in the res to obtain the plaintext.
The scheme proposed in this paper consists of eight algorithms:
$Setup(\lambda) \rightarrow para$: Input the security parameter $\lambda$ and generate the hash functions, pseudo-random function, bilinear mapping, and other data required in the next steps.
$KeyGen(H_0, id, g) \rightarrow (sk_{du_i}, pk_{du_i})$: Each DU generates her/his own public/private keys using its own id.
$Initial(W_{ind}, para) \rightarrow (ABF, EDB, B, NE)$: The data owner initializes the related data, encrypts the (keyword, file set) pairs, and sends the encrypted data to the cloud server.
$Trapdoor(w, para, h, s) \rightarrow st_w$: The DU runs this algorithm, inputs its private key and the data sent from the DO, calculates the search token, and sends it to the cloud server.
$Search(st_w, para, EDB, ABF) \rightarrow res$: The data user sends the keyword search token to the cloud server, and the server runs the algorithm to send the corresponding encrypted file set to the data user.
$Update(Doc, para, EDB, ABF)$: This algorithm is run by the DO to encrypt the newly stored keywords or files and put them in the corresponding location.
$UpdateST(W, para, EDB, ABF) \rightarrow (EDB, ABF)$: This algorithm is run by the DO, which updates all keyword search tokens at the end of each update and then sends all the updated data to the server.
$Dec(res, w, H_2) \rightarrow tal$: The data user decrypts the collection from the server to obtain the required files.

4.2. Keyword-Security Encryption

For the encryption of the search token of the keyword, this paper sets the following definition in order to meet the search conditions of multiple users.
Let $U = \{id_1, id_2, id_3, \ldots, id_n\}$ be the id set of legitimate search users, where $n$ is the number of users. For a user with identity $id$, set $X = (x_0, x_1, x_2, \ldots, x_l) = (H_0(id)^0, H_0(id)^1, H_0(id)^2, \ldots, H_0(id)^l)$, the vector of powers of the user's hashed id, so that $x_0 = 1$. Set $Z = (z_0, z_1, z_2, \ldots, z_l)$, where $z_j$ is the coefficient of $z^j$ in the expansion of $\prod_{j=1}^{n} (z - H_0(id_j))$, and $id_j$ is the id of the $j$-th user. In this article, the data owner sends the following data to the data user:
$$pp = (p_0, p_1, p_2, para),$$
where $p_0 = g^{r}$ and $r$ is a random number; $p_1 = g^{r \sum_{i=1}^{l} z_i}$, where the $z_i$ are the entries of $Z$; and $p_2 = g^{z_0}$.
Assuming that a data user with $id$ wants to search for the keyword $w$, all the data need to be organized into the following form:
$$t_w = e\left(g^{\sum_{i=1}^{l} x_i}, p_1\right) \cdot e\left(g^{w} p_2, p_0\right) = e\left(g^{\sum_{i=1}^{l} x_i}, g^{r \sum_{i=1}^{l} z_i}\right) \cdot e\left(g^{w} g^{z_0}, g^{r}\right) = e(g, g)^{-x_0 z_0 r} \cdot e(g, g)^{wr + z_0 r}.$$
Since $H_0(id)$ is a root of $\prod_{j=1}^{n} (z - H_0(id_j))$ for every legitimate user, $\langle X, Z \rangle = \sum_{i=0}^{l} x_i z_i = 0$, so that $\sum_{i=1}^{l} x_i z_i = -x_0 z_0$. Because $x_0 = 1$, we obtain $-x_0 z_0 r = -z_0 r$, so the two $z_0 r$ terms cancel and $t_w = e(g, g)^{wr}$.
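The membership test behind this derivation, $\langle X, Z \rangle = 0$ exactly when $H_0(id)$ is a root of the user polynomial, can be checked numerically. The sketch below is an assumption-laden illustration: it works in $\mathbb{Z}_Q$ with the stand-in prime $Q = 2^{61} - 1$ instead of the pairing order $q$, uses SHA-256 reduced modulo $Q$ as a hypothetical $H_0$, and the user names are invented. It checks only the inner product, not the pairing computation.

```python
import hashlib

Q = 2**61 - 1  # stand-in prime modulus (assumption; the scheme uses the pairing order q)


def H0(identity: str) -> int:
    """Hypothetical stand-in for H_0: hash an id into Z_Q."""
    digest = hashlib.sha256(identity.encode()).digest()
    return int.from_bytes(digest, "big") % Q


def poly_coefficients(roots: list[int]) -> list[int]:
    """Coefficients z_0..z_n (lowest degree first) of prod_j (z - root_j) mod Q."""
    coeffs = [1]
    for root in roots:
        shifted = [0] + coeffs                               # z * f(z)
        scaled = [(-root * c) % Q for c in coeffs] + [0]     # -root * f(z)
        coeffs = [(a + b) % Q for a, b in zip(shifted, scaled)]
    return coeffs


def inner_product(identity: str, Z: list[int]) -> int:
    """<X, Z> with x_i = H0(id)^i, so that x_0 = 1."""
    h = H0(identity)
    return sum(pow(h, i, Q) * z for i, z in enumerate(Z)) % Q


users = ["alice", "bob", "carol"]  # hypothetical legitimate users
Z = poly_coefficients([H0(u) for u in users])
authorized = inner_product("alice", Z)   # 0: H0("alice") is a root
outsider = inner_product("mallory", Z)   # nonzero unless H0 happens to hit a root
```

For a legitimate user the sum over $i \ge 1$ therefore equals $-z_0$, which is the cancellation used in the expression for $t_w$.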

4.3. Keyword Storage Scheme

For keyword storage, this paper designs an Authenticator Bloom Filter (ABF). As shown in Figure 2, the Bloom filter has been modified to add a counting module, and the authenticator is designed to support dynamic updates.
In the ABF structure, for each keyword, the data owner hashes it into the Bloom filter through k unbiased hash functions. Because multiple keywords may map to the same location, this article adds the counter A[]. Each time a keyword is computed and mapped to the Bloom filter, the counter A[i] at each mapped position i is increased by one. A nonzero A[i] therefore means that at least one keyword is mapped to the i-th position.
Figure 3 and Figure 4 show the process of adding and deleting data for the ABF. Add data as shown in Figure 3. Hash keyword A and map it to bits 1, 3, 5, and 8 in the Bloom filter. Since bits 1, 5, and 8 are already mapped with keywords, only one is added to the counter. On the third bit, not only a one is added to the counter, but also a one is placed on the corresponding bit of the filter. The deletion process is shown in Figure 4. After keyword B is hashed, it is mapped to the first, fifth, sixth, and eighth bits. First, the corresponding counters are reduced by one, and it is found that the eighth counter is reduced to 0. This means that there are no more keywords mapped to this location, so place 0 in the Bloom filter.
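The add and delete steps in Figures 3 and 4 can be sketched as a counting Bloom filter. This is a minimal illustration under stated assumptions, not the authors' implementation: the class name, the use of SHA-256 with a one-byte index as the hash family, and the default of four hash functions (matching $h_1$ to $h_4$ in the algorithms) are all choices made here.

```python
import hashlib


class AuthenticatorBloomFilter:
    """Bit array plus per-position counters A[], so deletions are possible."""

    def __init__(self, m: int, k: int = 4) -> None:
        self.m, self.k = m, k
        self.bits = [0] * m
        self.counters = [0] * m  # A[i]: how many stored tokens map to position i

    def _positions(self, token: bytes) -> list[int]:
        # Hypothetical hash family: SHA-256 prefixed with the hash index.
        return [
            int.from_bytes(hashlib.sha256(bytes([i]) + token).digest(), "big") % self.m
            for i in range(self.k)
        ]

    def contains(self, token: bytes) -> bool:
        return all(self.bits[pos] for pos in self._positions(token))

    def add(self, token: bytes) -> None:
        for pos in self._positions(token):
            self.counters[pos] += 1
            self.bits[pos] = 1

    def remove(self, token: bytes) -> bool:
        """Decrement counters; clear a bit only when its counter reaches 0."""
        if not self.contains(token):
            return False
        for pos in self._positions(token):
            self.counters[pos] -= 1
            if self.counters[pos] == 0:
                self.bits[pos] = 0
        return True


abf = AuthenticatorBloomFilter(m=1024)
abf.add(b"token-A")
abf.add(b"token-B")
abf.remove(b"token-B")
```

Removing a token that was never added could corrupt the counters, which is why `remove` first checks membership; a false positive can still slip through that check.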

4.4. File-Security Encryption

The ind in this paper’s scheme refers to the address of the file, and the user can find the encrypted file by decrypting to obtain the ind plaintext. This paper uses a symmetric encryption scheme to encrypt the contents of the file, which is not specifically described because it is not very relevant to the scheme of this paper.
The encryption for the document set is as follows:
$$IE_{st_w}^{j} = H_2(w \| j) \oplus (ind[j] \| op),$$
where $op$ is the state of the file (add/del).
The encrypted file collection is stored on the server as key-value pairs. The first key-value pair of each keyword has a different form from the others:
$$add_{st_w}^{1} = H_3(st_w),$$
$$val_{st_w}^{1} = \left(IE_{st_w}^{1} \| rn_1\right) \oplus H_3(st_w).$$
Here the first key-value pair requires the search token and a randomly generated number $rn_1$ that is used to locate the next key-value pair. The remaining key-value pairs are as follows:
$$add_{st_w}^{i} = H_3(rn_{i-1}),$$
$$val_{st_w}^{i} = \left(IE_{st_w}^{i} \| rn_i\right) \oplus H_3(rn_{i-1}).$$
Each key-value pair here is calculated from the previous set of key-value pairs, as shown in Figure 5.
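A minimal sketch of this chain, and of how the server walks it, is given below. The concrete choices are assumptions for illustration: SHA-256 with a domain prefix stands in for $H_2$ and $H_3$, each record $ind \| op$ is padded to 16 bytes, each $rn_i$ is 16 bytes so that $IE \| rn$ matches the 32-byte hash output, and the token and file names are invented. As in the equations above, $H_3$ serves as both the address and the mask.

```python
import hashlib
import secrets

REC = 16     # bytes per record ind || op (assumption)
LAMBDA = 16  # bytes per random link rn_i (assumption)


def H2(w: str, j: int) -> bytes:
    return hashlib.sha256(b"H2|" + w.encode() + b"|" + str(j).encode()).digest()[:REC]


def H3(x: bytes) -> bytes:
    return hashlib.sha256(b"H3|" + x).digest()  # 32 bytes = REC + LAMBDA


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def record(ind: str, op: str) -> bytes:
    """Encode ind || op as 15 padded address bytes plus 1 state byte."""
    return ind.encode().ljust(REC - 1, b"\0") + (b"\x01" if op == "add" else b"\x00")


def build_chain(st_w: bytes, w: str, records: list[bytes]) -> dict[bytes, bytes]:
    """Data owner side: store entry j at H3(previous link), masked with the same hash."""
    edb, link = {}, st_w
    for j, rec in enumerate(records, start=1):
        ie = xor(H2(w, j), rec)            # IE^j = H2(w || j) XOR (ind[j] || op)
        rn = secrets.token_bytes(LAMBDA)   # rn_j, the link to entry j + 1
        edb[H3(link)] = xor(ie + rn, H3(link))
        link = rn
    return edb


def walk_chain(st_w: bytes, edb: dict[bytes, bytes], count: int) -> list[bytes]:
    """Server side: follow the links and collect the still-encrypted IE values."""
    res, link = [], st_w
    for _ in range(count):
        plain = xor(edb[H3(link)], H3(link))
        res.append(plain[:REC])
        link = plain[REC:]
    return res


st_w = b"hypothetical-search-token"
records = [record("doc-1", "add"), record("doc-2", "del"), record("doc-3", "add")]
edb = build_chain(st_w, "cloud", records)
res = walk_chain(st_w, edb, len(records))
```

The server recovers each $IE$ and the next link, but without the keyword $w$ it cannot remove the $H_2$ mask to read $ind$ or $op$.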
For document deletion operations, this article does not physically delete an existing document but sets the op state corresponding to the document to delete. When the server runs the search algorithm, it obtains only encrypted file entries, in which the state op is hidden, thus supporting backward security.

5. Construction

In this section, we introduce our construction. We first list Algorithms 1–8 and then describe and analyze each of them.
Algorithm 1 Setup
Input: $\lambda$
Output: $para$
  1: Generate the parameters of the pairing operation $(G_1, G_2, e, g, q)$
  2: Generate the set $Z$
  3: Select the hash functions $H_{i \in [0,2]}$, $h_{i \in [1,4]}$
  4: $para = (G_1, G_2, e, g, q, H_{i \in [0,2]}, h_{i \in [1,4]})$
Algorithm 2 KeyGen
Input: $H_0, id, g$
Output: $sk_{du_i}, pk_{du_i}$
  1: Generate the set $X$ of user $i$
  2: $sk_{du} = g^{\sum_{i=1}^{l} x_i}$
  3: $pk_{du} = H_0(id)$
Algorithm 3 Initial
Input: $W_{ind}, para$
Output: $ABF, EDB, B, NE$
  1: while $W_{ind} \neq null$ do
  2:     $w_{ind} \xleftarrow{R} W_{ind}$
  3:     $W_{ind} \leftarrow W_{ind} \setminus w_{ind}$
  4:     Parse $w_{ind}$ as $(w, st_w, ind[j])$
  5:     $ABF \leftarrow H = (h_1(st_w), h_2(st_w), h_3(st_w), h_4(st_w))$
  6:     $c_{st_w} = 1$, $B[st_w] = c_{st_w}$
  7:     $IE_{st_w}^{1} = H_2(w \| 1) \oplus (ind[1] \| op)$
  8:     $add_{st_w}^{1} = H_3(st_w)$
  9:     $rn_1 \xleftarrow{R} \{0,1\}^{\lambda}$
 10:     $val_{st_w}^{1} = (IE_{st_w}^{1} \| rn_1) \oplus H_3(st_w)$
 11:     $EDB[add_{st_w}^{1}] = val_{st_w}^{1}$, $NE[st_w] = rn_1$
 12:     for $i = 2$ to $j$ do
 13:         $IE_{st_w}^{i} = H_2(w \| i) \oplus (ind[i] \| op)$
 14:         $rn_i \xleftarrow{R} \{0,1\}^{\lambda}$, $rn \leftarrow NE[st_w]$
 15:         $add_{st_w}^{i} = H_3(rn)$
 16:         $val_{st_w}^{i} = (IE_{st_w}^{i} \| rn_i) \oplus H_3(rn)$
 17:         $EDB[add_{st_w}^{i}] = val_{st_w}^{i}$
 18:         $NE[st_w] = rn_i$, $c_{st_w} \leftarrow B[st_w]$
 19:         $B[st_w] = c_{st_w} + 1$
 20:     end for
 21: end while
 22: send $ABF$, $EDB$, $B$ and $NE$ to the cloud server
Algorithm 4 Trapdoor
Input: $w, pp$
Output: $st_w$
  1: $st_w = H_0\left(e\left(g^{\sum_{i=1}^{l} x_i}, p_1\right) \cdot e\left(g^{w} p_2, p_0\right)\right)$
  2: Send $st_w$ to the cloud server
Algorithm 5 Search
Input: $st_w, para, EDB, B$
Output: $res$
  1: $res \leftarrow \emptyset$
  2: $H = (h_1(st_w), h_2(st_w), h_3(st_w), h_4(st_w))$
  3: if $H$ cannot be mapped to $ABF$, break;
  4: else
  5:     $c_{st_w} \leftarrow B[st_w]$
  6:     $val_{st_w}^{1} \leftarrow EDB[H_3(st_w)]$
  7:     $IE_{st_w}^{1} \| rn_1 = val_{st_w}^{1} \oplus H_3(st_w)$
  8:     $res = res \cup IE_{st_w}^{1}$, $temp = rn_1$
  9:     for $y = 2$ to $c_{st_w}$ do
 10:         $val_{st_w}^{y} \leftarrow EDB[H_3(temp)]$
 11:         $IE_{st_w}^{y} \| rn_y = val_{st_w}^{y} \oplus H_3(temp)$
 12:         $res = res \cup IE_{st_w}^{y}$, $temp = rn_y$
 13:     end for
 14: send $res$ to the data user
Algorithm 6 Update
Input: $Doc, para, EDB, B$
Output:
  1: while $Doc \neq null$ do
  2:     $doc \xleftarrow{R} Doc$, $Doc \leftarrow Doc \setminus doc$, $flag = 0$
  3:     Parse $doc$ as $(w, ind, st_w, op')$
  4:     while $flag == 0$ do
  5:         if $st_w$ does not exist
  6:             update ABF
  7:             $c_{st_w} = 1$, $B[st_w] = c_{st_w}$
  8:             $add_{st_w}^{1} = H_3(st_w)$, $rn_1 \xleftarrow{R} \{0,1\}^{\lambda}$
  9:             $IE_{st_w}^{1} = H_2(w \| 1) \oplus (ind[1] \| op')$
 10:             $val_{st_w}^{1} = (IE_{st_w}^{1} \| rn_1) \oplus H_3(st_w)$
 11:             $flag = 1$, update $EDB$, $NE$
 12:             Put the search token $st_w$ in the ABF
 13:         else
 14:             $c \leftarrow B[st_w]$, $rn \leftarrow NE[st_w]$
 15:             $val_{st_w}^{1} \leftarrow EDB[H_3(st_w)]$
 16:             $IE_{st_w}^{1} \| rn_1 = val_{st_w}^{1} \oplus H_3(st_w)$
 17:             $ind[1] \| op = IE_{st_w}^{1} \oplus H_2(w \| 1)$
 18:             $temp = rn_1$
 19:             if $ind == ind[1]$
 20:                 $op = op'$, $flag = 1$
 21:             else
 22:                 for $y = 2$ to $c$ do
 23:                     $val_{st_w}^{y} \leftarrow EDB[H_3(temp)]$
 24:                     $IE_{st_w}^{y} \| rn_y = val_{st_w}^{y} \oplus H_3(temp)$
 25:                     $ind[y] \| op = IE_{st_w}^{y} \oplus H_2(w \| y)$, $temp = rn_y$
 26:                     if $ind == ind[y]$
 27:                         $op = op'$, $flag = 1$
 28:                 end for
 29:             $rn_{c+1} \xleftarrow{R} \{0,1\}^{\lambda}$, $NE[st_w] = rn_{c+1}$
 30:             $add_{st_w}^{c+1} = H_3(rn)$
 31:             $IE_{st_w}^{c+1} = (ind[c+1] \| op') \oplus H_2(w \| c+1)$
 32:             $val_{st_w}^{c+1} = (IE_{st_w}^{c+1} \| rn_{c+1}) \oplus H_3(rn)$, $flag = 1$
 33:     end while
 34: end while
Algorithm 7 UpdateST
Input: $W, para$
Output: $EDB, ABF$
  1: $r \xleftarrow{R} \{0,1\}^{\lambda}$
  2: for each keyword $w_i \in W$ do
  3:     $t_w = e(g, g)^{w_i r}$
  4:     $st_w = H_0(t_w)$
  5:     $add_{st_w}^{1} = H_3(st_w)$
  6:     $val_{st_w}^{1} = (IE_{st_w}^{1} \| rn_1) \oplus H_3(st_w)$
  7:     Update $EDB$, $ABF$
  8: end for
  9: Send $EDB$, $ABF$ to the cloud server
Algorithm 8 Dec
Input: $res, w, H_2$
Output: $tal$
  1: $tal \leftarrow \emptyset$
  2: for $j = 1$ to $|res|$ do
  3:     $IE_{st_w}^{j} \leftarrow res$
  4:     $(ind[j] \| op) = IE_{st_w}^{j} \oplus H_2(w \| j)$
  5:     if $op == add$
  6:         $tal = tal \cup ind[j]$
  7: end for
  8: return $tal$
$Setup(\lambda) \rightarrow para$: The algorithm is run by the DO and defines the initialization parameters. First, the data owner inputs the security parameter $\lambda$ and generates the additive group $G_1$, whose order is a prime $q$. Let the multiplicative group $G_2$ have the same order, let $e: G_1 \times G_1 \rightarrow G_2$ be a bilinear map, and let $g$ be a generator of $G_1$. Then, the hash functions are selected, and the vector $Z$ is generated based on the set of public keys of all users.
$KeyGen(H_0, id, g) \rightarrow (sk_{du_i}, pk_{du_i})$: Each DU generates her/his own public/private keys using its own id.
$Initial(W_{ind}, para) \rightarrow (ABF, EDB, B, NE)$: The data owner runs Algorithm 3 to initialize all the data and encrypts all the data before sending it to the cloud server. The initialization data for each keyword are placed in $w_{ind} = (w, st_w, ind[j])$, where $ind[j]$ is the document collection of the keyword (lines 4–5). The search token corresponding to this keyword is evaluated by four hashes and mapped to the Bloom filter, while the counter A[i] at each corresponding position of the filter is increased by one.
The next step is to encrypt the file set. The first document of each keyword is encrypted differently from the other documents, so it needs to be calculated separately. This scheme requires key-value pairs to store encrypted file sets. Key-value pairs are represented in this paper by $(add / val)$, and the $val$ stored under an address is written $EDB[add]$. The scheme also creates $NE$ to store the random number generated for the latest file of each keyword, for use in future updates. Finally, $ABF$, $EDB$, $B$, and $NE$ are sent to the cloud server.
$Trapdoor(w, pp) \rightarrow st_w$: The DU runs this algorithm, inputs its private key and the data sent from the DO, calculates the search token, and sends it to the cloud server.
$Search(st_w, para, EDB, ABF) \rightarrow res$: The CS runs this algorithm, uses the keyword search token to put the corresponding set of encrypted files into the set $res$, and sends $res$ to the DU. First, the cloud server maps the search token to the ABF to determine whether the token exists (lines 2–3). If the search token exists, the key-value pairs of the encrypted files are found through the token. The first value $val_{st_w}^{1}$ is found using the token, and the encrypted file and the $rn$ value are calculated from $val_{st_w}^{1}$. The key-value pair of each subsequent encrypted file is then found in turn through the $rn$ value recovered from the previous one, and all the encrypted files $IE$ are put into the set $res$. Finally, the cloud server sends the set $res$ to the DU.
$Update(Doc, para, EDB, ABF)$: This algorithm is run by the DO to encrypt the newly stored keywords or files and put them in the corresponding location.
Updates in this scheme are batch updates (including additions and deletions), and the data owner packages the files that need to be added along with other keywords, update status, and search tokens into a quadruple doc, and puts all the docs into a collection, Doc.
There are four situations that need to be determined during the update:
  • When the keyword corresponding to the updated document does not exist (lines 5–12). At this point, the keyword and its files need to be initialized and the ABF updated;
  • When the keyword exists, and the corresponding first document is the target document (lines 13–20). Once the first document is determined to be the target document, its status op is changed directly to the target op′;
  • When the keyword exists, and the target file is one of the subsequent stored files (lines 21–28). The stored files are checked one by one against the target file, and the state op of the matching file is changed to the target state op′ (add or del) if found;
  • If the target file has not been stored (lines 29–32). The target file is added to the end of the file set, while $EDB$, the file counter $B$, and $NE$ are updated.
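The four cases can be sketched on an unencrypted model of the index. This is only an illustration of the control flow, assuming a hypothetical in-memory dictionary from keyword to a list of `[ind, op]` entries; the encryption, the ABF, and the bookkeeping in $B$ and $NE$ are all omitted.

```python
def apply_update(index: dict[str, list[list[str]]], w: str, ind: str, op_new: str) -> str:
    """Apply one (w, ind, op') update to the plaintext model; return the case that fired."""
    if w not in index:                       # case 1: keyword does not exist yet
        index[w] = [[ind, op_new]]
        return "new-keyword"
    entries = index[w]
    for pos, entry in enumerate(entries):
        if entry[0] == ind:                  # cases 2 and 3: file already stored
            entry[1] = op_new                # overwrite the state in place
            return "first-entry" if pos == 0 else "later-entry"
    entries.append([ind, op_new])            # case 4: file not stored, append to the end
    return "appended"


index: dict[str, list[list[str]]] = {}
trace = [
    apply_update(index, "cloud", "f1", "add"),
    apply_update(index, "cloud", "f2", "add"),
    apply_update(index, "cloud", "f1", "del"),
    apply_update(index, "cloud", "f2", "del"),
]
```

Because an existing entry is overwritten rather than duplicated, each file keeps exactly one stored state per keyword, which is the history-storage property the scheme aims for.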
$UpdateST(W, para, EDB, ABF) \rightarrow (EDB, ABF)$: This algorithm is run by the DO, which updates all keyword search tokens at the end of each update and then sends all the updated data to the server. It is run once the data owner is sure that all the data that need to be updated have been updated. To update the search tokens, the data owner selects a fresh random number $r$ to replace the original $r$. Since the key-value pair for the first document of each keyword's encrypted document set is derived from the search token, the EDB needs to be updated after each search token is updated. Finally, the new $EDB$ and $ABF$ are sent to the cloud server.
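The re-keying of the first key-value pair can be sketched as follows. The sketch rests on several assumptions that are not from the paper: exponentiation modulo the Mersenne prime $2^{127} - 1$ with base 3 stands in for $e(g, g)^{w r}$, keywords are mapped to integers with SHA-256, and SHA-256 with a domain prefix stands in for $H_0$ and $H_3$. Re-inserting the new tokens into the ABF is left out.

```python
import hashlib

P = 2**127 - 1  # Mersenne prime, stand-in for the pairing target group (assumption)
G = 3           # stand-in for e(g, g) (assumption)


def H0(x: int) -> bytes:
    return hashlib.sha256(b"H0|" + x.to_bytes(16, "big")).digest()


def H3(x: bytes) -> bytes:
    return hashlib.sha256(b"H3|" + x).digest()


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def token(w: str, r: int) -> bytes:
    """st_w = H0(t_w) with t_w = G^(w * r), mirroring lines 3-4 of Algorithm 7."""
    w_int = int.from_bytes(hashlib.sha256(w.encode()).digest(), "big") % (P - 1)
    t_w = pow(G, (w_int * r) % (P - 1), P)
    return H0(t_w)


def refresh_tokens(edb: dict[bytes, bytes], keywords: list[str], r_old: int, r_new: int) -> None:
    """Move each keyword's head entry from H3(old token) to H3(new token)."""
    for w in keywords:
        old, new = token(w, r_old), token(w, r_new)
        head = xor(edb.pop(H3(old)), H3(old))   # unmask IE^1 || rn_1
        edb[H3(new)] = xor(head, H3(new))       # re-mask under the new token


head_plain = bytes(range(32))  # placeholder for IE^1 || rn_1
edb = {H3(token("cloud", 5)): xor(head_plain, H3(token("cloud", 5)))}
refresh_tokens(edb, ["cloud"], r_old=5, r_new=9)
```

Only the head of each chain depends on the token; later entries are addressed by the random links $rn_i$ and stay where they are.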
$Dec(res, w, H_2) \rightarrow tal$: The DU runs this algorithm to decrypt the encrypted data received from the cloud server one by one. After obtaining the file state $op$, the DU checks whether it is the state $add$, and if it is, puts the file into the collection $tal$. Finally, the algorithm outputs $tal$.
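A minimal sketch of this decryption and filtering step follows. The encoding is an assumption made for illustration: SHA-256 truncated to 16 bytes stands in for $H_2$, and each record is 15 padded bytes of file address followed by one state byte (1 for add, 0 for del). The file names are invented.

```python
import hashlib

REC = 16  # bytes per record ind || op (assumption)


def H2(w: str, j: int) -> bytes:
    return hashlib.sha256(b"H2|" + w.encode() + b"|" + str(j).encode()).digest()[:REC]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encode(ind: str, op: str) -> bytes:
    """ind || op as 15 padded address bytes plus 1 state byte."""
    return ind.encode().ljust(REC - 1, b"\0") + (b"\x01" if op == "add" else b"\x00")


def dec(res: list[bytes], w: str) -> list[str]:
    """Unmask each IE^j with H2(w || j) and keep only files whose state is add."""
    tal = []
    for j, ie in enumerate(res, start=1):
        plain = xor(ie, H2(w, j))
        ind, op = plain[: REC - 1].rstrip(b"\0").decode(), plain[REC - 1]
        if op == 1:
            tal.append(ind)
    return tal


files = [("doc-1", "add"), ("doc-2", "del"), ("doc-3", "add")]
res = [xor(H2("cloud", j), encode(ind, op)) for j, (ind, op) in enumerate(files, start=1)]
tal = dec(res, "cloud")
```

Entries marked del are still returned by the server but are dropped here, on the data user's side.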

6. Security Analysis

6.1. Forward–Backward Privacy

First, forward security means that an update does not reveal any information about the updated keywords. Since the hash function is one-way, the server cannot decrypt the stored identifier unless the client can generate a previous search token. At the same time, every time the data owner updates, the updated keyword search token is updated, so even if the previous search token is leaked, it will not affect future security. Therefore, the scheme in this paper realizes forward privacy.
Backward security ensures that search queries do not show indexes that were previously added but later removed. In this scenario, the file and its file state o p are encrypted. Because the search results are still in ciphertext, even if it is stored in a curious server, an attacker cannot learn useful information about the index without knowing exactly what the keyword is. Thus, we support backward privacy.

6.2. Adaptive Security

To improve efficiency, most existing solutions leak some information to the cloud server. The confidentiality of a searchable encryption scheme therefore means that no more information is leaked than is allowed. To demonstrate confidentiality, we follow a real/ideal simulation paradigm similar to that of [23].
Let $\Pi = (Setup, KeyGen, Initial, Trapdoor, Search, Update, UpdateST, Dec)$ be the scheme of this article, $\mathcal{S}$ be the simulator, and $\mathcal{A}$ be the adversary. We define the following two games:
$Real_{\mathcal{A}}^{\Pi}(\lambda)$: The game runs the algorithms $Setup(\lambda)$ and $KeyGen(para)$, publishes $para$ and $pk_{du}$, and keeps $sk_{du}$. The adversary then selects a database DB and issues various queries against it, including update queries, search queries, and decryption queries. The game answers these queries by executing the corresponding algorithms or protocols $Update$, $Search$, and $Dec$, respectively. Finally, $\mathcal{A}$ outputs a bit $b \in \{0, 1\}$.
$Ideal_{\mathcal{A}}^{\Pi}(\lambda)$: In the ideal world, the adversary $\mathcal{A}$ selects a security parameter, and the simulator uses the leakage functions $\mathcal{L}^{Setup}$ and $\mathcal{L}^{KeyGen}$ to generate the system parameters and return them to $\mathcal{A}$. The adversary then selects a database DB and issues various queries against it, including update queries, search queries, and decryption queries. The experiment returns the answers to these queries by calling the leakage function $\mathcal{L} = (\mathcal{L}^{Setup}, \mathcal{L}^{KeyGen}, \mathcal{L}^{Initial}, \mathcal{L}^{Trapdoor}, \mathcal{L}^{Search}, \mathcal{L}^{Update}, \mathcal{L}^{Dec})$. Finally, $\mathcal{A}$ outputs a bit $b \in \{0, 1\}$.
Theorem 1.
Let $H$ be a cryptographic hash function. The scheme is $\mathcal{L}$-adaptively secure in the random oracle model, where the set of leakage functions $\mathcal{L}$ is defined as follows:
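$$\mathcal{L} = (\mathcal{L}^{Setup}, \mathcal{L}^{KeyGen}, \mathcal{L}^{Initial}, \mathcal{L}^{Trapdoor}, \mathcal{L}^{Search}, \mathcal{L}^{Update}, \mathcal{L}^{Dec}),$$
where $\mathcal{L}^{Setup} =$ , $\mathcal{L}^{KeyGen} =$ , $\mathcal{L}^{Initial} =$ , $\mathcal{L}^{Trapdoor} =$ , $\mathcal{L}^{Dec} =$ .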
Proof. 
Our proof uses a hybrid argument consisting of a series of games. The first game is exactly the same as the game in the real world, while the last game is exactly the same as the game in the ideal world.
G0: This game is the real world SSE security game Real. So, we can obtain:
$$\Pr[Real_{\mathcal{A}}^{EDAEFB}(\lambda) = 1] = \Pr[Game_0 = 1].$$
G1: In this game, we randomly select an identifier $ID_i$ to replace the user's original public key $pk_{du_i} = H_0(id_i)$. It is easy to see that G1 and G0 are indistinguishable:
$$\Pr[Game_0 = 1] = \Pr[Game_1 = 1].$$
G2: In this game, we create a table $TOKEN$ to store search tokens. Each search token is replaced by a random number. Whenever a keyword search token is needed, we use the entry in the table $TOKEN$ instead of the real token. For updates, we randomly select a string in $\{0, 1\}^{\lambda}$ to act as the token. Here we have:
$$|\Pr[Game_2 = 1] - \Pr[Game_1 = 1]| \le Adv_{\mathcal{A}}^{hash}(\lambda).$$
G3: In this game, we create four tables $Ha_1, Ha_2, Ha_3, Ha_4$ to answer the random oracle queries. They record the values $h_i$, $i \in [1, 4]$, that need to be mapped to the ABF. Whenever these four values need to be calculated, they are taken at random from $\{0, 1\}^{\lambda}$ and put into the tables $Ha_1, Ha_2, Ha_3, Ha_4$. If the adversary can distinguish between game 2 and game 3, then the hash function can be distinguished from a truly random function, which is assumed to be infeasible. Thus, we have:
$$|\Pr[Game_3 = 1] - \Pr[Game_2 = 1]| \le Adv_{\mathcal{A}}^{hash}(\lambda).$$
G4: In this game, two tables, H1 and H2, are created to answer $\mathcal{A}$'s queries. H1 records the responses to $H_2(w || j)$ and H2 records the responses to $H_3()$. Here we only consider the leakage function of the update algorithm, so we can define $\mathcal{L}^{Update}(DOC) = \sum_{w \in W} |EDB(w)|$, which only leaks the number of keyword/document pairs. In this game, we generate the search token $st_w$ in the update algorithm as a random string instead of the search token generated by the algorithm. In addition, the value $H_1(st_w, w_i)$ and the $H_2$ value computed from $st_w$ during token generation are also replaced by random strings. If the adversary can distinguish between games 3 and 4, we can distinguish the hash function from a truly random function. Then, we have:
$$|\Pr[Game_4 = 1] - \Pr[Game_3 = 1]| \le Adv_{\mathcal{A}}^{hash}(\lambda).$$
G5: In this game, we maintain a table $UPDATE$ to generate the encrypted document. In the update protocol, game 5 uses random numbers instead of the encrypted document $IE$. It can be seen that games 4 and 5 are the same:
$$\Pr[Game_4 = 1] = \Pr[Game_5 = 1].$$
G6: The simulator $\mathcal{S}$ simulates the adversary's view using a leakage function $\mathcal{L}$ that includes the search pattern and the add history. From the adversary's point of view, G5 and G6 are exactly the same. Thus, they are indistinguishable:
$$\Pr[Game_6 = 1] = \Pr[Game_5 = 1] = \Pr[Ideal_{\mathcal{A}}^{\Pi}(\lambda) = 1].$$
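Conclusion: Summing the differences between consecutive games from G0 to G6, of which only G2, G3, and G4 contribute a non-zero term, we have:
$$|\Pr[Real_{\mathcal{A}}^{\Pi}(\lambda) = 1] - \Pr[Ideal_{\mathcal{A}}^{\Pi}(\lambda) = 1]| \le 3 \cdot Adv_{\mathcal{A}}^{hash}(\lambda).$$
Since the hash function is one-way and modeled as a random oracle, $Adv_{\mathcal{A}}^{hash}(\lambda)$ is negligible, so this scheme is an $\mathcal{L}$-adaptively-secure searchable encryption scheme.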
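The summation over the games can be written out explicitly. The following sketch only restates the per-game bounds given above and applies the triangle inequality; the constant 3 is what those bounds yield when added and is our reading of the argument rather than a figure quoted from it.

```latex
\begin{align*}
\left|\Pr[Real_{\mathcal{A}}^{\Pi}(\lambda)=1]-\Pr[Ideal_{\mathcal{A}}^{\Pi}(\lambda)=1]\right|
  &= \left|\Pr[Game_0=1]-\Pr[Game_6=1]\right| \\
  &\le \sum_{i=1}^{6}\left|\Pr[Game_i=1]-\Pr[Game_{i-1}=1]\right| \\
  &\le 0 + Adv_{\mathcal{A}}^{hash}(\lambda) + Adv_{\mathcal{A}}^{hash}(\lambda)
       + Adv_{\mathcal{A}}^{hash}(\lambda) + 0 + 0 \\
  &= 3\,Adv_{\mathcal{A}}^{hash}(\lambda).
\end{align*}
```

The zero terms correspond to the transitions G0 to G1, G4 to G5, and G5 to G6, which are stated as equalities.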

7. Performance Analysis

This section analyzes our scheme through a functional comparison and experiments, comparing it with several published schemes.
We implemented our algorithms with Python cryptographic libraries on a machine with 16 GB of RAM and an Intel Core i7-9700 (8-core, 3.6 GHz) running Windows 10. The experiments use the Enron email dataset, focus on the update and search algorithms, and compare the time spent processing all keyword/file pairs. The security parameter was set to λ = 128, and MD5 was used to implement the hash function. The scheme in this paper is compared with the schemes of Chen1 [24], Liu [25], and Chen2 [20].

7.1. Functional Comparison

The functional comparison is shown in Table 1. The scheme of Chen1 et al. [24] satisfies forward security. However, it uses symmetric encryption and does not support multiple users. The scheme of Liu et al. [25] satisfies forward security but not backward security, does not support multiple users, and uses symmetric key encryption. The scheme proposed by Chen2 et al. [20] satisfies both forward and backward security, supports multiple users, and uses asymmetric key encryption. However, multi-user support in that scheme costs additional time and is not part of its main construction; the authors only mention it in the article. As can be seen from the table, the scheme in this paper is among the most complete in terms of functionality.

7.2. Time Consumption for Different Numbers of ABF Hash Functions

To improve search efficiency, this paper uses the ABF to search keywords. In the ABF, the main factor affecting efficiency is the number of hash functions. Since Bloom filters have a non-zero false positive rate, there are generally two ways to reduce it: increase the number of hash functions or increase the size of the storage array. Since the ABF is stored in the cloud, the error rate can be reduced by increasing the array size, so the number of hash functions can be chosen purely for the most efficient update time. As shown in Figure 6 and Figure 7, the experiment compares the time spent on additions and deletions when the number of hash functions is 4, 6, 8, and 10. The time required is lowest when four hash functions are used, and it increases slowly as the number of keywords increases.
Based on this result, the number of hash functions used by the ABF in this paper's scheme is set to four. Formula (3) can then be used to calculate the size of the bit array, which gives m = 47,925,315 bits so that the false positive rate of the scheme is $10^{-6}$.
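The trade-off between array size, number of hash functions, and false positive rate can be explored with a short sketch. It uses the standard Bloom filter approximation $p \approx (1 - e^{-kn/m})^k$, which we assume is equivalent to Formula (3); the function names and the value of `n` in the usage note are hypothetical, since the number of inserted keywords is not restated here.

```python
import math


def false_positive_rate(m: int, k: int, n: int) -> float:
    """Standard approximation p = (1 - exp(-k*n/m))**k for a Bloom filter
    with m bits, k hash functions, and n inserted items."""
    return (1.0 - math.exp(-k * n / m)) ** k


def bits_needed(n: int, k: int, p: float) -> int:
    """Invert the approximation: smallest m (in bits) reaching rate p
    for a fixed number of hash functions k."""
    return math.ceil(-k * n / math.log(1.0 - p ** (1.0 / k)))
```

For example, `bits_needed(100_000, 4, 1e-6)` returns the array size for a hypothetical 100,000 keywords with four hash functions, and `false_positive_rate` can be used to confirm that the resulting size meets the target rate.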

7.3. Time Cost of Search Algorithm

When the user initiates a query, a token for the corresponding keyword is generated and sent to the server. Figure 8 shows the relationship between the number of keyword/document pairs and the search time when the server performs a search. The experiment compares our scheme with the three schemes Chen1 [24], Liu [25], and Chen2 [20]. To obtain a fairer result, the search token generation time of each scheme is included in the comparison, which is equivalent to measuring the whole process by which the data user obtains the encrypted files.
As Figure 8 shows, the search time of the scheme in this article is lower than that of the other schemes, whether the keyword is present or not. If the keyword does not exist, the server simply returns and notifies the user. In that case, the search time is between 0.03 ms and 0.05 ms, which greatly improves the search rate and reduces the time users wait for feedback. The other schemes, by contrast, must perform a chain search over all keywords before giving feedback.
For searches of non-existent keywords, the other three schemes all use the same chain storage mode, so they are combined into one curve for comparison (shown in Figure 9). Once the number of keywords exceeds 100,000, the time of the chain search method gradually increases. The time consumed by the search method in this paper, however, is basically stable and does not grow with the total number of keywords searched. It fluctuates only within a small range, so the feedback time for users is greatly shortened.
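The reason a negative answer stays roughly constant in time can be sketched with a plain Bloom filter used as a stand-in for the ABF. The class `TinyBloom`, the function `search`, and the dictionary `index` are hypothetical; MD5 is used only because the experiments mention it, and the authenticator part of the ABF is not modeled.

```python
import hashlib
from typing import Dict, Iterator, List


class TinyBloom:
    """Plain Bloom filter (stand-in for the ABF) with k MD5-based hashes."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.bits = bytearray(m)  # one byte per slot, for readability

    def _positions(self, item: str) -> Iterator[int]:
        for i in range(self.k):
            digest = hashlib.md5(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest, "big") % self.m

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item: str) -> bool:
        return all(self.bits[pos] for pos in self._positions(item))


def search(keyword: str, bloom: TinyBloom, index: Dict[str, List[str]]) -> List[str]:
    """Check the filter first; only touch the stored index on a positive."""
    if not bloom.might_contain(keyword):
        return []  # k hash evaluations, independent of the index size
    return index.get(keyword, [])
```

A keyword that was never added is rejected after at most `k` hash evaluations, whereas a chain-based lookup has to walk stored entries before it can report a miss.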

7.4. Time Cost of Update Algorithm

For the update algorithm, the scheme in this paper is compared with the schemes of Chen1 [24], Liu [25], and Chen2 [20]. Since these schemes perform both additions and deletions with the same algorithm, they are shown together in Figure 10.
As can be seen from Figure 10, the update of the scheme of Chen2 et al. [20] takes more time. In that scheme, for each keyword that needs to be updated, the corresponding file must be re-encrypted. That is, the data owner disregards the previously encrypted file set, re-encrypts the new file set, and sends it directly to the database. The time is consumed because some computations used in the encryption process, such as the pseudo-random function F and the bilinear pairing e, are more expensive than hashing operations. In addition, the update of Chen2 et al. [20] does not check whether a newly added file has been added before, which leads to repeated search results, nor whether a file has been deleted while an old version remains added, which results in deleted files being returned to the data user.
The scheme of Liu et al. [25] also does not take previously added files into account during an update. Although its update time is shorter at first, it grows the fastest as the number of files increases, eventually exceeding that of the Chen2 scheme.
Among the compared schemes, the update time of Chen1 et al. [24] is second only to that of the scheme proposed in this paper. However, it shares the problem of the previous two schemes: existing files and their states are not considered. This not only consumes unnecessary storage space but also affects subsequent search results. Overall, the scheme in this paper is superior to the other three schemes in terms of both soundness and time.
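The idea of taking file history into account during an update can be sketched as a single stored state per keyword/file pair. This is a simplified illustration, not the update protocol of the scheme: the plain dictionary `state` and the function `apply_update` are hypothetical, and encryption is omitted.

```python
from typing import Dict, Tuple

State = Dict[Tuple[str, str], str]  # (keyword, file_id) -> latest op


def apply_update(state: State, keyword: str, file_id: str, op: str) -> bool:
    """Record the latest state of a file under a keyword.

    Returns True if the stored state changed, False if the update repeats
    the state that is already stored (nothing new needs to be written).
    """
    if op not in ("add", "del"):
        raise ValueError("op must be 'add' or 'del'")
    key = (keyword, file_id)
    if state.get(key) == op:
        return False  # same file, same state: skip the duplicate
    state[key] = op  # overwrite, so only one state per file is kept
    return True
```

Because the entry is overwritten rather than appended, a file that was added and later deleted has exactly one stored state, so a search cannot return both versions.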

8. Conclusions

In this paper, we design a new dynamic searchable encryption scheme that satisfies forward and backward security. A new ABF based on the original Bloom filter is proposed to reduce the false positive rate. At the same time, new key encryption and file encryption schemes are designed. The solution supports forward and backward security, multiple users, and dynamic updates. Compared with other existing schemes that provide forward and backward security, the scheme in this paper greatly reduces search time, especially when the keyword does not exist, in which case the keyword search time stays between 0.03 ms and 0.05 ms. The scheme also takes the history of file storage into account, which avoids multiple storage states for the same file during a search.

Author Contributions

Supervision, D.L.; Writing—original draft, Z.J.; Review, X.Z. and Z.C. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by Open Foundation of Shanghai Key Laboratory of Integrated Administration Technologies for Information Security (Grant No. AKG2019005), and Scientific and Technological Innovation 2030—“New Generation Artificial Intelligence” Major Project (Grant No. 2020AAA0109300), and Shanghai Local Colleges and Universities Science and Technology Innovation Capacity-Building Project (Grant No. 23010501800).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available in the article.

Acknowledgments

Throughout the writing of this dissertation I have received a great deal of support and assistance. I would also like to thank my tutor, Dongmei Li, for her valuable guidance throughout my studies. You provided me with the tools that I needed to choose the right direction and successfully complete my dissertation. I am also extremely grateful to all my friends and classmates who have kindly provided me assistance and companionship in the course of preparing this paper. In addition, I would like to thank Shanghai University of Engineering Science for providing me with a good learning environment.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Chai, Q.; Gong, G. Verifiable symmetric searchable encryption for semi-honest-but-curious cloud servers. In Proceedings of the 2012 IEEE International Conference on Communications (ICC), Ottawa, ON, Canada, 10–15 June 2012; IEEE: Piscataway, NJ, USA, 2012; pp. 917–922.
  2. Paverd, A.; Martin, A.; Brown, I. Modelling and automatically analysing privacy properties for honest-but-curious adversaries. Tech. Rep. 2014, 1–14.
  3. Song, D.X.; Wagner, D.; Perrig, A. Practical techniques for searches on encrypted data. In Proceedings of the Proceeding 2000 IEEE Symposium on Security and Privacy. S&P 2000, Berkeley, CA, USA, 14–17 May 2000; IEEE: Piscataway, NJ, USA, 2000; pp. 44–55.
  4. Curtmola, R.; Garay, J.; Kamara, S.; Ostrovsky, R. Searchable symmetric encryption: Improved definitions and efficient constructions. In Proceedings of the 13th ACM Conference on Computer and Communications Security, Alexandria, VA, USA, 3 November 2006; pp. 79–88.
  5. Chase, M.; Kamara, S. Structured encryption and controlled disclosure. In Advances in Cryptology-ASIACRYPT 2010: Proceedings of the 16th International Conference on the Theory and Application of Cryptology and Information Security, Singapore, 5–9 December 2010; Proceedings 16; Springer: Berlin/Heidelberg, Germany, 2010; pp. 577–594.
  6. Cash, D.; Jarecki, S.; Jutla, C.; Krawczyk, H.; Roşu, M.C.; Steiner, M. Highly-scalable searchable symmetric encryption with support for boolean queries. In Advances in Cryptology–CRYPTO 2013: Proceedings of the 33rd Annual Cryptology Conference, Santa Barbara, CA, USA, 18–22 August 2013; Proceedings, Part I; Springer: Berlin/Heidelberg, Germany, 2013; pp. 353–373.
  7. Cash, D.; Tessaro, S. The locality of searchable symmetric encryption. In Advances in Cryptology–EUROCRYPT 2014: Proceedings of the 33rd Annual International Conference on the Theory and Applications of Cryptographic Techniques, Copenhagen, Denmark, 11–15 May 2014; Proceedings 33; Springer: Berlin/Heidelberg, Germany, 2014; pp. 351–368.
  8. Asharov, G.; Naor, M.; Segev, G.; Shahaf, I. Searchable symmetric encryption: Optimal locality in linear space via two-dimensional balanced allocations. In Proceedings of the Forty-Eighth Annual ACM Symposium on Theory of Computing, Cambridge, MA, USA, 19–21 June 2016; pp. 1101–1114.
  9. Zhang, B.; Zhang, F. An efficient public key encryption with conjunctive-subset keywords search. J. Netw. Comput. Appl. 2011, 34, 262–267.
  10. Bost, R. Σoφoς: Forward secure searchable encryption. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, Vienna, Austria, 24–28 October 2016; pp. 1143–1154.
  11. Bost, R.; Minaud, B.; Ohrimenko, O. Forward and backward private searchable encryption from constrained cryptographic primitives. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, Dallas, TX, USA, 30 October–3 November 2017; pp. 1465–1482.
  12. He, K.; Chen, J.; Zhou, Q.; Du, R.; Xiang, Y. Secure dynamic searchable symmetric encryption with constant client storage cost. IEEE Trans. Inf. Forensics Secur. 2020, 16, 1538–1549.
  13. Boneh, D.; Di Crescenzo, G.; Ostrovsky, R.; Persiano, G. Public key encryption with keyword search. In Advances in Cryptology-EUROCRYPT 2004: Proceedings of the International Conference on the Theory and Applications of Cryptographic Techniques, Interlaken, Switzerland, 2–6 May 2004; Proceedings 23; Springer: Berlin/Heidelberg, Germany, 2004; pp. 506–522.
  14. Baek, J.; Safavi-Naini, R.; Susilo, W. Public key encryption with keyword search revisited. In Computational Science and Its Applications–ICCSA 2008: Proceedings of the International Conference, Perugia, Italy, 30 June–3 July 2008; Proceedings, Part I 8; Springer: Berlin/Heidelberg, Germany, 2008; pp. 1249–1259.
  15. Tang, Q.; Chen, L. Public-key encryption with registered keyword search. In Public Key Infrastructures, Services and Applications; Springer: Berlin/Heidelberg, Germany, 2009; pp. 163–178.
  16. Park, D.J.; Kim, K.; Lee, P.J. Public key encryption with conjunctive field keyword search. In Information Security Applications; Springer: Berlin/Heidelberg, Germany, 2004; pp. 73–86.
  17. Guo, J.; Han, L.; Yang, G.; Liu, X.; Tian, C. An improved secure designated server public key searchable encryption scheme with multi-ciphertext indistinguishability. J. Cloud Comput. 2022, 11, 14.
  18. Li, H.; Huang, Q.; Shen, J.; Yang, G.; Susilo, W. Designated-server identity-based authenticated encryption with keyword search for encrypted emails. Inf. Sci. 2019, 481, 330–343.
  19. Wang, X.; Ma, J.; Liu, X.; Miao, Y.; Liu, Y.; Deng, R.H. Forward/backward and content private dsse for spatial keyword queries. IEEE Trans. Dependable Secur. Comput. 2022, 20, 3358–3370.
  20. Chen, B.; Wu, L.; Wang, H.; Zhou, L.; He, D. A blockchain-based searchable public-key encryption with forward and backward privacy for cloud-assisted vehicular social networks. IEEE Trans. Veh. Technol. 2019, 69, 5813–5825.
  21. Stefanov, E.; Papamanthou, C.; Shi, E. Practical dynamic searchable encryption with small leakage. Cryptol. ePrint Arch. 2013. Available online: https://eprint.iacr.org/2013/832 (accessed on 28 February 2024).
  22. Bloom, B.H. Space/time trade-offs in hash coding with allowable errors. Commun. ACM 1970, 13, 422–426.
  23. Song, X.; Dong, C.; Yuan, D.; Xu, Q.; Zhao, M. Forward private searchable symmetric encryption with optimized I/O efficiency. IEEE Trans. Dependable Secur. Comput. 2018, 17, 912–927.
  24. Chen, B.; Xiang, T.; He, D.; Li, H.; Choo, K.K.R. BPVSE: Publicly Verifiable Searchable Encryption for Cloud-Assisted Electronic Health Records. IEEE Trans. Inf. Forensics Secur. 2023, 18, 3171–3184.
  25. Liu, Y.; Yu, J.; Yang, M.; Hou, W.; Wang, H. Towards fully verifiable forward secure privacy preserving keyword search for IoT outsourced data. Future Gener. Comput. Syst. 2022, 128, 178–191.
Figure 1. System model.
Figure 2. Authenticator Bloom Filter.
Figure 3. Authenticator Bloom Filter (addition). (The red font is the changed data).
Figure 4. Authenticator Bloom Filter (deletion). (The red font is the changed data).
Figure 5. Encrypted file storage structure.
Figure 6. Impact of different hashing times on search time (add).
Figure 7. Impact of different hashing times on search time (del).
Figure 8. Time cost of search algorithm [20,24,25].
Figure 9. Time spent searching for a keyword when the keyword does not exist.
Figure 10. Time cost of update algorithm [20,24,25].
Table 1. Functional comparison.

| Scheme | FP | BP | Multi-User | Cryptosystem |
| --- | --- | --- | --- | --- |
| Chen1 [24] | ✓ | ✓ | × | symmetric |
| Liu [25] | ✓ | × | × | symmetric |
| Chen2 [20] | ✓ | ✓ | ✓ | asymmetric |
| Ours | ✓ | ✓ | ✓ | asymmetric |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Jin, Z.; Li, D.; Zhang, X.; Cai, Z. Research on Dynamic Searchable Encryption Method Based on Bloom Filter. Appl. Sci. 2024, 14, 3379. https://doi.org/10.3390/app14083379
