Fast, Searchable, Symmetric Encryption Scheme Supporting Ranked Search
Abstract
1. Introduction
- (1) A clustering algorithm is adopted to partition the document set into multiple document clusters, and a secure index tree is constructed on each cluster. Because each tree indexes only one cluster, its height is smaller than that of a single tree over the whole dataset, which reduces the time complexity of index tree retrieval.
- (2) We optimize the search method to reduce the number of index trees that must be retrieved, which further improves the query efficiency of our scheme without sacrificing much query accuracy.
- (3) By utilizing the ASPE scheme [10] to encrypt the index and the query, we propose a fast SSE scheme supporting ranked search (F-SSE-RS). Moreover, we design a dynamic update method so that the index in our scheme supports secure document update operations.
2. Problem Formulation
2.1. System Model
- (1) Data owner (DO): DO owns a large number of sensitive documents, which form the document set F. DO utilizes a symmetric-key encryption scheme, e.g., AES, to encrypt F and adopts the F-SSE-RS scheme to build the secure searchable index. After these operations, DO uploads the encrypted documents and the secure index to CS. Finally, DO delivers the secret key to the data users who have been granted access to the data.
- (2) Data user (DU): An authorized DU can issue a secure query over the encrypted data. Given a query Q, DU creates a trapdoor from the secret key and Q and sends it to CS. When DU receives the search results from CS, DU uses the secret key to decrypt the encrypted contents.
- (3) Cloud server (CS): CS stores DO’s encrypted index and documents. Once CS obtains the trapdoor from DU, it executes the query over the index and returns the encrypted documents most relevant to Q. In addition, CS runs update operations over the encrypted index after obtaining update information from DO.
2.2. Threat Model
- Known ciphertext model: In this model, CS knows only the encrypted documents and the secure index, which are stored on the server.
- Known background model: In this model, CS can access more information than in the known ciphertext model. This information includes the relationship between a trapdoor and the dataset, as well as statistics related to the dataset. For example, CS might exploit knowledge of the dataset’s term frequency (TF)-inverse document frequency (IDF) to perform a statistical attack.
2.3. Design Goals
- (1) Multi-keyword ranked search: Each document in F is associated with a keyword set, and a multi-keyword query, Q, is itself a set of keywords. The search result of the F-SSE-RS scheme is ranked, which means that F-SSE-RS returns the documents whose keyword sets are most relevant to the query, Q. Furthermore, the F-SSE-RS scheme efficiently supports dynamic operations, such as document insertion and deletion.
- (2) Efficiency: The F-SSE-RS scheme achieves sublinear search efficiency. Furthermore, the time cost of keyword search is substantially lower than that of existing similar schemes.
- (3) Privacy preserving: The F-SSE-RS scheme, like some previous schemes, prevents CS from deducing private information from ciphertexts, secure indexes, and trapdoors. The privacy requirements that our scheme focuses on are listed as follows.
- Document and index privacy: Document privacy is usually protected by traditional symmetric-key encryption schemes, e.g., AES, DES, and six-face cubical key encryption [27]. For index privacy, the F-SSE-RS scheme prevents CS from learning what is hidden in the index.
- Trapdoor unlinkability: The trapdoor generation algorithm needs to be probabilistic rather than deterministic, which means that the same keyword query generates a different trapdoor each time it is issued.
- Keyword privacy: Although the trapdoor can be protected using cryptographic techniques, search results can be used to infer query keywords. Thus, our scheme needs to prevent CS from learning query keywords from trapdoors by exploiting search results and document statistics.
3. Methods for Index Building and Searching
3.1. Keyword Conversion Method
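- (1) The method extracts the keywords in the dataset and builds a dictionary of the N unique keywords.
- (2) For each document and its associated keyword set, the method first creates an N-dimensional vector for the document. Based on Equation (1), the method then sets the entry of this vector that corresponds to a dictionary keyword to the TF value of that keyword whenever the keyword belongs to the document’s keyword set. In Equation (1), the TF value is computed from the number of occurrences of the keyword in the document.
- (3) For a query, Q, the method first constructs an N-dimensional vector for Q. After this, based on Equation (2), the method sets the entry of this vector that corresponds to a dictionary keyword to the IDF value of that keyword whenever the keyword appears in Q. In Equation (2), the IDF value is computed from the number of documents in the dataset that contain the keyword.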
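The three conversion steps can be sketched in Python as follows. This is only an illustration: the weighting formulas (term counts normalized to unit length for TF, and ln(1 + d/df) for IDF) are assumptions standing in for Equations (1) and (2), and all function names are hypothetical.

```python
import math
from collections import Counter

def build_dictionary(docs):
    """Step (1): collect the sorted list of unique keywords in the dataset."""
    return sorted({w for doc in docs for w in doc})

def tf_vector(doc, dictionary):
    """Step (2): N-dimensional TF vector (assumed: unit-normalized term counts)."""
    counts = Counter(doc)
    raw = [float(counts[w]) for w in dictionary]
    norm = math.sqrt(sum(x * x for x in raw))
    return [x / norm for x in raw] if norm else raw

def idf_vector(query, dictionary, docs):
    """Step (3): N-dimensional IDF vector (assumed: ln(1 + d / df))."""
    d = len(docs)
    vec = []
    for w in dictionary:
        df = sum(1 for doc in docs if w in doc)  # documents containing w
        vec.append(math.log(1 + d / df) if w in query and df else 0.0)
    return vec

def relevance(doc_vec, query_vec):
    """Relevance score as the inner product of a TF and an IDF vector."""
    return sum(a * b for a, b in zip(doc_vec, query_vec))

docs = [["cloud", "search", "search"], ["cloud", "index"], ["tree", "index", "index"]]
dictionary = build_dictionary(docs)
tf = [tf_vector(doc, dictionary) for doc in docs]
q = idf_vector({"search"}, dictionary, docs)
ranked = sorted(range(len(docs)), key=lambda i: relevance(tf[i], q), reverse=True)
```

Here `ranked` lists document positions from most to least relevant; only the first document contains the queried keyword, so it is ranked first.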
3.2. Approach for Index Building
3.2.1. Dataset Division Method
Algorithm 1 Dataset division method.
Input: A vector set for the dataset F, and the number of document clusters, k, that users want to produce.
Output: k document clusters.
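As a rough illustration of the interface of Algorithm 1 (a vector set and k in, k clusters out), the sketch below uses plain k-means with squared Euclidean distance. The choice of k-means, the fixed iteration count, and the function name are assumptions; the clustering algorithm actually adopted by the scheme may differ.

```python
import random

def divide_dataset(vectors, k, iterations=20, seed=0):
    """Partition document vectors into k clusters (assumed: plain k-means).

    Returns a list of k clusters, each a list of document positions.
    """
    rng = random.Random(seed)
    centers = [list(v) for v in rng.sample(vectors, k)]
    assignment = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: attach every vector to its nearest center.
        for i, v in enumerate(vectors):
            assignment[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centers[c])),
            )
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [vectors[i] for i, a in enumerate(assignment) if a == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return [[i for i, a in enumerate(assignment) if a == c] for c in range(k)]

vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
clusters = divide_dataset(vectors, 2)
```

Each resulting cluster then receives its own index tree, so a tree covers only a fraction of the d documents.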
3.2.2. Method for Building the Plaintext Index
Algorithm 2 The algorithm for building the index tree for a cluster, declared as BuildTree().
Input: The leaf node set of the cluster.
Output: The plaintext index tree for the cluster.
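A minimal sketch of a tree builder with the same input and output as Algorithm 2 is given below. It assumes a bottom-up balanced binary tree in which each internal node stores the element-wise maximum of its children's vectors, as in the keyword balanced binary tree of [8]; the `Node` class and `build_tree` function are hypothetical names, not the authors' code.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Node:
    vector: List[float]
    doc_id: Optional[int] = None      # set only for leaf nodes
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def build_tree(leaves):
    """Build a binary index tree bottom-up from the leaf node set of a cluster."""
    if not leaves:
        raise ValueError("a cluster needs at least one leaf node")
    level = list(leaves)
    while len(level) > 1:
        parents = []
        for i in range(0, len(level) - 1, 2):
            l, r = level[i], level[i + 1]
            # Assumed rule: the parent keeps the larger value in every dimension.
            merged = [max(a, b) for a, b in zip(l.vector, r.vector)]
            parents.append(Node(merged, left=l, right=r))
        if len(level) % 2 == 1:
            parents.append(level[-1])  # an unpaired node moves up unchanged
        level = parents
    return level[0]

leaves = [Node([0.2, 0.0], 0), Node([0.0, 0.7], 1), Node([0.5, 0.1], 2)]
root = build_tree(leaves)
```

Under this rule, the score of an internal node against a non-negative query vector is an upper bound on the score of every leaf below it, which is what makes pruning during search possible.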
3.3. Approach for Index Search
Algorithm 3 The algorithm for searching the index tree, declared as SearchIndexTree, which takes the query vector, a node u, and the result list.
Input: The IDF vector of the query Q, the root node u of an index tree, and an empty result list.
Output: The result list, containing the documents with the highest relevance scores.
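The following sketch shows one way a search with the interface of Algorithm 3 could work. It assumes a greedy depth-first traversal that skips any subtree whose node score cannot beat the current lowest score in a full result list; this pruning rule, the `top` parameter (number of documents to return), and all names are illustrative assumptions.

```python
import heapq
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Node:
    vector: List[float]
    doc_id: Optional[int] = None      # set only for leaf nodes
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def search_index_tree(query, u, results, top):
    """Collect the `top` best (score, doc_id) pairs below node u in `results`.

    `results` is kept as a min-heap, so results[0] is the weakest kept entry.
    """
    if u is None:
        return
    score = dot(u.vector, query)
    if u.doc_id is not None:                       # leaf node
        if len(results) < top:
            heapq.heappush(results, (score, u.doc_id))
        elif score > results[0][0]:
            heapq.heapreplace(results, (score, u.doc_id))
        return
    if len(results) == top and score <= results[0][0]:
        return                                     # prune this subtree
    children = [c for c in (u.left, u.right) if c is not None]
    # Visit the more promising child first so the threshold rises quickly.
    for child in sorted(children, key=lambda c: dot(c.vector, query), reverse=True):
        search_index_tree(query, child, results, top)

a, b = Node([0.9, 0.0], 0), Node([0.0, 0.8], 1)
c, d = Node([0.4, 0.4], 2), Node([0.1, 0.0], 3)
root = Node([0.9, 0.8], left=Node([0.9, 0.8], left=a, right=b),
            right=Node([0.4, 0.4], left=c, right=d))
results = []
search_index_tree([1.0, 0.0], root, results, top=2)
best_first = [doc for _, doc in sorted(results, reverse=True)]
```

The example tree is built by hand with internal vectors equal to the element-wise maximum of their children, which is the property the pruning step relies on.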
4. Proposed Scheme
4.1. Construction of F-SSE-RS
- KeyGen: Taking a security parameter as input, this algorithm chooses two random invertible matrices, each of dimension (N + L) × (N + L), and a vector, S, of dimension N + L, where L is the number of extended dimensions. It then sets the secret key to consist of the two matrices and S, and delivers the secret key to the authorized data users.
- IndexBuild: Given a document set F, this algorithm first partitions F into k document clusters using the dataset division method. For each cluster, the algorithm adopts Algorithm 2 to generate a plaintext index tree and then encrypts that tree. The encryption process starts from the root node, and the nodes are encrypted one by one in a sequential traversal. More precisely, for a node u, the algorithm extends the N-dimensional vector of u into an (N + L)-dimensional vector: the first N entries keep their original values, and each of the L additional entries is set to a random number. After the extension, the extended vector is split into two random vectors following the ASPE scheme [10], and these two vectors are encrypted with the secret key to form the encrypted node. After encrypting each node of an index tree, the algorithm obtains the corresponding encrypted index tree. Finally, after encrypting all the index trees, the algorithm outputs the encrypted index.
- TrapdoorGen: Given a query, Q, the algorithm first transforms Q into an N-dimensional IDF vector using the keyword conversion method given in Section 3.1. It then extends this vector into an (N + L)-dimensional vector: the first N entries keep their original values, and each of the L additional entries is set to 0 or 1 at random. After this, the algorithm splits the extended vector into two random vectors, again following the ASPE scheme [10], and encrypts them with the secret key. Finally, the algorithm outputs the result as the trapdoor for Q.
- Search: Given the trapdoor and the encrypted index, this algorithm first computes, for each of the k encrypted trees, the relevance score between the encrypted root node of the tree and the trapdoor. It then selects the t trees whose root nodes have the highest relevance scores and performs the traversal search only on these trees, each according to Algorithm 3. During the search, the algorithm computes the relevance score between an encrypted tree node and the trapdoor. According to Equation (4), the result computed between the encrypted node and the trapdoor is the same as that computed between the plaintext node u and Q. Therefore, the search algorithm can employ Algorithm 3 to perform the ranked search in the encrypted state. After finishing the query on an encrypted tree, a result set for that tree is obtained. Finally, the algorithm selects the documents with the highest scores from the t result sets and returns them to the user as the query results.
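The score-preserving property used by Search can be checked with a small ASPE-style sketch. The splitting convention below (index vectors are split where the bit of S is 1, query vectors where it is 0; index shares are multiplied by the transposed matrices and query shares by the inverse matrices) follows the common ASPE construction and is an assumption here, as are all function names and the toy parameters. Exact fractions are used so that equality can be tested without rounding error.

```python
import random
from fractions import Fraction

rng = random.Random(7)  # fixed seed so the sketch is reproducible

def inverse(m):
    """Gauss-Jordan inverse over exact fractions; raises ValueError if singular."""
    n = len(m)
    a = [list(row) + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(m)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            raise ValueError("singular matrix")
        a[col], a[pivot] = a[pivot], a[col]
        pv = a[col][col]
        a[col] = [x / pv for x in a[col]]
        for r in range(n):
            if r != col and a[r][col] != 0:
                f = a[r][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    return [row[n:] for row in a]

def random_invertible(n):
    """Draw random integer matrices until one is invertible."""
    while True:
        m = [[Fraction(rng.randint(-5, 5)) for _ in range(n)] for _ in range(n)]
        try:
            return m, inverse(m)
        except ValueError:
            pass

def mat_vec(m, v):
    return [sum(x * y for x, y in zip(row, v)) for row in m]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def split(v, s, split_bit):
    """Split v into two shares: random where s[j] == split_bit, copies elsewhere."""
    v1, v2 = [], []
    for x, bit in zip(v, s):
        if bit == split_bit:
            r = Fraction(rng.randint(-100, 100), 10)
            v1.append(r)
            v2.append(x - r)
        else:
            v1.append(x)
            v2.append(x)
    return v1, v2

def keygen(dim):
    m1, m1_inv = random_invertible(dim)
    m2, m2_inv = random_invertible(dim)
    s = [rng.randint(0, 1) for _ in range(dim)]
    return {"M1": m1, "M1_inv": m1_inv, "M2": m2, "M2_inv": m2_inv, "S": s}

def encrypt_node(ext_vec, key):
    p1, p2 = split(ext_vec, key["S"], split_bit=1)
    m1_t = [list(col) for col in zip(*key["M1"])]
    m2_t = [list(col) for col in zip(*key["M2"])]
    return mat_vec(m1_t, p1), mat_vec(m2_t, p2)

def trapdoor_gen(ext_vec, key):
    q1, q2 = split(ext_vec, key["S"], split_bit=0)
    return mat_vec(key["M1_inv"], q1), mat_vec(key["M2_inv"], q2)

def encrypted_score(node_ct, trapdoor):
    return dot(node_ct[0], trapdoor[0]) + dot(node_ct[1], trapdoor[1])

N, L = 3, 2                                   # toy sizes for illustration
key = keygen(N + L)
u = [Fraction(1, 2), Fraction(0), Fraction(3, 10)]
q = [Fraction(1), Fraction(0), Fraction(2)]
u_ext = u + [Fraction(rng.randint(-10, 10), 100) for _ in range(L)]  # random numbers
q_ext = q + [Fraction(rng.randint(0, 1)) for _ in range(L)]          # random bits
plain = dot(u_ext, q_ext)
cipher = encrypted_score(encrypt_node(u_ext, key), trapdoor_gen(q_ext, key))
```

In this sketch `cipher` equals `plain`, the inner product of the two extended vectors, so the server can rank encrypted nodes without seeing either plaintext vector.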
4.2. Dynamic Update Operations
- Deletion: When DO wants to delete a document f from the index, DO first determines which index tree contains f. Then, DO locates the position of the leaf node of f in that index tree. Finally, DO sends the position information to CS, which nullifies the node at that position to complete the deletion.
- Addition: When DO wants to add a document f to the index, DO first transforms f’s keywords into a TF vector using the keyword conversion method and constructs a leaf node for f with this TF vector. Subsequently, using the TF vector, DO finds the index tree whose root node is most semantically similar to f and locates a leaf node marked as invalid in that tree. Then, DO replaces this invalid node with the leaf node of f and updates the vectors of all internal nodes on the path from the root of the tree to this leaf node. Finally, DO encrypts all the changed nodes and sends them to CS together with their position information. When CS receives these nodes, it replaces the corresponding nodes based on the position information to complete the insertion. In addition, if there are no leaf nodes marked as invalid in the index tree, DO can add multiple invalid nodes to the index tree and update the tree. After that, DO encrypts the modified tree nodes and sends them to CS together with their position information. According to this information, CS updates the index tree to complete the document addition.
- Modification: If DO wants to modify a document, DO first locates the leaf node corresponding to that document and replaces the vector of the leaf node with the new vector. Then, DO updates all the nodes on the path from the root of the tree to that leaf node based on the modified leaf vector. Finally, DO encrypts all the changed nodes and sends them to CS together with their position information. When CS receives these nodes, it replaces the old nodes according to the position information to complete the update.
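The plaintext side of these update operations can be sketched as follows. The sketch assumes a complete binary tree stored in an array (children of position i at 2i + 1 and 2i + 2), internal vectors equal to the element-wise maximum of their children, and an all-zero vector as the marker of an invalid leaf; these choices and the function names are illustrative assumptions, and encryption of the changed nodes is omitted.

```python
def build(leaf_vectors):
    """Array-backed tree; len(leaf_vectors) must be a power of two."""
    n = len(leaf_vectors)
    tree = [None] * (n - 1) + [list(v) for v in leaf_vectors]
    for i in range(n - 2, -1, -1):
        tree[i] = [max(a, b) for a, b in zip(tree[2 * i + 1], tree[2 * i + 2])]
    return tree

def delete_leaf(tree, pos):
    """Deletion: mark the leaf at `pos` as invalid (assumed: zero vector)."""
    tree[pos] = [0.0] * len(tree[pos])

def update_leaf(tree, pos, new_vector):
    """Modification: replace a leaf and refresh every node on its root path.

    Returns the positions of the changed nodes, i.e. the nodes DO would
    re-encrypt and send to CS with their position information.
    """
    tree[pos] = list(new_vector)
    changed = [pos]
    while pos > 0:
        pos = (pos - 1) // 2
        tree[pos] = [max(a, b) for a, b in zip(tree[2 * pos + 1], tree[2 * pos + 2])]
        changed.append(pos)
    return changed

def add_document(tree, vector):
    """Addition: reuse the first invalid leaf, then update its root path."""
    first_leaf = len(tree) // 2
    for pos in range(first_leaf, len(tree)):
        if not any(tree[pos]):
            return update_leaf(tree, pos, vector)
    raise LookupError("no invalid leaf available; the tree must be extended first")

tree = build([[1.0, 0.0], [0.0, 2.0], [3.0, 0.0], [0.0, 0.0]])
delete_leaf(tree, 5)                      # remove the document stored at position 5
changed = add_document(tree, [0.0, 5.0])  # the freed slot is reused
```

Only the nodes on one root-to-leaf path change per update, so the number of nodes DO has to re-encrypt grows with the tree height rather than with the number of documents.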
4.3. Security Analysis
- Document and index privacy: In the F-SSE-RS scheme, the confidentiality of document contents is guaranteed by a traditional symmetric-key encryption scheme, such as AES. The index in the F-SSE-RS scheme is a combination of multiple index trees, and the content of each node in an index tree is protected using the ASPE scheme. Because AES and ASPE are considered secure under the known ciphertext model, an attacker cannot infer the plaintext contents hidden in the documents and indices. We therefore argue that the privacy of documents and indices is well protected.
- Trapdoor unlinkability: The trapdoor generation algorithm of the proposed scheme is probabilistic in two respects. (1) The vector of the query Q is extended before the trapdoor is generated, and even two identical queries can yield different extended vectors; (2) in the TrapdoorGen algorithm, the query vector is randomly split into two parts. From these two points, we conclude that two identical queries are encrypted into different trapdoors, so the proposed scheme satisfies the requirement of trapdoor unlinkability.
- Keyword privacy: Under the known ciphertext model, the attacker cannot infer keyword information from the index and the trapdoor, since the F-SSE-RS scheme utilizes the ASPE scheme to encrypt both. However, in the known background model, CS can use document–word frequencies to perform statistical attacks and then infer the keywords embedded in the index and the trapdoors. To resist the statistical attack in the known background model, our scheme extends the keyword vectors in the index and in the trapdoor with L additional dimensions. Specifically, for each extended dimension of an index vector, the scheme randomly selects a number, while for each extended dimension of a query vector, the scheme randomly selects either 0 or 1. In this way, the query results are masked by the randomness of the extended dimensions. Since the number of extended dimensions is L, the probability that the random terms added to two relevance scores have the same value is small. Therefore, as L increases, the query results are more strongly influenced by the random terms, so keyword privacy increases but search accuracy decreases. By adjusting L, we can thus trade off precision against privacy in practical applications. The analysis of this tradeoff can be found in [8].
5. Performance Evaluation
5.1. Efficiency of Index Building
5.2. Efficiency of Trapdoor Generation
5.3. Efficiency of Search
5.4. Accuracy
5.5. Discussion
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
SE | Searchable encryption. |
ASPE | Asymmetric scalar-product-preserving encryption. |
SSE | Searchable symmetric key encryption. |
SPE | Searchable public key encryption. |
TF-IDF | Term frequency-inverse document frequency. |
PEKS | Public key encryption with keyword search. |
DO | Data owner. |
DU | Data user. |
CS | Cloud server. |
References
- Song, D.; Wagner, D.; Perrig, A. Practical techniques for searching on encrypted data. In Proceedings of the IEEE Symposium on Research in Security and Privacy, Berkeley, CA, USA, 14–17 May 2000; pp. 44–55.
- Fu, Z.; Ren, K.; Shu, J.; Sun, X.; Huang, F. Enabling personalized search over encrypted outsourced data with efficiency improvement. IEEE Trans. Parallel Distrib. Syst. 2015, 27, 2546–2559.
- Sun, W.; Liu, X.; Lou, W.; Hou, Y.T.; Li, H. Catch you if you lie to me: Efficient verifiable conjunctive keyword search over large dynamic encrypted cloud data. In Proceedings of the 2015 IEEE Conference on Computer Communications (INFOCOM), Hong Kong, China, 26 April–1 May 2015; pp. 2110–2118.
- Boneh, D.; Di Crescenzo, G.; Ostrovsky, R.; Persiano, G. Public key encryption with keyword search. In Proceedings of the International Conference on the Theory and Applications of Cryptographic Techniques, Interlaken, Switzerland, 2–6 May 2004; pp. 506–522.
- Zhang, Y.; Li, Y.; Wang, Y. Secure and Efficient Searchable Public Key Encryption for Resource Constrained Environment Based on Pairings under Prime Order Group. Secur. Commun. Netw. 2019, 2019, 1–14.
- Miao, Y.; Tong, Q.; Deng, R.; Choo, K.K.R.; Liu, X.; Li, H. Verifiable searchable encryption framework against insider keyword-guessing attack in cloud storage. IEEE Trans. Cloud Comput. 2020, 1–14.
- Cao, N.; Wang, C.; Li, M.; Ren, K.; Lou, W. Privacy-preserving multi-keyword ranked search over encrypted cloud data. IEEE Trans. Parallel Distrib. Syst. 2013, 25, 222–233.
- Xia, Z.; Wang, X.; Sun, X.; Wang, Q. A Secure and Dynamic Multi-Keyword Ranked Search Scheme over Encrypted Cloud Data. IEEE Trans. Parallel Distrib. Syst. 2016, 27, 340–352.
- Guo, C.; Zhuang, R.; Chang, C.C.; Yuan, Q. Dynamic multi-keyword ranked search based on bloom filter over encrypted cloud data. IEEE Access 2019, 7, 35826–35837.
- Wong, W.K.; Cheung, D.W.; Kao, B.; Mamoulis, N. Secure kNN computation on encrypted databases. In Proceedings of the 2009 ACM SIGMOD International Conference on Management of Data, Providence, RI, USA, 29 June–2 July 2009; pp. 139–152.
- Goh, E.J. Secure indexes. IACR Cryptol. EPrint Arch. 2003, 2003, 216.
- Wang, B.; Li, M.; Wang, H. Geometric range search on encrypted spatial data. IEEE Trans. Inf. Forensics Secur. 2015, 11, 704–719.
- Xu, G.; Li, H.; Dai, Y.; Yang, K.; Lin, X. Enabling efficient and geometric range query with access control over encrypted spatial data. IEEE Trans. Inf. Forensics Secur. 2018, 14, 870–885.
- Fu, Z.; Wu, X.; Guan, C.; Sun, X.; Ren, K. Toward Efficient Multi-Keyword Fuzzy Search Over Encrypted Outsourced Data With Accuracy Improvement. IEEE Trans. Inf. Forensics Secur. 2017, 11, 2706–2716.
- Kuzu, M.; Islam, M.S.; Kantarcioglu, M. Efficient similarity search over encrypted data. In Proceedings of the 2012 IEEE 28th International Conference on Data Engineering, Arlington, VA, USA, 1–5 April 2012; pp. 1156–1167.
- Zhang, Y.; Li, Y.; Wang, Y. Efficient Searchable Symmetric Encryption Supporting Dynamic Multikeyword Ranked Search. Secur. Commun. Netw. 2020, 2020, 1–16.
- Wang, C.; Cao, N.; Ren, K.; Lou, W. Enabling Secure and Efficient Ranked Keyword Search over Outsourced Cloud Data. IEEE Trans. Parallel Distrib. Syst. 2012, 23, 1467–1479.
- Shao, J.; Lu, R.; Guan, Y.; Wei, G. Achieve Efficient and Verifiable Conjunctive and Fuzzy Queries over Encrypted Data in Cloud. IEEE Trans. Serv. Comput. 2020, 15, 124–137.
- Wang, X.; Ma, J.; Liu, X.; Deng, R.H.; Miao, Y.; Zhu, D.; Ma, Z. Search me in the dark: Privacy-preserving boolean range query over encrypted spatial data. In Proceedings of the IEEE INFOCOM 2020-IEEE Conference on Computer Communications, Toronto, ON, Canada, 6–9 July 2020; pp. 2253–2262.
- Guo, C.; Chen, X.; Jie, Y.; Fu, Z.; Li, M.; Feng, B. Dynamic multi-phrase ranked search over encrypted data with symmetric searchable encryption. IEEE Trans. Serv. Comput. 2020, 13, 1034–1044.
- Park, D.J.; Kim, K.; Lee, P.J. Public key encryption with conjunctive field keyword search. In Proceedings of the International Workshop on Information Security Applications, Jeju Island, Korea, 23–25 August 2004; pp. 73–86.
- Katz, J.; Sahai, A.; Waters, B. Predicate encryption supporting disjunctions, polynomial equations, and inner products. In Proceedings of the Annual International Conference on the Theory and Applications of Cryptographic Techniques, Istanbul, Turkey, 13–17 April 2008; pp. 146–162.
- Xu, P.; Tang, S.; Xu, P.; Wu, Q.; Hu, H.; Susilo, W. Practical multi-keyword and boolean search over encrypted e-mail in cloud server. IEEE Trans. Serv. Comput. 2019, 14, 1877–1889.
- Miao, Y.; Liu, X.; Choo, K.K.R.; Deng, R.H.; Li, J.; Li, H.; Ma, J. Privacy-preserving attribute-based keyword search in shared multi-owner setting. IEEE Trans. Dependable Secur. Comput. 2021, 18, 1080–1094.
- Xu, P.; He, S.; Wang, W.; Susilo, W.; Jin, H. Lightweight searchable public-key encryption for cloud-assisted wireless sensor networks. IEEE Trans. Ind. Inform. 2017, 14, 3712–3723.
- Zhang, Y.; Wang, Y.; Li, Y. Searchable Public Key Encryption Supporting Semantic Multi-Keywords Search. IEEE Access 2019, 7, 122078–122090.
- Dhandabani, R.; Periyasamy, S.S.; Padma, T.; Sangaiah, A.K. Six-face cubical key encryption and decryption based on product cipher using hybridisation and Rubik’s cubes. IET Netw. 2018, 7, 313–320.
- Cohen, W.W. Enron E-Mail Dataset. Available online: http://www.cs.cmu.edu/~./enron/ (accessed on 19 April 2022).
- Sangaiah, A.K.; Javadpour, A.; Ja’fari, F.; Pinto, P.; Ahmadi, H.; Zhang, W. CL-MLSP: The design of detection mechanism for sinkhole attacks in smart cities. Microprocess. Microsyst. 2022, 90, 104504.
- Zhang, J.; Liang, X.; Zhou, F.; Li, B.; Li, Y. TYLER, a fast method that accurately predicts cyclin-dependent proteins by using computation-based motifs and sequence-derived features. Math. Biosci. Eng. 2021, 18, 6410–6429.
F | A document set. |
d | The number of documents in F. |
 | The dictionary of a dataset. |
N | The number of keywords in the dictionary. |
 | The keyword set for a document. |
 | The number of keywords in the keyword set of a document. |
 | The j-th keyword in the keyword set of a document. |
 | The vector representation for a document. |
Q | A keyword query. |
 | A keyword in Q. |
 | The vector representation of the query Q. |
 | The trapdoor of Q. |
 | k document clusters divided from F. |
 | The document set in a cluster. |
 | The j-th document in a cluster. |
 | The vector representation of the j-th document in a cluster. |
k | The number of clusters for dataset clustering. |
 | The number of documents in each cluster. |
 | The index tree for a cluster. |
 | The root node of an index tree. |
u | A node in an index tree. |
 | The vector representation of the node u. |
 | The index for F. |
 | The encrypted index for F. |
 | The encrypted index tree for a cluster. |
t | The number of index trees that need to be searched. |
 | The number of documents that need to be returned. |
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
He, W.; Zhang, Y.; Li, Y. Fast, Searchable, Symmetric Encryption Scheme Supporting Ranked Search. Symmetry 2022, 14, 1029. https://doi.org/10.3390/sym14051029