Erratum published on 20 May 2019, see Sensors 2019, 19(10), 2327.
Article

Searchable Encryption Scheme for Personalized Privacy in IoT-Based Big Data

School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing 100083, China
* Author to whom correspondence should be addressed.
Sensors 2019, 19(5), 1059; https://doi.org/10.3390/s19051059
Submission received: 16 January 2019 / Revised: 24 February 2019 / Accepted: 25 February 2019 / Published: 1 March 2019
(This article belongs to the Special Issue Big Data Driven IoT for Smart Cities)

Abstract

The Internet of Things (IoT) has become a significant part of our daily life. Composed of millions of intelligent devices, the IoT can interconnect people with the physical world. With the development of IoT technology, the amount of data generated by sensors and devices is increasing dramatically, and IoT-based big data has become a very active research area. One of the key issues in IoT-based big data is ensuring the utility of data while preserving privacy. In this paper, we address the protection of big data privacy in the data storage phase and propose a searchable encryption scheme that satisfies personalized privacy needs. Our proposed scheme works for all file types, including text, audio, image, and video, and meets the different privacy needs of different individuals at the expense of high storage cost. We also show that our proposed scheme satisfies index indistinguishability and trapdoor indistinguishability.

1. Introduction

The Internet of Things (IoT) has become a significant part of our daily life over the past few years. A huge number of sensors and intelligent devices have been integrated to interconnect people with the physical world, generating massive amounts of sensing data. Data generated by IoT devices are collected, disseminated, and exchanged among people, businesses, and societies. With the development of the IoT, the amount of data generated by organizations and individuals is increasing dramatically [1].
Although the massive data generated in the IoT environment is of significant value, exploring and exploiting this value increases the risk of privacy breaches [2]. The collection, storage, and reuse of personal data for profit poses a serious threat to our privacy. Consequently, researchers face the challenge of ensuring the utility of data while preserving privacy. Various techniques have been developed to protect data privacy, and they can be grouped according to the stages of the big data life cycle, as follows [3].
  • Data generation: In the data generation phase, access restriction and data falsification techniques are used.
  • Data storage: The approaches in the data storage phase are mainly based on encryption techniques.
  • Data processing: Anonymization techniques as well as clustering, classification, and association rule mining-based techniques are used in the data processing phase.
In this paper, we focus on the protection of big data privacy in the data storage phase of the big data life cycle. In the IoT environment, the sensing data generated by various sensors and devices are collected and uploaded to cloud servers, which provide massive storage and cloud computing services. As noted above, encryption techniques are used to protect big data privacy in the data storage phase. When a large amount of encrypted data is stored on cloud servers, the first consideration is the confidentiality of the data, which can be ensured by secure and efficient encryption schemes. However, when the data user wants to retrieve the data containing a specific keyword, the cloud server cannot respond to the retrieval request, because it cannot decrypt the encrypted data. This problem can be solved by searchable encryption schemes [4,5], such as searchable symmetric encryption [6] and public key encryption with keyword search [7]. A searchable encryption scheme mainly involves three entities: the data owner, the data user, and the cloud server. The data owner outsources the encrypted data to the cloud server. The data user queries the cloud server for the encrypted data containing a specific keyword. The cloud server stores and retrieves the encrypted data.
In existing searchable encryption schemes, the data user can access all the data owned by the data owner, which can result in a privacy breach for the data owner. On the one hand, the data owner may be willing to share the data with some specific data users, but not with others. On the other hand, the data owner may be willing to share some specific data with a data user, but not other data. Furthermore, additional information contained in the data owner's data can also cause a privacy breach. Privacy is subjective, and different people have different privacy needs. For example, the hidden text in a typical Word file can include a lot of sensitive personal information [8]. However, this additional information, which may disclose the privacy of the data owner, is useless to some data users. In data mining, data preprocessing is used to transform raw data into an understandable format [9]. In natural language processing, text feature extraction is used to transform a list of words into a feature set that is usable by a classifier [10]. In speech recognition and image recognition, feature extraction is a key step [11,12]. This means that such additional information may be discarded by the data user in the feature extraction phase anyway. In summary, letting the data user access all the data owned by the data owner results in a privacy breach for the data owner without improving the utility of the data.
In this paper, we propose a searchable encryption scheme for personalized privacy protection in IoT-based big data. The main contributions of our proposed scheme are as follows:
  • In our proposed scheme, the data owner generates the file features at different levels, and uploads the encrypted file features to the cloud server.
  • The proposed scheme makes a trade-off between ensuring the utility of the data and preserving the privacy, and meets the different privacy needs of different individuals.
The rest of this paper is organized as follows. Section 2 reviews recent searchable encryption schemes. Section 3 presents the necessary notations and definitions. Section 4 formalizes the searchable encryption scheme for meeting personalized privacy needs in big data and presents the main security definitions. Section 5 describes the detailed construction of our proposed scheme. Section 6 analyzes the security of our proposed scheme. Section 7 presents experimental results and compares our proposed scheme with existing schemes. The last section concludes the paper.

2. Related Work

Several searchable encryption schemes have been proposed to allow the data user to retrieve encrypted data [4,5]. In this section, we briefly review existing work on searchable encryption.
In 2000, Song et al. [6] proposed the first searchable encryption scheme based on a symmetric encryption algorithm, which is called searchable symmetric encryption (SSE). However, their scheme has the following limitations: it is not proven secure; the distribution of the underlying plaintexts is vulnerable to statistical attacks; and the search time is linear in the length of the document collection. To overcome these limitations, Goh et al. [13] and Chang and Mitzenmacher [14] deployed a masked index table for SSE and introduced the notion of security for indexes. Curtmola et al. [15] generalized the security definitions of SSE and proposed two SSE schemes that are secure under the new definitions. The search time of their schemes is linear in the number of documents. Subsequently, several SSE schemes were proposed as improvements. For example, Cash et al. [16] proposed an SSE scheme that supports conjunctive search and general Boolean queries on outsourced symmetrically encrypted data; Salam et al. [17] proposed a privacy-preserving data storage and retrieval system in cloud computing; Li et al. [18] proposed three different SSE schemes that can guard against a coercer by using the idea of deniable encryption; and Soleimanian et al. [19] proposed a publicly verifiable SSE scheme.
Although SSE schemes are highly efficient, they suffer from complicated secret key distribution. To resolve this problem, Boneh et al. [7] introduced a searchable encryption scheme based on public key cryptography, namely public key encryption with keyword search (PEKS). Waters et al. [20] showed that PEKS schemes based on bilinear maps can be applied to build encrypted and searchable audit logs. However, the bilinear pairing operation is computationally expensive. Di et al. [21] introduced a PEKS scheme without bilinear pairing. The original PEKS scheme in [7] requires a secure channel to transmit the trapdoors. To overcome this limitation, Baek et al. [22] proposed a new PEKS scheme that does not require a secure channel. Byun et al. [23] introduced the off-line keyword guessing attack (KGA) and pointed out that the original PEKS scheme in [7] is susceptible to KGA. Rhee et al. [24] proposed the notion of trapdoor indistinguishability and showed that trapdoor indistinguishability is a sufficient condition for preventing outside KGAs. Jeong et al. [25] showed that constructing PEKS schemes secure against inside KGA is impossible under the original PEKS framework in [7]. Xu et al. [26] proposed a PEKS scheme that resists inside KGA. More recently, various improved PEKS schemes have been proposed. For example, Liang et al. [27] proposed a searchable attribute-based proxy re-encryption system that achieves privacy-preserving keyword search, encrypted data sharing, and keyword update; Chen et al. [28] proposed a dual-server PEKS scheme that resists inside KGA launched by a malicious server; Yang et al. [29] proposed a semantic keyword searchable proxy re-encryption scheme for secure cloud storage using lattice-based cryptographic primitives; Wu et al. [30] designed an efficient and secure searchable encryption protocol using a trapdoor permutation function for cloud-based IoT; and Yin et al. [31] proposed a ciphertext-policy attribute-based searchable encryption scheme that achieves keyword-based search and fine-grained access control over encrypted data.
Table 1 shows a brief comparison of some existing searchable encryption schemes. Privacy is a key concern in the design of searchable encryption schemes. However, in all of these existing schemes, the data user can access all the data owned by the data owner, which can result in a privacy breach for the data owner.

3. Preliminaries

A summary of the notations used in this paper is presented in Table 2.
The set of all binary strings of length $n$ is denoted as $\{0,1\}^n$, and the set of all finite binary strings is denoted as $\{0,1\}^*$.
An index table (or dictionary) denotes a data structure of the form $I[\mathit{key}] = \mathit{value}$. Given a $\mathit{key}$, the $\mathit{value}$ matching the $\mathit{key}$ is returned.
A function $\mu: \mathbb{N} \to \mathbb{R}$ is negligible if for every positive polynomial $p(\cdot)$ and all sufficiently large $\lambda$, $\mu(\lambda) < \frac{1}{p(\lambda)}$. We similarly write $f(\lambda) = \mathsf{negl}(\lambda)$ to mean that there exists a negligible function $\mu(\cdot)$ such that $f(\lambda) \le \mu(\lambda)$ for all sufficiently large $\lambda$.
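As a short worked example of the definition, the following contrasts a negligible function with a non-negligible one. It uses only the fact that every polynomial satisfies $p(\lambda) \le c\,\lambda^{d}$ for $\lambda \ge 1$, where $d$ is its degree and $c$ the sum of the absolute values of its coefficients.

$$
\begin{aligned}
&\mu(\lambda) = 2^{-\lambda}: && \mu(\lambda)\,p(\lambda) \le c\,\lambda^{d}\,2^{-\lambda} \xrightarrow[\lambda \to \infty]{} 0, \text{ so } \mu(\lambda) < \tfrac{1}{p(\lambda)} \text{ for all sufficiently large } \lambda;\\
&\nu(\lambda) = \lambda^{-3}: && \text{for } p(\lambda) = \lambda^{4},\ \nu(\lambda) = \lambda^{-3} \ge \lambda^{-4} = \tfrac{1}{p(\lambda)} \text{ for all } \lambda \ge 1, \text{ so } \nu \text{ is not negligible.}
\end{aligned}
$$

Hence an adversary whose success probability exceeds $\frac{1}{2}$ by $2^{-\lambda}$ is tolerated by the security definitions in this paper, whereas an advantage of $\lambda^{-3}$ is not.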
The following basic cryptographic primitives can be found in [32].
A symmetric encryption scheme is a tuple $\mathcal{E} = (\mathsf{Gen}, \mathsf{Enc}, \mathsf{Dec})$ of probabilistic polynomial-time (PPT) algorithms, where $\mathsf{Gen}$ takes the security parameter $\lambda$ as input and outputs a secret key $k$; $\mathsf{Enc}$ takes a key $k$ and a message $m \in \{0,1\}^*$ as input and outputs a ciphertext $c = \mathsf{Enc}(k, m)$; and $\mathsf{Dec}$ takes a key $k$ and a ciphertext $c$ as input and outputs $m$ if $c = \mathsf{Enc}(k, m)$.
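The syntax of $\mathcal{E}$ can be illustrated with a minimal Python sketch. It is a stand-in built on stated assumptions, not the AES-256 instantiation used in Section 7: it treats HMAC-SHA256 as a pseudorandom function and runs it in counter mode under a fresh random nonce, the textbook way of obtaining a CPA-secure scheme from a PRF. The 32-byte key and 16-byte nonce lengths are illustrative choices, and the sketch provides no integrity protection.

```python
import hashlib
import hmac
import secrets

KEY_LEN = 32     # λ = 256 bits, expressed in bytes (illustrative choice)
NONCE_LEN = 16   # length of the per-encryption random nonce (illustrative choice)


def gen() -> bytes:
    """Gen(λ): a uniformly random key."""
    return secrets.token_bytes(KEY_LEN)


def _keystream(k: bytes, r: bytes, n: int) -> bytes:
    """First n bytes of F_k(r || 0) || F_k(r || 1) || ..., with F = HMAC-SHA256 assumed to be a PRF."""
    blocks = (hmac.new(k, r + c.to_bytes(8, "big"), hashlib.sha256).digest()
              for c in range((n + 31) // 32))
    return b"".join(blocks)[:n]


def enc(k: bytes, m: bytes) -> bytes:
    """Enc(k, m): draw a fresh nonce r and output r || (m XOR keystream)."""
    r = secrets.token_bytes(NONCE_LEN)
    return r + bytes(a ^ b for a, b in zip(m, _keystream(k, r, len(m))))


def dec(k: bytes, c: bytes) -> bytes:
    """Dec(k, c): split off r and XOR the same keystream back out."""
    r, body = c[:NONCE_LEN], c[NONCE_LEN:]
    return bytes(a ^ b for a, b in zip(body, _keystream(k, r, len(body))))


k = gen()
c = enc(k, b"sensor reading")
assert dec(k, c) == b"sensor reading"
```

Encrypting the same message twice yields different ciphertexts because each call draws a fresh nonce; this randomization is what the CPA experiment defined next requires.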
For any symmetric encryption scheme $\mathcal{E} = (\mathsf{Gen}, \mathsf{Enc}, \mathsf{Dec})$, any adversary $\mathcal{A}$, and any value $\lambda$ of the security parameter, the chosen-plaintext attack (CPA) indistinguishability experiment $\mathsf{SE}^{\mathsf{cpa}}_{\mathcal{A},\mathcal{E}}(\lambda)$ is defined as follows:
  • A random key $k$ is generated by running $\mathsf{Gen}(\lambda)$.
  • The adversary $\mathcal{A}$ is given input $\lambda$ and oracle access to $\mathsf{Enc}(k, \cdot)$, and outputs a pair of messages $m_0, m_1$ of the same length.
  • A random bit $b \in \{0,1\}$ is chosen, and then a ciphertext $c = \mathsf{Enc}(k, m_b)$ is computed and given to $\mathcal{A}$; $c$ is called the challenge ciphertext.
  • The adversary $\mathcal{A}$ continues to have oracle access to $\mathsf{Enc}(k, \cdot)$, and outputs a bit $b'$.
  • The output of the experiment is defined to be 1 if $b' = b$, and 0 otherwise. If $\mathsf{SE}^{\mathsf{cpa}}_{\mathcal{A},\mathcal{E}}(\lambda) = 1$, we say that $\mathcal{A}$ succeeded.
Definition 1.
A symmetric encryption scheme $\mathcal{E} = (\mathsf{Gen}, \mathsf{Enc}, \mathsf{Dec})$ is CPA-secure if for all PPT adversaries $\mathcal{A}$ there exists a negligible function $\mathsf{negl}$ such that
$$\Pr[\mathsf{SE}^{\mathsf{cpa}}_{\mathcal{A},\mathcal{E}}(\lambda) = 1] \le \frac{1}{2} + \mathsf{negl}(\lambda),$$
where the probability is taken over the random coins used by $\mathcal{A}$, as well as the random coins used in the CPA indistinguishability experiment.
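To make the experiment concrete, the sketch below runs $\mathsf{SE}^{\mathsf{cpa}}_{\mathcal{A},\mathcal{E}}(\lambda)$ many times against one simple adversary, which re-queries its encryption oracle on $m_0$ and compares the answer with the challenge ciphertext. Both toy schemes use HMAC-SHA256 in counter mode as an assumed PRF; they differ only in whether the nonce is fresh or fixed at zero. The messages, key length, and trial count are arbitrary illustrative choices.

```python
import hashlib
import hmac
import secrets

KEY_LEN = NONCE_LEN = 16  # illustrative lengths in bytes


def _keystream(k: bytes, r: bytes, n: int) -> bytes:
    # HMAC-SHA256 in counter mode, treated as a PRF (assumption).
    return b"".join(hmac.new(k, r + c.to_bytes(8, "big"), hashlib.sha256).digest()
                    for c in range((n + 31) // 32))[:n]


def randomized_enc(k: bytes, m: bytes) -> bytes:
    r = secrets.token_bytes(NONCE_LEN)            # fresh nonce for every encryption
    return r + bytes(a ^ b for a, b in zip(m, _keystream(k, r, len(m))))


def deterministic_enc(k: bytes, m: bytes) -> bytes:
    r = bytes(NONCE_LEN)                          # fixed all-zero nonce
    return r + bytes(a ^ b for a, b in zip(m, _keystream(k, r, len(m))))


def cpa_experiment(enc) -> int:
    """One run of SE^cpa_{A,E}(λ) with a fixed adversary A; returns 1 iff b' = b."""
    k = secrets.token_bytes(KEY_LEN)              # k <- Gen(λ)

    def oracle(m: bytes) -> bytes:                # A's oracle access to Enc(k, .)
        return enc(k, m)

    m0, m1 = b"attack at dawn", b"attack at dusk"  # A's equal-length messages
    b = secrets.randbelow(2)
    c = oracle(m0 if b == 0 else m1)              # challenge ciphertext
    b_guess = 0 if oracle(m0) == c else 1         # A re-encrypts m0 and compares
    return int(b_guess == b)


def success_rate(enc, trials: int = 2000) -> float:
    return sum(cpa_experiment(enc) for _ in range(trials)) / trials
```

Against the fixed-nonce variant this adversary always succeeds, so no deterministic scheme can satisfy Definition 1; against the randomized variant its success rate stays close to $\frac{1}{2}$.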
For any adversary $\mathcal{A}$ and any value $\lambda$ of the security parameter, the computational Diffie-Hellman (CDH) experiment $\mathsf{CDH}_{\mathcal{A},\mathsf{Setup}}(\lambda)$ is defined as follows:
  • Run $\mathsf{Setup}(\lambda)$ to obtain the output $(\mathbb{G}, q, g)$, where $\mathbb{G}$ is a cyclic group of order $q$ (where $q$ has bit length $\lambda$) and $g$ is a generator of $\mathbb{G}$.
  • Randomly choose $a, b \in \mathbb{Z}_q$.
  • $\mathcal{A}$ is given $\mathbb{G}$, $q$, $g$, $g^a$, $g^b$, and outputs $h \in \mathbb{G}$.
  • The output of the experiment is defined to be 1 if $h = g^{ab}$, and 0 otherwise.
Definition 2.
The CDH problem is hard relative to $\mathsf{Setup}$ if for all PPT adversaries $\mathcal{A}$ there exists a negligible function $\mathsf{negl}$ such that
$$\Pr[\mathsf{CDH}_{\mathcal{A},\mathsf{Setup}}(\lambda) = 1] \le \mathsf{negl}(\lambda).$$
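The experiment can be run directly on a toy group. The sketch below assumes the order-$q$ subgroup of $\mathbb{Z}_p^*$ for the safe prime $p = 2039 = 2 \cdot 1019 + 1$, with generator $g = 4$ (a quadratic residue, hence of prime order $q = 1019$); these parameters are illustrative assumptions, not the elliptic-curve group of Section 7. Because $q$ is tiny, an adversary that brute-forces the discrete logarithm wins every time, which is why Definition 2 is stated asymptotically in $\lambda$.

```python
import secrets

# Toy parameters (illustrative, insecure): p = 2q + 1 is a safe prime and g = 4 is a
# quadratic residue, so g generates the subgroup of prime order q.
P, Q, G = 2039, 1019, 4


def setup():
    """Toy Setup(λ): returns (p, q, g); the group is the order-q subgroup of Z_p^*."""
    return P, Q, G


def brute_force_adversary(p: int, q: int, g: int, ga: int, gb: int) -> int:
    """Recover a by exhaustive search, then output (g^b)^a = g^(ab). Feasible only for tiny q."""
    for a in range(q):
        if pow(g, a, p) == ga:
            return pow(gb, a, p)
    raise ValueError("g^a is not in the subgroup generated by g")


def cdh_experiment(adversary) -> int:
    """One run of CDH_{A,Setup}(λ); returns 1 iff the adversary outputs g^(ab)."""
    p, q, g = setup()
    a, b = secrets.randbelow(q), secrets.randbelow(q)   # a, b <- Z_q
    h = adversary(p, q, g, pow(g, a, p), pow(g, b, p))
    return int(h == pow(g, a * b, p))
```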

4. System Model

The searchable encryption scheme for personalized privacy protection mainly involves three entities, i.e., the data owner, the data user, and the cloud server. The data owner outsources the encrypted file features to the cloud server. The data user queries the cloud server for the encrypted file features containing a specific keyword. The cloud server stores and retrieves the encrypted file features. As in existing searchable encryption schemes, the data owner is considered fully trusted in this paper. The data user is considered malicious, which means that it may attempt to learn more information than it is authorized to retrieve. The cloud server is considered honest but curious: it correctly executes the searchable encryption protocol, but may try to learn as much information as possible from the stored encrypted data.
Given $n$ files $F_i$, $1 \le i \le n$, and a non-negative integer $l$, let $F_i^l$ denote the file feature of $F_i$ at level $l$. In particular, let $F_i^0 = F_i$, i.e., the file feature of $F_i$ at level 0 is $F_i$ itself.
Let $n_f + 1$ denote the number of file feature levels (FFLs). The data owner wishes to store the file features set $\mathcal{F} = \{F_i^l : 1 \le i \le n, 0 \le l \le n_f\}$ on the cloud server. The objectives of the data owner are as follows:
  • For $1 \le i \le n$ and $0 \le l \le n_f$, the file feature $F_i^l$ is stored on the cloud server such that the confidentiality of $F_i^l$ is preserved.
  • The data user queries for a keyword $w$ and an FFL $l$ to retrieve, in a secure and efficient way, all authorized file features $F_i^l$ such that $w \in F_i^{l_0}$ for a given $l_0$.
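The objectives above can be illustrated on plaintext data before any encryption is applied. The sketch below is hypothetical throughout: it assumes $n_f = 2$ with three made-up feature extractors (level 0 is the file with its hidden metadata, level 1 its visible text, level 2 a bag of words), takes $l_0 = 2$, and encodes the authorized FFL sets $L_i$ that $\mathsf{Store}$ constructs in Section 5 as a Python dictionary. It shows only the access rule ($w \in F_i^{l_0}$ and $l \in L_i$); confidentiality is left to the scheme itself.

```python
from dataclasses import dataclass
from typing import Dict, List, Set

N_F = 2  # levels 0..n_f with n_f = 2 (assumption for this sketch)


@dataclass
class Document:
    """A toy file whose raw form carries hidden metadata (hypothetical example)."""
    body: str
    hidden_metadata: str


def feature(doc: Document, level: int) -> str:
    """Hypothetical extractor for F_i^l.

    Level 0: the file itself (body plus hidden metadata).
    Level 1: visible text only, hidden metadata discarded.
    Level 2: distinct lowercase words in sorted order (a bag of words).
    """
    if level == 0:
        return doc.body + " " + doc.hidden_metadata
    if level == 1:
        return doc.body
    if level == 2:
        return " ".join(sorted(set(doc.body.lower().split())))
    raise ValueError(f"only levels 0..{N_F} are defined")


def plaintext_query(files: List[Document], authorized: Dict[int, Set[int]],
                    w: str, l: int, l0: int = 2) -> List[str]:
    """Access rule only (no encryption): F_i^l for every i with w in F_i^{l0} and l in L_i."""
    return [feature(f, l) for i, f in enumerate(files)
            if w in feature(f, l0).split() and l in authorized[i]]


files = [Document("Temperature high in room 12", "author=alice gps=39.9N"),
         Document("Humidity normal in room 7", "author=bob")]
authorized = {0: {1, 2}, 1: {2}}  # L_i: no data user may see level 0 (the hidden metadata)
```

With these sets, a query for `("room", 1)` returns only the first file's visible text, and the hidden metadata at level 0 is never returned because no $L_i$ contains 0.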

4.1. Formal Definition

The searchable encryption scheme for meeting the personalized privacy needs consists of the following algorithms:
  • $\mathsf{Setup}(\lambda)$: This algorithm is run by the data owner. It takes the security parameter $\lambda$ as input and outputs the global parameter $\Lambda$.
  • $\mathsf{KeyGen}(\Lambda)$: This algorithm is run by the data owner and the data user, respectively. It takes the global parameter $\Lambda$ as input and outputs the public/private key pairs $(pk_o, sk_o)$ and $(pk_u, sk_u)$ for the data owner and the data user, respectively.
  • $\mathsf{Store}(\mathcal{F}, pk_u, sk_o)$: This algorithm is run by the data owner. It takes the file features set $\mathcal{F}$, the data user's public key $pk_u$, and the data owner's private key $sk_o$ as input, and outputs the encrypted file features set $\mathcal{F}'$ and the encrypted index set $\mathit{Ind}$.
  • $\mathsf{Trapdoor}(w, l, pk_o, sk_u)$: This algorithm is run by the data user. It takes a keyword $w$, an FFL $l$, the data owner's public key $pk_o$, and the data user's private key $sk_u$ as input, and outputs the trapdoor $T_{w,l}$.
  • $\mathsf{Search}(\mathcal{F}', \mathit{Ind}, T_{w,l})$: This algorithm is performed interactively between the cloud server and the data user. It takes the encrypted file features set $\mathcal{F}'$, the encrypted index set $\mathit{Ind}$, and the trapdoor $T_{w,l}$ as input, and outputs all authorized file features $F_i^l$ such that $w \in F_i^{l_0}$ for a given $l_0$.

4.2. Security Definition

The searchable encryption scheme for meeting the personalized privacy needs must satisfy index indistinguishability and trapdoor indistinguishability under chosen keyword-FFL pair attack. Following [15], we define two challenge-response games, $\mathsf{Game}_I$ and $\mathsf{Game}_T$, between the adversary $\mathcal{A}$ and the challenger $\mathcal{C}$ to capture index indistinguishability and trapdoor indistinguishability under chosen keyword-FFL pair attack, respectively.
The adversary $\mathcal{A}$ plays $\mathsf{Game}_I$ with the challenger $\mathcal{C}$ and attempts to distinguish the encrypted index of a given keyword-FFL pair from other encrypted indexes. If $\mathcal{A}$ wins $\mathsf{Game}_I$, then $\mathcal{A}$ has obtained useful information from the encrypted indexes.
   $\mathsf{Game}_I$:
Setup: 
Challenger $\mathcal{C}$ runs $\mathsf{Setup}(\lambda)$ and $\mathsf{KeyGen}(\Lambda)$ to generate the global parameter $\Lambda$ and the public/private key pairs $(pk_o, sk_o)$ and $(pk_u, sk_u)$ of the data owner and the data user, respectively, and sends $\Lambda$, $pk_o$, and $pk_u$ to $\mathcal{A}$.
Adaptive query:
The adversary $\mathcal{A}$ makes the following queries to $\mathcal{C}$:
- The adversary $\mathcal{A}$ adaptively selects a keyword-FFL pair $(w, l)$ for the encrypted index query. $\mathcal{C}$ responds with $\mathit{Ind}[w']$.
- The adversary $\mathcal{A}$ adaptively selects a keyword-FFL pair $(w, l)$ for the trapdoor query. $\mathcal{C}$ responds with $T_{w,l}$.
Challenge: 
The adversary $\mathcal{A}$ sends two challenge keyword-FFL pairs $(w_0, l_0)$ and $(w_1, l_1)$ to $\mathcal{C}$. $\mathcal{C}$ picks a random bit $b \in \{0,1\}$ and sends the encrypted index $\mathit{Ind}[w_b']$ of the keyword-FFL pair $(w_b, l_b)$ to $\mathcal{A}$.
Guess: 
The adversary $\mathcal{A}$ outputs $b' \in \{0,1\}$ and wins the game if $b' = b$.
Definition 3.
We say that the searchable encryption scheme for meeting the personalized privacy needs satisfies index indistinguishability under chosen keyword-FFL pair attack if for all PPT adversaries $\mathcal{A}$ there exists a negligible function $\mathsf{negl}$ such that
$$\Pr[\mathcal{A} \text{ wins } \mathsf{Game}_I] \le \frac{1}{2} + \mathsf{negl}(\lambda).$$
The adversary $\mathcal{A}$ plays $\mathsf{Game}_T$ with the challenger $\mathcal{C}$ and attempts to distinguish the trapdoor of a given keyword-FFL pair from other trapdoors. If $\mathcal{A}$ wins $\mathsf{Game}_T$, then $\mathcal{A}$ has obtained useful information from the trapdoors.
   $\mathsf{Game}_T$:
Setup: 
$\mathcal{C}$ runs $\mathsf{Setup}(\lambda)$ and $\mathsf{KeyGen}(\Lambda)$ to generate the global parameter $\Lambda$ and the public/private key pairs $(pk_o, sk_o)$ and $(pk_u, sk_u)$ of the data owner and the data user, respectively, and sends $\Lambda$, $pk_o$, and $pk_u$ to $\mathcal{A}$.
Adaptive query:
$\mathcal{A}$ makes the following queries to $\mathcal{C}$:
- The adversary $\mathcal{A}$ adaptively selects a keyword-FFL pair $(w, l)$ for the encrypted index query. $\mathcal{C}$ responds with $\mathit{Ind}[w']$.
- The adversary $\mathcal{A}$ adaptively selects a keyword-FFL pair $(w, l)$ for the trapdoor query. $\mathcal{C}$ responds with $T_{w,l}$.
Challenge: 
The adversary $\mathcal{A}$ sends two challenge keyword-FFL pairs $(w_0, l_0)$ and $(w_1, l_1)$ to $\mathcal{C}$. $\mathcal{C}$ picks a random bit $b \in \{0,1\}$ and sends the trapdoor $T_{w_b,l_b}$ of the keyword-FFL pair $(w_b, l_b)$ to $\mathcal{A}$.
Guess: 
The adversary $\mathcal{A}$ outputs $b' \in \{0,1\}$ and wins the game if $b' = b$.
Definition 4.
We say that the searchable encryption scheme for meeting the personalized privacy needs satisfies trapdoor indistinguishability under chosen keyword-FFL pair attack if for all PPT adversaries $\mathcal{A}$ there exists a negligible function $\mathsf{negl}$ such that
$$\Pr[\mathcal{A} \text{ wins } \mathsf{Game}_T] \le \frac{1}{2} + \mathsf{negl}(\lambda).$$

5. Proposed Scheme

In this section, we present our proposed searchable encryption scheme for meeting the personalized privacy needs. It consists of the following algorithms.
$\mathsf{Setup}(\lambda)$ is run by the data owner. It takes the security parameter $\lambda$ as input and performs the following:
  • Choose a cyclic group $\mathbb{G}$ of prime order $q$ and a generator $g$ of $\mathbb{G}$.
  • Choose a symmetric encryption scheme $\mathcal{E} = (\mathsf{Gen}, \mathsf{Enc}, \mathsf{Dec})$.
  • Choose two collision-resistant hash functions $H_1: \mathbb{G} \to \{0,1\}^\lambda$ and $H_2: \{0,1\}^* \to \{0,1\}^\lambda$.
  • Set the global parameter $\Lambda = (\mathbb{G}, q, g, \mathcal{E}, H_1, H_2)$.
$\mathsf{KeyGen}(\Lambda)$ is run by the data owner and the data user, respectively. It takes the global parameter $\Lambda$ as input and performs the following:
  • Randomly select two elements $k_o$ and $k_u$ in $\mathbb{Z}_q$ as the private keys of the data owner and the data user, respectively.
  • Compute $g^{k_o}$ and $g^{k_u}$ in $\mathbb{G}$ as the public keys of the data owner and the data user, respectively.
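$\mathsf{KeyGen}$ amounts to a Diffie-Hellman key setup: because $(g^{k_u})^{k_o} = (g^{k_o})^{k_u}$, the data owner and the data user can each derive the same symmetric key from their own private key and the other party's public key, without further interaction. The sketch below checks this in a toy group, assuming the order-$q$ subgroup of $\mathbb{Z}_p^*$ with $p = 2039$, $q = 1019$, $g = 4$, and SHA-256 as $H_1$; a real deployment would use a group of roughly 256-bit order, such as the elliptic-curve group of Section 7. Private keys are drawn from $1, \dots, q-1$ so that public keys are never the identity.

```python
import hashlib
import secrets

# Toy parameters (assumption, far too small for real use): p = 2q + 1 is a safe prime and
# g = 4 generates the subgroup of quadratic residues, whose order is the prime q.
P, Q, G = 2039, 1019, 4


def H1(element: int) -> bytes:
    """H1: G -> {0,1}^λ, instantiated with SHA-256 over a 2-byte encoding (assumption)."""
    return hashlib.sha256(element.to_bytes(2, "big")).digest()


def keygen():
    """KeyGen(Λ): private key k from 1..q-1, public key g^k mod p."""
    k = secrets.randbelow(Q - 1) + 1
    return pow(G, k, P), k


pk_o, sk_o = keygen()          # data owner
pk_u, sk_u = keygen()          # data user

k1 = H1(pow(pk_u, sk_o, P))    # derived by the owner: H1((g^{k_u})^{k_o})
k2 = H1(pow(pk_o, sk_u, P))    # derived by the user:  H1((g^{k_o})^{k_u})
```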
$\mathsf{Store}(\mathcal{F}, pk_u, sk_o)$ is run by the data owner. It takes the file features set $\mathcal{F}$, the data user's public key $pk_u = g^{k_u}$, and the data owner's private key $sk_o = k_o$ as input, and performs the following:
  • Compute $k_1 = H_1((g^{k_u})^{k_o})$.
  • For $1 \le i \le n$ and $0 \le l \le n_f$, randomly select $id_i^l \in \{0,1\}^\lambda$ as the identifier of $F_i^l$, run algorithm $\mathsf{Gen}(\lambda)$ to generate the encryption key $ek_i^l$ of $F_i^l$, and compute ${id_i^l}' = \mathsf{Enc}(k_1, id_i^l)$, ${ek_i^l}' = \mathsf{Enc}(k_1, ek_i^l)$, and ${F_i^l}' = \mathsf{Enc}(ek_i^l, F_i^l)$.
  • Create the index table $\mathcal{F}'$ such that $\mathcal{F}'[id_i^l] = {F_i^l}'$ for every $1 \le i \le n$ and $0 \le l \le n_f$.
  • Given an FFL $l_0$, create the keyword set $W_{l_0}$ of the file features set $\{F_i^{l_0} : 1 \le i \le n\}$.
  • For $w \in W_{l_0}$, compute $w' = \mathsf{Enc}(k_1, H_2(w))$.
  • For $0 \le l \le n_f$, compute $l' = \mathsf{Enc}(k_1, H_2(l))$.
  • For $1 \le i \le n$, construct the set $L_i$ of authorized FFLs of the file $F_i$. In other words, $l \in L_i$ implies that the data user is authorized to access the file feature $F_i^l$.
  • Create the index table $\mathit{Ind}$ such that $\mathit{Ind}[w'] = \{({id_i^l}', {ek_i^l}', l') : w \in F_i^{l_0}, l \in L_i, 1 \le i \le n\}$ for every $w \in W_{l_0}$.
  • Send $\mathcal{F}'$ and $\mathit{Ind}$ to the cloud server.
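The $\mathsf{Store}$ steps can be sketched as follows, with two assumptions stated plainly. First, the randomized encryption is a toy construction (HMAC-SHA256 in counter mode under a fresh nonce), not AES-256. Second, because the cloud server later looks up $\mathit{Ind}[T_1]$ and tests $s_3 = T_2$ by equality, the labels $w'$ and $l'$ must be reproducible by the data user; the sketch therefore computes them with a deterministic keyed function (HMAC-SHA256 under $k_1$) in place of $\mathsf{Enc}(k_1, H_2(\cdot))$. The key $k_1$ is taken as given (in the scheme it is $H_1((g^{k_u})^{k_o})$), SHA-256 stands in for $H_2$, and FFLs are hashed as decimal strings.

```python
import hashlib
import hmac
import secrets

LAM = 32  # λ = 256 bits, in bytes


def H2(x: str) -> bytes:
    """H2: {0,1}* -> {0,1}^λ, instantiated with SHA-256 (assumption)."""
    return hashlib.sha256(x.encode()).digest()


def label(k: bytes, x: bytes) -> bytes:
    """Deterministic keyed tag used in place of Enc(k, H2(.)) for w' and l' (assumption)."""
    return hmac.new(k, x, hashlib.sha256).digest()


def enc(k: bytes, m: bytes) -> bytes:
    """Randomized toy encryption: nonce || (m XOR HMAC-SHA256 keystream). Not AES-256."""
    r = secrets.token_bytes(LAM)
    pad = b"".join(hmac.new(k, r + c.to_bytes(8, "big"), hashlib.sha256).digest()
                   for c in range((len(m) + 31) // 32))
    return r + bytes(a ^ b for a, b in zip(m, pad))


def store(features: dict, authorized: dict, l0: int, k1: bytes):
    """Store: features maps (i, l) -> F_i^l as bytes; authorized maps i -> L_i.

    Returns (F', Ind) with F'[id_i^l] = Enc(ek_i^l, F_i^l) and
    Ind[w'] = [(Enc(k1, id_i^l), Enc(k1, ek_i^l), l') for w in F_i^{l0}, l in L_i].
    """
    ids = {key: secrets.token_bytes(LAM) for key in features}   # id_i^l
    eks = {key: secrets.token_bytes(LAM) for key in features}   # ek_i^l, as from Gen(λ)
    enc_files = {ids[key]: enc(eks[key], f) for key, f in features.items()}
    index = {}
    for (i, l), f in features.items():
        if l != l0:
            continue  # the keyword set W_{l0} comes only from level-l0 features
        for w in set(f.decode().split()):
            entries = index.setdefault(label(k1, H2(w)), [])
            for la in sorted(authorized[i]):
                entries.append((enc(k1, ids[i, la]), enc(k1, eks[i, la]),
                                label(k1, H2(str(la)))))
    return enc_files, index
```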
$\mathsf{Trapdoor}(w, l, pk_o, sk_u)$ is run by the data user. It takes a keyword $w$, an FFL $l$, the data owner's public key $pk_o = g^{k_o}$, and the data user's private key $sk_u = k_u$ as input, and performs the following:
  • Compute $k_2 = H_1((g^{k_o})^{k_u})$.
  • Compute $T_{w,l} = (\mathsf{Enc}(k_2, H_2(w)), \mathsf{Enc}(k_2, H_2(l)))$.
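The trapdoor computation is short once $k_2$ is known. In this sketch the derivation of $k_2$ is omitted (the key is passed in directly), SHA-256 stands in for $H_2$, the FFL $l$ is hashed as a decimal string, and $\mathsf{Enc}(k_2, H_2(\cdot))$ is replaced by a deterministic HMAC-SHA256 tag so that the server's lookup of $\mathit{Ind}[T_1]$ can match by equality. All of these are assumptions made for illustration.

```python
import hashlib
import hmac


def H2(x: str) -> bytes:
    """H2: {0,1}* -> {0,1}^λ, instantiated with SHA-256 (assumption)."""
    return hashlib.sha256(x.encode()).digest()


def label(k: bytes, x: bytes) -> bytes:
    """Deterministic keyed tag standing in for Enc(k, H2(.)) (assumption)."""
    return hmac.new(k, x, hashlib.sha256).digest()


def trapdoor(w: str, l: int, k2: bytes) -> tuple:
    """Trapdoor(w, l, pk_o, sk_u), given the already derived k2 = H1((g^{k_o})^{k_u})."""
    return label(k2, H2(w)), label(k2, H2(str(l)))
```

Because the tags are deterministic, issuing the same query twice produces the same trapdoor, so a server could link repeated queries under this sketch; this is a property of the stand-in, noted here rather than resolved.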
$\mathsf{Search}(\mathcal{F}', \mathit{Ind}, T_{w,l})$ is performed interactively between the cloud server and the data user. It takes the encrypted file features set $\mathcal{F}'$, the encrypted index set $\mathit{Ind}$, and the trapdoor $T_{w,l}$ as input, and performs the following:
  • The cloud server: Given $T_{w,l} = (T_1, T_2)$, search $\mathit{Ind}[T_1]$ to obtain the set $S = \{(s_1, s_2, s_3) \in \mathit{Ind}[T_1] : s_3 = T_2\}$, and send $S$ to the data user.
  • The data user: Given $S$, create two index tables $S_1$ and $S_2$ such that $S_1[r_s] = \mathsf{Dec}(k_2, s_1)$ and $S_2[r_s] = \mathsf{Dec}(k_2, s_2)$ for every $s = (s_1, s_2, s_3) \in S$, where $k_2 = H_1((g^{k_o})^{k_u})$ and the values $r_s$ ($s \in S$) are randomly selected from $\{0,1\}^\lambda$. Send $S_1$ to the cloud server and store $S_2$.
  • The cloud server: Given $S_1$, create the index table $R$ such that $R[r_s] = \mathcal{F}'[S_1[r_s]]$ for every key $r_s$ in $S_1$, and send $R$ to the data user.
  • The data user: Given $S_2$ and $R$, compute $\mathsf{Dec}(S_2[r_s], R[r_s])$ for every key $r_s$ in $S_2$.
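Putting the algorithms together, the following sketch runs one complete $\mathsf{Store}$/$\mathsf{Trapdoor}$/$\mathsf{Search}$ round trip on two toy files with $n_f = 1$ and $l_0 = 1$. Its assumptions are stated plainly: the shared key $k_1 = k_2$ is sampled at random instead of being derived from $g^{k_o k_u}$, the randomized cipher is HMAC-SHA256 in counter mode rather than AES-256, and the keyword and FFL labels are deterministic HMAC tags so that the equality lookups in the first round succeed. The server and user steps are written in one function for brevity; the random handles $r_s$ let the user request $\mathcal{F}'[id_i^l]$ while keeping $ek_i^l$ to itself.

```python
import hashlib
import hmac
import secrets

LAM = 32  # λ in bytes


def H2(x: str) -> bytes:
    return hashlib.sha256(x.encode()).digest()


def label(k: bytes, x: bytes) -> bytes:
    # Deterministic tag in place of Enc(k, H2(.)), so equality lookups work (assumption).
    return hmac.new(k, x, hashlib.sha256).digest()


def _pad(k: bytes, r: bytes, n: int) -> bytes:
    # HMAC-SHA256 in counter mode as a keystream (toy PRF-based cipher, not AES-256).
    return b"".join(hmac.new(k, r + c.to_bytes(8, "big"), hashlib.sha256).digest()
                    for c in range((n + 31) // 32))[:n]


def enc(k: bytes, m: bytes) -> bytes:
    r = secrets.token_bytes(LAM)
    return r + bytes(a ^ b for a, b in zip(m, _pad(k, r, len(m))))


def dec(k: bytes, c: bytes) -> bytes:
    r, body = c[:LAM], c[LAM:]
    return bytes(a ^ b for a, b in zip(body, _pad(k, r, len(body))))


def store(features, authorized, l0, k1):
    ids = {key: secrets.token_bytes(LAM) for key in features}   # id_i^l
    eks = {key: secrets.token_bytes(LAM) for key in features}   # ek_i^l
    enc_files = {ids[key]: enc(eks[key], f) for key, f in features.items()}
    index = {}
    for (i, l), f in features.items():
        if l == l0:
            for w in set(f.decode().split()):
                index.setdefault(label(k1, H2(w)), []).extend(
                    (enc(k1, ids[i, la]), enc(k1, eks[i, la]), label(k1, H2(str(la))))
                    for la in sorted(authorized[i]))
    return enc_files, index


def trapdoor(w, l, k2):
    return label(k2, H2(w)), label(k2, H2(str(l)))


def search(enc_files, index, T, k2):
    """Both Search rounds; in a deployment the server and user steps run on different machines."""
    T1, T2 = T
    S = [s for s in index.get(T1, []) if s[2] == T2]   # server: filter Ind[T1] by s3 = T2
    S1, S2 = {}, {}
    for s1, s2, _ in S:                                 # user: fresh random handle r_s
        r = secrets.token_bytes(LAM)
        S1[r], S2[r] = dec(k2, s1), dec(k2, s2)         # id_i^l to send, ek_i^l to keep
    R = {r: enc_files[S1[r]] for r in S1}               # server: R[r_s] = F'[S1[r_s]]
    return [dec(S2[r], R[r]) for r in S2]               # user: F_i^l = Dec(ek_i^l, R[r_s])


k = secrets.token_bytes(LAM)  # k1 = k2; derived from g^{k_o k_u} in the scheme
features = {(0, 0): b"temperature high in room 12 author alice",
            (0, 1): b"temperature high in room 12",
            (1, 0): b"humidity normal in room 7 author bob",
            (1, 1): b"humidity normal in room 7"}
enc_files, index = store(features, {0: {0, 1}, 1: {1}}, 1, k)
print(sorted(search(enc_files, index, trapdoor("room", 1, k), k)))
# [b'humidity normal in room 7', b'temperature high in room 12']
```

A query for `("room", 0)` returns only the first file's level-0 feature, since the second file's authorized set contains level 1 only, and a keyword that appears solely in level-0 metadata (such as `"alice"`) is not searchable at all because $W_{l_0}$ is built from the level-1 features.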
Remark 1.
Note that $k_1 = H_1((g^{k_u})^{k_o}) = H_1((g^{k_o})^{k_u}) = k_2$; hence $T_1 = w'$ and $T_2 = l'$. Thus, $s_1 = {id_i^l}'$, $s_2 = {ek_i^l}'$, $S_1[r_s] = id_i^l$, $S_2[r_s] = ek_i^l$, and $R[r_s] = \mathcal{F}'[S_1[r_s]] = {F_i^l}'$ for every $s = (s_1, s_2, s_3) \in S$, where $w \in F_i^{l_0}$, $l \in L_i$, and $1 \le i \le n$. Since ${F_i^l}' = \mathsf{Enc}(ek_i^l, F_i^l)$, the data user recovers $\mathsf{Dec}(S_2[r_s], R[r_s]) = F_i^l$. Therefore, our proposed scheme is correct.
Given an FFL $l_0$, creating the keyword set $W_{l_0}$ of the file features subset $\{F_i^{l_0} : 1 \le i \le n\}$ requires $F_i^{l_0}$, $1 \le i \le n$, to be text. Thus, our proposed scheme works for all file types, including text, audio, image, and video, as long as there exists an FFL $l_0$ at which the file feature of each file is text.
If the authorized FFL set of each original file is created only by the data owner, then the data user cannot access unauthorized file features; thus, our proposed scheme meets the different privacy needs of different individuals.
Our proposed scheme can be extended to the multi-user scenario. Let $n_o$ and $n_u$ be the numbers of data owners and data users, respectively. In the multi-user scenario, a public/private key pair is first generated for every data owner and every data user; the file features stored on the cloud server form an $n_o$-ary vector, whose $i$-th element is the encrypted file features set of the $i$-th data owner; and the index stored on the cloud server is an $n_o \times n_u$ matrix, whose element in the $i$-th row and $j$-th column is the encrypted index set that the $i$-th data owner created for the $j$-th data user.
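The multi-user layout can be sketched as plain containers. Since $\mathsf{Store}(\mathcal{F}, pk_u, sk_o)$ derives its key from one owner's private key and one user's public key, each cell $(i, j)$ of the index matrix corresponds to its own pairwise key $H_1(g^{k_{o,i} k_{u,j}})$. The sketch reuses a toy subgroup of $\mathbb{Z}_p^*$ ($p = 2039$, $q = 1019$, $g = 4$) and SHA-256 for $H_1$, both illustrative assumptions; the containers are left empty because filling a cell is exactly one single-user $\mathsf{Store}$ run.

```python
import hashlib
import secrets

# Toy group parameters (illustrative assumption): order-q subgroup of Z_p^*, p = 2q + 1.
P, Q, G = 2039, 1019, 4


def H1(element: int) -> bytes:
    """Stand-in for H1: G -> {0,1}^λ using SHA-256 (assumption)."""
    return hashlib.sha256(element.to_bytes(2, "big")).digest()


def keygen():
    """Return (public key g^k, private key k) with k drawn from 1..q-1."""
    k = secrets.randbelow(Q - 1) + 1
    return pow(G, k, P), k


def build_layout(n_o: int, n_u: int):
    """Key pairs for n_o owners and n_u users, the pairwise Store keys, and empty containers.

    pair_keys[i][j] = H1((g^{k_u_j})^{k_o_i}) is the key owner i would use when running
    Store for user j; the resulting encrypted index set would occupy index_matrix[i][j].
    """
    owners = [keygen() for _ in range(n_o)]
    users = [keygen() for _ in range(n_u)]
    pair_keys = [[H1(pow(users[j][0], owners[i][1], P)) for j in range(n_u)]
                 for i in range(n_o)]
    file_features = [dict() for _ in range(n_o)]                      # n_o-ary vector
    index_matrix = [[dict() for _ in range(n_u)] for _ in range(n_o)]  # n_o x n_u matrix
    return owners, users, pair_keys, file_features, index_matrix


owners, users, pair_keys, file_features, index_matrix = build_layout(3, 2)
```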
Clearly, the storage space required by our proposed scheme increases as $n_f$ grows. In particular, when $n_f = 0$, our proposed scheme requires storage space similar to that of existing searchable encryption schemes.
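The storage growth can be made explicit by counting the items that $\mathsf{Store}$ uploads. This is a direct count from the algorithm's steps (items, not bytes, and ignoring ciphertext expansion), using $L_i \subseteq \{0, \dots, n_f\}$:

$$
|\mathcal{F}'| = n\,(n_f + 1), \qquad
\sum_{w \in W_{l_0}} |\mathit{Ind}[w']| \;=\; \sum_{w \in W_{l_0}} \ \sum_{i \,:\, w \in F_i^{l_0}} |L_i| \;\le\; |W_{l_0}|\; n\,(n_f + 1).
$$

With $n_f = 0$, $\mathcal{F}'$ holds exactly $n$ encrypted files and each $|L_i| \le 1$, so the index has at most one entry per keyword-file pair, the size of a conventional keyword index.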

6. Security Analysis

In this section, we show that our proposed scheme satisfies the index indistinguishability and the trapdoor indistinguishability under chosen keyword-FFL pair attack.
Theorem 1.
If $\mathcal{E} = (\mathsf{Gen}, \mathsf{Enc}, \mathsf{Dec})$ is CPA-secure and the CDH problem is hard relative to $\mathsf{Setup}$, then our proposed scheme satisfies index indistinguishability under chosen keyword-FFL pair attack.
Proof. 
If there exists a PPT adversary $\mathcal{A}$ that wins $\mathsf{Game}_I$, then there exists a simulator $\mathcal{B}$ such that $\mathsf{SE}^{\mathsf{cpa}}_{\mathcal{B},\mathcal{E}}(\lambda) = 1$ or $\mathsf{CDH}_{\mathcal{B},\mathsf{Setup}}(\lambda) = 1$.
In the setup phase, $\mathcal{C}$ runs $\mathsf{Setup}(\lambda)$ and $\mathsf{KeyGen}(\Lambda)$ to generate the global parameter $\Lambda = (\mathbb{G}, q, g, \mathcal{E}, H_1, H_2)$ and the public/private key pairs $(pk_o = g^{k_o}, sk_o = k_o)$ and $(pk_u = g^{k_u}, sk_u = k_u)$ of the data owner and the data user, respectively. Then, $\mathcal{C}$ sends $\Lambda$, $pk_o = g^{k_o}$, and $pk_u = g^{k_u}$ to $\mathcal{A}$.
In the adaptive query phase, assume that $\mathcal{A}$ makes $n_q - 1$ adaptive queries to $\mathcal{C}$. The $q$-th query can be one of the following:
- $\mathcal{A}$ adaptively selects a keyword-FFL pair $(w_q, l_q)$ for the encrypted index query. $\mathcal{C}$ responds with $\mathit{Ind}[w_q'] = \{({id_i^{l_q}}', {ek_i^{l_q}}', l_q') : w_q \in D_i, l_q \in L_i, 1 \le i \le n\}$, where $L_i$ is the authorized FFL set of $F_i$, ${id_i^{l_q}}' = \mathsf{Enc}(k_1, id_i^{l_q})$, ${ek_i^{l_q}}' = \mathsf{Enc}(k_1, ek_i^{l_q})$, $l_q' = \mathsf{Enc}(k_1, H_2(l_q))$, and $k_1 = H_1((g^{k_o})^{k_u})$.
- $\mathcal{A}$ adaptively selects a keyword-FFL pair $(w_q, l_q)$ for the trapdoor query. $\mathcal{C}$ responds with $T_{w_q,l_q} = (\mathsf{Enc}(k_2, H_2(w_q)), \mathsf{Enc}(k_2, H_2(l_q)))$, where $k_2 = H_1((g^{k_u})^{k_o})$.
In the challenge phase, $\mathcal{A}$ sends two challenge keyword-FFL pairs $(w_0, l_0)$ and $(w_1, l_1)$ to $\mathcal{C}$. $\mathcal{C}$ picks a random bit $b \in \{0,1\}$ and sends the encrypted index $\mathit{Ind}[w_b'] = \{({id_i^{l_b}}', {ek_i^{l_b}}', l_b') : w_b \in D_i, l_b \in L_i, 1 \le i \le n\}$ of the keyword-FFL pair $(w_b, l_b)$ to $\mathcal{A}$, where ${id_i^{l_b}}' = \mathsf{Enc}(k_1, id_i^{l_b})$, ${ek_i^{l_b}}' = \mathsf{Enc}(k_1, ek_i^{l_b})$, $l_b' = \mathsf{Enc}(k_1, H_2(l_b))$, and $k_1 = H_1((g^{k_o})^{k_u})$.
In the guess phase, $\mathcal{A}$ outputs its guess $b' \in \{0,1\}$, indicating whether the challenge $\mathit{Ind}[w_b']$ is the encrypted index of $(w_0, l_0)$ or of $(w_1, l_1)$.
From the perspective of $\mathcal{A}$, ${id_i^{l_q}}' = \mathsf{Enc}(k_1, id_i^{l_q})$ and ${ek_i^{l_q}}' = \mathsf{Enc}(k_1, ek_i^{l_q})$ are random values in $\{0,1\}^\lambda$ for every $1 \le i \le n$ and $2 \le q \le n_q$. Note that $k_1 = H_1((g^{k_u})^{k_o}) = H_1((g^{k_o})^{k_u}) = k_2$. Then the information obtained by the adversary $\mathcal{A}$ in $\mathsf{Game}_I$ is the same as the information obtained by a simulator $\mathcal{B}$ in the CPA indistinguishability experiment $\mathsf{SE}^{\mathsf{cpa}}_{\mathcal{B},\mathcal{E}}(\lambda)$ and in the CDH experiment $\mathsf{CDH}_{\mathcal{B},\mathsf{Setup}}(\lambda)$. Thus, if $\mathcal{A}$ wins $\mathsf{Game}_I$, then $\mathsf{SE}^{\mathsf{cpa}}_{\mathcal{B},\mathcal{E}}(\lambda) = 1$ or $\mathsf{CDH}_{\mathcal{B},\mathsf{Setup}}(\lambda) = 1$, i.e.,
$$\Pr[\mathcal{A} \text{ wins } \mathsf{Game}_I] \le \Pr[\mathsf{SE}^{\mathsf{cpa}}_{\mathcal{B},\mathcal{E}}(\lambda) = 1] + \Pr[\mathsf{CDH}_{\mathcal{B},\mathsf{Setup}}(\lambda) = 1] \le \frac{1}{2} + \mathsf{negl}(\lambda).$$
Therefore, our proposed scheme satisfies index indistinguishability under chosen keyword-FFL pair attack if $\mathcal{E} = (\mathsf{Gen}, \mathsf{Enc}, \mathsf{Dec})$ is CPA-secure and the CDH problem is hard relative to $\mathsf{Setup}$. □
Similarly, we can prove the following theorem:
Theorem 2.
If $\mathcal{E} = (\mathsf{Gen}, \mathsf{Enc}, \mathsf{Dec})$ is CPA-secure and the CDH problem is hard relative to $\mathsf{Setup}$, then our proposed scheme satisfies trapdoor indistinguishability under chosen keyword-FFL pair attack.

7. Performance Analysis

Table 3 presents a comprehensive comparison of the computation cost of our proposed scheme and some existing searchable encryption schemes. The notations used in Table 3 are as follows:
  • $T_{bp}$: Time cost of a bilinear pairing.
  • $T_h$: Time cost of a hash function evaluation.
  • $T_{exp}$: Time cost of an exponentiation in $\mathbb{G}$.
  • $T_{mul}$: Time cost of a multiplication in $\mathbb{G}$.
  • $T_{enc}$: Time cost of an encryption with $\mathcal{E}$.
  • $T_{dec}$: Time cost of a decryption with $\mathcal{E}$.
To meet the basic security level for comparison, SHA-256 and AES-256 are selected as the collision-resistant hash function and the symmetric encryption scheme, respectively. The cyclic group $\mathbb{G}$ of order $q$ is generated by a point on an elliptic curve $E(\mathbb{F}_p)$, where $q$ and $p$ are 256-bit and 521-bit primes, respectively. To evaluate the efficiency of the five schemes, we perform our experiments on a computer with a 2.4 GHz Intel Core i7 processor and 8 GB of RAM.
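For readers who want to measure a cost such as $T_h$ on their own hardware, a minimal sketch of the measurement method follows. It times SHA-256 on a hypothetical 64-byte input with Python's `timeit`; absolute numbers depend on the machine and interpreter and are not the values behind Figures 1-3. Measuring $T_{exp}$, $T_{bp}$, or AES-256 $T_{enc}$/$T_{dec}$ would require third-party elliptic-curve, pairing, or AES libraries, which this sketch deliberately avoids.

```python
import hashlib
import timeit


def avg_seconds(fn, number: int = 20000) -> float:
    """Average wall-clock time of one call to fn, measured with timeit."""
    return timeit.timeit(fn, number=number) / number


message = b"\x00" * 64  # hypothetical 64-byte input, e.g. an encoded keyword
T_h = avg_seconds(lambda: hashlib.sha256(message).digest())
print(f"T_h (SHA-256, 64-byte input): {T_h * 1e6:.3f} microseconds")
```

Averaging over many calls smooths out timer resolution and scheduling noise; reporting the median of several such runs is a common further refinement.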
As shown in Figures 1, 2 and 3, our proposed scheme is the most efficient in the storage and search phases. In the trapdoor phase, our proposed scheme has a higher computational cost than that of Boneh et al. [7], although its cost is still lower than those of the other schemes. In summary, our proposed scheme is more efficient than the four schemes studied in [7,24,26,28].

8. Conclusions

In this paper, we have proposed a searchable encryption scheme for meeting personalized privacy needs. Our proposed scheme mainly includes three entities, i.e., the data owner, the data user, and cloud server. The data owner outsources the encrypted file features to the cloud server. The data user queries the encrypted file features containing a specific keyword to the cloud server. The cloud server stores and retrieves the encrypted file features. Compared with the existing searchable encryption schemes, our proposed scheme works for all file types including text, audio, image, video, etc., and meets different privacy needs of different individuals at the expense of high storage cost. We also show that our proposed scheme satisfies index indistinguishability and trapdoor indistinguishability under chosen keyword-FFL pair attack. In other words, our proposed scheme is secure against inside KGA. Performance analysis shows that our proposed scheme is efficient in storage phase, trapdoor phase, and search phase.
Given the decreasing cost of storage, storage cost is not a problem in our proposed scheme when $n_f + 1$, i.e., the number of FFLs, is small. However, storage cost remains a problem when $n_f$ is large. Thus, choosing an appropriate $n_f$ is an important direction for future work.
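Table 2 defines the file feature set as $\{F_i^l : 1 \le i \le n,\ 0 \le l \le n_f\}$, so it holds $n(n_f + 1)$ features and storage grows linearly with the number of levels. The following Python sketch makes that arithmetic explicit; the uniform per-feature size and the example workload are hypothetical assumptions for illustration.

```python
def feature_count(n: int, n_f: int) -> int:
    """Size of {F_i^l : 1 <= i <= n, 0 <= l <= n_f}: one feature per file per level."""
    if n < 0 or n_f < 0:
        raise ValueError("n and n_f must be non-negative")
    return n * (n_f + 1)


def storage_bytes(n: int, n_f: int, bytes_per_feature: int) -> int:
    """Rough storage for the encrypted features, assuming every one has the same size."""
    return feature_count(n, n_f) * bytes_per_feature


if __name__ == "__main__":
    # Hypothetical workload: 1,000 files, 1 MiB per encrypted feature.
    for n_f in (0, 1, 4, 9):
        gib = storage_bytes(1000, n_f, 1 << 20) / (1 << 30)
        print(f"n_f = {n_f}: {feature_count(1000, n_f)} features, ~{gib:.2f} GiB")
```

With $n_f = 0$ (a single level) the count equals $n$; each additional level adds another $n$ features, which is why the choice of $n_f$ drives the storage cost.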

Author Contributions

Writing—original draft preparation, S.L.; writing—review and editing, M.L., H.X. and W.Z.

Funding

This work is supported by the National Key R&D Program of China (No. 2018YFB1003905) and the National Natural Science Foundation of China under Grant Nos. U1603116 and 61701020.

Acknowledgments

The authors would like to thank the editor and the anonymous reviewers for their valuable comments and suggestions that improved the quality of this paper.

Conflicts of Interest

The authors declare no conflicts of interest. The funding sponsors had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:
IoT: Internet of Things
SSE: searchable symmetric encryption
PEKS: public key encryption with keyword search
KGA: keyword guessing attack
FFL: file feature level

References

  1. Lohr, S. The age of big data. New York Times, 11 February 2012.
  2. John Walker, S. Big Data: A Revolution That Will Transform How We Live, Work, and Think; Houghton Mifflin Harcourt: Boston, MA, USA, 2014.
  3. Mehmood, A.; Natgunanathan, I.; Xiang, Y.; Hua, G.; Guo, S. Protection of big data privacy. IEEE Access 2016, 4, 1821–1834.
  4. Bösch, C.; Hartel, P.; Jonker, W.; Peter, A. A survey of provably secure searchable encryption. ACM Comput. Surv. 2015, 47, 18.
  5. Poh, G.S.; Chin, J.J.; Yau, W.C.; Choo, K.K.R.; Mohamad, M.S. Searchable symmetric encryption: Designs and challenges. ACM Comput. Surv. 2017, 50, 40.
  6. Song, D.X.; Wagner, D.; Perrig, A. Practical techniques for searches on encrypted data. In Proceedings of the 2000 IEEE Symposium on Security and Privacy, Berkeley, CA, USA, 14–17 May 2000; pp. 44–55.
  7. Boneh, D.; Di Crescenzo, G.; Ostrovsky, R.; Persiano, G. Public key encryption with keyword search. In Proceedings of the International Conference on the Theory and Applications of Cryptographic Techniques, Interlaken, Switzerland, 2–6 May 2004; Springer: Berlin/Heidelberg, Germany, 2004; pp. 506–522.
  8. Byers, S. Information leakage caused by hidden data in published documents. IEEE Secur. Privacy 2004, 2, 23–27.
  9. Hand, D.J. Principles of data mining. Drug Safety 2007, 30, 621–622.
  10. Feldman, R.; Sanger, J. The Text Mining Handbook: Advanced Approaches in Analyzing Unstructured Data; Cambridge University Press: Cambridge, UK, 2007.
  11. Hirsch, H.G.; Pearce, D. The Aurora experimental framework for the performance evaluation of speech recognition systems under noisy conditions. In Proceedings of the ASR2000-Automatic Speech Recognition: Challenges for the New Millenium ISCA Tutorial and Research Workshop (ITRW), Beijing, China, 16–20 October 2000.
  12. Hong, Z.Q. Algebraic feature extraction of image for recognition. Pattern Recogn. 1991, 24, 211–219.
  13. Goh, E.J. Secure indexes. IACR Cryptol. ePrint Arch. 2003, 2003, 216.
  14. Chang, Y.C.; Mitzenmacher, M. Privacy preserving keyword searches on remote encrypted data. In Proceedings of the International Conference on Applied Cryptography and Network Security, New York, NY, USA, 7–10 June 2005; pp. 442–455.
  15. Curtmola, R.; Garay, J.; Kamara, S.; Ostrovsky, R. Searchable symmetric encryption: Improved definitions and efficient constructions. J. Comput. Secur. 2011, 19, 895–934.
  16. Cash, D.; Jarecki, S.; Jutla, C.; Krawczyk, H.; Roşu, M.C.; Steiner, M. Highly-scalable searchable symmetric encryption with support for boolean queries. In Advances in Cryptology–CRYPTO 2013; Springer: Berlin/Heidelberg, Germany, 2013; pp. 353–373.
  17. Salam, M.I.; Yau, W.C.; Chin, J.J.; Heng, S.H.; Ling, H.C.; Phan, R.C.; Poh, G.S.; Tan, S.Y.; Yap, W.S. Implementation of searchable symmetric encryption for privacy-preserving keyword search on cloud storage. Hum. Centr. Comput. Inf. Sci. 2015, 5, 19.
  18. Li, H.; Zhang, F.; Fan, C.I. Deniable searchable symmetric encryption. Inf. Sci. 2017, 402, 233–243.
  19. Soleimanian, A.; Khazaei, S. Publicly verifiable searchable symmetric encryption based on efficient cryptographic components. Des. Codes Cryptogr. 2019, 87, 123–147.
  20. Waters, B.R.; Balfanz, D.; Durfee, G.; Smetters, D.K. Building an Encrypted and Searchable Audit Log; NDSS: San Diego, CA, USA, 2004; Volume 4, pp. 5–6.
  21. Di Crescenzo, G.; Saraswat, V. Public key encryption with searchable keywords based on Jacobi symbols. In Proceedings of the International Conference on Cryptology in India, Chennai, India, 9–13 December 2007; pp. 282–296.
  22. Baek, J.; Safavi-Naini, R.; Susilo, W. Public key encryption with keyword search revisited. In Proceedings of the International conference on Computational Science and Its Applications, Perugia, Italy, 30 June–3 July 2008; pp. 1249–1259.
  23. Byun, J.W.; Rhee, H.S.; Park, H.A.; Lee, D.H. Off-line keyword guessing attacks on recent keyword search schemes over encrypted data. In Proceedings of the Workshop on Secure Data Management, Seoul, Korea, 10–11 September 2006; pp. 75–83.
  24. Rhee, H.S.; Park, J.H.; Susilo, W.; Lee, D.H. Trapdoor security in a searchable public-key encryption scheme with a designated tester. J. Syst. Softw. 2010, 83, 763–771.
  25. Jeong, I.R.; Kwon, J.O.; Hong, D.; Lee, D.H. Constructing PEKS schemes secure against keyword guessing attacks is possible? Comput. Commun. 2009, 32, 394–396.
  26. Xu, P.; Jin, H.; Wu, Q.; Wang, W. Public-key encryption with fuzzy keyword search: A provably secure scheme under keyword guessing attack. IEEE Trans. Comput. 2013, 62, 2266–2277.
  27. Liang, K.; Susilo, W. Searchable attribute-based mechanism with efficient data sharing for secure cloud storage. IEEE Trans. Inf. Forensics Secur. 2015, 10, 1981–1992.
  28. Chen, R.; Mu, Y.; Yang, G.; Guo, F.; Wang, X. Dual-server public-key encryption with keyword search for secure cloud storage. IEEE Trans. Inf. Forensics Secur. 2016, 11, 789–798.
  29. Yang, Y.; Zheng, X.; Chang, V.; Tang, C. Semantic keyword searchable proxy re-encryption for postquantum secure cloud storage. Concurr. Comput. Pract. Exp. 2017, 29, e4211.
  30. Wu, L.; Chen, B.; Zeadally, S.; He, D. An efficient and secure searchable public key encryption scheme with privacy protection for cloud storage. Soft Comput. 2018, 22, 7685–7696.
  31. Yin, H.; Zhang, J.; Xiong, Y.; Ou, L.; Li, F.; Liao, S.; Li, K. CP-ABSE: A Ciphertext-Policy Attribute-Based Searchable Encryption Scheme. IEEE Access 2019, 7, 5682–5694.
  32. Lindell, Y.; Katz, J. Introduction to Modern Cryptography; Chapman and Hall/CRC: Boca Raton, FL, USA, 2014.
Figure 1. Computation cost in the storage phase.
Figure 2. Computation cost in the trapdoor phase.
Figure 3. Computation cost in the search phase.
Table 1. A comparison of some existing searchable encryption schemes.
| Type | Limitation | Characteristic | Literature |
|---|---|---|---|
| SSE | need key distribution | masked index | [13,14] |
| | | boolean queries | [16] |
| | | against the coercer | [18] |
| | | publicly verifiable | [19] |
| PEKS | lower search efficiency | without bilinear pairing | [21] |
| | | without secure channel | [22] |
| | | keyword update | [27] |
| | | against inside KGA | [28] |
| | | synonym keyword search | [29] |
| | | fine-grained access control | [31] |
Table 2. Summary of notations.
| Notation | Description |
|---|---|
| $\lambda$ | The security parameter |
| $G$ | A cyclic group of order $q$ |
| $g$ | A generator of $G$ |
| $\mathrm{negl}(\lambda)$ | A negligible function with respect to $\lambda$ |
| $(pk_o, sk_o)$ | The public/private key pair of the data owner |
| $(pk_u, sk_u)$ | The public/private key pair of the data user |
| $n$ | The number of files of the data owner |
| $F_i$ | The $i$-th file of the data owner ($1 \le i \le n$) |
| $n_f + 1$ | The number of file feature levels |
| $l$ | A file feature level ($0 \le l \le n_f$) |
| $L_i$ | The set of authorized file feature levels of $F_i$ |
| $F_i^l$ | The file feature of $F_i$ at level $l$ |
| $F$ | The file feature set $\{F_i^l : 1 \le i \le n,\ 0 \le l \le n_f\}$ |
| $F$ | The encrypted file feature set |
| $W_{l_0}$ | The keyword set of the file feature set $\{F_i^{l_0} : 1 \le i \le n\}$ |
| $w$ | A keyword in $W_{l_0}$ |
| $Ind$ | The index set |
| $Ind$ | The encrypted index set |
| $T_{w,l}$ | The trapdoor with respect to $w$ and $l$ |
Table 3. Computation cost: a comprehensive comparison.
| Scheme | Storage Phase | Trapdoor Phase | Search Phase |
|---|---|---|---|
| Boneh et al. [7] | $T_{bp} + 2T_h + 2T_{exp}$ | $T_h + T_{exp}$ | $T_{bp} + T_h$ |
| Rhee et al. [24] | $T_{bp} + 2T_h + 2T_{exp}$ | $2T_h + 3T_{exp}$ | $T_{bp} + 2T_h + 2T_{exp} + T_{mul}$ |
| Xu et al. [26] | $2T_{bp} + 4T_h + 4T_{exp}$ | $2T_h + 2T_{exp}$ | $2T_{bp} + 2T_h$ |
| Chen et al. [28] | $T_h + 4T_{exp} + 2T_{mul}$ | $T_h + 4T_{exp} + 2T_{mul}$ | $7T_{exp} + 3T_{mul}$ |
| Our scheme | $T_{exp} + 3T_h + 5T_{enc}$ | $T_{exp} + 3T_h + 2T_{enc}$ | $T_{exp} + T_h + 2T_{dec}$ |

Share and Cite

MDPI and ACS Style

Li, S.; Li, M.; Xu, H.; Zhou, X. Searchable Encryption Scheme for Personalized Privacy in IoT-Based Big Data. Sensors 2019, 19, 1059. https://doi.org/10.3390/s19051059

AMA Style

Li S, Li M, Xu H, Zhou X. Searchable Encryption Scheme for Personalized Privacy in IoT-Based Big Data. Sensors. 2019; 19(5):1059. https://doi.org/10.3390/s19051059

Chicago/Turabian Style

Li, Shuai, Miao Li, Haitao Xu, and Xianwei Zhou. 2019. "Searchable Encryption Scheme for Personalized Privacy in IoT-Based Big Data" Sensors 19, no. 5: 1059. https://doi.org/10.3390/s19051059

APA Style

Li, S., Li, M., Xu, H., & Zhou, X. (2019). Searchable Encryption Scheme for Personalized Privacy in IoT-Based Big Data. Sensors, 19(5), 1059. https://doi.org/10.3390/s19051059

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
