Article

A Secure and Efficient Dynamic Analysis Scheme for Genome Data within SGX-Assisted Servers

Software College, Northeastern University, Shenyang 110169, China
*
Author to whom correspondence should be addressed.
Electronics 2023, 12(24), 5004; https://doi.org/10.3390/electronics12245004
Submission received: 14 November 2023 / Revised: 11 December 2023 / Accepted: 12 December 2023 / Published: 14 December 2023
(This article belongs to the Special Issue Security and Trust in Internet of Things and Edge Computing)

Abstract

With the rapid development of the Internet of Things (IoT), more and more user devices access the network and generate large amounts of genome data. These genome data possess significant medical value when researched. However, traditional genome analysis confronts security and efficiency challenges, including access pattern leakage, low efficiency, and single analysis methods. Thus, we propose a secure and efficient dynamic analysis scheme for genome data within a Software Guard Extension (SGX)-assisted server, called SEDASGX. Our approach involves designing a secure analysis framework based on SGXs and implementing various analysis methods within the enclave. The access pattern of genome data is always obfuscated during the analysis and update process, ensuring privacy and security. Furthermore, our scheme not only achieves higher analysis efficiency but also enables dynamic updating of genome data. Our results indicate that the SEDASGX analysis method is nearly 2.5 times more efficient than non-SGX methods, significantly enhancing the analysis speed of large-scale genome data.

1. Introduction

Cloud computing platforms [1] offer elastic storage space and strong computing power for gene data. With the rapid development of e-healthcare technologies, the pool of genetic data collected from distributed healthcare devices and centers is growing explosively, and the collected data are exposed to and distributed among multiple healthcare devices or centers. However, genomes can range anywhere from 4000 bases to 670 Gb, and they carry important private information. For example, humans have two copies of their inherited genome of 3.2 Gb each. Genomes are stored in the VCF file format, one of the important file formats in the biomedical domain because of its critical role in describing DNA and RNA variants. VCF can describe single- and multi-nucleotide polymorphisms (SNPs and MNPs), insertions and deletions (INDELs), and simple structural variants (SVs) against a reference genome [2]. The most common mutation in the human population is the single nucleotide polymorphism (SNP), a variation in a single nucleotide at a particular position of the genome. About 5 million SNPs are observed per individual, and sensitive information about individuals (such as disease predispositions) is typically inferred by analyzing the SNPs. Securely sharing genome data and efficiently analyzing them in the IoT environment is needed to solve the problem of information islands [3]. Therefore, the designed scheme must ensure not only the privacy and security of genome data, but also the security and efficiency of the genome data analysis process. The main reason is that personal genome data can carry sensitive information, including information that can reveal the identity of the owner [4] and even facial features [5]. For example, Claes et al. developed a 3D model of human faces based on gender, ancestral genomes, and facial features [5], highlighting the potential risks of sharing sensitive genetic data.
Unfortunately, while traditional encryption algorithms, such as homomorphic encryption (HE) [6,7,8,9] and secure multi-party computing (SMC) [10,11,12] can ensure the confidentiality of gene data, they cannot be applied to massive genome data scenarios due to high computational overheads and low computational efficiency. The emergence of trusted execution environments (TEEs), such as Intel SGX, has made it possible to operate a device with genome data in a trusted, isolated region called an enclave. Thus, the TEE brings neither high computational overheads nor restrictions based on software technology and makes it possible to securely and efficiently analyze massive genome data.
Nevertheless, existing schemes have various drawbacks. Most traditional genetic data analysis schemes [6,7,8,9,10,11,12] are based on homomorphic encryption and secure multi-party computation. These schemes suffer from excessive communication cost between the client and the server. In addition, homomorphic encryption and secure multi-party computation remain inefficient in practice when computing over large-scale data. Trusted execution environments alleviate these efficiency issues to some extent, but many TEE-based schemes [13,14,15] still have shortcomings in aspects such as data access patterns, single analysis methods, dynamic updates, and multi-user access control. For example, Chen et al. first proposed a secure outsourcing genetic testing framework based on SGX in [13]. Mandal et al. built a practical, private, data-oblivious genome variant search using Intel SGX in [14]. Kockan et al. proposed a hardware–software hybrid approach, SkSES, to perform statistical tests on genomic data presented as VCF files from different countries in [15]. Previous schemes have not addressed issues such as data access pattern leakage, single analysis methods, and dynamic data updates during multi-user analysis in IoT scenarios. Thus, we propose a secure and efficient dynamic analysis scheme for genome data within an SGX-assisted server. The main contributions of SEDASGX are summarized as follows.
  • SEDASGX provides a multi-party genome data analysis architecture for edge computing scenarios based on Intel SGX. This architecture uses the AES-GCM algorithm and the attributes of SGX to ensure the confidentiality and integrity of the code, genetic data, and analysis results. In addition, terminal devices do not need SGX hardware support to upload and analyze data, which reduces the hardware requirements for end users.
  • SEDASGX constructs an oblivious data storage structure based on Path ORAM to avoid leaking the access patterns of genome data during analysis and update, and encrypts the data with a tamper-proof encryption algorithm (AES-GCM) to guarantee the confidentiality and integrity of gene data, as well as the correctness of the analysis results.
  • SEDASGX utilizes various genome analysis methods and dynamic updates. Additionally, the identity encryption technology based on the SGX security analysis architecture ensures that during the analysis process users cannot obtain each other’s analysis results.

2. Related Work

Privacy of genomic data has recently become a very hot research topic. Several privacy-preserving schemes have been proposed for processing of genomic data in different secure aspects. In the following, we present the detailed classifications of state-of-the-art work.
  • Homomorphic encryption-based schemes. Kim et al. [6] used homomorphic encryption to encrypt a DNA sequence and conduct a secure evaluation of the $\chi^2$ test over the encrypted data. Sarkar et al. [7] proposed privacy-preserving genotype imputation using machine learning and Paillier homomorphic encryption. Wang et al. [8] designed a homomorphic exact logistic regression algorithm aimed at reducing the computational and storage costs. Blatt et al. [9] presented a privacy-preserving framework based on several advances in homomorphic encryption and demonstrated that it can perform an accurate GWAS analysis for a real dataset of more than 25,000 individuals, keeping all individuals' data encrypted and requiring no user interactions.
  • Secure multi-party computation (SMC)-based schemes. Kamm et al. [10] proposed secretly sharing the sensitive data among several parties and computing GWAS over the distributed data. Dong et al. [11] proposed a secure and efficient GWAS scheme. Zhu et al. [12] proposed a privacy-preserving framework for conducting genome-wide association studies over outsourced patient data.
  • Hardware-based schemes. Chen et al. [13] presented one of the first implementations of a Software Guard Extension (SGX)-based securely outsourced genetic testing framework, which leveraged multiple cryptographic protocols and a minimal perfect hash scheme to enable efficient and secure data storage and computation outsourcing. Mandal et al. [14] built a memory oblivious structure to search genome variants using Intel SGX. Kockan et al. [15] proposed SkSES, which employs sketch algorithm, data compression, and population stratification reduction methods to reduce the memory consumption.

3. Preliminaries

3.1. Intel SGX

Intel SGX [16,17,18,19,20] is a set of new instructions and modifications to the memory access architecture of Intel CPUs. Figure 1 summarizes the main features of SGX: memory isolation and remote attestation.
  • Memory Isolation. When a program runs on an SGX-enabled platform, it is divided into two parts: an untrusted storage region and a trusted isolated region (enclave). An enclave is a separate block of physical RAM that cannot be accessed by other applications, privileged software, the OS, the hypervisor, or the firmware on the system (Figure 1(1)). The enclave is used to protect sensitive data and code. When an SGX program is suspended or closed, the untrusted storage region is mainly used to store the encrypted sensitive data moved out of the enclave.
  • Remote Attestation. SGX provides cryptographic verification that an enclave is running securely on a remote server platform. When an enclave is created, the app enclave generates a set of claims (i.e., Key and REPORT), and an SGX component, the QE (Quoting Enclave), generates an attestation signature of the report by using the EGETKEY instruction (Figure 1((2)-6)). The QE returns the signature to the App (Figure 1((2)-7)). After that, the App sends the signature and key to the verifier (Figure 1((2)-8)). Finally, the verifier checks the signature using the Intel Attestation Service (IAS). In particular, the QE only accepts measurements from trusted hardware, and the hardware guarantees that only enclaves that have been correctly created can be measured. Furthermore, a secure channel between the enclave and the client can be established via ECDH [21] and ECDSA [22]. This trusted channel is used to share the secret between the enclave and the client.

3.2. Oblivious RAM

Oblivious RAM (ORAM) algorithms allow a user to hide the access pattern of data that are accessed on a remote server by continuously shuffling and re-encrypting them. An adversary can observe the physical locations of the data accessed, but the ORAM algorithms ensure that the adversary learns nothing about the true access pattern during frequent accesses between the enclave and the storage. Next, we give a definition of the access pattern, and more details are given in [23].
Definition 1 (Access Pattern). 
Let $x = (q_1, q_2, \ldots, q_n)$ denote an effective data request sequence of length $n$, where $q_i = (op_i, id_i, data_i)$ and $op_i$ denotes a read($id_i$) or a write($id_i$, $data_i$) operation. Additionally, $id_i$ represents the identifier of a data block, and $data_i$ represents the gene data written into the data block with identifier $id_i$.

3.3. Genome Analysis Methods

  • Chi-square statistics are mainly used to study the association of gene mutations in genome data. Because human beings are diploid and have two copies of all (non-sex) chromosomes, each person has one of the genotypes aa, ab (ba), or bb at each locus. For example, we sample the genomes of $N$ individuals for a particular single nucleotide variant (SNV), of which some have a particular disease (cases) and the rest do not (controls). The genotypes aa, ab (ba), and bb are encoded as 0, 1, and 2, respectively. Thus, the genotypes at locus $j$ can be represented by a vector $g \in \{0, 1, 2\}^N$, whose $i$-th entry corresponds to the number of copies of an allele that person $i$ carries at locus $j$. Let $y \in \{0, 1\}^N$ be a vector that represents the case/control status: $y_i = 1$ if the $i$-th person has the disease, and $y_i = 0$ if he/she does not. More details are described in [24]. As we will see, the values in Table 1 are sufficient to compute the $\chi^2$ statistic using the following Equation (1).
    $$\chi^2 = \sum_{i \in \{0,1\}} \sum_{j \in \{0,1\}} \frac{\left(m_{ij} - \frac{c_j \times r_i}{2N}\right)^2}{\frac{c_j \times r_i}{2N}} \tag{1}$$
    Table 1. Genotype table.
    Genotype             | a        | b        | Sum
    Cases ($y_i = 1$)    | $m_{00}$ | $m_{01}$ | $r_0$
    Controls ($y_i = 0$) | $m_{10}$ | $m_{11}$ | $r_1$
    Sum                  | $c_0$    | $c_1$    | $2N$
  • Fisher’s exact test [25] is a statistical test used to determine whether there is a nonrandom association between two categorical variables. It is generally used to determine whether a gene locus is statistically associated with a factor, and it is more accurate than the Chi-square test. As preparation for extending to $R \times C$ contingency tables, the cell counts in $2 \times 2$ tables are denoted by $\{m_{ij}\}$ for $i = 0, 1$ and $j = 0, 1$. Given Table 1, the probability of the observed table is given by Equation (2). In the general case, the indices are extended to $i = 0, 1, \ldots, R$ and $j = 0, 1, \ldots, C$, with row margins $\{r_0, r_1, \ldots, r_R\}$ and column margins $\{c_0, c_1, \ldots, c_C\}$, and the formula is extended to Equation (3).
    $$p = \frac{c_0!\, c_1!\, r_0!\, r_1!}{(2N)!\, m_{00}!\, m_{01}!\, m_{10}!\, m_{11}!} \tag{2}$$
    $$p = \frac{\prod_i (r_i!) \prod_j (c_j!)}{(2N)! \prod_i \prod_j (m_{ij}!)} \tag{3}$$
  • Logistic regression [26] methods are commonly used in statistical analysis. They are also applied to genetic association studies, in which a large number of genetic markers serve as predictor variables for a dichotomous outcome, e.g., case/control status. Given a dichotomous phenotype vector $Y$ of $m$ observations and a matrix of single nucleotide polymorphism (SNP) genotypes $X$, let $p = P(Y = 1 \mid X = x)$. The likelihood function is:
    $$L = \prod_{Y = 1} p \prod_{Y = 0} (1 - p) \tag{4}$$
    where
    $$p = \frac{1}{1 + e^{-(\alpha + \beta X)}} \tag{5}$$
    and $\beta$ is the vector of coefficients.
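The three statistics above can be checked on small inputs. The following is a minimal Python sketch, not the authors' enclave implementation: the function names are hypothetical, the contingency table is passed as a list of rows `m[i][j]` laid out as in Table 1, and the logistic model is restricted to a single scalar predictor for brevity. Note that `fisher_table_probability` evaluates Equations (2)/(3), i.e., the probability of the observed table only.

```python
from math import exp, factorial


def margins(m):
    """Return row sums r_i, column sums c_j, and the grand total 2N of a table."""
    r = [sum(row) for row in m]
    c = [sum(col) for col in zip(*m)]
    return r, c, sum(r)


def chi_square(m):
    """Chi-square statistic of Equation (1): sum of (observed - expected)^2 / expected."""
    r, c, total = margins(m)
    stat = 0.0
    for i, row in enumerate(m):
        for j, observed in enumerate(row):
            expected = c[j] * r[i] / total
            stat += (observed - expected) ** 2 / expected
    return stat


def fisher_table_probability(m):
    """Probability of the observed R x C table, as in Equations (2) and (3)."""
    r, c, total = margins(m)
    numerator = 1
    for x in r + c:
        numerator *= factorial(x)
    denominator = factorial(total)
    for row in m:
        for cell in row:
            denominator *= factorial(cell)
    return numerator / denominator


def logistic_likelihood(alpha, beta, xs, ys):
    """Likelihood of Equations (4) and (5) for one scalar predictor per observation."""
    likelihood = 1.0
    for x, y in zip(xs, ys):
        p = 1.0 / (1.0 + exp(-(alpha + beta * x)))
        likelihood *= p if y == 1 else 1.0 - p
    return likelihood
```

For instance, `chi_square([[10, 20], [30, 40]])` evaluates the four cells of a table with margins 30, 70 (rows) and 40, 60 (columns).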

3.4. Identity-Based Encryption (IBE)

The formal notion of an identity-based encryption scheme was developed in [27]. An IBE scheme Π contains four algorithms: Setup, KeyGen, Enc, and Dec.
  • Setup($1^\lambda$) → ($pk$, $msk$). This algorithm takes as input a security parameter $\lambda$. It outputs the public parameter $pk$ and a private master key $msk$.
  • KeyGen($pk$, $msk$, $id$) → $sk_{id}$. This algorithm takes as input the public parameter $pk$, the private master key $msk$, and an identity $id$. It outputs the private key $sk_{id}$ of $id$.
  • Enc($pk$, $id$, $m$) → $C$. This algorithm takes as input the public parameter $pk$, an identity $id$, and a message $m$. It outputs the ciphertext $C$ for the identity $id$.
  • Dec($sk_{id}$, $C$) → $m$. This algorithm takes as input a private key $sk_{id}$ and the ciphertext $C$. It outputs the message $m$.

4. SEDASGX Scheme

The secure and efficient dynamic analysis scheme for genome data within SGX-assisted servers is composed of seven polynomial-time algorithms, namely SEDASGX = (Setup, Enc, Preprocess, Init, Analysis, Dec, Update). In this section, we present the system model, the notations and definitions, and the constructions.

4.1. System Model

As shown in Figure 2, SEDASGX consists of five entities, i.e., analysis users (AU), patients (P), edge servers (ES), a cloud server (CS), and an authority (AUT). Their respective tasks are described as follows.
  • The AU generates encrypted analysis queries and sends them to the CS. Additionally, the AU decrypts the query results.
  • The P encrypts genome data and uploads them to the ES. Meanwhile, the P sends the update queries to the ES.
  • The ES is divided into two parts: enclave and storage region. The enclave preprocesses all genomic data within its jurisdiction. After that, the enclave encrypts the processed data and sends them to the CS. The storage region mainly stores source data.
  • The CS is divided into two parts: enclave and storage region. The enclave performs the initialization, genome analysis, and update operations. The storage region mainly stores all encrypted data and the oblivious storage structure.
  • The AUT is primarily responsible for remotely verifying the trusted execution environment of all edge servers and the cloud server. Furthermore, the AUT is also responsible for generating keys for each entity within the system and distributing them through secure channels.
As shown in Figure 2, in step (1) the AUT executes a setup operation to generate the system master key pair and a secret key for each entity, and builds the secure channels by performing remote attestation with the CS and each ES. In step (2), the P encrypts genome data and uploads them to the ES. In step (3), the ES receives all genome data within its jurisdiction and performs a preprocessing operation within the enclave. Then, each ES encrypts the processed genome data and sends them to the CS. After that, in step (4), the enclave of the CS uses the processed genome data to construct the oblivious data structures, a position map table, and a stash through an initialization operation. Then, the AU generates encrypted analysis queries and sends them to the enclave. In step (5), the enclave decrypts the encrypted analysis (update) queries and performs the genome analysis (update) by loading the oblivious data structures. In step (6), the AU receives the encrypted analysis results returned by the CS and decrypts them. Finally, in step (7), the P sends updates to the ES.

4.2. Notations and Definitions

We summarize the notations used in SEDASGX in Table 2 and give two definitions as follows.
Definition 2 (Correctness). 
The SEDASGX scheme is correct if the following holds. First, the AUT runs the Setup algorithm to generate the public key $pk$ and the master key $msk$. For the genome data $M_{ij}$, $1 \le i \le N$, $1 \le j \le M$, the P executes the Enc algorithm to generate encrypted genome data $C_{ij}$ and sends them to the ES. Then, the enclave of the ES performs the Preprocess algorithm to generate fixed-size data blocks $M_i^{\eta_i}$ and encrypts $M_i^{\eta_i}$ to $C_i^{\eta_i}$. After that, the enclave of the CS calls the Init algorithm to generate encrypted ORAM trees $T_i$, position map tables $PM_i$, and stashes $S_i$ by using $C_i^{\eta_i}$. Given an analysis request $C_q$, the ciphertext analysis result $C_R$ can always be decrypted into the corresponding plaintext analysis result and can be successfully verified by the Dec algorithm (AES-GCM).
Definition 3 (Query Unlinkability). 
Let $q = (q_1, q_2, \ldots, q_n)$ denote a set of analysis sequences with the same key and length. If any two analysis queries $q_i$ and $q_j$ are computationally indistinguishable, the query pattern of SEDASGX is secure.

4.3. Constructions

We now give the detailed construction of each algorithm in the SEDASGX.
Setup($1^\lambda$) → ($pk$, $msk$, $sk_{ij}$, $sk_i$, $sk_k$, $sk_E$). Taking a security parameter $\lambda$ as input, the enclave of the AUT generates a group $G$ of order $p$, where $g$ is a random generator of $G$. The enclave picks a random $\alpha \in_R Z_p$ and selects a collision-resistant hash function $H: \{0, 1\}^* \to G$. Meanwhile, the AUT generates unique identities $id_{ij}$, $id_k$, $id_i$, and $id_E$ for the P, AU, ES, and CS, and generates the private keys $sk_{ij} = H(id_{ij})^\alpha$, $sk_k = H(id_k)^\alpha$, $sk_i = H(id_i)^\alpha$, and $sk_E = H(id_E)^\alpha$ from these unique identities, respectively, where $1 \le i \le N$, $1 \le j \le M$, and $1 \le k \le \varsigma$. Finally, the AUT publishes the public key $pk = (G, p, g, H, id_{ij}, id_i, id_k, id_E)$ of the system and keeps the master key $msk = \alpha$ secret. Meanwhile, the AUT sends the private keys $sk_{ij}$, $sk_i$, $sk_k$, and $sk_E$ to the P, the ES, the AU, and the CS through the secure channel, respectively.
Enc($m_{ij}$, $iv_{ij}$, $aad_{ij}$, $sk_{ij}$) → ($C_{ij}$, $tag_{ij}$). This algorithm takes as input the data $m_{ij}$, the initial vector $iv_{ij}$, the additional authentication data $aad_{ij}$, and the data key $sk_{ij}$. It outputs the ciphertext $C_{ij}$ and the authentication tag $tag_{ij}$. Then, each P uploads $C_{ij}$ and $tag_{ij}$ to the ES. Note that Enc is the AES-GCM encryption algorithm.
Preprocess($m_{ij}$, $Size$) → $M_i^{\eta_i}$. This algorithm takes genome data $m_{ij}$ and a predetermined size $Size$ as input. It outputs a set of fixed-size data blocks $M_i^{\eta_i}$, where $i$ denotes the $i$-th edge server and $\eta_i$ represents the total number of data blocks preprocessed by the $i$-th ES, $1 \le j \le M$, $1 \le i \le N$. In particular, $\eta_i = \sum_{j=1}^{M} \eta_{ij}$, where $j$ denotes the $j$-th patient under the $i$-th ES and $\eta_{ij}$ represents the number of data blocks obtained after $m_{ij}$ is split. First, the enclave of each ES decrypts $C_{ij}$ and repeatedly divides all genome data $m_{ij}$ into fixed-size data blocks $M_i^{\eta_i}$. If the last remaining piece of data is smaller than the predetermined size, it is padded with random data to reach the required size. Finally, the enclave encrypts $M_i^{\eta_i}$ to $C_i^{\eta_i}$ and sends them to the CS.
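The splitting and padding step of Preprocess can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function name `preprocess` and the byte-string representation of the genome data are hypothetical, and decryption of $C_{ij}$ and re-encryption of the blocks are omitted.

```python
import os


def preprocess(genome: bytes, size: int) -> list[bytes]:
    """Split genome data into blocks of exactly `size` bytes.

    The last block is filled with random bytes when the remaining data
    is shorter than `size`, as described for the Preprocess algorithm.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    blocks = [genome[i:i + size] for i in range(0, len(genome), size)]
    if blocks and len(blocks[-1]) < size:
        blocks[-1] += os.urandom(size - len(blocks[-1]))
    return blocks
```

Because the padding is random, a real implementation would presumably need to record the original length (or a per-block payload length) so that the padding can be removed later; the text does not specify how this is done.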
Init($C_i^{\eta_i}$, $sk_E$, $Z$, $sk_i$) → ($T_i$, $PM_i$, $S_i$). This algorithm takes all encrypted data blocks $C_i^{\eta_i}$, the ORAM tree structure key $sk_E$, the node capacity $Z$, and the data key $sk_i$ of the ES as input. It outputs the position map tables $PM_i$, stashes $S_i$, and encrypted ORAM trees $T_i$, $1 \le i \le N$, $1 \le j \le M$, according to Algorithm 1.
  • First, the enclave of the CS calculates the sum $\eta_i$ of all data blocks in each ES. Then, the enclave computes $N_i$, $L_i$, $psize_i$, and $ssize_i$ using Equations (6) and (7), where $Pow$ is a function that finds the power of 2 closest to $\eta_i$.
    $$\eta_i = \sum_{j=1}^{M} \eta_{ij}; \quad N_i = Pow(\eta_i); \quad L_i = \lceil \log_2(N_i + 1) \rceil - 1 \tag{6}$$
    $$psize_i = N_i Z; \quad ssize_i = (L_i + 1) Z \tag{7}$$
  • Second, the enclave creates the position map tables $PM_i$ and stashes $S_i$ according to $psize_i$ and $ssize_i$. Meanwhile, the enclave randomly generates $ssize_i$ dummy genome data blocks and encrypts them with $sk_E$. Then, the enclave writes them to the ORAM tree $T_i$ according to $PM_i$.
  • Finally, the enclave derives $sk_i$ from $\alpha$ and uses it to decrypt $C_i^{\eta_i}$ to obtain $M_i^{\eta_i}$. The enclave re-encrypts $M_i^{\eta_i}$ to $C_i^{\eta_i *}$ in the $S_i$ and writes them to $T_i$ according to the updated $PM_i$.
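As a worked example of the sizing in Equations (6) and (7), the sketch below computes the ORAM parameters for one edge server. It rests on two assumptions that are not confirmed by the text: `Pow` is taken to be the smallest power of two that is at least $\eta_i$, and the bracket in the expression for $L_i$ is read as a ceiling. The function names are hypothetical.

```python
def pow2_at_least(n: int) -> int:
    """Smallest power of two >= n (assumed reading of Pow)."""
    return 1 if n <= 1 else 1 << (n - 1).bit_length()


def oram_sizes(blocks_per_patient, bucket_capacity):
    """Compute eta_i, N_i, L_i, psize_i, ssize_i for one edge server."""
    eta = sum(blocks_per_patient)          # eta_i = sum_j eta_ij
    n = pow2_at_least(eta)                 # N_i = Pow(eta_i)
    # For n >= 1, n.bit_length() equals ceil(log2(n + 1)),
    # so this is L_i = ceil(log2(N_i + 1)) - 1 in integer arithmetic.
    levels = n.bit_length() - 1
    return {
        "eta": eta,
        "N": n,
        "L": levels,
        "psize": n * bucket_capacity,             # position map size
        "ssize": (levels + 1) * bucket_capacity,  # stash size: one full path
        "leaves": 2 ** levels,                    # maxpa in Algorithm 1
    }
```

With three patients contributing 3, 4, and 5 blocks and $Z = 4$, this gives $\eta_i = 12$, $N_i = 16$, $L_i = 4$, $psize_i = 64$, and $ssize_i = 20$.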
Query($sk_k$, $q$) → $C_q$. This algorithm takes as input the query key $sk_k$ of the $k$-th analysis user and an analysis query $q$. It outputs the encrypted query $C_q$.
  • To generate a tailored analysis query $q = H(\mu \| \nu \| pos \| fisher)$, the $k$-th AU first selects the data from various ES ($\mu$ and $\nu$, $1 \le \mu, \nu \le N$), a certain gene locus $pos$, and a certain analysis method ($fisher$, $Chi\text{-}square$, or $LR$) based on their analysis requirements.
  • Subsequently, the AU uses their query key $sk_k$ to generate an encrypted analysis query $C_q$ with the Enc algorithm and sends $C_q$ to the CS.
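The plaintext layout of an analysis query can be illustrated with a small Python sketch. The class name, the `||` text delimiter, and the method labels are assumptions made for illustration only; the hash $H$ and the AES-GCM encryption under $sk_k$ described above are deliberately left out, so this shows only how the fields $\mu$, $\nu$, $pos$, and the method could be packed and recovered.

```python
from dataclasses import dataclass

# Hypothetical labels for the three supported analysis methods.
METHODS = ("fisher", "chi-square", "LR")


@dataclass(frozen=True)
class AnalysisQuery:
    mu: int       # index of the first edge server
    nu: int       # index of the second edge server
    pos: int      # gene locus to analyze
    method: str   # one of METHODS

    def encode(self) -> bytes:
        """Serialize as mu||nu||pos||method (plaintext, before encryption)."""
        if self.method not in METHODS:
            raise ValueError(f"unknown analysis method: {self.method}")
        fields = [str(self.mu), str(self.nu), str(self.pos), self.method]
        return "||".join(fields).encode("ascii")

    @staticmethod
    def decode(raw: bytes) -> "AnalysisQuery":
        """Inverse of encode, as the enclave would do after decryption."""
        mu, nu, pos, method = raw.decode("ascii").split("||")
        return AnalysisQuery(int(mu), int(nu), int(pos), method)
```

A query such as `AnalysisQuery(1, 3, 12345, "fisher")` round-trips through `encode` and `decode` unchanged.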
Algorithm 1: Init Algorithm
Input: ORAM structure key $sk_E$, encrypted data blocks $C_i^{\eta_i}$, data key $sk_i$ of the ES, node capacity $Z$.
Output: The position maps $PM_i$, stashes $S_i$, and encrypted ORAM trees $T_i$, $1 \le i \le N$.
Enclave:
1 Decrypt $C_i^{\eta_i}$ to obtain $M_i^{\eta_i}$, $\eta_i = \sum_{j=1}^{M} \eta_{ij}$, $1 \le i \le N$, $1 \le j \le M$;
2 For $1 \le i \le N$ do
3   Compute $N_i = Pow(\eta_i)$, $L_i = \lceil \log_2(N_i + 1) \rceil - 1$, $ssize_i = (L_i + 1) Z$, $psize_i = N_i Z$, $maxpa = 2^{L_i}$;
4   Initialize an $S_i = \{c_i, id_i, pos_i, pa_i, n_i, st_i\}$ of size $ssize_i$ and $PM_i = \{id_i, pos_i, pa_i, n_i\}$ of size $psize_i$;
5   For $1 \le t \le \beta_i = \lceil psize_i / ssize_i \rceil \cdot ssize_i$ do
6     Generate a dummy block $m_i^t$, $id_i^t = t$, $pa_i^t = Random(2^{L_i})$, $pos_i^t = random$, and $n_i^t = i$;
7     Copy $m_i^t$, $id_i^t$, $pa_i^t$, $pos_i^t$, and $n_i^t$ to $S_i.c_i^t$, $S_i.id_i^t$, $S_i.pa_i^t$, $S_i.pos_i^t$, $S_i.n_i^t$ and set $S_i.st_i^t = 0$;
8     If $t \bmod ssize_i = 0$
9       Encrypt blocks in $S_i$, write them to $T_i$ by $PM_i$, and empty $S_i$;
10 For $1 \le i \le N$ do
11   Set $\beta_i = \lceil \eta_i / ssize_i \rceil \cdot ssize_i$ and generate $\beta_i - \eta_i$ dummy blocks $m_i^\iota$, $\eta_i < \iota \le \beta_i$;
12   For $1 \le j \le \beta_i$ do
13     Copy $j$, $M_i^{\eta_i}.pos$, $M_i^{\eta_i}.n$, $M_i^{\eta_i}.c$ to $S_i.id_i^j$, $S_i.pos_i^j$, $S_i.n_i^j$, $S_i.c_i^j$, and set $S_i.st_i^j = 1$, $S_i.pa_i^j = Random(maxpa)$;
14     $PM_i.id_i^j = j$, $PM_i.pa_i^j = S_i.pa_i^j$, $PM_i.n_i^j = S_i.n_i^j$, $PM_i.pos_i^j = S_i.pos_i^j$;
15     If $j \bmod ssize_i = 0$
16       Encrypt blocks in $S_i$, write them to $T_i$ by $PM_i$, and empty $S_i$;
17 Return ORAM trees $T_i$, position maps $PM_i$, and stashes $S_i$;
Analysis($sk_E$, $T_i$, $PM_i$, $S_i$, $sk_k$, $C_q$) → $C_R$. This algorithm takes the ORAM tree structure key $sk_E$, the encrypted ORAM tree $T_i$, the position map table $PM_i$, the stash $S_i$, the query key $sk_k$ of the $k$-th AU, and an encrypted query $C_q$ as inputs. It outputs the encrypted analysis result $C_R$ according to Algorithm 2.
  • First, the enclave decrypts $C_q$ to obtain $\mu$, $\nu$, $pos$, and the analysis method (e.g., $fisher$).
  • Second, the enclave acquires the corresponding ORAM trees $T_\mu$ and $T_\nu$ based on the values of $\mu$ and $\nu$, and subsequently reads the data from $T_\mu$ and $T_\nu$ into the stashes $S_\mu$ and $S_\nu$ by using the position map tables $PM_\mu$ and $PM_\nu$, respectively.
  • Finally, the enclave extracts the information required for the analysis from the data blocks that have been read, and calls the corresponding analysis algorithm (e.g., $fisher$) to analyze the genome data according to the query request and obtain the analysis result $R$. After that, the enclave encrypts $R$ with the corresponding query key $sk_k$ and sends $C_R$ to the $k$-th AU.
Algorithm 2: Analysis Algorithm
Input: Encrypted ORAM tree $T_i$, structure key $sk_E$, query key $sk_k$, encrypted analysis query $C_q$.
Output: Encrypted analysis result $C_R$.
AU:
1 Encrypt the analysis query $q$ to $C_q$ with query key $sk_k$;
Enclave:
2 Decrypt $C_q$ to $q = \mu \| \nu \| pos \| fisher$ with $sk_k$, and read $PM_\mu$ and $PM_\nu$ into the enclave;
3 For $1 \le i \le N$ do
4   For $1 \le j \le \beta_i$ do
5     Find $PM_\mu.id_\mu^j$ and $PM_\mu.pa_\mu^j$ corresponding to $PM_\mu.pos_\mu^j = pos$;
6     Read all blocks on $PM_\mu.pa_\mu^j$ in $T_\mu$ and decrypt them to $S_\mu$;
7   For $1 \le \omega \le ssize_i$ do
8     If $PM_\mu.id_\mu^j = S_\mu.id_\mu^\omega$
9       Get $S_\mu.c_\mu^\omega$, update $S_\mu.pa_\mu^\omega$ and $PM_\mu.pa_\mu^j$, and re-encrypt $S_\mu.c_\mu^\omega$;
10 Repeat steps 3–9 to obtain $S_\nu.c_\nu^\omega$ of $\nu$;
11 Compute the analysis result $R$ from $S_\mu.c_\mu^\omega$ of $\mu$ and $S_\nu.c_\nu^\omega$ of $\nu$ via Fisher’s exact algorithm;
12 Encrypt the analysis result $R$ to $C_R$ with $sk_k$ and send it to the $k$-th AU;
AU:
13 Decrypt the encrypted analysis result $C_R$ and verify correctness;
Dec($C_R$, $sk_k$, $tag_R$) → $R$. This algorithm takes as input an encrypted analysis result $C_R$, a query key $sk_k$, and an authentication tag $tag_R$. It outputs an analysis result $R$. In other words, the AU verifies and decrypts the analysis result $C_R$ via the AES-GCM algorithm.
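The path read, remapping, and write-back that Algorithm 2 performs on $T_\mu$, $PM_\mu$, and $S_\mu$ follow the general Path ORAM access procedure. The following Python sketch shows that procedure in its textbook form rather than the SEDASGX implementation: the class and attribute names are hypothetical, blocks are stored in plaintext (no AES-GCM), dummy blocks are not materialized, and the stash is an unbounded dictionary instead of a fixed array of size $ssize_i$.

```python
import random


class PathORAM:
    """Minimal Path ORAM: binary tree of buckets, position map, and stash."""

    def __init__(self, levels, bucket_size, rng=None):
        self.L = levels                    # leaves sit at depth L
        self.Z = bucket_size               # blocks per bucket
        self.rng = rng or random.Random()
        # Heap layout: root at index 0, children of k at 2k+1 and 2k+2.
        self.tree = [[] for _ in range(2 ** (levels + 1) - 1)]
        self.pos = {}                      # position map: block id -> leaf
        self.stash = {}                    # block id -> data
        self.trace = []                    # leaves touched (what the server sees)

    def _path(self, leaf):
        """Bucket indices from the root down to the given leaf."""
        node = (2 ** self.L - 1) + leaf
        path = [node]
        while node > 0:
            node = (node - 1) // 2
            path.append(node)
        return path[::-1]

    def access(self, op, block_id, data=None):
        leaf = self.pos.get(block_id)
        if leaf is None:                   # unknown block: read a random path
            leaf = self.rng.randrange(2 ** self.L)
        # Remap the block to a fresh random leaf before touching the tree.
        self.pos[block_id] = self.rng.randrange(2 ** self.L)
        path = self._path(leaf)
        self.trace.append(leaf)
        # Read every bucket on the path into the stash.
        for node in path:
            for bid, value in self.tree[node]:
                self.stash[bid] = value
            self.tree[node] = []
        result = self.stash.get(block_id)
        if op == "write":
            self.stash[block_id] = data
        # Write back from the leaf up, placing each block as deep as allowed.
        for depth in range(self.L, -1, -1):
            node = path[depth]
            fits = [bid for bid in self.stash
                    if self._path(self.pos[bid])[depth] == node][: self.Z]
            self.tree[node] = [(bid, self.stash.pop(bid)) for bid in fits]
        return result
```

Each call to `access` touches exactly one root-to-leaf path, and the leaf recorded in `trace` was chosen uniformly at random at the previous access of that block, which is the property the access pattern argument relies on.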
Update($T_i$, $PM_i$, $S_i$, $sk_E$, $C_i^t$, $sk_i$) → ($T_i$, $PM_i$, $S_i$). This algorithm takes as input the ORAM tree $T_i$, position map $PM_i$, stash $S_i$, ORAM tree structure key $sk_E$, updated encrypted data $C_i^t$, and data key $sk_i$. We assume that the updated data blocks have been preprocessed by the ES and already reside within the CS. Furthermore, given the vast volume of genomic data, we only address scenarios involving the modification of individual data blocks and the addition of a specific number of data blocks. See Algorithm 3 for details.
  • Case 1: modify a single data block. First, the enclave decrypts $C_i^1$ to obtain the updated data, e.g., $\mu$, $pos$, and $M_i^1$. Then, the enclave finds $pos$ in $PM_\mu$ to obtain $PM_\mu.pa_\mu^j$ and $PM_\mu.id_\mu^j$. After that, the enclave loads all data blocks on the path $PM_\mu.pa_\mu^j$ in $T_\mu$ and decrypts them into $S_\mu$. Next, the enclave searches for $PM_\mu.id_\mu^j$ in $S_\mu$ and replaces $S_\mu.c_\mu^j$ with the updated content $M_i^1$. Meanwhile, it updates $S_\mu.pa_\mu^j$ and copies it to $PM_\mu.pa_\mu^j$. Finally, the enclave re-encrypts all data blocks in $S_\mu$ and writes them back to $T_\mu$ according to the updated $S_\mu.pa_\mu^j$.
  • Case 2: add a small number of data blocks. Upon receipt of the updated data blocks $C_i^t$, $\eta_i + t \le psize_i$, uploaded by the ES, the enclave uses $sk_i$ to decrypt them block by block. Then, the enclave generates the updated ORAM tree $T_i$.
  • Case 3: add a massive number of data blocks. This is performed after receiving the encrypted data blocks $C_i^t$, $t > psize_i - \eta_i$, uploaded by the ES. Because the actual number of genome data blocks in the storage region exceeds the storage limit of the original ORAM tree, the enclave needs to call Algorithm 1 to regenerate a new ORAM tree $T_i$.
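The choice among the three cases depends only on the number of uploaded blocks $t$, the current block count $\eta_i$, and the position map capacity $psize_i$. A minimal Python sketch of that dispatch, using the thresholds of Algorithm 3, is given below; the function name and the returned labels are hypothetical.

```python
def update_case(t: int, eta: int, psize: int) -> str:
    """Select the update branch from the number of uploaded blocks t.

    t == 1                  -> modify a single block in place (Case 1)
    1 < t <= psize - eta    -> append into the existing ORAM tree (Case 2)
    t > psize - eta         -> rebuild the tree with the Init algorithm (Case 3)
    """
    if t < 1:
        raise ValueError("t must be at least 1")
    if t == 1:
        return "modify"
    if t <= psize - eta:
        return "append"
    return "rebuild"
```

For example, with $\eta_i = 12$ and $psize_i = 64$, up to 52 new blocks fit into the existing tree, and a 53rd forces regeneration.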
Algorithm 3: Update Algorithm
Input: Encrypted ORAM tree $T_i$, ORAM tree key $sk_E$, data key $sk_i$, encrypted update data blocks $C_i^t$, the position map $PM_i$, the stash $S_i$.
Output: Updated ORAM trees $T_i$, updated position map tables $PM_i$, updated stashes $S_i$.
ES:
1 Preprocess update data to $C_i^t$, and upload them to the CS;
Enclave:
2 Decrypt $C_i^t$ to $M_i^t$, $\mu$, and $pos_i^t$ with $sk_i$, and find $PM_\mu$;
3 For $1 \le i \le N$ do
4   If $t = 1$
5     For $1 \le j \le \beta_i$ do
6       Find $PM_\mu.id_\mu^j$ and $PM_\mu.pa_\mu^j$ corresponding to $PM_\mu.pos_\mu^j = pos_i^t$;
7       Read all blocks on $PM_\mu.pa_\mu^j$ in $T_\mu$ and decrypt them to $S_\mu$;
8     For $1 \le \omega \le ssize_i$ do
9       If $PM_\mu.id_\mu^j = S_\mu.id_\mu^\omega$
10         Update $S_\mu.pa_\mu^\omega$, $S_\mu.c_\mu^\omega$, and $PM_\mu$;
11   Else if $1 < t \le psize_i - \eta_i$
12     For $1 \le l \le t$ do
13       Update $PM_\mu.id_i^{\eta_i + l} = \eta_i + l$, $PM_\mu.pa_i^{\eta_i + l} = Random(2^{L_i})$, $PM_\mu.pos_i^{\eta_i + l} = pos_i^t$;
14     For $1 \le \xi \le \lceil t / ssize_i \rceil$ do
15       For $1 \le \omega \le ssize_i$ do
16         Copy $PM_\mu.id_i^{\eta_i + l}$, $PM_\mu.pa_i^{\eta_i + l}$, $M_i^l$ to $S_\mu.id_i^\omega$, $S_\mu.pa_i^\omega$, $S_\mu.c_i^\omega$ and set $S_\mu.st_i^\omega = 1$;
17       Encrypt $S_\mu$, write the blocks to ORAM $T_\mu$ by $S_\mu.pa_i^\omega$, and empty $S_\mu$;
18   Else ($t > psize_i - \eta_i$)
19     Regenerate new $T_\mu$, $PM_\mu$, and $S_\mu$ with $C_\mu^{\eta_i}$ and the updated data blocks $C_i^t$;
20 Return $T_i$, $PM_i$, and $S_i$

5. Security Analysis

Under the threat model assumed by Intel SGX itself (Figure 1(1)), only the CPU can securely access data in the enclave. Thus, the adversary can impersonate cloud server administrators (or the OS), other entities in the system, and external attackers.

5.1. Correctness

In SEDASGX, the AU , P , AUT , and enclave are trusted, and they can execute all algorithms correctly. The genome data separated from the enclave are encrypted by the AES-GCM tamper-proof encryption algorithm and stored in the untrusted memory. The correctness of the SEDASGX relies on the integrity of the data stored in the untrusted memory. Thus, the correctness of the SEDASGX depends on the AES-GCM tamper-proof encryption algorithm. Fortunately, the correctness of the AES-GCM algorithm has been proved in [28].

5.2. Query Unlinkability

When the adversary is an external attacker, the adversary cannot obtain the key due to the memory isolation feature of Intel SGX. The adversary cannot obtain any plaintext information about the query without the key. Therefore, it is only necessary to prove that the adversary cannot distinguish any two queries.
Theorem 1. 
SEDASGX can guarantee that the adversary cannot distinguish any two analysis queries that are generated from the same analysis content.
Proof. 
Given two analysis queries $q_i$ and $q_j$, the user randomly selects distinct initialization vectors $iv_i$ and $iv_j$ and encrypts the queries with the AES-GCM algorithm:
$$C_{q_i} = Enc(SC, iv_i, q_i, aad_i), \qquad C_{q_j} = Enc(SC, iv_j, q_j, aad_j)$$
The security of the AES-GCM encryption algorithm rests on the cryptographic assumption that the block cipher is a secure pseudo-random permutation. Because the initialization vector is random, even identical analysis content is encrypted into different ciphertexts. Thus, $C_{q_i}$ and $C_{q_j}$ are computationally indistinguishable. □
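
The role of the fresh initialization vector can be illustrated with a small sketch. It is not AES-GCM: the Python standard library has no AES, so the code substitutes a toy construction (HMAC-SHA256 as a keystream generator plus an HMAC tag over the IV, the additional data, and the ciphertext). All function names are hypothetical, and the construction is for illustration only, not for protecting real data.

```python
import hashlib
import hmac
import secrets


def _keystream(key, iv, length):
    """Toy keystream: HMAC-SHA256 over (iv || counter). Stand-in for AES-CTR."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hmac.new(key, iv + counter.to_bytes(4, "big"), hashlib.sha256).digest()
        counter += 1
    return out[:length]


def enc(key, iv, query, aad):
    """Encrypt a query and authenticate iv, aad, and ciphertext."""
    stream = _keystream(key, iv, len(query))
    ciphertext = bytes(a ^ b for a, b in zip(query, stream))
    tag = hmac.new(key, b"tag" + iv + aad + ciphertext, hashlib.sha256).digest()
    return ciphertext, tag


def dec(key, iv, ciphertext, tag, aad):
    """Verify the tag first; decrypt only if it matches."""
    expected = hmac.new(key, b"tag" + iv + aad + ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed")
    stream = _keystream(key, iv, len(ciphertext))
    return bytes(a ^ b for a, b in zip(ciphertext, stream))


if __name__ == "__main__":
    key = secrets.token_bytes(32)
    query = b"analysis query"          # identical content sent twice
    iv_i, iv_j = secrets.token_bytes(12), secrets.token_bytes(12)
    c_i, tag_i = enc(key, iv_i, query, b"aad")
    c_j, tag_j = enc(key, iv_j, query, b"aad")
    print("ciphertexts differ:", c_i != c_j)
    print("round trip ok:", dec(key, iv_i, c_i, tag_i, b"aad") == query)
```

Encrypting the same query under two different IVs yields unrelated ciphertexts, which is the property the proof relies on, and a modified ciphertext is rejected before decryption.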

5.3. Access Pattern

When the adversary is an administrator, SEDASGX uses the ORAM mechanism to avoid access pattern leakage caused by frequent access to memory due to genome analysis.
Theorem 2. 
Let AP($x$) represent the access pattern of the storage sequence for a given analysis query $x$. An ORAM is secure if (1) for any two analysis queries $x$ and $y$ of the same length, their access patterns AP($x$) and AP($y$) are computationally indistinguishable to anyone except the user and the enclave, and (2) the ORAM is correct, in that on input $x$ it returns data consistent with $x$ with probability at least $1 - negl(|x|)$, i.e., the ORAM may fail with probability $negl(|x|)$.
Proof. 
When a user sends an analysis request, the enclave each time loads all genome data blocks on a certain path $PM_i.pa_i^{\eta_i}$ of the ORAM tree $T_i$. To prove the security of the ORAM, let $Q = \{id_1, id_2, \ldots, id_{psize_i}\}$ be a block identifier sequence of size $psize_i$. The access pattern $p$ observed by the adversary is then
$$p = \{pos_{psize_i}[id_{psize_i}], \ldots, pos_1[id_1]\}$$
where $pos_\kappa[id_\kappa]$ is the position of the $\kappa$-th genome data block on a certain path. Every data block is encrypted with the AES-GCM algorithm. Thus, any two access pattern sequences are computationally indistinguishable, because the initialization vector $iv$ is generated randomly. Moreover, $PM_i$ is updated every time the enclave loads data, so any two positions $pos_{\kappa_1}[id_{\kappa_1}]$ and $pos_{\kappa_2}[id_{\kappa_2}]$ are statistically independent of each other for $\kappa_1 < \kappa_2$ and $id_{\kappa_1} = id_{\kappa_2}$. Likewise, $pos_{\kappa_1}[id_{\kappa_1}]$ and $pos_{\kappa_2}[id_{\kappa_2}]$ are statistically independent of each other for $\kappa_1 < \kappa_2$ and $id_{\kappa_1} \ne id_{\kappa_2}$. Applying Bayes' rule, we obtain Equation (10):
$$\Pr(p) = \prod_{\kappa=1}^{psize_i} \Pr(pos_\kappa[id_\kappa]) = \left(\frac{1}{2^{L_i}}\right)^{psize_i} \qquad (10)$$
This proves that AP( x ) is computationally indistinguishable from a random sequence of bit strings. The correctness of the ORAM was proven in detail in [23]. □
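
The independence argument can be checked numerically with a minimal sketch. It assumes a Path ORAM style position map in which a block is remapped to a uniformly random leaf after every access. The helper names `pattern_probability` and `simulate_accesses` are hypothetical, and the simulation models only the leaf revealed by each access, not the tree, the stash, or encryption.

```python
import random
from collections import Counter
from fractions import Fraction


def pattern_probability(height, accesses):
    """Pr(p) = (1 / 2**L) ** psize for independent, uniform leaf choices."""
    return Fraction(1, 2 ** height) ** accesses


def simulate_accesses(height, block_ids, rng):
    """Return the leaf revealed by each access, remapping the block afterwards."""
    leaves = 2 ** height
    position_map = {}
    observed = []
    for block_id in block_ids:
        if block_id not in position_map:
            # A block that was never accessed sits on a uniformly random path.
            position_map[block_id] = rng.randrange(leaves)
        observed.append(position_map[block_id])
        # Remap after the access so the next access is independent of this one.
        position_map[block_id] = rng.randrange(leaves)
    return observed


if __name__ == "__main__":
    print(pattern_probability(height=3, accesses=2))
    rng = random.Random(2023)
    # Access the same block repeatedly: the revealed leaves still look uniform.
    leaves_seen = simulate_accesses(3, [7] * 8000, rng)
    print(sorted(Counter(leaves_seen).items()))
```

Even when the same identifier is requested every time, the observed leaves are spread roughly evenly over all $2^{L}$ paths, which is why the adversary learns nothing from the sequence of paths.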

5.4. CCA Security

Theorem 3. 
If $\Pi_E$ is a CPA-secure encryption scheme and $\Pi_M$ is a message authentication code with a unique tag, then SEDASGX is a CCA-secure encryption scheme.
Proof. 
In the AES-GCM algorithm, plaintext data are encrypted using the AES-CTR mode, an authentication tag (MAC) is then generated through GHASH, and finally the ciphertext is obtained by an XOR operation. AES-CTR has been proved in [29] to satisfy CPA security.
Now suppose there exists an adversary $\mathcal{A}$ who can distinguish ciphertexts. $\mathcal{A}$ chooses two plaintexts $m_0$ and $m_1$ of the same length, $|m_0| = |m_1|$, and receives a ciphertext $c_b = Enc(sk, m_b)$, where $sk$ is a symmetric key and $b \in \{0, 1\}$ indicates which plaintext was encrypted. $\mathcal{A}$'s goal is to infer $b$ from $c_b$. To achieve this goal, $\mathcal{A}$ can construct two valid authentication tags (MACs) with different GHASH values and select one of these tags to attempt to match the ciphertext. However, since the GHASH function is collision-resistant, $\mathcal{A}$ cannot construct two valid tags with the same GHASH value. Therefore, SEDASGX, which encrypts with the AES-GCM algorithm under keys generated by the IBE scheme, is CCA secure. □

6. Experiment Analysis

This section presents the experimental evaluation of SEDASGX and compares it with a non-SGX server.

6.1. Implementation

We ran a series of experiments on a real-world genome dataset [30] to evaluate SEDASGX in terms of preprocessing, update operations, and analysis efficiency. The dataset comes from phase 3 of the 1000 Genomes Project, as hosted by the UCSC Genome Browser, and comprises 2504 samples on GRCh37. The 1000 Genomes Project uses advanced DNA sequencing technologies to analyze the genomes of a diverse set of individuals from various ethnic backgrounds. Its dataset features sample diversity, whole-genome sequencing, data accessibility, data quality control, and applicability to clinical and population genetics.
We implemented SEDASGX in C/C++ and Python on real SGX hardware, using the Intel SGX SDK 2.11 library and the SGX-OpenSSL 1.1.1 library for encryption and Setup, respectively. Genetic data preprocessing inside SGX was implemented with Python 3.11.5, and the performance of the algorithms was evaluated in SGX hardware debug mode. The experimental environment was a PC with an Intel® Core™ i7-10510U CPU (1.8 GHz × 8), 32 GB of memory, and the Ubuntu 20.04.3 LTS operating system.
Table 3 shows the size of the genome data blocks of each country under different edge servers and the size of the corresponding ORAM tree built on the cloud server.
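
The sizes in Table 3 can be cross-checked numerically. As an observation on the published numbers, not a formula stated in this section, each ORAM tree size equals $Z(2^{k} - 1)$ with bucket capacity $Z = 6$ and $k = \lceil \log_2(\text{blocks}) \rceil$ tree levels. The sketch below assumes exactly that relationship; `oram_tree_size` and `BUCKET_CAPACITY` are hypothetical names.

```python
BUCKET_CAPACITY = 6  # assumption inferred from Table 3, not stated in the text

# Genome block counts per edge server, as listed in Table 3.
TABLE3_BLOCKS = {
    "Japan": 383,
    "Gambia": 414,
    "Britain": 4486,
    "American": 4715,
    "China": 9680,
}


def oram_tree_size(num_blocks, bucket_capacity=BUCKET_CAPACITY):
    """Slots in a complete binary tree with ceil(log2(num_blocks)) levels."""
    levels = (num_blocks - 1).bit_length()  # integer ceil(log2(n)) for n >= 1
    nodes = 2 ** levels - 1
    return bucket_capacity * nodes


if __name__ == "__main__":
    for server, blocks in TABLE3_BLOCKS.items():
        print(f"{server:>8}: {blocks:>5} blocks -> {oram_tree_size(blocks)} slots")
```

Under this assumption, two servers whose block counts fall between the same powers of two (for example Japan and Gambia) receive trees of identical size, which matches the repeated values in Table 3.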

6.2. Function Analysis

Table 4 compares the functions of SEDASGX with those of existing solutions in terms of security, analysis methods, and dynamic update. Reference [13] proposes a secure genetic testing framework based on SGX that can defend against malicious attacks, but it does not consider multiple analysis methods, dynamic update, or IBE. Reference [14] adopts an oblivious RAM mechanism to avoid access pattern leakage in the interaction between the user and the SGX-enabled server. Nonetheless, it supports only the Chi-square analysis method and does not consider dynamic update. Similarly, reference [15] does not consider the diversity of analysis methods, IBE, or application to practical scenarios.

6.3. Performance Analysis

Figure 3 shows the performance of each algorithm in SEDASGX. In the preprocessing phase, the overhead grows with the size of the dataset held by each edge server, while the overhead of enclave initialization is roughly the same for the same level of data volume. In the analysis phase, the larger the ORAM tree $T_i$, the more genetic data lie on the read path and the greater the analysis overhead.
Figure 4 illustrates the preprocessing performance of the different edge servers. As the amount of data held by an edge server increases, the preprocessing overhead increases linearly, although the growth is modest.
Figure 5 compares the update time of the two edge servers with the largest and smallest data volumes. The update cost of the edge server for China is higher than that of the edge server for The Gambia. The comparison shows that the update efficiency depends not only on the amount of original data but also on the number of updated gene loci. Note that Figure 5 presents the update cost for a small number of loci on a chromosome; it is intended only to reflect the relationship between the update cost and the number of updated gene loci. Because our scheme is built on real genetic datasets, its framework is easily scalable to massive numbers of genetic loci.
Figure 6 shows the computational overhead of the three analysis algorithms (Chi-square, Fisher, and Logistic Regression) on a hardware SGX-assisted cloud server and on a traditional cloud server. The results show that all three genetic data analysis methods are significantly faster on the SGX-assisted cloud server, each by a factor of approximately 2.5. This is because the analysis is performed on plaintext once the ciphertext on a path of the ORAM tree has been decrypted inside the enclave. Meanwhile, SEDASGX not only reduces the communication cost between the client (User/Patient) and the cloud server but also keeps the client lightweight, since the client does not need to preprocess its genomic data. In summary, SEDASGX has high analysis efficiency.
Furthermore, in terms of security, SEDASGX not only inherits the data confidentiality and integrity of the traditional non-SGX scheme but also provides a trusted hardware environment, which the non-SGX scheme lacks, to ensure the confidentiality and integrity of the code. Therefore, the SEDASGX scheme can provide more secure genetic data analysis. In terms of calculation speed, at the same security strength, the plaintext calculation rate of the SEDASGX scheme within the SGX hardware is much higher than the ciphertext calculation rate of the traditional non-SGX scheme.
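
To make the plaintext analysis step concrete, the following sketch computes two of the three statistics compared in Figure 6 on a 2×2 contingency table (for example, allele counts in cases and controls). These are the textbook formulas, not the authors' in-enclave implementation. The function names and the example counts are hypothetical, and logistic regression is omitted for brevity.

```python
from math import comb


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the table [[a, b], [c, d]].

    Assumes every row and column total is non-zero.
    """
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher exact p-value for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / total

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # The small tolerance keeps ties with the observed table in the sum.
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )


if __name__ == "__main__":
    # Hypothetical counts: rows = case/control, columns = allele present/absent.
    print(chi_square_2x2(10, 20, 30, 40))
    print(fisher_exact_2x2(3, 1, 1, 3))
```

Both statistics need only four integer counts per locus, so once a path of the ORAM tree has been decrypted inside the enclave the analysis itself is a short plaintext computation.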

7. Conclusions

In this paper, we construct a secure and efficient dynamic analysis scheme for genome data within an SGX-assisted server. First, we design a multi-party genetic data analysis architecture based on Intel SGX and IBE for edge computing scenarios. This framework relies on Intel SGX to ensure the confidentiality and integrity of genetic data while leveraging IBE to enable the multi-party analysis scenario. To mitigate the threat of access pattern leakage, we employ SGX to construct an ORAM tree structure that obfuscates memory access patterns. We also implement plaintext genomic data analysis within trusted hardware and provide several analysis methods for genomic data. Finally, SEDASGX supports dynamic updates of genomic data to ensure more accurate analysis in cases of genetic mutations due to environmental and other factors. The experimental results show that SEDASGX is more efficient than a non-SGX server in genome data analysis.
Future work includes deploying the scheme in a real-world environment (e.g., a large-scale hospital) with the aims of evaluating and refining the scheme (if necessary) to provide additional functionalities without compromising on security and efficiency. Additionally, we will also consider the situation of a more powerful adversary and pursue higher analysis efficiency.

Author Contributions

Conceptualization, B.L. and Q.W.; Methodology, B.L. and Q.W.; Software, B.L.; Formal analysis, Q.W.; Investigation, D.F.; Resources, F.Z.; Data curation, D.F.; Writing—original draft, B.L.; Writing—review and editing, B.L.; Supervision, F.Z.; Project administration, F.Z. and Q.W.; Funding acquisition, F.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Natural Science Foundation of China under Grant 62202090, 62173101 and 62072090, by Liaoning Province Natural Science Foundation Medical-Engineering Cross Joint Fund under Grant 2022-YGJC-24, by Doctoral Scientific Research Foundation of Liaoning Province under Grant 2022-BS-077, and by the Fundamental Research Funds for the Central Universities under Grant N2217009.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Vellela, S.S.; Reddy, B.V.; Chaitanya, K.K.; Rao, M.V. An Integrated Approach to Improve E-Healthcare System using Dynamic Cloud Computing Platform. In Proceedings of the 2023 5th International Conference on Smart Systems and Inventive Technology (ICSSIT), Tirunelveli, India, 23–25 January 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 776–782.
  2. Garrison, E.; Kronenberg, Z.N.; Dawson, E.T.; Pedersen, B.S.; Prins, P. A spectrum of free software tools for processing the VCF variant call format: Vcflib, bio-vcf, cyvcf2, hts-nim and slivar. PLoS Comput. Biol. 2022, 18, e1009123.
  3. Xu, Y.; Ren, J.; Wang, G.; Zhang, C.; Yang, J.; Zhang, Y. A blockchain-based nonrepudiation network computing service scheme for industrial IoT. IEEE Trans. Ind. Inform. 2019, 15, 3632–3641.
  4. Gürsoy, G.; Li, T.; Liu, S.; Ni, E.; Brannon, C.M.; Gerstein, M.B. Functional genomics data: Privacy risk assessment and technological mitigation. Nat. Rev. Genet. 2022, 23, 245–258.
  5. Sero, D.; Zaidi, A.; Li, J.; White, J.D.; Zarzar, T.B.G.; Marazita, M.L.; Weinberg, S.M.; Suetens, P.; Vandermeulen, D.; Wagner, J.K.; et al. Facial recognition from DNA using face-to-DNA classifiers. Nat. Commun. 2019, 10, 2557.
  6. Kim, M.; Lauter, K. Private genome analysis through homomorphic encryption. BMC Med. Inform. Decis. Mak. 2015, 15, 1–12.
  7. Sarkar, E.; Chielle, E.; Gürsoy, G.; Mazonka, O.; Gerstein, M.; Maniatakos, M. Fast and scalable private genotype imputation using machine learning and partially homomorphic encryption. IEEE Access 2021, 9, 93097–93110.
  8. Wang, S.; Zhang, Y.; Dai, W.; Lauter, K.; Kim, M.; Tang, Y.; Xiong, H.; Jiang, X. HEALER: Homomorphic computation of ExAct Logistic rEgRession for secure rare disease variants analysis in GWAS. Bioinformatics 2016, 32, 211–218.
  9. Blatt, M.; Gusev, A.; Polyakov, Y.; Goldwasser, S. Secure large-scale genome-wide association studies using homomorphic encryption. Proc. Natl. Acad. Sci. USA 2020, 117, 11608–11613.
  10. Kamm, L.; Bogdanov, D.; Laur, S.; Vilo, J. A new way to protect privacy in large-scale genome-wide association studies. Bioinformatics 2013, 29, 886–893.
  11. Dong, C.; Weng, J.; Liu, J.N.; Yang, A.; Liu, Z.; Yang, Y.; Ma, J. Maliciously secure and efficient large-scale genome-wide association study with multi-party computation. IEEE Trans. Dependable Secur. Comput. 2022, 20, 1243–1257.
  12. Zhu, X.; Ayday, E.; Vitenberg, R. A privacy-preserving framework for conducting genome-wide association studies over outsourced patient data. IEEE Trans. Dependable Secur. Comput. 2022, 20, 2390–2405.
  13. Chen, F.; Wang, C.; Dai, W.; Jiang, X.; Mohammed, N.; Al Aziz, M.M.; Sadat, M.N.; Sahinalp, C.; Lauter, K.; Wang, S. PRESAGE: PRivacy-preserving gEnetic testing via SoftwAre guard extension. BMC Med. Genom. 2017, 10, 77–85.
  14. Mandal, A.; Mitchell, J.C.; Montgomery, H.; Roy, A. Data oblivious genome variants search on Intel SGX. In Proceedings of the International Workshop on Data Privacy Management, Barcelona, Spain, 6–7 September 2018; Springer International Publishing: Cham, Switzerland, 2018; pp. 296–310.
  15. Kockan, C.; Zhu, K.; Dokmai, N.; Karpov, N.; Kulekci, M.O.; Woodruff, D.P.; Sahinalp, S.C. Sketching algorithms for genomic data analysis and querying in a secure enclave. Nat. Methods 2020, 17, 295–301.
  16. Costan, V.; Devadas, S. Intel SGX explained. Cryptology ePrint Archive. Available online: https://eprint.iacr.org/2016/086 (accessed on 7 August 2022).
  17. Zheng, W.; Wu, Y.; Wu, X.; Feng, C.; Sui, Y.; Luo, X.; Zhou, Y. A survey of Intel SGX and its applications. Front. Comput. Sci. 2021, 15, 1–15.
  18. Amjad, G.; Kamara, S.; Moataz, T. Forward and backward private searchable encryption with SGX. In Proceedings of the 12th European Workshop on Systems Security, Dresden, Germany, 25–28 March 2019; pp. 1–6.
  19. Jiang, Q.; Qi, Y.; Qi, S.; Zhao, W.; Lu, Y. Pbsx: A practical private boolean search using Intel SGX. Inf. Sci. 2020, 521, 174–194.
  20. Will, N.C.; Maziero, C.A. Intel Software Guard Extensions Applications: A Survey. ACM Comput. Surv. 2023, 55, 322.
  21. Djoko, J.B.; Lange, J.; Lee, A.J. Nexus: Practical and secure access control on untrusted storage platforms using client-side sgx. In Proceedings of the 2019 49th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), Portland, OR, USA, 24–27 June 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 401–413.
  22. Johnson, D.; Menezes, A.; Vanstone, S. The elliptic curve digital signature algorithm (ECDSA). Int. J. Inf. Secur. 2001, 1, 36–63.
  23. Stefanov, E.; Dijk, M.V.; Shi, E.; Chan, T.H.H.; Fletcher, C.; Ren, L.; Yu, X.; Devadas, S. Path ORAM: An extremely simple oblivious RAM protocol. J. ACM (JACM) 2018, 65, 1–26.
  24. LeMay, C. Privacy-Preserving Chi-Squared Tests Using Homomorphic Encryption. Available online: https://www.cs.utexas.edu/~dwu4/courses/sp22/static/projects/LeMay.pdf (accessed on 15 August 2023).
  25. Zhao, G.; Yang, H.; Yang, J.; Zhang, L.; Yang, X. A data-based adjustment for fisher exact test. Eur. J. Stat. 2021, 1, 74–107.
  26. Ayers, K.L.; Cordell, H.J. SNP selection in genome-wide and candidate gene studies via penalized logistic regression. Genet. Epidemiol. 2010, 34, 879–891.
  27. Naccache, D. Secure and practical identity-based encryption. IET Inf. Secur. 2007, 1, 59–64.
  28. McGrew, D.A.; Viega, J. The security and performance of the Galois/Counter Mode (GCM) of operation. In Proceedings of the International Conference on Cryptology in India, Chennai, India, 20–22 December 2004; Springer: Berlin/Heidelberg, Germany, 2004; pp. 343–355.
  29. Bard, G.V. Modes of Encryption Secure against Blockwise-Adaptive Chosen-Plaintext Attack. Cryptology ePrint Archive. Available online: https://eprint.iacr.org/2006/271 (accessed on 15 August 2023).
  30. GeneData Set. Available online: http://hgdownload-euro.soe.ucsc.edu/gbdb/hg19/1000Genomes/phase3/ (accessed on 7 August 2023).
Figure 1. Intel SGX Features.
Figure 2. System Model.
Figure 3. Performance over each algorithm of SEDASGX.
Figure 4. The performance of the preprocessing operation.
Figure 5. The comparison of update time.
Figure 6. The comparison of analysis time.
Table 2. Notations.
Notations | Descriptions
$\lambda$, $G$, $g$, $H$ | Security parameter, group, the generator of group $G$, and hash function
$pk$, $msk$, $sk_E$, $sk_i$ | Public key, master key, ORAM tree structure key, the data key of the $i$-th ES
$sk_k$, $sk_i^{j}$, $m_i^{j}$ | The query key of the $k$-th AU, the data key of P, the $j$-th data under the $i$-th ES
$iv_i^{j}$, $tag_i^{j}$ | The $j$-th initial vector under the $i$-th ES, the $j$-th tag in the $i$-th ES
$add_i^{j}$, $Size$ | The $j$-th additional information under the $i$-th ES, preset block size
$C_i^{j}$, $M_i^{\eta_i}$ | The $j$-th ciphertext in the $i$-th ES, the $\eta_i$-th data block under the $i$-th ES
$Z$, $pos$, $n$ | The node capacity of the ORAM tree, genome locus, the $i$-th nation ($i$-th ES)
$N$, $M$, $pa$, $id$ | The number of ES, the number of P, the path of the ORAM tree, block identifier
$T_i$, $PM_i$, $S_i$ | The $i$-th ORAM tree, the $i$-th position map table, the $i$-th stash
$\eta_i$, $N_i$ | The total number of data blocks under the $i$-th ES, the size of $T_i$
$L_i$, $psize_i$, $ssize_i$ | The height of the ORAM tree, the size of $PM_i$, the size of $S_i$
$maxpa$, $st$ | The maximum path of $T_i$, the state of a data block (true: 1, false: 0)
Table 3. Sample data size.
Edge Server | Japan | Gambia | Britain | American | China
Genome Blocks | 383 | 414 | 4486 | 4715 | 9680
ORAM Tree Size | 3066 | 3066 | 49,146 | 49,146 | 98,298
Table 4. Contrast of functions.
Scheme | ChiSquare | Fisher | LR | Update | Obliviousness | SGX | IBE
[13]
[14]
[15]
SEDASGX
In Table 4, ✗ indicates that the scheme does not have the function and ✓ indicates that it does.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Li, B.; Zhou, F.; Wang, Q.; Feng, D. A Secure and Efficient Dynamic Analysis Scheme for Genome Data within SGX-Assisted Servers. Electronics 2023, 12, 5004. https://doi.org/10.3390/electronics12245004
