1. Introduction
Electronic medical records (EMRs) play an important role in people’s healthcare [1]. With the increasing demand for cross-institution sharing, massive data processing, and improved medical quality, the current centralized healthcare service system cannot keep up with the rapid development of modern healthcare [2,3]. In recent years, blockchain technology [4,5] has been applied to address the weak points of traditional systems, which has led to the emergence of distributed healthcare blockchain systems [6,7]. In order to protect user privacy and prevent the exposure of sensitive information, EMRs should be encrypted before being uploaded to the healthcare blockchain system. Traditional data encryption schemes suffer from high complexity and inefficient data processing. Thus, exploring privacy-preserving approaches based on a lightweight message sharing scheme is of paramount importance. Massive medical data processing in the healthcare blockchain system is particularly challenging, as it is extremely difficult to meet all the requirements of performance, system security, and efficiency.
In traditional healthcare service systems, centralized organizations control the whole system, and all the EMRs are stored locally. In this case, adversaries can tamper with the historical records for their own benefit, regardless of the patients’ lawful rights and interests. EMRs contain sensitive information about the patient and medical institution, such as the patient’s name, ID number, telephone number, medical institution name, etc. The centralized cloud storage structure cannot provide full protection for EMRs. Moreover, the integrity of the EMRs can also be easily destroyed by the inevitable software/hardware failures and human errors in the cloud. In addition, different medical institutions are loath to share their data due to privacy concerns and competitive advantages [8,9]. The consistency and interoperability of the different types of data from different medical institutions are major obstacles to data sharing [10].
Recently, outsourcing the local EMRs to the public cloud has attracted more and more attention. This is reasonable considering that, compared with local data management systems, the cloud service is more cost-effective, green, and extensible. However, similar to the centralized healthcare service systems, the cloud-based methods also have to establish sharing channels through different public cloud platforms for different data users and institutions [11,12]. Apparently, these methods cannot escape the drawbacks of the centralized systems. In conclusion, although cloud outsourcing can facilitate the cross-institutional sharing of EMRs compared with the traditional healthcare service systems, information redundancy makes the data exchange process inefficient [5].
The healthcare blockchain system presents a new possibility to solve the isolated information island problem in traditional centralized systems [6,7,13,14]. Similar to the Bitcoin system [4], the blockchain provides a public, auditable, and inalterable ledger, which can guarantee data security and transparency for the implementation of transactions. Patients can obtain continuous and trackable treatment by freely accessing the healthcare information of their EMRs from the healthcare blockchain system. The cross-institutional sharing of EMRs becomes easier as more medical institutions join this healthcare blockchain system, so patients do not need to construct many EMRs at different medical institutions. However, the integrated EMR data are always too large, which makes the system bloated and inefficient. This can be explained by the fact that each EMR needs to be stored in each node of the blockchain, and hence, the total storage space needed is extremely large. Considering the great amount of EMRs, the storage efficiency and data transmission efficiency need to be further improved.
It can be observed from the above schemes that centralized healthcare service systems and distributed healthcare blockchain systems share a common problem: data storage and data sharing are not efficient for massive data. Lightweight message sharing can address this problem. Distributed data storage across different medical institutions is an extremely important field, and the security and integrity of EMRs cannot be ignored either. In this paper, we introduce the secret sharing technique to the blockchain, which improves the data storage efficiency, data transmission efficiency, and the security of the EMRs. Specifically, we establish a lightweight privacy-preserving mechanism for the distributed healthcare blockchain system.
In order to protect the data privacy and improve the system efficiency, we first design an interleaving encoding algorithm and propose a lightweight message sharing scheme. The interleaving encoder divides the original EMRs into t pieces, which hides the sensitive information of the EMRs by destroying their semantic meanings. The message sharing scheme is a (t, n)-threshold scheme, which constructs the former t pieces into n shares for storage. Then, the original EMRs can be reconstructed with only t shares. Therefore, this message storage and sharing scheme is lightweight, with shorter shares and an efficient reconstruction process. After constructing the shares of the newly generated EMRs, all the shares are transmitted to different nodes on the blockchain. Note that each share of an EMR is only stored in one blockchain node. This is totally different from traditional blockchains, in which the data are repeatedly stored in all the blockchain nodes. Another challenge is how to retrieve the EMRs for the data users based on the blockchain. In our scheme, all the nodes can generate blocks and append them to the chain, similar to existing blockchains. However, in our blockchain, the shares of EMRs are not stored in the blocks; instead, the hash values of the EMR identifiers that are related to the shares are stored in the blocks. In the retrieval process, the data users first search the public blocks to locate the nodes where the shares of an EMR are stored and then request the shares from those nodes. Once at least t shares are received, the data user can recover the original EMR. Security analysis and simulation results show that the proposed scheme can not only keep the EMR data complete and secure, but also make the processes of data storage and sharing more efficient.
The main contributions of this paper are summarized as follows:
We propose a more lightweight and efficient privacy-preserving mechanism for EMRs. The EMRs can be securely and freely exchanged among different medical institutions through the distributed healthcare blockchain system.
We apply the interleaving encoder technique to the privacy-preserving mechanism. It can protect the sensitive information of the patient and medical institution by destroying the semantic meanings of the original EMRs.
We propose a new lightweight (t, n)-threshold message sharing scheme to improve the efficiency of data processing in the healthcare blockchain system. We also present a detailed security analysis of the EMRs’ privacy protection protocol, which shows the correctness and security of the proposed scheme.
We give the performance evaluation and analysis of the proposed scheme. The simulation results show that it can provide strong protection of the patient’s and medical institution’s privacy. Meanwhile, the proposed scheme is more efficient than similar literature with respect to the energy consumption and storage space.
The rest of this paper is organized as follows: In Section 2, some related works about the healthcare service system, healthcare blockchain, and message sharing scheme are given. In Section 3, the lightweight privacy-preserving mechanism with the interleaving encoder algorithm and (t, n)-threshold message sharing scheme is proposed. In Section 4, the security analysis of the proposed scheme is presented. In Section 5, we give the performance evaluation and analysis of the proposed scheme. In the end, the conclusions are given in Section 6.
3. Lightweight Privacy-Preserving Mechanism of EMRs Based on Blockchain and Secret Sharing
In this section, we propose the lightweight privacy-preserving mechanism for EMRs based on secret sharing in the distributed healthcare blockchain system. The framework is shown in Figure 2, and the main terms are listed in Table 1. In order to improve the security and scalability of the EMR sharing system, we design a lightweight (t, n)-threshold message sharing scheme for privacy preservation in the healthcare blockchain system. Meanwhile, we discuss how to store and search the shares of EMRs. The framework comprises two main parts: the creation and storage of the shares of EMRs, and the recovery and use of EMRs. The detailed steps of the protocol are as follows.
3.1. Creation and Storage of the Shares of EMRs
The patient and medical doctor are the main participants who create the original EMRs R. In order to hide the sensitive information and protect the whole original EMRs, they upload and store the shares rather than the plaintext EMRs in the blockchain nodes. The steps of constructing and storing the shares are presented in the following.
EMR interleaving encoding: First and foremost, the original EMRs R will be encoded into a series of sub-messages by an interleaving encoder, as shown in Figure 3. We first divide the l-bit original EMRs R into groups of t bits each, where 0 bits are always appended to the end of the l-bit string R. Then, we encode the groups into t sub-messages of equal length. Because the original information is split and recombined, the adversary can only obtain insignificant messages even if he/she obtains several shares, since the interleaving encoder has destroyed the semantic meanings of the shares.
Construction of the EMR shares: In this step, the encoded EMRs will be constructed into n different shares based on Equation (1).
Here, p is defined as the largest prime number that is not greater than . The size of is always smaller than of p. As the size of shares is much smaller than the size l of the original message, it will make the message sharing scheme more lightweight and greatly improve the efficiency of data processing. The EMRs’ construction encrypts t sub-messages into n shares, which can further strengthen the protection of user privacy.
Storage of the shares in blockchain nodes: Through the interleaving encoder and the construction of the shares, the original EMRs are encrypted into n shares. Then, the shares will be sent to different blockchain nodes, where they are stored locally. Meanwhile, the indexes of these shares will be uploaded into the healthcare blockchain system. Similar to the transaction verification in Bitcoin [4,30], all the indexes of the shares and the corresponding identifiers of the blockchain nodes are combined together and broadcast to the whole healthcare blockchain network, i.e., all the blockchain nodes, for verification.
EMRs’ confirmation and generating a new block: When one node obtains the rights for creating a new block by the consensus mechanism, the indexes of EMR shares and the information about where they are stored in the nodes will be recorded and stored in the healthcare blockchain system. Considering that the information stored in the blocks cannot be modified, the blockchain nodes cannot deny that the corresponding EMR shares are stored by them.
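The interleaving encoding step described above can be illustrated with a minimal Python sketch. It assumes a simple round-robin interleaver in which sub-message i collects the i-th bit of every t-bit group, and it assumes that zero padding is added only when l is not a multiple of t; the exact rule of the encoder in Figure 3 may differ. The function names `interleave_encode` and `interleave_decode` are hypothetical.

```python
def interleave_encode(bits: str, t: int) -> list[str]:
    """Split a bit string into t sub-messages by round-robin interleaving."""
    if t <= 0:
        raise ValueError("t must be positive")
    # Assumed padding rule: append '0' bits until the length is a multiple of t.
    padded = bits + "0" * (-len(bits) % t)
    # Sub-message i collects the i-th bit of every t-bit group.
    return [padded[i::t] for i in range(t)]


def interleave_decode(subs: list[str], original_length: int) -> str:
    """Invert interleave_encode and strip the padding bits."""
    # zip(*subs) yields one t-bit group per step, in the original order.
    groups = ("".join(group) for group in zip(*subs))
    return "".join(groups)[:original_length]


record = "1011001110"  # toy 10-bit record (hypothetical data)
subs = interleave_encode(record, t=4)
restored = interleave_decode(subs, len(record))
```

In this sketch the decoder needs the original length l to strip the padding, so l would have to be stored or transmitted alongside the shares. No single sub-message contains consecutive bits of the record, which is the sense in which the semantic meaning is destroyed.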
3.2. Recovery and Use of EMRs
When an authorized data user wants to search an EMR, he/she first needs to search the index of the EMR on the blockchain and locate all the nodes that store the shares of the EMR. In theory, the data user needs to request at least t nodes to get the shares, and then, the original EMR can be recovered.
EMRs’ reconstruction: The authorized data users can collect a set of EMR shares and then reconstruct the EMR R with a specific coefficient matrix, which will be discussed in Section 4.
Here, any t EMR shares can reconstruct the original subsections of the EMR, even if a few of the other shares are tampered with or discarded. This efficient data reconstruction process not only makes the message sharing scheme more lightweight, but also improves the efficiency of the verifying and recovering processes. A theoretical analysis of this message sharing scheme is given in Section 4.
EMRs’ decoding: When the subsections of the EMR have been reconstructed, they are decoded by the interleaving decoder, and the original EMR R is obtained. After that, the recovered EMRs can be processed by authorized consumers for different purposes. Apparently, the EMRs stored in the healthcare blockchain system can be used not only by the patients and the medical doctors, but also by insurance companies, researchers, and others.
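The retrieval flow above (search the blocks for the index of an EMR, locate the storing nodes, and request shares until t are received) can be sketched as follows. This is only an illustration under stated assumptions: the block layout, the field names `index`, `node`, and `share_no`, the in-memory `node_storage` dictionary, and the use of SHA-256 for hashing the EMR identifier are all hypothetical and are not specified in this section.

```python
import hashlib


def emr_index(emr_id: str) -> str:
    """Hash of the EMR identifier as recorded on chain (SHA-256 is an assumption)."""
    return hashlib.sha256(emr_id.encode("utf-8")).hexdigest()


def locate_shares(chain, emr_id):
    """Scan the public blocks and list (node, share number) pairs for one EMR."""
    key = emr_index(emr_id)
    return [(entry["node"], entry["share_no"])
            for block in chain for entry in block
            if entry["index"] == key]


def collect_shares(chain, node_storage, emr_id, t):
    """Request shares node by node and stop as soon as t shares are received."""
    key = emr_index(emr_id)
    collected = {}
    for node_id, share_no in locate_shares(chain, emr_id):
        # An unreachable node simply contributes no share.
        share = node_storage.get(node_id, {}).get((key, share_no))
        if share is not None:
            collected[share_no] = share
        if len(collected) >= t:
            return collected
    raise LookupError(f"fewer than {t} shares of {emr_id!r} are reachable")


key = emr_index("emr-001")
chain = [  # each block holds index entries only, never the shares themselves
    [{"index": key, "node": "A", "share_no": 1},
     {"index": key, "node": "B", "share_no": 2}],
    [{"index": key, "node": "C", "share_no": 3},
     {"index": emr_index("emr-002"), "node": "A", "share_no": 1}],
]
node_storage = {  # node B is offline in this toy run
    "A": {(key, 1): 42, (emr_index("emr-002"), 1): 7},
    "C": {(key, 3): 99},
}
shares = collect_shares(chain, node_storage, "emr-001", t=2)
```

With t = 2, the request succeeds even though node B does not answer, which mirrors the fault tolerance that the threshold scheme is meant to provide.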
In addition, in order to improve efficiency, the processes of EMRs’ interleaving encoding and construction and of EMRs’ reconstruction and interleaving decoding can be embedded into a smart contract [29]. This computerized transaction protocol can prevent malicious users or adversaries from destroying the EMRs. In addition, blockchain technology makes the EMR data more transparent and credible. Each EMR serves as a transaction that can be recorded in the healthcare blockchain system and verified through a universally verifiable or end-to-end verifiable open blockchain audit trail. Neither the processing of the shares in the healthcare blockchain system nor the smart contract is within the scope of this paper, and we devote the following sections to the security proof and performance evaluation of our proposed scheme.
4. Security Proof and Analysis
In this paper, we assume that the shares are encrypted before being transmitted in the blockchain system and that any proper key negotiation algorithm can be employed to generate the secret keys. The adversary wants to access the EMRs without authorization. Apparently, the adversary can obtain all the private information about the patients and healthcare institutions once the EMRs are leaked. To get an EMR, the adversary needs to capture the shares transmitted in the network. In the following, we first analyze the correctness of the proposed privacy protection scheme. In the healthcare blockchain system, the patient and medical doctor have the right to create the EMRs, but an unauthorized user is not allowed to join the system. As the original EMRs have been encoded with a special method and sent to many mining nodes in the form of different shares, we prove that an adversary who intercepts fewer than t shares cannot recover the original EMRs. Even if the adversary obtains all the information of the original EMRs, he/she cannot tamper with it without knowing the rule of the interleaving encoder. After that, the EMRs can be correctly verified and recorded by the mining nodes. If one mining node attempts to tamper with the shares, the malicious behavior will be discovered, because it cannot pass the verification of the other mining nodes in the blockchain. Therefore, the proposed privacy protection scheme is correct, and the valid EMRs will be correctly collected and recorded in the blockchain.
The security analysis of the proposed privacy protection scheme is presented as follows. Here, we mainly prove that the (t, n)-threshold lightweight message sharing scheme is secure, as shown in Theorems 1 and 2.
Theorem 1. Any t EMR shares can recover the integrated information of the original EMR R.
Proof. We consider the worst case, in which only the minimum number of t shares is available to recover the EMR, and we prove Theorem 1 through the following two cases.
Case 1: In this case, we consider that the first t EMR shares constructed by Equation (1) can recover the integrated information of the EMR. The first t EMR shares are calculated as follows:
We can present Equation (3) in a matrix form, which is shown in Equation (4), where the coefficient matrix is denoted as Equation (5):
Next, according to Equation (3), the coefficients in the matrix can be generated as shown in Equation (6):
In order to prove that the first t EMR shares can recover the integrated information of the EMR R, we should first prove that the coefficient matrix is invertible. In other words, its determinant is a non-zero number. Next, we transform the matrix into a diagonal matrix with the following Algorithm 1. Then, we can conclude that the determinant of the matrix is not zero. Therefore, we can uniquely obtain the encoded sub-messages according to Equation (3) when we get the first t EMR shares.
Algorithm 1 Matrix transform algorithm. |
Input: Square matrix |
Output: Lower triangular determinant |
1: | Count the size of which is composed of t rows and columns |
2: | for to t do |
3: | |
4: | end for |
5: | for to t do |
6: | ; |
7: | for to do |
8: | ; |
9: | end for |
10: | for to t do |
11: | ; |
12: | end for |
13: | end for |
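The invertibility argument in Case 1 rests on reducing the coefficient matrix and checking that its determinant is non-zero modulo p. As a minimal sketch, and not a transcription of Algorithm 1, the following Python function computes the determinant of a square matrix over the integers modulo a prime p by ordinary Gaussian elimination to upper triangular form. The function name `det_mod_p` and the example matrices are hypothetical.

```python
def det_mod_p(matrix, p):
    """Determinant of a square matrix modulo a prime p via Gaussian elimination."""
    a = [[x % p for x in row] for row in matrix]
    n = len(a)
    det = 1
    for col in range(n):
        # Find a row at or below the diagonal with a non-zero entry in this column.
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return 0  # no pivot: the matrix is singular modulo p
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det % p  # a row swap flips the sign of the determinant
        det = det * a[col][col] % p
        inv = pow(a[col][col], -1, p)  # modular inverse (Python 3.8+)
        for r in range(col + 1, n):
            factor = a[r][col] * inv % p
            for c in range(col, n):
                a[r][c] = (a[r][c] - factor * a[col][c]) % p
    return det


vandermonde = [[1, 1, 1], [1, 2, 4], [1, 3, 9]]  # toy coefficient matrix
singular = [[1, 2], [2, 4]]
d1 = det_mod_p(vandermonde, 7)
d2 = det_mod_p(singular, 7)
```

A non-zero result such as `d1` means the corresponding system of congruences has a unique solution modulo p, while a zero result such as `d2` means it does not.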
Case 2: We consider another situation, in which the first group of congruence equations is chosen from Equation (1) and the remaining congruence equations are obtained from the last equations constructed by Equation (1). The last equations are shown as follows in Equation (7):
Next, we prove that any subset of t EMR shares is equivalent to the first t EMR shares. Suppose we choose some EMR shares from the first t shares and the remaining EMR shares from the last shares. In this case, the congruence equations can be described in the matrix form shown in Equation (8), where the coefficient matrix is denoted as Equation (9):
Now, we need to prove that the equations constructed from the chosen shares are equivalent to those constructed from the first t shares. In other words, these two equation sets should have the same solution. We can calculate the determinant of the coefficient matrix as shown in Equation (10):
Therefore, we can derive that the matrix is invertible, since its determinant is a non-zero number. The first t EMR shares can be linearly expressed by the chosen shares. Next, based on the proof of Case 1, we can derive that there is a unique solution for the congruence equations in Case 2. □
In fact, we can rewrite the congruence Equation (1) in the matrix form as follows:
Any subset of t EMR shares corresponds to t rows of the matrix M. According to Case 1, the EMR information can be uniquely recovered from the first t EMR shares. According to Case 2, any subset of t EMR shares is equivalent to the first t EMR shares. Combining Case 1 and Case 2, we can derive that any t rows of the matrix M are linearly independent and that any t EMR shares determine the EMR information. Further, the original EMR R is reconstructed successfully by an interleaving decoder. This completes the proof of Theorem 1.
Based on Theorem 1, the original EMR R can be recovered with only t shares, without obtaining all the n shares. Even if a few shares have been destroyed by a system failure, this does not affect the reconstruction of the original EMRs. Therefore, our proposed scheme can greatly improve the fault-tolerant capability of the healthcare blockchain system, which will be shown in the following performance evaluation section.
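The property stated in Theorem 1 can be checked numerically on a toy instance. Because Equation (1) is not reproduced here, the sketch below uses a stand-in construction that is an assumption and not the authors' scheme: the t sub-messages are taken as the coefficients of a polynomial, and share j is the value of that polynomial at x = j modulo a prime p, so that any t shares give a Vandermonde system. The modulus 257, the function names, and the sample values are hypothetical.

```python
from itertools import combinations

P = 257  # small prime modulus, chosen only for illustration


def make_shares(subs, n, p=P):
    """Stand-in share construction: evaluate the polynomial with coefficients subs at x = 1..n."""
    return [(x, sum(m * pow(x, k, p) for k, m in enumerate(subs)) % p)
            for x in range(1, n + 1)]


def solve_mod_p(rows, rhs, p=P):
    """Gauss-Jordan elimination modulo p; assumes a square, invertible system."""
    t = len(rows)
    aug = [[v % p for v in row] + [b % p] for row, b in zip(rows, rhs)]
    for col in range(t):
        pivot = next(r for r in range(col, t) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        inv = pow(aug[col][col], -1, p)
        aug[col] = [v * inv % p for v in aug[col]]
        for r in range(t):
            if r != col and aug[r][col]:
                f = aug[r][col]
                aug[r] = [(v - f * w) % p for v, w in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


def recover(shares, t, p=P):
    """Rebuild the t sub-messages from t shares (x, value)."""
    chosen = shares[:t]
    rows = [[pow(x, k, p) for k in range(t)] for x, _ in chosen]
    return solve_mod_p(rows, [y for _, y in chosen], p)


subs = [5, 17, 200]            # t = 3 encoded sub-messages, each below P
shares = make_shares(subs, n=5)
all_subsets_recover = all(recover(list(c), 3) == subs
                          for c in combinations(shares, 3))
```

In this stand-in, every one of the ten 3-element subsets of the five shares recovers the same sub-messages, and each share is a single value below p, which reflects the short shares that the scheme aims for.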
Next, we analyze the security of the message sharing scheme against an adversary who holds too few shares. In the healthcare blockchain system, the adversary may eavesdrop on and decrypt the shares. However, Theorem 2 shows that even if the adversary successfully decrypts such a set of shares, he/she cannot recover the original EMR R.
Theorem 2. Any set of fewer than t EMR shares of an EMR R cannot recover the integrated information of the EMR R.
Proof. Suppose that the malicious adversary can successfully obtain fewer than t EMR shares. According to Equation (1), the malicious adversary can rebuild a set of congruence equations with t variables as follows:
Here, and . The is a matrix over a field F. Let be a field and M be a matrix over . In consideration of the augmented matrix of , , we can derive that . Depending on the ranks of and A, Theorem 2 can be proven by two cases as follows:
Case 1: The rank of the coefficient matrix is smaller than that of the augmented matrix. In this case, there is no solution for the equation set, and the integrated information of the EMR R cannot be recovered by the malicious adversary. This is likely to happen when the malicious adversary obtains incorrect EMR shares.
Case 2: The rank of the coefficient matrix is the same as the rank of the augmented matrix. In this case, there are fewer than t equations, but t variables. Hence, the malicious adversary cannot recover the integrated information of the EMR R. Consider the worst case, in which the malicious adversary obtains t − 1 EMR shares. Even then, the equation set is underdetermined, so the malicious adversary can only obtain a set of lawful candidate solutions rather than the unique EMR information.
From the above Cases 1 and 2, we can derive that the malicious adversary cannot recover the integrated information of the EMR R from fewer than t EMR shares in a large-sized field. This completes the proof of Theorem 2. □
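The underdetermined situation in Case 2 can be made concrete with a brute-force count on a tiny field. As before, this uses an assumed stand-in construction (polynomial evaluation modulo a prime) because Equation (1) is not reproduced here; the modulus 11, the secret tuple, and the helper name `share_value` are hypothetical.

```python
from itertools import product

P = 11  # deliberately tiny prime so that exhaustive search is feasible


def share_value(subs, x, p=P):
    """Stand-in share: the polynomial with coefficients subs evaluated at x, modulo p."""
    return sum(m * pow(x, k, p) for k, m in enumerate(subs)) % p


secret = (3, 7, 9)  # t = 3 sub-messages
# The adversary intercepts only t - 1 = 2 shares.
leaked = [(x, share_value(secret, x)) for x in (1, 2)]

# Enumerate every possible triple and keep those consistent with the leaked shares.
candidates = [c for c in product(range(P), repeat=len(secret))
              if all(share_value(c, x) == y for x, y in leaked)]
```

In this toy run, the two leaked shares leave exactly p = 11 consistent candidates out of 11³ possibilities, and the true sub-messages are only one of them. The adversary therefore cannot single out the EMR, and the number of remaining candidates grows with the size of the field, which is why the statement above is made for a large-sized field.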
Together, Theorems 1 and 2 prove that only t or more shares can successfully recover the integrated information of the EMRs. In our proposed scheme, the EMRs that contain the sensitive information of the patient and medical institution are split and reconstructed. Even if the adversary collects some of the shares (fewer than t), he/she cannot recover the integrated original EMRs. Even in the worse case in which he/she obtains t or more shares, he/she still cannot obtain meaningful information without knowing the rule of the interleaving encoder. Consequently, this scheme can not only ensure data security, but also protect the privacy of the patient and medical institution.
6. Conclusions
In this paper, we proposed a lightweight privacy-preserving cross-institution EMR sharing scheme based on the blockchain technique and a lightweight (t, n)-threshold message sharing scheme. The interleaving encoding algorithm was employed to destroy the semantic meanings of the original EMRs and hide the sensitive information of the patient and medical institution. The (t, n)-threshold message sharing scheme constructed the encoded EMRs into n shorter shares, which improved the efficiency of the data processing. Different from existing blockchains, the shares rather than the original EMRs were stored in the blockchain nodes in a random manner. In the EMR retrieval process, the data users first needed to locate the blockchain nodes that stored the shares of the EMR of interest and then request the related shares. Then, the original EMR could be reconstructed with at least t shares. This scheme could not only protect the data security, but also improve the efficiency of data sharing between institutions and data users. Moreover, we performed a series of experiments to evaluate the performance of the proposed scheme, and the simulation results showed that it significantly decreased the energy consumption and storage space compared to existing schemes.
Our scheme could be further improved in several aspects. First, we will make an effort to design a more lightweight message sharing scheme to improve the efficiency of the EMRs data processing in our future work. Second, we will research the combination of blockchain and mobile edge computation in efficient healthcare service systems with the explosive increase of data terminals. Third, our scheme did not provide an efficient EMR retrieval mechanism; hence, we will design a novel index structure for the shares of EMRs. This could greatly improve the experience of both the data users and healthcare institutions.