Efficient and Secure Management of Medical Data Sharing Based on Blockchain Technology

Mao, Xiangke; Li, Chao; Zhang, Yong; Zhang, Guigang; Xing, Chunxiao

doi:10.3390/app14156816

Open Access Article

by Xiangke Mao ¹, Chao Li ¹,²,*, Yong Zhang ¹,², Guigang Zhang ³ and Chunxiao Xing ¹,²,⁴

¹ Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing 100084, China
² Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
³ Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
⁴ Institute of Internet Industry, Tsinghua University, Beijing 100084, China
* Author to whom correspondence should be addressed.

Appl. Sci. 2024, 14(15), 6816; https://doi.org/10.3390/app14156816

Submission received: 22 June 2024 / Revised: 10 July 2024 / Accepted: 11 July 2024 / Published: 5 August 2024

(This article belongs to the Special Issue Blockchain Technologies: Trends, Challenges, Potentials and Applications)

Abstract:

In the current landscape of medical data management, processing data across diverse institutions and maximizing their value are paramount. However, traditional methods lack a secure and efficient mechanism for end-to-end traceability and supervision, which poses challenges in distributed scenarios without mutual trust. Leveraging blockchain's decentralized, tamper-proof, and traceable features, this paper introduces a blockchain-based medical data management platform. The platform enables full-process management of raw data, operational behaviors, intermediate data, and final data, meeting the needs of trusted data storage and supervision. We propose two methods, a naive method and a DAG-based method, to realize forward tracking and backward tracing of medical data stored on the blockchain. We validated and analyzed the storage and query performance of the platform on real medical data, and we also experimentally analyzed the efficiency of the proposed traceability algorithms under different data scales and processing path lengths. The results demonstrate that our platform and traceability methods effectively meet the management needs of medical data distributed across institutions.

Keywords:

blockchain technology; data provenance; data management; DAG

1. Introduction

In recent years, with continuous improvements in economic development and quality of life, people's demand for healthcare has been increasing. The medical industry has become one of the largest and fastest-growing industries in the world, especially since the worldwide spread of COVID-19, which seriously threatened human life and health, affected global economic development, and led to a significant increase in national investment in the medical industry. Owing to continuously updated medical testing equipment, steadily improving medical information systems, and the ongoing development of artificial intelligence, the medical industry is moving towards intelligence, precision, and digitization. Medical data, as the foundation of smart medicine and precision medicine, play a crucial role in biomedicine, new drug research and development, and disease prevention and treatment. However, medical data are affected by factors such as regional development, the medical level of institutions, and their level of informatization; they are scattered, inconsistently specified, and of complex types, which makes unified and effective management difficult and fully exploiting their potential value even more challenging.

To solve the problems in medical data management and better integrate medical data resources, a large number of medical data processing platforms [1,2] have emerged to facilitate data usage by medical institutions, research institutes, pharmaceutical companies, and others. These platforms collaborate with multiple hospitals and invite professional medical practitioners to annotate medical data, using professional medical data processing software or newly designed algorithms to process them. The processed data are widely used to build medical models of real-world diseases, which aid medical research [3], medical management [4], governmental public decision-making [5], and innovative drug development [6]. Although these platforms process originally scattered and heterogeneous medical data and thereby increase their value, existing medical data management platforms find it difficult to prove the authenticity of the data they provide to customers such as scientific research units, pharmaceutical companies, and government agencies. Authenticity verification of medical data covers the original data, the new data generated during processing, the final data obtained, and the operations performed by the processors. In short, everything related to medical data should be verifiable and traceable.

The main objective of this paper is to address the authenticity verification of medical data throughout the entire management and processing workflow and to provide an efficient tracking and tracing solution. In recent years, blockchain technology has been successfully applied to data tamper-proofing, data integrity verification, and data security. Therefore, this paper proposes a blockchain-based medical data management platform to ensure the security and authenticity of medical data throughout processing. The blockchain records all the intermediate data generated on the way from raw data to final data, the operational behaviors performed by operators, and even the authorization and query access operations performed. The immutability of the blockchain ensures the authenticity of all records. To reduce storage pressure on the blockchain, the platform stores data digests on the chain and the medical data themselves off the chain. Smart contracts are used for trusted storage and authenticity verification of data. In particular, the platform provides a fast and efficient data provenance method. Given any data record, the platform can not only verify its authenticity but also use the provided provenance algorithm to obtain the original data, the processing operations performed, and the intermediate data generated. Similarly, given any data, the platform can identify the operational behaviors that processed the data into new data. After obtaining authorization, users can access all data within the scope of their permissions through smart contracts.

This paper describes a medical data processing management platform based on the open-source blockchain platform Fabric [7]. The platform supports the trustworthy storage of medical data, processing data, authorization data, and query records from multiple medical institutions. It also allows regulators to monitor all transaction data within the platform through smart contracts. Based on business requirements, two provenance algorithms were designed to support the provenance of medical data and operational behaviors. Finally, the effectiveness of the proposed data provenance and query methods was verified with real-world scenarios and medical data. The experimental results show that the designed platform facilitates the management of medical data by regulatory departments and confirm the efficiency of data queries and provenance. The main contributions of this paper are as follows:

We designed and implemented a medical data storage, processing, sharing, and regulatory platform based on Fabric.
Based on the characteristics of medical and operational behavior data stored on the blockchain, we propose a DAG-based forward tracking and backward tracing method to ensure the efficiency of tracking and tracing within the limits of accessible block intervals and time intervals.
Using medical data and operational behavior data, we verified the transaction storage throughput of the designed blockchain platform, as well as the efficiency of the data provenance algorithms and query methods.

The rest of the paper is organized as follows. In Section 2, we discuss the research work related to this paper. In Section 3, we design the structure of the data management platform based on Fabric from the perspective of medical data management and provenance business requirements. In Section 4, we introduce the data structure of medical data and operational behavior and elaborate on the two provenance algorithms proposed. In Section 5, we design experiments based on the designed blockchain platform, test the throughput of transaction storage, provenance, and query efficiency, and analyze the results of the tests. In Section 6, we summarize the entire paper and highlight possible future research work.

2. Related Work

Since Satoshi Nakamoto published the Bitcoin white paper [8], the blockchain technology underlying Bitcoin has made great progress in fields such as the Internet of Things [9,10,11], healthcare [12,13], and finance [14,15]. Several review papers have described blockchain technology in terms of consensus algorithms [16,17,18], application scenarios [19,20,21], and transaction data management [22,23]. As the foundation of smart healthcare, medical data are closely related to personal privacy, medical research, and drug research. To explore the value of medical data, many research works have used technologies such as data storage, data fusion, privacy protection, and federated learning to break down data silos and manage medical data [24,25,26,27]. However, these traditional methods lack a trust mechanism, making it difficult to meet the needs of trusted sharing and secure provenance of data in a multi-party untrusted environment. Given the decentralized, tamper-resistant, and traceable features of blockchain, some studies have used blockchain technology to address the storage and sharing of medical data, privacy protection, and provenance, so as to achieve trusted management and provenance of medical data across multiple organizations.

2.1. Blockchain and Medical Data Storage

Ref. [28] first proposed applying blockchain technology to the trusted management and sharing of medical data. The authors designed MedRec, a blockchain-based medical information sharing scheme built on Ethereum [29]. This scheme achieved decentralized integration of medical data across different medical organizations, allowing third-party users to access the data with permission. However, the PoW (proof of work) protocol used in MedRec suffers from low throughput and high energy consumption. To overcome the limitations that block size and PoW consensus cost impose on large-scale medical data storage management, Ref. [30] designed the MedBlock system to address the problem of patient medical records being scattered across hospital databases and difficult to integrate. This system uses an improved consensus mechanism to achieve efficient medical record access and retrieval. MediChain [31] is based on Hyperledger Fabric and implements patient-centered medical data management. It stores data assets off-chain to reduce blockchain storage pressure and allows users to access data through standard browsers and mobile applications. Storing raw data off-chain has also been adopted in [32,33]. Ref. [34] developed a medical data sharing model called MedChain to enable flexible sharing of real-time data generated by sensors and monitoring devices and to support metadata updates efficiently. Ref. [35] proposed a blockchain-based distributed multi-level management solution for the storage security of EMR (Electronic Medical Record) data, using smart contracts to support automatic management and sharing of EMRs as well as patient control over data access permissions. Ref. [36] analyzed security, privacy, and efficiency issues in medical data management based on public and permissioned blockchains and proposed SPChain, an electronic health system for medical data sharing and privacy protection. SPChain uses special key-blocks and micro-blocks to store patient EMRs, which improves retrieval speed, and uses a proxy re-encryption scheme to enable patient medical data sharing while protecting privacy.

2.2. Blockchain and Medical Data Privacy

To address privacy protection in blockchain-based medical data management solutions, Ref. [37] proposed a medical data privacy protection scheme based on blockchain, group signatures, and asymmetric encryption to achieve reliable medical data sharing among medical institutions while protecting patient data privacy. To solve the problems that traditional EHR (Electronic Health Record) systems are vulnerable to attacks and lack privacy protection, Ref. [38] established a trusted platform for sharing encrypted EHRs based on blockchain and smart contracts. The platform uses an ABE (Attribute-Based Encryption) scheme with fixed-size attributes to achieve fine-grained access control, together with an efficient multi-keyword Boolean search scheme that lets users search encrypted EHRs. Ref. [39] introduced an authorization mechanism and an ABE algorithm to protect data privacy while sharing data among multiple medical institutions. To enhance the security and privacy of IoT medical devices, Ref. [40] combined data encryption with blockchain technology and proxy re-encryption, and adopted the PoA (Proof of Activity) protocol to accelerate data storage. Ref. [41] combined federated learning and blockchain technology to secure IoMT (Internet of Medical Things) data transmitted to the cloud server and completed model training while preserving data privacy. Ref. [42] proposed a blockchain-based, privacy-preserving distributed application to create and maintain medical certificates, and used smart contracts to establish security rules that prevent attacks such as collusion, phishing, and impersonation found in traditional medical schemes.

2.3. Blockchain and Medical Data Provenance

Ref. [43] uses blockchain and trusted execution environment (TEE) technology to address the fragmented traceability and security of mHealth data. The blockchain records and enforces data access policies, ensuring that only authorized users can access data, while the TEE protects the confidentiality and integrity of data during processing. Ref. [44] combined blockchain and IoT to develop an intelligent health supply chain management system that tracks medical products, prevents counterfeit products and damage to medical devices, reduces management costs, and lets users follow the usage status of products from manufacture to use. Ref. [45] proposed an effective and reliable vaccine-tracking and -monitoring scheme based on blockchain and smart contracts, which traces the complete history of each vaccine from receipt to use or expiration to ensure the safety of vaccine use. Similar methods were used in [46,47] for drug tracking. Ref. [48] designed an intelligent tracking and tracing platform for decentralized drug supply chains using a five-layer blockchain and IoT technology. The platform establishes on-chain and off-chain standards to determine whether drug information is stored on-chain or off-chain, and uses smart contracts to address drug quality problems.

In summary, the aforementioned works have demonstrated the role of blockchain, smart contracts, and IoT technologies in ensuring the trusted storage, secure sharing, and traceability of medical data. Nevertheless, to the best of our knowledge, most current research focuses on managing original medical data and does not explore using blockchain to credibly manage the entire life-cycle of medical data, encompassing raw data, operational behavior data, intermediate data, and final data. Moreover, no tracking and tracing methods for on-chain data have been proposed, especially under constraints such as block intervals, time intervals, and other permissions.

3. Medical Data Management Platform

In order to manage medical data, operational behaviors, authorized data, and access records in a trustworthy manner, this study designed and implemented a data management platform based on Fabric according to the actual business process. The process of medical data management is shown in Figure 1. The process can be divided into four stages.

Patient authorization. Patients authorize the hospital to use their medical data. Because medical data involve personal privacy, any medical data recorded in the system must be authorized by the patients.
Data processing. Medical data are multi-source, heterogeneous, and unstructured. To better mine their value, the tables and files that store medical data must be processed manually or by software so that the processed data can better serve medical research, drug development, health assessment, etc.
Data upload. Hospital data managers upload the digests of the original data, the operational behaviors (human or software), and the processing results to the blockchain system. The operational behaviors link the original data to the newly generated data, which is the key to tracing the data.
Query and provenance. Users can query and trace data through the interface provided by the platform. At the application layer, we designed interfaces for query and provenance, and users can trace data according to query conditions. The platform provides two types of data traceability: forward tracking and backward tracing. With forward tracking, users can find out what new data were generated after a given data item was processed. With backward tracing, users can find out which data and operational behaviors were used to obtain a given data item.

In this study, we do not focus on the authorization process in the first stage and the concrete processing methods in the second stage. In the first stage, we can obtain the original data. In the second stage, we obtain the newly generated data, operational behaviors, and processing results. In the third stage, we store the original data, intermediate data, and processing results in a database or file system and store the hash value, storage location, and operational behaviors of the data on the blockchain. In the fourth stage, we query and trace medical data based on the data stored on the blockchain.
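The third stage, in which only the hash value and storage location go on-chain, can be illustrated with a minimal Python sketch. It assumes SHA-256 as the digest function and uses two in-memory dictionaries as hypothetical stand-ins for the off-chain database and the on-chain ledger; the names `upload` and `verify` are illustrative and are not the platform's chaincode API.

```python
import hashlib
import json

# Hypothetical stand-ins: the off-chain database/file system and the
# on-chain state that chaincode would write.
off_chain_store: dict[str, bytes] = {}
on_chain_ledger: dict[str, dict] = {}


def digest(payload: bytes) -> str:
    """SHA-256 digest of a record (the choice of hash function is assumed)."""
    return hashlib.sha256(payload).hexdigest()


def upload(record_id: str, payload: bytes, location: str) -> None:
    """Keep the raw record off-chain; put only its digest and location on-chain."""
    off_chain_store[location] = payload
    on_chain_ledger[record_id] = {"hash": digest(payload), "location": location}


def verify(record_id: str) -> bool:
    """Re-hash the off-chain copy and compare it with the on-chain digest."""
    entry = on_chain_ledger[record_id]
    payload = off_chain_store.get(entry["location"])
    return payload is not None and digest(payload) == entry["hash"]


record = json.dumps({"patient": "P001", "test": "HbA1c", "value": 6.1}).encode()
upload("rec-1", record, "db://medical/rec-1")
print(verify("rec-1"))  # True
# Simulate tampering with the off-chain copy.
off_chain_store["db://medical/rec-1"] = record.replace(b"6.1", b"5.1")
print(verify("rec-1"))  # False
```

Because only a fixed-size digest is written on-chain, ledger growth does not depend on the size of the medical record, while any change to the off-chain copy is still detectable.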

The core of building a medical data management platform lies in the construction of a blockchain network. We implemented the blockchain network based on Hyperledger Fabric. The structure of the blockchain network is shown in Figure 2. To meet regulatory requirements, we created a regulatory organization based on the method in [49]. Whenever a new channel is created, the platform automatically adds the regulatory organization to the channel. Peers in the regulatory organization can only query and trace data through chaincode to complete supervision. Next, we introduce several key concepts in blockchain networks.

In Figure 2, organizations represent entities such as enterprises and institutions. In this paper, organizations mainly refer to hospitals, regulatory agencies, or platforms. Each organization has several peer nodes, which are logical nodes that provide transaction endorsement, transaction verification, and ledger commit services. The Orderer node is mainly responsible for ordering transactions. It provides a pluggable consensus protocol; in this platform, the Raft protocol is used to achieve consensus among Orderers. A channel holds a blockchain ledger, and only nodes within the channel can access its ledger data. Multiple channels can be established in a network to achieve data isolation. The client is mainly used for transaction submission and provenance queries and does not need to store all transaction data. For a more detailed understanding of the concepts and transaction processes in Fabric, readers can refer to the official Hyperledger Fabric documentation https://hyperledger-fabric.readthedocs.io/en/release-2.4/whatis.html (accessed on 10 July 2024).

Based on the constructed blockchain network, it is possible to store and access medical data, operational behaviors, authorization records, and access records through the construction of chaincodes. Next, we elaborate on how to achieve trusted provenance of medical data stored on the blockchain.

4. Medical Data Provenance Methods

4.1. Definition and Examples of Medical Data Provenance

When performing medical data provenance on the platform, we mainly focus on two types of transactional data: medical data and operational behaviors. Operational behaviors link the original medical data, intermediate data, and final data, so the provenance-related transactional data on the chain can be represented as a graph. Figure 3 shows a schematic diagram of the topological structure between medical data and operational behaviors. Each vertex in the graph represents a unique medical data record, and the edges represent operational behaviors; edges with the same label belong to the same operational behavior. Vertices with zero in-degree represent the original data, and vertices with zero out-degree represent the final data. In this paper, the graph representing the processing of medical data is a directed acyclic graph (DAG).
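As a minimal sketch, the operations of Figure 3 can be written as (label, inputs, outputs) triples, and the original and final data found from vertex degrees. The triples are reconstructed from the worked examples in Section 4.1, so the actual figure may contain additional vertices; the function names are hypothetical. A Kahn-style topological sort is used to confirm that the graph is acyclic.

```python
# Operations of Figure 3 as (behavior label, input vertices, output vertices),
# reconstructed from the worked examples in Section 4.1.
OPERATIONS = [
    ("a", ("A", "B", "C"), ("F",)),
    ("b", ("C", "D"), ("G",)),
    ("c", ("D", "E"), ("H",)),
    ("d", ("F", "G"), ("I",)),
    ("e", ("G", "H"), ("J",)),
    ("f", ("G",), ("K",)),
    ("g", ("J", "K"), ("L",)),
]


def degrees(operations):
    """In- and out-degree per vertex; an operation contributes one edge
    from each of its inputs to each of its outputs."""
    indeg, outdeg = {}, {}
    for _, inputs, outputs in operations:
        for v in inputs + outputs:
            indeg.setdefault(v, 0)
            outdeg.setdefault(v, 0)
        for src in inputs:
            outdeg[src] += len(outputs)
        for dst in outputs:
            indeg[dst] += len(inputs)
    return indeg, outdeg


def is_acyclic(operations):
    """Kahn's algorithm: the graph is a DAG iff every vertex gets removed."""
    remaining, _ = degrees(operations)
    successors = {}
    for _, inputs, outputs in operations:
        for src in inputs:
            successors.setdefault(src, []).extend(outputs)
    ready = [v for v, d in remaining.items() if d == 0]
    removed = 0
    while ready:
        v = ready.pop()
        removed += 1
        for w in successors.get(v, []):
            remaining[w] -= 1
            if remaining[w] == 0:
                ready.append(w)
    return removed == len(remaining)


indeg, outdeg = degrees(OPERATIONS)
print(sorted(v for v, d in indeg.items() if d == 0))   # ['A', 'B', 'C', 'D', 'E']
print(sorted(v for v, d in outdeg.items() if d == 0))  # ['I', 'L']
print(is_acyclic(OPERATIONS))                          # True
```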

In this paper, we study two types of data provenance methods: forward tracking and backward tracing. Forward tracking refers to the process of finding out what processing operations have been performed on a given data record and what new data have been generated. Backward tracing refers to the process of finding out which original data were used to generate a given data record, what operational behaviors were performed on it, and what intermediate data were generated. Below, we provide a formal description of data provenance in this paper.

Let $S$ represent a complete blockchain with $n$ blocks, i.e., $S = \{B_1, B_2, \dots, B_n\}$. Each block $B_i$ is represented as a tuple $B_i = \langle H_i, K_i \rangle$, where $H_i$ represents the block header information of block $B_i$, and $K_i$ represents the transaction data in block $B_i$, a set of $m$ transactions, i.e., $K_i = \{TX_1, TX_2, \dots, TX_m\}$. A provenance query can be expressed as $Q = \langle P, S, [f, t] \rangle$, where $P$ represents the query conditions and $[f, t]$ represents the time or block interval of the query or provenance. After the query is executed with conditions $P$, all transactions within the interval $[f, t]$ that the matched transaction $TX_i$ depends on, or that depend on $TX_i$, directly or indirectly, are returned to form the result of $Q$. In this paper, if $[f, t]$ represents a time interval, then $\mathit{index} = \mathit{time}(TX_i)$ is the time when $TX_i$ was generated; if $[f, t]$ represents a block number interval, then $\mathit{index} = \mathit{block}(TX_i)$ is the number of the block where $TX_i$ is stored. If $\mathit{index} < f$, forward tracking is performed to obtain all transactions within $[f, t]$ related to $TX_i$. If $\mathit{index} \in [f, t]$, both forward tracking and backward tracing are performed. If $\mathit{index} > t$, backward tracing is performed.

Let us take Figure 3 as an example to illustrate medical data provenance as described in this paper. Each result below is a triple listing the input vertices, the operational behavior (edge label), and the output vertices. For vertex L, the out-degree is 0, so only backward tracing can be performed. The results of backward tracing for L can be represented as ((J, K), g, (L)), ((G, H), e, (J)), ((G), f, (K)), ((C, D), b, (G)), ((D, E), c, (H)). For vertex G, neither the in-degree nor the out-degree is 0, so both forward tracking and backward tracing can be performed. The results of forward tracking for G are ((F, G), d, (I)), ((G, H), e, (J)), ((G), f, (K)), ((J, K), g, (L)), and the result of backward tracing for G is ((C, D), b, (G)). For vertex A, the in-degree is 0, so only forward tracking can be performed. The results of forward tracking for A are ((A, B, C), a, (F)), ((F, G), d, (I)).
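The example results can be reproduced with a breadth-first search over the operations of Figure 3. The sketch below is our own illustration, with the operations reconstructed from the results listed above; each result is an (inputs, behavior, outputs) triple in discovery order, and the function names are hypothetical.

```python
from collections import deque

# (behavior label, inputs, outputs) for Figure 3, reconstructed from the
# results listed in Section 4.1.
OPERATIONS = [
    ("a", ("A", "B", "C"), ("F",)),
    ("b", ("C", "D"), ("G",)),
    ("c", ("D", "E"), ("H",)),
    ("d", ("F", "G"), ("I",)),
    ("e", ("G", "H"), ("J",)),
    ("f", ("G",), ("K",)),
    ("g", ("J", "K"), ("L",)),
]


def backward_trace(vertex):
    """BFS from a vertex to the operations that produced it, level by level.
    Each processed record has exactly one producing operation."""
    producer = {out: op for op in OPERATIONS for out in op[2]}
    result, seen, queue = [], set(), deque([vertex])
    while queue:
        op = producer.get(queue.popleft())
        if op is None or op[0] in seen:
            continue  # original data, or an operation already reported
        seen.add(op[0])
        label, inputs, outputs = op
        result.append((inputs, label, outputs))
        queue.extend(inputs)
    return result


def forward_track(vertex):
    """BFS from a vertex to the operations that consumed it, level by level."""
    consumers = {}
    for op in OPERATIONS:
        for src in op[1]:
            consumers.setdefault(src, []).append(op)
    result, seen, queue = [], set(), deque([vertex])
    while queue:
        for label, inputs, outputs in consumers.get(queue.popleft(), []):
            if label not in seen:
                seen.add(label)
                result.append((inputs, label, outputs))
                queue.extend(outputs)
    return result


print([label for _, label, _ in backward_trace("L")])  # ['g', 'e', 'f', 'b', 'c']
print([label for _, label, _ in forward_track("G")])   # ['d', 'e', 'f', 'g']
print(backward_trace("G"))                              # [(('C', 'D'), 'b', ('G',))]
print([label for _, label, _ in forward_track("A")])   # ['a', 'd']
```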

4.2. Data Structure of Medical Data and Operational Behaviors

The medical data and operational behaviors data stored on the blockchain are the foundation for medical data provenance. Combining the characteristics of the business and the requirements for provenance, we designed the data structure of medical data and operational behaviors as shown in Table 1 and Table 2.

Based on the structures of medical data and operational behaviors, the original medical data, intermediate results, and operational behaviors can be organized into a directed acyclic graph (DAG) structure as shown in Figure 3, which facilitates the implementation of subsequent provenance algorithms. Next, we introduce two provenance methods: the naive provenance method and the DAG-based provenance method.

4.3. Naive Provenance Method

In data provenance, based on the data structure of medical data and operational behaviors data, medical data and operational behaviors data can be represented as shown in Figure 4.

In Figure 4, we divide medical data into raw medical data and processed medical data. According to the designed data structure, the BehaviorSign field of raw medical data is NULL, whereas the BehaviorSign field of processed medical data points to a specific operational behavior record, indicating that the processed data must have been obtained through an operation. Given a piece of processed medical data, the values of the EventInputs and EventOutputs fields can be obtained from the operational behavior record pointed to by its BehaviorSign field, revealing which data were processed, and how, to obtain it.

Based on the designed data structure, the forward tracking and backward tracing methods are shown in Algorithms 1 and 2, respectively.

Algorithm 1: Naive provenance—backward tracing

Algorithm 2: Naive provenance—forward tracking
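The bodies of Algorithms 1 and 2 are not reproduced in this text. The following Python sketch shows one way they could look, based only on the BehaviorSign, EventInputs, and EventOutputs fields described above; the record classes, the `sign` and `block` fields, and the dictionaries standing in for on-chain lookups are hypothetical. Interval filtering is applied only to forward tracking here, where it determines the records scanned at each step.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class MedicalData:
    sign: str                     # hypothetical: signature used as the key
    behavior_sign: Optional[str]  # BehaviorSign; None plays the role of NULL
    block: int                    # hypothetical: block holding the record


@dataclass(frozen=True)
class Behavior:
    sign: str
    event_inputs: Tuple[str, ...]   # EventInputs
    event_outputs: Tuple[str, ...]  # EventOutputs
    block: int


def naive_backward(sign, data, behaviors):
    """Backward tracing sketch: follow one BehaviorSign pointer per step."""
    result, seen, queue = [], set(), deque([sign])
    while queue:
        b_sign = data[queue.popleft()].behavior_sign
        if b_sign is None or b_sign in seen:
            continue  # raw data, or behavior already reported
        seen.add(b_sign)
        result.append(b_sign)
        queue.extend(behaviors[b_sign].event_inputs)
    return result


def naive_forward(sign, behaviors, l, r):
    """Forward tracking sketch: scan all behaviors in blocks [l, r] at every step."""
    in_range = [b for b in behaviors.values() if l <= b.block <= r]
    result, seen, queue = [], set(), deque([sign])
    while queue:
        current = queue.popleft()
        for b in in_range:  # O(p) scan per record on the path
            if current in b.event_inputs and b.sign not in seen:
                seen.add(b.sign)
                result.append(b.sign)
                queue.extend(b.event_outputs)
    return result


# Hypothetical ledger: raw R1, R2 -> op1 -> P1 -> op2 -> P2.
data = {
    "R1": MedicalData("R1", None, 1),
    "R2": MedicalData("R2", None, 1),
    "P1": MedicalData("P1", "op1", 2),
    "P2": MedicalData("P2", "op2", 5),
}
behaviors = {
    "op1": Behavior("op1", ("R1", "R2"), ("P1",), 2),
    "op2": Behavior("op2", ("P1",), ("P2",), 5),
}
print(naive_backward("P2", data, behaviors))  # ['op2', 'op1']
print(naive_forward("R1", behaviors, 1, 10))  # ['op1', 'op2']
print(naive_forward("R1", behaviors, 1, 3))   # ['op1']
```

The asymmetry is visible in the code: backward tracing does one keyed lookup per step, whereas forward tracking rescans every in-range behavior record per step, because records only point to their inputs.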

From the algorithm descriptions, the time complexity of backward tracing is $O(m)$, while that of forward tracking is $O(t \cdot p)$, where $m$ and $t$ are the path lengths of backward tracing and forward tracking, respectively, and $p$ is the number of operational behavior records stored in the interval $[l, r]$. The difference arises because backward tracing follows the BehaviorSign pointer of each record, whereas forward tracking must examine the operational behavior records in the interval at each step. As the interval range $|r - l|$ increases, $p$ increases rapidly, which causes the efficiency of forward tracking to drop rapidly while having almost no effect on the efficiency of backward tracing. To improve query efficiency, we combine the characteristics of medical data and operational behaviors and propose a DAG-based algorithm while ensuring data security and trustworthiness. Next, we introduce this algorithm in detail.

4.4. DAG-Based Algorithm

Apart from the original medical data, every processed medical data record corresponds to an operational behavior. Considering the dependence between medical data and operational behavior data, this paper uses a DAG to reconstruct all medical data and operational behavior data stored on the blockchain, thereby converting on-chain data provenance into a traversal of a DAG. In the reconstructed DAG, vertices represent medical data and edges represent operational behavior data. A schematic diagram of the reconstruction is shown in Figure 3.

To facilitate forward tracking and backward tracing, we construct two DAGs, $G_1$ and $G_2$, to reconstruct the on-chain data. $G_1$ and $G_2$ have the same content except that their edges point in opposite directions. To save storage space, we store the graphs as adjacency lists and load them into memory for tracking or tracing. To avoid reconstructing the DAG each time a node disconnects and reconnects, we serialize the in-memory adjacency lists to local storage. To support data tracing within a block interval, we design adjacency lists annotated with block numbers that indicate the block in which each transaction appears. Below, we describe the design of the adjacency lists using the DAG in Figure 3 as an example.

First, we number the vertices and edges in the graph using hash tables, encoding the keys in increasing order starting from 0 to facilitate data access. The encoding is shown in Table 3 and Table 4. Examples of the forward tracking and backward tracing adjacency lists are shown in Figure 5 and Figure 6, respectively.
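A minimal sketch of this encoding and of adjacency lists carrying block numbers follows. The operations correspond to Figure 3, but the block numbers are invented for illustration, and the helpers `encode` and `build_adjacency` are hypothetical; each list entry is a (neighbor code, edge code, block number) tuple, so the layout may differ from Figures 5 and 6.

```python
# Operations of Figure 3 as (behavior, inputs, outputs, block); the block
# numbers are invented for illustration.
TRANSACTIONS = [
    ("a", ("A", "B", "C"), ("F",), 3),
    ("b", ("C", "D"), ("G",), 4),
    ("c", ("D", "E"), ("H",), 4),
    ("d", ("F", "G"), ("I",), 6),
    ("e", ("G", "H"), ("J",), 7),
    ("f", ("G",), ("K",), 7),
    ("g", ("J", "K"), ("L",), 9),
]


def encode(items, table):
    """Assign consecutive integer codes from 0 in first-seen order."""
    for item in items:
        if item not in table:
            table[item] = len(table)


def build_adjacency(transactions):
    """Return vertex/edge code tables plus forward (G1) and backward (G2)
    adjacency lists whose entries are (neighbor, edge, block) tuples."""
    vertex_id, edge_id = {}, {}
    for sign, inputs, outputs, _ in transactions:
        encode(inputs + outputs, vertex_id)
        encode([sign], edge_id)
    forward = [[] for _ in vertex_id]
    backward = [[] for _ in vertex_id]
    for sign, inputs, outputs, block in transactions:
        e = edge_id[sign]
        for src in inputs:
            for dst in outputs:
                forward[vertex_id[src]].append((vertex_id[dst], e, block))
                backward[vertex_id[dst]].append((vertex_id[src], e, block))
    return vertex_id, edge_id, forward, backward


vertex_id, edge_id, forward, backward = build_adjacency(TRANSACTIONS)
print(vertex_id["G"], forward[vertex_id["G"]])   # 5 [(8, 3, 6), (9, 4, 7), (10, 5, 7)]
print(vertex_id["L"], backward[vertex_id["L"]])  # 11 [(9, 6, 9), (10, 6, 9)]
```

Dense integer codes let the adjacency lists be plain arrays indexed by vertex, which is cheap to hold in memory and to serialize.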

With the forward tracking and backward tracing adjacency lists in place, a breadth-first search over the DAG quickly yields the results. The query process is described in Algorithm 3.

Algorithm 3: DAG-based provenance
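Algorithm 3 itself is not reproduced here; the following is a sketch of a block-interval-restricted breadth-first search over $G_1$ (forward) and $G_2$ (backward). It assumes that edges whose block number falls outside [lo, hi] are inaccessible and are therefore not traversed; the block numbers and function names are invented for illustration.

```python
from collections import deque

# Operations of Figure 3 as (behavior, inputs, outputs, block); the block
# numbers are invented for illustration.
TRANSACTIONS = [
    ("a", ("A", "B", "C"), ("F",), 3),
    ("b", ("C", "D"), ("G",), 4),
    ("c", ("D", "E"), ("H",), 4),
    ("d", ("F", "G"), ("I",), 6),
    ("e", ("G", "H"), ("J",), 7),
    ("f", ("G",), ("K",), 7),
    ("g", ("J", "K"), ("L",), 9),
]


def adjacency(transactions, reverse=False):
    """G1 (reverse=False) or G2 (reverse=True) as vertex -> [(neighbor, behavior, block)]."""
    adj = {}
    for sign, inputs, outputs, block in transactions:
        for src in inputs:
            for dst in outputs:
                u, v = (dst, src) if reverse else (src, dst)
                adj.setdefault(u, []).append((v, sign, block))
    return adj


def bfs_provenance(adj, start, lo, hi):
    """BFS that follows only edges whose block number lies in [lo, hi] and
    returns the behaviors reached, in discovery order."""
    visited, reported = {start}, set()
    result, queue = [], deque([start])
    while queue:
        u = queue.popleft()
        for v, sign, block in adj.get(u, []):
            if not lo <= block <= hi:
                continue  # treated as inaccessible under the interval limit
            if sign not in reported:
                reported.add(sign)
                result.append(sign)
            if v not in visited:
                visited.add(v)
                queue.append(v)
    return result


G1 = adjacency(TRANSACTIONS)
G2 = adjacency(TRANSACTIONS, reverse=True)
print(bfs_provenance(G1, "G", 0, 100))  # ['d', 'e', 'f', 'g']
print(bfs_provenance(G1, "G", 0, 7))    # ['d', 'e', 'f']
print(bfs_provenance(G2, "L", 0, 100))  # ['g', 'e', 'f', 'b', 'c']
print(bfs_provenance(G2, "L", 5, 100))  # ['g', 'e', 'f']
```

Each vertex is dequeued once and each edge examined once, so the cost is linear in the reachable part of the graph rather than in the number of records in the interval. An alternative reading of the interval constraint would traverse out-of-range edges but report only in-range behaviors; which one applies depends on the platform's permission model.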

In the DAG-based provenance method, the file storing the DAG is not kept on the blockchain. To prevent this file from being tampered with and corrupting query results, we designed a verification method that detects whether the DAG file has been tampered with. Every time n new blocks are generated, the method updates the DAG and the file storing it, and stores the hash value of the updated file on the blockchain as a special transaction. Whenever the system loads the DAG file, it checks whether the file's hash value matches the one stored on the blockchain; if it does, the file is loaded directly; otherwise, the DAG file is reconstructed and then loaded.
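The tamper check can be sketched as follows, assuming SHA-256, a JSON file for the serialized adjacency lists, and a `rebuild` callback that stands in for re-scanning the ledger. The on-chain special transaction is simulated by a returned hash value; `save_dag`, `load_dag`, `should_checkpoint`, and `rebuild_from_ledger` are hypothetical names.

```python
import hashlib
import json
import os
import tempfile


def file_hash(path):
    """SHA-256 of the serialized DAG file (the hash function is assumed)."""
    with open(path, "rb") as fh:
        return hashlib.sha256(fh.read()).hexdigest()


def save_dag(adjacency, path):
    """Serialize deterministically and return the hash to record on-chain."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(adjacency, fh, sort_keys=True)
    return file_hash(path)


def load_dag(path, on_chain_hash, rebuild):
    """Load the file if its hash matches the chain; otherwise rebuild it.
    Returns (adjacency, rebuilt_flag)."""
    if os.path.exists(path) and file_hash(path) == on_chain_hash:
        with open(path, encoding="utf-8") as fh:
            return json.load(fh), False
    adjacency = rebuild()
    save_dag(adjacency, path)
    return adjacency, True


def should_checkpoint(block_height, n):
    """Refresh the DAG file and its on-chain hash every n new blocks."""
    return block_height % n == 0


def rebuild_from_ledger():
    # Hypothetical stand-in for re-reading medical data and behavior transactions.
    return {"G": [["I", "d", 6], ["J", "e", 7]], "J": [["L", "g", 9]]}


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "dag.json")
    on_chain = save_dag(rebuild_from_ledger(), path)  # simulated special transaction
    _, first_rebuilt = load_dag(path, on_chain, rebuild_from_ledger)
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(" ")  # simulate tampering with the local file
    dag, second_rebuilt = load_dag(path, on_chain, rebuild_from_ledger)
    restored = file_hash(path) == on_chain
print(first_rebuilt, second_rebuilt, restored)  # False True True
```

Serializing with `sort_keys=True` keeps the file byte-for-byte reproducible, so a DAG rebuilt from the same ledger yields the same hash as the one recorded on-chain.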

5. Experiments

In order to verify the effectiveness of the constructed medical data management platform, we conducted data upload and query experiments, as well as data provenance experiments. We elaborate on the experiments from the experimental environment, experimental data, and experimental results and analysis.

5.1. Software and Hardware Environment

In the experiment, three hosts on Alibaba Cloud were used, each running Ubuntu 20.04 with 64 GB of memory and a 500 GB hard drive; the machine hosting the Orderer cluster had 32 cores, while the other two had 16 cores each. The network bandwidth was 1 Gbps. The software used included Docker v20, Hyperledger Fabric 2.4, Fabric-CA 1.5, Caliper 0.5, JMeter 5.6.3, Golang 1.8+, shell scripts, and Node.js. Caliper, JMeter, and Node.js were used for throughput testing of the constructed blockchain network, while the rest were used to build the network. In total, seven organizations were deployed on the three machines: one regulatory organization, one medical cloud organization, and five hospital organizations, each with three peer nodes. The distribution of organizations across the three machines is shown in Figure 7.

5.2. Experimental Data

In the experiment, we used only medical data and operational behavior data for testing; their data structures are given in Section 4.2. We used five groups of medical data of different scales together with the corresponding operational behavior data. A statistical analysis of the processing path lengths in the five groups showed that about 25% of the original data had a path length of 1, about 20% a path length of 2, about 20% a path length of 3, about 15% a path length of 4, about 10% a path length of 5, about 5% a path length of 6, and the remaining data about 5%. The data are shown in Table 5.

In the data upload and query experiments, we stored medical data and operational behaviors data on the blockchain via smart contracts and used the data signatures as query conditions to retrieve them from the chain. In the data provenance experiments, we used the signature of medical data as the query condition to perform backward tracing and forward tracking.

5.3. Experimental Results and Analysis

5.3.1. Data Upload and Query Experiment

To test the performance of the blockchain network, we conducted data upload and query experiments, varying the number of organizations in the network to examine its impact on performance. We used Caliper as the testing tool and tested networks containing 3–7 organizations. In Caliper, we configured 10 workers to complete 100,000 random upload or query transactions for medical data and for operational behaviors data. Performance was measured by send rate, maximum latency, minimum latency, average latency, and throughput. Each test was run five times, and the averaged results are shown in Table A1, Table A2, Table A3, Table A4 and Table A5.
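For readers unfamiliar with these metrics, the sketch below shows one simplified way to derive them from per-transaction timestamps. It is an assumption for illustration: the `TxRecord` type and the formulas (send rate over the submission window, throughput over the window from the first submission to the last completion) are ours and may differ in detail from Caliper's internal implementation.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TxRecord:
    submit: float  # time the client submitted the transaction (s)
    finish: float  # time the client received the result (s)
    ok: bool       # whether the transaction succeeded


def summarize(records: List[TxRecord]) -> Dict[str, float]:
    """Simplified Caliper-style statistics for one round of transactions."""
    succ = [r for r in records if r.ok]
    if not succ:
        raise ValueError("need at least one successful transaction")
    first_submit = min(r.submit for r in records)
    last_submit = max(r.submit for r in records)
    last_finish = max(r.finish for r in records)
    latencies = [r.finish - r.submit for r in succ]
    send_window = last_submit - first_submit
    return {
        "succ": len(succ),
        # Rate at which transactions were submitted.
        "send_rate": len(records) / send_window if send_window > 0 else float("inf"),
        # Rate at which successful transactions completed.
        "throughput": len(succ) / (last_finish - first_submit),
        "max_latency": max(latencies),
        "min_latency": min(latencies),
        "avg_latency": sum(latencies) / len(latencies),
    }
```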

For uploading medical data, with 3 organizations the send rate is 1361.7 TPS (transactions per second) and the throughput is 1349.3 TPS. As the number of organizations increases to 7, the send rate decreases to 794.7 TPS and the throughput drops to 782.8 TPS, while the maximum latency increases from 2.05 s to 2.28 s and the average latency from 0.11 s to 0.37 s. Uploading operational behaviors data shows a similar pattern: the send rate decreases from 1256.3 TPS to 789.9 TPS, the throughput from 1247.1 TPS to 778.6 TPS, the maximum latency increases from 2.03 s to 2.04 s, and the average latency from 0.10 s to 0.37 s.

When querying medical data, increasing the number of organizations decreases the send rate from 1645.3 TPS to 1192.4 TPS and the throughput from 1603.2 TPS to 1179.9 TPS, while the maximum latency increases from 1.95 s to 2.01 s and the average latency from 0.09 s to 0.26 s. Querying operational behaviors data behaves similarly: the send rate decreases from 1640.8 TPS to 1184.7 TPS, the throughput from 1599.8 TPS to 1173.8 TPS, the maximum latency increases from 1.92 s to 2.08 s, and the average latency from 0.09 s to 0.27 s.

In the upload and query experiments, the core metric is throughput. To show how throughput varies with the number of organizations, Figure 8 plots the throughput values from Table A1, Table A2, Table A3, Table A4 and Table A5. As Figure 8 shows, both upload and query throughput decrease as the number of organizations increases. This is because more organizations raise the cost of consensus and inter-organization communication, which requires more resources and lowers throughput. Because queries are also processed as transactions on the platform, the throughput gap between queries and uploads is small; nevertheless, query throughput is higher than upload throughput in every experiment. We speculate that, when the platform processes query transactions, the overhead of writing them is lower than that of writing medical or operational behaviors data. The throughput of uploading medical data is slightly higher than that of uploading operational behaviors data, because an individual operational behaviors record is slightly larger than a medical data record, so writing it takes more time. The throughputs of querying operational behaviors data and medical data are very close. This is because the time spent reading data on the nodes is negligible compared with the time needed for the nodes to reach consensus on query transactions, so query throughput ultimately depends on the efficiency of consensus among nodes in the platform.
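These trends can be checked directly against the appendix tables. The sketch below copies the throughput column of Tables A1–A5 (3 to 7 organizations), checks that each series decreases monotonically, computes the relative drop from 3 to 7 organizations, and checks that queries outperform uploads at every network size. The dictionary layout is ours.

```python
# Throughput (TPS) from Tables A1-A5, indexed by number of organizations.
throughput = {
    "upload medical": {3: 1349.3, 4: 1247.8, 5: 1159.9, 6: 885.6, 7: 782.8},
    "upload OB": {3: 1247.1, 4: 1103.2, 5: 1042.6, 6: 807.8, 7: 778.6},
    "query medical": {3: 1603.2, 4: 1519.2, 5: 1451.2, 6: 1243.5, 7: 1179.9},
    "query OB": {3: 1599.8, 4: 1510.8, 5: 1438.8, 6: 1248.6, 7: 1173.8},
}

orgs = sorted(throughput["upload medical"])

for op, series in throughput.items():
    values = [series[n] for n in orgs]
    # True if throughput strictly decreases as organizations are added.
    decreasing = all(a > b for a, b in zip(values, values[1:]))
    drop = 100 * (series[3] - series[7]) / series[3]
    print(f"{op}: -{drop:.1f}% from 3 to 7 orgs, monotonic={decreasing}")

# The slower query type still beats the faster upload type at every size.
queries_faster = all(
    min(throughput["query medical"][n], throughput["query OB"][n])
    > max(throughput["upload medical"][n], throughput["upload OB"][n])
    for n in orgs
)
print("queries faster than uploads at every size:", queries_faster)
```

On these values, upload throughput falls by about 38–42% between 3 and 7 organizations, and query throughput by about 26–27%.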

These throughput tests with different numbers of organizations show that, given the actual business requirements of medical data management, the platform can meet the needs of data uploading and querying. Next, we test the provenance performance of the platform.

5.3.2. Data Provenance Experiments

In this paper, we propose two different data provenance methods. To test their performance, we conducted experiments on the five datasets listed in Table 5. We first uploaded the medical and operational behaviors data to the blockchain through the platform interface, and then used the interfaces and testing tools to measure performance.

First, we tested the average provenance performance of the proposed methods on the platform, using the first dataset in Table 5, which contains 200,000 medical data records and 76,853 operational behaviors records. Caliper could not be used for this test, because the naive method's forward tracking is very slow and the DAG-based methods are not implemented as smart contracts; we therefore used JMeter. Performance was measured by time cost, transfer rate, time per request, and throughput. In the naive method's forward tracking test, we sent only 1000 requests, whereas each of the other tests sent 100,000 requests; the number of concurrent requests was set to 10. The experimental results are shown in Table 6.
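The throughput and time-per-request columns of Table 6 follow from the number of requests and the total time cost. The arithmetic below reproduces them, assuming (consistently with the table values) that time per request is the total time cost divided by the number of requests rather than the latency seen by a single client thread.

```python
# Number of requests and total time cost (s) per method, copied from Table 6.
runs = {
    "naive-forward": (1_000, 1837.85),
    "naive-backward": (100_000, 258.02),
    "DAG-forward": (100_000, 9.80),
    "DAG-backward": (100_000, 9.72),
}

# Throughput: completed requests per second of total time cost (TPS).
tps = {method: n / cost for method, (n, cost) in runs.items()}
# Time per request: total time cost divided by the number of requests, in ms.
ms_per_request = {method: 1000 * cost / n for method, (n, cost) in runs.items()}

for method in runs:
    print(f"{method}: {tps[method]:.2f} TPS, {ms_per_request[method]:.3f} ms/request")
```

The recomputed values agree with Table 6 up to rounding in the last digit.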

The results in Table 6 show that the TPS of the DAG-based methods far exceeds that of the naive method. The forward-tracking TPS of the naive method is only 0.54, which cannot meet the performance requirements of data provenance. In the naive method, the TPS of backward tracing is approximately 718 times that of forward tracking, because each forward-tracking query spends a significant amount of time retrieving all operational behaviors data, whereas backward tracing does not. The TPS of the DAG-based methods is approximately 26 times that of naive-backward, because naive-backward must query the blockchain ledger for every request, whereas the DAG-based methods only perform a breadth-first search on a pre-constructed DAG to obtain all paths. Given the low TPS of naive-forward, we did not include it in subsequent experiments.
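To make the cost difference concrete, the sketch below contrasts the two strategies on operational behaviors records reduced to their EventInputs and EventOutputs fields (Table 2). It is a simplified, in-memory illustration, not the platform's code: naive forward tracking rescans every operational behavior at each step to find those consuming the current data, whereas the DAG-based variant builds forward and backward adjacency lists once and answers each query with a breadth-first search. For brevity it returns the set of reachable data signatures rather than the full list of paths; all function names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

# One operational behavior: (EventInputs, EventOutputs), as data signatures.
Op = Tuple[List[str], List[str]]


def naive_forward(ops: List[Op], start: str) -> Set[str]:
    """Forward tracking by rescanning all operations for every visited node."""
    reached: Set[str] = set()
    frontier = [start]
    while frontier:
        current = frontier.pop()
        for inputs, outputs in ops:  # full scan: cost grows with len(ops)
            if current in inputs:
                for out in outputs:
                    if out not in reached:
                        reached.add(out)
                        frontier.append(out)
    return reached


def build_dags(ops: List[Op]) -> Tuple[Dict[str, List[str]], Dict[str, List[str]]]:
    """Build forward (input -> output) and backward (output -> input) lists once."""
    forward: Dict[str, List[str]] = {}
    backward: Dict[str, List[str]] = {}
    for inputs, outputs in ops:
        for i in inputs:
            for o in outputs:
                forward.setdefault(i, []).append(o)
                backward.setdefault(o, []).append(i)
    return forward, backward


def bfs(adjacency: Dict[str, List[str]], start: str) -> Set[str]:
    """Breadth-first search returning every vertex reachable from start."""
    seen: Set[str] = set()
    queue = deque([start])
    while queue:
        for nxt in adjacency.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


ops: List[Op] = [(["A"], ["B"]), (["B", "C"], ["D"]), (["D"], ["E", "F"])]
forward, backward = build_dags(ops)
print(sorted(bfs(forward, "A")))   # ['B', 'D', 'E', 'F']
print(sorted(bfs(backward, "E")))  # ['A', 'B', 'C', 'D']
```

Both strategies return the same downstream set; the difference is that the DAG variant pays the scan over all operations once, at build time, instead of on every query.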

In the next experiment, we tested the impact of data scale and path length on the TPS of the naive-backward, DAG-forward, and DAG-backward methods. We uploaded each dataset in Table 5 to the blockchain and then measured throughput for path lengths of 1, 2, 3, 4, 5, 6, and ≥7. The results are shown in Table 7.

Table 7 shows that the TPS of all methods decreases as the path length increases. The TPS of the naive-backward method at path length 1 is about 10 times that at path lengths ≥ 7, and the corresponding factor for DAG-forward and DAG-backward is about 20, indicating that tracing TPS is strongly affected by path length. At the same path length, the TPS of each method is relatively close across data scales, indicating that tracing TPS is only weakly affected by data scale. We speculate that this is because data on the blockchain and in the DAG are both stored as key–value pairs, so changing the data scale has little impact on read speed. The TPS values of the DAG-forward and DAG-backward methods are very close under all data scales and path lengths. In theory they should be identical; we attribute the differences to fluctuations in the network environment.
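The factors quoted above can be recomputed from the 200,000-record rows of Table 7, as in the following arithmetic sketch (values copied from the table; variable names are ours).

```python
# TPS at path length 1 and at path length >= 7, 200,000-record group (Table 7).
tps = {
    "naive-backward": (537.20, 50.20),
    "DAG-forward": (21_099.91, 1_066.23),
    "DAG-backward": (21_767.52, 1_089.26),
}

ratios = {method: p1 / p7 for method, (p1, p7) in tps.items()}
for method, r in ratios.items():
    print(f"{method}: path-1 TPS is {r:.1f}x the path->=7 TPS")
```

The ratios come out at about 10.7, 19.8, and 20.0, respectively.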

5.4. Discussion

The data provenance experiments above did not consider provenance constraints. We now analyze constrained data provenance qualitatively. Because of factors such as the network, operators, and latency, medical and operational behaviors data are not stored on the blockchain in strictly sequential order. Therefore, performing provenance only within a specific block range or time interval may not yield complete results. For example, when performing provenance for data d under the constraint [f, t], if d itself lies outside [f, t] but some medical data related to d lie within [f, t], tracking or tracing only within [f, t] will not yield the desired results. It is therefore necessary to first obtain the complete provenance results from the entire blockchain and then filter them by [f, t]. The primary difference between constrained and unconstrained provenance thus lies in the final filtering step. Compared with the cost of the provenance process itself, the time spent filtering by [f, t] is negligible, so constrained and unconstrained data provenance have nearly the same efficiency.
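The filter-after-trace strategy can be sketched as below. For illustration only, we assume that each provenance result carries the block number of the transaction that recorded it and that [f, t] is an inclusive block range; the same pattern applies to a time interval. The `ProvenanceHit` type and `constrained_provenance` function are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class ProvenanceHit:
    data_sign: str  # signature of a related medical data record
    block: int      # block number in which it was recorded (assumed field)


def constrained_provenance(
    trace: Callable[[str], Iterable[ProvenanceHit]], d: str, f: int, t: int
) -> List[ProvenanceHit]:
    """Trace over the whole chain first, then keep only hits inside [f, t].

    Tracing only within [f, t] could miss hits connected to d through records
    outside the range, so the filter is applied after the full traversal.
    """
    return [hit for hit in trace(d) if f <= hit.block <= t]
```

The filter is a single linear pass over the results, which is why its cost is small relative to the traversal itself.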

Although the DAG-based methods far outperform the naive method in TPS in the above experiments, they require additional storage space for the forward and backward DAG files, must load the DAG from these files into memory, and must store the hash value of each DAG file as a transaction on the blockchain for tamper verification. Therefore, the naive method may be more suitable for applications with low TPS requirements and few forward-tracking queries, whereas large-scale data tracing applications require the DAG-based methods to meet their performance requirements. In subsequent research, we will improve the data storage and verification methods for the DAG and comprehensively improve DAG-based data tracking and tracing.

6. Conclusions

In this study, we built an efficient medical data management platform based on Fabric and implemented data uploading and querying for medical, operational behaviors, authorization, and other data through smart contracts. We designed data structures for medical data and operational behaviors data and proposed two data provenance methods, naive and DAG-based, to achieve full-process tracking or tracing of medical data. In our experiments, we tested the platform's data upload and query throughput and found that the platform meets the storage and query performance requirements for medical data management. Furthermore, we tested the provenance performance of the naive and DAG-based methods under different data sizes and path lengths. The results show that the DAG-based method significantly outperforms the naive method in terms of TPS. For instance, with 200,000 records and path length 1, the DAG-forward and DAG-backward methods achieved a TPS of 21,099.91 and 21,767.52, respectively, while the naive-backward method achieved only 537.20. Even with 1,000,000 records, the DAG-forward and DAG-backward methods maintained a high TPS of 20,815.98 and 19,981.06, compared with 510.49 for the naive-backward method.

However, the DAG-based method requires more memory and additional storage space, and it needs extra mechanisms to prevent tampering with the files stored off-chain. In future work, we therefore plan to integrate a graph database into the platform's blockchain network to reduce the storage and memory overheads of the DAG methods, simplify data verification, and further improve data provenance performance. We will also consider more complex scenarios involving user permission restrictions and design specific index structures for different constraints, thereby improving the efficiency of data provenance under various constraints.

Author Contributions

X.M.: Conceptualization, Writing, Data analysis, Methodology, Validation, Data curation. C.L.: Conceptualization, Supervision, Investigation. Y.Z.: Conceptualization, Formal analysis, Writing—review and editing. G.Z.: Conceptualization, Investigation, Methodology. C.X.: Conceptualization, Resources, Supervision. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Social Science Fund of China (22&ZD141).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The data are not publicly available due to privacy concerns.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Table A1. Experimental results of 3 organizations.

Name	Succ	SendR (TPS)	MaxL (s)	MinL (s)	AvgL (s)	Throughput (TPS)
upload medical data	100,000	1361.7	2.05	0.03	0.11	1349.3
upload OB data	100,000	1256.3	2.03	0.03	0.10	1247.1
query medical data	100,000	1645.3	1.95	0.02	0.09	1603.2
query OB data	100,000	1640.8	1.92	0.02	0.09	1599.8

SendR: send rate; MaxL: maximum latency; MinL: minimum latency; AvgL: average latency; OB: operational behaviors.

Table A2. Experimental results of 4 organizations.

Name	Succ	SendR (TPS)	MaxL (s)	MinL (s)	AvgL (s)	Throughput (TPS)
upload medical data	100,000	1261.9	2.06	0.04	0.14	1247.8
upload OB data	100,000	1162.0	2.04	0.03	0.14	1103.2
query medical data	100,000	1565.2	2.03	0.03	0.13	1519.2
query OB data	100,000	1569.3	2.02	0.02	0.14	1510.8

Table A3. Experimental results of 5 organizations.

Name	Succ	SendR (TPS)	MaxL (s)	MinL (s)	AvgL (s)	Throughput (TPS)
upload medical data	100,000	1165.2	2.06	0.04	0.23	1159.9
upload OB data	100,000	1091.5	2.04	0.03	0.23	1042.6
query medical data	100,000	1470.1	2.04	0.03	0.23	1451.2
query OB data	100,000	1465.8	2.05	0.03	0.22	1438.8

Table A4. Experimental results of 6 organizations.

Name	Succ	SendR (TPS)	MaxL (s)	MinL (s)	AvgL (s)	Throughput (TPS)
upload medical data	100,000	900.6	2.06	0.04	0.32	885.6
upload OB data	100,000	843.9	2.06	0.03	0.32	807.8
query medical data	100,000	1288.7	2.05	0.03	0.29	1243.5
query OB data	100,000	1276.3	2.05	0.03	0.30	1248.6

Table A5. Experimental results of 7 organizations.

Name	Succ	SendR (TPS)	MaxL (s)	MinL (s)	AvgL (s)	Throughput (TPS)
upload medical data	100,000	794.7	2.28	0.04	0.37	782.8
upload OB data	100,000	789.9	2.04	0.04	0.37	778.6
query medical data	100,000	1192.4	2.01	0.03	0.26	1179.9
query OB data	100,000	1184.7	2.08	0.04	0.27	1173.8


Figure 1. Medical data management process.

Figure 2. Blockchain network structure diagram.

Figure 3. Example of medical data provenance.

Figure 4. Linkage between medical data and operational behaviors data.

Figure 5. Forward tracking adjacency list.

Figure 6. Backward tracing adjacency list.

Figure 7. Deployment of organizations.

Figure 8. Throughput changes with the number of organizations.

Table 1. Data structure of medical data.

{
    TxnType: String[data],  // Type of content put on the blockchain,
                            // such as data or events.
    PartyId: String,        // ID of the on-chain participant, usually
                            // a hospital's serial number.
    DataSign: String,       // Signature of the data.
    DataStage: String,      // Business-level data stage: generally raw data
                            // or data in an intermediate processing stage.
    DataType: String,       // Business-level data type: generally the storage
                            // format (file / *db / from API, etc.).
    OpSign: String,         // Signature of the operational behavior on the
                            // blockchain, computed by the business layer (the
                            // unique identifier of the operational behavior).
    ……
}

Table 2. Data structure of operational behaviors data.

{
    TxnType: String[data],    // Type of content put on the blockchain, such as
                              // medical data or operational behaviors data.
    PartyId: String,          // ID of the on-chain participant, usually
                              // a hospital's serial number.
    OpSign: String,           // Signature of the operational behaviors data on the
                              // blockchain, computed by the business layer (the
                              // unique identifier of the operational behaviors data).
    EventInputs: JsonArray,   // Signatures of the input data involved in this
                              // operational behavior, e.g.,
                              // [DataSign_1, DataSign_2, ..., DataSign_n].
    EventOutputs: JsonArray,  // Signatures of the output data involved in this
                              // operational behavior, e.g.,
                              // [DataSign_a, DataSign_b, ..., DataSign_x].
    ……
}
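For readers who prefer a typed view, the two record layouts in Tables 1 and 2 can be mirrored as Python dataclasses. This is a sketch: only the fields listed in the tables are included (the elided fields are omitted), the field names are kept exactly as in the tables, and `to_ledger_json` is a hypothetical serialization helper, not part of the platform's interface.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class MedicalData:
    """Fields listed in Table 1 (elided fields omitted)."""
    TxnType: str    # e.g. "data"
    PartyId: str    # on-chain participant ID, usually a hospital serial number
    DataSign: str   # signature of the data
    DataStage: str  # raw data or an intermediate processing stage
    DataType: str   # storage format, e.g. file / db / API
    OpSign: str     # signature of the linked operational behavior


@dataclass
class OperationalBehavior:
    """Fields listed in Table 2 (elided fields omitted)."""
    TxnType: str
    PartyId: str
    OpSign: str
    EventInputs: List[str] = field(default_factory=list)   # input DataSigns
    EventOutputs: List[str] = field(default_factory=list)  # output DataSigns


def to_ledger_json(record) -> str:
    """Serialize a record with the field names used in Tables 1 and 2."""
    return json.dumps(asdict(record), sort_keys=True)
```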

Table 3. Mapping from indexes to vertexes.

Key	Value
0	(A, v1, t1)
1	(B, v1, t2)
2	(C, v1, t3)
3	(D, v2, t4)
4	(E, v2, t5)
5	(F, v1, t6)
6	(G, v2, t7)
7	(H, v2, t8)
8	(I, v3, t9)
9	(J, v3, t10)
10	(K, v3, t11)
11	(L, v4, t12)

Table 4. Mapping from vertexes to indexes.

Key	Value
A	(0, v1, t1)
B	(1, v1, t2)
C	(2, v1, t3)
D	(3, v2, t4)
E	(4, v2, t5)
F	(5, v1, t6)
G	(6, v2, t7)
H	(7, v2, t8)
I	(8, v3, t9)
J	(9, v3, t10)
K	(10, v3, t11)
L	(11, v4, t12)
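Tables 3 and 4 are two inverse lookup tables. A minimal sketch of how such a pair could be built is shown below, assuming each vertex arrives with the two attributes shown in the tables (v_k and t_k, carried through opaquely) and receives the next free integer index in arrival order. `VertexIndex` is a hypothetical name.

```python
from typing import Dict, Tuple


class VertexIndex:
    """Bidirectional mapping between vertex IDs and dense integer indexes."""

    def __init__(self) -> None:
        self.by_index: Dict[int, Tuple[str, str, str]] = {}   # Table 3 layout
        self.by_vertex: Dict[str, Tuple[int, str, str]] = {}  # Table 4 layout

    def add(self, vertex: str, v: str, t: str) -> int:
        """Register a vertex (idempotent) and return its index."""
        if vertex in self.by_vertex:
            return self.by_vertex[vertex][0]
        idx = len(self.by_index)
        self.by_index[idx] = (vertex, v, t)
        self.by_vertex[vertex] = (idx, v, t)
        return idx


# Vertices and attributes in the order listed in Table 3.
rows = [("A", "v1", "t1"), ("B", "v1", "t2"), ("C", "v1", "t3"), ("D", "v2", "t4"),
        ("E", "v2", "t5"), ("F", "v1", "t6"), ("G", "v2", "t7"), ("H", "v2", "t8"),
        ("I", "v3", "t9"), ("J", "v3", "t10"), ("K", "v3", "t11"), ("L", "v4", "t12")]
index = VertexIndex()
for vertex, v, t in rows:
    index.add(vertex, v, t)
```

Under this reading, dense integer indexes let the forward and backward adjacency lists store compact integer entries, with either table used to translate back to vertex IDs.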

Table 5. Detailed experimental data.

Medical data	OB	Path 1	Path 2	Path 3	Path 4	Path 5	Path 6	Path ≥ 7
200,000	76,853	49,200	38,956	41,644	29,348	20,652	10,653	9547
400,000	161,284	98,268	80,522	78,712	61,728	40,944	19,800	19,996
600,000	239,457	148,002	121,098	119,268	93,792	60,036	28,950	28,854
800,000	332,345	200,088	161,904	151,784	121,848	82,672	43,232	38,472
1,000,000	396,543	242,341	212,303	199,734	142,321	101,415	53,940	47,946

OB: operational behaviors data; Path k: number of medical data records with processing path length k.

Table 6. Experimental results of data provenance.

Method	Succ	Time Cost (s)	Time per Request (ms)	Throughput (TPS)
naive-forward	1000	1837.85	1837.85	0.54
naive-backward	100,000	258.02	2.58	387.56
DAG-forward	100,000	9.80	0.098	10,204.08
DAG-backward	100,000	9.72	0.097	10,288.06

Table 7. Experimental results (TPS) of data tracing with different data sizes and path lengths.

Size	Method	Path 1	Path 2	Path 3	Path 4	Path 5	Path 6	Path ≥ 7
200,000	naive-backward	537.20	508.93	443.14	312.97	223.08	124.46	50.20
	DAG-forward	21,099.91	20,076.28	19,534.87	16,343.04	10,042.84	41,232.02	1066.23
	DAG-backward	21,767.52	20,593.08	19,829.466	16,116.035	9953.22	4008.06	1089.26
400,000	naive-backward	531.39	498.12	437.53	305.24	204.74	114.52	45.02
	DAG-forward	21,238.64	19,998.81	19,589.63	15,801.44	9001.27	3766.84	940.39
	DAG-backward	21,612.27	20,165.35	19,634.55	15,822.13	9093.49	3733.99	947.36
600,000	naive-backward	528.04	495.95	430.63	298.58	217.45	108.64	47.17
	DAG-forward	19,896.56	19,086.33	18,996.92	15,323.47	8705.18	3518.66	890.32
	DAG-backward	20,925.02	19,696.81	18,889.35	15,082.45	8654.32	3543.23	886.29
800,000	naive-backward	520.78	492.86	426.29	294.68	210.71	106.74	46.12
	DAG-forward	20,213.68	19,754.32	18,765.41	15,487.99	89,612.38	3657.28	904.36
	DAG-backward	20,348.64	19,801.40	18,800.51	15,501.67	9045.37	3701.33	919.26
1,000,000	naive-backward	510.49	490.21	428.46	296.65	200.87	103.08	45.32
	DAG-forward	20,815.98	19,563.42	18,654.32	15,032.18	8643.24	3523.60	865.47
	DAG-backward	19,981.06	19,032.77	18,543.39	15,163.24	8732.18	3543.61	870.96

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).


MDPI and ACS Style

Mao, X.; Li, C.; Zhang, Y.; Zhang, G.; Xing, C. Efficient and Secure Management of Medical Data Sharing Based on Blockchain Technology. Appl. Sci. 2024, 14, 6816. https://doi.org/10.3390/app14156816

