Article

BFV-Based Homomorphic Encryption for Privacy-Preserving CNN Models

1 Department of Electrical Engineering and Computer Science, University of Stavanger, 4021 Rogaland, Norway
2 Department of Electrical and Computer Engineering, Virginia Commonwealth University, Richmond, VA 23284, USA
3 Batten College of Engineering and Technology, Old Dominion University, Norfolk, VA 23529, USA
* Author to whom correspondence should be addressed.
Cryptography 2022, 6(3), 34; https://doi.org/10.3390/cryptography6030034
Submission received: 13 May 2022 / Revised: 24 June 2022 / Accepted: 29 June 2022 / Published: 1 July 2022
(This article belongs to the Special Issue Privacy-Preserving Techniques in Cloud/Fog and Internet of Things)

Abstract: Medical data is frequently highly sensitive in terms of data privacy and security. Federated learning, a type of machine learning technique, has been used to increase the privacy and security of medical data. In federated learning, the training data is distributed across numerous machines, and the learning process is performed collaboratively. However, there are numerous privacy attacks on deep learning (DL) models that attackers can use to obtain sensitive information. As a result, the DL model should be safeguarded from adversarial attacks, particularly in medical data applications. One answer to this challenge is homomorphic encryption-based protection of the model from an adversarial collaborator. Using homomorphic encryption, this research presents a privacy-preserving federated learning system for medical data. The proposed technique employs a secure multi-party computation protocol to safeguard the deep learning model from adversaries. The proposed approach is evaluated in terms of model performance using a real-world medical dataset.

1. Introduction

Machine learning (ML) is a widely used technique in almost all fields, in which a computer system learns from data to improve its performance. It is applied in many areas, such as image recognition, natural language processing, and machine translation. Federated learning is a machine learning technique in which the training data is distributed across multiple machines and the learning process is performed collaboratively [1]. This technique can be used to improve the privacy and security of medical data [2].
Medical data is highly sensitive and is often subject to data privacy and security concerns [3]. For example, a person’s health information is often confidential and can be used to identify the person. Thus, it is essential to protect the privacy and security of medical data. The Health Insurance Portability and Accountability Act (HIPAA) (US Department of Health and Human Services, 2014) and the General Data Protection Regulation (GDPR) (The European Union, 2018) strictly mandate personal health information privacy. There are various methods to safeguard such private information. Federated learning is one technique that can be utilized to protect sensitive data during multi-party computation tasks. It can improve the privacy and security of medical data by preventing the data from being centralized and thus vulnerable.
Keeping the data local, however, is not sufficient to secure the data and the ML model: there are several privacy attacks on deep learning models that can extract private data [4,5]. For example, attackers can use the gradient information of a deep learning model to recover sensitive information. Thus, the deep learning model itself should also be protected from adversaries. One solution to this problem is homomorphic encryption-based model protection from an adversarial collaborator. Homomorphic encryption is a technique in which data can be encrypted and operations can be performed directly on the encrypted data [6]. This technique can be used to protect the deep learning model from adversaries.
This paper proposes a privacy-preserving federated learning algorithm based on a convolutional neural network (CNN) for medical data using homomorphic encryption. The proposed algorithm uses a secure multi-party computation protocol to protect the deep learning model from adversaries. We evaluate the proposed algorithm using a real-world medical dataset and show that it can protect the deep learning model from adversaries. We limited our work to binary classification; in subsequent work, we plan to adopt multi-class approaches from the literature [7,8].
The main contributions of this paper are as follows:
  • A recent European Data Protection Board (EDPB) Public Consultation stated the use of Secure Multi-Party Computation as an additional measure to the General Data Protection Regulation’s (GDPR) Article 46 transfer tools. Here, we provide a method to practically implement secure multi-party computation in federated learning to improve the privacy and security of medical data (https://edpb.europa.eu/sites/default/files/webform/public_consultation_reply/inpher-_edpb_supplementary_measures_comment.pdf (accessed on 27 June 2022)).
  • A homomorphic encryption-based federated learning algorithm is proposed to protect the confidentiality of the sensitive medical data.
  • A secure multi-party computation protocol is proposed to protect the deep learning models from the adversaries.
  • A real-world medical dataset is used to evaluate the proposed algorithm. The experimental results show that the proposed algorithm can protect the deep learning model from the adversaries.
The rest of the paper is organized as follows: Section 2 describes related work, while Section 3 describes the preliminaries, including homomorphic encryption and federated learning. Section 4 provides detailed information of the proposed system model. Section 5 and Section 6 discuss the results, while Section 7 concludes the paper.

2. Related Work

Data-driven ML models provide unprecedented opportunities for healthcare by making use of sensitive health data. These models are often trained locally to protect the sensitive health data; however, it is difficult to build robust models without large and diverse datasets covering the full spectrum of health concerns. Previously proposed approaches to overcome this problem include federated learning techniques. For instance, the studies [9,10,11] reviewed the current applications and technical considerations of federated learning for preserving sensitive biomedical data. The impact of federated learning is examined through its stakeholders, such as patients, clinicians, healthcare facilities, and manufacturers. In another study, the authors in [12] utilized federated learning systems for brain tumor segmentation on the BraTS dataset, which consists of magnetic resonance imaging brain scans. The results show that performance decreases with the cost of privacy protection. The same BraTS dataset is used in [13] to compare three collaborative training techniques, i.e., federated learning, institutional incremental learning (IIL), and cyclic institutional incremental learning (CIIL). In IIL and CIIL, institutions train a shared model successively, where CIIL adds a cycling loop through the organizations. The results indicate that federated learning achieves Dice scores similar to those of models trained by sharing data, and it outperforms the IIL and CIIL methods, since these suffer from catastrophic forgetting and complexity.
Medical data is also safeguarded by encryption techniques such as homomorphic encryption. In [14], the authors propose an online secure multi-party computation scheme that shares patient information with hospitals using homomorphic encryption. Bocu et al. [15] proposed a homomorphic encryption model integrated with a personal health information system utilizing heart rate data. The results indicate that the described technique successfully addressed the requirements for secure data processing for 500 patients, with the expected storage and network challenges. In another study by Wang et al. [16], a data division scheme based on homomorphic encryption for wireless sensor networks was proposed. The results show that there is a trade-off between resources and data security. In [17], the applicability of homomorphic encryption is shown by measuring the vital signs of patients with a lightweight encryption scheme. Sensor data such as respiration and heart rate are encrypted using homomorphic encryption before being transmitted to the non-trusting third party, while encryption takes place only in a medical facility. The study in [18] developed an IoT-based architecture with homomorphic encryption to combat data loss and spoofing attacks for chronic disease monitoring. The results suggest that homomorphic encryption provides cost-effective and straightforward protection of sensitive health information. Blockchain technologies are also utilized in cooperation with homomorphic encryption for the security of medical data. The authors in [19] proposed a practical pandemic infection tracking tool using homomorphic encryption and blockchain technologies in intelligent transportation systems with automatic healthcare monitoring. In another study, Ali et al. [20] developed a searchable distributed medical database on a blockchain using homomorphic encryption. The increasing need to secure sensitive information leads to the use of various techniques in combination. In the scope of this study, a multi-party computation tool using federated learning with homomorphic encryption is developed and analyzed.

3. Preliminaries

3.1. Homomorphic Encryption

The definition of the homomorphic encryption (HE) scheme is given in [21] as follows:
Definition 1 (Homomorphic Encryption).
A family of schemes $\{E_k\}_{k \in \mathbb{Z}^+}$ is said to be homomorphic with respect to an operator $\circ$ if there exist decryption algorithms $\{D_k\}_{k \in \mathbb{Z}^+}$ such that, for any two ciphertexts $c_1, c_2 \in \mathcal{C}$, the following equality is satisfied:
$$D_k\big(E_k(m_1, r_1) \circ E_k(m_2, r_2)\big) = m_1 \circ m_2, \quad \forall\, m_1, m_2 \in \mathcal{M},$$
where $r_1, r_2 \in \mathcal{R}$ are the corresponding randomness.
A homomorphic encryption scheme is a pair of algorithms, Enc and Dec, with the following properties:
  • Enc takes as input a plaintext $m \in \mathbb{Z}_N$ and outputs a ciphertext c that encrypts m, i.e., $\mathrm{Dec}(c) = m$;
  • Dec takes as input a ciphertext c and outputs the plaintext m that c encrypts;
  • Enc and Dec are computationally efficient.
There are two types of homomorphic encryption: additively homomorphic and multiplicatively homomorphic.
Additively homomorphic encryption consists of a pair of algorithms Enc and Dec such that, for all $m_1, m_2 \in \mathbb{Z}_N$, $c_1 = \mathrm{Enc}(m_1)$, $c_2 = \mathrm{Enc}(m_2)$, and $c_3 = c_1 + c_2$, we have $\mathrm{Dec}(c_3) = m_1 + m_2$.
Multiplicatively homomorphic encryption consists of a pair of algorithms Enc and Dec such that, for all $m_1, m_2 \in \mathbb{Z}_N$, $c_1 = \mathrm{Enc}(m_1)$, $c_2 = \mathrm{Enc}(m_2)$, and $c_3 = c_1 \cdot c_2$, we have $\mathrm{Dec}(c_3) = m_1 \cdot m_2$.
Partially homomorphic encryption is a variant of homomorphic encryption where homomorphism is only partially supported, i.e., the encryption scheme is homomorphic for some operations while not homomorphic for others.
Somewhat homomorphic encryption is a restricted variant of fully homomorphic encryption in which homomorphism is only partially available: the scheme is homomorphic for all operations, but only for a limited number of them (i.e., up to a bounded circuit depth).
Fully homomorphic encryption (FHE) is a variant of homomorphic encryption that allows homomorphic evaluation of arbitrary functions, i.e., the encryption scheme is homomorphic for all operations. In other words, an FHE scheme consists of a pair of algorithms Enc and Dec such that, for all $m_1, m_2 \in \mathbb{Z}_N$, $c_1 = \mathrm{Enc}(m_1)$, $c_2 = \mathrm{Enc}(m_2)$, and $c_3 = c_1 \circ c_2$, we have $\mathrm{Dec}(c_3) = m_1 \circ m_2$ for any supported operation $\circ$ (addition and multiplication, composed arbitrarily).
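As a concrete (toy) illustration of these properties, textbook unpadded RSA is a partially homomorphic scheme with respect to multiplication: multiplying two ciphertexts yields an encryption of the product of the plaintexts. The following minimal Python sketch uses deliberately tiny, insecure parameters purely to make the property visible; it is not part of the proposed system.

```python
# Toy demonstration of multiplicative homomorphism with textbook (unpadded) RSA.
# Insecure parameters, for illustration only: Dec(c1 * c2 mod N) = m1 * m2 mod N.
p, q = 61, 53                     # tiny primes, never use in practice
N = p * q                         # RSA modulus
phi = (p - 1) * (q - 1)
e = 17                            # public exponent, coprime to phi
d = pow(e, -1, phi)               # private exponent (modular inverse, Python 3.8+)

def enc(m: int) -> int:
    return pow(m, e, N)           # Enc(m) = m^e mod N

def dec(c: int) -> int:
    return pow(c, d, N)           # Dec(c) = c^d mod N

m1, m2 = 7, 11
c3 = (enc(m1) * enc(m2)) % N      # operate on ciphertexts only
assert dec(c3) == (m1 * m2) % N   # decrypts to the product of the plaintexts
print(dec(c3))                    # 77
```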
Table 1 shows a summary of the major homomorphic encryption schemes.

3.2. Brakerski–Fan–Vercauteren (BFV) Scheme

Since the work of Brakerski, Fan, and Vercauteren, the BFV somewhat homomorphic encryption (SHE) scheme has become one of the most important research topics in cryptography. In this section, we give the definition of this scheme.
Definition 2 (BFV scheme).  An SHE scheme E is said to be in the BFV family of schemes if it consists of the following three algorithms:
  • Key generation algorithm: takes the security parameter k as input and outputs a public key $pk$ and a secret key $sk$.
  • Encryption algorithm: takes a message $m \in \mathcal{M}$, a public key $pk$, and randomness $r \in \mathcal{R}$ as inputs and outputs a ciphertext $c \in \mathcal{C}$.
  • Decryption algorithm: takes a ciphertext $c \in \mathcal{C}$, a secret key $sk$, and an integer $i \in \mathbb{Z}^+$ as inputs and outputs a message $m \in \mathcal{M}$.
Remark 1.
In the above definition, the integer i is called the decryption index. It is introduced to allow for efficient decryption of ciphertexts that are the result of homomorphic operations. For example, when the ciphertext $c_1$ is the result of homomorphic operations on ciphertexts $c_2$ and $c_3$, that is, $c_1 = c_2 \circ c_3$, then $c_1$ can be decrypted by taking the decryption index $i = 2$.
In the following, we give a brief description of the BFV scheme.
The key generation algorithm of the BFV scheme consists of the following two steps.
1. Let t be the security parameter. For a positive integer t, define a number $n = b(t)$ and a positive integer p, where $b : \mathbb{Z}^+ \to \mathbb{Z}^+$ is a polynomial and p is a prime number satisfying $p > 2n$.
2. Let d be a positive integer such that $d < p$. Choose a monic polynomial $f(x)$ of degree d with $f(x) \equiv x^{\tilde{a}} \pmod{p}$ for some $\tilde{a} \in \mathbb{Z}_p$. Let $T(x) = x^n f(x) \pmod{p}$. Choose a quadratic nonresidue b of $\mathbb{Z}_p$, and let $L(x) = T(x)\, b\, x^{n/2} \pmod{p}$.
Let $q = 2^n L(0)$. The secret key $sk$ is chosen to be a nonnegative integer s less than q. The public key $pk$ is chosen to be the sequence $(p, T(x), L(x), n, q)$.
The encryption algorithm of the BFV scheme consists of the following three steps.
1. Let $pk = (p, T(x), L(x), n, q)$ be the public key. Choose a random polynomial $R(x) \in \mathbb{Z}_p[x]$ of degree less than d.
2. Given a message $m \in \mathbb{Z}_p$, compute $u(x) = m + \tfrac{1}{2} R(x)^2 L(x)^{-1} \pmod{p}$.
3. Choose a random integer $\tilde{t} \in \mathbb{Z}_p$, and output the ciphertext $c = (\tilde{t}, u(x))$.
The decryption algorithm of the BFV scheme consists of the following two steps.
1. Let $sk = s$ be the secret key. Compute $v(x) = L(x)^{-1}\big(s + \tfrac{1}{2} T(x)^2 b^{-1} x^n\big) \pmod{p}$.
2. Given a ciphertext $c = (\tilde{t}, u(x))$, compute $m = u(x) - \tfrac{1}{2} v(x)^2 \pmod{p}$.
Remark 2.
In the BFV scheme, the message space is $\mathcal{M} = \mathbb{Z}_p$.

3.2.1. Homomorphic Operations

Additive Homomorphism

In the BFV scheme, the additive homomorphism is defined as follows:
Definition 3 (Additive homomorphism). 
Let $c_1 = (\tilde{t}_1, u_1(x))$ and $c_2 = (\tilde{t}_2, u_2(x))$ be two ciphertexts. The additive homomorphism is defined to be the ciphertext $c_1 + c_2 = (\tilde{t}_1 + \tilde{t}_2,\, u_1(x) + u_2(x))$.
Remark 3.
In the BFV scheme, the standard polynomial addition algorithm implements the additive homomorphism.

Multiplicative Homomorphism

In the BFV scheme, the multiplicative homomorphism is defined as follows:
Definition 4 (Multiplicative homomorphism). 
Let $c_1 = (\tilde{t}_1, u_1(x))$ be a ciphertext and $m \in \mathbb{Z}_p$ be a message. The multiplicative homomorphism is defined to be the ciphertext $c_1 \cdot m = (\tilde{t}_1 \cdot m,\, u_1(x) \cdot m)$.
Remark 4.
In the BFV scheme, the standard polynomial multiplication algorithm implements the multiplicative homomorphism.
Remark 5.
The multiplicative homomorphism is sometimes called the “plaintext multiplication” or the “scalar multiplication”.
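These two operations, ciphertext addition and homomorphic multiplication, are exactly what the aggregation protocol in Section 4 relies on, and they are exposed directly by the Pyfhel library used in our implementation (Section 6). The sketch below is a minimal illustration assuming a Pyfhel 2.x-style API (contextGen(p=...), encryptInt, decryptInt, and overloaded + and * on ciphertexts); parameter and method names differ in later Pyfhel releases, so treat it as a sketch rather than a definitive call sequence.

```python
# Minimal BFV sketch with Pyfhel (2.x-style API; names may differ in newer releases).
from Pyfhel import Pyfhel

HE = Pyfhel()
HE.contextGen(p=65537)        # BFV context; p is the plaintext modulus, library defaults otherwise
HE.keyGen()                   # generate the public/secret key pair

c1 = HE.encryptInt(7)         # encrypt two small integers
c2 = HE.encryptInt(11)

c_sum = c1 + c2               # homomorphic addition of ciphertexts
c_prod = c1 * c2              # homomorphic multiplication of ciphertexts

print(HE.decryptInt(c_sum))   # expected: 18
print(HE.decryptInt(c_prod))  # expected: 77
```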

3.2.2. Relinearization

Relinearization is a homomorphic operation used in the BFV scheme to reduce the number of ciphertexts generated by homomorphic operations. In the following, we give the definition of this operation.
Definition 5 (Relinearization). 
Let $c_1 = (\tilde{t}_1, u_1(x))$ and $c_2 = (\tilde{t}_2, u_2(x))$ be two ciphertexts. The relinearization homomorphism is defined to be the ciphertext $c_1 + c_2 T(x) = (\tilde{t}_1 + \tilde{t}_2 T(x),\, u_1(x) + u_2(x) T(x))$.
Remark 6.
In the BFV scheme, the relinearization homomorphism is implemented by the standard polynomial addition and multiplication algorithms.

3.2.3. Rotation

Rotation is a homomorphic operation used in the BFV scheme to implement the power operation efficiently. It can be used to implement a large class of homomorphic operations on encrypted data. In the following, we give the definition of this operation.
Definition 6 (Rotation). 
Let $c = (\tilde{t}, u(x))$ be a ciphertext. The rotation homomorphism is defined to be the ciphertext $c^r = (\tilde{t}, u(x)^r)$, where r is an integer.
Remark 7.
In the BFV scheme, the rotation homomorphism is implemented by the standard polynomial multiplication algorithm.
Remark 8.
The rotation is sometimes called the “power operation”.

3.3. Federated Learning

In this section, we briefly describe the federated learning (FL) framework. We refer to [22,23] for more details.
Definition 7 (FL model). Let N be a positive integer and X be a probability space. Let m be a positive integer such that $m < N$, and let $P = \{p_1, p_2, \ldots, p_m\}$ be a collection of random variables on X with $p_i \in L^1(X)$ for $i = 1, 2, \ldots, m$. The FL model consists of the following four algorithms:
  • Initialization algorithm: takes the security parameter k as input and outputs the global model $w_0 \in \mathbb{R}^n$, where n is the number of free parameters in $w_0$.
  • Local training algorithm: takes the global model $w_t \in \mathbb{R}^n$, a local dataset $D_i \in \mathcal{D}$, and a positive integer t as inputs and outputs a local model $w_{t+1}^i \in \mathbb{R}^n$.
  • Upload algorithm: takes the local model $w_t^i \in \mathbb{R}^n$ and a positive integer t as inputs and outputs a vector $v_t^i \in \mathbb{R}^n$.
  • Aggregation algorithm: takes a set of vectors $v_t^i \in \mathbb{R}^n$ and a positive integer t as inputs and outputs the global model $w_{t+1} \in \mathbb{R}^n$.
In the above definition, the integer t is called the training round, and the global model $w_t$ is a function of t. The global model is trained from the local models $w_t^i$, which are themselves trained on the local datasets $D_i$; in effect, $w_t$ is trained on the aggregated dataset $\bigcup_{i=1}^{m} D_i$. The global model is initialized to $w_0$.
Remark 9.
In the FL model, the local training algorithm, upload algorithm, and aggregation algorithm can be implemented by any machine learning algorithm.
Remark 10.
The global model $w_t$ can be trained on the aggregated dataset $\bigcup_{i=1}^{m} D_i$ using any machine learning algorithm.
Remark 11.
In the FL model, the global model $w_t$ is shared among all participating clients, and the local models $w_t^i$ are not shared among the clients.
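As a plain-domain point of reference for the encrypted protocol of Section 4, the sketch below shows FedAvg-style aggregation over per-client weight vectors with NumPy. The local_training helper and the random perturbation inside it are illustrative stand-ins for each client's real training step, not part of the proposed method.

```python
# Plain-domain federated averaging sketch (no encryption), for reference only.
import numpy as np

def local_training(global_w: np.ndarray, local_data) -> np.ndarray:
    """Stand-in for a client's local training (e.g., a few epochs of SGD)."""
    return global_w + 0.01 * np.random.randn(*global_w.shape)

n_params, n_clients, n_rounds = 10, 3, 5
w_global = np.zeros(n_params)                      # w_0: initial global model

for t in range(n_rounds):
    local_models = [local_training(w_global, None) for _ in range(n_clients)]
    w_global = np.mean(local_models, axis=0)       # aggregation: w_{t+1} = (1/m) * sum_i w_{t+1}^i

print(w_global)
```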

4. System Model

This section gives a high-level system overview of the proposed BFV crypto-scheme-based privacy-preserving federated learning COVID-19 detection training method. The proposed privacy-preserving scheme is a two-phase approach: (1) local model training at each client and (2) encrypted model weight aggregation at the server. In the local model training phase, each client builds their local CNN-based DL model using their local electronic health record dataset. The clients encrypt the model weights matrix using the public key. In the second step, the server aggregates all clients’ encrypted weight matrices and sends the final matrix to the clients. Each client decrypts the aggregated encrypted weight matrix to update the model weights of their DL model. Figure 1 shows the system overview.
Figure 2 shows the CNN-based COVID-19 detection model used in the experiments.

4.1. Notations

  • Boldface lowercase letters denote vectors (e.g., x);
  • ⟦W⟧ denotes the ciphertext of a matrix W;
  • ⊕ denotes homomorphic encryption-based addition, and ⊗ denotes homomorphic encryption-based multiplication;
  • (key_pub, key_priv) denotes a public/private key pair.

4.2. Client Initialization

Algorithm 1 shows the overall process in the initialization phase. Each client trains the local classifier, h_i, with their private dataset D_i. The trained model’s weight matrix, W, is encrypted as ⟦W⟧ and shared with the server.
Algorithm 1 Model training in each client
Require: the dataset at client c: D_c = {(x_i, y_i) : x_i ∈ ℝ^m, y_i ∈ ℝ, i = 0, …, m}; public key: key_pub
1: X_train, X_test, y_train, y_test ← train_test_split(D_c)
2: h ← global_model
3: h.fit(X_train, y_train)
4: ⟦W⟧ ← ∅  // Create an empty matrix for the encrypted layer weights
5: for each layer ∈ h do
6:     ⟦W⟧ ← encrypt_fractional(layer.weights, key_pub)  // Encrypt the layer weights (layer.weights ∈ ℝ^m) with the public key
7: end for
8: Return ⟦W⟧  // The encrypted weight matrix
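A possible realization of Algorithm 1 in Python is sketched below, pairing Keras for the local model with Pyfhel's fractional encoder for the weight encryption. The tiny Dense stand-in model, the element-wise encrypt_fractional helper, and the 2.x-style contextGen parameters are illustrative assumptions, not the authors' exact code.

```python
# Sketch of Algorithm 1: train a local model, then encrypt its weights layer by layer.
import numpy as np
import tensorflow as tf
from Pyfhel import Pyfhel

HE = Pyfhel()
HE.contextGen(p=65537)          # BFV context with default parameters (2.x-style API)
HE.keyGen()                     # in the protocol, key_pub is shared with every client

# Stand-in for the locally trained CNN: a tiny dense model keeps the sketch short.
h = tf.keras.Sequential([tf.keras.layers.Dense(4, input_shape=(8,)),
                         tf.keras.layers.Dense(1)])
# h.fit(X_train, y_train, ...)  # local training on the client's private dataset, omitted

def encrypt_fractional(arr: np.ndarray, HE: Pyfhel) -> list:
    """Encrypt one weight array element-wise with Pyfhel's fractional encoder."""
    return [HE.encryptFrac(float(v)) for v in arr.ravel()]

# ⟦W⟧: encrypted weights, layer by layer, uploaded to the server.
encrypted_model = [[encrypt_fractional(w, HE) for w in layer.get_weights()]
                   for layer in h.layers]
```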

4.3. Model Aggregation

The server collects all encrypted weight matrices, {⟦W_0⟧, …, ⟦W_c⟧}, from the clients. It calculates the average weight value of each neuron in the encrypted domain. Algorithm 2 shows the overall process in the aggregation phase.
Algorithm 2 Model aggregation at the server
Require: public key: key_pub; the number of clients: c; client model weights: H = {⟦W_i⟧}, i = 0, …, c
1: ⟦W_aggr⟧ ← ∅
2: for each h ∈ H do
3:     for each row ∈ h do
4:         ⟦W_aggr⟧ ← ⟦W_aggr⟧ ⊕ row  // Homomorphic addition
5:     end for
6: end for
7: for each row ∈ ⟦W_aggr⟧ do
8:     row ← row ⊗ c⁻¹  // Homomorphic multiplication
9: end for
10: Return ⟦W_aggr⟧  // Return the aggregated weight matrix in the encrypted domain
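The correctness of Algorithm 2 follows directly from the additive and plaintext-multiplicative homomorphisms of Section 3.2.1. A short sketch, assuming the weights are fractionally encoded so that multiplication by the constant $c^{-1}$ is well defined, is

$$
\mathrm{Dec}\!\left(\Big(\bigoplus_{i=1}^{c} [\![W_i]\!]\Big) \otimes c^{-1}\right)
= \Big(\sum_{i=1}^{c} W_i\Big)\, c^{-1}
= \frac{1}{c}\sum_{i=1}^{c} W_i = W_{aggr},
$$

which is exactly the plain-domain federated average; the server only ever handles ciphertexts and never sees an individual client's weights or the aggregate in the clear.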

4.4. Client Decryption

The last step is client decryption, in which each client decrypts the aggregated and encrypted weight matrix, ⟦W_aggr⟧, and updates their local model, h. Algorithm 3 shows the overall process in the client decryption phase.
Algorithm 3 Client decryption
Require: private key: key_priv; encrypted aggregated weights: ⟦W_aggr⟧
1: h ← global_model
2: for each layer ∈ h do
3:     row ← ⟦W_aggr⟧(layer)  // Get the corresponding row for the layer
4:     layer ← decrypt_fractional(row, key_priv)  // Decrypt the row and update the layer weights
5: end for
6: h.save_model(global_model)  // Save the aggregated model as global_model at the client

5. Results

5.1. Dataset

In this work, a COVID-19 radiography dataset collected in previous works related to COVID-19 detection models [24,25] was used. The dataset contains lung X-ray images with four classes: COVID, Lung_OPACITY, Normal, and Viral Pneumonia. In this work, we utilized two of the classes, COVID and Normal, focusing only on the COVID-19 detection task.
Samples of the dataset are depicted in Figure 3 and Figure 4.
From the original dataset, we took the first 1000 records from each class: 80% of the samples (800 records per class) were used as the training set and the remaining 20% (200 records per class) as the test set. The training set was further split, with 10% held out as a train-validation set.

5.2. Preprocessing

Data preprocessing in this work consists of data augmentation and rescaling for the training dataset, while only rescaling was applied to the test dataset. Data augmentation was necessary in order to provide data variety in the training dataset. The dataset was rescaled by multiplying each pixel value by 1/255. This transformation maps the pixel value range from [0, 255] to [0, 1] so that all pixels are treated on the same scale.
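A typical way to realize this preprocessing in Keras is sketched below; the specific augmentation settings (rotation, zoom, flips), the directory paths, and the 256 × 256 target size matching the CNN input in Table 2 are illustrative choices rather than the exact values used in our experiments.

```python
# Illustrative Keras preprocessing: augmentation + 1/255 rescaling for training,
# rescaling only for the test set. Augmentation values and paths are examples.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(
    rescale=1.0 / 255,          # map pixel values from [0, 255] to [0, 1]
    rotation_range=10,          # example augmentation settings
    zoom_range=0.1,
    horizontal_flip=True,
)
test_datagen = ImageDataGenerator(rescale=1.0 / 255)   # no augmentation for the test set

train_gen = train_datagen.flow_from_directory(
    "data/train", target_size=(256, 256), class_mode="binary")
test_gen = test_datagen.flow_from_directory(
    "data/test", target_size=(256, 256), class_mode="binary")
```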

6. Implementation

6.1. Experimental Setup

Table 2 shows the CNN model used to predict COVID-19 detection.
The implementation was developed with Python 3.8.8, using existing libraries. Standard libraries were used: Keras and TensorFlow for the machine learning processes; NumPy for processing weight arrays and data structures; pickle for serializing the exported weights; and, most importantly, Pyfhel for homomorphic encryption. Pyfhel [26] is a Python wrapper for Microsoft SEAL and provides the same functionality as the Microsoft Simple Encrypted Arithmetic Library (SEAL).
Microsoft SEAL is a homomorphic encryption library developed by Microsoft and first released in 2015. The SEAL library implements both the Brakerski–Fan–Vercauteren (BFV) [27,28] and Cheon–Kim–Kim–Song (CKKS) [29] homomorphic encryption schemes and provides standard SHE functions, including encoding, key generation, encryption, decryption, addition, multiplication, and relinearization.
In the Pyfhel implementation, we used the BFV scheme with the pre-defined default values for the HE context parameters, with the exception of the parameter sec. The parameter sec determines the bit-wise security level; at the time of writing, two values are possible, 128 and 192. We experimented with both values and observed the overall model performance in terms of time and accuracy. For a given polynomial degree n, the parameter sec determines the bit length of the coefficient modulus q, as described in Table 3 [30].
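For reference, creating the HE context with an explicit security level might look as follows in a Pyfhel 2.x-style API; the parameter names (p, m, sec) changed in later Pyfhel releases, and the concrete values shown are illustrative, with the library deriving the coefficient modulus q from n and sec as in Table 3.

```python
# Sketch of the HE context used in the experiments (2.x-style parameter names).
from Pyfhel import Pyfhel

HE = Pyfhel()
HE.contextGen(p=65537, m=4096, sec=128)   # plaintext modulus p, polynomial degree m (= n), security level sec
HE.keyGen()

ctxt = HE.encryptFrac(0.1234)             # fractional encoding of a model weight
print(HE.decryptFrac(ctxt))               # approximately 0.1234
```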
We implemented our proposed protocols and the classifier training phase in Python, using the Keras/TensorFlow libraries for model building and the Microsoft SEAL library (via Pyfhel) for the somewhat homomorphic encryption implementation. To show the training-phase time performance of the proposed protocols, we tested the public COVID-19 X-ray scan dataset with different numbers of clients and security levels, sec ∈ {128, 192}, which (through the resulting coefficient modulus q) determine how much noise can accumulate before decryption fails. Table 4 shows the dataset details.
The dataset is arbitrarily partitioned among the clients (c ∈ {2, 3, 5, 7}), and the prediction performance in the encrypted domain is then compared with that in the plain domain.

6.2. Experimental Results

We first experimented with the COVID-19 detection model without encryption or federated learning; in this case, there is only a single client. Table 5 shows the model performance scores; the running time was 599.169577 s.
We then applied federated learning to the model and observed the model performance based on the evaluation metrics. Table 6 shows the performance results of federated learning without encryption.
We then continued our observations by adjusting the parameter sec between the values 128 and 192.
Table 7 shows the performance measurements of federated learning with encryption at security level sec = 128.
Table 8 shows the performance measurements of federated learning with encryption at security level sec = 192.
Apart from measuring the model performance with the evaluation metrics, we also measured the total processing time, from model training at the clients to the end of the federated learning process and prediction using the aggregated model. Table 9 shows the processing time in federated learning with various numbers of clients.
Figure 5 shows histograms of the evaluation metrics (accuracy, precision, recall, and F1 score) that visualize the influence of encryption and encryption key length on model performance. From these histograms, we see that homomorphic encryption and its key length variations do not have much influence on model performance.
We also used a histogram to visualize the growth in running time with homomorphic encryption and its key length. Figure 6 shows that the running time increased significantly when homomorphic encryption was applied, and that it was further influenced by the encryption key length: the longer the key, the longer the execution time.
A final observation was that the pickle file of the encrypted weights was around 7 GB in size, regardless of the execution time.

7. Conclusions

Privacy preservation has become an essential practice for healthcare institutions, as regulations such as the GDPR and HIPAA mandate it. Federated learning and homomorphic encryption will play a critical role in maintaining data security during model training. By combining both techniques, the proposed model achieves competitive performance, although there is a significant trade-off in execution time that grows with the number of clients. In settings where privacy preservation is crucial, for instance in healthcare, this trade-off is acceptable.
The classification metrics, i.e., accuracy, F1, precision, and recall, reached over 80% using both encrypted and plain data for each federated learning case, which means that homomorphic encryption, in this case, SHE, does not deteriorate model performance.
Privacy attacks can cause immense damage to the security and privacy of patient information and thus hinder the advancement of healthcare using data-driven models. Therefore, it is indispensable to take crucial steps to strengthen the safety of the information and the way data is processed. This study demonstrated that federated learning with homomorphic encryption can successfully enhance data-driven models by eliminating or minimizing the sharing of sensitive data. It is envisioned that this study could be helpful for scientists and researchers working on sensitive healthcare data in multi-party computation settings.

Author Contributions

Conceptualization, F.W. and F.O.C.; methodology, F.W.; software, F.W.; validation, F.W. and F.O.C.; formal analysis, S.S. and M.K.; writing—original draft preparation, F.W. and F.O.C.; writing—review and editing, F.W., F.O.C., S.S. and M.K.; visualization, F.W.; supervision, F.O.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The dataset used in this simulation is the publicly available COVID-19 dataset at https://github.com/ieee8023/covid-chestxray-dataset, accessed on 1 April 2022.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Kairouz, P.; McMahan, H.B.; Avent, B.; Bellet, A.; Bennis, M.; Bhagoji, A.N.; Bonawitz, K.; Charles, Z.; Cormode, G.; Cummings, R.; et al. Advances and open problems in federated learning. Found. Trends Mach. Learn. 2021, 14, 1–210.
  2. Şahinbaş, K.; Ozgur Catak, F. Secure Multi-Party Computation based Privacy Preserving Data Analysis in Healthcare IoT Systems. arXiv 2021, arXiv:2109.14334.
  3. Abouelmehdi, K.; Beni-Hssane, A.; Khaloufi, H.; Saadi, M. Big data security and privacy in healthcare: A Review. Procedia Comput. Sci. 2017, 113, 73–80.
  4. Catak, F.O.; Aydin, I.; Elezaj, O.; Yildirim-Yayilgan, S. Practical Implementation of Privacy Preserving Clustering Methods Using a Partially Homomorphic Encryption Algorithm. Electronics 2020, 9, 229.
  5. Özgür Çatak, F.; Mustacoglu, A.F. CPP-ELM: Cryptographically Privacy-Preserving Extreme Learning Machine for Cloud Systems. Int. J. Comput. Intell. Syst. 2018, 11, 33–44.
  6. Alloghani, M.; Alani, M.M.; Al-Jumeily, D.; Baker, T.; Mustafina, J.; Hussain, A.; Aljaaf, A.J. A systematic review on the status and progress of homomorphic encryption technologies. J. Inf. Secur. Appl. 2019, 48, 102362.
  7. Molina-Carballo, A.; Palacios-López, R.; Jerez-Calero, A.; Augustín-Morales, M.C.; Agil, A.; Muñoz-Hoyos, A.; Muñoz-Gallego, A. Protective Effect of Melatonin Administration against SARS-CoV-2 Infection: A Systematic Review. Curr. Issues Mol. Biol. 2022, 44, 31–45.
  8. Checa-Ros, A.; Muñoz-Hoyos, A.; Molina-Carballo, A.; Muñoz-Gallego, A.; Narbona-Galdó, S.; Jerez-Calero, A.; del Carmen Augustín-Morales, M. Analysis of Different Melatonin Secretion Patterns in Children With Sleep Disorders: Melatonin Secretion Patterns in Children. J. Child Neurol. 2017, 32, 1000–1008.
  9. Xu, J.; Glicksberg, B.S.; Su, C.; Walker, P.; Bian, J.; Wang, F. Federated learning for healthcare informatics. J. Healthc. Inform. Res. 2021, 5, 1–19.
  10. Rieke, N.; Hancox, J.; Li, W.; Milletari, F.; Roth, H.R.; Albarqouni, S.; Bakas, S.; Galtier, M.N.; Landman, B.A.; Maier-Hein, K.; et al. The future of digital health with federated learning. NPJ Digit. Med. 2020, 3, 1–7.
  11. Antunes, R.S.; da Costa, C.A.; Küderle, A.; Yari, I.A.; Eskofier, B. Federated Learning for Healthcare: Systematic Review and Architecture Proposal. ACM Trans. Intell. Syst. Technol. (TIST) 2022, 13, 1–23.
  12. Li, W.; Milletarì, F.; Xu, D.; Rieke, N.; Hancox, J.; Zhu, W.; Baust, M.; Cheng, Y.; Ourselin, S.; Cardoso, M.J.; et al. Privacy-preserving federated brain tumour segmentation. In Proceedings of the International Workshop on Machine Learning in Medical Imaging; Springer: Berlin/Heidelberg, Germany, 2019; pp. 133–141.
  13. Sheller, M.J.; Reina, G.A.; Edwards, B.; Martin, J.; Bakas, S. Multi-institutional deep learning modeling without sharing patient data: A feasibility study on brain tumor segmentation. In Proceedings of the International MICCAI Brainlesion Workshop; Springer: Berlin/Heidelberg, Germany, 2018; pp. 92–104.
  14. Kumar, A.V.; Sujith, M.S.; Sai, K.T.; Rajesh, G.; Yashwanth, D.J.S. Secure Multiparty computation enabled E-Healthcare system with Homomorphic encryption. In Proceedings of the IOP Conference Series: Materials Science and Engineering; IOP Publishing: Bristol, UK, 2020; Volume 981, p. 022079.
  15. Bocu, R.; Costache, C. A homomorphic encryption-based system for securely managing personal health metrics data. IBM J. Res. Dev. 2018, 62, 1:1–1:10.
  16. Wang, X.; Zhang, Z. Data division scheme based on homomorphic encryption in WSNs for health care. J. Med. Syst. 2015, 39, 1–7.
  17. Kara, M.; Laouid, A.; Yagoub, M.A.; Euler, R.; Medileh, S.; Hammoudeh, M.; Eleyan, A.; Bounceur, A. A fully homomorphic encryption based on magic number fragmentation and El-Gamal encryption: Smart healthcare use case. Expert Syst. 2022, 39, e12767.
  18. Talpur, M.S.H.; Bhuiyan, M.Z.A.; Wang, G. Shared–node IoT network architecture with ubiquitous homomorphic encryption for healthcare monitoring. Int. J. Embed. Syst. 2015, 7, 43–54.
  19. Tan, H.; Kim, P.; Chung, I. Practical homomorphic authentication in cloud-assisted vanets with blockchain-based healthcare monitoring for pandemic control. Electronics 2020, 9, 1683.
  20. Ali, A.; Pasha, M.F.; Ali, J.; Fang, O.H.; Masud, M.; Jurcut, A.D.; Alzain, M.A. Deep Learning Based Homomorphic Secure Search-Able Encryption for Keyword Search in Blockchain Healthcare System: A Novel Approach to Cryptography. Sensors 2022, 22, 528.
  21. Gentry, C. Fully Homomorphic Encryption Using Ideal Lattices. In Proceedings of the Forty-First Annual ACM Symposium on Theory of Computing, Bethesda, MD, USA, 31 May–2 June 2009; pp. 169–178.
  22. Brendan McMahan, H.; Moore, E.; Ramage, D.; Hampson, S.; Agüera y Arcas, B. Communication-Efficient Learning of Deep Networks from Decentralized Data. arXiv 2016, arXiv:1602.05629.
  23. Konečný, J.; Brendan McMahan, H.; Yu, F.X.; Richtárik, P.; Theertha Suresh, A.; Bacon, D. Federated Learning: Strategies for Improving Communication Efficiency. arXiv 2016, arXiv:1610.05492.
  24. Chowdhury, M.E.H.; Rahman, T.; Khandakar, A.; Mazhar, R.; Kadir, M.A.; Mahbub, Z.B.; Islam, K.R.; Khan, M.S.; Iqbal, A.; Emadi, N.A.; et al. Can AI Help in Screening Viral and COVID-19 Pneumonia? IEEE Access 2020, 8, 132665–132676.
  25. Rahman, T.; Khandakar, A.; Qiblawey, Y.; Tahir, A.; Kiranyaz, S.; Abul Kashem, S.B.; Islam, M.T.; Al Maadeed, S.; Zughaier, S.M.; Khan, M.S.; et al. Exploring the effect of image enhancement techniques on COVID-19 detection using chest X-ray images. Comput. Biol. Med. 2021, 132, 104319.
  26. Ibarrondo, A.; Viand, A. Pyfhel: Python for homomorphic encryption libraries. In Proceedings of the 9th Workshop on Encrypted Computing & Applied Homomorphic Cryptography, Seoul, Korea, 15 November 2021.
  27. Brakerski, Z. Fully Homomorphic Encryption without Modulus Switching from Classical GapSVP. In Advances in Cryptology—CRYPTO 2012; Safavi-Naini, R., Canetti, R., Eds.; Springer: Berlin/Heidelberg, Germany, 2012; pp. 868–886.
  28. Fan, J.; Vercauteren, F. Somewhat Practical Fully Homomorphic Encryption. Cryptology ePrint Archive, Report 2012/144. 2012. Available online: https://ia.cr/2012/144 (accessed on 15 March 2022).
  29. Cheon, J.H.; Kim, A.; Kim, M.; Song, Y. Homomorphic Encryption for Arithmetic of Approximate Numbers. In Advances in Cryptology—ASIACRYPT 2017; Takagi, T., Peyrin, T., Eds.; Springer International Publishing: Cham, Switzerland, 2017; pp. 409–437.
  30. Laine, K. Simple Encrypted Arithmetic Library 2.3.1. Available online: https://www.microsoft.com/en-us/research/uploads/prod/2017/11/sealmanual-2-3-1.pdf (accessed on 15 March 2022).
Figure 1. Overall system overview of the proposed method.
Figure 2. CNN-based COVID-19 detection model.
Figure 3. COVID-19 positive X-ray image dataset samples.
Figure 4. COVID-19 negative X-ray image dataset samples.
Figure 5. Prediction performance score with different metrics. (a) Precision. (b) Recall. (c) F1. (d) Accuracy.
Figure 6. Running time in seconds.
Table 1. Major homomorphic encryption schemes.

Scheme | Key-Size | Additive/Multiplicative | Partially/Somewhat/Fully
Paillier | 2048 bits | Additive | Partially
ElGamal | 1024 bits | Additive | Partially
BFV | 2048 bits | Additive, Multiplicative | Somewhat
CKKS | 2048 bits | Multiplicative | Somewhat
FV | 2048 bits | Multiplicative | Somewhat
Table 2. Model summary.

Layer (Type) | Output Shape | No. of Parameters
conv2d (Conv2D) | (None, 254, 254, 32) | 896
max_pooling2d (MaxPooling2D) | (None, 127, 127, 32) | 0
conv2d_1 (Conv2D) | (None, 125, 125, 32) | 9248
max_pooling2d_1 (MaxPooling2D) | (None, 62, 62, 32) | 0
conv2d_2 (Conv2D) | (None, 60, 60, 32) | 9248
max_pooling2d_2 (MaxPooling2D) | (None, 30, 30, 32) | 0
conv2d_3 (Conv2D) | (None, 28, 28, 64) | 18,496
max_pooling2d_3 (MaxPooling2D) | (None, 14, 14, 64) | 0
conv2d_4 (Conv2D) | (None, 12, 12, 64) | 36,928
max_pooling2d_4 (MaxPooling2D) | (None, 6, 6, 64) | 0
conv2d_5 (Conv2D) | (None, 4, 4, 128) | 73,856
max_pooling2d_5 (MaxPooling2D) | (None, 2, 2, 128) | 0
flatten (Flatten) | (None, 512) | 0
dense (Dense) | (None, 128) | 65,664
dense_1 (Dense) | (None, 64) | 8256
dense_2 (Dense) | (None, 2) | 130
Total params: 222,722. Trainable params: 222,722. Non-trainable params: 0.
Table 3. Default pairs (n, q) for 128-bit and 192-bit security levels.

n | Bit-length of default q (128-bit security) | Bit-length of default q (192-bit security)
1024 | 27 | 19
2048 | 54 | 37
4096 | 109 | 75
8192 | 218 | 152
16,384 | 438 | 300
32,768 | 881 | 600
Table 4. Dataset description.

Dataset | Rows | Label
Training | 800 | Negative
Training | 800 | Positive
Test | 200 | Negative
Test | 200 | Positive
Table 5. Prediction result without federated learning.

Precision | Recall | F1 Score | Accuracy
0.868924 | 0.840000 | 0.836801 | 0.840000
Table 6. Performance measurements: federated learning without encryption.

Number of Clients | 2 | 3 | 5 | 7
Precision | 0.872128 | 0.865112 | 0.859288 | 0.850277
Recall | 0.845000 | 0.837500 | 0.835000 | 0.827500
F1 Score | 0.842123 | 0.834369 | 0.832164 | 0.824649
Accuracy | 0.845000 | 0.837500 | 0.835000 | 0.827500
Table 7. Performance measurements: federated learning with encryption, sec = 128.

Number of Clients | 2 | 3 | 5 | 7
Precision | 0.867337 | 0.857293 | 0.853925 | 0.869584
Recall | 0.837500 | 0.840000 | 0.830000 | 0.852500
F1 Score | 0.834132 | 0.838040 | 0.827078 | 0.850776
Accuracy | 0.837500 | 0.840000 | 0.830000 | 0.852500
Table 8. Performance measurements: federated learning with encryption, sec = 192.

Number of Clients | 2 | 3 | 5 | 7
Precision | 0.866735 | 0.868924 | 0.855624 | 0.86800
Recall | 0.840000 | 0.840000 | 0.832500 | 0.84500
F1 Score | 0.837030 | 0.836801 | 0.829732 | 0.84254
Accuracy | 0.840000 | 0.840000 | 0.832500 | 0.84500
Table 9. Running time in seconds.

Number of Clients | Without Encryption | Encryption (sec = 128) | Encryption (sec = 192)
2 | 594.165448 | 4333.672333 | 4765.874634
3 | 647.963712 | 5124.841524 | 7504.239611
5 | 720.175786 | 6777.249099 | 10,518.012003
7 | 948.704833 | 9223.281346 | 13,277.904182
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Wibawa, F.; Catak, F.O.; Sarp, S.; Kuzlu, M. BFV-Based Homomorphic Encryption for Privacy-Preserving CNN Models. Cryptography 2022, 6, 34. https://doi.org/10.3390/cryptography6030034

AMA Style

Wibawa F, Catak FO, Sarp S, Kuzlu M. BFV-Based Homomorphic Encryption for Privacy-Preserving CNN Models. Cryptography. 2022; 6(3):34. https://doi.org/10.3390/cryptography6030034

Chicago/Turabian Style

Wibawa, Febrianti, Ferhat Ozgur Catak, Salih Sarp, and Murat Kuzlu. 2022. "BFV-Based Homomorphic Encryption for Privacy-Preserving CNN Models" Cryptography 6, no. 3: 34. https://doi.org/10.3390/cryptography6030034
