Article

Vertically Federated Learning with Correlated Differential Privacy

1 Software College, Northeastern University, Shenyang 110169, China
2 Huawei Technologies Co., Ltd., Huawei Industrial Base, Bantian, Shenzhen 518129, China
3 Faculty of Computer Science, Dalhousie University, Halifax, NS B3H 4R2, Canada
* Author to whom correspondence should be addressed.
Electronics 2022, 11(23), 3958; https://doi.org/10.3390/electronics11233958
Submission received: 10 November 2022 / Revised: 26 November 2022 / Accepted: 28 November 2022 / Published: 29 November 2022
(This article belongs to the Special Issue Artificial Intelligence Based on Data Mining)

Abstract
Federated learning (FL) aims to address the challenges of data silos and privacy protection in artificial intelligence. Vertically federated learning (VFL) with independent feature spaces and overlapping ID spaces can capture more knowledge and facilitate model learning. However, VFL has both privacy and utility problems in framework construction. On the one hand, sharing gradients may cause privacy leakage. On the other hand, the increase in participants brings a surge in the feature dimension of the global model, which results in higher computation costs and lower model accuracy. To address these issues, we propose a vertically federated learning algorithm with correlated differential privacy (CRDP-FL) to meet FL systems’ privacy and utility requirements. A privacy-preserved VFL framework is designed based on differential privacy (DP) between organizations with many network edge devices. Meanwhile, feature selection is performed to improve the algorithm’s efficiency and model performance to solve the problem of dimensionality explosion. We also propose a quantitative correlation analysis technique for VFL to reduce the correlated sensitivity and noise injection, balancing the utility decline due to DP protection. We theoretically analyze the privacy level and utility of CRDP-FL. A real vertically federated learning scenario is simulated with personalized settings based on the ISOLET and Breast Cancer datasets to verify the method’s effectiveness in model accuracy, privacy budget, and data correlation.

1. Introduction

Machine learning (ML) has developed rapidly in artificial intelligence (AI) applications, such as computer vision, automatic speech recognition, natural language processing, and recommendation systems. Big data plays a particularly prominent role in machine learning, especially deep learning [1,2,3,4,5]. In this context, various countries and institutions are gradually becoming aware of the importance of data ownership and security. To ensure the safe management and use of data, China promulgated its Network Security Law in 2017. The European Union and the United States passed the General Data Protection Regulation, the California Consumer Privacy Act, and other laws to strictly control data collection and transactions [6,7]. In this new legislative environment, it has become increasingly difficult for different organizations to collect and share data. Due to competition, user privacy, data security, complex management procedures, and other reasons, data islands are formed. At present, data islands and privacy protection have become two significant challenges for the development of data-driven artificial intelligence [8,9,10].
Federated learning (FL) is a framework that can effectively help multiple organizations perform data usage and machine learning modeling while meeting the requirements of user privacy protection, data security, and government regulations [11]. Different organizations and institutions span the fields of health care, banking, finance, and retail and face legal, administrative, or moral constraints that require keeping data locally, forming independent data silos. At the same time, due to the popularity of Internet of Things technology and wearable edge devices, the datasets of different organizations may be obtained from different network edge devices. Because the ID spaces of different data silos may coincide while the feature spaces are independent, vertically federated learning (VFL) can acquire more modeling knowledge and thus promote machine learning. This vertically federated learning scenario has attracted the attention of academia and industry [12,13,14]. However, an increase in the number of vertically federated learning participants leads to a sharp increase in the global model's dimensionality. The vast feature space leads to higher computational complexity and lower model accuracy. In traditional machine learning, feature selection is an essential means of improving model accuracy, but existing vertically federated learning algorithms lack research on feature selection for the global model.
Moreover, research shows that parameter transmission may cause privacy disclosure. The clients' private data face single or collusion attacks from server-side attackers and other malicious clients during federated learning [15]. In the federated framework, each client holds local data for training and shares only local model parameters, which protects privacy to a certain extent. However, it has been shown that sharing original local updates damages users' privacy [11,16]. Much research addresses privacy issues in federated learning [8,17,18,19,20]. Among these approaches, differential privacy (DP) has attracted wide attention because of its solid theoretical guarantee and small communication cost. It provides a privacy guarantee for ML by adding a small amount of noise to the model parameters. Traditional differential privacy techniques rely on a trusted analyzer to collect local updates, but a trusted third party may not exist. Because it does not rely on a trusted analyzer, local differential privacy (LDP) has also received wide attention [21]. However, research shows that correlated data increase the risk of privacy disclosure. Correlated data in different datasets can be divided into direct and indirect correlations. A direct correlation is defined as two identical records. Compared with direct correlation, indirect correlation is more complex, and the degree of correlation between records must be considered. To address this problem, correlated differential privacy (CRDP) has been widely studied [22,23]. Correlation analysis technologies such as group DP [24], correlated DP [25], and Bayesian DP [26] improve the effectiveness of the algorithm at the same privacy level by reducing sensitivity. However, existing correlated differential privacy technologies target only centralized training scenarios, and more research is needed on the impact of correlated data in differentially private federated learning. In particular, in the vertically federated learning scenario, personalized research on correlated differential privacy, such as selecting the optimal feature set of the global model to balance the impact of data correlation while ensuring accuracy, is noteworthy.
To solve the above problems, we propose a correlated differentially private federated learning (CRDP-FL) algorithm based on the vertically federated learning scenario, considering the utility requirements of privacy-preserving federated learning. The vertically federated learning framework is designed for the ubiquitous scenario with many network edge devices, and vertically federated learning is performed across organizations. That is, the data of different network edge devices are uploaded to their organization, and the organizations jointly train a global federated model while ensuring privacy. Focusing on the privacy disclosure caused by parameter transmission and untrusted analyzers, we introduce differential privacy technology into the vertically federated learning framework to protect privacy. We first design the CRDP-FL framework, which contains two layers. The top layer is a decentralized, differentially private vertically federated learning framework with N data centers (organizations), each having at least one client (edge device) in the bottom layer. Laplace noise is added to local parameters before communication to meet the differential privacy protection requirements. The data organizations form a complete graph structure, which provides bilateral communication. Furthermore, they are decentralized, without a trusted central server in the top layer, and do not trust each other. Therefore, CRDP-FL further satisfies local differential privacy. The bottom layer of CRDP-FL performs client-side data aggregation for VFL between data organizations and injects DP noise for privacy protection. There is no communication between the bottom-layer clients. A correlated differential privacy utility optimization strategy is proposed to optimize the utility of CRDP-FL, including privacy-preserved feature selection and privacy-preserved feature optimization based on correlation analysis for VFL. Aiming at the global model dimensionality problem mentioned above, we perform feature selection to improve the global model's efficiency and performance, introducing DP in the process to protect privacy. The strategy provides a quantitative analysis of the relationship between data correlation and dimensionality, reduces the correlated sensitivity, and effectively balances the utility decline caused by privacy protection. For privacy protection in feature optimization, the DP exponential mechanism is used to select the feature set with the minimum correlated sensitivity, i.e., the highest utility score.
Therefore, the proposed CRDP-FL algorithm can effectively improve the model’s performance in protecting privacy. The contributions of our work are as follows:
  • We propose a vertically federated learning algorithm with correlated differential privacy. A general vertically federated learning framework is constructed, collecting data from the clients and jointly training the global model between organizations. Furthermore, by injecting DP noise into the VFL framework, CRDP-FL achieves local differential privacy and provides a strong privacy guarantee for VFL.
  • To balance the utility and privacy of CRDP-FL, a utility optimization strategy is proposed, including feature selection and feature optimization for vertically federated learning, thereby improving the algorithm's efficiency and the global model's performance. A quantitative correlation analysis method and correlated sensitivity in VFL (CS-VFL) are proposed to reduce the additional noise injected by the DP operation due to data correlation.
  • We analyze and verify the algorithm theoretically and experimentally. The privacy level and utility are analyzed theoretically. Comprehensive experiments based on the ISOLET and Breast Cancer datasets demonstrate that CRDP-FL is superior to existing methods in terms of model accuracy, privacy budget, and data correlation.

2. Related Works

2.1. Federated Learning and Privacy Concern

Federated learning addresses the issue of privacy or confidentiality by providing a distributed machine learning model that reduces the risk of exposing training data. From a technical perspective, existing federated learning research can be classified by network topology and by data distribution. Based on the network topology, federated learning can be classified into centralized and decentralized federated learning [27]. Although federated learning is fundamentally based on a decentralized data approach, quite a few algorithms still require a central server to collect the trained models from participants, build a global model, and share it with all participants. This strategy is mainly implemented by establishing a trusted third party as the server and building trust between the server and the participants. In contrast, decentralized federated learning uses reliability algorithms to replace the reliance on a central server for model aggregation. A typical example is an adaptive averaging algorithm based on Byzantine fault tolerance, which assumes that more than 2/3 of the participants in FL are honest. Through this approach, a group of clients from different domains with a common goal can collaborate, share data, build machine learning models, and achieve high accuracy without relying on a third-party centralized server [28,29,30]. According to the data distribution, federated learning can be divided into three types: federated transfer learning, horizontal federated learning, and vertically federated learning. Federated transfer learning applies when the participants' data samples and data features overlap only slightly, such as adapting a model trained on a similar dataset to a new requirement that solves an entirely different problem. Horizontal federated learning applies when the participants' data have overlapping features but different samples. A representative algorithm for such sample partitioning is federated averaging (FedAvg) [31,32], which derives from stochastic gradient descent and is also called stochastic parallel gradient descent. Vertically federated learning applies when the participants have overlapping data samples but different data features, so its algorithms are characterized by feature partitioning. A representative algorithm of vertically federated learning is distributed block coordinate descent [18]. Assuming the feature partitioning is given, it performs synchronous block updates of variables, and participants frequently exchange their intermediate values during training to jointly train a global model. Because of the resulting high communication cost, many practical studies have worked to reduce it, such as federated stochastic block coordinate descent (FedBCD) [14]. In addition, Das et al. proposed a new decentralized stochastic coordinate descent algorithm [33] based on FedBCD, which constructs a two-tier federated learning framework. However, as mentioned above, vertically federated learning still suffers from utility problems, such as the vast feature space of the overall model leading to higher computational complexity and lower model accuracy, and related research remains limited.
Each participant in federated learning uses its local data to train the machine learning model and shares only the model's updated weights and gradients with other participants. However, if the data structure is known, the gradient information may still be exploited, leading to the disclosure of additional information about the training data [16]. Since federated learning requires different participants to repeatedly upload and aggregate parameters to train the global model, this process can leak more private information, and personal privacy can still be obtained through model inversion. Therefore, reconstruction attacks are the main privacy concern of federated learning. In addition, during the model inference phase, an adversary may use reverse engineering to obtain additional information about the model in order to perform model inversion attacks or membership-inference attacks. In a model inversion attack, the adversary aims to extract training data or feature vectors of training data from the model, while in a membership-inference attack, the adversary's goal is to determine whether the model's training set contains specific samples. Currently, privacy-preserving techniques in federated learning mainly include homomorphic encryption [17], secure multi-party computation [18], and differential privacy [19]. Differential privacy provides a rigorous, quantifiable privacy-preserving method for machine learning that is independent of the attacker's background knowledge, and it has received much attention.
In summary, decentralized federated learning is currently more promising because it does not require a trusted third-party institution. Additionally, there are more research results for horizontal federated learning frameworks, while there are still gaps in the research of vertically federated learning frameworks. However, it is more realistic that the feature spaces are independent in data silos of unrelated industries. Therefore, this paper investigates the decentralized vertically federated learning framework and proposes a differential privacy protection solution to address the privacy leakage problem.

2.2. Differential Privacy in Machine Learning

Differential privacy is widely used in machine learning algorithms to counter various attacks; it provides privacy by adding noise to the model parameters and has achieved good results. Ref. [34] proposed a privacy-preserving logistic regression algorithm based on differential privacy. They optimized the randomized data, making it possible to balance model performance and privacy preservation, and proved that their result satisfies ϵ-differential privacy according to the differential privacy definition. Mangasarian et al. [35] researched privacy-preserving support vector machine models on horizontally and vertically partitioned datasets, respectively. They used randomly generated kernel functions to hide the original learning kernel functions and achieved performance similar to that of the SVM model without privacy preservation. In addition, differential privacy is widely used for privacy preservation in deep learning and federated learning. Song et al. [36] proposed a differentially private stochastic gradient descent algorithm (DP-SGD), which clips the gradient and adds noise to it during training, enabling the trained deep model to satisfy ( ϵ , δ )-differential privacy. Many federated learning algorithms based on differential privacy have been developed and achieve good results [19]. Some personalized differentially private federated learning algorithms have been proposed for complex models [37,38] and non-uniform sampling [39].
In conclusion, differential privacy is commonly used in distributed machine learning and federated learning systems due to its computational efficiency and ease of implementation. However, the injected noise affects data accuracy and model performance, so balancing the privacy and utility of differential privacy preservation remains an open problem. Therefore, this paper investigates differentially private vertically federated learning algorithms and works on improving the algorithm's utility.

2.3. Correlated Differential Privacy

Differential privacy provides a rigorous mathematical approach to protecting privacy by defining indistinguishability, ensuring that the addition or deletion of any individual record does not noticeably affect the analysis results. However, studies have shown that correlations across multiple datasets increase the risk of privacy disclosure. At the same level of privacy preservation, correlated data increase sensitivity, which adds noise and reduces data utility [40,41]. Balancing the privacy and utility of correlated datasets has emerged as a new research hotspot in correlated data privacy techniques. Some studies have focused on correlation measurements to reduce sensitivity. When some users of different datasets have identical records, these datasets are considered directly correlated; direct correlation is strictly defined as two identical records. Unlike direct correlation, indirect correlation is more complex and is defined over related records about a user or its correlated users. For example, information flows about user activities, such as GPS and social network records, may correlate with each other, and it is not easy to identify and measure the different degrees of correlation. Zhu et al. [42,43] used a correlation matrix to express the correlation between correlated datasets. By converting global sensitivity to correlated sensitivity and using it as the upper sensitivity limit, the impact of correlation can be limited. For data correlation measurement, some uncertain correlation models have been proposed, such as the Gaussian correlation model [26,44] and the Markov chain model [45]. Song et al. [46] proposed two mechanisms based on Pufferfish privacy that use a Markov chain to measure data correlation between adjacent states, which reduced global sensitivity. Zhang et al. [42] reduced global sensitivity by performing feature selection based on correlated sensitivity for datasets with correlated records. Recently, some studies have achieved results on differential privacy preservation for correlated data with special formats, such as sequential data, tuple data, and trajectory data [47,48,49,50]. To balance privacy and utility, some scholars have also proposed correlated differential privacy preservation techniques that focus on the privacy parameters in correlated datasets. For example, Wu et al. [51] proposed a definition of correlated differential privacy and provided a fuzzy measurement of correlation. They dynamically adjusted the level of privacy preservation in the correlated datasets based on Nash equilibrium theory and experimentally verified the effect of the privacy parameters of multiple datasets on global data utility. Zhao et al. [52] extended correlated differential privacy to multi-party data release scenarios and performed feature selection through stability algorithms and exponential mechanisms, which reduced data sensitivity and noise injection and improved data utility.
However, current research only applies to centralized learning scenarios, and more research is needed on data correlation in the joint training of multiple organizations in federated learning. In this paper, we investigate data correlation in the vertically federated learning framework with differential privacy preservation. We balance privacy and utility by reducing the correlated sensitivity and the injected noise.

3. Preliminary

3.1. Vertically Federated Learning

Datasets owned by different organizations often have different feature spaces for different purposes, but these organizations may share a large user population. We can build a better machine learning model from the heterogeneous data scattered across organizations without exchanging and disclosing private data through vertically federated learning. Each organization has the same identity and status as a participant in this system. Data used in vertically federated learning can be defined as the following equation.
$$F_i \neq F_j,\quad I_i = I_j,\quad \forall D_i, D_j,\ i \neq j$$
Here, F represents the feature space of a dataset, I represents the sample ID space, and D represents the dataset owned by a participant. For a dataset D, let M denote its sample space. Then we have the following equation.
$$D \in \mathbb{R}^{M \times (I, F)}$$
For the dataset label space y, different vertically federated learning frameworks will be slightly different. In one situation, only one participant has the label space, and in the other, all participants have the label space. The vertically federated learning framework described in this work is the latter case, where all participants have their own label space. In vertically federated learning, each dataset owns its unique features. Such a dataset can be called feature-partitioned distributed data.
Definition 1 
(Vertically Federated Learning, VFL). A task with N participants training a vertical federated learning model should be defined as the following equation.
$$\min_{\Theta} \mathcal{L}(\Theta, D; y) \triangleq \frac{1}{M} \sum_{i=1}^{M} f(\theta_1, \ldots, \theta_N; D^i, y^i) + \lambda \sum_{j=1}^{N} \omega(\theta_j)$$
where the global model Θ consists of local models of each participant and is defined as the following equation.
$$\Theta = \{\theta_1^T, \ldots, \theta_N^T\}^T$$
In addition, f ( · ) represents the loss function, ω ( · ) represents the regularization term, and  λ is the hyperparameter. For general ML models, such as logistic regression (LR), support vector machine (SVM), and neural network (NN), the loss function has the following forms:
$$f(\theta_1, \ldots, \theta_N; X^i, y^i) = f\left(\sum_{j=1}^{N} \theta_j X_j^i,\ y^i\right)$$
The federated stochastic block coordinate descent method (FedBCD) [14] is a classical vertically federated learning method in which the minibatch extracted for training is denoted S. For participant k, this part of the data is denoted $D_k^S$. For sample $i \in S$, we denote the feature vector by $d_k^i$ and the label vector by $y_k^i$. The intermediate information obtained from the data and the model parameters is given by the following equation.
$$\Phi_k^S = D_k^S \theta_k$$
For the whole model $\Theta$, the quantity used for stochastic gradient descent is the sum of the intermediate information of all participants, denoted by the following equation.
$$\Phi^S = \sum_{k=1}^{N} \Phi_k^S$$
Then, for sample $i \in S$ and $\phi^i \in \Phi^S$, we have
$$\phi^i = \sum_{k=1}^{N} \phi_k^i = \sum_{k=1}^{N} d_k^i \theta_k$$
Let $n_S$ denote the number of samples in S. Then, the partial stochastic gradient with respect to $\theta_k$ can be defined as the following equation.
$$g_k(\Theta; S) = \nabla_k f(\Theta; S) + \lambda \nabla \omega(\theta_k) = \frac{1}{n_S} g(\Phi^S, y_k^S) + \lambda \nabla \omega(\theta_k) = \frac{1}{n_S} \sum_{i \in S} \frac{\partial f(\phi^i, y_k^i)}{\partial \phi^i} (d_k^i)^T + \lambda \nabla \omega(\theta_k)$$
For a participant's model, the calculation of $\nabla_k f(\Theta; S)$ requires intermediate information computed by the other participants. In other words, participant k should broadcast the calculated $\Phi_k$ to all participants within the framework during each training round and simultaneously receive the intermediate information from the other participants, which can be defined as the following equation.
$$\Phi_{-k}^S = \{\Phi_q^S\}_{q \in N,\, q \neq k}$$
To highlight the communication process, the partial stochastic gradient can be further written as the following equation.
$$g_k(\Theta; S) = \nabla_k f(\Phi_{-k}, \theta_k; S) + \lambda \nabla \omega(\theta_k) = g_k(\Phi_{-k}, \theta_k; S)$$
The global stochastic gradient can be expressed as:
$$g(\Theta; S) = \{g_k(\Phi_{-k}, \theta_k; S)\}_{k \in N}$$
Then, according to Equation (3) and the definition of stochastic gradient descent, given the learning rate $\eta$ and a minibatch S agreed upon by all participants, the parameters are updated according to the following equation.
$$\theta_k \leftarrow \theta_k - \eta\, g_k(\Phi_{-k}, \theta_k; S),\quad \forall k \in N$$
On this basis, FedBCD sets a local update count R, and each participant transfers information only once every R rounds. Each message pass determines the minibatches of the next R rounds, after which the intermediate values $\Phi$ are uniformly calculated, packaged, and broadcast. An iteration with information transfer is called a communication round. During each local training step, a participant uses the set of intermediate values most recently received from the other participants. Suppose that training has reached round t and the latest communication round is $\tau_0$; the parameter update is then determined by the following equation.
$$\theta_k \leftarrow \theta_k - \eta\, g_k(\Phi_{-k}^{(\tau_0)}, \theta_k; S),\quad \forall k \in N$$
The communication cost savings of FedBCD are evident; however, it still carries the risk of privacy disclosure during communication. Introducing differential privacy is a reasonable scheme to ensure privacy protection while saving communication costs.
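To make the update rule concrete, the following sketch simulates FedBCD-style training for a squared-error loss without any privacy protection. The Participant class, the data shapes, the loss, and all hyperparameters are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

class Participant:
    """One VFL participant holding a feature block D_k and a model block theta_k."""
    def __init__(self, D_k, rng):
        self.D_k = D_k                                   # (n_samples, d_k) local feature block
        self.theta = 0.01 * rng.normal(size=D_k.shape[1])

    def intermediate(self, idx):
        """Phi_k^S = D_k^S theta_k for the agreed minibatch indices."""
        return self.D_k[idx] @ self.theta

    def local_update(self, idx, phi_others, y, eta, lam):
        """Coordinate-descent step on theta_k using the other parties' Phi values."""
        phi = self.intermediate(idx) + phi_others        # full prediction on the minibatch
        residual = phi - y[idx]                          # d/d phi of 0.5 * (phi - y)^2
        grad = self.D_k[idx].T @ residual / len(idx) + lam * self.theta
        self.theta -= eta * grad

rng = np.random.default_rng(0)
n, blocks = 256, [5, 7, 4]                               # three feature-partitioned parties
X = rng.normal(size=(n, sum(blocks)))
y = X @ rng.normal(size=sum(blocks)) + 0.1 * rng.normal(size=n)

parts = [Participant(B, rng) for B in np.split(X, np.cumsum(blocks)[:-1], axis=1)]
for _ in range(200):                                     # communication rounds
    idx = rng.choice(n, size=32, replace=False)          # minibatch S agreed by all parties
    phis = [p.intermediate(idx) for p in parts]          # broadcast intermediate information
    for k, p in enumerate(parts):                        # each party updates only its block
        phi_others = sum(phis[q] for q in range(len(parts)) if q != k)
        p.local_update(idx, phi_others, y, eta=0.05, lam=1e-3)
```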

3.2. Differential Privacy

Differential privacy is a concept of privacy proposed by Dwork et al. in 2006 for the privacy disclosure of statistical databases [53,54]. The technique based on differential privacy protection designs a mechanism to add noise to the target database to minimize the loss of statistical information between the published dataset and the original dataset, ensuring that the modification of an individual record in the dataset will not significantly affect the statistical results.
Definition 2 
($\epsilon$-Differential Privacy). For any datasets $D_1$ and $D_2$ differing in at most one record and any output set $S \subseteq Range(M)$, the random mechanism M satisfies differential privacy if and only if:
$$DP(M) = \sup_{D_1, D_2, S} \log \frac{\Pr[r \in S \mid D_1]}{\Pr[r \in S \mid D_2]} \leq \epsilon,$$
where $\epsilon$ is called the privacy budget; it bounds the ratio of the probabilities that the random mechanism M produces the same output on two adjacent datasets and reflects the privacy protection level that M can provide. The smaller $\epsilon$ is, the higher the level of privacy protection. The value of $\epsilon$ must be chosen according to the requirements of the specific scenario to balance privacy protection and data availability.
Definition 3 
(Global Sensitivity). Suppose there is a function $Q: D \to \mathbb{R}^d$ whose input is a dataset and whose output is a d-dimensional real vector. For any adjacent datasets $D_1$ and $D_2$, its sensitivity is defined as follows.
$$\Delta GS = \max_{D_1, D_2} \left\| Q(D_1) - Q(D_2) \right\|_1$$
Definition 4 
(Random Mechanism). For a dataset D, the random function M(D) is a random perturbation mechanism on D if its output r follows the distribution
$$\Pr(r \in S \mid D).$$
Definition 5 
(Laplacian Mechanism). The Laplace mechanism adds Laplacian noise to the actual output of a function to achieve differential privacy. The mechanism takes the dataset D, the function Q, and the privacy budget ϵ as inputs and is designed for functions with real-valued outputs. For any function $Q: D \to \mathbb{R}^d$, the following mechanism provides ϵ-differential privacy.
$$M(D) = Q(D) + \mathrm{Laplace}\!\left(\frac{\Delta GS}{\epsilon}\right)$$
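As a small illustration of the Laplace mechanism, the snippet below perturbs a query answer with noise of scale sensitivity/ϵ; the mean query and the parameter values are made up for the example.

```python
import numpy as np

def laplace_mechanism(true_answer, sensitivity, epsilon, rng=None):
    """Return Q(D) + Laplace(0, sensitivity / epsilon), an epsilon-DP output."""
    rng = rng or np.random.default_rng()
    scale = sensitivity / epsilon
    return true_answer + rng.laplace(loc=0.0, scale=scale, size=np.shape(true_answer))

# Example: a mean query over values bounded in [0, 1] has global sensitivity 1/n.
rng = np.random.default_rng(0)
data = rng.uniform(0.0, 1.0, size=1000)
noisy_mean = laplace_mechanism(data.mean(), sensitivity=1.0 / len(data), epsilon=0.5, rng=rng)
```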
McSherry and Talwar [55] proposed the exponential mechanism, which satisfies differential privacy while outputting a near-optimal $t \in T$ according to a utility function.
Definition 6 
(Exponential Mechanism). The exponential mechanism takes the dataset D, the privacy budget ϵ, and a utility function $u: (D \times T) \to \mathbb{R}$ as inputs. The utility function assigns a score to each output $t \in T$, with a higher score indicating better utility. For any utility function $u: (D \times T) \to \mathbb{R}$, the algorithm that chooses output t with probability proportional to $\exp\left(\frac{\epsilon\, u(D, t)}{2 \Delta u}\right)$ satisfies ϵ-differential privacy.
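A minimal sampler for the exponential mechanism is sketched below; the candidate outputs and utility scores are invented for illustration, and the softmax-style normalization is only one standard way to realize the definition.

```python
import numpy as np

def exponential_mechanism(candidates, utilities, epsilon, sensitivity, rng=None):
    """Sample a candidate with probability proportional to exp(eps * u / (2 * Delta_u))."""
    rng = rng or np.random.default_rng()
    logits = epsilon * np.asarray(utilities, dtype=float) / (2.0 * sensitivity)
    logits -= logits.max()                       # subtract max for numerical stability
    probs = np.exp(logits)
    probs /= probs.sum()
    return candidates[rng.choice(len(candidates), p=probs)]

# Illustrative use: higher-utility outputs are chosen with higher probability.
choice = exponential_mechanism(["a", "b", "c"], [0.9, 0.5, 0.1], epsilon=1.0, sensitivity=1.0)
```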
According to the mechanism’s scope, the accumulative privacy budget satisfies the sequential and parallel composition [53,54].
Theorem 1 
(Sequential Composition). Assume that a set of privacy mechanisms $\{M_1, \ldots, M_m\}$ operates sequentially on a dataset and that each $M_i$ provides an $\epsilon_i$ privacy guarantee; then M satisfies $\sum_{i=1}^{m} \epsilon_i$-differential privacy.
Theorem 2 
(Parallel Composition). Assume that a set of privacy mechanisms $\{M_1, \ldots, M_m\}$ operates on multiple disjoint subsets of a dataset and that each $M_i$ provides an $\epsilon_i$ privacy guarantee; then M satisfies $\max(\epsilon_i)$-differential privacy.

3.3. Correlated Differential Privacy

In single-party data scenarios, correlated sensitivity has worked well. Under the differential privacy mechanism, since records are only partially correlated, deleting one record may impact other records to different degrees. In the framework of correlated sensitivity, the strength of this influence is defined as the relevance of records. Different methods can be used to calculate the correlation between records in correlated data analysis [43]. The standard method calculates Pearson's correlation coefficient of records i and j. This coefficient measures the linear dependence between variables, and $w_{ij} \in [0, 1]$ represents the degree of correlation between records i and j. When $w_{ij} > 0$, there is a certain correlation between records i and j; $w_{ij} = 1$ indicates that records i and j are entirely correlated, and $w_{ij} = 0$ indicates that record i is entirely unrelated to record j. The larger $w_{ij}$ is, the stronger the correlation. The correlation matrix $\Delta$ is used to describe the correlation of a set of data.
$$\Delta = \begin{pmatrix} w_{11} & w_{12} & \cdots & w_{1n} \\ w_{21} & w_{22} & \cdots & w_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ w_{n1} & w_{n2} & \cdots & w_{nn} \end{pmatrix}$$
To meet the requirements of correlation analysis, a correlation threshold is defined to screen the correlations. For a given correlation threshold $w_0$, the following formula keeps only correlations of sufficient strength.
$$w_{ij} = \begin{cases} w_{ij}, & w_{ij} \geq w_0 \\ 0, & w_{ij} < w_0 \end{cases}$$
The filtered correlation matrix describes the correlations between the records of a dataset that meet the correlation threshold $w_0$, from which the correlated sensitivity of the dataset can be calculated. The sensitivity of record i, denoted $\Delta CS_i$, is calculated as the following equation.
$$\Delta CS_i = \sum_{j=0}^{l} w_{ij} \left( \left\| Q(D^j) - Q(D^{-j}) \right\|_1 \right)$$
where $D^j$ represents the dataset containing record $r_j$ and $D^{-j}$ represents the dataset with $r_j$ deleted from D. Then, the correlated sensitivity of a single-party dataset containing q data records is denoted as the following equation.
$$\Delta CS_q = \max_{i \in q} \Delta CS_i$$
Compared with the correlated sensitivity, the global sensitivity $\Delta GS$ only measures the maximum number of correlated records without considering the degree of data correlation. Since the correlated sensitivity accounts for the degree of correlation, it is smaller than the global sensitivity and reduces the noise added to private data.
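The computation described above can be summarized in a few lines: build the record correlation matrix, zero out entries below the threshold w0, and take the maximum weighted per-record effect. The sketch below assumes a query whose per-record contribution is bounded by 1 (e.g., a count query); the data and threshold are illustrative.

```python
import numpy as np

def correlated_sensitivity(records, w0, record_effect=1.0):
    """Correlated sensitivity: max_i sum_j w_ij * ||Q(D^j) - Q(D^-j)||_1."""
    corr = np.abs(np.corrcoef(records))          # |Pearson| correlations between records
    corr[corr < w0] = 0.0                        # keep only correlations >= threshold w0
    per_record = corr.sum(axis=1) * record_effect
    return per_record.max()

rng = np.random.default_rng(0)
D = rng.normal(size=(50, 8))                     # 50 records with 8 features (toy data)
delta_cs = correlated_sensitivity(D, w0=0.3)     # typically tighter than global sensitivity
```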

3.4. Problem Statement

To meet actual vertically federated learning requirements, we summarize the current issues of vertically federated learning in terms of framework design, privacy protection, and utility optimization as follows:
  • In real scenarios, network edge devices' performance differences lead to problems such as uneven network transmission performance and insufficient computing power. Therefore, taking small data nodes (devices) as direct participants of federated learning significantly impacts global model training, especially for deep learning models with large data volumes, so having organizations collect their clients' data is a feasible and practical scheme. Various data organizations, such as large medical service institutions, may collect data from many downstream units or customers for model training. Data collected by different organizations are limited in application because of data barriers, so it is necessary to federally train a global model across organizations. In this paper, we design a two-layer vertically federated learning framework to meet the requirements of federated learning from clients to organizations and between organizations.
  • As discussed above, federated learning suffers from privacy issues during training, and differential privacy realizes privacy protection by perturbing the parameters in the training process. In a fully decentralized federated learning framework, local differential privacy further removes the need for a trusted third party. Therefore, we apply differentially private operations to the decentralized vertically federated learning framework by injecting noise into the information transmission and feature selection processes.
  • Utility optimization is critical for privacy-preserving vertically federated learning. On the one hand, increasing the number of participants in vertically federated learning brings a massive surge in the global model's dimensionality. The vast feature space may increase computational complexity and cost, and irrelevant features reduce the global model's accuracy. Therefore, feature selection is necessary for utility optimization during client data aggregation. On the other hand, introducing differential privacy causes a utility decline while protecting privacy, and data correlation causes additional utility loss. To address these issues, we propose a correlated differential privacy utility optimization strategy, including feature selection and correlation analysis for VFL. Correlated sensitivity in vertically federated learning, i.e., CS-VFL, is defined to reduce sensitivity and improve the algorithm's effectiveness.
For the given problem, we design vertically federated learning with correlated differential privacy, and the problem is formalized as follows. We assume that there are N data centers in the vertically federated system, where data organization j has $K_j$ devices. The N data centers jointly train a global model $\Theta$ on the data space $X \in \mathbb{R}^{M \times (I, F)}$ based on the FedBCD algorithm. In intermediate information communication, the privacy protection requirements are satisfied by adding noise, and the process satisfies ϵ-differential privacy. In addition, utility optimization mainly refers to improving the accuracy of the global model.

4. CRDP-FL

4.1. Outline

In this paper, we design a general vertically federated learning framework and provide a privacy guarantee. As shown in Figure 1, the proposed CRDP-FL framework contains two layers. The top layer is a decentralized, differentially private vertically federated learning framework with N data centers (organizations), each having at least one client (edge device) in the bottom layer. Laplace noise is added to local parameters before communication to meet the differential privacy protection requirements. The data organizations form a complete graph structure, which provides bilateral communication. Furthermore, they are decentralized, without a trusted central server in the top layer, and do not trust each other. Therefore, CRDP-FL further satisfies local differential privacy. The bottom layer of CRDP-FL performs client-side data aggregation for VFL between data organizations and also injects DP noise for privacy protection. There is no communication between the bottom-layer clients. A correlated differential privacy utility optimization strategy is performed to achieve utility optimization of CRDP-FL, including privacy-preserved feature selection and feature optimization based on correlation analysis for VFL.
The core of CRDP-FL, i.e., client-side data aggregation and differentially private vertically federated learning, is described in detail in Section 4.2 and Section 4.3, respectively.

4.2. Client-Side Data Aggregation

Client-side data aggregation in CRDP-FL is the process of aggregating data from the clients to the data organization in the bottom-layer model. In particular, the correlated differential privacy utility optimization strategy is applied here. We perform feature selection for the vertically federated framework to reduce dimensionality and make the algorithm efficient; the features most relevant to the learning task are selected to improve training accuracy. We also propose correlated sensitivity for vertically federated learning to analyze the quantitative relationship between feature sets and data correlation, achieving utility optimization by reducing noise injection through sensitivity reduction. On this basis, we propose feature optimization for vertically federated learning to determine the optimal feature set. Client-side data aggregation consists of the following steps, and Algorithm 1 presents their implementation.
  • Feature selection. To select the features most relevant to the learning task, CRDP-FL performs stability feature selection. The features are divided into the optimal feature set and the adjusted feature set. We also use Pearson's correlation coefficient to filter out highly correlated features and move one feature of each correlated pair from the optimal feature set to the adjusted feature set. DP noise is injected in the process for privacy preservation.
  • Correlation analysis. We propose correlated sensitivity in VFL (CS-VFL) to quantitatively analyze the relationship between features and data correlation. To improve the classic correlated sensitivity, we propose a feature-oriented correlation to measure correlation in VFL and the mean correlated degree as the standard for identifying correlated records.
  • Feature optimization. Based on the feature selection results, we generate candidate feature sets from the adjusted feature set. The DP exponential mechanism is used to select the candidate with the minimum correlated sensitivity, i.e., the highest utility score. Then, we add the features of the best candidate to the optimal feature set. The final optimal feature set is thus relaxed from the initial one, since reducing the DP noise is essential for balancing privacy and utility in CRDP-FL.
  • Data aggregation. Aggregate data into data organizations based on the final optimal feature set.
Algorithm 1: DataAggregation
Input: distributed dataset $D_j$; privacy budget for Pearson's correlation $\epsilon_1$; privacy budget for feature optimization $\epsilon_2$; important feature threshold $\theta$; Pearson's correlation threshold $\theta_{per}$; initial $w_0$; adjusting coefficient b
Output: optimal $X_j$
1. FeatureSelection($D_j$, $\epsilon_1$, $\theta$, $\theta_{per}$): obtain the best feature set $\alpha$ and the adjusted feature set $\beta$;
2. CalculateCorrelationThreshold($D_j$, $w_0$): update $w_0$;
3. FeatureOptimization($D_j$, $\epsilon_2$, b, $w_0$, $\beta$, $\alpha$): update $\beta$;
4. Aggregate the optimal $X_j$ according to $\beta$;
5. return $X_j$

4.2.1. Feature Selection

In traditional machine learning, feature selection before model training significantly reduces data dimensionality and improves model accuracy. We propose a differentially private feature selection algorithm, shown in Algorithm 2, based on the stability feature selection method [42] combined with Pearson's correlation coefficient to merge feature sets over feature-partitioned distributed data. DP noise is injected for privacy protection. Stability feature selection traverses the distributed data feature set and filters out features that are less correlated with the prediction. As shown in Equation (23), the important-feature score of feature $f_i$ in a distributed dataset is the ratio of the frequency with which it is selected as important to the number of data subsets, and features whose score exceeds the threshold are considered important.
$$S_i = \frac{T_{freq}}{N}$$
The stability selection algorithm helps overcome overfitting and understand the data, and it suits the scenario of client-side data aggregation for utility optimization in this paper; it is one of the best-performing methods in a multi-party dataset environment. However, the method does not drive the scores of similar and associated important features to 0, so many similar and associated important features are retained. To overcome this weakness of stability selection, we combine Pearson's correlation coefficient calculation in our feature selection algorithm. Equation (24) gives Pearson's correlation coefficient of features $f_m$ and $f_n$, where $\mu_m$ and $\mu_n$ are the means of $f_m$ and $f_n$, respectively, and $\sigma_m$ and $\sigma_n$ are their standard deviations.
$$P_{m,n} = \frac{E[(f_m - \mu_m)(f_n - \mu_n)]}{\sigma_m \sigma_n}$$
Definition 7 
(Correlation Coefficient Record Sensitivity). For a query Q, the record sensitivity of Pearson’s correlation coefficient for record i can be defined as follows.
$$\Delta CS_i = \max_{m,n} \left\| p_{m,n} - p'_{m,n} \right\|_1$$
Definition 8 
(Correlation Coefficient Sensitivity). For a query Q, the sensitivity of the correlation coefficient is determined by the maximum record sensitivity of the correlation coefficient.
$$\Delta CS_{per} = \max_{i \in D_\beta} \left( \Delta CS_i \right)$$
where Q is a query for the Pearson's correlation coefficients of a set of records. It is easy to see that the correlation coefficient sensitivity $\Delta CS_{per} \leq 1$, since the correlation coefficient takes values from 0 to 1.
Algorithm 2: FeatureSelection
[The pseudocode of Algorithm 2 is provided as a figure in the original article.]
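Since Algorithm 2 appears only as a figure in the published version, the following sketch reproduces its main idea under our own assumptions: stability scores are obtained by repeatedly fitting an L1-regularized model on subsamples, features scoring at or above θ form the initial optimal set, and one member of every feature pair whose Laplace-perturbed Pearson correlation exceeds θ_per is moved to the adjusted set. The subsampling scheme and the sklearn model are illustrative choices, not the authors' exact procedure.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def feature_selection(X, y, eps1, theta, theta_per, n_subsets=20, rng=None):
    """Stability selection plus DP-noised Pearson filtering (illustrative sketch)."""
    rng = rng or np.random.default_rng()
    n, d = X.shape
    freq = np.zeros(d)
    for _ in range(n_subsets):                            # how often each feature is selected
        idx = rng.choice(n, size=n // 2, replace=False)
        clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
        clf.fit(X[idx], y[idx])
        freq += (np.abs(clf.coef_).sum(axis=0) > 1e-6)
    scores = freq / n_subsets                             # S_i = T_freq / N
    best = set(np.where(scores >= theta)[0])
    adjusted = set(np.where(scores < theta)[0])

    corr = np.abs(np.corrcoef(X, rowvar=False))           # Pearson correlation of features
    corr += rng.laplace(0.0, 1.0 / eps1, size=corr.shape) # DP noise; sensitivity at most 1
    for i in sorted(best):
        for j in sorted(best):
            if j > i and j in best and corr[i, j] >= theta_per:
                best.discard(j)                           # move one of each correlated pair
                adjusted.add(j)
    return sorted(best), sorted(adjusted)
```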

4.2.2. CS-VFL

Our proposed CS-VFL is a correlation analysis technique for distributed datasets such as those in vertically federated frameworks. In CS-VFL, we define a feature-oriented correlation, which measures the correlation between records by the coincidence of their feature values and thus provides an intuitive quantitative relationship between feature sets and data correlation. Moreover, by defining the mean correlated degree in VFL as the correlation standard, our method utilizes prior knowledge in VFL and thus provides a more objective measure. On this basis, the correlated sensitivity in VFL can be calculated, which effectively reduces the amount of noise thanks to a tighter correlation threshold than the global sensitivity (GS) and the traditional correlated sensitivity (CS) in the single-party setting. Since the calculated correlation is feature-oriented, it also provides a solid foundation for the subsequent feature optimization.
Definition 9 
(Feature-oriented Correlation). The feature-oriented correlation is a record correlation that measures the degree to which record i and record j match over all feature values.
$$w_{ij} = \frac{match(i, j)}{l}$$
$$match(i, j) = \begin{cases} 1, & \text{if } v_i^m = v_j^m \text{ for } m \in l, \\ 0, & \text{otherwise}, \end{cases}$$
where $v_i^m$ and $v_j^m$ are the $m$th feature values of record i and record j.
Definition 10 
(Mean Correlated Degree in VFL). For a distributed dataset with n parties and feature partitioning, the correlated sensitivity of one data subset is denoted $\Delta CS_n$, i.e., the sum of the correlations of the k correlated records in that subset, from which the mean correlated degree of the subset, $\overline{\Delta CS_n}$, can be obtained. The distributed mean correlation of the N-party dataset is then defined as the following equation.
$$MCD = \frac{1}{N} \sum_{n=1}^{N} \overline{\Delta CS_n}$$
MCD measures the average level of correlation strength in a vertically federated framework with feature-partitioned distributed data. Since $w_{ij} \in [0, 1]$ and $\overline{\Delta CS_n} \in [0, 1]$, the value of MCD also lies between 0 and 1. The detailed procedure for computing MCD is shown in Algorithm 3.
Algorithm 3: CalculateCorrelationThreshold
[The pseudocode of Algorithm 3 is provided as a figure in the original article.]
MCD reflects the trend of correlation in the merged dataset. Using the value of MCD as the correlation threshold $w_0$, the correlation between records is kept according to Equation (19); otherwise, the correlation is set to zero. The distributed correlated sensitivity $\Delta CS_p$ of the merged subset, according to Equations (19) and (20), is given by the following equation.
$$\Delta CS_p = \max_{i \in p} \sum_{j} w_{ij} \left( \left\| Q(D_p^j) - Q(D_p^{-j}) \right\|_1 \right)$$
where $D_p$ denotes the merged dataset, $w_{ij}$ is the degree of correlation between records i and j when the datasets differ only in record j, and $\Delta CS_p$ measures the maximum impact on any query Q of changing one record of the merged data. For any query Q, the output of the Laplace-mechanism-based differential privacy perturbation can be calculated using Equation (31).
$$\hat{Q}(D_p) = Q(D_p) + Lap\!\left(\frac{\Delta CS_p}{\epsilon}\right)$$
In summary, our proposed CS-VFL provides a basis for utility optimization in client-side data aggregation. We perform correlation analysis in VFL that balances the accuracy changes due to dimensionality changes and effectively reduces the distributed correlated sensitivity, thereby reducing noise injection, because the mean correlated degree provides a stricter correlated sensitivity threshold.
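To make the CS-VFL quantities concrete, the sketch below computes the feature-oriented correlation (interpreting match(i, j) as the fraction of features on which two records agree), forms the mean correlated degree MCD over the parties, and uses it as the threshold for the distributed correlated sensitivity. The per-record query effect, the toy data, and this interpretation of match are our assumptions.

```python
import numpy as np

def feature_oriented_corr(party):
    """w_ij = fraction of the party's features on which records i and j take equal values."""
    eq = (party[:, None, :] == party[None, :, :])        # (n, n, l) element-wise matches
    return eq.mean(axis=2)

def mean_correlated_degree(parties):
    """MCD: average over parties of the mean correlation weight of correlated record pairs."""
    means = []
    for P in parties:
        w = feature_oriented_corr(P)
        np.fill_diagonal(w, 0.0)
        means.append(w[w > 0].mean() if (w > 0).any() else 0.0)
    return float(np.mean(means))

def cs_vfl(merged, w0, record_effect=1.0):
    """Distributed correlated sensitivity of the merged dataset with threshold w0."""
    w = feature_oriented_corr(merged)
    w[w < w0] = 0.0                                      # keep correlations >= MCD threshold
    return (w.sum(axis=1) * record_effect).max()

rng = np.random.default_rng(0)
parties = [rng.integers(0, 3, size=(40, 5)) for _ in range(3)]   # 3 feature-partitioned parties
w0 = mean_correlated_degree(parties)                     # prior knowledge from local datasets
merged = np.hstack(parties)
sensitivity = cs_vfl(merged, w0)                         # scale of Lap(sensitivity / epsilon)
```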

4.2.3. Feature Optimization

Through feature selection, we retain the features most relevant to the training task. However, considering the data correlation that inevitably exists after data aggregation, removing more features usually leads to a higher degree of data correlation, which, under the DP mechanism and at the same privacy level, introduces additional noise and thereby reduces utility. Therefore, feature optimization aims at relaxing the optimal feature set by adding a certain number of features. To examine the impact of added noise and features on training accuracy, we define a utility function based on correlated sensitivity to verify the effect, on the accuracy of machine learning algorithms, of reducing noise by reducing data correlation. The utility function is defined based on CS-VFL and therefore favors minimal correlated sensitivity. The distributed correlated sensitivity $\Delta CS_p^{c_i}$ of the division scheme $c_i$ can be obtained from CS-VFL as shown in Equation (30), so that the candidate feature set with the minimum correlated sensitivity can be selected with high probability, as shown in Equation (32).
$$u(D_p, c_i) = w_0 - \Delta CS_p^{c_i}$$
The exponential mechanism is utilized to select candidate feature sets with low data correlation levels based on the utility function, maintaining good utility for VFL. With the utility scores of all candidate sets known, the probability of selecting candidate $c_i$ by the exponential mechanism is as follows.
$$\frac{\exp\left(\frac{\epsilon\, u(D_p, c_i)}{2 \Delta u}\right)}{\sum_{c \in \mathcal{C}} \exp\left(\frac{\epsilon\, u(D_p, c)}{2 \Delta u}\right)}$$
The feature optimization process in CRDP-FL generates a certain number of candidate feature sets from the adjusted feature set according to the given adjustment factor b. Then, we calculate the utility score of every candidate feature set, and the exponential mechanism selects the best candidate feature set with the highest utility score. The selected features reduce the correlated sensitivity and are added to the optimal feature set to improve utility. The implementation of feature optimization is shown in Algorithm 4.
Algorithm 4: FeatureOptimization
[The pseudocode of Algorithm 4 is provided as a figure in the original article.]
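Algorithm 4 is likewise provided only as a figure; the sketch below captures its intent under our assumptions: candidate sets of b extra features are drawn from the adjusted set, each is scored with u(D_p, c_i) = w0 − ΔCS_p of the enlarged set, and one candidate is chosen by the exponential mechanism. Here cs_fn can be the cs_vfl helper from the previous sketch, and Δu is assumed to be at most 1.

```python
import numpy as np
from itertools import combinations

def feature_optimization(merged, adjusted, best, w0, eps2, b, cs_fn, rng=None):
    """Relax the optimal feature set via exponential-mechanism candidate selection."""
    rng = rng or np.random.default_rng()
    candidates = [list(c) for c in combinations(adjusted, min(b, len(adjusted)))]
    utilities = []
    for cand in candidates:                      # u(D_p, c_i) = w0 - CS of the enlarged set
        cols = sorted(best) + cand
        utilities.append(w0 - cs_fn(merged[:, cols], w0))
    logits = eps2 * np.asarray(utilities) / 2.0  # Delta_u <= 1 assumed for this utility score
    logits -= logits.max()
    probs = np.exp(logits)
    probs /= probs.sum()
    chosen = candidates[rng.choice(len(candidates), p=probs)]
    return sorted(set(best) | set(chosen))       # relaxed optimal feature set
```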

4.3. Differentially Private VFL

The top layer of our proposed CRDP-FL is a differentially private VFL. FedBCD is chosen as the fundamental training framework, in which each participant jointly trains its local model using coordinate descent together with the other participants. Eventually, each data organization maintains its local model for subsequent predictions. We introduce differential privacy into the framework to protect privacy.
  • Data aggregation. The bottom layer performs client-side data aggregation, as described in Section 4.2. Since the data organization is only semi-trustworthy to the clients it owns, each client sends only the data corresponding to the optimal feature set formed after utility optimization to the data organization. The data organization does not know the feature space of the training data until it receives the optimal dataset resulting from the data aggregation process.
  • Model initialization. Each organization forms its local data after data aggregation. Then, the data organization initializes the model parameters according to the feature space of local data. Overall, the data owned by each data organization is feature partitioned. Additionally, in federated learning, no exchange of local data occurs. All data organizations maintain a part of the model, so the value calculated by the local model with the local data is also not the predicted value of the whole model but a part of the predicted value, which is called intermediate information. For a sample to be inferred, only the intermediate information computed by all data organizations is accumulated to be the actual predicted value.
  • Intermediate information synchronization. The FedBCD algorithm sets a local training round R to reduce communication costs. When the number of iterations is n R (e.g., 0 , R , 2 R ... ), information can be passed between data organizations. These rounds are collectively referred to as communication rounds, while other rounds are not available for information transfer. At the beginning of the communication round, each data organization agrees that the next R non-communication rounds are used to update the R-group minibatch, compute the R-group intermediate information and broadcast it, and receive the intermediate information from all other data organizations. These messages are used for the next R rounds of local updates. In addition, since data organizations do not trust each other, differential privacy protection of the intermediate information is required before data organizations broadcast their own intermediate information by perturbing the intermediate information to avoid privacy leakage due to inference attacks.
  • Local model updates. Data organizations update the local model according to Equation (14) in every round, whether or not it is an exchange round. When performing local updates, the intermediate information from other data organizations is that computed in the most recent synchronization round, whereas the intermediate information from the organization's own part of the model is always the latest. Since a data organization uses coordinate descent to update only the portion of the parameters belonging to itself, the gradient is a biased estimate, which may impact training accuracy. However, the study by Liu et al. [14] demonstrates that even biased estimates converge with sufficient training rounds.
Algorithm 5 shows the implementation of the above steps, where line 9 achieves privacy protection by adding Laplace noise. At the same time, the correlated sensitivity is calculated by CS-VFL, thus introducing less noise to achieve utility improvement.
Algorithm 5: CRDP-FL
[The pseudocode of Algorithm 5 is provided as a figure in the original article.]
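Relative to the plain FedBCD exchange sketched in Section 3.1, the only change on the communication path is that each organization perturbs its intermediate values before broadcasting them. A hedged sketch of this step is shown below; the sensitivity value and the budget ϵ3 are placeholders supplied by the caller.

```python
import numpy as np

def perturb_intermediate(phi_k, cs_vfl_sensitivity, eps3, rng=None):
    """Add Laplace(CS-VFL / eps3) noise to Phi_k before broadcasting it to other organizations."""
    rng = rng or np.random.default_rng()
    scale = cs_vfl_sensitivity / eps3
    return phi_k + rng.laplace(0.0, scale, size=phi_k.shape)

# In a communication round, each organization would broadcast the noisy values, e.g.:
# noisy_phi = perturb_intermediate(party.intermediate(idx), sensitivity, eps3=0.5)
```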

5. Utility Analysis

5.1. Privacy Analysis

We first analyze which steps of the CRDP-FL proposed in this paper consume the privacy budget and then analyze the privacy level of CRDP-FL. During the client-side data aggregation, two parts consume the privacy budget. One is the differential privacy operation on the merged dataset during the feature selection, and the other is the candidate feature set selection using the exponential mechanism during the feature optimization. In the differentially private VFL, the exchange of intermediate information by data organizations requires differential privacy perturbation and consumes the privacy budget. Therefore, the privacy level of CRDP-FL can be analyzed as follows.
  • In the feature selection process, assume that D and D′ differ by one record and that $Q_1(\cdot)$ is a query for the Pearson's correlation coefficient between any two features of the adjacent databases. We then have $M_1(x, Q_1(\cdot), \epsilon_1) = Q_1(x) + Lap\!\left(\frac{\Delta CS_{per}}{\epsilon_1}\right)$. When x and y denote the adjacent dataset variables and z denotes the output random variable, the ratio of the probability densities on the adjacent datasets is as follows.
    $$\frac{p_x(z)}{p_y(z)} = \prod_{i=1}^{N} \frac{\exp\left(-\frac{\epsilon_1 |Q_1(x)_i - z_i|}{\Delta CS_{per}}\right)}{\exp\left(-\frac{\epsilon_1 |Q_1(y)_i - z_i|}{\Delta CS_{per}}\right)} \leq \exp(\epsilon_1)$$
    Thus, the process satisfies $\epsilon_1$-differential privacy, and the algorithm requires only a small injection of noise since the sensitivity $\Delta CS_{per} \in [0, 1]$.
  • In the feature optimization process, the candidate feature set is selected by the exponential mechanism. For $D_p$ and $D'_p$, the privacy level is analyzed as follows.
    $$\frac{\dfrac{\exp\left(\frac{\epsilon_2 u(D_p, c_i)}{2\Delta u}\right)}{\sum_{c \in \mathcal{C}} \exp\left(\frac{\epsilon_2 u(D_p, c)}{2\Delta u}\right)}}{\dfrac{\exp\left(\frac{\epsilon_2 u(D'_p, c_i)}{2\Delta u}\right)}{\sum_{c \in \mathcal{C}} \exp\left(\frac{\epsilon_2 u(D'_p, c)}{2\Delta u}\right)}} = \frac{\exp\left(\frac{\epsilon_2 u(D_p, c_i)}{2\Delta u}\right)}{\exp\left(\frac{\epsilon_2 u(D'_p, c_i)}{2\Delta u}\right)} \cdot \frac{\sum_{c \in \mathcal{C}} \exp\left(\frac{\epsilon_2 u(D'_p, c)}{2\Delta u}\right)}{\sum_{c \in \mathcal{C}} \exp\left(\frac{\epsilon_2 u(D_p, c)}{2\Delta u}\right)}$$
    On the right-hand side, the first factor can be bounded as follows.
    $$\frac{\exp\left(\frac{\epsilon_2 u(D_p, c_i)}{2\Delta u}\right)}{\exp\left(\frac{\epsilon_2 u(D'_p, c_i)}{2\Delta u}\right)} = \exp\left(\frac{\epsilon_2 \left(u(D_p, c_i) - u(D'_p, c_i)\right)}{2\Delta u}\right) \leq \exp\left(\frac{\epsilon_2}{2}\right)$$
    By symmetry, the other part can be analyzed as follows.
    $$\frac{\sum_{c \in \mathcal{C}} \exp\left(\frac{\epsilon_2 u(D'_p, c)}{2\Delta u}\right)}{\sum_{c \in \mathcal{C}} \exp\left(\frac{\epsilon_2 u(D_p, c)}{2\Delta u}\right)} \leq \exp\left(\frac{\epsilon_2}{2}\right)$$
    Therefore, the feature optimization process, which selects candidate feature sets by the exponential mechanism, satisfies $m\epsilon_2$-differential privacy, where m is the number of candidates $c_i$. It should be noted that the exponential mechanism does not actually add noise to the data and therefore has no effect on the accuracy of the model.
  • During the differentially private vertically federated learning, the data organizations exchange intermediate information by adding Laplace noise with a privacy budget of $\epsilon_3$ in each communication round. Similar to feature selection, after t rounds this holds $t\epsilon_3$-differential privacy.
In summary, since feature selection operates on a single data center, it satisfies $\epsilon_1$-differential privacy, and feature optimization and vertically federated learning operate on $D_p$; according to Theorems 1 and 2, CRDP-FL satisfies $\max(\epsilon_1, m\epsilon_2 + t\epsilon_3)$-differential privacy.
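As an illustration with made-up values, if $\epsilon_1 = 0.5$, the exponential mechanism is invoked $m = 2$ times with $\epsilon_2 = 0.1$, and $t = 10$ communication rounds each spend $\epsilon_3 = 0.05$, then CRDP-FL satisfies $\max(0.5,\ 2 \times 0.1 + 10 \times 0.05) = 0.7$-differential privacy.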

5.2. Correlation Analysis

Our proposed CS-VFL reduces the measured data correlation for the same number of features, thus improving the utility of differentially private VFL. Although the number of correlated records K is the same for both methods, $\Delta GS$ does not consider the correlated degree, so the sensitivity calculated from $\Delta GS$ is higher than the weighted correlated sensitivity $\Delta CS$. CS-VFL considers the correlated degree and furthermore provides a stricter standard for the correlation threshold. In distributed datasets, the correlation of the local datasets provides prior knowledge for determining the correlation threshold of the merged dataset, so the threshold used by $\Delta CS_p$ is chosen more objectively than that of $\Delta CS$. While reflecting the global correlation trend, the correlation threshold of $\Delta CS_p$ is stricter than that of $\Delta CS$. Therefore, CS-VFL can reduce correlations more effectively.
Theorem 3. 
For query Q, the distributed correlated sensitivity $\Delta CS_p$ is less than or equal to the correlated sensitivity $\Delta CS$. Suppose $\Delta CS_p$ and $\Delta CS$ are obtained from correlation analyses of the same data, yielding the same correlation matrix $\Delta$. Since $CD_{Avg} > w_0$ makes the number of correlated records $k \geq k_p$, it follows that $\Delta CS_p \leq \Delta CS$.
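As a numerical illustration of the inequality in Theorem 3, the sketch below uses simplified stand-in quantities rather than the exact definitions of $\Delta GS$, $\Delta CS$, and $\Delta CS_p$ from the earlier sections: the toy record-level correlation matrix, the unit record sensitivity, and the two thresholds are our own assumptions.

import numpy as np

def count_based_sensitivity(delta, q, w, record_sens=1.0):
    # Count-style sensitivity: number of records correlated with record q above threshold w,
    # each contributing the full record sensitivity.
    k = int(np.sum(np.abs(delta[q]) > w))
    return k * record_sens

def weighted_correlated_sensitivity(delta, q, w, record_sens=1.0):
    # Degree-weighted sensitivity: each correlated record contributes |delta| * record_sens;
    # since |delta| <= 1, this never exceeds the count-based value at the same threshold,
    # and a stricter threshold (fewer correlated records, k_p <= k) reduces it further.
    mask = np.abs(delta[q]) > w
    return float(np.sum(np.abs(delta[q][mask])) * record_sens)

rng = np.random.default_rng(1)
delta = np.clip(rng.normal(0.3, 0.3, size=(8, 8)), -1.0, 1.0)  # toy record correlation matrix
np.fill_diagonal(delta, 1.0)
print(count_based_sensitivity(delta, q=0, w=0.4))                # looser threshold, unweighted
print(weighted_correlated_sensitivity(delta, q=0, w=0.6))        # stricter threshold, weighted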
Theorem 4. 
When the learning rate of CRDP-FL satisfies $0 < \eta < \min\left\{\frac{2}{2R\sum_{i=1}^{N} L_i^2 + 3L_j^2}, \frac{1}{L}\right\}$, then for any $T \geq 1$, the following bound holds.
$$\frac{1}{T}\sum_{\tau=0}^{T-1}\mathbb{E}\left[\left\|\nabla L(\Theta^{\tau})\right\|^2\right] \leq \frac{2}{\eta T}\left(L(\Theta^{(0)}) - L(\Theta^{*})\right) + 2\eta^2(N+3)R^2\sum_{n=1}^{N}L_n^2\frac{\sigma^2}{S} + \frac{2N\sigma^2}{S}$$

6. Evaluation

6.1. Experimental Setting

To verify the effectiveness of CRDP-FL, we designed comprehensive experiments to measure the impact of the number of organizations, the privacy budget, and the feature selection threshold on model accuracy. By comparing with classical algorithms, we verify the advantages of CRDP-FL in model accuracy and data correlation, explore the effect of the number of organizations and of the feature selection threshold, and draw the corresponding conclusions. The experimental parameters are listed in Table 1. The specific experimental settings are as follows.
  • Datasets. Two common ML datasets are selected for our experiments because they contain sufficiently large feature spaces, which allows us to simulate federated learning over feature-partitioned distributed data more realistically; we partition them vertically along the feature space. ISOLET [56,57] is a spoken-letter recognition dataset containing recordings of 150 volunteers pronouncing the name of each letter of the English alphabet. We retained all 7797 records; the audio in each record is quantified as 617 numeric features. Its ample feature space makes it well suited to this experiment. Breast Cancer [58] consists of 570 records with 30 features, where the label M represents malignant and B represents benign. It is one of the most commonly used datasets in ML and contains cytological features of breast cancer biopsies that can be used for diagnosis.
  • Training Models. Both experimental datasets are used for classification tasks, and the ISOLET dataset is more complex than Breast Cancer. Therefore, in the FL process, the former is trained and evaluated with a fully connected neural network (DNN), and the latter is classified with a logistic regression (LR) model; a minimal sketch of both model families is given after this list.
  • Comparison Algorithms. We verify the validity of CRDP-FL against several comparison algorithms: training with a single-party dataset (Single), non-private vertically federated learning (Non-private FL) [33], non-private vertically FL with feature selection (FS-FL) [33], and federated learning with a correlation analysis that computes global sensitivity from the number of correlated records (GDP-FL) [59]. Among them, training with a single-party dataset refers to non-federated learning, and FS-FL adds feature selection on top of non-private vertically federated learning.
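The following is a minimal sketch of the two local model families mentioned above, assuming PyTorch; the hidden-layer widths are our own illustrative choices rather than the configuration used in the paper, while the input dimensions follow the datasets described above (617 features and 26 letter classes for ISOLET, 30 features and 2 classes for Breast Cancer).

import torch.nn as nn

def isolet_dnn(in_dim=617, n_classes=26):
    # Fully connected network (DNN) for the more complex ISOLET task.
    return nn.Sequential(
        nn.Linear(in_dim, 256), nn.ReLU(),
        nn.Linear(256, 64), nn.ReLU(),
        nn.Linear(64, n_classes),
    )

def breast_cancer_lr(in_dim=30, n_classes=2):
    # Logistic regression: a single linear layer trained with cross-entropy loss.
    return nn.Linear(in_dim, n_classes)

In the vertically federated setting, each organization holds only its own slice of the feature space, so in practice each party would train a local model over its own features and exchange only (perturbed) intermediate outputs, as described in the framework.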

6.2. Different Parameter Effect on Accuracy

6.2.1. Organization Number vs. Accuracy

This group of experiments reflects the influence of the number of data organizations on accuracy. For each dataset, the impact of the number of organizations on the accuracy of vertically federated learning without feature selection or communication noise is shown in Figure 2a,b.
In Figure 2a,b, the highest accuracy is obtained when N = 1, i.e., with only a single dataset. When N > 1, as N increases, the accuracy first rises and then decreases sharply. The analysis shows that in the initial stage, with too few organizations, the global model is sparse, resulting in a decline in accuracy. Subsequently, as the number of organizations increases, the accuracy recovers to a certain extent. However, when the number of organizations grows too large, each organization holds fewer data features, so the features contributed by the local models become scarce. At this stage, it is more difficult for local models to capture valuable information, and accuracy declines again. We conclude that in the decentralized FL framework, either too few or too many organizations reduces accuracy. In addition, when the number of organizations is too high, the training time rises drastically. Therefore, to balance model accuracy and training time, the following experiments set the number of organizations to 4 for ISOLET and 3 for Breast Cancer. Because of the ample feature space of ISOLET, a larger number of organizations can be configured; however, when the number of organizations N exceeds 150, the model's performance degrades severely and the training time rises sharply, so we set the maximum number of organizations to 150.

6.2.2. Feature Selection Threshold

This group of experiments reflects the impact of the feature selection threshold on accuracy, and we use it to analyze the optimal range of thresholds. As shown in Figure 3a,b, the feature selection threshold affects the accuracy of FS-FL, CRDP-FL, and GDP-FL on both datasets. As the threshold increases, every algorithm's accuracy first rises and then falls, with the highest accuracy reached between 0.4 and 0.5. For FS-FL, too low a threshold means uncorrelated features are not entirely removed, while too high a threshold filters out some critical features, so accuracy is low at both extremes and highest in the middle. In addition, for CRDP-FL and GDP-FL, when the feature selection threshold is high, the drop in accuracy is much more pronounced than for FS-FL: for the noise-adding algorithms, too high a threshold increases correlation and thus introduces additional noise, causing a further decline in accuracy. Therefore, for CRDP-FL and similar algorithms, an appropriate feature selection threshold is necessary for satisfactory model accuracy.
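As a sketch of how this threshold trade-off can arise, the toy example below assumes, as one plausible reading of the selection rule, that a feature is kept when its DP-perturbed correlation with the label exceeds the threshold θ; the exact criterion is the one defined earlier in the paper, and the data, sensitivity value, and thresholds here are our own illustrative assumptions.

import numpy as np

def dp_threshold_feature_selection(X, y, theta, eps1, delta_cs_per=0.5, rng=None):
    # Keep feature j if its noisy Pearson correlation with the label exceeds theta.
    # A low theta fails to drop weakly relevant features; a high theta drops useful ones.
    rng = rng or np.random.default_rng()
    kept = []
    for j in range(X.shape[1]):
        r = np.corrcoef(X[:, j], y)[0, 1]
        if abs(r + rng.laplace(scale=delta_cs_per / eps1)) >= theta:
            kept.append(j)
    return kept

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 20))
y = (X[:, 0] + 0.5 * X[:, 1] + 0.3 * rng.normal(size=500) > 0).astype(float)  # only features 0 and 1 matter
print(dp_threshold_feature_selection(X, y, theta=0.2, eps1=1.0, rng=rng))  # low threshold keeps noise features
print(dp_threshold_feature_selection(X, y, theta=0.8, eps1=1.0, rng=rng))  # high threshold drops useful ones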

6.3. Accuracy Evaluation

To verify the accuracy of CRDP-FL, we set up a group of comparative experiments on model accuracy under different privacy budgets on the ISOLET and Breast Cancer datasets. In this group of experiments, we only consider the influence of varying $\epsilon_3$ on accuracy: the sensitivity in the feature selection process is small, so the amount of noise it injects is tiny and does not significantly affect accuracy, while $\epsilon_2$ is consumed by the exponential mechanism, which injects no noise.
The experimental results are shown in Figure 4a,b. The non-private algorithms (Single, Non-private FL, and FS-FL) outperform the two private comparison algorithms on both datasets, because the privacy budgets ($\epsilon_1$, $\epsilon_2$, and $\epsilon_3$) only control noise injection and therefore do not affect Single, Non-private FL, and FS-FL. Among the non-private algorithms, the feature-selected FS-FL performs better than the plain Non-private FL, showing that feature selection can improve the utility of vertically FL. Among the private algorithms, CRDP-FL is significantly better than GDP-FL under all privacy budget conditions: the correlated sensitivity calculated by CRDP-FL is lower than that of GDP-FL, and the feature selection in CRDP-FL further improves the model's accuracy. Therefore, our algorithm has significant advantages for utility improvement.
To further verify the proposed CS-VFL, we set up a group of experiments comparing the different definitions of correlated sensitivity on the two datasets under different feature selection thresholds. As shown in Figure 5a,b, the correlated sensitivity of CRDP-FL is much smaller than that of GDP-FL on both datasets. Besides the advantages of the correlation calculation discussed above, the candidate feature set added in the feature optimization process also contributes: feature optimization selects the candidate features with the lowest correlation to join the best feature set, which plays a crucial role in reducing data correlation and sensitivity. In addition, as the feature selection threshold increases, the feature space of the data collected by each organization shrinks, which increases data correlation and sensitivity. More noise is then required in the intermediate information communication stage, decreasing data utility, so the feature selection threshold should not be too high.
From the above comprehensive experiments we draw the following conclusions. Both the number of data organizations and the feature selection threshold affect accuracy, and the best-performing settings can be determined experimentally. In terms of accuracy, CRDP-FL effectively improves the model accuracy of FL from two aspects: for multiple participants, feature selection is an effective means of improving utility, and, while providing privacy, our method improves model accuracy through the correlation analysis based on CS-VFL, which reduces the additional loss of utility.

7. Conclusions

Motivated by the ubiquitous vertically federated learning scenario, and to break data silos and address the privacy problem in federated learning, we propose a vertically federated learning algorithm with correlated differential privacy. We design a decentralized two-layer vertically federated learning framework for organizations with independent feature spaces and for scenarios with multiple clients downstream of different organizations, and we integrate a privacy-preserving strategy based on differential privacy into the framework. Moreover, in view of the utility loss caused by a large feature space, and the additional utility loss that correlated data may cause when differential privacy is applied in vertically federated scenarios, we propose feature selection, correlated sensitivity analysis, and feature optimization as solutions. For differentially private federated learning, the improved correlation analysis technique presented here reduces the correlated sensitivity and improves model accuracy. Comprehensive experiments on the ISOLET and Breast Cancer datasets show that the feature selection of this method can improve model accuracy, especially for datasets with large feature spaces.

Author Contributions

J.Z.: Conceptualization, Methodology. J.W.: Data curation, Software, Writing—Original draft preparation. Z.L.: Visualization, Software, Investigation. W.Y.: Software, Validation. S.M.: Conceptualization, Writing—Reviewing. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China under Grant No. 62102074.

Institutional Review Board Statement

Not applicable; this study did not involve humans or animals.

Data Availability Statement

The ISOLET dataset and Breast Cancer dataset used to support the findings of this study are available at http://www.doczj.com/doc/517e80d6b9f3f90f76c61b84.html (accessed on 15 May 2022) and http://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.names (accessed on 15 May 2022).

Conflicts of Interest

The authors have no relevant financial or non-financial interests to disclose.

Abbreviations

The following abbreviations are used in this manuscript:
ML: Machine learning
AI: Artificial intelligence
FL: Federated learning
VFL: Vertically federated learning
DP: Differential privacy
LDP: Local differential privacy
CRDP: Correlated differential privacy
CRDP-FL: Correlated differentially private federated learning
DP-SGD: Differentially private stochastic gradient descent algorithm
FedAvg: Federated averaging
FedBCD: Federated stochastic block coordinate descent
CS-VFL: Correlated sensitivity in VFL
GS: Global sensitivity
CS: Correlated sensitivity

References

  1. Hatcher, W.G.; Yu, W. A survey of deep learning: Platforms, applications and emerging research trends. IEEE Access 2018, 6, 24411–24432. [Google Scholar] [CrossRef]
  2. Song, Y.; Cai, X.; Zhou, X.; Zhang, B.; Chen, H.; Li, Y.; Deng, W.; Deng, W. Dynamic hybrid mechanism-based differential evolution algorithm and its application. Expert Syst. Appl. 2023, 213, 118834. [Google Scholar] [CrossRef]
  3. Deng, W.; Zhang, L.; Zhou, X.; Zhou, Y.; Sun, Y.; Zhu, W.; Chen, H.; Deng, W.; Chen, H.; Zhao, H. Multi-strategy particle swarm and ant colony hybrid optimization for airport taxiway planning problem. Inf. Sci. 2022, 612, 576–593. [Google Scholar] [CrossRef]
  4. Xue, X.; Liu, W. Integrating heterogeneous ontologies in asian languages through compact genetic algorithm with annealing re-sample inheritance mechanism. Trans. Asian Low-Resour. Lang. Inf. Process. 2022. [Google Scholar] [CrossRef]
  5. Huang, C.; Zhou, X.; Ran, X.; Liu, Y.; Deng, W.; Deng, W. Co-evolutionary competitive swarm optimizer with three-phase for large-scale complex optimization problem. Inf. Sci. 2022, 619, 2–18. [Google Scholar] [CrossRef]
  6. Piper, D. Data protection laws of the world: Full handbook. DLA Piper 2017, 1, 1–50. [Google Scholar]
  7. General Data Protection Regulation. GDPR. 2019. Available online: Https://gdpr-info.eu (accessed on 15 May 2022).
  8. Nasr, M.; Shokri, R.; Houmansadr, A. Comprehensive privacy analysis of deep learning: Passive and active white-box inference attacks against centralized and federated learning. In Proceedings of the 2019 IEEE Symposium on Security and Privacy (SP), San Francisco, CA, USA, 19–23 May 2019; pp. 739–753. [Google Scholar]
  9. Deng, W.; Xu, J.; Gao, X.; Zhao, H. An Enhanced MSIQDE Algorithm with Novel Multiple Strategies for Global Optimization Problems. IEEE Trans. Syst. Man Cybern. Syst. 2022, 52, 1578–1587. [Google Scholar] [CrossRef]
  10. Ramu, S.P.; Boopalan, P.; Pham, Q.V.; Maddikunta, P.K.R.; Huynh-The, T.; Alazab, M.; Nguyen, T.T.; Gadekallu, T.R. Federated learning enabled digital twins for smart cities: Concepts, recent advances, and future directions. Sustain. Cities Soc. 2022, 79, 103663. [Google Scholar] [CrossRef]
  11. Yang, Q.; Liu, Y.; Chen, T.; Tong, Y. Federated Machine Learning: Concept and Applications. ACM Trans. Intell. Syst. Technol. 2019, 10, 1–19. [Google Scholar] [CrossRef]
  12. Alazab, M.; Priya, R.M.S.; Parimala, M.; Maddikunta, P.K.R.; Gadekallu, T.R.; Pham, Q. Federated Learning for Cybersecurity: Concepts, Challenges, and Future Directions. IEEE Trans. Ind. Inform. 2022, 18, 3501–3509. [Google Scholar] [CrossRef]
  13. Yang, S.; Ren, B.; Zhou, X.; Liu, L. Parallel distributed logistic regression for vertical federated learning without third-party coordinator. arXiv 2019, arXiv:1911.09824. [Google Scholar]
  14. Liu, Y.; Kang, Y.; Zhang, X.; Li, L.; Cheng, Y.; Chen, T.; Hong, M.; Yang, Q. A communication efficient collaborative learning framework for distributed features. arXiv 2019, arXiv:1912.11187. [Google Scholar] [CrossRef]
  15. Asad, M.; Moustafa, A.; Yu, C. A Critical Evaluation of Privacy and Security Threats in Federated Learning. Sensors 2020, 20, 7182. [Google Scholar] [CrossRef] [PubMed]
  16. Aono, Y.; Hayashi, T.; Wang, L.; Moriai, S.; Phong, L.T. Privacy-preserving deep learning via additively homomorphic encryption. IEEE Trans. Inf. Forensics Secur. 2017, 13, 1333–1345. [Google Scholar]
  17. Yuan, J.; Yu, S. Privacy preserving back-propagation neural network learning made practical with cloud computing. IEEE Trans. Parallel Distrib. Syst. 2013, 25, 212–221. [Google Scholar] [CrossRef]
  18. Riazi, M.S.; Weinert, C.; Tkachenko, O.; Songhori, E.M.; Schneider, T.; Koushanfar, F. Chameleon: A hybrid secure computation framework for machine learning applications. In Proceedings of the 2018 on Asia Conference on Computer and Communications Security, Incheon, Republic of Korea, 4–8 June 2018; pp. 707–721. [Google Scholar]
  19. Ouadrhiri, A.E.; Abdelhadi, A. Differential Privacy for Deep and Federated Learning: A Survey. IEEE Access 2022, 10, 22359–22380. [Google Scholar] [CrossRef]
  20. Cao, T.; Huu, T.T.; Tran, H.; Tran, K. A federated deep learning framework for privacy preservation and communication efficiency. J. Syst. Archit. 2022, 124, 102413. [Google Scholar] [CrossRef]
  21. Wang, T.; Zhang, X.; Feng, J.; Yang, X. A Comprehensive Survey on Local Differential Privacy toward Data Statistics and Analysis. Sensors 2020, 20, 7030. [Google Scholar] [CrossRef] [PubMed]
  22. Xiao, Y.; Xiong, L. Protecting locations with differential privacy under temporal correlations. In Proceedings of the Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, Denver, CO, USA, 12–16 October 2015; pp. 1298–1309. [Google Scholar]
  23. Lv, D.; Zhu, S. Achieving correlated differential privacy of big data publication. Comput. Secur. 2019, 82, 184–195. [Google Scholar] [CrossRef]
  24. Chen, R.; Fung, B.; Yu, P.S.; Desai, B.C. Correlated network data publication via differential privacy. VLDB J. 2014, 23, 653–676. [Google Scholar] [CrossRef]
  25. Zhu, T.; Xiong, P.; Li, G.; Zhou, W. Correlated differential privacy: Hiding information in non-IID data set. IEEE Trans. Inf. Forensics Secur. 2014, 10, 229–242. [Google Scholar]
  26. Yang, B.; Sato, I.; Nakagawa, H. Bayesian differential privacy on correlated data. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, Melbourne, VIC, Australia, 31 May–4 June 2015; pp. 747–762. [Google Scholar]
  27. Lian, X.; Zhang, C.; Zhang, H.; Hsieh, C.J.; Zhang, W.; Liu, J. Can decentralized algorithms outperform centralized algorithms? A case study for decentralized parallel stochastic gradient descent. Adv. Neural Inf. Process. Syst. 2017, 30. [Google Scholar] [CrossRef]
  28. Muñoz-González, L.; Co, K.T.; Lupu, E.C. Byzantine-robust federated machine learning through adaptive model averaging. arXiv 2019, arXiv:1909.05125. [Google Scholar]
  29. Jiang, Z.; Balu, A.; Hegde, C.; Sarkar, S. Collaborative deep learning in fixed topology networks. Adv. Neural Inf. Process. Syst. 2017, 30. [Google Scholar] [CrossRef]
  30. Daily, J.; Vishnu, A.; Siegel, C.; Warfel, T.; Amatya, V. Gossipgrad: Scalable deep learning using gossip communication based asynchronous gradient descent. arXiv 2018, arXiv:1803.05880. [Google Scholar]
  31. McMahan, B.; Moore, E.; Ramage, D.; Hampson, S.; y Arcas, B.A. Communication-efficient learning of deep networks from decentralized data. In Proceedings of the Artificial Intelligence and Statistics. PMLR, Fort Lauderdale, FL, USA, 20–22 April 2017; pp. 1273–1282. [Google Scholar]
  32. McMahan, H.B.; Moore, E.; Ramage, D.; y Arcas, B.A. Federated learning of deep networks using model averaging. arXiv 2016, arXiv:1602.05629. [Google Scholar]
  33. Das, A.; Patterson, S. Multi-tier federated learning for vertically partitioned data. In Proceedings of the ICASSP 2021—2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, ON, Canada, 6–11 June 2021; pp. 3100–3104. [Google Scholar]
  34. Chaudhuri, K.; Monteleoni, C. Privacy-Preserving Logistic Regression. Available online: https://proceedings.neurips.cc/paper/2008/file/8065d07da4a77621450aa84fee5656d9-Paper.pdf (accessed on 15 May 2022).
  35. Mangasarian, O.L.; Wild, E.W.; Fung, G.M. Privacy-preserving classification of vertically partitioned data via random kernels. ACM Trans. Knowl. Discov. Data (TKDD) 2008, 2, 1–16. [Google Scholar] [CrossRef]
  36. Song, S.; Chaudhuri, K.; Sarwate, A.D. Stochastic gradient descent with differentially private updates. In Proceedings of the 2013 IEEE Global Conference on Signal and Information Processing, Austin, TX, USA, 3–5 December 2013; pp. 245–248. [Google Scholar]
  37. Truex, S.; Liu, L.; Chow, K.H.; Gursoy, M.E.; Wei, W. LDP-Fed: Federated learning with local differential privacy. In Proceedings of the Third ACM International Workshop on Edge Systems, Analytics and Networking, Heraklion, Greece, 27 April 2020. [Google Scholar] [CrossRef]
  38. Li, J.; Khodak, M.; Caldas, S.; Talwalkar, A. Differentially private meta-learning. arXiv 2019, arXiv:1909.05830. [Google Scholar]
  39. Wang, Y.; Tong, Y.; Shi, D. Federated latent Dirichlet allocation: A local differential privacy based framework. In Proceedings of the Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 6283–6290. [Google Scholar]
  40. Kifer, D.; Machanavajjhala, A. No free lunch in data privacy. In Proceedings of the 2011 ACM SIGMOD International Conference on Management of Data, Athens, Greece, 12–16 June 2011; pp. 193–204. [Google Scholar]
  41. He, X.; Machanavajjhala, A.; Ding, B. Blowfish privacy: Tuning privacy-utility trade-offs using policies. In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, Snowbird, UT, USA, 22–27 June 2014; pp. 1447–1458. [Google Scholar]
  42. Zhang, T.; Zhu, T.; Xiong, P.; Huo, H.; Tari, Z.; Zhou, W. Correlated differential privacy: Feature selection in machine learning. IEEE Trans. Ind. Inform. 2019, 16, 2115–2124. [Google Scholar] [CrossRef]
  43. Zhu, T.; Li, G.; Xiong, P.; Zhou, W. Answering differentially private queries for continual datasets release. Future Gener. Comput. Syst. 2018, 87, 816–827. [Google Scholar] [CrossRef]
  44. Chen, J.; Ma, H.; Zhao, D.; Liu, L. Correlated differential privacy protection for mobile crowdsensing. IEEE Trans. Big Data 2017, 7, 784–795. [Google Scholar] [CrossRef]
  45. Cao, Y.; Yoshikawa, M.; Xiao, Y.; Xiong, L. Quantifying differential privacy in continuous data release under temporal correlations. IEEE Trans. Knowl. Data Eng. 2018, 31, 1281–1295. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  46. Song, S.; Wang, Y.; Chaudhuri, K. Pufferfish privacy mechanisms for correlated data. In Proceedings of the 2017 ACM International Conference on Management of Data, Chicago, IL, USA, 14–19 May 2017; pp. 1291–1306. [Google Scholar]
  47. Wang, H.; Wang, H. Correlated tuple data release via differential privacy. Inf. Sci. 2021, 560, 347–369. [Google Scholar] [CrossRef]
  48. Wang, H.; Xu, Z.; Jia, S.; Xia, Y.; Zhang, X. Why current differential privacy schemes are inapplicable for correlated data publishing? World Wide Web 2021, 24, 1–23. [Google Scholar] [CrossRef]
  49. Ou, L.; Qin, Z.; Liao, S.; Hong, Y.; Jia, X. Releasing correlated trajectories: Towards high utility and optimal differential privacy. IEEE Trans. Dependable Secur. Comput. 2018, 17, 1109–1123. [Google Scholar] [CrossRef]
  50. Tang, P.; Chen, R.; Su, S.; Guo, S.; Ju, L.; Liu, G. Differentially Private Publication of Multi-Party Sequential Data. In Proceedings of the 2021 IEEE 37th International Conference on Data Engineering (ICDE), Chania, Greece, 19–22 April 2021; pp. 145–156. [Google Scholar]
  51. Wu, X.; Dou, W.; Ni, Q. Game theory based privacy preserving analysis in correlated data publication. In Proceedings of the Australasian Computer Science Week Multiconference, Geelong, Australia, 31 January–3 February 2017; pp. 1–10. [Google Scholar]
  52. Zhao, J.Z.; Wang, X.W.; Mao, K.M.; Huang, C.X.; Su, Y.K.; Li, Y.C. Correlated Differential Privacy of Multiparty Data Release in Machine Learning. J. Comput. Sci. Technol. 2022, 37, 231–251. [Google Scholar] [CrossRef]
  53. Dwork, C. Differential Privacy: A Survey of Results. In Theory and Applications of Models of Computation. TAMC 2008; Agrawal, M., Du, D., Duan, Z., Li, A., Eds.; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2021; Volume 4978. [Google Scholar] [CrossRef]
  54. Dwork, C.; McSherry, F.; Nissim, K.; Smith, A. Calibrating noise to sensitivity in private data analysis. In Proceedings of the Theory of Cryptography Conference; Springer: Berlin/Heidelberg, Germany, 2006; pp. 265–284. [Google Scholar]
  55. McSherry, F.; Talwar, K. Mechanism design via differential privacy. In Proceedings of the 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS’07), Providence, RI, USA, 21–23 October 2007; pp. 94–103. [Google Scholar]
  56. Fanty, M.; Cole, R. Spoken letter recognition. Adv. Neural Inf. Process. Syst. 1990, 3. [Google Scholar] [CrossRef]
  57. Dietterich, T.G.; Bakiri, G. Solving multiclass learning problems via error-correcting output codes. J. Artif. Intell. Res. 1994, 2, 263–286. [Google Scholar] [CrossRef] [Green Version]
  58. Dietterich, T.G.; Bakiri, G. Error-correcting output codes: A general method for improving multiclass inductive learning programs. In Proceedings of the AAAI. Citeseer, Anaheim, CA, USA, 14–19 July 1991; pp. 572–577. [Google Scholar]
  59. Xu, R.; Baracaldo, N.; Zhou, Y.; Anwar, A.; Joshi, J.; Ludwig, H. FedV: Privacy-Preserving Federated Learning over Vertically Partitioned Data. In Proceedings of the AISec@CCS 2021: Proceedings of the 14th ACM Workshop on Artificial Intelligence and Security, Virtual Event, Republic of Korea, 15 November 2021; Carlini, N., Demontis, A., Chen, Y., Eds.; ACM: New York, NY, USA, 2021; pp. 181–192. [Google Scholar] [CrossRef]
Figure 1. CRDP-FL Framework.
Figure 2. The accuracy of the vertically federated learning varies with the number of organizations on different datasets.
Figure 3. The accuracy of the algorithms varies with the feature selection thresholds on different datasets.
Figure 4. Accuracy comparison of algorithms under different privacy budgets on different datasets.
Figure 5. Correlated sensitivity comparison of the algorithms under different feature selection thresholds.
Table 1. Experimental Parameters.

Parameters | Symbol | ISOLET | Breast Cancer
Epoch | - | 2000 | 500
Local Training Epoch | R | 10 | 10
Size of Minibatch | - | 256 | 256
Learning Rate | η | 0.01 | 0.001
Number of Organizations | N | 1/4/10–150 | 1/3/5/7/9/11/13/15
Feature Selection Threshold | θ | 0.2/0.4/0.6/0.8 | 0.3/0.5/0.7
Linear Threshold | θ_per | 0.9 | 0.9
Feature Selection Privacy Budget | ϵ1 | 1 | 1
Feature Optimization Privacy Budget | ϵ2 | 1 | 1
Communication Privacy Budget | ϵ3 | 0.2/0.4/0.6/0.8/1 | 0.2/0.4/0.6/0.8/1
Initial Sensitivity Threshold | w0 | 0.4 | 0.4
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
