Article

AWDP-FL: An Adaptive Differential Privacy Federated Learning Framework

School of Computer Science and Engineering, Changchun University of Technology, Changchun 130022, China
* Author to whom correspondence should be addressed.
Electronics 2024, 13(19), 3959; https://doi.org/10.3390/electronics13193959
Submission received: 15 August 2024 / Revised: 23 September 2024 / Accepted: 7 October 2024 / Published: 8 October 2024
(This article belongs to the Special Issue AI for Edge Computing)

Abstract
Data security and user privacy concerns are receiving increasing attention. Federated learning models based on differential privacy offer a distributed machine learning framework that protects data privacy. However, the noise introduced by the differential privacy mechanism may affect the model’s usability, especially when reasonable gradient clipping is absent. Fluctuations in the gradients can lead to issues like gradient explosion, compromising training stability and potentially leaking privacy. Therefore, gradient clipping has become a crucial method for protecting both model performance and data privacy. To balance privacy protection and model performance, we propose the Adaptive Weight-Based Differential Privacy Federated Learning (AWDP-FL) framework, which processes model gradient parameters at the neural network layer level. First, by designing and recording the change trends of two-layer historical gradient sequences, we analyze and predict gradient variations in the current iteration and calculate the corresponding weight values. Then, based on these weights, we perform adaptive gradient clipping for each data point in each training batch, which is followed by gradient momentum updates based on the third moment. Before uploading the parameters, Gaussian noise is added to protect privacy while maintaining model accuracy. Theoretical analysis and experimental results validate the effectiveness of this framework under strong privacy constraints.

1. Introduction

Federated learning, as a distributed machine learning technique, successfully addresses the issue of data silos, allowing institutions to collaborate across regions without the need to transfer data, improving the accuracy of predictive models [1,2,3]. Researchers have significantly enhanced data privacy protection by integrating various privacy-preserving techniques, including secure multi-party computation, homomorphic encryption, and differential privacy. Ma et al. proposed a federated learning method based on multi-key homomorphic encryption, which effectively protects privacy and reduces computational costs. Park et al. employed homomorphic encryption to directly encrypt model parameters, enabling the central server to compute on encrypted data without decrypting it. However, despite these techniques achieving privacy protection through “computable but not visible” methods, they also introduce significant computational and communication overhead [4,5,6,7,8,9,10,11,12]. In particular, secure multi-party computation relies on complex communication protocols, while homomorphic encryption requires extensive encryption operations.
In contrast, differential privacy, with its simplicity and strong privacy protection capabilities, has become an important research direction in federated learning. In particular, Local Differential Privacy (LDP), which adds noise locally without relying on the trustworthiness of a central server, offers higher security and can prevent privacy leaks on the server side. However, LDP also faces some challenges, especially in terms of gradient clipping. The role of gradient clipping is to control the magnitude of gradients, prevent abnormal gradients from affecting global model updates, and avoid excessive gradients from amplifying the noise. Although each client in LDP independently implements privacy protection measures and needs to clip gradients before transmission to control the noise effect, excessive clipping may lead to degraded model performance and affect the convergence of the global model. Therefore, balancing the strength of gradient clipping to ensure privacy protection while maintaining model performance is a key challenge in LDP.
In addition, federated learning faces the challenge of communication costs, particularly in Internet of Things environments, where communication between devices is costly [13,14,15]. Federated learning relies on frequent parameter exchanges between clients and servers, and this frequent communication is especially significant in resource-constrained devices. Although differential privacy technology enhances privacy protection, it also increases communication overhead and training time. Therefore, effectively reducing communication costs while preserving privacy remains one of the key challenges in federated learning.
In response to the challenges in the aforementioned FL models, the contributions of this paper are as follows:
  • We propose an Adaptive Weight-Based Differential Privacy Federated Learning (AWDP-FL) framework aimed at reducing communication costs during the federated learning process while safeguarding privacy and maintaining model accuracy. The two core improvements of this framework are reflected in gradient clipping and gradient updating. First, we introduce an adaptive gradient clipping method, incorporating two-layer historical gradient sequences to record and analyze trends in historical gradient changes. This enables dynamic weight coefficient calculations and adaptive gradient clipping for each data point in each batch, better controlling gradient fluctuations and reducing unnecessary communication overhead. Second, we propose an adaptive momentum update strategy based on the third moment, further optimizing model training to enhance convergence speed while ensuring privacy protection. To further enhance privacy protection, dynamic Gaussian noise is added when uploading parameters, ensuring the security of the transmission.
  • We conducted multi-scenario experiments on the MNIST, Fashion-MNIST, and CIFAR-10 datasets, analyzing the algorithm from multiple dimensions, including external comparisons with other advanced algorithms and internal comparisons by adjusting local parameters to explore the effectiveness of the algorithm. In addition, we provide the source code used in the experiments.

2. Related Work

Differential Privacy Protection in Federated Learning
Local Differential Privacy (LDP) prevents privacy leakage by adding noise locally. The LDPFL algorithm proposed by Chamikara et al. [16] reduces the risk of information leakage by increasing the randomness of input data while maintaining high model accuracy. The LDP-Fed algorithm developed by Truex et al. [17] allows users to customize privacy budgets based on local privacy needs. However, the introduction of noise inevitably reduces model accuracy, especially in high-dimensional data and complex model scenarios. Sun et al. [18] proposed an adaptive weight parameter setting and data perturbation strategy, which improves model performance by addressing weight differences at various levels in deep learning models. The FedSGD method designed by Zhao et al. [19] improves model accuracy by perturbing gradients, but this comes at the cost of increased communication overhead.
The main challenges in federated learning include the following: (1) gradient clipping is required after adding noise, and improper clipping strategies can negatively affect model performance and convergence; (2) adding noise also increases communication overhead, especially in bandwidth-limited environments. Therefore, finding a balance between privacy protection and efficiency remains a pressing issue. Adaptive clipping strategies have shown clear advantages by dynamically adjusting the clipping threshold, significantly improving model accuracy and performance. Liu et al. [20] proposed an adaptive gradient clipping algorithm that optimizes model performance by dynamically adjusting the clipping threshold. Although it has achieved some success, the method is sensitive to parameter settings; improper settings may lead to excessive clipping or insufficient noise injection. The PEDPFL algorithm by Shen et al. [21] introduced regularization techniques to enhance model robustness, but the algorithm’s complexity may increase the computational burden during practical deployment.
In terms of improving communication efficiency, Liu et al. [22] developed the APFL algorithm, which assesses data contribution to model output via correlation propagation and injects noise based on these contributions, reducing the negative impact of noise on performance. However, this algorithm may increase computational complexity and extend training time when handling high-dimensional data. Wu et al. [23] improved the efficiency of client-side local gradient descent, but when there are significant differences in data distribution across clients, the stability of model training may be affected. The adaptive clipping framework proposed by Wang Fangwei et al. [24] improves model performance by dynamically adjusting the gradient clipping threshold. However, this method requires the precise tuning of clipping parameters in practical applications, or it may negatively impact convergence speed. Zhao et al. [25] combined self-sampling with an adaptive perturbation mechanism to effectively implement local differential privacy protection, but this increased communication costs. Additionally, Hu et al. [26] proposed the Fed-SMP algorithm, which enhances communication efficiency through model sparsification techniques. However, its impact is less significant in models with lower sparsity. The hierarchical federated learning method proposed by Lian et al. [27] reduces communication overhead by analyzing model correlations, but its generalizability is limited, and its applicability is restricted. Baek et al. [28] designed a robust differential privacy mechanism to address user dropout issues, but exhaustion of the privacy budget remains a potential risk in cases of large-scale user dropout.
Although these methods have made some progress in enhancing privacy protection and performance in federated learning, many limitations remain. This paper proposes the AWDP-FL framework, which combines adaptive clipping strategies with federated learning communication challenges to better address privacy protection issues. Its superiority over other algorithms is accurately analyzed through experiments, as detailed in Table 1.

3. Preliminary

In this section, we will introduce the background knowledge relevant to this study. First, Section 3.1 will provide a detailed introduction to the federated learning (FL) model, helping readers gain a comprehensive understanding of the entire FL process. Next, Section 3.2 will delve into the concept of differential privacy and its application in FL models, introducing some key definitions commonly used in practical applications.

3.1. Federated Learning

The development of artificial intelligence has become a major trend, and the demand for high-quality data is increasing. These data often hold significant value, leading organizations and companies to be cautious about data sharing, which exacerbates the phenomenon of data silos. To address this issue, federated learning technology has emerged, allowing multiple participants to collaboratively build models without directly sharing data. Based on the distribution of participants’ data, federated learning can be divided into three types: horizontal federated learning, vertical federated learning, and federated transfer learning [29]. Among these, horizontal federated learning is suitable for participants with different data samples but similar features, while vertical federated learning is applicable when the participants have similar data samples but contain different features. By sharing only model parameters and other information, and collaboratively building models with the assistance of a central server, this approach reduces the risk of data leakage while maintaining model performance comparable to centralized machine learning. Federated learning primarily consists of a central server and N clients, denoted as $H = \{H_1, H_2, \ldots, H_N\}$, where each client $H_i$ holds its dataset $M_i$. The data samples from all clients are represented as $M$, satisfying $\sum_{i=1}^{N} |M_i| = |M|$. The global model parameters are obtained by weighted aggregation across all clients.
$\theta = \frac{1}{N}\sum_{i=1}^{N} \theta_i$
The optimization problem in federated learning is $\min_{\theta} L(M, \theta)$, i.e., finding the model parameters that minimize the loss function. In this paper, the loss function used is the cross-entropy loss, which provides a finer-grained performance evaluation.
$\min_{\theta} L(M, \theta) = \frac{1}{N}\sum_{i=1}^{N} L(M_i, \theta_i)$
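To make the aggregation step above concrete, the following minimal PyTorch sketch averages client model parameters element-wise. It is a simple unweighted mean under the assumption that every client returns a state_dict with identical keys and shapes; the function name is illustrative and not taken from the paper's released code.

```python
import torch

def aggregate(client_params):
    """Element-wise mean of client model state_dicts (FedAvg-style aggregation).

    Assumes every client returns a state_dict with identical keys and shapes;
    the unweighted mean corresponds to the aggregation formula above.
    """
    global_params = {}
    for name in client_params[0]:
        stacked = torch.stack([p[name].float() for p in client_params])
        global_params[name] = stacked.mean(dim=0)
    return global_params
```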

3.2. Differential Privacy

The fundamental principle of differential privacy [30,31] is to add specific noise perturbations to the original data, ensuring that changes in a single sample do not significantly affect the distribution of the output results, protecting the sensitivity of individual data. By adding noise to datasets containing personal sensitive information, differential privacy effectively reduces the sensitivity of the data, transforming it into a more generalized dataset. Even if an attacker obtains certain data information, they cannot accurately identify the individual to whom the information belongs from the query results, protecting personal privacy. After adopting differential privacy techniques, data remain usable while ensuring privacy protection, effectively balancing the need for data protection and application.
In the application of differential privacy in federated learning, correctly setting the gradient clipping threshold is key to balancing model performance and privacy protection. If the clipping threshold is set too high, more noise may need to be added, which could reduce the model’s accuracy; if the threshold is set too low, too much information may be clipped, affecting the effectiveness of the gradients. Gradient clipping can be considered from two main aspects: first is value-based clipping, where all gradient values exceeding a fixed threshold are clipped to that threshold, precisely controlling the maximum values of all gradients in the model to ensure they do not exceed the set range. However, this method does not account for each gradient component, and when gradients are generally large, it may lead to the loss of critical gradient information. The second method is norm-based clipping, which adjusts the proportion of the entire gradient vector to meet the norm constraints, preserving the direction of the original gradient vector and helping retain more information about the optimization direction. This method is currently more mainstream, but there is still room for optimization in practical applications. This paper will explore this area in depth, aiming to further optimize gradient clipping methods to better balance privacy protection and model performance.
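As a concrete illustration of the two clipping styles discussed above, the following short PyTorch sketch contrasts value-based and norm-based clipping of a single gradient tensor; it is illustrative only and not the paper's implementation.

```python
import torch

def clip_by_value(grad: torch.Tensor, threshold: float) -> torch.Tensor:
    # Value-based clipping: every component is capped at +/- threshold,
    # which can distort the gradient direction when many entries are large.
    return grad.clamp(min=-threshold, max=threshold)

def clip_by_norm(grad: torch.Tensor, threshold: float) -> torch.Tensor:
    # Norm-based clipping: rescale the whole vector so its L2 norm does not
    # exceed the threshold, preserving the original update direction.
    scale = max(1.0, (grad.norm(p=2) / threshold).item())
    return grad / scale
```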
Definition 1 
(Adjacent Datasets). Given two datasets, $M$ and $M_1$, when the distance between $M$ and $M_1$ is 1, they are referred to as adjacent datasets:
$\| M - M_1 \|_2 = 1$
The distance is defined as the sum of the absolute differences for each pair of records between the two datasets. If a corresponding record is missing in one dataset during the calculation of differences, 0 is used in place of the missing value, indicating that the two datasets are very similar.
Definition 2 
($(\varepsilon, \delta)$-Differential Privacy). Given a random algorithm $R$, if for any output set $S$ the following inequality holds for the adjacent datasets $M$ and $M_1$, then algorithm $R$ is said to satisfy $(\varepsilon, \delta)$-differential privacy:
$\Pr[R(M) \in S] \le \Pr[R(M_1) \in S] \cdot e^{\varepsilon} + \delta$
Here, $\Pr[R(M) \in S]$ denotes the probability that algorithm $R$, when applied to dataset $M$, produces an output that falls within set $S$. The privacy budget $\varepsilon$ controls the ratio of output probabilities between adjacent datasets. The smaller the $\varepsilon$, the closer the output probabilities of the two datasets, which indicates stronger privacy protection, making it more difficult for an attacker to distinguish which dataset the data came from. The parameter $\delta$ defines the probability of violating $\varepsilon$-differential privacy, providing flexibility in privacy protection. When $\delta = 0$, the algorithm satisfies $\varepsilon$-differential privacy, representing a stricter privacy guarantee standard.
Definition 3 
(Sensitivity). The sensitivity of a function describes the maximum possible change in the function’s output when a data point is added or removed from the dataset, measuring the function’s responsiveness to changes in a single data item. For two datasets, M and M 1 , differing by only one element, the sensitivity is determined by calculating the maximum possible difference in the output of function R between these two datasets, specifically,
$\Delta s = \max_{M, M_1} \| R(M) - R(M_1) \|_2$
In the Gaussian mechanism, if the $L_2$ sensitivity of function $R$ is $\Delta s$ (measured using Euclidean distance), $(\varepsilon, \delta)$-differential privacy is achieved by adding Gaussian-distributed random noise with mean 0 and variance $\sigma^2$ to the result.
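A minimal sketch of the Gaussian mechanism follows, using the classical calibration $\sigma = \sqrt{2\ln(1.25/\delta)}\,\Delta s/\varepsilon$ (valid for $\varepsilon < 1$). This is a standard textbook calibration, not necessarily the exact noise scale used later in the paper.

```python
import math
import numpy as np

def gaussian_mechanism(value, l2_sensitivity, epsilon, delta):
    """Add Gaussian noise calibrated to the L2 sensitivity of a query.

    Uses the classic calibration sigma = sqrt(2 ln(1.25/delta)) * sensitivity / epsilon,
    which guarantees (epsilon, delta)-DP for epsilon < 1; tighter accountants
    exist for composed queries.
    """
    sigma = math.sqrt(2 * math.log(1.25 / delta)) * l2_sensitivity / epsilon
    return value + np.random.normal(loc=0.0, scale=sigma, size=np.shape(value))
```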
Definition 4 
(Composition Theorem). Suppose there is a set of algorithms $R_1(M), R_2(M), \ldots, R_n(M)$, each satisfying $(\varepsilon, \delta)$-differential privacy. When combined into a new algorithm $R = (R_1(M), R_2(M), \ldots, R_n(M))$, the composite algorithm still protects the privacy of individual data points: under sequential composition the budgets accumulate, so $R$ satisfies $(n\varepsilon, n\delta)$-differential privacy, while under parallel composition over disjoint subsets of the data the overall budget is the maximum of the individual budgets.
Definition 5 
(Post-processing Immunity [30]). If algorithm R satisfies ( ϵ , δ ) -differential privacy for the same dataset M, then even if another random algorithm A (which may not follow differential privacy principles) forms a new algorithm B = A ( R ( M ) ) by applying A to the output of R, the new algorithm B will still satisfy ( ϵ , δ ) -differential privacy.

4. Weighted Adaptive Differential Privacy Federated Learning Framework

To protect the privacy of client data in federated learning, this paper combines an adaptive weight clipping threshold selection strategy to design a novel weighted adaptive differential privacy federated learning framework (Algorithm 1). Unlike traditional federated learning frameworks, this framework introduces improvements in the way model gradients are handled with all client operations being performed from the neural network layer perspective. During each local training iteration, clients first calculate the threshold for adaptive gradient clipping using weight coefficients and then apply adaptive gradient clipping to limit the magnitude of the gradients, which is followed by adaptive gradient updates to the model, and they finally add dynamic Gaussian noise when uploading the parameters. This federated learning framework emphasizes an optimal balance between data privacy protection and model performance during communication, offering new perspectives and methods for privacy protection in federated learning. The communication process between the client and server is illustrated in Figure 1.
Algorithm 1 Weighted Adaptive Differential Privacy Federated Learning Framework
  • Input: Initial model parameters $\theta_t$, learning rate $\alpha$, client set $O$, randomly selected clients for training $H$, communication rounds between client and server $T$, number of neural network layers $L$, batch size $B$, client data $M$, adaptive clipping history norm sequence $P$, number of local iterations $E$, momentum update parameters $\tau_1^t$, $\tau_3^t$, $\rho$.
  • Output: Model parameters $\tilde{\theta}_t$
 1: for $t = 1$ to $T$ do
 2:     # Client Side
 3:     for $h = 1$ to $H$ do
 4:         Init $\theta_t \leftarrow \tilde{\theta}_t$  // Broadcast global parameters to update the local model
 5:         for $e = 1$ to $E$ do
 6:             for $B_{i,h} \in M_h$ do  // $i$ is an integer from 1 to $M/B$
 7:                 $G_{t,l} \leftarrow g_t(x_t)$
 8:                 $\tilde{G}_{t,l} \leftarrow \mathrm{WDP}(G_{t,l})$
 9:                 $\theta_t \leftarrow \mathrm{AGU}(\tilde{G}_{t,l})$
10:             end for
11:         end for
12:         $\tilde{\theta}_t^h \leftarrow \theta_t + N(0, \sigma^2)$  // Noise perturbation
13:     end for
14:     # Server Side
15:     $\tilde{\theta}_t \leftarrow \frac{1}{|H|}\sum_{h=1}^{H} \tilde{\theta}_t^h$  // Server-side parameter aggregation
16: end for
The relevant symbols and parameters involved in this paper are shown in Table 2.
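The following PyTorch-style sketch mirrors the client side of Algorithm 1. The callables wdp_clip and agu_update are hypothetical stand-ins for Algorithms 2 and 3, per-sample clipping is collapsed to per-batch handling for brevity, and a single scalar noise scale is used; none of this is the paper's exact implementation.

```python
import torch

def local_training_round(model, data_loader, epochs, lr, sigma, wdp_clip, agu_update):
    """Client side of Algorithm 1 (simplified sketch): E local epochs over batches,
    adaptive clipping (WDP), adaptive update (AGU), then Gaussian perturbation."""
    loss_fn = torch.nn.CrossEntropyLoss()
    for _ in range(epochs):                          # E local iterations
        for x, y in data_loader:                     # batches B of client data M
            model.zero_grad()
            loss_fn(model(x), y).backward()
            grads = {n: p.grad for n, p in model.named_parameters()}
            clipped = wdp_clip(grads)                # G~_{t,l} <- WDP(G_{t,l})
            agu_update(model, clipped, lr)           # theta_t <- AGU(G~_{t,l})
    # Noise perturbation before upload: theta~_t^h <- theta_t + N(0, sigma^2)
    return {n: p.detach() + sigma * torch.randn_like(p)
            for n, p in model.named_parameters()}
```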

4.1. Workflow of the Framework

The framework consists of four main steps:
  • Local Model Training: Clients must perform multiple gradient descents locally to reduce communication with the server. First, in each local iteration, the adaptive gradient clipping WDP (Algorithm 2), based on model weights, is used for gradient clipping, which enhances the model’s robustness. Next, the adaptive gradient update AGU (Algorithm 3) is employed to determine the result of the model training in this iteration. Notably, this algorithm does not need to consider the training results of other clients globally; clients operate independently without affecting each other. Detailed steps will be presented in the next section.
  • Parameter Upload: To protect client privacy, hierarchical noise perturbation must be added to the model parameters before uploading. The model parameters processed by the WDP and AGU algorithms can reduce the noise impact caused by individual samples. This stability helps to smooth the loss function descent, accelerating convergence while minimizing the effect of noise on model parameters. The noise addition steps are as follows (using a single client as an example):
    $\tilde{\theta}_t^h = \{\tilde{\theta}_{t,l_1}^h, \tilde{\theta}_{t,l_2}^h, \ldots, \tilde{\theta}_{t,l_n}^h\}$
    $= \theta_{t,l} - \alpha \cdot \dfrac{\hat{V}^1_{t,l,d,e}}{\rho + \sqrt[3]{\hat{V}^3_{t,l,d,e}}}\;(d = |M|,\ e = |E|) + N(0, \sigma_{l,t}^2)$
    $= \theta_{t,l} - \alpha \cdot \sum_{e=1}^{E} \dfrac{\hat{V}^1_{t,l,d,e}}{\rho + \sqrt[3]{\hat{V}^3_{t,l,d,e}}}\;(d \in M) + N(0, \sigma_{l,t}^2)$
    $= \theta_{t,l} - \alpha \cdot \sum_{e=1}^{E}\sum_{d=1}^{M/B} \dfrac{\hat{V}^1_{t,l,d,e}}{\rho + \sqrt[3]{\hat{V}^3_{t,l,d,e}}}\;(d = B) + N(0, \sigma_{l,t}^2)$
    $= \theta_{t,l} - \dfrac{\alpha}{|B|} \cdot \sum_{e=1}^{E}\sum_{d=1}^{M/B}\sum_{b=1}^{B} \dfrac{\hat{V}^1_{t,l,d,e}}{\rho + \sqrt[3]{\hat{V}^3_{t,l,d,e}}} + N(0, \sigma_{l,t}^2)$
    Here, $L$ represents the neural network layer level, which serves as the basis for all operations in this paper. $\tilde{\theta}_t^h$ denotes the noise-perturbed model parameters, $\theta_{t,l}$ are the unperturbed model parameters, $M$ represents the client data, $B$ is the training data for each batch, $t$ is the current global round, $d$ indexes the current training data, and $e$ is the current batch’s local iteration. $\hat{V}^1_{t,l,d,e} / (\rho + \sqrt[3]{\hat{V}^3_{t,l,d,e}})$ represents the model gradient of the $l$-th layer in the $t$-th round after the adaptive momentum update. $N(0, \sigma_{l,t}^2)$ is the noise perturbation added before uploading, which satisfies $N(0, (\Delta s_l^t\,\sigma)^2)$, where $\Delta s_l^t$ is the sensitivity determined by the adaptive clipping threshold. A minimal code sketch of this layer-wise noise addition is given after this list.
  • Global Parameter Aggregation: After the local training of sampled clients is completed, the server performs weighted aggregation of the noise-perturbed parameters uploaded by the clients to update the global model for the next round, specifically as follows:
    $\tilde{\theta}_t = \sum_{h=1}^{H} \tilde{\theta}_t^h$
  • Global Parameter Distribution: In the federated learning framework, the server employs a selective parameter broadcasting strategy. Specifically, the server does not broadcast the latest model parameters to all clients every time; instead, it randomly selects a portion of clients for parameter updates. This approach avoids redundant broadcasting and reduces unnecessary communication overhead. During this process, the server does not need direct access to the clients’ local data. After receiving the global model from the server, clients adaptively update their local models based on these parameters, maintaining the efficiency of model updates.
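As referenced in the Parameter Upload step above, the following is a minimal sketch of layer-wise Gaussian perturbation before upload. The dictionary of per-layer standard deviations, derived from each layer's sensitivity and a noise multiplier, is an assumed interface rather than the paper's exact implementation.

```python
import torch

def perturb_for_upload(state_dict, sigma_per_layer):
    """Add layer-wise Gaussian noise to model parameters before upload.

    sigma_per_layer maps each parameter name to the standard deviation derived
    from that layer's sensitivity (Delta_s times the noise multiplier).
    """
    noisy = {}
    for name, param in state_dict.items():
        noise = torch.normal(mean=0.0, std=sigma_per_layer[name], size=param.shape)
        noisy[name] = param + noise
    return noisy
```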
Client noise perturbation depends on sensitivity, sensitivity depends on the determination of the clipping threshold, and the clipping threshold depends on adaptive weight selection. Gradient clipping ensures that the model norm remains within a certain range. Specifically,
$L_{t,l} = \| g_t(x_t) \|_2 \le \mathrm{Clip}_{t,l}$
Thus, the sensitivity calculation is as shown in Lemma 1:
Lemma 1. 
$\tilde{\theta}_t^h(d=|M|)$ represents the model parameters of client $h$ after local training on a dataset $M$ of size $|M|$. Therefore, the sensitivity is
$\Delta s_l^t = \left\| \tilde{\theta}_t^h(d=|M|) - \tilde{\theta}_t^h(d=|M'|) \right\|_2 \le \dfrac{2\alpha\,\mathrm{Clip}_{t,l}\,E}{B}$
Proof. 
$\tilde{\theta}_t^h(d=|M|)$ represents the model parameters obtained after $E$ local iterations from $\theta_t^h(d=|M|)$. $M$ and $M'$ are adjacent datasets of client $h$, with their only difference being batch $B$ from $M$ and batch $B'$ from $M'$. Therefore, the standard deviation of the hierarchical Gaussian noise to be added before uploading the model parameters can be calculated as follows:
$\Delta s_l^t = \left\| \tilde{\theta}_t^h(d=|M|) - \tilde{\theta}_t^h(d=|M'|) \right\|_2$
$= \left\| \left(\theta_{t,l} - \dfrac{\alpha}{|B|}\sum_{e=1}^{E}\sum_{d=1}^{M/B}\sum_{b=1}^{B}\dfrac{\hat{V}^1_{t,l,d,e}}{\rho + \sqrt[3]{\hat{V}^3_{t,l,d,e}}}\right) - \left(\theta_{t,l} - \dfrac{\alpha}{|B|}\sum_{e=1}^{E}\sum_{d=1}^{M'/B}\sum_{b=1}^{B}\dfrac{\hat{V}^1_{t,l,d,e}}{\rho + \sqrt[3]{\hat{V}^3_{t,l,d,e}}}\right) \right\|_2$
$= \dfrac{\alpha}{|B|}\left\| \sum_{e=1}^{E}\left(\sum_{d=1}^{M/B}\sum_{b=1}^{B}\dfrac{\hat{V}^1_{t,l,d,e}}{\rho + \sqrt[3]{\hat{V}^3_{t,l,d,e}}} - \sum_{d=1}^{M'/B}\sum_{b=1}^{B}\dfrac{\hat{V}^1_{t,l,d,e}}{\rho + \sqrt[3]{\hat{V}^3_{t,l,d,e}}}\right) \right\|_2$
$= \dfrac{\alpha}{|B|}\left\| \sum_{e=1}^{E}\left(\dfrac{\hat{V}^1_{t,l,d,e}}{\rho + \sqrt[3]{\hat{V}^3_{t,l,d,e}}}(d=|M|) - \dfrac{\hat{V}^1_{t,l,d,e}}{\rho + \sqrt[3]{\hat{V}^3_{t,l,d,e}}}(d=|M'|)\right) \right\|_2$
$\le \dfrac{\alpha}{|B|}\left( \left\| \sum_{e=1}^{E}\dfrac{\hat{V}^1_{t,l,d,e}}{\rho + \sqrt[3]{\hat{V}^3_{t,l,d,e}}}(d=|M|) \right\|_2 + \left\| \sum_{e=1}^{E}\dfrac{\hat{V}^1_{t,l,d,e}}{\rho + \sqrt[3]{\hat{V}^3_{t,l,d,e}}}(d=|M'|) \right\|_2 \right)$
$\le \dfrac{2\alpha\,\mathrm{Clip}_{t,l}\,E}{B}$
   □
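For a rough sense of the noise scale implied by Lemma 1, the snippet below evaluates the sensitivity bound $2\alpha\,\mathrm{Clip}_{t,l}E/B$ for illustrative, assumed parameter values and the resulting layer-wise noise standard deviation; these are not the paper's reported settings.

```python
# Illustrative values only (assumed, not the paper's reported settings)
alpha, clip, E, B = 0.01, 1.0, 3, 64     # learning rate, threshold, local epochs, batch size
noise_multiplier = 1.1                   # assumed sigma

delta_s = 2 * alpha * clip * E / B       # sensitivity bound 2*alpha*Clip*E/B ~= 0.00094
layer_std = delta_s * noise_multiplier   # std of the layer-wise Gaussian noise
print(f"sensitivity bound: {delta_s:.5f}, noise std: {layer_std:.5f}")
```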
Algorithm 2 Weight-Based Adaptive Gradient Clipping (WDP)
  • Input: Unupdated model parameters after training $\theta_t$, number of neural network layers $L$, communication rounds between client and server $T$, batch size $B$, client data $M$, adaptive clipping history norm sequence $P$.
  • Output: Processed model parameters $\tilde{\theta}_t$.
 1: $\bar{G}_{t,l} = \frac{1}{|B|}\sum_{t=1}^{B} G_{t,l}$
 2: $\bar{GB}_{t,l} = \frac{1}{|m|}\sum_{m=1}^{n} \bar{GB}_{m,l}$ $(m \ge 2)$
 3: # Compute layer-specific weight coefficients
 4: $GW_{t,l} = \cos(\bar{G}_{t,l}, \bar{GB}_{t,l}) = \dfrac{\bar{G}_{t,l} \cdot \bar{GB}_{t,l}}{\|\bar{G}_{t,l}\|\,\|\bar{GB}_{t,l}\|}$
 5: # Compute weighted norm
 6: $\mathrm{realL}_{t,l} = \left(\max_{0 \le t \le B,\,0 \le l \le L} L_{t,l} - \frac{1}{|B|}\sum_{t=1}^{B} L_{t,l}\right)/2 \cdot GW_{t,l} + \frac{1}{|B|}\sum_{t=1}^{B} L_{t,l}$
 7: # Compute clipping threshold
 8: $\mathrm{Clip}_{t,l} = \{LL\}_P = \{\mathrm{realL}_{1,l}, \mathrm{realL}_{2,l}, \mathrm{realL}_{3,l}, \ldots, \mathrm{realL}_{M,l}\}_P$
 9: # Layer-specific gradient clipping
10: $\tilde{G}_{t,l} = \frac{1}{|B|}\sum_{t=1}^{B} G_{t,l} / \max(1,\ L_{t,l}/\mathrm{Clip}_{t,l})$
Algorithm 3 Adaptive Gradient Update (AGU)
  • Input: Adaptively clipped model gradients $\tilde{G}_{t,l}$, learning rate $\alpha$, communication rounds between client and server $T$, number of neural network layers $L$, momentum update parameters $\tau_1^t$, $\tau_3^t$, $\rho$.
  • Output: Model parameters $\tilde{\theta}_t$.
 1: # Gradient Update
 2: $V_t^1 \leftarrow \tau_1^t \cdot V_{t-1}^1 + (1 - \tau_1^t)\cdot \tilde{G}_{t,l}$ // Moment estimation
 3: $V_t^3 \leftarrow \tau_3^t \cdot V_{t-1}^3 + (1 - \tau_3^t)\cdot \tilde{G}_{t,l}^3$
 4: $\hat{V}_t^1 \leftarrow V_t^1 / (1 - \tau_1^t)$ // Corrected moment estimation
 5: $\hat{V}_t^3 \leftarrow V_t^3 / (1 - \tau_3^t)$
 6: $\theta_t \leftarrow \theta_{t-1} - \alpha \cdot \hat{V}_t^1 / \left(\rho + \sqrt[3]{\hat{V}_t^3}\right)$ // Adaptive gradient update

4.2. Weight-Based Adaptive Gradient Clipping and Update

In traditional methods, the gradient clipping threshold is usually determined based on the researchers’ experience and pretraining on public datasets. This fixed threshold approach aims to ensure model convergence and performance but fails to account for differences between various layers of the neural network, which can negatively impact model performance in certain cases. With technological advancements, a new approach has been developed that dynamically adjusts the clipping threshold using the L 2 norm percentile of historical gradients, allowing for more flexible responses to gradient changes. However, despite this method’s increased flexibility in threshold adjustment, it may not effectively address situations where model training performs poorly, leading to wasted computational resources during the update process and a lack of personalized settings for different network layers. To address this issue, this paper proposes a weight-based adaptive gradient clipping and update algorithm with the detailed implementation steps as follows:
A. Layer-wise Adaptive Weight Coefficients
Calculate the gradient and $L_2$ norm for each neural network layer during each iteration, where $l \in \{1, 2, 3, \ldots, L\}$, and $L$ is the number of neural network layers.
$G_{t,l} = g_t(x_t)$
$L_{t,l} = \| g_t(x_t) \|_2$
Compute the mean of all gradients for each layer during each iteration as the preprocessing weight parameter τ 1 .
$\bar{G}_{t,l} = \frac{1}{|B|}\sum_{t=1}^{B} G_{t,l}$
Add this value layer-wise to a sequence, generating a new weight preprocessing historical gradient sequence G B .
$GB = \{\bar{GB}_{1,l}, \bar{GB}_{2,l}, \bar{GB}_{3,l}, \ldots, \bar{GB}_{m,l}\}$
Calculate the mean of G B layer-wise as the preprocessing weight parameter τ 2 .
$\bar{GB}_{t,l} = \frac{1}{|m|}\sum_{m=1}^{n} \bar{GB}_{m,l} \quad (m \ge 2)$
Use the local weight parameter $\tau_1$ and the global weight parameter $\tau_2$ to calculate the weight coefficient $GW_{t,l}$ layer-wise according to the following formula.
$GW_{t,l} = \cos(\bar{G}_{t,l}, \bar{GB}_{t,l}) = \dfrac{\bar{G}_{t,l} \cdot \bar{GB}_{t,l}}{\|\bar{G}_{t,l}\|\,\|\bar{GB}_{t,l}\|}$
B. Adaptive Clipping Threshold Selection
In this framework, a high cosine similarity between two updates indicates that the direction of gradient updates remains consistent, suggesting a stable training process. Conversely, low or negative similarity may indicate issues such as gradient explosion or vanishing gradients. Here, B represents the batch size, and M is the number of training data points in the client.
Weight the norm of each iteration according to the layer-wise weight coefficient $GW_{t,l}$.
$\mathrm{realL}_{t,l} = \left(\max_{0 \le t \le B,\,0 \le l \le L} L_{t,l} - \frac{1}{|B|}\sum_{t=1}^{B} L_{t,l}\right)/2 \cdot GW_{t,l} + \frac{1}{|B|}\sum_{t=1}^{B} L_{t,l}$
Add the weighted gradient norms of this batch to a sequence layer-wise, resulting in the threshold processing historical norm list L L .
$LL = \{\mathrm{realL}_{1,l}, \mathrm{realL}_{2,l}, \mathrm{realL}_{3,l}, \ldots, \mathrm{realL}_{M,l}\}$
Calculate the percentile P of the L L sequence layer-wise as the clipping threshold for this iteration.
$\mathrm{Clip}_{t,l} = \{LL\}_P = \{\mathrm{realL}_{1,l}, \mathrm{realL}_{2,l}, \mathrm{realL}_{3,l}, \ldots, \mathrm{realL}_{M,l}\}_P$
Perform layer-wise gradient clipping on each data point in this iteration.
$\mathrm{realG}_{t,l} = \dfrac{G_{t,l}}{\max(1,\ L_{t,l}/\mathrm{Clip}_{t,l})}$
Take the mean layer-wise as the gradient for this batch iteration, preparing for the gradient update.
$\tilde{G}_{t,l} = \frac{1}{|B|}\sum_{t=1}^{B} \mathrm{realG}_{t,l}$
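Putting steps A and B together, the following sketch applies weight-based adaptive clipping to the per-sample gradients of one layer. The tensor shapes, the running history lists, and the use of torch.quantile for the percentile are implementation assumptions, not the paper's released code.

```python
import torch

def wdp_clip_layer(per_sample_grads, history_means, history_norms, percentile=0.5):
    """Weight-based adaptive clipping for one layer (sketch of Algorithm 2).

    per_sample_grads: tensor of shape (B, ...) with one gradient per sample.
    history_means / history_norms: running per-layer sequences (GB and LL).
    percentile: fraction in [0, 1] used as the clipping percentile P.
    """
    B = per_sample_grads.shape[0]
    flat = per_sample_grads.reshape(B, -1)
    norms = flat.norm(dim=1)                              # per-sample L2 norms
    batch_mean = flat.mean(dim=0)                         # current mean gradient
    history_means.append(batch_mean)
    hist_mean = torch.stack(history_means).mean(dim=0)    # mean of history sequence GB

    # Cosine-similarity weight between the current mean gradient and its history
    gw = torch.nn.functional.cosine_similarity(batch_mean, hist_mean, dim=0)

    # Weighted norm: (max - mean) / 2 * GW + mean
    real_l = (norms.max() - norms.mean()) / 2 * gw + norms.mean()
    history_norms.append(real_l.item())

    # Clipping threshold: percentile of the weighted-norm history LL
    clip = torch.quantile(torch.tensor(history_norms), percentile)

    # Per-sample norm clipping, then average over the batch
    scale = torch.clamp(norms / clip, min=1.0)
    clipped = flat / scale.unsqueeze(1)
    return clipped.mean(dim=0).reshape(per_sample_grads.shape[1:])
```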
C. Adaptive Momentum Gradient Updates
Initialize the first-order moment $V_t^1$ and third-order moment $V_t^3$ based on historical gradients, and define momentum decay parameters $\tau_1^t$, $\tau_3^t$, and hyperparameter $\rho$. The third-order moment measures the symmetry of the data distribution; if the data distribution is symmetric, the skewness is zero. If the distribution is skewed to the left or right, the skewness is non-zero. Using the first-order and third-order moments based on historical gradients allows the parameter update to accumulate past gradient directions, helping to escape local minima or saddle points. Momentum aids the algorithm in accelerating toward the optimal solution and reduces oscillations in directions with smaller gradients, speeding up convergence.
Calculate the first-order moment estimate:
$V_t^1 = \tau_1^t \cdot V_{t-1}^1 + (1 - \tau_1^t)\cdot \tilde{G}_{t,l}$
Calculate the third-order moment estimate:
$V_t^3 = \tau_3^t \cdot V_{t-1}^3 + (1 - \tau_3^t)\cdot (\tilde{G}_{t,l})^3$
Based on the initial gradient estimates, which often have large variance and bias, the correction mechanism adjusts the momentum estimates to more accurately reflect the true gradient information.
Correct the first-order moment:
$\hat{V}_t^1 = \dfrac{V_t^1}{1 - \tau_1^t}$
Correct the third-order moment:
$\hat{V}_t^3 = \dfrac{V_t^3}{1 - \tau_3^t}$
Update the model parameters:
$\theta_t = \theta_{t-1} - \alpha \cdot \dfrac{\hat{V}_t^1}{\rho + \sqrt[3]{\hat{V}_t^3}}$
The third-order moment historical gradient processing provides higher-dimensional data analysis capabilities, dynamically refining gradient information. This enables a better utilization of model gradients after weighted adaptive gradient clipping, and adaptive momentum updates produce more complex model parameters, enhancing model performance. For parameter selection, we recommend setting $\tau_1^t$ to 0.9, as it controls the first-order momentum update, allowing a quick response to changes in gradient direction, making it suitable for capturing newer information. Set $\tau_3^t$ to 0.99, as the calculation involves dual historical gradient sequences with more dimensional information, requiring finer distinctions. We recommend selecting a hyperparameter $\rho$ no higher than 0.2, as it controls the adjustment of certain weights or update steps, which should be based on the complexity of the specific dataset.
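A compact sketch of the adaptive third-moment update is given below. The Adam-style bias correction (interpreting $\tau^t$ as the $t$-th power) and the signed cube root used to handle negative third moments are implementation assumptions, not details stated in the paper.

```python
import torch

class AGU:
    """Adaptive third-moment momentum update (sketch of Algorithm 3)."""
    def __init__(self, lr=0.01, tau1=0.9, tau3=0.99, rho=0.1):
        self.lr, self.tau1, self.tau3, self.rho = lr, tau1, tau3, rho
        self.v1, self.v3, self.t = {}, {}, 0

    @torch.no_grad()
    def step(self, params, grads):
        self.t += 1
        for name, g in grads.items():
            v1 = self.tau1 * self.v1.get(name, torch.zeros_like(g)) + (1 - self.tau1) * g
            v3 = self.tau3 * self.v3.get(name, torch.zeros_like(g)) + (1 - self.tau3) * g ** 3
            self.v1[name], self.v3[name] = v1, v3
            v1_hat = v1 / (1 - self.tau1 ** self.t)      # bias-corrected first moment
            v3_hat = v3 / (1 - self.tau3 ** self.t)      # bias-corrected third moment
            cbrt = v3_hat.sign() * v3_hat.abs().pow(1.0 / 3.0)  # signed cube root
            params[name] -= self.lr * v1_hat / (self.rho + cbrt)
```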

4.3. Privacy and Security Analysis

  • Privacy Analysis
Client Parameter Upload Phase: In the $t$-th round of local training, each participating client $h$ adds Gaussian noise (with a mean of 0 and a standard deviation of $\sigma$) to the model parameters $\theta_t^h$ obtained by minimizing the loss function, where $\theta_t^h = \arg\min L(M_h, \theta_t)$ and $\theta_t$ is the global model parameter in the $t$-th round. The standard deviation $\sigma$ is calculated from the sensitivity bound of Lemma 1, $\Delta s_l^t \le \frac{2\alpha\,\mathrm{Clip}_{t,l}E}{B}$, where $\Delta s_l^t$ represents the sensitivity of client $h$ in the $l$-th layer during the $t$-th round. In terms of privacy analysis, the client uploads the noise-added model parameters $\tilde{\theta}_t^h$ to the server, and the client satisfies $(\varepsilon, \delta)$-differential privacy.
Server Parameter Distribution Phase: In each round, the server performs weighted aggregation of the noise-added model parameters from the participating clients and updates the global model. According to Differential Privacy Definition 5 (Post-processing Immunity [30]), the server’s parameter distribution process in each round satisfies $(\varepsilon, \delta)$-differential privacy. Over $T$ rounds of communication between the clients and the server, each client’s dataset is accessed $T$ times in sequence, so the per-client budget accumulates under sequential composition, while the disjoint datasets of different clients compose in parallel. According to Differential Privacy Definition 4 (Composition Theorem), the overall privacy budget equals the maximum of the accumulated client privacy budgets, i.e., $T\varepsilon$. The algorithm satisfies $(T\varepsilon, T\delta)$-differential privacy.
  • Security Analysis
Our algorithm can effectively resist Membership Inference Attacks (MIAs) and Deep Leakage from Gradients (DLG) attacks. This is because the AWDP-FL framework uses a batch-by-batch training method and clips the gradients of each sample. For poorly performing samples, the algorithm can promptly remove their influence on gradient updates, while the global model parameters still retain the information from well-performing samples.
The goal of a membership inference attack is to infer whether a sample was involved in model training. Since our method removes the influence of poorly performing samples during gradient clipping, if an attacker attempts to identify these samples through membership inference, the results are likely to be misleading because these samples contribute little or nothing to the model weights. In other words, unlike traditional batch averaging methods, our algorithm can exclude samples detrimental to training, reducing the accuracy of membership inference attacks. This sample selection and clipping mechanism not only enhances the robustness of the algorithm but also greatly reduces the success rate of MIA attacks, further improving privacy protection and security defenses.
DLG attacks attempt to recover the original training data by reverse engineering the gradient information uploaded by clients. Our algorithm clips and perturbs gradients during training, so the gradients uploaded to the server no longer directly reflect the original data, weakening the attacker’s ability to infer the original data through DLG attacks.
Regarding DLG attacks, we designed specific defense measures in the experimental Section 5.2.3 and experimentally validated the security of AWDP-FL in countering DLG attacks. Both theoretical analysis and experiments demonstrate the robustness and security of our algorithm against various attack methods.

4.4. Algorithm Complexity Analysis

This section analyzes the time complexity of the AWDP-FL framework. The framework consists of four processes, with $T$ communication rounds and $|O|\cdot q$ clients participating in the training. During each client–server communication, the time complexity on the AWDP-FL client side is $O(\log(|O|\cdot q))$, so the total communication time complexity of AWDP-FL is $O(T\log(|O|\cdot q))$.
It is worth noting that the framework recommends using the parameter settings from Section 5.2, such as the values for q and E (see Table 2 for symbol meanings). These settings can effectively reduce communication overhead while maintaining high model accuracy with fewer communication rounds. However, this configuration will significantly increase the client’s training time. As the depth of the neural network model increases, the memory and GPU memory requirements will also rise significantly.

5. Experiments

5.1. Experiment Settings

  • Datasets: Three public datasets are used
-
MNIST Dataset: A single-channel handwritten digit recognition dataset containing images of the digits 0 to 9, with each image being a 28 × 28 pixel grayscale image. The dataset includes 60,000 training images and 10,000 test images.
-
Fashion-MNIST: A single-channel fashion classification dataset containing 70,000 grayscale images of 10 clothing categories, with each image being 28 × 28 pixels. It uses the same training/test split as MNIST but poses a more challenging classification task.
-
CIFAR-10 Dataset: A three-channel color image dataset with 10 classes, such as ships, airplanes, and cars, with each image being 32 × 32 pixels. The dataset includes 50,000 training images and 10,000 test images.
  • Models
The network models for the MNIST and Fashion-MNIST datasets consist of one convolutional layer and one fully connected layer with 256 output neurons, together with a ReLU activation layer to enhance the model’s nonlinear expression capability (a hedged sketch of one possible instantiation is given after this list). For the CIFAR-10 dataset, two network architectures were used. The first model includes two convolutional layers, each followed by a ReLU activation layer, to enhance feature extraction and reduce data dimensionality; two fully connected layers then transform the features, converting 384 input features into 192 output features to further extract and combine useful information. The second model uses four convolutional layers, with a dropout layer and ReLU activation function added after every two convolutional layers; a MaxPool layer then halves the spatial dimensions of the feature map, retaining the most significant features. Finally, the model completes the classification task through three fully connected layers.
  • Experimental Setup
The experiments were conducted using the Pytorch framework on an NVIDIA GeForce RTX 4090 server with a CPU configuration of 16 vCPUs (Intel(R) Xeon(R) Gold 6430). The results were averaged over multiple tests.
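As noted in the Models item above, the following is one plausible instantiation of the MNIST/Fashion-MNIST network. The channel count, kernel size, and the final 10-way classifier are assumptions beyond what the text specifies.

```python
import torch.nn as nn
import torch.nn.functional as F

class SmallCNN(nn.Module):
    """One plausible reading of the MNIST / Fashion-MNIST model described above.

    The channel count (16), kernel size (3), and the final 10-way classifier
    are assumptions; the text specifies only one conv layer, ReLU, and a fully
    connected layer with 256 output neurons.
    """
    def __init__(self, num_classes=10):
        super().__init__()
        self.conv = nn.Conv2d(1, 16, kernel_size=3, padding=1)   # 28x28 -> 28x28
        self.fc1 = nn.Linear(16 * 28 * 28, 256)
        self.head = nn.Linear(256, num_classes)

    def forward(self, x):
        x = F.relu(self.conv(x)).flatten(1)
        return self.head(F.relu(self.fc1(x)))
```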

5.2. Experimental Results and Analysis

The experimental analysis in this section is divided into two parts, each employing different comparison methods. Below is a detailed description of both parts.
Part One: We compare the AWDP-FL algorithm with the base federated learning algorithm without differential privacy (NoDP-FL) [29] and three advanced differential privacy federated learning algorithms across three different datasets: (1) Client-level Differential Privacy Federated Learning (CDP-FL) [22]; (2) Fadam Differential Privacy Federated Learning (FDP-FL), which is based on adaptive learning rates [23]; and (3) Adaptive Differential Privacy Federated Learning (ADP-FL), which uses adaptive clipping thresholds [24]. To fully reflect the robustness of each algorithm, the evaluation metrics for this part of the experiment include the following: Section 5.2.1, which assesses model accuracy and loss values across the three datasets; Section 5.2.2, which evaluates algorithm usability based on privacy budget consumption, i.e., the privacy budget required to achieve the same model accuracy; and Section 5.2.3, which assesses the privacy protection capabilities of different methods on the three datasets using the traditional DLG attack.
Part Two: We explore the optimal parameter configurations for the AWDP-FL algorithm in practical applications and compare its performance with the base federated learning algorithm without differential privacy (NoDP-FL). Section 5.2.4 analyzes the impact of local iteration counts on model accuracy and communication costs; Section 5.2.5 discusses the effect of sampling rate on model accuracy; and Section 5.2.6 focuses on the adaptive hierarchical clipping strategy within the AWDP-FL algorithm, particularly its effectiveness in deeper neural networks. Each section provides a detailed description of the relevant experimental parameter settings.

5.2.1. Analysis of Algorithm Model Accuracy and Loss Values

Model Accuracy Analysis: Model accuracy reflects the model’s classification ability on the test dataset, that is, the proportion of samples correctly classified by the model. Higher accuracy indicates good classification performance on the test set, providing an intuitive representation of the model’s overall performance. Regarding privacy budget selection, we did not use a single fixed value but chose an arithmetic sequence. For example, in the MNIST dataset, the privacy budget sequence was set to 0.4, 0.6, and 0.8. The purpose of this sequence is to clearly demonstrate the robustness of different algorithms. If the model accuracy fluctuates little within this privacy budget range, it indicates that the algorithm still performs stably under a lower privacy budget. Conversely, if there are large fluctuations, it suggests that the algorithm has reached the maximum privacy budget it can accommodate. By using a privacy budget sequence, we can more clearly demonstrate the performance of each algorithm under different conditions, allowing us to determine the critical privacy budget for the dataset. In this experiment, we selected 50 clients, and model accuracy is expressed as a percentage. The parameters for all algorithms were kept consistent: the number of local iterations E was set to 3, batch size B was 64, and the number of global communication rounds T was 50 for the MNIST dataset, while it was 100 for the Fashion-MNIST and CIFAR-10 datasets. Next, we will analyze the experimental results for each dataset in detail.
In the MNIST dataset, the number of communication rounds was set to 50, since fewer communication rounds help reduce the consumption of the privacy budget. The privacy budget was selected within the range of 0.4 to 0.8, where a smaller budget provides a higher level of privacy protection. As shown in Table 3, the NoDP-FL algorithm, which is unaffected by the privacy budget, achieved a model accuracy of 89.84%, while the AWDP-FL algorithm achieved 87.59% and 88.10% at privacy budgets of 0.4 and 0.8, respectively. This indicates that although a smaller privacy budget increases the perturbation, the accuracy drop for AWDP-FL is minimal, with the gap from NoDP-FL remaining within 1.74% at ε = 0.8. Under the same conditions, the highest accuracy of FDP-FL was 84.17%, so AWDP-FL outperformed the other algorithms by approximately 4%, demonstrating a good balance between privacy protection and model performance.
Figure 2 shows the convergence trends of each algorithm on the MNIST dataset. It can be observed that the NoDP-FL algorithm converges rapidly within 10 to 20 rounds with its fast convergence attributed to the absence of noise perturbation, thus providing limited privacy protection. The AWDP-FL algorithm gradually converges after 20 rounds, showing a smooth curve, while other algorithms require more than 40 rounds to reach a convergence trend. In federated learning, the smoothness of the model accuracy curve over consecutive rounds indicates the stability and robustness of the model during the training process. This demonstrates that the AWDP-FL algorithm, while introducing privacy protection, is still able to maintain efficient model convergence with minimal accuracy fluctuations, reflecting its good convergence performance and robustness.
In the Fashion-MNIST dataset, the number of communication rounds was set to 100, and the privacy budgets were 0.4, 0.6, and 0.8, respectively. As shown in Table 4, the NoDP-FL algorithm quickly converged within fewer rounds with a model accuracy of 84.53 % . As the privacy budget increases, AWDP-FL gradually approaches the accuracy of NoDP-FL. When the privacy budget is 0.8, AWDP-FL achieves an accuracy of 83.22 % , which is only 1.31 % lower than NoDP-FL. In contrast, the highest accuracy for FDP-FL, ADP-FL, and CDP-FL is 79.52 % , 79.45 % , and 77.58 % , respectively. AWDP-FL outperformed the other algorithms by approximately 5.64 % in terms of model accuracy, proving its ability to maintain excellent classification performance even under a high privacy budget.
Figure 3 shows the convergence curves of each algorithm on the Fashion-MNIST dataset. It can be observed that AWDP-FL’s convergence speed significantly outperforms other algorithms as the number of communication rounds increases, achieving convergence within 70 to 80 rounds. In contrast, the other algorithms only begin to stabilize after 90 rounds. The smoothness of the AWDP-FL curve indicates that as the communication rounds increase, the algorithm is able to quickly and stably converge while maintaining strong privacy protection, thus reducing communication time and resource consumption.
In the CIFAR-10 dataset, the number of communication rounds was also set to 100, and the privacy budgets were chosen as 4, 5, and 6. Since the CIFAR-10 dataset contains three-channel color images, the training process is more complex. As shown in Table 5, the AWDP-FL algorithm achieved a model accuracy of 74.42 % when the privacy budget was set to 6, significantly outperforming other algorithms. In comparison, NoDP-FL achieved a model accuracy of 76.99 % under the same privacy budget, while FDP-FL’s highest accuracy at ϵ = 6 was 61.87 % . AWDP-FL not only maintained a high accuracy under privacy protection but also significantly outperformed other differential privacy algorithms, particularly surpassing FDP-FL by 12.55 % . The performance of AWDP-FL surpassed other algorithms across all privacy budgets, demonstrating its ability to balance privacy budget and model performance effectively.
Figure 4 shows the convergence trends of each algorithm on the CIFAR-10 dataset with 100 communication rounds. At a privacy budget of 6, the AWDP-FL algorithm exhibits higher convergence during the first 20 training rounds, slightly outperforming NoDP-FL in accuracy between rounds 5 and 20. The rapid convergence of AWDP-FL is especially apparent in the early stages of training, indicating that the algorithm can quickly optimize model performance in complex tasks. As the number of communication rounds increases, the accuracy gap between AWDP-FL and NoDP-FL gradually narrows with a final difference of 2.57 % . In contrast, other differential privacy algorithms such as CDP-FL, ADP-FL, and FDP-FL have noticeably slower convergence speeds and much lower final accuracy compared to AWDP-FL. The smooth curve and significant accuracy improvement of AWDP-FL indicate its strong robustness and fast convergence ability in federated learning. A stable convergence curve means that the algorithm can maintain performance stability even when faced with privacy protection requirements, reducing fluctuations during training. This further demonstrates the algorithm’s superiority on complex datasets like CIFAR-10.
Model Loss Value Analysis In many practical applications, accuracy is often the preferred performance metric. However, in datasets with class imbalance, accuracy can lead to misleading conclusions. For example, in a dataset where 90 % of the samples belong to class A and 10 % belong to class B, a model that always predicts class A would still have 90 % accuracy, but clearly, it is not a good model. Accuracy only considers whether the prediction is correct without accounting for the confidence of the prediction. A loss function is used to measure the difference between the model’s predictions and the actual values; the smaller the loss value, the closer the predictions are to the true values. The loss function used in this study is cross-entropy loss, which provides a more granular performance evaluation by not only focusing on whether the prediction is correct but also considering the degree of deviation in the prediction. Based on this, we compared the loss function values of various algorithms under different privacy budgets, using the same parameters and configurations as in the previous section. Next, we will analyze the experimental results for each dataset.
In the MNIST dataset, the number of communication rounds was set to 50. As shown in Table 6, the NoDP-FL algorithm, without noise perturbation, achieves a small loss value, ultimately converging to 0.3601. This is because NoDP-FL is not affected by privacy perturbation, allowing the optimizer to quickly find the optimal solution. The AWDP-FL algorithm, after applying adaptive clipping and noise perturbation, has a loss value of 0.4323, which is slightly higher than NoDP-FL but still significantly better than the other differential privacy algorithms. The loss values for FDP-FL, ADP-FL, and CDP-FL are 0.5644, 0.6686, and 0.7994, respectively. Compared to CDP-FL, AWDP-FL reduces the loss by 0.3671, demonstrating stronger convergence and robustness, and it maintains good performance even under privacy protection.
Figure 5 shows the convergence trends of the loss values for each algorithm on the MNIST dataset. It can be observed that the NoDP-FL algorithm converges rapidly within 20 rounds, and it is unaffected by noise perturbation. The AWDP-FL algorithm converges quickly in the early stages, finally completing convergence between 40 and 50 rounds. Its curve is smooth, demonstrating strong robustness. In federated learning, the smoothness of the loss value curve indicates that the model remains stable through multiple iterations. This characteristic of AWDP-FL shows that it can maintain stability and efficiency in model training while preserving privacy.
In the Fashion-MNIST dataset, the loss values of the various algorithms are relatively close, and they all converge quickly. As shown in Table 7, the loss value of the NoDP-FL algorithm is 0.4395, while the loss value of the AWDP-FL algorithm gradually decreases as the privacy budget increases, eventually reaching 0.5013 at ε = 0.8. In comparison, the loss values for FDP-FL, ADP-FL, and CDP-FL are 0.5959, 0.5976, and 0.6540, respectively. AWDP-FL’s loss value is 0.1527 lower than that of CDP-FL, demonstrating that the AWDP-FL algorithm can maintain a lower loss value under a higher privacy budget while also exhibiting strong robustness and stability.
Figure 6 shows the convergence of each algorithm on the Fashion-MNIST dataset within 50 to 80 rounds. With a privacy budget of 0.8, AWDP-FL rapidly converges between 50 and 60 rounds, while other algorithms fail to converge within this range. The loss curve of AWDP-FL is relatively smooth, indicating that the algorithm can reach the optimum point more quickly under privacy protection with minimal fluctuations in loss value. This further validates the efficiency and robustness of the AWDP-FL algorithm.
Due to the higher complexity of the CIFAR-10 dataset, the distribution of loss function values is more sparse. As shown in Table 8, the loss value of NoDP-FL is 0.6754, while AWDP-FL’s loss value is 0.7916 at a privacy budget of 6. Although it is higher than NoDP-FL, it shows a lower loss value compared to other differential privacy algorithms. The loss values of FDP-FL and ADP-FL are 1.1844 and 0.9643, respectively. AWDP-FL consistently maintains a lower loss value, and as the privacy budget increases, its performance remains stable. This demonstrates that AWDP-FL has good robustness and adaptability on more complex datasets.
Figure 7 shows the changes in loss values for each algorithm over the communication rounds in the CIFAR-10 dataset. AWDP-FL demonstrates lower loss values during the first 20 rounds of training, highlighting its efficiency and stability in complex tasks. Although the loss value of AWDP-FL is slightly higher than that of NoDP-FL as noise perturbation increases, its curve remains smooth and lower than that of other differential privacy algorithms. The final loss value difference between AWDP-FL and NoDP-FL is 0.1162, while the gap with other algorithms reaches up to 0.6032. This indicates that AWDP-FL has better adaptability than other algorithms on complex datasets.

5.2.2. Usability Analysis of the Algorithm

In this experimental section, we conducted a comparative analysis of the performance of the AWDP-FL algorithm and other differential privacy federated learning algorithms on three datasets. Fifty clients were selected, and model accuracy is expressed as a percentage. The parameters used for all algorithms were kept consistent: the number of local iterations was set to 3, the batch size to 64, and the global communication rounds for the three datasets to 50, as shown by the experimental results in Table 9.
On the MNIST dataset, the AWDP-FL algorithm achieved a notable performance with an accuracy of 90.75 % and a privacy budget of only ϵ = 0.4. Compared to other algorithms, CDP-FL and ADP-FL had accuracies of 90.14 % and 90.12 % , respectively, but their privacy budgets were ϵ = 9 and ϵ = 3, which are much higher than that of AWDP-FL. Additionally, FDP-FL had an accuracy of 88.92 % with a privacy budget of ϵ = 4, which was inferior to AWDP-FL in both accuracy and privacy protection. This indicates that AWDP-FL can provide higher model accuracy with a smaller privacy budget, demonstrating a better privacy–utility trade-off.
On the Fashion-MNIST dataset, AWDP-FL achieved an accuracy of 83.52 % with a privacy budget of only ϵ = 0.4, still outperforming other algorithms. The accuracies of CDP-FL and ADP-FL were 82.66 % and 82.41 % , respectively, but their privacy budgets were ϵ = 9 and ϵ = 2. FDP-FL achieved an accuracy of 82.72 % on this dataset with a privacy budget of ϵ = 4. Similarly, AWDP-FL significantly reduced the privacy budget while ensuring model accuracy, demonstrating its robustness on the dataset.
On the CIFAR-10 dataset, AWDP-FL again showed its superiority with an accuracy of 72.89 % and a privacy budget of ϵ = 4. In comparison, CDP-FL achieved an accuracy of 65.56 % with a privacy budget of ϵ = 15, ADP-FL had an accuracy of 67.80 % with a privacy budget of ϵ = 9, and FDP-FL’s accuracy was 66.2 % with a privacy budget of ϵ = 12. AWDP-FL not only achieved higher accuracy but also demonstrated better privacy protection with a lower privacy budget.
In summary, the AWDP-FL algorithm exhibited higher model accuracy and lower privacy budget consumption across all three datasets, demonstrating a good balance between privacy protection and model performance.

5.2.3. Privacy Validation of Algorithms under DLG Attack

In this section, we conducted DLG attack experiments on the AWDP-FL algorithm and other differential privacy federated learning algorithms using the MNIST, Fashion-MNIST, and CIFAR-10 datasets. The experimental parameters were based on the conclusions of Section 5.2.2, where the attacks were performed under the privacy budget required to achieve the same model accuracy. The conclusions from that section will not be referenced again here. We believe that federated learning needs to strike a balance between performance and privacy protection, but in practical applications, enhancing privacy protection while ensuring performance remains a major challenge. Therefore, we used the following two metrics to evaluate the experimental results: (1) the Structural Similarity Index (SSIM), which assesses the difference between the reconstructed images from the attack and the original images; and (2) DLG attack loss values, which measure the gradient differences between the original and disguised samples.
Specifically, SSIM values range from 0 to 1, with values closer to 1 indicating higher image similarity and a value of 1 indicating two identical images. SSIM is computed from three components: luminance comparison, which measures the difference in average brightness; contrast comparison, which measures the difference in contrast; and structure comparison, which reflects the local structural similarity of the images. Together with the attack loss, these indicators allow us to evaluate both the strength of the attack and the effectiveness of privacy protection more comprehensively.
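As a concrete illustration of this metric, the minimal sketch below compares an original image with its reconstruction using scikit-image's SSIM implementation; it assumes grayscale inputs scaled to [0, 1] (as for MNIST and Fashion-MNIST) and uses random placeholder arrays, so it is an illustration rather than the evaluation code used in our experiments.

import numpy as np
from skimage.metrics import structural_similarity as ssim

def reconstruction_similarity(original: np.ndarray, reconstructed: np.ndarray) -> float:
    # SSIM close to 1 means the attack recovered an image very similar to the original.
    return ssim(original, reconstructed, data_range=1.0)

rng = np.random.default_rng(0)
original = rng.random((28, 28))                                            # placeholder "sample"
reconstructed = np.clip(original + 0.3 * rng.standard_normal((28, 28)), 0.0, 1.0)
print(f"SSIM = {reconstruction_similarity(original, reconstructed):.4f}")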
In the experiments, two samples are randomly selected from each dataset and attacked with DLG under each algorithm, with the number of attack iterations set to 1000 to ensure that the attack curve converges. For each indicator, we record the best value reached during the attack for comparative analysis. A sketch of the gradient-matching loop underlying these attacks is given below, after which we discuss the experimental results on the three datasets in detail.
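The following is a minimal sketch of the DLG gradient-matching loop [10], assuming a PyTorch classification model and the gradient list observed from a victim client; the names model, true_grads, img_shape, and num_classes are placeholders, and the snippet is an illustration of the attack principle rather than the exact implementation used here.

import torch
import torch.nn.functional as F

def dlg_attack(model, true_grads, img_shape, num_classes, iters=1000):
    # Dummy image and soft label that the attacker optimizes so that their
    # gradients match the gradients observed from the victim client.
    dummy_x = torch.randn(1, *img_shape, requires_grad=True)
    dummy_y = torch.randn(1, num_classes, requires_grad=True)
    optimizer = torch.optim.LBFGS([dummy_x, dummy_y])

    for _ in range(iters):
        def closure():
            optimizer.zero_grad()
            pred = model(dummy_x)
            loss = torch.sum(-F.softmax(dummy_y, dim=-1) * F.log_softmax(pred, dim=-1))
            dummy_grads = torch.autograd.grad(loss, model.parameters(), create_graph=True)
            # DLG attack loss: squared distance between dummy and observed gradients.
            grad_diff = sum(((dg - tg) ** 2).sum() for dg, tg in zip(dummy_grads, true_grads))
            grad_diff.backward()
            return grad_diff
        optimizer.step(closure)
    return dummy_x.detach(), dummy_y.detach()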
As seen in Table 10, for each randomly selected sample, the NoDP-FL algorithm yields a near-zero loss value in image reconstruction after the attack, with SSIM values close to 1. This is because the algorithm adds no noise; as the baseline federated learning algorithm, it provides a stable reference for the other algorithms. For the CDP-FL algorithm, the indicators are nearly identical to those of NoDP-FL because it uses a large privacy budget, which improves model performance but weakens privacy protection. Among FDP-FL, ADP-FL, and AWDP-FL, our algorithm performs best on both indicators: for instance, AWDP-FL's attack loss for Sample 1 is 0.0121 with an SSIM value of 0.6971. As clearly shown in Figure 8, the image reconstructed from the AWDP-FL model barely retains any identifiable features of the original image, demonstrating its significant advantage in privacy protection.
The detailed results for the MNIST dataset are given in Table 10 and Figure 8.
For the Fashion-MNIST dataset, a clothing dataset, we selected samples from two different categories: a top and a shoe. As shown in Table 11, Sample 2 generally yields better privacy indicators than Sample 1, because Sample 2's features are harder to capture, making the image more difficult to reconstruct.
From Figure 9, it can be observed that ADP-FL and AWDP-FL resist image reconstruction better than the other algorithms. This is because they consume far less privacy budget than the others, achieving a balance between privacy and performance. Even so, the attack still recovers a noticeable amount of image features from these two algorithms. This suggests that, despite the added differential privacy protection, whether to further sacrifice model performance to strengthen privacy protection remains a question worth considering.
The detailed results for the Fashion-MNIST dataset are given in Table 11 and Figure 9.
Unlike the other datasets used in the experiments, CIFAR-10 consists of three-channel images, which makes model training considerably more challenging. As a result, each algorithm uses a relatively large privacy budget; the budgets for CDP-FL and FDP-FL are set to 15 and 12, respectively (Table 9). As shown in Table 12, these excessive privacy budgets make the two indicators after the attack nearly identical to those of the noise-free NoDP-FL model, indicating that these algorithms offer poor privacy protection on this dataset. For the ADP-FL and AWDP-FL algorithms, the table shows that the Sample 2 loss value of ADP-FL is greater than that of AWDP-FL (0.0403 versus 0.0330), while its SSIM value is lower than AWDP-FL's, suggesting that ADP-FL provides stronger privacy protection on this sample. Figure 10 gives a visual comparison of the images produced under the respective algorithms, further supporting this conclusion.
The detailed results for the CIFAR-10 dataset are given in Table 12 and Figure 10.
Finally, the experimental scenarios in this section were designed based on the conclusions drawn in the other sections of this paper. While further exploration from other perspectives is possible, the above results indirectly support those conclusions and demonstrate the advantage of our algorithm. We believe that introducing noise inevitably affects model performance while reducing the probability that an attacker can recover local data. The key challenge, however, remains how to strengthen privacy protection without compromising model performance; the choice of privacy budget therefore plays a crucial role in balancing the two.

5.2.4. The Impact of Local Iteration Counts

This section will explore the impact of local iteration counts on model performance, time overhead, and global communication overhead. In the experiment, 50 clients were selected, and model accuracy was expressed as a percentage. The parameters for all algorithms were kept consistent: the batch size B was 64, and the number of global communication rounds T was 10 for the MNIST dataset, 20 for the Fashion-MNIST dataset, and 50 for the CIFAR-10 dataset. This experimental setup achieved a smaller global communication overhead (compared to the 50, 100, and 100 rounds in the experimental design of Section 5.2.1). Based on this, we will explore the model performance with different local iteration counts across different datasets.
In the MNIST dataset, as shown in Table 13, with the increase in local iteration count E, the performance of the AWDP-FL algorithm varies under different privacy budgets. For a smaller privacy budget ϵ = 0.4, as E increases, the model accuracy gradually decreases, particularly at E = 7 and E = 9, where the accuracy is 87.24 % and 87.11 % , respectively. However, with larger privacy budgets ϵ = 0.6 and ϵ = 0.8, the performance of AWDP-FL improves, especially at E = 7, where model accuracy approaches 90%.
Figure 11 further illustrates the trend of the AWDP-FL algorithm as local iteration count E varies under different privacy budgets. Under a smaller privacy budget ϵ = 0.4, model accuracy shows a downward trend as E increases. However, under higher privacy budgets ϵ = 0.6 and ϵ = 0.8, the performance of AWDP-FL is more stable with the best performance observed at E = 7. This suggests that while E = 7 is an appropriate choice under a larger privacy budget, the exact choice should still be adjusted according to the specific privacy budget.
For the Fashion-MNIST dataset, as shown in Table 14, the performance of the AWDP-FL algorithm varies significantly with the local iteration count E under different privacy budgets. As E increases, especially under a smaller privacy budget ϵ = 0.4, model accuracy gradually decreases, dropping to 81.39 % at E = 7. In contrast, while performance improves under higher privacy budgets ϵ = 0.6 and ϵ = 0.8, it is not as stable as with smaller iteration counts. Therefore, E = 3 is an ideal iteration count for the Fashion-MNIST dataset, achieving a good balance between privacy protection and model performance.
Figure 12 shows the trend of the AWDP-FL algorithm on the Fashion-MNIST dataset as the local iteration count E changes under different privacy budgets. As E increases, especially under the smaller privacy budget ϵ = 0.4, model accuracy decreases significantly with the worst performance observed at E = 7. The performance improves slightly under the higher privacy budgets ϵ = 0.6 and ϵ = 0.8, but it is not as stable as with smaller iteration counts. Therefore, E = 3 is the most appropriate iteration count.
In the CIFAR-10 dataset, as shown in Table 15, the accuracy of the AWDP-FL algorithm under the privacy budget ϵ = 5 gradually decreases as the local iteration count E increases, falling from 75.33 % at E = 5 to 70.19 % at E = 10. Since E = 5 yields the highest accuracy while keeping the noise accumulated over the local iterations small, E = 5 is the most suitable iteration count for the CIFAR-10 dataset.
Figure 13 shows the trend of the AWDP-FL algorithm on the CIFAR-10 dataset as the local iteration count E changes under different privacy budgets. As E increases, model accuracy shows a downward trend across all privacy budgets with the worst performance observed at E = 10. This is because as the number of iterations increases, the noise added during each local iteration also increases, as indicated by the noise calculation shown in Formula (9).
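To illustrate this effect, the sketch below computes a per-iteration noise scale under a naive equal split of the privacy budget across local iterations, using the classical Gaussian-mechanism bound and the MNIST budget ε = 0.4 as an example; the paper's Formula (9) and its composition analysis are not reproduced here, so these numbers only indicate the trend, not the exact noise used in our experiments.

import math

def per_iteration_sigma(eps_total: float, delta: float, local_iters: int, clip_norm: float) -> float:
    # Naive sequential composition: each of the E local iterations gets eps_total / E.
    eps_per_iter = eps_total / local_iters
    # Classical Gaussian-mechanism bound (valid for eps_per_iter < 1); the paper's
    # Formula (9) may use a tighter accountant, so this only illustrates the trend.
    return clip_norm * math.sqrt(2.0 * math.log(1.25 / delta)) / eps_per_iter

for E in (4, 5, 7, 9):
    sigma = per_iteration_sigma(eps_total=0.4, delta=1e-5, local_iters=E, clip_norm=1.0)
    print(f"E = {E} -> per-iteration noise std ≈ {sigma:.1f}")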
A comprehensive analysis of the three datasets leads to the following conclusions. On the MNIST dataset, E = 7 is the optimal local iteration count, performing especially well under the higher privacy budgets. On the Fashion-MNIST dataset, E = 3 is the optimal iteration count, offering higher model performance under smaller privacy budgets. On the CIFAR-10 dataset, E = 5 is the most suitable iteration count, achieving the best balance between privacy protection and model performance. These results indicate that different datasets require different local iteration counts, and AWDP-FL remains robust when an appropriate count is chosen for each privacy budget. Notably, compared with the setup of Section 5.2.1 (E = 3), increasing the local iteration count proportionally increases the local training time but does not increase the global communication overhead; on the contrary, it allows the number of communication rounds to be reduced while maintaining model accuracy. The rounds were reduced from 50 to 10 for MNIST, from 100 to 20 for Fashion-MNIST, and from 100 to 50 for CIFAR-10, which also helps reduce the overall privacy overhead.
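To make this computation–communication trade-off concrete, the short sketch below counts local gradient steps and parameter uploads for two (E, T) settings; the figure of 1200 samples per client (60,000 MNIST training images split evenly across 50 clients) is an assumption used only for illustration, not a value taken from the experiments.

# A hedged back-of-the-envelope comparison (not taken from the paper's code):
# total local gradient steps per client and total parameter uploads for two (E, T) settings.
def cost(local_iters: int, rounds: int, samples_per_client: int = 1200, batch_size: int = 64):
    steps_per_round = local_iters * -(-samples_per_client // batch_size)  # ceil(samples / batch)
    return {"local_steps": rounds * steps_per_round, "uploads": rounds}

print("MNIST, E = 3, T = 50:", cost(3, 50))   # many uploads, fewer local steps per round
print("MNIST, E = 7, T = 10:", cost(7, 10))   # 5x fewer uploads, more local work per round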

5.2.5. The Impact of Sampling Rate on Model Accuracy

This section will explore the impact of client sampling rate (defined as the ratio of the number of clients participating in training to the total number of clients) on model performance and global communication. This ratio can also be understood as the proportion of data involved in training relative to the total data. In the experiment, the optimal local iteration count E was selected for each dataset based on the conclusions from Section 5.2.4. The experimental setup includes 50 clients with model accuracy expressed as a percentage. The batch size B was 64, and the number of global communication rounds T was 10 for the MNIST dataset, 20 for the Fashion-MNIST dataset, and 50 for the CIFAR-10 dataset. Next, we will conduct an in-depth analysis of model performance across different datasets under various sampling rates.
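Before turning to the results, the following minimal sketch shows what the sampling rate q means operationally: in each communication round the server selects roughly q·N of the N registered clients uniformly at random, and only those clients train and upload parameters. The function and variable names are illustrative assumptions, not the paper's implementation.

import random

def sample_clients(all_clients: list, q: float, rng: random.Random) -> list:
    # Select round(q * N) clients uniformly at random for this communication round.
    k = max(1, round(q * len(all_clients)))
    return rng.sample(all_clients, k)

rng = random.Random(42)
clients = list(range(50))                      # 50 clients, as in this experiment
for t in range(3):                             # a few communication rounds
    participants = sample_clients(clients, q=0.6, rng=rng)
    print(f"round {t}: {len(participants)} clients selected")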
In Table 16, the impact of different sampling rates on model performance in the MNIST dataset is clearly evident. The NoDP-FL algorithm achieved an accuracy of 90.14 % at a sampling rate q = 0.6, while its optimal performance occurred at q = 1 with an accuracy of 90.83 % . For the AWDP-FL algorithm, with a privacy budget ϵ = 0.6, the accuracy reached 89.83 % at a sampling rate q = 0.6 and increased to a maximum of 90.16 % when the sampling rate was increased to q = 1. Nevertheless, AWDP-FL already exhibited performance close to q = 1 at q = 0.6, indicating that it can achieve excellent model performance with lower communication costs (T = 10) while maintaining strong privacy protection.
In Figure 14, the accuracy of the NoDP-FL algorithm increases steadily with the sampling rate, reaching its highest value at q = 1. This indicates that, without added noise, model performance improves as more clients participate in training, since more data are involved. The AWDP-FL curve fluctuates more: it reaches a local peak of 89.83 % at q = 0.6, drops to 86.31 % at q = 0.8, and then rises to 90.16 % at q = 1. This dip-and-recovery pattern indicates that AWDP-FL performs best at high sampling rates, but a sampling rate of q = 0.6 already achieves a good balance between performance and communication cost.
In Table 17, the impact of different sampling rates on model performance in the Fashion-MNIST dataset can be observed. The NoDP-FL algorithm achieved an accuracy of 83.54 % at a sampling rate of q = 0.6, while AWDP-FL, with a privacy budget of ϵ = 0.6, achieved an accuracy of 83.43 % at the same sampling rate. Compared to other sampling rates, q = 0.6 already exhibited high model accuracy. Further increasing the sampling rate to q = 1 did not result in a significant improvement in accuracy but rather showed a slight decline. This suggests that a lower sampling rate of q = 0.6 can achieve a balance between performance and communication costs, effectively reducing local training time while maintaining strong model performance.
According to Figure 15, the trend shows that as the sampling rate increases from q = 0.4 to q = 0.6, the model performance of both NoDP-FL and AWDP-FL improves with AWDP-FL reaching an accuracy of 83.43 % at q = 0.6. However, when the sampling rate increases to q = 0.8 and q = 1, the model’s accuracy experiences a slight decline, especially with AWDP-FL, which does not show an improvement with the higher sampling rates. This indicates that for the Fashion-MNIST dataset, an excessively high sampling rate does not significantly improve model performance and may even affect overall training efficiency. Therefore, q = 0.6 is the optimal sampling rate for maintaining efficient training and accuracy performance.
In Table 18, the impact of different sampling rates on model performance in the CIFAR-10 dataset is relatively clear. The NoDP-FL algorithm achieved an accuracy of 77.23 % at a sampling rate of q = 1, while the AWDP-FL algorithm, with a privacy budget of ϵ = 5, achieved 75.33 % at q = 1. Compared with the lower sampling rates of q = 0.4 and q = 0.6, both algorithms improve as the sampling rate grows. Although the lower sampling rate q = 0.6 allows faster training, q = 1 is the ideal choice on the CIFAR-10 dataset for achieving the highest model accuracy.
According to Figure 16, the performance of both NoDP-FL and AWDP-FL improves as the sampling rate increases, with the most noticeable gain between q = 0.6 and q = 0.8. For NoDP-FL, performance changes little between the lower sampling rates q = 0.4 and q = 0.6 but reaches its highest value of 77.23 % at q = 1. The accuracy of AWDP-FL is likewise lower at q = 0.4 and q = 0.6 and reaches 75.33 % at q = 1. Therefore, on the CIFAR-10 dataset, although lower sampling rates reduce training time, q = 1 remains the best choice for achieving higher model performance.
A comprehensive analysis of the three datasets shows that the lower client sampling rate q = 0.6 is highly cost-effective on the MNIST and Fashion-MNIST datasets: it reduces per-round training cost and improves experimental efficiency without noticeably affecting overall model accuracy. On the CIFAR-10 dataset, however, although lower sampling rates speed up training, q = 1 yields higher model accuracy. Therefore, if model accuracy is the priority, q = 1 is the best choice.

5.2.6. The Impact of Hierarchical Gradient Clipping Optimization

Gradient clipping at the level of individual neural network layers is one of the core optimizations of this paper, and this section examines whether processing the network layer by layer affects the effectiveness of the algorithm. We compare the original AWDP-FL algorithm with a modified variant that computes a single clipping threshold for the entire model in each iteration instead of treating each layer separately; the variant is otherwise identical to AWDP-FL, isolating the effect of layer-level adaptivity. Additionally, a deeper network structure (Model 2, see Section 5.1) was designed, compared with the original network used for the CIFAR-10 dataset. The experimental setup includes 50 clients, with model accuracy expressed as a percentage, a batch size B of 64, the more complex CIFAR-10 dataset, and T = 100 global communication rounds. The sketch below illustrates the two clipping strategies being compared, after which we analyze the experimental results in depth.
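The following sketch contrasts the two clipping strategies using plain PyTorch tensor operations; the per-layer thresholds are placeholders for the adaptive values that AWDP-FL derives from its historical gradient sequences, so this is an illustration of the mechanism rather than the framework's actual code.

import torch

def clip_whole_model(grads, clip_norm):
    # Non-layered baseline: one threshold for the concatenated gradient vector.
    total_norm = torch.sqrt(sum((g ** 2).sum() for g in grads))
    scale = torch.clamp(clip_norm / (total_norm + 1e-12), max=1.0)
    return [g * scale for g in grads]

def clip_per_layer(grads, layer_clip_norms):
    # Layer-wise variant: each layer l is clipped to its own threshold C_l.
    clipped = []
    for g, c in zip(grads, layer_clip_norms):
        scale = torch.clamp(c / (g.norm() + 1e-12), max=1.0)
        clipped.append(g * scale)
    return clipped

# Toy usage: two "layers" with very different gradient magnitudes.
grads = [torch.randn(4, 4) * 10.0, torch.randn(4) * 0.1]
global_clipped = clip_whole_model(grads, clip_norm=1.0)
layer_clipped = clip_per_layer(grads, layer_clip_norms=[1.0, 0.05])

Intuitively, clipping each layer to its own threshold prevents a few layers with large gradients from dominating the global norm and forcing layers with small gradients to be scaled down unnecessarily, which is consistent with the accuracy gap reported below.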
In Table 19, the comparison between layer-wise and whole-model clipping shows a clear difference in the performance of AWDP-FL under the differential privacy mechanism. Layer-wise AWDP-FL consistently achieves higher accuracy than the non-layered variant across the tested privacy budgets. For example, at ϵ = 4, layer-wise processing reaches 75.94 % accuracy, while whole-model processing achieves only 70.01 %; at ϵ = 6, layer-wise processing improves further to 80.67 %, while whole-model processing reaches 76.02 %. This indicates that the layer-wise mechanism better preserves model performance while ensuring privacy protection. It can also be observed that the accuracy of both variants rises as the privacy budget ϵ increases, suggesting that a larger budget reduces the impact of differential privacy on model performance; the advantage of layer-wise processing is that it delivers higher accuracy even under the same budget. As for NoDP-FL, without differential privacy it achieved an accuracy of 81.89 %, in line with the expected performance of a noise-free model.

6. Conclusions

This paper focuses on the application of differential privacy in federated learning, particularly on the selection of adaptive thresholds during gradient clipping and the adaptive update of model gradients. By combining adaptive weight parameters and a layer-wise approach, an Adaptive Differential Privacy Federated Learning framework (AWDP-FL) was designed, introducing dynamic noise perturbation before model parameter transmission. Experimental results demonstrate that on the MNIST, Fashion-MNIST, and CIFAR-10 public datasets, AWDP-FL not only provides strong privacy protection but also effectively enhances model performance and significantly improves the stability of the training process in various experimental scenarios. Furthermore, this paper provides an in-depth analysis of the optimal parameter configuration for AWDP-FL, further enhancing the algorithm’s transparency and practicality.
Although the equal allocation of the privacy budget in this paper reduces waste, the allocated budget may still be larger than necessary for some datasets. Future work will focus on developing more accurate privacy loss accounting methods to better control the consumption of the privacy budget. Data heterogeneity is another common challenge in federated learning: because data distributions differ significantly across clients, traditional privacy protection strategies may degrade performance in heterogeneous environments. Addressing data heterogeneity will improve the adaptability of models across clients and allow privacy protection to be applied more precisely. Finally, deep neural networks perform better on complex data because they extract richer features, but deeper networks also place higher demands on differential privacy; future research will explore how to optimize privacy protection in complex networks to balance privacy and performance.

Author Contributions

Conceptualization, Z.C. and H.Z.; methodology, Z.C. and H.Z.; validation, H.Z.; investigation, Z.C.; data curation, Z.C. and G.L.; writing—original draft preparation, Z.C.; writing—review and editing, Z.C. and H.Z.; supervision, G.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Science and Technology Research Planning Project of the School of Computer Science and Engineering, Changchun University of Technology, Changchun, China, under Grant JJKH20240860KJ.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Jiang, Y.; Wang, S.; Valls, V.; Ko, B.J.; Lee, W.H.; Leung, K.K.; Tassiulas, L. Model pruning enables efficient federated learning on edge devices. IEEE Trans. Neural Netw. Learn. Syst. 2022, 34, 10374–10386.
2. El Ouadrhiri, A.; Abdelhadi, A. Differential privacy for deep and federated learning: A survey. IEEE Access 2022, 10, 22359–22380.
3. Chamikara, M.; Liu, D.; Camtepe, S.; Nepal, S.; Grobler, M.; Bertók, P.; Khalil, I. Local differential privacy for federated learning in industrial settings. In Proceedings of the Computer Security—ESORICS 2022: 27th European Symposium on Research in Computer Security, Copenhagen, Denmark, 26–30 September 2022.
4. McMahan, B.; Moore, E.; Ramage, D.; Hampson, S.; Arcas, B.A. Communication-efficient learning of deep networks from decentralized data. In Proceedings of the Artificial Intelligence and Statistics, PMLR, Fort Lauderdale, FL, USA, 20–22 April 2017; pp. 1273–1282.
5. Ma, J.; Naas, S.A.; Sigg, S.; Lyu, X. Privacy-preserving federated learning based on multi-key homomorphic encryption. Int. J. Intell. Syst. 2022, 37, 5880–5901.
6. Pajooh, H.H.; Demidenko, S.; Aslam, S.; Harris, M. Blockchain and 6G-Enabled IoT. Inventions 2022, 7, 109.
7. Warnat-Herresthal, S.; Schultze, H.; Shastry, K.L.; Manamohan, S.; Mukherjee, S.; Garg, V.; Sarveswara, R.; Pickkers, P.; Aziz, N.A.; Ktena, S.; et al. Swarm learning for decentralized and confidential clinical machine learning. Nature 2021, 594, 265–270.
8. Hosseini, S.M.; Sikaroudi, M.; Babaei, M.; Tizhoosh, H.R. Cluster based secure multi-party computation in federated learning for histopathology images. In Proceedings of the International Workshop on Distributed, Collaborative, and Federated Learning; Springer Nature: Cham, Switzerland, 2022; pp. 110–118.
9. Kanagavelu, R.; Wei, Q.; Li, Z.; Zhang, H.; Samsudin, J.; Yang, Y.; Goh, R.S.M.; Wang, S. CE-Fed: Communication efficient multi-party computation enabled federated learning. Array 2022, 15, 100207.
10. Zhu, L.; Liu, Z.; Han, S. Deep leakage from gradients. Adv. Neural Inf. Process. Syst. 2019, 32, 1323. Available online: https://proceedings.neurips.cc/paper/2019/hash/60a6c4002cc7b29142def8871531281a-Abstract.html (accessed on 15 August 2024).
11. Park, J.; Lim, H. Privacy-preserving federated learning using homomorphic encryption. Appl. Sci. 2022, 12, 734.
12. Sun, L.; Lyu, L. Federated model distillation with noise-free differential privacy. arXiv 2020, arXiv:2009.05537.
13. Alasmary, H.; Tanveer, M. ESCI-AKA: Enabling Secure Communication in an IoT-Enabled Smart Home Environment Using Authenticated Key Agreement Framework. Mathematics 2023, 11, 3450.
14. Gupta, S.; Alharbi, F.; Alshahrani, R.; Kumar Arya, P.; Vyas, S.; Elkamchouchi, D.H.; Soufiene, B.O. Secure and lightweight authentication protocol for privacy preserving communications in smart city applications. Sustainability 2023, 15, 5346.
15. Kanellopoulos, D.; Sharma, V.K. Dynamic load balancing techniques in the IoT: A review. Symmetry 2022, 14, 2554.
16. Chamikara, M.A.P.; Liu, D.; Camtepe, S.; Nepal, S.; Grobler, M.; Bertok, P.; Khalil, I. Local differential privacy for federated learning. arXiv 2022, arXiv:2202.06053.
17. Truex, S.; Liu, L.; Chow, K.H.; Gursoy, M.E.; Wei, W. LDP-Fed: Federated learning with local differential privacy. In Proceedings of the Third ACM International Workshop on Edge Systems, Analytics and Networking, Heraklion, Greece, 27 April 2020; pp. 61–66.
18. Sun, L.; Qian, J.; Chen, X. LDP-FL: Practical private aggregation in federated learning with local differential privacy. arXiv 2020, arXiv:2007.15789.
19. Zhao, Y.; Zhao, J.; Yang, M.; Wang, T.; Wang, N.; Lyu, L.; Niyato, D.; Lam, K.Y. Local differential privacy-based federated learning for internet of things. IEEE Internet Things J. 2020, 8, 8836–8853.
20. Liu, W.; Cheng, J.; Wang, X.; Lu, X.; Yin, J. Hybrid differential privacy based federated learning for Internet of Things. J. Syst. Archit. 2022, 124, 102418.
21. Shen, X.; Liu, Y.; Zhang, Z. Performance-enhanced federated learning with differential privacy for internet of things. IEEE Internet Things J. 2022, 9, 24079–24094.
22. Geyer, R.C.; Klein, T.; Nabi, M. Differentially private federated learning: A client level perspective. arXiv 2017, arXiv:1712.07557.
23. Wu, X.; Zhang, Y.; Shi, M.; Li, P.; Li, R.; Xiong, N.N. An adaptive federated learning scheme with differential privacy preserving. Future Gener. Comput. Syst. 2022, 127, 362–372.
24. Wang, F.; Xie, M.; Li, Q.; Wang, C. An Adaptive Clipping Differential Privacy Federated Learning Framework. J. Xidian Univ. 2023, 04, 111–120.
25. Zhao, J.; Yang, M.; Zhang, R.; Song, W.; Zheng, J.; Feng, J.; Matwin, S. Privacy-enhanced federated learning: A restrictively self-sampled and data-perturbed local differential privacy method. Electronics 2022, 11, 4007.
26. Hu, R.; Guo, Y.; Gong, Y. Federated learning with sparsified model perturbation: Improving accuracy under client-level differential privacy. IEEE Trans. Mob. Comput. 2023, 23, 8242–8255.
27. Lian, Z.; Wang, W.; Huang, H.; Su, C. Layer-based communication-efficient federated learning with privacy preservation. IEICE Trans. Inf. Syst. 2022, 105, 256–263.
28. Baek, C.; Kim, S.; Nam, D.; Park, J. Enhancing differential privacy for federated learning at scale. IEEE Access 2021, 9, 148090–148103.
29. Yang, Q.; Liu, Y.; Chen, T.; Tong, Y. Federated machine learning: Concept and applications. ACM Trans. Intell. Syst. Technol. (TIST) 2019, 10, 1–19.
30. Dwork, C.; Roth, A. The algorithmic foundations of differential privacy. Found. Trends® Theor. Comput. Sci. 2014, 9, 211–407.
31. Dwork, C.; Rothblum, G.N.; Vadhan, S. Boosting and differential privacy. In Proceedings of the 2010 IEEE 51st Annual Symposium on Foundations of Computer Science, Las Vegas, NV, USA, 23–26 October 2010; pp. 51–60.
Figure 1. Federated learning communication framework diagram.
Figure 2. Convergence curves and accuracy trends for the MNIST dataset across different algorithms and privacy budgets.
Figure 3. Convergence and accuracy trends for the Fashion-MNIST dataset across different algorithms.
Figure 4. Convergence and accuracy trends for the CIFAR-10 dataset across different algorithms.
Figure 5. Convergence and loss value trends for the MNIST dataset across different algorithms.
Figure 6. Convergence and loss value trends for the Fashion-MNIST dataset across different algorithms.
Figure 7. Convergence and loss value trends for the CIFAR-10 dataset across different algorithms.
Figure 8. The effect of DLG attacks on models trained with different differential privacy algorithms on the MNIST dataset.
Figure 9. The effect of DLG attacks on models trained with different differential privacy algorithms on the Fashion-MNIST dataset.
Figure 10. The effect of DLG attacks on models trained with different differential privacy algorithms on the CIFAR-10 dataset.
Figure 11. The impact of local iteration count on model performance under different privacy budgets in the MNIST dataset.
Figure 12. The impact of local iteration count on model performance under different privacy budgets in the Fashion-MNIST dataset.
Figure 13. The impact of local iteration count on model performance under different privacy budgets in the CIFAR-10 dataset.
Figure 14. The impact of different sampling rates on model performance on the MNIST dataset.
Figure 15. The impact of different sampling rates on model performance in the Fashion-MNIST dataset.
Figure 16. The impact of different sampling rates on model performance in the CIFAR-10 dataset.
Table 1. Comparative analysis of AWDP-FL and other differential privacy federated learning algorithms.
Comparison Dimension | Comparison Algorithms | Methods Used by Other Algorithms | AWDP-FL Method | Algorithm Superiority
Clipping Threshold Selection | CDP-FL [22] | Uses a static clipping threshold, leading to high communication overhead. | Utilizes adaptive clipping threshold processing. | AWDP-FL is better suited for complex scenarios, allowing dynamic adjustments based on training conditions, improving model performance and convergence speed under a wider range of conditions with lower communication overhead.
Adaptive Update Steps | ADP-FL [24] | Based on the entire model, the traditional batch mean adaptation method is used, which averages the gradients of batches. | Single-sample gradient and weight analysis is used, introducing a dual historical sequence to determine weight coefficients, and processing is performed from a model hierarchy perspective. | AWDP-FL provides more granular control, suitable for complex models or scenarios requiring high-precision optimization, utilizing gradient information and historical changes at each layer more effectively.
Gradient Update Method | FDP-FL [23] | Uses traditional first-order moment momentum updates. | Employs higher-order moment analysis to reduce the impact of the initial learning rate. | The AWDP-FL algorithm is more refined, introducing higher-order information to mitigate the effects of the initial learning rate, making it particularly suitable for complex loss functions and high-precision optimization scenarios. This algorithm accelerates convergence in the later stages of model training and improves performance.
Table 2. Relevant symbols and parameters.
Symbol | Meaning
ε | Privacy budget in the definition of local differential privacy
T | Total communication rounds in federated learning
q | Client sampling rate during federated learning
α | Learning rate of the federated learning client
θ | Federated learning model parameters
C | Local iterations of the federated learning client
B | Local batch size of the federated learning client
τ_1^t | Momentum parameter for client updates in federated learning
τ_2^t | Momentum parameter for client updates in federated learning
ρ | Momentum parameter for client updates in federated learning
M | Datasets used by clients participating in federated learning training
L | Neural network layers
P | Selected value from the adaptive clipping history norm sequence
O | Federated learning client set
Table 3. Comparison of model accuracy for the MNIST dataset across different algorithms and privacy budgets.
Algorithm | ε = 0.4 | ε = 0.6 | ε = 0.8
NoDP-FL [29] | 89.84
CDP-FL [22] | 78.01 | 79.32 | 79.85
ADP-FL [24] | 83.64 | 83.80 | 84.01
FDP-FL [23] | 79.25 | 82.48 | 84.17
AWDP-FL | 87.59 | 87.96 | 88.10

Table 4. Comparison of model accuracy for the Fashion-MNIST dataset across different algorithms and privacy budgets.
Algorithm | ε = 0.4 | ε = 0.6 | ε = 0.8
NoDP-FL [29] | 84.53
CDP-FL [22] | 77.01 | 77.12 | 77.58
ADP-FL [24] | 79.02 | 79.40 | 79.45
FDP-FL [23] | 79.12 | 79.46 | 79.52
AWDP-FL | 82.13 | 82.76 | 83.22

Table 5. Comparison of model accuracy for the CIFAR-10 dataset across different algorithms and privacy budgets.
Algorithm | ε = 4 | ε = 5 | ε = 6
NoDP-FL [29] | 76.99
CDP-FL [22] | 53.05 | 53.37 | 53.65
ADP-FL [24] | 66.72 | 67.45 | 70.00
FDP-FL [23] | 59.44 | 61.56 | 61.87
AWDP-FL | 70.00 | 72.00 | 74.42

Table 6. Comparison of model loss for the MNIST dataset across different algorithms and privacy budgets.
Algorithm | ε = 0.4 | ε = 0.6 | ε = 0.8
NoDP-FL [29] | 0.3601
CDP-FL [22] | 0.8341 | 0.8013 | 0.7994
ADP-FL [24] | 0.6734 | 0.6709 | 0.6686
FDP-FL [23] | 1.0365 | 0.6180 | 0.5644
AWDP-FL | 0.4384 | 0.4389 | 0.4323

Table 7. Comparison of model loss for the Fashion-MNIST dataset across different algorithms and privacy budgets.
Algorithm | ε = 0.4 | ε = 0.6 | ε = 0.8
NoDP-FL [29] | 0.4395
CDP-FL [22] | 0.6817 | 0.6634 | 0.6540
ADP-FL [24] | 0.6125 | 0.5977 | 0.5976
FDP-FL [23] | 0.6097 | 0.6062 | 0.5959
AWDP-FL | 0.5701 | 0.5197 | 0.5013

Table 8. Comparison of model loss for the CIFAR-10 dataset across different algorithms and privacy budgets.
Algorithm | ε = 4 | ε = 5 | ε = 6
NoDP-FL [29] | 0.6754
CDP-FL [22] | 1.4076 | 1.3810 | 1.3744
ADP-FL [24] | 1.0303 | 1.0383 | 0.9643
FDP-FL [23] | 1.2275 | 1.1925 | 1.1844
AWDP-FL | 0.9204 | 0.8488 | 0.7916
Table 9. Comparison of algorithm effectiveness (model accuracy).
Algorithm | MNIST | Fashion-MNIST | CIFAR-10
CDP-FL [22] | 90.14 (ϵ = 9) | 82.66 (ϵ = 9) | 65.56 (ϵ = 15)
ADP-FL [24] | 90.12 (ϵ = 3) | 82.41 (ϵ = 2) | 67.80 (ϵ = 9)
FDP-FL [23] | 88.92 (ϵ = 4) | 82.72 (ϵ = 4) | 66.2 (ϵ = 12)
AWDP-FL | 90.75 (ϵ = 0.4) | 83.52 (ϵ = 0.4) | 72.89 (ϵ = 4)

Table 10. Effectiveness of different algorithms under DLG attacks on the MNIST dataset.
Indicators | NoDP-FL [29] | CDP-FL [22] | FDP-FL [23] | ADP-FL [24] | AWDP-FL
Loss Value of Sample 1 | 0.0001 | 0.0001 | 0.0005 | 0.0008 | 0.0121
SSIM Value of Sample 1 | 0.9988 | 0.9970 | 0.9752 | 0.9585 | 0.6971
Loss Value of Sample 2 | 0.0001 | 0.0001 | 0.0005 | 0.0008 | 0.0118
SSIM Value of Sample 2 | 0.9995 | 0.9938 | 0.9786 | 0.9604 | 0.7551

Table 11. Effectiveness of different algorithms under DLG attacks on the Fashion-MNIST dataset.
Indicators | NoDP-FL [29] | CDP-FL [22] | FDP-FL [23] | ADP-FL [24] | AWDP-FL
Loss Value of Sample 1 | 0.0000 | 0.0001 | 0.0005 | 0.0018 | 0.0120
SSIM Value of Sample 1 | 0.9998 | 0.9971 | 0.9829 | 0.9515 | 0.8076
Loss Value of Sample 2 | 0.0000 | 0.0001 | 0.0005 | 0.0018 | 0.0116
SSIM Value of Sample 2 | 1.0000 | 0.9966 | 0.9836 | 0.9385 | 0.7753

Table 12. Effectiveness of different algorithms under DLG attacks on the CIFAR-10 dataset.
Indicators | NoDP-FL [29] | CDP-FL [22] | FDP-FL [23] | ADP-FL [24] | AWDP-FL
Loss Value of Sample 1 | 0.0000 | 0.0037 | 0.0061 | 0.0102 | 0.0324
SSIM Value of Sample 1 | 0.9998 | 0.9700 | 0.9427 | 0.9043 | 0.8773
Loss Value of Sample 2 | 0.0000 | 0.0039 | 0.0054 | 0.0403 | 0.0330
SSIM Value of Sample 2 | 0.9981 | 0.9836 | 0.9382 | 0.7932 | 0.8007
Table 13. Analysis of model performance under different privacy budgets and local iteration counts on the MNIST dataset.
Algorithm | E = 4 | E = 5 | E = 7 | E = 9
NoDP-FL [29] | 89.34 | 89.56 | 90.14 | 90.35
AWDP-FL (ε = 0.4) | 88.77 | 88.46 | 87.24 | 87.11
AWDP-FL (ε = 0.6) | 88.96 | 89.38 | 89.83 | 89.14
AWDP-FL (ε = 0.8) | 89.05 | 89.25 | 89.94 | 89.44

Table 14. Analysis of model performance under different privacy budgets and local iteration counts on the Fashion-MNIST dataset.
Algorithm | E = 3 | E = 4 | E = 5 | E = 7
NoDP-FL [29] | 83.54 | 83.70 | 83.71 | 83.98
AWDP-FL (ε = 0.4) | 82.73 | 82.69 | 82.50 | 81.39
AWDP-FL (ε = 0.6) | 83.43 | 83.39 | 82.85 | 82.26
AWDP-FL (ε = 0.8) | 83.08 | 83.55 | 83.50 | 83.28

Table 15. Analysis of model performance under different privacy budgets and local iteration counts on the CIFAR-10 dataset.
Algorithm | E = 5 | E = 7 | E = 8 | E = 10
NoDP-FL [29] | 77.23 | 78.91 | 79.17 | 78.84
AWDP-FL (ε = 5) | 75.33 | 73.11 | 72.82 | 70.19

Table 16. Analysis of model performance under different sampling rates on the MNIST dataset.
Algorithm | q = 0.4 | q = 0.6 | q = 0.8 | q = 1
NoDP-FL [29] | 89.52 | 90.14 | 90.62 | 90.83
AWDP-FL (ε = 0.6) | 88.81 | 89.83 | 86.31 | 90.16

Table 17. Analysis of model performance under different sampling rates on the Fashion-MNIST dataset.
Algorithm | q = 0.4 | q = 0.6 | q = 0.8 | q = 1
NoDP-FL [29] | 83.35 | 83.54 | 83.43 | 83.41
AWDP-FL (ε = 0.6) | 82.83 | 83.43 | 83.33 | 83.19
Table 18. Analysis of model performance under different sampling rates on the CIFAR-10 dataset.
Algorithm | q = 0.4 | q = 0.6 | q = 0.8 | q = 1
NoDP-FL [29] | 75.42 | 75.14 | 76.98 | 77.23
AWDP-FL (ε = 5) | 73.10 | 73.69 | 75.19 | 75.33
Table 19. Comparison of model accuracy for neural network layer-wise gradient clipping optimization on the CIFAR-10 dataset.
Algorithm | ε = 4 | ε = 5 | ε = 6
NoDP-FL [29] | 81.89
AWDP-FL (layered) | 75.94 | 79.17 | 80.67
AWDP-FL (not layered) | 70.01 | 74.04 | 76.02