Article

FedUB: Federated Learning Algorithm Based on Update Bias

1 School of Mathematics and Statistics, Henan University of Science and Technology, Luoyang 471023, China
2 Intelligent System Science and Technology Innovation Center, Longmen Laboratory, Luoyang 471023, China
3 Sports Big Data Center, Department of Physical Education, Zhengzhou University, Zhengzhou 450001, China
* Author to whom correspondence should be addressed.
Mathematics 2024, 12(10), 1601; https://doi.org/10.3390/math12101601
Submission received: 18 April 2024 / Revised: 11 May 2024 / Accepted: 17 May 2024 / Published: 20 May 2024
(This article belongs to the Special Issue Federated Learning Strategies for Machine Learning)

Abstract: Federated learning, as a distributed machine learning framework, aims to protect data privacy while addressing the issue of data silos by collaboratively training models across multiple clients. However, a significant challenge to federated learning arises from the non-independent and identically distributed (non-iid) nature of data across different clients. Non-iid data can lead to inconsistencies between the minimal loss experienced by individual clients and the global loss observed after the central server aggregates the local models, affecting the model's convergence speed and generalization capability. To address this challenge, we propose a novel federated learning algorithm based on update bias (FedUB). Unlike traditional federated learning approaches such as FedAvg and FedProx, which independently update model parameters on each client before direct aggregation into a global model, the FedUB algorithm incorporates an update bias into the loss function of the local models, specifically, the difference between each round's local model updates and the global model updates. This design aims to reduce discrepancies between local and global updates, aligning the parameters of locally updated models more closely with those of the globally aggregated model and thereby mitigating the fundamental conflict between local and global optima. Additionally, during the aggregation phase at the server side, we introduce a metric called the bias metric, which assesses the similarity between each client's local model and the global model. This metric adaptively sets the weight of each client during aggregation after each training round to achieve a better global model. Extensive experiments conducted on multiple datasets confirm the effectiveness of the FedUB algorithm. The results indicate that FedUB generally outperforms methods such as FedDC, FedDyn, and Scaffold, especially in scenarios involving partial client participation and non-iid data distributions, demonstrating superior performance and faster convergence in tasks such as image classification.

1. Introduction

Over the past decade, rapid advancements have been made in big data, cloud services, the Internet of Things, machine learning, deep learning, and other artificial intelligence technologies [1,2,3,4,5]. However, challenges have emerged alongside these developments, with personal privacy facing unprecedented threats [6,7]. In response to the limitations and constraints of traditional privacy protection methods, federated learning has emerged as a cutting-edge solution [8,9]. Classic machine learning methods typically rely on data stored in centralized databases to train models [10,11]. However, this approach can lead to numerous challenges in real-world applications, such as the leakage of private user data [12,13]. Federated learning aims to protect data privacy while facilitating the collaborative training of machine learning models by multiple parties. Its core concept involves shifting the model training process from centralized cloud servers to various local devices, ensuring that the original data always remain local; only model gradient updates, rather than the original data themselves, are shared. This decentralized learning paradigm significantly reduces the risk of privacy breaches.
FedAvg [14], a widely utilized federated learning aggregation algorithm proposed by McMahan et al., serves as a milestone in the field of federated learning. It established the fundamental framework of federated learning through local model training on clients and subsequent aggregation of updates to a global model. The essence of the FedAvg algorithm lies in two core steps: local training and parameter aggregation. Specifically, each client i independently trains a model on local data, aiming to minimize the empirical loss function:
$$\min_{\theta} F_i(\theta) = L(y_i, f(x_i; \theta)) \tag{1}$$
Subsequently, the central server averages the model parameters from all clients to obtain an updated global model:
$$\theta^{t+1} = \sum_{i=1}^{N} \frac{|D_i|}{|D|}\, \theta_i^{t+1} \tag{2}$$
where $|D|$ is the total number of data points across all clients, and $|D_i|$ is the number of data points held by client $i$.
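To make the aggregation in Equation (2) concrete, the following is a minimal sketch in PyTorch, assuming client models are exchanged as state dicts; the function name and the toy usage are illustrative, not part of the original FedAvg implementation.

```python
import torch

def fedavg_aggregate(client_states, client_sizes):
    """Weighted average of client parameters, as in Equation (2).

    client_states: list of model state_dicts uploaded by the clients.
    client_sizes:  list of |D_i|, the sample count on each client.
    """
    total = float(sum(client_sizes))
    global_state = {}
    for key in client_states[0]:
        # theta^{t+1}[key] = sum_i (|D_i| / |D|) * theta_i^{t+1}[key]
        global_state[key] = sum(
            (n / total) * state[key].float()
            for state, n in zip(client_states, client_sizes)
        )
    return global_state

# Toy usage: two clients holding 100 and 300 samples.
s1 = {"w": torch.tensor([1.0])}
s2 = {"w": torch.tensor([3.0])}
print(fedavg_aggregate([s1, s2], [100, 300]))  # {'w': tensor([2.5000])}
```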
Although the FedAvg algorithm is simple and effective, discrepancies between local models and the global model under uneven data distributions can degrade global model performance. To address this issue, Li et al. proposed the FedProx algorithm [15] in 2020. FedProx introduces an additional regularization term to reduce the differences between local models and the global model. Its local objective, which penalizes the deviation from the global model parameters, is as follows:
$$\min_{\theta} F_i(\theta) = L(y_i, f(x_i; \theta)) + \frac{\alpha}{2} \|\theta - \theta^t\|^2 \tag{3}$$
where $\alpha$ is the regularization coefficient, and $\theta^t$ represents the global model parameters at round $t$.
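As an illustration, the proximal objective in Equation (3) can be sketched as follows; this is a minimal reading of the FedProx loss in PyTorch, with `global_params` assumed to be a snapshot of the round-$t$ global weights.

```python
import torch

def fedprox_loss(base_loss, local_model, global_params, alpha):
    """FedProx local objective (Equation (3)): empirical loss plus the
    proximal term (alpha / 2) * ||theta - theta^t||^2."""
    prox = torch.zeros(())
    for p, g in zip(local_model.parameters(), global_params):
        prox = prox + torch.sum((p - g.detach()) ** 2)
    return base_loss + (alpha / 2.0) * prox
```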
Despite its effectiveness, FedProx was unable to fully address the performance degradation issue under non-iid data conditions. To further tackle the instability of FedAvg under non-iid data, Acar et al. introduced FedDyn [16]. FedDyn dynamically adjusts the weight of the regularization term in each client's objective function, aiming to resolve the unstable learning of FedAvg under non-iid data conditions. This dynamic adjustment mechanism allows each client to follow the update direction of the global model more closely, theoretically improving learning performance under non-iid data distributions. However, the adjustments introduced by FedDyn add complexity to the algorithm's implementation and make hyperparameter tuning more challenging, as the dynamic weights of the regularization term must be set according to the specific data distribution and training process. Karimireddy et al. proposed Scaffold [17], which introduces control variates to correct the direction of client updates, reducing the update bias caused by data heterogeneity among clients. The control variates are designed to ensure that each client's update is based not only on optimizing its own data but also accounts for the update directions of the other clients. This method can mitigate the impact of non-iid data on model training to some extent. However, in real-world applications, highly heterogeneous data distributions may cause the control variates to inadequately capture the update differences among all clients, limiting their effectiveness [18]. Gao et al. proposed FedDC [19], which optimizes the local update process by introducing local drift terms, reducing the information loss as the model is passed between clients. This correction based on local model drift allows the global model to adapt to each client's data characteristics to some extent. It can improve the model's performance on client data but may also lead to overfitting to the specific data features of some clients, affecting the model's generalization capability across the federated learning system. Additionally, each local drift in FedDC is approximated by a single client's model after each training round, rather than by the aggregated global model, which can introduce errors that are magnified as training progresses. Inspired by the drift idea of FedDC, we explored how to handle these biases effectively in federated learning. Unlike FedDC, which adjusts model parameters directly, we propose a new optimization strategy that dynamically adjusts the local training strategy to directly address the bias between local and global model updates. We introduce an update bias quantity $r$ that accounts for the update bias between the client model and the global model in each training round. By dynamically adjusting the bias quantity $r$ and the local training strategy, and by setting adaptive weights in the aggregation process based on the bias between local and global models, we ensure the performance of the final global model.
Summarizing the above, our contributions are as follows:
  • We propose a federated learning algorithm based on update bias, which optimizes the federated learning process by considering the differences in model parameter updates between local models and the global model during local training at the client and the aggregation process at the central server.
  • During the aggregation process, we set adaptive weight coefficients based on the differences between local and global models, significantly accelerating the convergence speed of the global model.
  • We mitigate the challenges posed by non-independent and identically distributed (non-iid) data. Traditional federated learning algorithms often suffer performance degradation with non-iid data. Our method, through a fitting strategy for model parameter update amounts, effectively alleviates the discord between local and global models caused by uneven data distribution. By adding adaptive weights for each client during the aggregation process, we achieve better model performance and stability under non-iid data conditions. The effectiveness of our method has been validated across multiple datasets.

2. Related Work

In recent years, as data privacy and security have become increasingly important, federated learning has emerged as a hot topic in the research field [20,21]. Federated learning aims to solve the problem of data silos in distributed machine learning [22,23] while protecting user data privacy. Despite significant progress, the challenge posed by non-iid data remains a critical issue in this field [24,25,26,27]. FedAvg, as a pioneering work, provided a foundational framework for federated learning by training models on multiple clients and updating the global model through parameter averaging. However, subsequent research found that FedAvg's performance decreases significantly when client data are highly heterogeneous, mainly due to drift and gradient divergence in client updates. To address this challenge, a series of improved algorithms were proposed [14,15,16,17,19]. FedProx attempts to reduce the differences between clients and the global model caused by data heterogeneity by adding a proximal regularization term during client training. Although FedProx improved model stability to some extent, it did not fundamentally solve the optimization problem under non-iid data. Further, Scaffold addresses the gradient divergence issue by correcting drifts between clients and the server. Moreover, FedDyn dynamically adjusts the objective function of each client by monitoring the parameter differences between each local model and the global model, aiming for a better global model. FedDC introduces auxiliary drift variables to learn the drift between local clients and the global model, better adapting to the characteristics of each client's data. Although these algorithms adopt different strategies to improve federated learning, they still have their limitations. Proximal regularization fundamentally contradicts the fact that the optimal solutions of local models in federated learning differ from those of the global model. Local drift, which uses the model after local training to approximate the model after global aggregation, introduces a certain degree of error, and these errors may be amplified during subsequent training. While traditional federated learning algorithms encounter these substantial challenges, personalized federated learning (PFL) algorithms have attracted considerable attention in recent years. Xu et al. proposed FedPAC [28], which enhances personalized model performance through feature alignment and classifier collaboration. The strength of this method lies in its effective handling of data heterogeneity among different clients, while also improving model accuracy. Zhang et al. introduced FedALA [29], an adaptive local aggregation-based personalized federated learning algorithm. The advantage of this approach is that it adapts the aggregation strategy to better meet client needs, thereby enhancing personalized model performance. Luo et al. proposed APPLE [30], which investigates the personalization issue in cross-silo federated learning; its strength lies in adaptively learning personalized strategies for different data silos, improving model adaptability. Huang et al. introduced FedAMP [31], which explores personalized cross-silo federated learning on non-iid data and achieves effective personalization for such data, enhancing model performance. Li et al. proposed FedPHP [32], which achieves personalization through private model inheritance, enhancing model efficiency and performance.
However, despite the advantages of PFL algorithms in providing personalized models for each participating node, they face greater challenges than traditional federated learning algorithms in terms of model complexity, communication cost, stability, and convergence. Additionally, since PFL often focuses on personalized model performance, it may perform poorly on the global model, exhibiting weaker generalization ability. In light of this, we propose a new method, FedUB, aimed at further addressing the non-iid data issue in federated learning and optimizing model performance and convergence speed in real-world non-independent and identically distributed data environments. The core idea of FedUB is to introduce the concept of update bias, which accounts for the data differences between clients and dynamically adjusts the way each client participates in the global model update. By accurately calculating and adjusting the update bias, FedUB can more finely analyze the characteristics of each client's data, thereby achieving tighter collaboration in the training of the global model. Regarding the local model aggregation phase at the central server, Liu et al. proposed a framework named FedPA [33] to address the 'slow device problem' caused by participating devices with weak computational or communication capabilities. FedPA employs partial aggregation, focusing on real-time model availability: it aggregates only the models received in time, dynamically optimizing the aggregation strategy to reduce the impact of slow devices. However, issues remain: if the chosen number of aggregated models is inappropriate, the overall convergence speed may suffer, and if too few models are aggregated, each round's global update may carry insufficient information, requiring more rounds to reach the desired accuracy. FedUB instead adds adaptive weights for each client during the local model aggregation process. Experimental results show that FedUB significantly accelerates the convergence of the global model compared to other baseline methods and improves the model's performance and generalization ability on non-iid data.

3. Preliminaries

3.1. Traditional Federated Learning Algorithm

In federated learning, we assume there are $N$ clients, where the private local dataset of client $i$ is denoted by $D_i$. The objective is to train a global model on the global dataset $D = \cup_{i \in [N]} D_i$ by addressing the following objective:
$$\theta^{*} = \arg\min_{\theta}\, L(\theta) = \sum_{i=1}^{N} \frac{|D_i|}{|D|}\, L_i(\theta_i) \tag{4}$$
Here, $\theta$ represents the global model parameters, which determine the empirical loss $L(\theta)$ calculated over the entire dataset $D$. The number of samples in the local dataset of client $i$ is denoted by $|D_i|$, and the local empirical loss computed on client $i$'s local dataset can be represented as $L_i(\theta_i) = \mathbb{E}_{(x,y) \sim D_i}\, \ell(\theta_i; (x,y))$. To ensure data privacy is not compromised, raw data are not exchanged between clients. Consequently, the FedAvg algorithm was developed to enable multiple clients to collaborate with a central server in training a global model while preserving data privacy. Specifically, in each training round of FedAvg, all clients independently optimize their models on their local datasets and upload their model parameter updates to the server. The server then merges these updates (i.e., averages the local model parameters) to update the global model, as follows:
$$\theta = \sum_{i=1}^{N} \frac{|D_i|}{|D|}\, \theta_i \tag{5}$$
The specific process of FedAvg is as follows (Algorithm 1):
Algorithm 1. FedAvg
Server Procedure:
for global round $t = 0, 1, 2, \ldots, T-1$ do
    $S$ ← sample clients at random
    for $i \in S$ do
        $\theta_i^{t+}$ ← ClientProcedure($\theta^t$)
    end for
    $\theta^{t+} \leftarrow \sum_{i \in S} \frac{|D_i|}{|D_S|}\, \theta_i^{t+}$
end for
Client Procedure:
$\theta_i^t = \theta^t$
for local epoch $e = 1, 2, \ldots, K$ do
    update $\theta_i^t$ with one epoch of SGD on $F_i$ with step size $\mu$ to obtain $\theta_i^{t+}$
end for
Return: the updated model $\theta_i^{t+}$
In this framework, the global model parameters are denoted by $\theta$, while the local model parameters for each client $i$ are represented by $\theta_i$. Subsequently, the updated global model parameters are distributed to each client, serving as the initial parameters for the next round of local model training. This approach ensures the continuity and consistency of model training, allowing each client to adjust its local model based on the latest global model.
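Putting Algorithm 1 together, a compact simulation of one FedAvg training run might look like the sketch below, reusing the `fedavg_aggregate` helper sketched in Section 1; `local_train` stands in for the $K$ epochs of SGD on $F_i$ and is an assumed callback, not a library function.

```python
import copy
import random

def run_fedavg(global_model, clients, rounds, sample_frac, local_train):
    """Simulate Algorithm 1. `clients` is a list of (dataset, |D_i|) pairs;
    `local_train(model, dataset)` runs K local epochs of SGD and returns
    the updated state_dict."""
    for t in range(rounds):
        selected = random.sample(clients, max(1, int(sample_frac * len(clients))))
        states, sizes = [], []
        for dataset, n in selected:
            local = copy.deepcopy(global_model)          # theta_i^t = theta^t
            states.append(local_train(local, dataset))   # K epochs on F_i
            sizes.append(n)
        # theta^{t+} = sum_{i in S} (|D_i| / |D_S|) * theta_i^{t+}
        global_model.load_state_dict(fedavg_aggregate(states, sizes))
    return global_model
```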

3.2. The Challenges of Non-Independent and Identically Distributed Data

In typical scenarios, machine learning datasets are assumed to be iid (independently and identically distributed). For iid data, training and aggregating the model using Equations (4) and (5) [14] can yield a strong global model, because each client's data share the same distribution characteristics and can well represent the overall data distribution. The aggregated global model therefore often exhibits higher accuracy and generalization capability, since the model effectively encounters a balanced and comprehensive set of data samples during training.
However, when facing highly non-iid client data, the performance of the aggregated federated learning model can be significantly affected [34]. The underlying reason is that the optimal solutions of the local models on the clients may differ significantly from the optimal solution of the global model trained in the ideal setting, that is, on the entire dataset. When data are highly non-iid, models trained by each client may overfit their specific data distributions, and these locally optimized models often struggle to maintain generalization over the entire dataset after aggregation. As shown in Figure 1, in the MNIST setting, one client only has data for the digit '0'. During local training, that client may capture too many features specific to the digit '0', skewing the global model's parameters and leading it to predict the digit '8' as '0' during testing. The model thus becomes overly adapted to this client's local features, neglecting its global generalization ability.
In addition, data imbalance between clients is also a significant issue in the federated learning environment. Specifically, when some clients have much more data than others, aggregation functions in previous global models [14,15,16,17,19] tend to skew the model toward clients with larger datasets. This means the model performs better on data-rich clients but poorly on those with fewer data. Figure 1 illustrates this: if the first client only contains data with the label ‘0’ and has a significantly larger dataset than other clients, it will heavily influence the global model’s performance, weakening the global model’s generalization ability and causing incorrect predictions when other data types are encountered.
For traditional parameter averaging methods such as FedAvg, and even for FedProx, under the non-iid scenario depicted in Figure 2b, they fail to train a global model that performs well. As shown in Figure 2a, where the x-axis represents model parameters and the y-axis represents loss, the brown line illustrates the relationship between local model parameters and loss, while the blue line reflects the relationship between global model parameters and loss. Under ideal iid conditions, the optimal solutions of the local models are close to the global model's optimal solution, allowing both to converge effectively. However, in an actual non-iid setting, the deviation between the optimal solutions of the local models and the global optimum, as illustrated in Figure 2b, means that even after multiple aggregations, the global model may not reach the desired performance. When the data distributions of the clients differ significantly, the local optima do not coincide with the global optimum. This discrepancy prevents traditional federated learning algorithms from fully adapting to the unique characteristics of the data, thus diminishing the effectiveness of the aggregated global model.

4. The Proposed Method

To address this challenge, we propose a federated learning algorithm based on the deviation between local model updates and global model updates (FedUB). In addressing the challenges posed by non-iid datasets among clients, FedUB differs from previous methods that directly incorporate the approximation between local and global models into a $\|\theta_i - \theta\|$ regularization function [15]. Instead, it incorporates the approximation between each round's local model updates and the previous round's global model updates into a $\|g_i - g\|$ regularization function. In federated learning with non-iid client data, the optimal solutions of the client local models and the server's global model are not identical, and differences exist between them. Therefore, the conventional approach of adding a proximal term $\|\theta_i - \theta\|$ is not entirely appropriate for non-iid scenarios. Instead, FedUB abandons the proximal term used in traditional federated learning algorithms and mitigates data heterogeneity by introducing an update bias. To the best of our knowledge, this is the first instance where an update bias has been incorporated into the objective function. Federated learning is divided into two stages: local training at the client side and global aggregation at the server. During the client's local training phase, the server first distributes the previously aggregated global model to each client to initialize their models. After local training, each client submits their locally trained model to the server for aggregation. Following aggregation, the server begins the next round of training. As shown in Figure 2b, due to data heterogeneity, the optimal models of the client and server differ. Suppose that in a given round, the client and server models correspond to points A and B, respectively. We denote the local model parameter update for that round as $g_i$. Because of the lag in global model updates in federated learning, the global model parameter update from the previous round is denoted as $g$. The intuitive idea is to use $\|g_i - g\|$ to approximately constrain the local and global model updates in each round, so that the models maintain similar magnitudes and update directions during training, ultimately reaching their respective optimal points. As previously mentioned, the local and global models are not identical, and the training updates per round cannot ensure that they are completely aligned. In one round, the client and server models might correspond to points A and B, respectively, and in the next round, they might correspond to points C and D; the local and global model updates differ. Therefore, for the $t$-th round of training on client $i$, we introduce a compensatory term $r$, defined as $r_i^t = r_i^{t-1} + g_i - g$, which accumulates the difference between the prior local model update and the global model update. This term, named the update bias, is added to the regularization function to correct the client's local model training. Hence, the regularization function in FedUB is $\|g_i - g + r_i\|^2$.
Furthermore, during the aggregation phase of the global model, FedUB defines adaptive aggregation weights tailored to different local models. As previously mentioned, in unbalanced cases, models with large datasets are assigned greater weights, which can affect the global model's performance. Inspired by CFL [35], we define $P_i = \frac{\langle \theta_i^{t+}, \theta^t \rangle}{|\theta_i^{t+}|\,|\theta^t|}$ in the global aggregation phase to reflect the degree of similarity between each client's locally trained model and the global model. Since the global model represents training performance across the entire dataset, local models that are more similar to the global model receive higher weights, while models like the first client's in Figure 1 receive lower weights. This approach corrects the practice of assigning weights based only on dataset size, thereby enhancing the performance of the global model.
Our method fundamentally differs from FedProx, FedDC, and others. During the local training phase, FedUB allows local models to effectively learn the local data distribution while simultaneously learning the update bias to bridge the gap with the global model (Figure 3).

4.1. The Update Bias in FedUB

In FedUB's local training, we define an update bias $r_i$ for each client. Ideally, the update bias should represent the difference between the local model's update and the global model's update in the current round, namely $r_i = g_i - g$, where $g_i$ is the local model update of client $i$ and $g$ is the global model update. However, in practice, since the global model update can only be calculated after all clients have trained their respective models and uploaded them to the server, we approximate the current round's global model update using the previous round's global model update.
In practical non-iid scenarios, to comprehensively account for all historical deviation information and enhance learning stability, thereby avoiding large model performance fluctuations caused by extreme data in a particular round, we define the update bias of the $i$-th client after the $t$-th round of training as $r_i^{t+} = r_i^t + g_i^{t+} - g^t$. This bias, along with $\theta_i^{t+}$, is transmitted to the central server. Therefore, throughout the training process, to constrain the update bias between local and global models, we add the following penalty term for client $i$:
$$D_i(g_i, r_i, g) = \|g_i + r_i - g\|^2, \quad i \in [N] \tag{6}$$
Through this penalty term and the empirical loss function, each client can train their model and update the bias amount on their respective datasets.
In FedUB, the objective function for each client is composed of three parts: the empirical loss term, the penalty term, and the gradient correction term. Specifically, for the $i$-th client, its objective function is:
$$F_i(\theta_i, r_i, g_i, g) = L_i(\theta_i) + \frac{\lambda}{2} D_i(g_i, r_i, g) + C_i(\theta_i, g_i, g) \tag{7}$$
where $\theta_i$ is the local model of the $i$-th client, $r_i$ is the update bias, and $g_i$ and $g$ are the update amounts of the local and global models, respectively. Concurrently, inspired by Scaffold [17], we define the gradient correction term $C_i = \frac{1}{\mu K} \langle \theta_i, g_i - g \rangle$, where $\mu$ is the learning rate, $K$ is the number of iterations in a round, and $\langle \cdot, \cdot \rangle$ denotes the dot product between vectors. Intuitively, through the dot product, the gradient correction term measures the direction and magnitude of the client model parameters $\theta_i$ relative to the local gradient bias $g_i - g$. This measure is then used to adjust each client's update strategy, aligning it more closely with the global model's optimization direction while reducing the variance of the local gradients.
Parameter Update: The local model parameters are updated similarly to those of traditional federated learning algorithms. At the beginning of each round, the server first sends the global model parameters from the previous round to all clients. Each client $i$ loads the global model parameters into its local model, i.e., setting $\theta_i = \theta$, and then updates the local model by minimizing the local training objective. Mini-batch gradient descent is employed during training. With a learning rate of $\mu$, the update of the local model parameters at the $k$-th iteration of the $t$-th round is as follows:
$$\theta_i^{t,k+} = \theta_i^{t,k} - \mu\, \frac{\partial F\left(\theta_i^{t,k}, r_i, g_i, g\right)}{\partial \theta_i^{t,k}} \tag{8}$$
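The following sketch shows one way to assemble the FedUB local objective of Equations (6) and (7) in PyTorch. It assumes the running local update $g_i$ is measured as $\theta_i - \theta^t$ within the current round and that the bias direction in the correction term is held constant (detached) during an iteration; both are our illustrative choices, not details fixed by the text above.

```python
import torch

def fedub_local_loss(empirical_loss, local_params, global_params,
                     r_i, g_prev, lam, mu, K):
    """FedUB objective sketch (Equation (7)).

    r_i:    flat update-bias tensor for this client.
    g_prev: previous round's global update g = theta^t - theta^{t-1}, flat.
    """
    theta_i = torch.cat([p.reshape(-1) for p in local_params])
    theta_t = torch.cat([p.reshape(-1) for p in global_params]).detach()
    g_i = theta_i - theta_t                                # local update so far
    # Penalty term D_i = ||g_i + r_i - g||^2 (Equation (6)).
    penalty = torch.sum((g_i + r_i - g_prev) ** 2)
    # Gradient correction term C_i = <theta_i, g_i - g> / (mu * K).
    correction = torch.dot(theta_i, (g_i - g_prev).detach()) / (mu * K)
    return empirical_loss + (lam / 2.0) * penalty + correction
```

Minimizing this loss with mini-batch SGD, as in Equation (8), then yields the per-iteration update $\theta_i^{t,k+}$.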

4.2. Adaptive Aggregation Weights

During global model parameter updates, it is necessary to consider the impact of update biases on the global model caused by differences between local updates and global updates. In traditional federated learning, the aggregated updates of the central server model are obtained as shown in Equation (5), by weighted averaging based on the size of each client’s dataset.
In FedUB, taking into account the model update biases and the degree of divergence between the local models of the participating clients and the global model, we define the following formula to calculate the divergence metric:
$$P_i = \frac{\langle \theta_i^{t+}, \theta^t \rangle}{|\theta_i^{t+}|\,|\theta^t|} \tag{9}$$
Intuitively, when the angle between a local model and the global model is small, the two are very close, and such a local model should be given a higher weight. If the angle between the local model and the global model is obtuse, the calculated weight $P_i$ will be negative, meaning the global model updates in the direction opposite to that local model, which can also accelerate the convergence of the global model. Therefore, we define the weight of the $i$-th client as follows:
$$p_i = \frac{P_i}{\sum_{i=1}^{N} P_i} \tag{10}$$
We define the new global model update formula as follows:
$$\theta^{+} = \sum_{i=1}^{N} p_i \frac{|D_i|}{|D|} \left( \theta_i^{+} + r_i^{+} \right) \tag{11}$$
Unlike previous methods, such as Equation (5), which simply aggregate the local model parameters $\theta_i^{t+}$, FedUB aggregates both the local model parameters $\theta_i^{t+}$ and the update bias $r_i^{t+}$ while applying the adaptive weights $p_i$ to correct the local model parameters. Ultimately, the FedUB global model update is defined as in Equation (11), where $p_i$ represents the adaptive weight of the $i$-th client, calculated using Equation (10); $\theta^{+}$ denotes the updated global model; $|D_i|$ represents the number of samples in the dataset of the $i$-th client; $|D|$ represents the total number of samples across all clients; and $\theta_i^{+}$ represents the local model parameters obtained via Equation (8) after local training on the $i$-th client. Given the lag in global model updates in federated learning, we define the update formula for $r_i^{+}$ as $r_i^{+} = r_i + g_i^{+} - g$, where $g_i^{+} = \theta_i^{+} - \theta_i$ is the current local model parameter update and $g = \theta^t - \theta^{t-1}$ is the global model parameter update from the previous round. We use the global model's update from the previous round to approximate its update for the current round.
The specific process of FedUB is as follows (Algorithm 2):
Algorithm 2. FedUB (Proposed Framework)
Server Procedure:
for global round $t = 0, 1, 2, \ldots, T-1$ do
    $S$ ← sample clients at random
    $g^t = \theta^t - \theta^{t-1}$
    for $i \in S$ do
        $\theta_i^{t+}, r_i^{t+}$ ← ClientProcedure($\theta^t, r_i^t, g_i^t, g^t$)
    end for
    $P_i = \frac{\langle \theta_i^{t+}, \theta^t \rangle}{|\theta_i^{t+}|\,|\theta^t|}$
    $p_i = \frac{P_i}{\sum_{i=1}^{N} P_i}$
    $\theta^{t+} \leftarrow \sum_{i \in S} p_i \frac{|D_i|}{|D_S|} \left( \theta_i^{t+} + r_i^{t+} \right)$
end for
Client Procedure:
$\theta_i^t = \theta^t$
for local epoch $e = 1, 2, \ldots, K$ do
    update $\theta_i^t$ with one epoch of SGD on $F_i$ with step size $\mu$ to obtain $\theta_i^{t+}$
    $g_i^{t+} = \theta_i^{t+} - \theta_i^t$
    $r_i^{t+} = r_i^t + g_i^{t+} - g^t$
end for
Return: the updated model $\theta_i^{t+}$ and the update bias $r_i^{t+}$
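To illustrate the server side of Algorithm 2, the following is a sketch of Equations (9)-(11), assuming each client model and bias has been flattened into a 1-D tensor; the cosine-similarity weights and the bias-corrected average mirror the formulas above.

```python
import torch

def fedub_aggregate(theta_global, client_thetas, client_biases, client_sizes):
    """FedUB server step: divergence metric P_i (Equation (9)), adaptive
    weights p_i (Equation (10)), bias-corrected aggregation (Equation (11))."""
    total = float(sum(client_sizes))
    # P_i: cosine similarity between each local model and the global model.
    P = torch.stack([
        torch.dot(th, theta_global)
        / (torch.norm(th) * torch.norm(theta_global))
        for th in client_thetas
    ])
    p = P / P.sum()                                  # adaptive weights p_i
    new_global = torch.zeros_like(theta_global)
    for p_i, th, r, n in zip(p, client_thetas, client_biases, client_sizes):
        # theta^{t+} += p_i * (|D_i| / |D_S|) * (theta_i^{t+} + r_i^{t+})
        new_global = new_global + p_i * (n / total) * (th + r)
    return new_global
```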

5. Experiments and Discussion

In this section, we examine the performance of FedUB. Specifically, we compare it against several other advanced and popular federated learning algorithms on datasets such as MNIST, CIFAR10, and CIFAR100. Due to space limitations, more detailed experimental results are provided in Appendix B.

5.1. Data Sets

  • MNIST [36]: The MNIST dataset contains single-channel grayscale images of handwritten digits from 10 categories. It includes 60,000 images in the training set and 10,000 images in the test set, each with a dimension of 28 × 28 pixels. In total, the MNIST dataset comprises 70,000 images. This dataset serves as a benchmark for evaluating machine learning models on the task of handwritten digit recognition.
  • EMNIST [37]: The Extended MNIST (EMNIST) dataset is an expansion of the MNIST dataset, including handwritten digits and letters, covering a total of 47 categories (including digits, uppercase, and lowercase letters, with some categories merged to reduce confusion). It is divided into several subsets, totaling about 814,255 single-channel grayscale images, each with a dimension of 28 × 28 pixels. The EMNIST dataset maintains the format of the MNIST dataset and aims to provide a richer and more challenging data resource for handwritten text recognition tasks.
  • CIFAR10 [38]: The CIFAR10 dataset is a collection of small color images distributed across 10 categories, with a total of 60,000 images, each with a resolution of 32 × 32 pixels, including 50,000 for training and 10,000 for testing. The dataset is augmented through techniques such as random cropping and rotation to enhance the model’s generalization capability.
  • CIFAR100 [38]: Similar to CIFAR10, the CIFAR100 dataset contains 100 categories, with 600 images per category, of which 500 are for training and 100 for testing. All images are 32 × 32 pixels color images. CIFAR100 offers a richer array of categories, presenting a greater challenge to deep learning models for image recognition and classification.
For fairness in comparison and evaluation of different methods, we utilize the same neural network architectures for training. Specifically, on the MNIST and EMNIST datasets, we employ a three-layer fully connected neural network [26] for training on local clients. On the CIFAR10 and CIFAR100 datasets, we use the LeNet [39] convolutional neural network for local client training. Additionally, for each dataset, we simulate four scenarios that might be encountered in real life, to more accurately reflect the characteristics of data distribution in reality. Notably, we examine the impact of independent and identically distributed (iid) versus non-iid data settings on model training. In the iid setting, training samples are randomly selected and equally distributed among clients, ensuring that each client holds an equal amount of data, and these data are evenly distributed across all possible categories. In the non-iid setting, we use the Dirichlet distribution [40] to simulate the uneven distribution of data labels, specifically setting two different non-iid scenarios with Dirichlet distribution parameters of 0.6 and 0.3, respectively. Moreover, to further simulate the imbalance in data distribution, we generate this imbalance by sampling from a log-normal distribution and setting its variance to 0.3.
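For reference, label-skewed client splits of the kind described above can be generated as in the sketch below, a common Dirichlet partitioning recipe; the function name and seeding are our own, and the exact sampling procedure used in the experiments may differ.

```python
import numpy as np

def dirichlet_partition(labels, n_clients, alpha, seed=0):
    """Split sample indices among clients with label skew drawn from a
    Dirichlet distribution (alpha = 0.6 for D1, alpha = 0.3 for D2)."""
    rng = np.random.default_rng(seed)
    client_indices = [[] for _ in range(n_clients)]
    for c in range(int(labels.max()) + 1):
        idx = np.flatnonzero(labels == c)
        rng.shuffle(idx)
        # Proportions of this class assigned to each client.
        props = rng.dirichlet(alpha * np.ones(n_clients))
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for client, part in enumerate(np.split(idx, cuts)):
            client_indices[client].extend(part.tolist())
    return client_indices
```

Smaller values of alpha concentrate each class on fewer clients, which is why the Dirichlet-0.3 split is more non-iid than the Dirichlet-0.6 split.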

5.2. Results and Analysis

We conducted extensive experiments to determine the advantages of FedUB in terms of model convergence speed and performance. Additionally, we demonstrated the superiority of FedUB over other algorithms across various datasets and in terms of data heterogeneity.
In the MNIST, EMNIST, and CIFAR10 experiments, we set the number of communication rounds to 200, while for the CIFAR100 dataset it is set to 500. Table 1 below shows the accuracy of the various methods on the different datasets with a 15% participation rate among 100 clients, where D1 denotes a Dirichlet coefficient of 0.6 and D2 a Dirichlet coefficient of 0.3 (Figure 4). More details about the dataset settings are given in Appendix A.
Table 1 demonstrates the superior performance of our method, especially in handling heterogeneous data, showcasing excellent accuracy and convergence speed. Under the unbalanced CIFAR10 scenario, FedUB achieves an accuracy 0.86% higher than FedDC. Even under non-iid conditions with Dirichlet coefficients of 0.6 and 0.3, FedUB attains accuracies of 82.36% and 82.17%, respectively. In more strongly heterogeneous and extreme data environments, FedUB's accounting for the gradient differences between the global and local models through the learned update bias enables it to maintain its advantage over the other methods.
For the CIFAR10 dataset, we employ the LeNet neural network for training on local clients. In FedUB, the hyperparameter λ is set to 0.01. Additionally, we set the number of iterations to 200 rounds, batch size for local training phases to 5, initial learning rate to 0.1, and decay rate to 0.998, all following prior work [16]. The results indicate that FedUB surpasses other methods in accuracy across all four settings on the CIFAR10 dataset, also showing remarkably fast convergence speed (Figure 5 and Figure 6).
Table 2 shows the rapid convergence of FedUB on the CIFAR100 dataset, which we attribute to learning the update bias $r_i$ during training and to correcting and aggregating the local models in light of $r_i$ and the gradient bias. This significantly accelerates the global model's convergence. Specifically, as shown in Table 2, within 125 communication rounds, FedUB achieves global model test accuracies of 0.4721, 0.4623, 0.4556, and 0.4721 under the iid, Dirichlet-0.6, Dirichlet-0.3, and client-unbalance scenarios, respectively, outperforming the other methods. Compared to FedDC, which reaches accuracies of 0.472, 0.462, 0.455, and 0.472 under the iid, D1, D2, and unbalance scenarios in 152, 149, 140, and 153 communication rounds, respectively, FedUB is faster by 17.76%, 16.11%, 10.71%, and 18.3% in these four scenarios. This demonstrates FedUB's effectiveness in speeding up convergence and reducing the number of communication rounds between clients and the server.
In MNIST, the hyperparameter $\lambda$ is set to 0.1; under non-iid conditions, FedUB achieves the highest accuracy among all methods. In EMNIST, $\lambda$ is also set to 0.1; under both iid and non-iid conditions, FedUB achieves the highest accuracy among all methods. Additionally, FedUB demonstrates rapid convergence across these datasets.
Due to space limitations, the complete experimental results and the settings of hyperparameters will be provided in Appendix B.
In the following section, we will analyze the computational complexity of FedUB.
The FedUB algorithm consists of two main stages: local model training and global model aggregation.
Time Complexity:
1. Local Model Training: In FedUB, each client performs several local training iterations using mini-batch gradient descent (MBGD). If the number of local iterations on each client is denoted by $E$, the number of samples in client $i$'s dataset by $n_i$, and the cost of computing the model gradient by $O(d)$, where $d$ is the dimensionality of the model parameters, then the time complexity of local model training on client $i$ is:
$$O(E\, n_i\, d) \tag{12}$$
2. Global Model Aggregation: The server aggregates the local models from $K$ clients. Aggregation involves summing the model parameters and biases from all clients, with a time complexity of $O(Kd)$.
Therefore, the overall time complexity of FedUB is:
$$O\!\left(\sum_{i=1}^{K} E\, n_i\, d + K d\right) = O(K E \bar{n} d + K d) = O\big(K d (E \bar{n} + 1)\big) = O(K d E \bar{n}) \tag{13}$$
where $\bar{n}$ in Equation (13) denotes the average number of samples across all clients.
Space Complexity:
1. Client: Each client needs to store the model parameters $\theta_i$ and the bias $r_i$, each requiring $O(d)$ space. Additionally, during local training, the client might need extra space for gradient computation and intermediate variables, but this requirement is usually small compared to storing $\theta_i$ and $r_i$. Therefore, the total space complexity for each client is $O(d)$.
2. Server: The server stores the global model parameters $\theta$ and the aggregation weights $p_i$ for each client, with space complexities of $O(d)$ and $O(K)$, respectively. Therefore, the total space complexity for the server is $O(d + K)$.
In summary, the overall space complexity of FedUB is:
$$O(Kd + K) = O(Kd) \tag{14}$$
Therefore, the time complexity of the FedUB algorithm is $O(K d E \bar{n})$ and its space complexity is $O(Kd)$. Compared to methods like FedAvg [14] and FedDC [19], FedUB has the same asymptotic time complexity. Computationally, FedUB additionally calculates the update bias, and at the same time complexity it delivers better performance and faster convergence.

5.3. Discussion of FedUB

Better Performance of FedUB: Table 3 compares the test accuracies of FedUB and other baseline methods across different datasets and settings. On the CIFAR10 dataset, FedUB consistently achieves the highest test accuracy, while FedAvg and FedProx scored the lowest. For instance, in the non-iid simulation experiment where the Dirichlet coefficient is set to 0.3, FedUB’s test accuracy is 0.8217, FedDC’s accuracy is 0.8148, and FedAvg’s accuracy is 0.761. Additionally, in the unbalanced setting, FedUB achieves a 1.03% higher final test accuracy than FedDC. In 10 out of 16 trials, FedUB achieves the highest test accuracy, demonstrating its effectiveness in practical large-scale distributed settings. Table 2 compares the test accuracies of FedUB and other baseline methods on the CIFAR100 dataset at 125 rounds of communication, showing that FedUB has significantly higher accuracy than other baseline methods. Intuitively, FedUB introduces the update bias as a regularization term, optimizing the training process by considering the difference between each client’s local model update and the global model update. This approach further addresses the problem of non-iid data compared to other algorithms and, compared to FedDC, avoids the cumulative errors that drift during training may cause. Moreover, FedUB significantly accelerates the convergence speed of the global model through update bias and adaptive aggregation weights, improving the model’s generalization ability. Experimental results show that FedUB achieves higher accuracy while reducing the number of communication rounds, demonstrating strong stability and performance advantages.
Faster Convergence Speed of FedUB: We compared the convergence speed of FedUB with baseline methods on the CIFAR10 and CIFAR100 datasets using Dirichlet coefficients of 0.6 and 0.3, respectively. The results are presented in Table 3. We counted the communication rounds required by each method to reach a predetermined target accuracy, using the communication rounds needed for FedAvg to reach the target accuracy as the baseline for speed measurement. In the table, (>500) indicates that the method could not reach the target accuracy within 500 rounds, and SpeedUp values marked with (>) are calculated based on the communication rounds (>500) for FedAvg. The results show that FedUB outperforms all comparison methods in convergence speed on the CIFAR10 and CIFAR100 datasets with Dirichlet coefficients of 0.6 and 0.3. Specifically, on the CIFAR100 dataset with the Dirichlet coefficient set to 0.6, FedUB reached 0.4 accuracy in 82 communication rounds, making it more than 6.1 times faster than FedAvg. These results indicate that the FedUB algorithm is superior to baseline algorithms in terms of convergence speed.
Robustness of FedUB to Heterogeneous Data: non-iid data can slow down model convergence in practice [14]. By comparing the convergence curves in Appendix B, it is evident that data distribution significantly impacts model convergence speed and accuracy, with iid datasets converging faster than non-iid datasets. As shown in Table 3, FedUB achieves higher accuracy than baseline methods in six out of eight non-iid experiments across all datasets. Compared to other methods, training FedUB on more challenging tasks improves target accuracy, leading to greater communication savings. Specifically, for the CIFAR10 dataset, FedUB achieves 0.8236 accuracy under D1 and 0.8217 under D2 (where the Dirichlet coefficient is 0.3, more non-iid than the 0.6 setting), maintaining a competitive edge over other baseline algorithms.
Limitations of FedUB: Although FedUB significantly accelerates the convergence speed and enhances the performance of the model compared to other baseline methods [14,15,16,17,19], it still has certain limitations.
1. Introduction of a Regularization Term: FedUB introduces a novel regularization term into the clients' objective function. While regularization can mitigate overfitting, it also requires precise tuning of its weight; failing to adjust this weight appropriately may yield a suboptimal balance between generalization ability and convergence speed. It also increases the complexity of hyperparameter tuning across different datasets.
2. Robustness in the Face of More Clients: In our experiments, we only simulated scenarios with 100 clients. The effectiveness of FedUB in real-world situations with 500 or 1000 clients remains unclear.
3. Weight Allocation in the Model Aggregation Phase: In FedUB's model aggregation phase, weights are determined by the divergence metric between each client's local model and the global model. Initially, local models that are more similar to the global model are assigned larger weights, thereby accelerating convergence. In later stages, however, the weights of certain local models become too small, so the global model learns less from those clients' data, which in turn somewhat reduces its performance. This is reflected in the experiments: as shown in Table 1, the final accuracy of FedUB is lower than that of FedDC [19] in the CIFAR100 iid, D1, and D2 cases. For future improvements, introducing a weight adjustment $a_i$ for $p_i$ and utilizing a mixed dual-weight mechanism $a_i p_i$ could accelerate convergence while avoiding the negative impact of the single weight $p_i$ on the model's final accuracy.

6. Conclusions

In this paper, we propose an innovative federated learning algorithm, FedUB, to address challenges posed by non-iid data in federated learning. The FedUB algorithm effectively reduces the gap between local and global updates by incorporating the update bias into the local model’s objective function. Extensive experiments demonstrate that FedUB not only accelerates model convergence but also improves its generalization across different datasets. Notably, when facing challenges like non-iid data distribution and heterogeneous client environments, FedUB maintains robustness and generalization capabilities. Additionally, during the server-side aggregation stage, FedUB considers the impact of update bias, introducing a bias metric to dynamically adjust the weights of each client, thereby speeding up convergence and enhancing the global model’s performance. However, the introduction of a new regularization term increases the complexity of hyperparameter tuning. Moreover, adaptive weighting accelerates convergence in the early stages but may compromise model performance if some local models significantly differ from the global model in later stages. In the future, FedUB can further optimize the configuration of the update bias to reduce discrepancies between local and global models. Furthermore, exploring other types of regularization terms could further enhance the algorithm’s robustness. FedUB is suitable for various real-world federated learning scenarios with heterogeneous data and client environments, such as smart healthcare and intelligent transportation.

Author Contributions

Methodology, H.Z.; software, H.Z. and M.H.; validation, P.Z.; writing—original draft preparation, H.Z. and M.H.; writing—review and editing, P.Z.; visualization, M.L.; funding acquisition, M.L. and J.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the National Natural Science Foundation of China (No. 62102134), the Major Science and Technology Projects of Longmen Laboratory (No. 231100220300), the Key Scientific Research Project in Colleges and Universities of Henan Province of China (No. 21A510003, 23A520046, and 23A413005) and the Key Science and Technology Project of Henan Province of China (No. 222102210053, 232102210130, and 232102210138), Henan University of Science and Technology Student Innovation Key Project (No. 2023220).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. Detailed Settings for Datasets

Datasets: We utilize real-world datasets for image classification, including MNIST, EMNIST, CIFAR10, and CIFAR100. The EMNIST dataset is an extension of MNIST, containing images of letters in addition to digits, all consisting of single-channel grayscale images with a resolution of 28 × 28 pixels. For both MNIST and EMNIST, the training set contains 60,000 samples, and the test set contains 10,000 samples. We use a fully connected neural network (FCN) for classification, which includes an input layer, two fully connected hidden layers, and an output layer. The two hidden layers each contain 200 neurons. Both the CIFAR10 and CIFAR100 datasets include 60,000 3 × 32 × 32 color images. CIFAR10 has 10 classes, while CIFAR100 has 100. For CIFAR10 and CIFAR100, the training set contains 50,000 samples, and the test set contains 10,000. The CNN architecture for image classification in CIFAR10 and CIFAR100 includes two convolutional layers with 64 5 × 5 filters each, followed by a max-pooling layer after each convolutional layer, two fully connected layers with 394 and 192 neurons, respectively, and a softmax layer for prediction.
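For concreteness, one possible PyTorch rendering of these two architectures is sketched below. ReLU activations and unpadded convolutions are our assumptions where the text does not specify them, and the final softmax is folded into the cross-entropy loss as usual.

```python
import torch.nn as nn

class FCN(nn.Module):
    """MNIST/EMNIST classifier: two hidden layers of 200 neurons."""
    def __init__(self, n_classes=10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),
            nn.Linear(28 * 28, 200), nn.ReLU(),
            nn.Linear(200, 200), nn.ReLU(),
            nn.Linear(200, n_classes),  # softmax applied inside the loss
        )

    def forward(self, x):
        return self.net(x)

class CifarCNN(nn.Module):
    """CIFAR classifier: two 5x5 conv layers with 64 filters, each followed
    by max pooling, then fully connected layers of 394 and 192 neurons."""
    def __init__(self, n_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 64, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 5 * 5, 394), nn.ReLU(),  # 32x32 input -> 5x5 maps
            nn.Linear(394, 192), nn.ReLU(),
            nn.Linear(192, n_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```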
IID Setting: The experiments include three types of data settings: one iid setting and two non-iid settings. In the iid data distribution, all clients have the same number of samples, which are independently and identically distributed across the training dataset.
Non-IID Setting: In the non-iid settings, we sample data according to the Dirichlet distribution. Under the non-iid setup, the label proportions of each client follow a Dirichlet distribution. Each client’s samples are drawn from the entire training dataset based on the label proportions following the Dirichlet distribution. The Dirichlet distribution’s hyperparameter controls the degree of data heterogeneity, and we use two types of Dirichlet distributions with hyperparameters of 0.3 and 0.6, respectively. The Dirichlet-0.3 distribution is more non-iid than the Dirichlet-0.6 distribution.
Unbalance Setting: In the unbalance setting, the sample sizes vary across clients. To create unbalanced datasets, the number of samples each client has follows a log-normal distribution. The variance of the distribution is a hyperparameter, and in the iid setting, this variance is 0. In the unbalance setting, we set the variance to 0.3.

Appendix B. Simulation Experimental Results

In our simulation experiments, we tested the performance of FedUB and compared it with five other federated learning algorithms. To more closely approximate real-world scenarios, we set up 100 clients with a 15% participation rate, conducting experiments across four datasets and four data heterogeneity settings. Specifically, for the MNIST, EMNIST, and CIFAR10 datasets, we set the number of communication rounds to 200, and for the CIFAR100 dataset, to 500. The experimental results for CIFAR10, shown in Figure A1, indicate that at the end of training, FedUB's accuracy was higher than that of the other methods under all four data heterogeneity settings, demonstrating rapid convergence. In the results shown in Figure A2, under the unbalanced setting for CIFAR100, FedUB's final accuracy was 0.25% higher than that of FedDC. Moreover, as shown in Figure A5, in the first 25% of communication rounds on the CIFAR100 dataset, FedUB's accuracy was higher than that of the other algorithms under all four data heterogeneity settings. As illustrated in Figure A3 for the MNIST dataset, FedUB's accuracy exceeded that of the other algorithms under the Dirichlet coefficients of 0.6 and 0.3, i.e., the D1 and D2 settings. As shown in Figure A4 for the EMNIST dataset, FedUB's accuracy was higher than that of the other algorithms under the iid, D1, and D2 conditions. In summary, FedUB demonstrated superior performance across different datasets and data heterogeneity settings, significantly improving convergence speed.
Figure A1. Experimental results of CIFAR10 dataset.
Figure A2. Experimental results of CIFAR100 dataset.
Figure A3. Experimental results of MNIST dataset.
Figure A4. Experimental results of EMNIST dataset.
Figure A5. Experimental results for the first 25% of rounds on the CIFAR100 dataset.
Figure A6. MNIST—λ sensitivity of FedUB.
Hyperparameter Sensitivity in FedUB: In FedUB, there is only one manually set hyperparameter, $\lambda$, which weights the penalty term $D_i$ in $F_i(\theta_i, r_i, g_i, g) = L_i(\theta_i) + \frac{\lambda}{2} D_i(g_i, r_i, g) + C_i(\theta_i, g_i, g)$. This hyperparameter controls the weight of the penalty term in the objective function during client local training, adjusting the deviation between local updates and the global model. A larger $\lambda$ increases the penalty's influence, leading each client to prioritize consistency with the global model during local training, possibly at the cost of capturing local data features. Conversely, a smaller $\lambda$ reduces the penalty, allowing more local flexibility and enabling the model to better adapt to local data characteristics, but potentially causing deviations from the global model's consistency.
To determine the optimal value for $\lambda$, we ran the FedUB algorithm with varying values of $\lambda$ and observed the changes in model performance. These experiments were conducted on the MNIST, EMNIST, CIFAR10, and CIFAR100 datasets. In the experiments, we set a range of values for $\lambda$ and measured the accuracy and loss under each configuration. Specifically, for the EMNIST dataset, we experimented with $\lambda$ values of [0.01, 0.05, 0.1, 0.25, 0.5, 1]. As shown in Figure A7, performance was relatively poor with $\lambda$ values of 0.01 and 1, while reasonable values fell within the range [0.05, 0.25]. We therefore set $\lambda$ to 0.1 for all subsequent EMNIST experiments. Similarly, we tested $\lambda$ values of [0.01, 0.05, 0.1, 0.25, 0.5, 1] for the MNIST dataset and [0.001, 0.005, 0.01, 0.05, 0.5, 1] for the CIFAR10 and CIFAR100 datasets.
Figure A7. EMNIST—λ sensitivity of FedUB.
Figure A7. EMNIST—λ sensitivity of FedUB.
Mathematics 12 01601 g0a7
Figure A8. CIFAR10—λ sensitivity of FedUB.
Figure A9. CIFAR100—λ sensitivity of FedUB.
Ultimately, the optimal values of λ for each dataset were determined as follows: 0.1 for MNIST, 0.1 for EMNIST, 0.01 for CIFAR10, and 0.01 for CIFAR100. The loss curves show that training converges to a stable point for every tested value of λ, but a carefully chosen λ yields better performance. The selection procedure amounts to a simple grid search, sketched below.
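As a sketch, the grid search reduces to a few lines; run_fedub in the commented usage is an assumed driver that trains FedUB end-to-end and returns the final test accuracy, not part of any published API.

```python
def select_lambda(train_fn, grid):
    """Grid-search the penalty weight: train_fn(lam) -> final test accuracy.
    Returns the best lambda and the full accuracy-per-lambda mapping."""
    results = {lam: train_fn(lam) for lam in grid}
    return max(results, key=results.get), results

# Hypothetical usage with the EMNIST grid from the text:
# best, accs = select_lambda(lambda lam: run_fedub("EMNIST", lam=lam),
#                            [0.01, 0.05, 0.1, 0.25, 0.5, 1.0])
# Per the results above, this would be expected to select lambda = 0.1.
```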
Comparison with other recent methods: We expanded our experiments to include comparisons with recent methods, including FedBR, proposed by Guo et al. [41], which reduces local feature and classifier bias using pseudo-data and contrastive learning. We also compared against the three aggregation functions proposed by Nabavirazavi et al. [42], namely Switch, Layered-Switch, and Weighted FedAvg. These methods target the robustness of federated learning (FL) against model poisoning attacks, introducing randomness and mixture strategies into the aggregation step through non-deterministic and mixed aggregation functions.
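For intuition, the following is a toy sketch of a non-deterministic mixture aggregator in the spirit of [42]. The 0.5 switch probability and the coordinate-wise-median fallback are illustrative choices of ours, not the published Switch or Layered-Switch rules.

```python
import random
import torch

def mixed_aggregate(client_states, weights):
    """Toy randomized mixture aggregation: each call randomly switches
    between weighted FedAvg and a robust coordinate-wise median, making the
    aggregation rule harder for a poisoning adversary to anticipate.
    client_states: list of state dicts (parameter name -> tensor);
    weights: per-client aggregation weights summing to 1."""
    keys = client_states[0].keys()
    if random.random() < 0.5:
        # Weighted FedAvg branch
        return {k: sum(w * s[k] for w, s in zip(weights, client_states))
                for k in keys}
    # Robust branch: coordinate-wise median across clients
    return {k: torch.stack([s[k] for s in client_states]).median(dim=0).values
            for k in keys}
```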
FedBR employs globally shared pseudo-data, independent of the label distribution, to regularize local models, and reduces local feature and classifier learning bias through a min-max contrastive learning approach, significantly improving FL performance in heterogeneous data environments.
FedUB, by contrast, introduces an update bias to dynamically adjust the local training strategy, reducing the bias between local and global models, thereby improving generalization on heterogeneous data and accelerating convergence.
In Table A1, we report the number of communication rounds required for FedUB and FedBR to reach a specified accuracy on the CIFAR10 and CIFAR100 datasets with a Dirichlet coefficient of 0.1; the target accuracy was set to 0.4 for CIFAR10 and 0.3 for CIFAR100. We also compare the final accuracy achieved after 200 communication rounds on CIFAR10 and 400 rounds on CIFAR100. The results indicate that FedUB converges faster than FedBR in the early stages but reaches a lower final accuracy. This aligns with our analysis of FedUB's limitations in Section 5.3: the differential weighting of clients accelerates convergence early in training but may reduce model performance in the later stages.
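The SpeedUp entries in Table A1 (and in Table 3 below) are rounds-to-target ratios relative to a baseline method. A minimal helper, with names of our own choosing and accuracy curves as plain Python lists, makes the metric explicit:

```python
def rounds_to_target(acc_curve, target):
    """First communication round (1-indexed) at which test accuracy reaches
    `target`; returns None if the target is never reached."""
    for rnd, acc in enumerate(acc_curve, start=1):
        if acc >= target:
            return rnd
    return None

def speedup(baseline_curve, method_curve, target):
    """Round ratio baseline/method, matching the SpeedUp columns."""
    rb = rounds_to_target(baseline_curve, target)
    rm = rounds_to_target(method_curve, target)
    return rb / rm if rb is not None and rm is not None else None
```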
Table A1. Comparison of the communication rounds required to reach the specified accuracy (SpeedUp) and the final accuracy of FedUB and FedBR.

| Method     | CIFAR10 SpeedUp | CIFAR10 Accuracy | CIFAR100 SpeedUp | CIFAR100 Accuracy |
|------------|-----------------|------------------|------------------|-------------------|
| FedBR [41] | -               | 0.5298           | -                | 0.4732            |
| FedUB      | 1.32×           | 0.512            | 1.25×            | 0.4643            |
References

1. Hilbert, M. Big data for development: A review of promises and challenges. Dev. Policy Rev. 2016, 34, 135–174.
2. Lu, Y. Artificial intelligence: A survey on evolution, models, applications and future trends. J. Manag. Anal. 2019, 6, 1–29.
3. Stergiou, C.L.; Plageras, A.P.; Psannis, K.E.; Gupta, B.B. Secure machine learning scenario from big data in cloud computing via internet of things network. In Handbook of Computer Networks and Cyber Security: Principles and Paradigms; Springer: Berlin/Heidelberg, Germany, 2020; pp. 525–554.
4. Mughal, A.A. Cybersecurity Architecture for the Cloud: Protecting Network in a Virtual Environment. Int. J. Intell. Autom. Comput. 2021, 4, 35–48.
5. Jiang, W.; Wu, D.; Dong, W.; Ding, J.; Ye, Z.; Zeng, P.; Gao, Y. Design and validation of a non-parasitic 2R1T parallel hand-held prostate biopsy robot with remote center of motion. J. Mech. Robot. 2024, 16, 051009.
6. Buck, L.; McDonnell, R. Security and privacy in the metaverse: The threat of the digital human. In Proceedings of the CHI Conference on Human Factors in Computing Systems (CHI EA'22, Proceedings of the 1st Workshop on Novel Challenges of Safety, Security and Privacy in Extended Reality), New Orleans, LA, USA, 29 April–5 May 2022; ACM: New York, NY, USA, 2022.
7. Nissenbaum, H. Protecting privacy in an information age: The problem of privacy in public. In The Ethics of Information Technologies; Routledge: London, UK, 2020; pp. 141–178.
8. Konečný, J.; McMahan, H.B.; Yu, F.X.; Richtárik, P.; Suresh, A.T.; Bacon, D. Federated learning: Strategies for improving communication efficiency. arXiv 2016, arXiv:1610.05492.
9. Akhtarshenas, A.; Vahedifar, M.A.; Ayoobi, N.; Maham, B.; Alizadeh, T. Federated Learning: A Cutting-Edge Survey of the Latest Advancements and Applications. arXiv 2023, arXiv:2310.05269.
10. Zhou, L.; Pan, S.; Wang, J.; Vasilakos, A.V. Machine learning on big data: Opportunities and challenges. Neurocomputing 2017, 237, 350–361.
11. Heizmann, M.; Braun, A.; Glitzner, M.; Günther, M.; Hasna, G.; Klüver, C.; Krooß, J.; Marquardt, E.; Overdick, M.; Ulrich, M. Implementing machine learning: Chances and challenges. Automatisierungstechnik 2022, 70, 90–101.
12. Boulemtafes, A.; Derhab, A.; Challal, Y. A review of privacy-preserving techniques for deep learning. Neurocomputing 2020, 384, 21–45.
13. Zhang, J.; Chen, B.; Zhao, Y.; Cheng, X.; Hu, F. Data security and privacy-preserving in edge computing paradigm: Survey and open issues. IEEE Access 2018, 6, 18209–18237.
14. McMahan, B.; Moore, E.; Ramage, D.; Hampson, S.; y Arcas, B.A. Communication-efficient learning of deep networks from decentralized data. In Proceedings of the Artificial Intelligence and Statistics, Fort Lauderdale, FL, USA, 20–22 April 2017; JMLR: Cambridge, MA, USA; pp. 1273–1282.
15. Li, T.; Sahu, A.K.; Zaheer, M.; Sanjabi, M.; Talwalkar, A.; Smith, V. Federated optimization in heterogeneous networks. Proc. Mach. Learn. Syst. 2020, 2, 429–450.
16. Acar, D.A.E.; Zhao, Y.; Navarro, R.M.; Mattina, M.; Whatmough, P.N.; Saligrama, V. Federated learning based on dynamic regularization. arXiv 2021, arXiv:2111.04263.
17. Karimireddy, S.P.; Kale, S.; Mohri, M.; Reddi, S.; Stich, S.; Suresh, A.T. Scaffold: Stochastic controlled averaging for federated learning. In Proceedings of the International Conference on Machine Learning, Vienna, Austria, 12–18 July 2020; ACM: New York, NY, USA; pp. 5132–5143.
18. Qu, L.; Zhou, Y.; Liang, P.P.; Xia, Y.; Wang, F.; Adeli, E.; Fei-Fei, L.; Rubin, D. Rethinking architecture design for tackling data heterogeneity in federated learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 10061–10071.
19. Gao, L.; Fu, H.; Li, L.; Chen, Y.; Xu, M.; Xu, C.-Z. FedDC: Federated learning with non-iid data via local drift decoupling and correction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 10112–10121.
20. Mothukuri, V.; Parizi, R.M.; Pouriyeh, S.; Huang, Y.; Dehghantanha, A.; Srivastava, G. A survey on security and privacy of federated learning. Future Gener. Comput. Syst. 2021, 115, 619–640.
21. Rahman, A.; Hasan, K.; Kundu, D.; Islam, M.J.; Debnath, T.; Band, S.S.; Kumar, N. On the ICN-IoT with federated learning integration of communication: Concepts, security-privacy issues, applications, and future perspectives. Future Gener. Comput. Syst. 2023, 138, 61–88.
22. Stripelis, D.; Ambite, J.L. Federated learning over harmonized data silos. In Proceedings of the International Workshop on Health Intelligence, Washington, DC, USA, 13–14 February 2023; Springer: Berlin/Heidelberg, Germany; pp. 27–41.
23. Huang, C.; Huang, J.; Liu, X. Cross-silo federated learning: Challenges and opportunities. arXiv 2022, arXiv:2206.12949.
24. Zhu, H.; Xu, J.; Liu, S.; Jin, Y. Federated learning on non-IID data: A survey. Neurocomputing 2021, 465, 371–390.
25. Ma, X.; Zhu, J.; Lin, Z.; Chen, S.; Qin, Y. A state-of-the-art survey on solving non-IID data in Federated Learning. Future Gener. Comput. Syst. 2022, 135, 244–258.
26. Rumelhart, D.E.; Hinton, G.E.; Williams, R.J. Learning representations by back-propagating errors. Nature 1986, 323, 533–536.
27. Criado, M.F.; Casado, F.E.; Iglesias, R.; Regueiro, C.V.; Barro, S. Non-iid data and continual learning processes in federated learning: A long road ahead. Inf. Fusion 2022, 88, 263–280.
28. Xu, J.; Tong, X.; Huang, S.-L. Personalized federated learning with feature alignment and classifier collaboration. arXiv 2023, arXiv:2306.11867.
29. Zhang, J.; Hua, Y.; Wang, H.; Song, T.; Xue, Z.; Ma, R.; Guan, H. FedALA: Adaptive local aggregation for personalized federated learning. In Proceedings of the AAAI Conference on Artificial Intelligence, Washington, DC, USA, 7–14 February 2023; pp. 11237–11244.
30. Luo, J.; Wu, S. Adapt to adaptation: Learning personalization for cross-silo federated learning. In Proceedings of the IJCAI: Proceedings of the Conference, Vienna, Austria, 23–29 July 2022; Morgan Kaufmann: Amsterdam, The Netherlands; p. 2166.
31. Huang, Y.; Chu, L.; Zhou, Z.; Wang, L.; Liu, J.; Pei, J.; Zhang, Y. Personalized cross-silo federated learning on non-iid data. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual, 2–9 February 2021; pp. 7865–7873.
32. Li, X.-C.; Zhan, D.-C.; Shao, Y.; Li, B.; Song, S. FedPHP: Federated personalization with inherited private models. In Proceedings of the Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Bilbao, Spain, 13–17 September 2021; Springer: Berlin/Heidelberg, Germany; pp. 587–602.
33. Liu, J.; Wang, J.H.; Rong, C.; Xu, Y.; Yu, T.; Wang, J. FedPA: An adaptively partial model aggregation strategy in federated learning. Comput. Netw. 2021, 199, 108468.
34. Li, H.; Luo, L.; Wang, H. Federated learning on non-independent and identically distributed data. In Proceedings of the Third International Conference on Machine Learning and Computer Application (ICMLCA 2022), Shenyang, China, 16–18 December 2023; SPIE: Bellingham, WA, USA; pp. 154–162.
35. Wang, D.; Zhang, N.; Tao, M. Adaptive clustering-based model aggregation for federated learning with imbalanced data. In Proceedings of the 2021 IEEE 22nd International Workshop on Signal Processing Advances in Wireless Communications (SPAWC), Lucca, Italy, 27–30 September 2021; IEEE: New York, NY, USA; pp. 591–595.
36. LeCun, Y. The MNIST Database of Handwritten Digits. 1998. Available online: http://yann.lecun.com/exdb/mnist/ (accessed on 20 December 2023).
37. Cohen, G.; Afshar, S.; Tapson, J.; Van Schaik, A. EMNIST: Extending MNIST to handwritten letters. In Proceedings of the 2017 International Joint Conference on Neural Networks (IJCNN), Anchorage, AK, USA, 14–19 May 2017; IEEE: New York, NY, USA; pp. 2921–2926.
38. Krizhevsky, A.; Hinton, G. Learning Multiple Layers of Features from Tiny Images; University of Toronto: Toronto, ON, Canada, 2009.
39. LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324.
40. Yurochkin, M.; Agarwal, M.; Ghosh, S.; Greenewald, K.; Hoang, N.; Khazaeni, Y. Bayesian nonparametric federated learning of neural networks. In Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA, 10–15 June 2019; ACM: New York, NY, USA; pp. 7252–7261.
41. Guo, Y.; Tang, X.; Lin, T. FedBR: Improving federated learning on heterogeneous data via local learning bias reduction. In Proceedings of the International Conference on Machine Learning, Honolulu, HI, USA, 23–29 July 2023; ACM: New York, NY, USA; pp. 12034–12054.
42. Nabavirazavi, S.; Taheri, R.; Iyengar, S.S. Enhancing federated learning robustness through randomization and mixture. Future Gener. Comput. Syst. 2024, 158, 28–43.
Figure 1. Challenges brought to federated learning by non-iid and unbalanced situations.
Figure 2. Comparison of model performance with iid and non-iid data in federated learning. (The orange curve represents the relationship between the loss function and model parameters for the client, while the cyan curve represents the same relationship for the global model. In (a), points A and B represent the model parameters for the client and server, respectively, after one round of training in an ideal iid situation, while points C and D represent the model parameters for the client and server after the next round of training. (b) shows the parameters for the client and server after two rounds of training in a non-iid situation.)
Figure 3. The training process of the FedUB algorithm, where local and global parameters are iteratively updated on the Client and Server, respectively.
Figure 4. The accuracy curves of various federated learning algorithms under four settings on CIFAR10. (In the figures, the labels following methods such as FedAvg and FedProx correspond to the reference numbers cited in the text. Similar labels in later figures follow the same pattern.)
Figure 5. The accuracy curves of various federated learning algorithms under four different settings on CIFAR100.
Figure 6. The accuracy curves for the first 25% of rounds (125 rounds) of various federated learning algorithms under four different settings on CIFAR100.
Table 1. Performance results of different federated learning algorithms on multiple datasets.

| Data Set           | FedAvg [14] | FedProx [15] | Scaffold [17] | FedDyn [16] | FedDC [19] | FedUB  |
|--------------------|-------------|--------------|---------------|-------------|------------|--------|
| CIFAR10-iid        | 0.7911      | 0.7887       | 0.8224        | 0.8183      | 0.8322     | 0.8338 |
| CIFAR10-D1         | 0.7794      | 0.7797       | 0.8033        | 0.8060      | 0.8192     | 0.8236 |
| CIFAR10-D2         | 0.7610      | 0.7665       | 0.7905        | 0.7949      | 0.8148     | 0.8217 |
| CIFAR10-unbalance  | 0.7928      | 0.7895       | 0.8154        | 0.8188      | 0.8317     | 0.8403 |
| CIFAR100-iid       | 0.3935      | 0.3921       | 0.4925        | 0.5075      | 0.5468     | 0.5426 |
| CIFAR100-D1        | 0.4058      | 0.4068       | 0.4900        | 0.5033      | 0.5311     | 0.5271 |
| CIFAR100-D2        | 0.3982      | 0.4039       | 0.4900        | 0.4992      | 0.5277     | 0.5171 |
| CIFAR100-unbalance | 0.4029      | 0.4043       | 0.4980        | 0.5074      | 0.5315     | 0.5340 |
| MNIST-iid          | 0.9806      | 0.9814       | 0.9845        | 0.9838      | 0.9836     | 0.9840 |
| MNIST-D1           | 0.9790      | 0.9789       | 0.9840        | 0.9819      | 0.9838     | 0.9843 |
| MNIST-D2           | 0.9781      | 0.9775       | 0.9837        | 0.9828      | 0.9835     | 0.9841 |
| MNIST-unbalance    | 0.9807      | 0.9802       | 0.9833        | 0.9835      | 0.9843     | 0.9838 |
| EMNIST-iid         | 0.9450      | 0.9457       | 0.9538        | 0.9480      | 0.9555     | 0.9558 |
| EMNIST-D1          | 0.9418      | 0.9427       | 0.9537        | 0.9473      | 0.9541     | 0.9544 |
| EMNIST-D2          | 0.9373      | 0.9376       | 0.9482        | 0.9473      | 0.9523     | 0.9555 |
| EMNIST-unbalance   | 0.9445      | 0.9466       | 0.9525        | 0.9466      | 0.9573     | 0.9554 |
Table 2. The test accuracy of different federated learning algorithms on the CIFAR100 dataset under four settings at round 125 (the first 25% of rounds).

| Method        | iid    | D1     | D2     | Unbalance |
|---------------|--------|--------|--------|-----------|
| FedUB         | 0.4721 | 0.4626 | 0.4556 | 0.4721    |
| FedDC [19]    | 0.4339 | 0.4342 | 0.4333 | 0.4403    |
| FedDyn [16]   | 0.3905 | 0.3919 | 0.3974 | 0.4017    |
| Scaffold [17] | 0.4219 | 0.4145 | 0.4180 | 0.4291    |
| FedProx [15]  | 0.3263 | 0.3469 | 0.3526 | 0.3459    |
| FedAvg [14]   | 0.3249 | 0.3439 | 0.3474 | 0.3406    |
Table 3. On the CIFAR10 and CIFAR100 datasets under non-iid settings, the number of communication rounds required by different methods to reach the same target accuracy, and the speedup of each method relative to the baseline (FedAvg).

CIFAR10:

| Method        | Target Accuracy | Round (Dirichlet 0.6) | Speed Up (Dirichlet 0.6) | Round (Dirichlet 0.3) | Speed Up (Dirichlet 0.3) |
|---------------|-----------------|-----------------------|--------------------------|-----------------------|--------------------------|
| FedAvg [14]   | 0.70            | 64                    | -                        | 79                    | -                        |
|               | 0.73            | 86                    | -                        | 109                   | -                        |
|               | 0.75            | 109                   | -                        | 148                   | -                        |
| FedProx [15]  | 0.70            | 64                    | 1.00×                    | 78                    | 1.01×                    |
|               | 0.73            | 89                    | 0.97×                    | 109                   | 1.00×                    |
|               | 0.75            | 114                   | 0.96×                    | 150                   | 0.99×                    |
| Scaffold [17] | 0.70            | 59                    | 1.08×                    | 74                    | 1.07×                    |
|               | 0.73            | 72                    | 1.19×                    | 92                    | 1.18×                    |
|               | 0.75            | 92                    | 1.18×                    | 112                   | 1.32×                    |
| FedDyn [16]   | 0.70            | 52                    | 1.23×                    | 62                    | 1.27×                    |
|               | 0.73            | 69                    | 1.25×                    | 79                    | 1.38×                    |
|               | 0.75            | 82                    | 1.33×                    | 97                    | 1.53×                    |
| FedDC [19]    | 0.70            | 40                    | 1.60×                    | 47                    | 1.68×                    |
|               | 0.73            | 49                    | 1.76×                    | 64                    | 1.70×                    |
|               | 0.75            | 59                    | 1.85×                    | 70                    | 2.11×                    |
| FedUB         | 0.70            | 36                    | 1.78×                    | 41                    | 1.93×                    |
|               | 0.73            | 44                    | 1.95×                    | 52                    | 2.10×                    |
|               | 0.75            | 54                    | 2.02×                    | 62                    | 2.39×                    |

CIFAR100:

| Method        | Target Accuracy | Round (Dirichlet 0.6) | Speed Up (Dirichlet 0.6) | Round (Dirichlet 0.3) | Speed Up (Dirichlet 0.3) |
|---------------|-----------------|-----------------------|--------------------------|-----------------------|--------------------------|
| FedAvg [14]   | 0.35            | 138                   | -                        | 130                   | -                        |
|               | 0.38            | 281                   | -                        | 270                   | -                        |
|               | 0.40            | >500                  | -                        | 440                   | -                        |
| FedProx [15]  | 0.35            | 137                   | 1.01×                    | 116                   | 1.12×                    |
|               | 0.38            | 260                   | 1.08×                    | 234                   | 1.15×                    |
|               | 0.40            | 438                   | >1.14×                   | 367                   | 1.20×                    |
| Scaffold [17] | 0.35            | 70                    | 1.97×                    | 71                    | 1.83×                    |
|               | 0.38            | 89                    | 3.16×                    | 91                    | 2.97×                    |
|               | 0.40            | 108                   | >4.63×                   | 107                   | 4.11×                    |
| FedDyn [16]   | 0.35            | 100                   | 1.38×                    | 91                    | 1.43×                    |
|               | 0.38            | 115                   | 2.44×                    | 110                   | 2.45×                    |
|               | 0.40            | 132                   | >3.79×                   | 135                   | 3.26×                    |
| FedDC [19]    | 0.35            | 75                    | 1.84×                    | 77                    | 1.69×                    |
|               | 0.38            | 89                    | 3.16×                    | 96                    | 2.81×                    |
|               | 0.40            | 104                   | >4.81×                   | 105                   | 4.19×                    |
| FedUB         | 0.35            | 58                    | 2.38×                    | 56                    | 2.32×                    |
|               | 0.38            | 69                    | 4.07×                    | 71                    | 3.80×                    |
|               | 0.40            | 82                    | >6.10×                   | 79                    | 5.57×                    |