A Personalized Federated Learning Algorithm Based on Dynamic Weight Allocation

Liu, Yazhi; Li, Siwei; Li, Wei; Qian, Hui; Xia, Haonan

doi:10.3390/electronics14030484


by Yazhi Liu, Siwei Li, Wei Li *, Hui Qian and Haonan Xia

College of Artificial Intelligence, North China University of Science and Technology, Tangshan 063210, China

* Author to whom correspondence should be addressed.

Electronics 2025, 14(3), 484; https://doi.org/10.3390/electronics14030484

Submission received: 12 December 2024 / Revised: 19 January 2025 / Accepted: 24 January 2025 / Published: 25 January 2025

(This article belongs to the Section Artificial Intelligence)


Abstract:

Federated learning is a privacy-preserving distributed machine learning paradigm. However, due to client data heterogeneity, the global model trained by a traditional federated averaging algorithm often exhibits poor generalization ability. To mitigate the impact of data heterogeneity, some existing research has proposed clustered federated learning, where clients with similar data distributions are grouped together to reduce interference from dissimilar clients. However, since the data distribution of clients is unknown, determining the optimal number of clusters is difficult, leading to reduced model convergence efficiency. To address this issue, this paper proposes a personalized federated learning algorithm based on dynamic weight allocation. First, each client is allowed to obtain a global model tailored to fit its local data distribution. During the client model aggregation process, the server first computes the similarity of model updates between clients and dynamically allocates aggregation weights to client models based on these similarities. Secondly, clients use the received exclusive global model to train their local models via the personalized federated learning algorithm. Extensive experimental results demonstrate that, compared to other personalized federated learning algorithms, the proposed method effectively improves model accuracy and convergence speed.

Keywords:

federated learning; personalized federated learning; data heterogeneity; clustered federated learning; model aggregation

1. Introduction

With the rapid development of artificial intelligence, neural network models are now routinely trained on large-scale data to improve performance. However, the data held on user devices usually contain sensitive information about many users, and collecting these data on a central server for model training raises serious privacy concerns. To this end, Google proposed federated learning (FL) in 2017 [1,2]. As a distributed machine learning paradigm, FL lets multiple clients collaboratively train a global model while ensuring that the original data on each device are not exposed to third parties [3].

In real-world scenarios, data between different clients are often non-independent and identically distributed (Non-IID) [4], which poses a significant challenge to federated learning. Global models trained by traditional federated averaging (FedAvg) [1] often struggle to generalize effectively when the data distribution varies greatly between clients. Such differences in data distribution cause global models to perform poorly in capturing the unique features of each client’s local data, resulting in less than optimal model performance [5].

To address the problem of the statistical heterogeneity of data among clients, the existing research proposes personalized federated learning (PFL) [6] to mitigate the Non-IID problem by training a personalized model for each client that fits its own data distribution. There are various schemes for PFL, such as multi-task learning, knowledge distillation, and clustered federated learning (CFL). Among them, clustered federated learning [7] groups clients with similar data distributions and trains global models independently within each cluster to adequately fit the data distributions of similar clients within each cluster. However, the performance of some CFL algorithms [8,9] depends on the number of clusters they choose, and this value is usually obtained through empirical settings or hyperparameter optimization, leading to their limited usefulness in dealing with complex data distributions in real scenarios. In addition, most CFL algorithms mainly focus on intra-cluster client collaboration while neglecting inter-cluster collaboration, failing to fully integrate the data distribution characteristics of the clusters, and thus limiting the generalization ability of the model. Finally, existing CFL algorithms still use traditional weighted average aggregation; however, this method makes it difficult to fully capture the data distribution differences among clients, and it also makes it difficult to accurately assess the contribution of each client to the aggregation model, resulting in the performance of the global model favoring some clients. To address the above problems, we consider assigning aggregation weights based on the similarity between client models and aggregating an exclusive global model for each client that adapts to its own data distribution. This dynamic weight assignment method avoids the limitation of a pre-defined number of clusters in traditional clustering methods and also solves the difficulty of inter-group collaboration. 
By flexibly assigning aggregation weights to each client, it can more accurately enhance collaboration between similar clients and attenuate interference between dissimilar clients.

To this end, this paper proposes a personalized federated learning algorithm (FedDWA) based on dynamic weight assignment. The algorithm adaptively assigns aggregation weights to clients, fully considering the potential connection between client data distribution and model parameters, so that each client obtains an exclusive global model adapted to its local data distribution. Compared with the global model obtained by aggregation within each cluster in CFL, the aggregation of each client-exclusive global model takes into account the contributions among all clients, while avoiding introducing the problem of setting the number of clusters. Specifically, during the aggregation of client-exclusive global models, the server calculates the cosine similarity of model updates among clients to assign aggregation weights to the clients. The higher the inter-client similarity, the larger the corresponding aggregation weight. Subsequently, the server aggregates exclusive global models for clients based on the calculated aggregation weights. This aggregation method not only overcomes the limitation of a single global model in handling complex data but also effectively copes with the difference in data distribution among clients, thus achieving the matching of client-specific models and client-local data. In addition, considering that the exclusive global model is still affected by slight data heterogeneity, the client uses the exclusive global model as a guide model to train an additional personalized model locally, which further attenuates the effect of data heterogeneity. In summary, the main contributions of this paper are as follows:

We propose a model aggregation algorithm based on dynamic weight assignment, which assigns aggregation weights according to the cosine similarity of model updates between clients, strengthening collaboration between similar clients and attenuating interference between dissimilar clients.
We propose a personalized federated learning algorithm. In the local update phase, an additional personalized model is trained for each client, guided by its exclusive global model, to further attenuate the impact of data heterogeneity.
A performance evaluation on five image datasets, CIFAR-10, CIFAR-100, Tiny-ImageNet, MNIST, and Fashion-MNIST, shows that the proposed algorithm achieves higher accuracy than the baseline algorithms it is compared with.

The paper structure is organized as follows: Section 1 provides the background and a description of the problem, Section 2 discusses the related work, Section 3 describes the problem formulation in detail, Section 4 presents our solution to the problem, Section 5 details the experimental setup and results, and Section 6 concludes the whole paper.

2. Related Work

2.1. Federated Learning for Data Heterogeneous Scenarios

To reduce the impact of data heterogeneity, existing research mainly follows two directions: enhancing the generalization ability of the global model and developing personalized federated learning algorithms. In the first direction, FedProx, proposed by Li et al. [10], adds a regularization term to the optimization objective to reduce the negative impact of system heterogeneity and data heterogeneity on the performance of the global model. SCAFFOLD, proposed by Karimireddy et al. [11], uses control variates to correct the drift that arises when client models are updated. FedAvgM, proposed by Hsu et al. [12], updates the global model with a momentum-based optimization method to reduce the adverse impact of Non-IID data on the training process. FedRoD, proposed by Chen et al. [13], introduces a balanced Softmax loss function to counter label distribution shift and improve the generalization ability of the global model. In the second direction, FedAvg-FT, proposed by Cheng et al. [14], treats the local model updated by the client as a personalized model and evaluates its performance. Ditto, proposed by Li et al. [15], draws on federated multi-task learning: while updating the global model, it trains an additional personalized model for each client to reduce the impact of data heterogeneity. Flow, proposed by Panchal et al. [16], improves prediction accuracy by building a dynamic personalized model that chooses between local and global parameters based on each client’s data distribution and the specific instance.

2.2. Clustered Federated Learning

Clustered federated learning not only promotes cooperation between similar clients but also ensures that the aggregated model can better adapt to the data distribution of the clients [17]. Specifically, in CFL, the server first groups the clients according to the similarity of their data distribution, and then similar clients collaborate to train the global model within the cluster. CFL can be divided into one-time clustering and iterative clustering.

One-time clustering methods cluster clients only once during the entire federated learning training process and require the number of clusters to be set in advance. For example, Ghosh et al. [18] proposed a clustered federated learning algorithm based on K-Means. The algorithm first obtains the model parameters of the clients through preliminary training, then uses K-Means to cluster the clients once, and finally performs federated average aggregation within each cluster. Briggs et al. [19] proposed a federated learning algorithm that groups clients by hierarchical clustering. Liu et al. [20] divided clients into a preset number of K clusters based on the similarity of sparse vectors between clients. Yang et al. [21] proposed the G-FML algorithm, which combines personalized federated learning with clustering by first clustering the clients with K-Means and then improving model accuracy through PFL on the basis of the clusters. The limitation of these algorithms is that they ensure collaboration only between clients within a cluster and do not consider knowledge sharing between clusters, which makes it difficult to use the global characteristics of the clients’ data to improve the performance of the overall model.

To address the above problems, Ghosh et al. [9] proposed IFCA, an improved K-Means-based clustered federated learning algorithm. In each iteration, the server dynamically adjusts each client’s cluster according to the client’s loss function value, adapting to the client’s current data distribution. However, to determine a client’s cluster, this method must send the global models of all K clusters to each client, so its communication overhead is K times that of the FedAvg algorithm. To further improve the scalability of CFL, Sattler et al. [17] proposed FMTL, a more advanced adaptive iterative clustering algorithm that does not require the number of clusters to be determined in advance. In each iteration, the algorithm performs binary clustering based on the similarity of the clients and automatically adjusts the number of clusters. Although the algorithm improves model performance to a certain extent, it still introduces considerable computational overhead. Moreover, in real scenarios the server does not know the clients’ data distributions, so it is difficult for a clustered federated learning algorithm to infer the optimal number of clusters.

Different from the above methods, this paper proposes a personalized federated learning algorithm FedDWA, which can adaptively assign aggregation weights to each client so that each client can obtain an exclusive global model suitable for its local data distribution, thereby effectively integrating global knowledge and avoiding limiting knowledge to specific clusters. In addition, this method trains an additional personalized model for each client so that it can better fit the client’s data distribution.

3. Problem Formulation

3.1. Federated Learning Objective

In the standard federated learning framework, a central server coordinates $N$ clients to jointly train a global model $w$. Each client $i \in [N]$ is assumed to hold a local dataset $D_{i}$. Federated learning aims to solve the following problem:

$$\min_{w} G (F_{1} (w), \dots, F_{N} (w)) \tag{1}$$

where $F_{i} (w) = \frac{1}{| D_{i} |} \sum_{x \in D_{i}} f (w, x)$ represents the local objective of client $i$, and $f (w, x)$ represents the loss function of client $i$ evaluated on sample $x$. $G (\cdot)$ represents the aggregation function, which weights and aggregates the local objectives of the clients. In FedAvg, $G = \sum_{i = 1}^{N} p_{i} F_{i} (w)$, where $p_{i}$ is a predefined weight with $p_{i} > 0$ and $\sum_{i = 1}^{N} p_{i} = 1$.

Due to the diversity of client device environments, usage scenarios, and user behaviors, data distribution among clients is statistically heterogeneous. The weighted average aggregation used by the traditional federated averaging algorithm allocates aggregation weights based on the number of samples owned by the client but fails to consider the contributions of clients, resulting in the global model obtained by aggregation favoring some clients, meaning that a single global model cannot fit the data distribution of all clients.
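As a point of reference for the weighting just described, the following is a minimal sketch of sample-count-weighted averaging of client parameters, which is how FedAvg is usually applied in practice. The function name `fedavg_aggregate`, the flat-list model representation, and the numbers are illustrative assumptions, not taken from the paper.

```python
def fedavg_aggregate(client_models, sample_counts):
    """Sample-count-weighted average of client parameter vectors."""
    total = sum(sample_counts)
    # p_i = n_i / sum(n): every weight is positive and the weights sum to 1.
    weights = [n / total for n in sample_counts]
    dim = len(client_models[0])
    return [sum(p * m[d] for p, m in zip(weights, client_models)) for d in range(dim)]

# Two hypothetical clients with 2-dimensional models and unequal data sizes.
models = [[0.0, 2.0], [4.0, 2.0]]
counts = [300, 100]
global_model = fedavg_aggregate(models, counts)
```

Every client receives the same `global_model`, and the weights depend only on data volume, not on how similar the clients are to each other.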

3.2. FedDWA Objective

In real-world scenarios, training an additional personalized model for each client meets its specific data distribution and requirements better than training a single global model as in traditional FL. Therefore, this paper proposes a personalized federated learning algorithm for Non-IID scenarios. The algorithm determines the aggregation weights for each client by calculating the similarity between client models and uses them for customized aggregation. In this work, two main tasks are considered: (a) optimizing the global objective $G_{g} (\cdot)$ to update the exclusive global model $w_{g, i}$ of client $i$; and (b) optimizing the local objective $F_{i} (v_{i})$ to update the personalized model $v_{i}$ of client $i$. To relate these tasks, a regularization coefficient $λ$ is introduced to control the distance between the personalized model $v_{i}$ of client $i$ and its exclusive global model $w_{g, i}$. The optimization problem for each client $i$ is formulated as follows:

$$\min_{v_{i}} u_{i} (v_{i}; w_{g, i}) = F_{i} (v_{i}) + \frac{λ}{2} \| v_{i} - w_{g, i} \|^{2} \tag{2}$$

$$\text{s.t.} \quad w_{g, i} \in \underset{w}{\arg \min}\, G_{g} (F_{1} (w), \dots, F_{N} (w)) \tag{3}$$

When $λ$ is equal to 0, the FedDWA objective reduces to the individual client’s local objective $F_{i} (v_{i})$, and the model training process is equivalent to training only locally. Conversely, as $λ$ tends to infinity, the FedDWA objective approaches the global objective $G_{g} (\cdot)$, at which point the model training process is equivalent to FedAvg.
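The two limiting cases of $λ$ can be checked on a toy problem. The quadratic local objective below, with a hypothetical local optimum $a_{i}$, is an assumption made purely for illustration and does not appear in the paper; under it, the minimizer of Equation (2) has a closed form.

```latex
\begin{aligned}
F_{i}(v_{i}) &= \tfrac{1}{2}\,\| v_{i} - a_{i} \|^{2}
  && \text{(assumed toy objective)} \\
u_{i}(v_{i}; w_{g,i}) &= \tfrac{1}{2}\,\| v_{i} - a_{i} \|^{2}
  + \tfrac{\lambda}{2}\,\| v_{i} - w_{g,i} \|^{2} \\
\nabla_{v_{i}} u_{i} &= (v_{i} - a_{i}) + \lambda\,(v_{i} - w_{g,i}) = 0 \\
\Longrightarrow \quad v_{i}^{*} &= \frac{a_{i} + \lambda\, w_{g,i}}{1 + \lambda}
\end{aligned}
```

In this toy case, $\lambda = 0$ gives $v_{i}^{*} = a_{i}$ (purely local training), while $\lambda \to \infty$ gives $v_{i}^{*} \to w_{g,i}$; intermediate values interpolate between the two.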

4. Overview and Implementation

4.1. Overview

The overall process of FedDWA is shown in Figure 1. First, the server sends each exclusive global model to its corresponding client, which updates its local personalized model and the exclusive global model in turn. The personalized model is trained with the exclusive global model as a reference, and the hyperparameter $λ$ controls how closely the personalized model follows the exclusive global model. Next, the client uploads the model update corresponding to the updated exclusive global model to the server. The server calculates the cosine similarity between the model updates uploaded by the clients and computes the corresponding aggregation weights. Finally, the server aggregates an exclusive global model for each client.

4.2. Implementation and Algorithm Description

To address the above FedDWA objective, this work alternately updates the exclusive global model $\{w_{g, i}\}_{i \in [N]}$ and the personalized model $\{v_{i}\}_{i \in [N]}$ of each client.

For the exclusive global model $w_{g, i}$ of client $i$, when the server receives the model updates from the clients, it calculates the aggregation weights based on the similarity between the model updates. Clients with higher similarity receive larger aggregation weights. Machine learning offers many similarity measures, such as Euclidean distance, cosine similarity, and Jaccard distance; among them, Euclidean distance and cosine similarity are widely used. Euclidean distance measures the absolute distance between points in space and therefore depends directly on their positions, whereas cosine similarity measures the angle between vectors and therefore reflects differences in direction. Since model updates usually reflect the convergence direction of the model, and cosine similarity is more robust than Euclidean distance in evaluating the similarity of high-dimensional model parameters [22], cosine similarity is the more appropriate measure for model update vectors [23,24]. Therefore, this paper uses cosine similarity to measure the similarity between models. Taking the exclusive global model aggregation of client $i$ as an example, the server measures the cosine similarity between the model update vectors $Δ w^{i}$ and $Δ w^{j}$ as follows:

$$\mathrm{simi} (i, j) = \frac{\langle Δ w^{i}, Δ w^{j} \rangle}{\| Δ w^{i} \| \cdot \| Δ w^{j} \|}, \quad i, j \in C \tag{4}$$

where $Δ w^{i}$ and $Δ w^{j}$ denote the model updates of client $i$ and client $j$ in a particular training round, respectively, and $C$ is the set of clients participating in the current round. The value of $\mathrm{simi} (i, j)$ lies in the range $[-1, 1]$. The closer the value is to 1, the more similar the directions of the two model updates; the closer the value is to −1, the larger the difference in direction.
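The similarity in Equation (4) can be sketched in a few lines. This is an illustrative version that assumes each model update has been flattened into a plain list of floats; the function name and the zero-vector convention are assumptions, since the paper does not describe its implementation.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity of two flattened update vectors, as in Equation (4)."""
    if len(u) != len(v):
        raise ValueError("update vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Assumption: a zero update has no direction, so treat it as orthogonal.
        return 0.0
    return dot / (norm_u * norm_v)

delta_a = [0.2, -0.1, 0.4]    # hypothetical update of one client
delta_b = [0.4, -0.2, 0.8]    # same direction, twice the magnitude
delta_c = [-0.2, 0.1, -0.4]   # exactly the opposite direction
```

Here `delta_a` and `delta_b` differ only in scale, so their similarity is 1 up to rounding, while `delta_a` and `delta_c` give −1. This is the scale invariance that distinguishes cosine similarity from Euclidean distance.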

When choosing how to normalize the similarity values, we considered both direct scaling and Softmax. Because of its exponential form, Softmax amplifies the differences between larger values, so it emphasizes the contribution of high-similarity clients and suppresses the influence of low-similarity clients. This property is particularly useful when the task needs to differentiate the importance of different clients. Direct scaling also normalizes the values, but it is less sensitive to differences in similarity and produces a more even distribution of weights, which may understate the contributions of high-similarity clients. Therefore, we chose Softmax as the normalization method so that differences in similarity between clients are reflected more accurately. The cosine similarities between the model update of client $i$ and the model updates of the other clients are normalized with the Softmax function [25] to obtain the aggregation weight $β_{i j}$:

$$β_{i j} = \frac{e^{\mathrm{simi} (i, j)}}{\sum_{k \in C} e^{\mathrm{simi} (i, k)}} \tag{5}$$

where $β_{i j}$ denotes the aggregation weight that client $j$ receives when the exclusive global model of client $i$ is aggregated.
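A minimal sketch of the normalization in Equation (5) follows. The dictionary layout and the example similarity values are hypothetical; the maximum is subtracted before exponentiation only for numerical stability, which does not change the result.

```python
import math

def softmax_weights(similarities):
    """Map {client_id: simi(i, client_id)} to aggregation weights per Equation (5)."""
    # Subtracting the maximum leaves the ratios unchanged and avoids overflow.
    peak = max(similarities.values())
    exps = {k: math.exp(s - peak) for k, s in similarities.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}

# Hypothetical similarities of client 0 to every participating client (itself included).
simi_row = {0: 1.0, 1: 0.8, 2: -0.5}
beta = softmax_weights(simi_row)
```

Because cosine similarities lie in [−1, 1], the largest weight produced this way can exceed the smallest by at most a factor of e² (about 7.39), so dissimilar clients are down-weighted but never removed entirely.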

Meanwhile, taking the weight of the client’s own model update into account, the exclusive global model of client $i$ in round $t + 1$ is aggregated as follows:

$$w_{g, i}^{t + 1} = α \cdot w_{i}^{t} + \sum_{j \in C, j \neq i} β_{i j} \cdot w_{j}^{t} \tag{6}$$

where $w_{g, i}^{t + 1}$ denotes the exclusive global model of client $i$ in round $t + 1$, $w_{i}^{t}$ is the model obtained by training the exclusive global model $w_{g, i}^{t}$ locally on client $i$, and $w_{j}^{t}$ is the model trained from the exclusive global model of another client $j$ ($j \neq i$). $α$ denotes the aggregation weight of client $i$ itself, and $β_{i j}$ is the weight of each other client $j$. The sum of $α$ and all $β_{i j}$ is 1:

$$α + \sum_{j \in C, j \neq i} β_{i j} = 1 \tag{7}$$
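Equations (6) and (7) amount to a convex combination of the locally trained models. The sketch below assumes models are flat lists of floats and that $α$ is simply the weight stored for client $i$ itself (for example, the Softmax weight of a client with itself); the paper does not state how $α$ is chosen, so that reading is an assumption, as are the function name and the numbers.

```python
def aggregate_exclusive_model(i, local_models, beta_row):
    """Weighted sum of client models for client i, following Equation (6).

    local_models: {client_id: list of floats}, the locally trained w_j^t.
    beta_row:     {client_id: weight} summing to 1; beta_row[i] plays the
                  role of alpha (an assumption, see the text above).
    """
    if abs(sum(beta_row.values()) - 1.0) > 1e-9:
        raise ValueError("aggregation weights must sum to 1 (Equation (7))")
    dim = len(local_models[i])
    alpha = beta_row[i]
    # Start from the client's own contribution, alpha * w_i^t.
    aggregated = [alpha * x for x in local_models[i]]
    for j, w_j in local_models.items():
        if j == i:
            continue
        for d in range(dim):
            aggregated[d] += beta_row[j] * w_j[d]
    return aggregated

models = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}   # hypothetical w_j^t
weights = {0: 0.5, 1: 0.25, 2: 0.25}                      # alpha = 0.5
w_g0 = aggregate_exclusive_model(0, models, weights)      # [0.75, 0.5]
```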

After aggregation, the server sends the exclusive global model to the client, and the client then performs personalization training locally. Personalization training consists of two steps: the personalized model update and the exclusive global model update. To improve training efficiency, the data are split into mini-batches and fed into the network iteratively. Training uses the stochastic gradient descent (SGD) algorithm, which is efficient on mini-batches. SGD first computes the error $F_{i} (w_{g, i})$ between the model-predicted labels and the true labels by forward propagation, and then computes the gradient $\nabla F_{i} (w_{g, i})$ with respect to the model parameters by backpropagation, which is used to update the parameters and reduce the error. The exclusive global model of client $i$ is updated as follows:

$$w_{i}^{t} = w_{g, i}^{t} - η \nabla F_{i} (w_{g, i}^{t}) \tag{8}$$

where $η$ is the learning rate and $\nabla F_{i} (w_{g, i}^{t})$ denotes the gradient of the loss function of client $i$.

To better fit the client’s data distribution, each client trains an additional personalized model locally. The personalized model is trained with the exclusive global model as a reference, and the hyperparameter $λ$ controls the distance between the two so that the model balances global and personalized learning. Mini-batch gradient descent is again used to improve training efficiency, and a regularization term is added to the cross-entropy loss of the personalized model. The local gradient $\nabla F_{i} (v_{i}^{t})$ lets the model fit the client’s own data distribution, while the regularization term $λ (v_{i}^{t} - w_{g, i}^{t})$ controls the distance between the personalized model and the exclusive global model, which keeps the model from overfitting the local data or deviating from the direction of global optimization. The personalized model $v_{i}^{t}$ of client $i$ is updated as follows:

$$v_{i}^{t + 1} = v_{i}^{t} - η (\nabla F_{i} (v_{i}^{t}) + λ (v_{i}^{t} - w_{g, i}^{t})) \tag{9}$$

where $v_{i}^{t}$ denotes the personalized model of client $i$ in round $t$, $λ$ is the regularization coefficient, and $w_{g, i}^{t}$ denotes the exclusive global model received by client $i$ in round $t$.
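The two update rules in Equations (8) and (9) can be exercised on a toy problem. The sketch below makes several assumptions that are not in the paper: parameters are flat lists, the loss is a simple quadratic with a hypothetical `target`, and when several steps are taken the gradient in Equation (8) is evaluated at the running copy of the model rather than at the received $w_{g, i}^{t}$.

```python
def local_step(w, v, w_ref, grad_fn, lr, lam):
    """One gradient step of Equations (8) and (9) on flat parameter lists.

    w:       running copy of the exclusive global model being trained
    v:       current personalized model
    w_ref:   exclusive global model received from the server (regularization anchor)
    grad_fn: callable returning the loss gradient at a parameter vector
    """
    g_w = grad_fn(w)
    g_v = grad_fn(v)
    new_w = [wi - lr * gi for wi, gi in zip(w, g_w)]
    new_v = [vi - lr * (gi + lam * (vi - ri)) for vi, gi, ri in zip(v, g_v, w_ref)]
    return new_w, new_v

# Toy loss F(x) = 0.5 * ||x - target||^2, whose gradient is x - target.
target = [1.0, -1.0]

def grad(x):
    return [xi - ti for xi, ti in zip(x, target)]

w_ref = [0.0, 0.0]
w, v = list(w_ref), list(w_ref)
for _ in range(200):
    w, v = local_step(w, v, w_ref, grad, lr=0.1, lam=1.0)
```

With `lam=1.0` the personalized model settles halfway between the local optimum `target` and the anchor `w_ref`, while the unregularized copy `w` moves all the way to `target`. This is the balance between local fit and global reference that the regularization term is meant to control.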

The specific process of FedDWA is shown in Algorithms 1 and 2. The flowchart of the FedDWA algorithm is shown in Figure 2.

Algorithm 1. FedDWA Client Algorithm.

Input: exclusive global model $w_{g, i}^{t}$
Output: $w_{i}^{t}$ // exclusive global model after training
1: Client $i$ receives its corresponding exclusive global model
2: for each local epoch do
3:   for each batch $b = \{x, y\}$ of $D_{i}$ do
4:     $w_{i}^{t} = w_{g, i}^{t} - η \nabla F_{i} (w_{g, i}^{t})$
5:     $v_{i}^{t + 1} = v_{i}^{t} - η (\nabla F_{i} (v_{i}^{t}) + λ (v_{i}^{t} - w_{g, i}^{t}))$
6:   end for
7: end for
8: return $w_{i}^{t}$

Algorithm 2. FedDWA Server Algorithm.

Input: client set $C$, where $| C | = N$; total communication rounds $T$; learning rate $η$; local epochs $E$; client participation rate $F$.
Output: personalized models $\{v_{i}\}_{i \in [N]}$
Initialize: exclusive global model $w_{g}^{0}$, personalized models $\{v_{i}\}_{i \in [N]}$
1: $C_{F} \leftarrow$ randomly select $N \times F$ clients
2: for $t = 0, \dots, T - 1$ do
3:   for each client $i$ in parallel do
4:     Send the exclusive global model $w_{g, i}^{t}$ to client $i$.
5:     $w_{i}^{t} \leftarrow$ Algorithm 1 ($w_{g, i}^{t}$)
6:     for client $j \in C_{F}$ do
7:       $\mathrm{simi} (i, j) \leftarrow \mathrm{similarity} (w_{i}^{t}, w_{j}^{t})$ // Calculate the similarity between $w_{i}^{t}$ and $w_{j}^{t}$.
8:       $β_{i j} = \frac{e^{\mathrm{simi} (i, j)}}{\sum_{k \in C_{F}} e^{\mathrm{simi} (i, k)}}$ // Normalize to get the aggregation weight.
9:     end for
10:    $w_{g, i}^{t + 1} = α \cdot w_{i}^{t} + \sum_{j \in C_{F}, j \neq i} β_{i j} \cdot w_{j}^{t}$
11:   end for
12: end for
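To see how the server-side steps fit together, the sketch below runs a single aggregation round for four hypothetical clients that form two opposing groups. It is an illustration under stated assumptions rather than the authors' implementation: the model update is taken to be the difference between the trained and the received model, the self-weight $α$ is taken to be the Softmax weight of a client with itself, and local training is replaced by fixed example vectors.

```python
import math

def cosine(u, v):
    """Cosine similarity of two flat vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def server_round(received, trained):
    """One aggregation round over the participating clients.

    received: {client_id: exclusive global model sent this round}
    trained:  {client_id: model returned by local training}
    Returns   {client_id: next exclusive global model}.
    """
    # Assumption: the model update is the difference trained - received.
    updates = {i: [a - b for a, b in zip(trained[i], received[i])] for i in trained}
    next_models = {}
    for i in trained:
        # Softmax over the similarities of client i to every participant.
        exps = {j: math.exp(cosine(updates[i], updates[j])) for j in trained}
        total = sum(exps.values())
        dim = len(trained[i])
        next_models[i] = [
            sum(exps[j] / total * trained[j][d] for j in trained) for d in range(dim)
        ]
    return next_models

received = {i: [0.0, 0.0] for i in range(4)}
# Clients 0 and 1 move in one direction, clients 2 and 3 in the opposite one.
trained = {0: [1.0, 0.1], 1: [0.9, 0.0], 2: [-1.0, 0.1], 3: [-0.9, 0.0]}
nxt = server_round(received, trained)
plain_average = sum(m[0] for m in trained.values()) / len(trained)
```

In this toy setting a plain average cancels the two groups out in the first coordinate, whereas each exclusive model stays on its own group's side. This is the behavior the similarity-based weights are designed to produce.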

4.3. Computational Complexity Analysis

In the following section, we will analyze the computational complexity of the FedDWA algorithm. The FedDWA algorithm consists of two main phases: model training and exclusive global model aggregation.

Time complexity:

1. Model training: In FedDWA, each client uses its local dataset $D_{i}$ to train the personalized model $v_{i}$ and the exclusive global model $w_{g, i}$. Specifically, if the number of local iterations for each client is $E$, the number of samples in the client dataset $D_{i}$ is $n_{i}$, and the complexity of computing the model gradient is $O (d)$ (where $d$ denotes the dimensionality of the model parameters), then the time complexity of training the personalized model and the exclusive global model on each client is $O (2 \cdot E \cdot n_{i} \cdot d)$.

2. Exclusive global model aggregation: The server aggregates exclusive global models from $K$ clients. The aggregation process involves a weighted summation of model parameters across all clients, with a time complexity of $O (K \cdot d)$. Therefore, the overall time complexity of FedDWA is as follows:

$$\begin{array}{l} O \left(2 \cdot \sum_{i = 1}^{K} E \cdot n_{i} \cdot d + K \cdot d\right) \\ = O \left(2 \cdot E \cdot d \cdot \sum_{i = 1}^{K} n_{i} + K \cdot d\right) \\ \approx O (2 \cdot E \cdot d \cdot K \cdot \bar{n} + K \cdot d) \\ = O (K \cdot d \cdot (2 \cdot E \cdot \bar{n} + 1)) \\ \approx O (K \cdot d \cdot 2 \cdot E \cdot \bar{n}) \\ \approx O (K \cdot E \cdot \bar{n} \cdot d) \end{array} \tag{10}$$

where $\bar{n}$ in Equation (10) denotes the average number of samples across all clients.

Space complexity:

1. Clients: Each client stores the personalized model parameters $v_{i}$ and the exclusive global model parameters $w_{g, i}$, each with a storage space complexity of $O (d)$. During local training, the client may also use some space for gradient information and intermediate computation results, but these requirements are usually small compared to the space for storing $v_{i}$ and $w_{g, i}$. Therefore, the total space complexity of each client is $O (d)$.

2. Server: The server stores the exclusive global model parameters $w_{g, i}$ and the aggregation weights $β_{i j}$ for each client. For one client, the space complexity of storing the exclusive global model parameters is $O (d)$ and that of storing the aggregation weights is $O (K)$, so the server needs $O (d + K)$ space per client.

To summarize, over all $K$ clients the overall space complexity of the FedDWA algorithm is as follows:

$$O (K \cdot d + K) \approx O (K \cdot d) \tag{11}$$

Therefore, the time complexity of the FedDWA algorithm is $O (K \cdot E \cdot \bar{n} \cdot d)$, and the space complexity is $O (K \cdot d)$. FedDWA thus has the same time complexity as algorithms such as FedAvg, Ditto, and IFCA, while comparing favorably with them in terms of performance and convergence speed.

5. Experiments and Analysis

5.1. Datasets

We used five real-world datasets to evaluate the performance of the algorithms: CIFAR-10, CIFAR-100, Tiny-ImageNet, MNIST, and Fashion-MNIST (FMNIST). The CIFAR-10 dataset consists of 60,000 RGB color images in 10 categories with a resolution of 32 × 32 pixels; each category contains 6000 images, of which 5000 are training images and 1000 are test images. CIFAR-100 is an extended version of the CIFAR-10 dataset that is widely used in image classification tasks. It contains 60,000 color images of 32 × 32 pixels divided into 100 categories of 600 images each, and is split into 50,000 training images and 10,000 test images. The Tiny-ImageNet dataset is a subset of the ImageNet dataset with more diverse categories than the previous two, containing a total of 200 categories of image data with a resolution of 64 × 64 pixels; each category contains 500 training samples and 50 validation samples.

The MNIST dataset consists of 70,000 grayscale images of handwritten digits in 10 classes, each with a resolution of 28 × 28 pixels. The dataset includes 60,000 training images and 10,000 test images. The FMNIST dataset consists of 70,000 grayscale images of fashion products in 10 classes, each with a resolution of 28 × 28 pixels. Similar to the MNIST dataset, FMNIST includes 60,000 training images and 10,000 test images.

In this paper, we considered two methods of data partitioning that are widely used in the field of federated learning to simulate practical heterogeneous data distribution and pathological heterogeneous data distribution, respectively. In the practical heterogeneous distribution, we controlled the distribution of data on each client through Dirichlet distribution. Among them, for the three datasets of CIFAR-10, CIFAR-100, and Tiny-ImageNet, we chose the same Dirichlet coefficient (denoted as dir) for the experiment and set the value of dir to 0.3 and 0.5, where smaller dir values indicate higher heterogeneity. Pathological heterogeneous data distribution is the earliest heterogeneous data scenario considered and studied in federated learning. In pathological heterogeneous data distribution, we only assigned 2 classes to each client from the 10 classes of CIFAR-10, MNIST, and FMNIST, and ensured that the data samples owned by each client were disjoint [1]. Due to space limitations, the complete experimental setup and experimental results under pathological heterogeneous data distribution will be provided in Appendix A and Appendix B.
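The paper does not list its partitioning code. The sketch below shows one common label-wise recipe for Dirichlet partitioning, written with the standard library only. The function name, the rounding scheme, and the synthetic label list are assumptions for illustration; `alpha` corresponds to the coefficient the text calls dir.

```python
import random

def dirichlet_partition(labels, num_clients, alpha, seed=0):
    """Split sample indices across clients with per-class Dirichlet(alpha) proportions."""
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)

    client_indices = [[] for _ in range(num_clients)]
    for y in sorted(by_class):
        idxs = by_class[y]
        rng.shuffle(idxs)
        # A Dirichlet sample is a vector of Gamma(alpha, 1) draws divided by their sum.
        draws = [rng.gammavariate(alpha, 1.0) for _ in range(num_clients)]
        total = sum(draws)
        start, cumulative = 0, 0.0
        for c in range(num_clients):
            cumulative += draws[c] / total
            # The last client takes the remainder so every index is assigned once.
            end = len(idxs) if c == num_clients - 1 else int(round(cumulative * len(idxs)))
            end = max(end, start)
            client_indices[c].extend(idxs[start:end])
            start = end
    return client_indices

labels = [i % 10 for i in range(1000)]   # hypothetical 10-class label list
parts = dirichlet_partition(labels, num_clients=5, alpha=0.3)
```

Smaller `alpha` values concentrate each class on fewer clients, which matches the statement that smaller dir values indicate higher heterogeneity.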

5.2. Model Settings

For all datasets, a convolutional neural network model was used in this paper for pattern recognition. The model consists of a feature extractor and a classifier. The feature extractor consists of two convolutional layers, each followed by a ReLU activation function and a 2 × 2 maximum pooling layer. The first convolutional layer uses 32 5 × 5 convolutional kernels to extract basic features from the input features, and the second convolutional layer uses 64 5 × 5 convolutional kernels for feature extraction. After feature extraction, the model maps the extracted features onto a 512-dimensional vector through a fully connected layer. This vector is passed through the ReLU activation function and finally converted into the output of the model by another fully connected layer.
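As a quick check on the architecture above, the following sketch traces the spatial size of the feature maps to find the input width of the first fully connected layer. The text does not state the convolution padding or stride, so the sketch assumes no padding, stride 1 for the convolutions, and stride 2 for the 2 × 2 pooling; the helper names are hypothetical.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output side length of a square convolution (standard formula)."""
    return (size + 2 * padding - kernel) // stride + 1


def feature_dim(input_size, padding=0):
    """Flattened feature size after two (5x5 conv -> ReLU -> 2x2 max pool) stages."""
    size = input_size
    channels = 0
    for out_channels in (32, 64):  # 32 kernels in the first layer, 64 in the second
        size = conv_out(size, kernel=5, padding=padding)
        size = size // 2           # 2x2 max pooling, stride 2 assumed
        channels = out_channels
    return channels * size * size


for name, side in [("MNIST/FMNIST", 28), ("CIFAR-10/100", 32), ("Tiny-ImageNet", 64)]:
    # This value is the input width of the fully connected layer that maps to 512 units.
    print(name, feature_dim(side))
```

Under these assumptions the flattened feature size differs per dataset, so the first fully connected layer would need a dataset-specific input width even though the rest of the model is shared.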

5.3. Baselines

In this paper, nine FL algorithms were selected for comparison: one classical FL algorithm, FedAvg, and eight PFL algorithms. FedAvg-FT deploys the trained global model to the client and performs a step of fine-tuning on the local data; Ditto achieves fairness and robustness by introducing a regularization term that keeps the client model aligned with the global model; IFCA minimizes the loss function by estimating each client’s cluster identity; FedGH [26] achieves personalization by introducing a uniform global header to reconcile model differences across clients and applying personalized tuning on top of it; FedPAC [27] achieves personalized federated learning through feature alignment and collaborative classifiers; FedSoft [7] allows each client to belong to multiple clusters at the same time, thus enabling a more flexible collaboration structure; FedCollab [28] improves model performance by clustering clients based on the data distribution distance and the data volume to optimize the client collaboration structure; and FedALA [29] adaptively aggregates the global and local models through an adaptive local aggregation module that optimizes the local objective of each client, thus addressing the problem of statistical heterogeneity in federated learning.

5.4. Parameter Settings

In all experiments, this paper used the SGD optimizer to train the models, and the loss functions are all cross-entropy loss functions. The number of local training rounds for all algorithms was uniformly set to 5, the learning rate was uniformly set to 0.01, the batch size was set to 16, the total number of clients was set to 20, and the participation rate F was set to 1. In addition, we set the number of communication rounds to 100, 200, and 300 in the CIFAR-10, CIFAR-100, and Tiny-ImageNet datasets, respectively. In Ditto and FedDWA, the parameter λ was set to 1. When FedDWA performs model aggregation, the model aggregation weight α was set to 0.2. In addition, the number of clusters was set to 4 in the experiments of the IFCA algorithm.
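For readers reproducing the setup, the settings above can be collected in a single configuration object. The sketch below is only a convenience; the class and field names are hypothetical and do not come from the authors' code, while the values are those stated in this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentConfig:
    """Default settings from Section 5.4 (field names are illustrative)."""
    optimizer: str = "SGD"
    loss: str = "cross_entropy"
    local_rounds: int = 5            # local training rounds per communication round
    learning_rate: float = 0.01
    batch_size: int = 16
    num_clients: int = 20
    participation_rate: float = 1.0  # F
    lam: float = 1.0                 # lambda, used by Ditto and FedDWA
    alpha: float = 0.2               # FedDWA aggregation weight of the client's own model
    ifca_clusters: int = 4


# Communication rounds per dataset, as stated in the text.
COMMUNICATION_ROUNDS = {"CIFAR-10": 100, "CIFAR-100": 200, "Tiny-ImageNet": 300}

config = ExperimentConfig()
print(config, COMMUNICATION_ROUNDS["CIFAR-10"])
```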

5.5. Results and Discussion

In this section, we report extensive experiments that evaluate FedDWA in terms of model convergence speed and accuracy, and we compare it with the other algorithms under various datasets and data heterogeneity conditions. Table 1 shows the average test accuracies achieved by FedDWA and the other algorithms on the CIFAR-10, CIFAR-100, and Tiny-ImageNet datasets under different data heterogeneity scenarios. Bold values in Table 1 indicate the best accuracy. The experimental results show that the personalized federated learning algorithms generally outperform the traditional federated learning algorithm FedAvg under different datasets and data heterogeneity conditions, which further supports the effectiveness of personalized federated learning in dealing with data heterogeneity. Additionally, FedDWA achieves the best average test accuracy in all six settings of Table 1. Even on the large and complex Tiny-ImageNet dataset, FedDWA achieves the highest accuracies of 41.83% and 35.23% under Non-IID conditions with Dirichlet coefficients of 0.3 and 0.5, respectively.

Figure 3, Figure 4 and Figure 5 show the average test accuracy of different algorithms on the CIFAR-10, CIFAR-100, and Tiny-ImageNet datasets. These figures show that FedDWA outperforms the other algorithms in terms of accuracy. This is because the FedDWA algorithm flexibly assigns aggregation weights to each client based on the cosine similarity between client model updates, thereby aggregating an exclusive global model for each client. Compared with other algorithms, this personalized aggregation method can more accurately enhance collaboration between similar clients and reduce the interference between dissimilar clients, thereby mitigating the Non-IID data problem. In addition, in the local update phase, FedDWA introduces the hyperparameter λ to adjust the distance between the local model and the exclusive global model, which avoids both over-reliance on the exclusive global model and neglect of the characteristics of the local data, and thus optimizes the local training process.

Figure 3, Figure 4 and Figure 5 also show that FedDWA outperforms the other algorithms in terms of convergence speed. This is because its adaptive aggregation weights and the hyperparameter adjustment in the local training phase accelerate the convergence of the model. For relatively simple datasets like CIFAR-10, the vast majority of the algorithms converge quickly within the first 20 rounds and reach an accuracy close to 75% in a short period of time. However, on the more complex Tiny-ImageNet dataset, the convergence speed of most algorithms slows down significantly and the convergence curves show large fluctuations. This suggests that the complexity and heterogeneity of the dataset make it considerably harder for federated learning models to converge stably within a limited number of communication rounds. Figure 4 and Figure 5 show that FedDWA has a clear advantage on complex datasets, and its final accuracy is usually better than that of the other algorithms. This suggests that FedDWA adapts more effectively to the differences in client data distribution and promotes collaboration between similar clients through the dynamic weight allocation mechanism, thus improving the accuracy of the model.
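The idea of turning cosine similarities between model updates into aggregation weights can be sketched as follows. This is a hedged illustration, not the paper's aggregation rule: the softmax normalization and the function names `cosine_similarity` and `peer_weights` are assumptions made for the example, and the exact weighting formula is the one defined in Section 4.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two flattened model updates."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention for an all-zero update
    return dot / (norm_u * norm_v)


def peer_weights(updates, client_id):
    """Weights over the other clients, from a softmax of their similarity to `client_id`.

    Assumption: softmax is used here only as one simple way to obtain
    positive weights that sum to 1 and grow with similarity.
    """
    sims = {
        j: cosine_similarity(updates[client_id], update)
        for j, update in enumerate(updates)
        if j != client_id
    }
    exp_sims = {j: math.exp(s) for j, s in sims.items()}
    total = sum(exp_sims.values())
    return {j: e / total for j, e in exp_sims.items()}


# Toy updates: client 1 points in nearly the same direction as client 0, client 2 does not.
toy_updates = [[1.0, 0.0], [0.9, 0.1], [-1.0, 0.2]]
print(peer_weights(toy_updates, client_id=0))
```

In the toy example, client 1 receives a larger weight than client 2 when aggregating for client 0, which is the behavior the paragraph above describes: similar clients collaborate more, dissimilar clients interfere less.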

Table 2 shows the number of communication rounds required for FedDWA and the comparison algorithms to achieve the target test accuracy on the Tiny-ImageNet dataset. The best results are marked in bold, and the symbol ‘-’ indicates that the target value was not reached within the limited number of communication rounds. As can be seen from Table 2, FedDWA reaches the target accuracy in the fewest communication rounds. This result shows that FedDWA has a clear advantage in communication efficiency: it takes full advantage of the personalized model and achieves competitive performance on Non-IID data while reducing the communication overhead.
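The metric reported in Table 2 can be computed from a per-round accuracy curve as in the sketch below. The function name `rounds_to_target` and the toy curve are hypothetical; rounds are counted from 1, and `None` plays the role of the ‘-’ entry.

```python
def rounds_to_target(accuracy_per_round, target):
    """Return the first communication round (1-based) whose accuracy meets the target.

    Returns None when the target is never reached, which corresponds to '-' in Table 2.
    """
    for round_idx, accuracy in enumerate(accuracy_per_round, start=1):
        if accuracy >= target:
            return round_idx
    return None


# Toy accuracy curve in percent (not data from the paper).
toy_curve = [10.0, 25.0, 36.5, 40.0]
print(rounds_to_target(toy_curve, target=36.0))
print(rounds_to_target(toy_curve, target=50.0))
```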

5.6. Hyperparametric Analysis

In the FedDWA algorithm, the hyperparameter λ is used to control the degree of dependence of the personalized model on the exclusive global model to better fit the client’s local data distribution. In the experiments, the initial setting is λ = 1. To investigate the effect of λ on model performance, experiments were conducted on three datasets (CIFAR-10, CIFAR-100, and Tiny-ImageNet) using different values of λ: {0.3, 0.5, 1, 1.5, 2}.

In Figure 6, we show how the model performance varies for different values of the hyperparameter λ. Overall, λ = 1 produces the best results. When the value of λ is too low or too high, the performance of FedDWA decreases. When λ is too low, the personalized model cannot effectively utilize the information from the exclusive global model; when λ is too high, the model relies too heavily on the exclusive global model and ignores the properties of the local data, which reduces its adaptability. The experimental results therefore show that choosing a reasonable value of λ ensures a balance between the exclusive global model and the personalized model, thus improving the overall accuracy of the model.
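The role of λ can be illustrated with a single gradient step on a regularized local objective. The sketch assumes a Ditto-style proximal term, (λ/2)·||v − w||², added to the local loss, where v is the personalized model and w is the exclusive global model; this form and the function name `proximal_sgd_step` are assumptions for illustration, and the objective actually used by FedDWA is the one given in Section 3.2.

```python
def proximal_sgd_step(local, exclusive_global, grad, lr=0.01, lam=1.0):
    """One SGD step on local_loss(v) + (lam / 2) * ||v - w||^2.

    local            -- current personalized parameters v (list of floats)
    exclusive_global -- exclusive global model parameters w
    grad             -- gradient of the local loss at v
    The proximal term contributes lam * (v - w) to the gradient.
    """
    return [
        v - lr * (g + lam * (v - w))
        for v, w, g in zip(local, exclusive_global, grad)
    ]


# Toy usage with a zero loss gradient, so only the pull toward w is visible.
v = [1.0, -2.0]
w = [0.0, 0.0]
print(proximal_sgd_step(v, w, grad=[0.0, 0.0], lr=0.1, lam=1.0))  # moves v toward w
print(proximal_sgd_step(v, w, grad=[0.0, 0.0], lr=0.1, lam=0.0))  # lam = 0: no pull at all
```

With λ = 0 the step ignores the exclusive global model entirely, and larger λ pulls the personalized model harder toward it, which mirrors the two failure modes discussed above.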

In the FedDWA algorithm, the hyperparameter α denotes the aggregation weight of the client’s own model when the exclusive global model is aggregated. In Figure 7, we show how the model performance varies with different values of α. From Figure 7, we can see that a value of α that is either too large or too small reduces the test accuracy of FedDWA. Under the Non-IID setting, FedDWA performs best when α is set to 0.2 on the CIFAR-10, CIFAR-100, and Tiny-ImageNet datasets. This is probably because the clients are implicitly grouped in this setting: when α takes a smaller value, collaboration between similar clients contributes more to model performance than the client’s own model does. Therefore, in the experimental part of this paper, we set α to 0.2.
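To make the role of α concrete, the sketch below mixes a client's own model with a weighted combination of peer models. It assumes that the peers share the remaining weight 1 − α in proportion to normalized peer weights that sum to 1; this split and the function name `exclusive_global_model` are assumptions for the example rather than the paper's exact aggregation formula.

```python
def exclusive_global_model(own_model, peer_models, peer_weights, alpha=0.2):
    """Combine a client's own parameters with those of its peers.

    own_model    -- list of floats for the client itself, weighted by alpha
    peer_models  -- dict {client_id: list of floats}
    peer_weights -- dict {client_id: weight}, assumed to sum to 1 over the peers
    """
    mixed = [alpha * p for p in own_model]
    for client_id, model in peer_models.items():
        for k, p in enumerate(model):
            mixed[k] += (1.0 - alpha) * peer_weights[client_id] * p
    return mixed


# Toy usage: the more similar peer (id 1) carries 75% of the peer weight.
own = [1.0, 1.0]
peers = {1: [0.0, 2.0], 2: [4.0, 0.0]}
weights = {1: 0.75, 2: 0.25}
print(exclusive_global_model(own, peers, weights, alpha=0.2))
```

Setting `alpha=1.0` in this sketch returns the client's own model unchanged, while `alpha=0.0` relies on the peers alone, the two extremes between which Figure 7 locates the best value.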

5.7. Impact of Participating Rates

Our experiment focused on the CIFAR-10 dataset with parameter dir = 0.3. In this experiment, the total number of clients was 60. We set the participation rate of clients to {0.05, 0.1, 0.2, 0.5}. Figure 8 and Figure 9 represent the accuracy curves of the personalized model and the global model at different participation rates, respectively. As shown in Figure 8 and Figure 9, both the personalized model and the global model of our algorithm outperform the other algorithms in terms of accuracy under different participation rates and exhibit faster convergence. It is worth noting that many algorithms show obvious convergence instability when the participation rate decreases, while our algorithm is still able to maintain a relatively stable convergence trend in this case, which fully proves its robustness in dealing with different participation rates.
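As a small worked example of what these participation rates mean in practice, the sketch below samples the participating clients for one round. Uniform sampling without replacement and the rounding rule are assumptions, since the text does not describe the selection procedure, and the function name `sample_clients` is hypothetical.

```python
import random


def sample_clients(num_clients, participation_rate, rng):
    """Pick the clients that take part in one communication round.

    Assumption: clients are drawn uniformly without replacement, and at least
    one client always participates.
    """
    k = max(1, round(participation_rate * num_clients))
    return sorted(rng.sample(range(num_clients), k))


rng = random.Random(0)
for rate in (0.05, 0.1, 0.2, 0.5):
    selected = sample_clients(60, rate, rng)
    print(f"F = {rate}: {len(selected)} of 60 clients per round")
```

With 60 clients, the four rates correspond to 3, 6, 12, and 30 participating clients per round, so the lowest setting aggregates over very few updates, which is consistent with the less stable convergence observed for several baselines.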

5.8. Discussion on Model Similarity Measures

In this section, we compare the FedDWA algorithm using cosine similarity and Euclidean distance as model similarity metrics through experiments on the CIFAR-10 dataset and analyze their performance in average test accuracy to evaluate the effectiveness of these two metrics. The experimental results are shown in Figure 10. The experimental results show that the FedDWA algorithm using cosine similarity as the similarity metric performs better in accuracy, verifying the effectiveness of cosine similarity in measuring model similarity. This is because cosine similarity can more accurately reflect the similarity between client model updates by considering the directional differences between client model updates rather than relying solely on the absolute size of the distance. In contrast, the FedDWA algorithm using Euclidean distance is greatly affected by data heterogeneity, resulting in lower model accuracy. Therefore, this paper used cosine similarity to measure the similarity between model updates.
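The difference between the two metrics can be seen on a toy example. The vectors below are made up for illustration and are not model updates from the experiments: `v` points in the same direction as `u` but is ten times larger, while `w` has the same magnitude as `u` but a different direction.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (assumes neither is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def euclidean_distance(u, v):
    """Straight-line distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


u = [1.0, 2.0]
v = [10.0, 20.0]  # same direction as u, larger magnitude
w = [2.0, 1.0]    # same magnitude as u, different direction

print("cosine   :", cosine_similarity(u, v), cosine_similarity(u, w))
print("euclidean:", euclidean_distance(u, v), euclidean_distance(u, w))
```

Cosine similarity ranks `v` as the closer match to `u`, whereas Euclidean distance ranks `w` as closer because it is dominated by the difference in magnitude. This is the directional sensitivity that the paragraph above credits for the better accuracy of the cosine-based variant.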

6. Conclusions

In this paper, we propose FedDWA, a personalized federated learning algorithm based on dynamic weight allocation. In the server aggregation phase, the algorithm calculates the cosine similarity of model updates among clients and dynamically allocates aggregation weights based on that similarity, which enhances collaboration among similar clients and encourages them to learn more similar features. In the local update phase, each client, after receiving its exclusive global model, trains a personalized model by adjusting the distance between the local model and the exclusive global model. This not only improves the generalization ability of the model but also enables the model to better adapt to the local data distribution. The experimental results show that FedDWA improves convergence speed and model accuracy compared to all baseline algorithms on the CIFAR-10, CIFAR-100, and Tiny-ImageNet datasets. Considering the diversity of real-world data, future research will focus on using more complex and diverse non-IID datasets to further analyze the adaptability and robustness of FedDWA. In addition, considering more realistic application scenarios, we will expand the scale of the experiments to verify the scalability of FedDWA in large-scale client environments, especially with hundreds or thousands of clients, evaluate its performance in large-scale distributed scenarios, and explore how to further reduce the algorithm’s computational and communication overhead when the number of clients surges. Finally, we also plan to introduce appropriate privacy-preserving techniques into FedDWA to further enhance the security of the federated learning system and ensure its effectiveness in real-world applications.

Author Contributions

Conceptualization, W.L.; Data curation, W.L.; Formal analysis, Y.L., W.L., and H.Q.; Funding acquisition, Y.L.; Investigation, S.L.; Methodology, Y.L., S.L., W.L., and H.X.; Project administration, Y.L. and W.L.; Resources, H.X.; Software, S.L.; Validation, Y.L.; Visualization, W.L.; Writing—original draft, S.L.; Writing—review and editing, Y.L. and S.L. All authors have read and agreed to the published version of the manuscript.

Funding

This paper was funded by the Science and Technology Project of the Hebei Education Department, grant number ZD2022102.

Data Availability Statement

The datasets that support the results of this study are publicly available datasets, and the use of these datasets in this work adheres to the licenses of these datasets. The CIFAR-10 dataset is available at http://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz (accessed on 20 July 2024). The CIFAR-100 dataset is available at https://www.cs.toronto.edu/~kriz/cifar.html (accessed on 26 December 2024). The Tiny-ImageNet dataset is available at http://cs231n.stanford.edu/tiny-imagenet-200.zip (accessed on 26 December 2024). The MNIST dataset is available at https://www.kaggle.com/datasets/hojjatk/mnist-dataset (accessed on 20 July 2024). The FMNIST dataset is available at https://github.com/zalandoresearch/fashion-mnist (accessed on 20 July 2024).

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. Experimental Settings Under Pathological Heterogeneous Data Distribution

Under pathologically heterogeneous data distribution, this paper adopted two settings, that is, the dataset was randomly divided into 20 and 100 clients, and the data of each client consisted of data corresponding to 2 random categories. After the client data was divided, the data of each client was divided into a training set and a test set. The test set and training set of all clients had the same data distribution. In addition, the experiment evaluated the performance of FedDWA under two settings. When the total number of clients was 20, assuming that 100% of the clients participate in training, the total number of global communications used for training was 100 times; when the total number of clients was 100, assuming that 20% of the clients participate in training, the total number of global communications used for training was 300 times.
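The pathological setting can be sketched as follows. The sketch assumes that each client draws its 2 classes uniformly at random and that a class held by several clients is cut into equal, disjoint shards; these details and the function name `pathological_partition` are assumptions, since the appendix does not specify them. The per-client split into training and test sets is omitted.

```python
import random
from collections import defaultdict


def pathological_partition(labels, num_clients, classes_per_client=2, seed=0):
    """Give each client samples from only `classes_per_client` classes, with no overlap."""
    rng = random.Random(seed)
    classes = sorted(set(labels))
    # Each client picks its classes at random (assumption).
    client_classes = [rng.sample(classes, classes_per_client) for _ in range(num_clients)]

    # Record which clients hold each class.
    holders = defaultdict(list)
    for client_id, chosen in enumerate(client_classes):
        for cls in chosen:
            holders[cls].append(client_id)

    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    clients = [[] for _ in range(num_clients)]
    for cls, client_ids in holders.items():
        indices = by_class[cls]
        rng.shuffle(indices)
        shard = len(indices) // len(client_ids)  # equal disjoint shards per holder
        for pos, client_id in enumerate(client_ids):
            clients[client_id].extend(indices[pos * shard:(pos + 1) * shard])
    return clients, client_classes


# Toy usage: 2000 samples, 10 balanced classes, 20 clients.
toy_labels = [i % 10 for i in range(2000)]
parts, chosen = pathological_partition(toy_labels, num_clients=20)
print(chosen[0], len(parts[0]))
```

Note that under this sketch a class that no client selects is simply left unused, and leftover samples from uneven shards are dropped.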

Appendix B. Simulation Experimental Results Under Pathological Heterogeneous Data Distribution

This experiment compared the performance of different algorithms in terms of the average test accuracy of the client models under pathological heterogeneous data distribution. The experimental results are shown in Table A1, where bold values indicate the best accuracy.

Table A1. A comparison of the average test accuracy (%) of the FedDWA algorithm with baseline algorithms on the CIFAR-10, MNIST, and FMNIST datasets, with 100 clients and 20% participation, and with 20 clients and 100% participation.

Algorithm	CIFAR-10		MNIST		FMNIST
Algorithm	20%	100%	20%	100%	20%	100%
FedAvg	37.47	39.53	92.9	88.95	75.44	75.28
FedAvg-FT	78.63	83.13	99.17	99.52	98.06	98.64
Ditto	78.9	82.93	99.11	99.49	98.06	98.24
IFCA	77.6	84.4	98.17	99.2	97.33	98.41
FedGH	67.13	78.53	98.06	98.97	97.33	98.64
FedPAC	67.13	74.2	96.4	99.51	93.94	93.47
FedCollab	79.82	83.47	98.45	99.12	97.81	98.5
FedSoft	75.52	82.61	97.94	99.24	97.69	98.44
FedALA	80.4	83.01	98.61	99.41	98.12	98.68
FedDWA (Ours)	82.33	85.2	99.39	99.6	98.39	98.75

As shown in Table A1, the traditional FedAvg algorithm performs the worst, primarily because a single global model finds it difficult to adapt to different data distributions. The FedAvg-FT and Ditto algorithms achieve higher accuracy in most scenarios by locally fine-tuning the global model to capture personalized information while retaining global information. The performance of IFCA is also relatively good due to its clustering of clients with similar data distributions to generate a more adapted cluster model. However, IFCA still includes clients with large differences in data distributions during clustering, which limits the performance of the cluster model. FedPAC uses a model update strategy based on weighted aggregation and compression, but its compression mechanism may lose some key information, thus affecting accuracy. The lower accuracy of FedGH is mainly due to its strategy of separating global heads and local features, which weakens the synergy between global knowledge and local features, leading to poor performance on complex datasets. FedSoft and FedCollab improve model accuracy by employing more flexible collaboration structures. FedALA shows relatively better performance thanks to its adaptive local aggregation module, which optimizes the local objective for each client.

Nevertheless, the experimental results in Table A1 show that the proposed FedDWA algorithm outperforms other algorithms in terms of accuracy on all three datasets. FedDWA dynamically assigns weights, utilizes the similarity of client model updates, and fits an exclusive global model for each client that adapts to its data distribution, thereby avoiding performance degradation in the processing of non-IID data. Furthermore, on the MNIST and Fashion-MNIST datasets, the accuracy differences among algorithms, excluding FedAvg, are small. This is mainly due to the small amount of data per client in these datasets, which results in minimal differences in model training across clients. Finally, FedDWA consistently shows advantages across different client participation rates. When the client participation rate is 20%, FedDWA outperforms all other algorithms on all datasets, indicating that even with low client participation, FedDWA can efficiently integrate data and provide robust performance.

The training curves of the average test accuracies for FedDWA and the other six personalization algorithms are shown in Figure A1, Figure A2 and Figure A3. It can be seen that the proposed algorithm achieves faster convergence and higher final accuracy, proving the effectiveness of the model. Additionally, the FedDWA algorithm exhibits smaller fluctuations and smoother changes during model updates, indicating higher stability. In contrast, the FedPAC algorithm fluctuates more during training due to its compression mechanism, which causes an inconsistent loss of information across different iterations, affecting model update stability. Other personalized algorithms show relatively stable convergence curves with smaller fluctuations. The curves in Figure A1, Figure A2 and Figure A3 clearly demonstrate the impact of different client participation rates on performance, with higher participation rates resulting in higher accuracy and faster convergence. Even with a decrease in client participation, the FedDWA algorithm maintains good accuracy and convergence across datasets, demonstrating its strong competitiveness.

Figure A1. The accuracy curves for FedDWA compared to other baseline algorithms on the CIFAR-10 dataset: (a) 100 clients, 20% participation; (b) 20 clients, 100% participation.

Figure A2. The accuracy curves for FedDWA compared to other baseline algorithms on the MNIST dataset: (a) 100 clients, 20% participation; (b) 20 clients, 100% participation.

Figure A3. The accuracy curves for FedDWA compared to other baseline algorithms on the Fashion-MNIST dataset: (a) 100 clients, 20% participation; (b) 20 clients, 100% participation.

References

1. McMahan, B.; Moore, E.; Ramage, D.; Hampson, S.; Arcas, B.A. Communication-efficient learning of deep networks from decentralized data. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, Ft. Lauderdale, FL, USA, 22 April 2017; pp. 1273–1282.
2. Yang, Q.; Liu, Y.; Chen, T.; Tong, Y. Federated machine learning: Concept and applications. ACM Trans. Intell. Syst. Technol. 2019, 10, 1–19.
3. Nguyen, D.C.; Ding, M.; Pathirana, P.N.; Seneviratne, A.; Li, J.; Poor, H.V. Federated learning for internet of things: A comprehensive survey. IEEE Commun. Surv. Tutor. 2021, 23, 1622–1658.
4. Lu, Z.; Pan, H.; Dai, Y.; Si, X.; Zhang, Y. Federated learning with non-iid data: A survey. IEEE Internet Things J. 2024, 11, 19188–19209.
5. Lee, R.; Kim, M.; Li, D.; Qiu, X.; Hospedales, T.; Huszár, F.; Lane, N. Fedl2p: Federated learning to personalize. In Proceedings of the 37th Conference on Neural Information Processing Systems (NeurIPS 2023), New Orleans, LA, USA, 10–16 December 2023; p. 36.
6. Sabah, F.; Chen, Y.; Yang, Z.; Azam, M.; Ahmad, N.; Sarwar, R. Model optimization techniques in personalized federated learning: A survey. Expert Syst. Appl. 2024, 243, 122874.
7. Ruan, Y.; Joe-Wong, C. Fedsoft: Soft clustered federated learning with proximal local updating. In Proceedings of the 36th AAAI Conference on Artificial Intelligence, 22 February–1 March 2022; pp. 8124–8131.
8. Li, C.; Li, G.; Varshney, P.K. Federated learning with soft clustering. IEEE Internet Things J. 2021, 9, 7773–7782.
9. Ghosh, A.; Chung, J.; Yin, D.; Ramchandran, K. An efficient framework for clustered federated learning. In Proceedings of the 34th Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 6–12 December 2020; Volume 33, pp. 19586–19597.
10. Li, T.; Sahu, A.K.; Zaheer, M.; Sanjabi, M.; Talwalkar, A.; Smith, V. Federated optimization in heterogeneous networks. In Proceedings of the 3rd Machine Learning and Systems Conference, Austin, TX, USA, 2–4 March 2020; Volume 2, pp. 429–450.
11. Karimireddy, S.P.; Kale, S.; Mohri, M.; Reddi, S.; Stich, S.; Suresh, A.T. Scaffold: Stochastic controlled averaging for federated learning. In Proceedings of the 37th International Conference on Machine Learning, Online, 13–18 July 2020; pp. 5132–5143.
12. Hsu, T.-M.H.; Qi, H.; Brown, M. Measuring the effects of non-identical data distribution for federated visual classification. arXiv 2019, arXiv:1909.06335.
13. Chen, H.-Y.; Chao, W.-L. On bridging generic and personalized federated learning for image classification. arXiv 2021, arXiv:2107.00778.
14. Collins, L.; Hassani, H.; Mokhtari, A.; Shakkottai, S. Fedavg with fine tuning: Local updates lead to representation learning. In Proceedings of the 36th Conference on Neural Information Processing Systems (NeurIPS 2022), New Orleans, LA, USA, 28 November–9 December 2022; Volume 35, pp. 10572–10586.
15. Li, T.; Hu, S.; Beirami, A.; Smith, V. Ditto: Fair and robust federated learning through personalization. In Proceedings of the 38th International Conference on Machine Learning, Virtual, 18–24 July 2021; pp. 6357–6368.
16. Panchal, K.; Choudhary, S.; Parikh, N.; Zhang, L.; Guan, H. Flow: Per-instance personalized federated learning. In Proceedings of the 37th Neural Information Processing Systems (NeurIPS 2023), New Orleans, LA, USA, 10–16 December 2023; p. 36.
17. Sattler, F.; Müller, K.-R.; Samek, W. Clustered federated learning: Model-agnostic distributed multitask optimization under privacy constraints. IEEE Trans. Neural Netw. Learn. Syst. 2020, 32, 3710–3722.
18. Ghosh, A.; Hong, J.; Yin, D.; Ramchandran, K. Robust federated learning in a heterogeneous environment. arXiv 2019, arXiv:1906.06629.
19. Briggs, C.; Fan, Z.; Andras, P. Federated learning with hierarchical clustering of local updates to improve training on non-IID data. In Proceedings of the 2020 International Joint Conference on Neural Networks (IJCNN), Glasgow, UK, 19–24 July 2020; pp. 1–9.
20. Liu, B.; Guo, Y.; Chen, X. Pfa: Privacy-preserving federated adaptation for effective model personalization. In Proceedings of the Web Conference 2021, Ljubljana, Slovenia, 19–23 April 2021; pp. 923–934.
21. Yang, L.; Huang, J.; Lin, W.; Cao, J. Personalized federated learning on non-IID data via group-based meta-learning. ACM Trans. Knowl. Discov. Data 2023, 17, 1–20.
22. Huang, Y.; Chu, L.; Zhou, Z.; Wang, L.; Liu, J.; Pei, J.; Zhang, Y. Personalized cross-silo federated learning on non-iid data. In Proceedings of the 35th AAAI Conference on Artificial Intelligence, Virtual, 2–9 February 2021; pp. 7865–7873.
23. Wang, Z.; Fan, X.; Qi, J.; Jin, H.; Yang, P.; Shen, S.; Wang, C. Fedgs: Federated graph-based sampling with arbitrary client availability. In Proceedings of the 37th AAAI Conference on Artificial Intelligence, Washington, DC, USA, 7–14 February 2023; pp. 10271–10278.
24. Xu, X.; Duan, S.; Zhang, J.; Luo, Y.; Zhang, D. Optimizing federated learning on device heterogeneity with a sampling strategy. In Proceedings of the 2021 IEEE/ACM 29th International Symposium on Quality of Service (IWQOS), Tokyo, Japan, 25–28 June 2021; pp. 1–10.
25. Zhu, D.; Lu, S.; Wang, M.; Lin, J.; Wang, Z. Efficient precision-adjustable architecture for softmax function in deep learning. IEEE Trans. Circuits Syst. II Express Briefs 2020, 67, 3382–3386.
26. Yi, L.; Wang, G.; Liu, X.; Shi, Z.; Yu, H. FedGH: Heterogeneous federated learning with generalized global header. In Proceedings of the 31st ACM International Conference on Multimedia, Ottawa, ON, Canada, 29 October–3 November 2023; pp. 8686–8696.
27. Xu, J.; Tong, X.; Huang, S.-L. Personalized federated learning with feature alignment and classifier collaboration. arXiv 2023, arXiv:2306.11867.
28. Bao, W.; Wang, H.; Wu, J.; He, J. Optimizing the collaboration structure in cross-silo federated learning. In Proceedings of the 40th International Conference on Machine Learning, Honolulu, HI, USA, 23–29 July 2023; pp. 1718–1736.
29. Zhang, J.; Hua, Y.; Wang, H.; Song, T.; Xue, Z.; Ma, R.; Guan, H. Fedala: Adaptive local aggregation for personalized federated learning. In Proceedings of the 37th AAAI Conference on Artificial Intelligence, Washington, DC, USA, 7–14 February 2023; pp. 11237–11244.

Figure 1. An overview of FedDWA. Here, R_{1N} represents the cosine similarity between the model updates uploaded by client 1 and client N, while R_{12} and R_{2N} are defined similarly.

Figure 2. The training process of the FedDWA algorithm, where the personalized model and exclusive global model parameters are iteratively updated on the client and server, respectively.

Figure 3. Test accuracy of each algorithm on the CIFAR-10 dataset under different Dirichlet coefficients. (a) Comparison of accuracy at dir = 0.3. (b) Comparison of accuracy at dir = 0.5.

Figure 4. Test accuracy of each algorithm on the CIFAR-100 dataset under different Dirichlet coefficients. (a) Comparison of accuracy at dir = 0.3. (b) Comparison of accuracy at dir = 0.5.

Figure 5. Test accuracy of each algorithm on the Tiny-ImageNet dataset under different Dirichlet coefficients. (a) Comparison of accuracy at dir = 0.3. (b) Comparison of accuracy at dir = 0.5.

Figure 6. The impact of different hyperparameter λ values on the accuracy of the FedDWA model across three datasets.

Figure 7. The impact of different hyperparameter α values on the accuracy of the FedDWA model across three datasets.

Figure 8. The impact of the client participating rate F on the personalized models. Each figure separately shows the convergence curve on the CIFAR-10 dataset with F in {0.05, 0.1, 0.2, 0.5}. (a) F = 0.05. (b) F = 0.1. (c) F = 0.2. (d) F = 0.5.

Figure 9. The impact of the client participating rate F on global models. Each figure separately shows the convergence curve on the CIFAR-10 dataset with F in {0.05, 0.1, 0.2, 0.5}. (a) F = 0.05. (b) F = 0.1. (c) F = 0.2. (d) F = 0.5.

Figure 10. The test accuracy curve for the cosine similarity and Euclidean distance metric on the CIFAR-10 dataset. (a) A comparison of accuracy at dir = 0.3. (b) A comparison of accuracy at dir = 0.5.

Table 1. A comparison of the average test accuracy (%) of the FedDWA algorithm with baseline algorithms on the CIFAR-10, CIFAR-100, and Tiny-ImageNet datasets.

Algorithm	CIFAR-10		CIFAR-100		Tiny-ImageNet
Algorithm	dir = 0.3	dir = 0.5	dir = 0.3	dir = 0.5	dir = 0.3	dir = 0.5
FedAvg	53.44	58.32	22.46	26.13	16.37	18.06
FedAvg-FT	82.87	76.14	42.95	29.09	39.79	31.28
Ditto	85.41	77.53	39.66	28.36	39.71	33.12
IFCA	85.84	79.7	42.9	30.05	36.76	33.28
FedGH	79.07	73.56	41.31	36.69	39.71	31.87
FedPAC	84.67	78.65	42.28	34.77	39.54	33.86
FedCollab	82.66	76.34	43.84	36.64	36.21	31.17
FedSoft	80.4	77.24	40.58	33.21	36.36	33.37
FedALA	83.05	77.97	42.18	36.3	34.79	33.32
FedDWA (Ours)	86.97	81.74	45.63	38.13	41.83	35.23

Table 2. Communication rounds required to reach the target test accuracy (C_acc(x)) on the Tiny-ImageNet dataset under Non-IID settings.

Algorithm	Non-IID (dir = 0.3)	Non-IID (dir = 0.5)
	C_acc (36%)	C_acc (30%)
	Round	Round
FedAvg-FT	82	149
Ditto	106	62
IFCA	125	72
FedGH	131	192
FedPAC	110	91
FedCollab	153	198
FedSoft	183	85
FedALA	-	84
FedDWA (Ours)	51	48

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
