1. Introduction
The Internet of Things (IoT) has led to a significant increase in the number of IoT devices [1]. These devices are being used in various domains, such as smart cities [2], smart transportation [3], and smart healthcare [4]. The surge in IoT devices has also resulted in a massive influx of data, which has fueled advances in artificial intelligence [5]. Centralized machine learning approaches [6,7,8,9] have the potential to enable intelligent IoT applications. However, centralized machine learning carries inherent privacy risks, because it requires uploading sensitive end-device data to a third-party central server.
Federated learning [10] is a distributed machine learning approach that addresses privacy concerns by sharing model parameters instead of raw data. Initially, federated learning was proposed for implementation on edge devices [11]. The global model is created by aggregating client model parameters, and participating clients share this global model. However, in practical scenarios, clients often have different data distributions and requirements, so a standardized global model may not cater to the individual needs of all clients. To address this challenge, personalized federated learning techniques [12,13,14,15] have emerged, aiming to tailor models to each client.
Personalized federated learning approaches typically achieve model personalization through shared parameters. However, Cao et al. [16] argue that a uniform model structure may limit the development of personalized models: client models are usually tailored to their specific task requirements, resulting in inherent diversity. Cao et al. therefore proposed a personalized federated learning technique based on generative adversarial networks (GANs) that accommodates differences in model structures among clients. However, it still requires participating clients to standardize the number and order of labels during training. This not only reveals clients' local label information but also prevents clients with different classification tasks but similar training data from participating in the training. Although the tasks differ, the data types across clients are similar. For example, the MNIST dataset can be used both for binary classification tasks, such as distinguishing odd and even numbers, and for ten-class tasks, such as recognizing individual digits. By enabling clients with different tasks but similar training data to participate in federated learning simultaneously, we can expand the range of training data and enhance the privacy and security of clients.
To address the above challenges, we propose a label-personalized federated learning method based on image clustering. In this approach, shared samples are clustered and filtered, and sample labels are corrected. The corrected samples are then combined with local data to create a new dataset for subsequent training, thereby enabling collaborative training between clients under label personalization. Our research makes the following key contributions:
- (1)
We formulate the label-personalized federated learning problem, in which different clients, owing to different classification tasks, may assign different labels to the same type of data.
- (2)
We propose a label-personalized federated learning method, LPFL-GD, which enables collaborative training between clients under the label personalization condition by introducing image clustering.
- (3)
We use image clustering to solve the problem that samples generated by a shared generative adversarial network cannot support federated learning under the label personalization condition.
- (4)
We analyze the convergence of the method theoretically and provide proof of convergence.
The remainder of this paper is organized as follows.
Section 2 reviews related research.
Section 3 formulates the label personalization problem.
Section 4 describes the proposed LPFL-GD method in detail.
Section 5 presents the details and results of the experiments.
Section 6 summarizes the paper.
2. Related Work
McMahan et al. [10] introduced the Federated Averaging algorithm (FedAvg) as a means to improve model performance on edge devices without compromising user privacy. In the FedAvg framework, the central server first selects a subset of clients to participate in training and distributes the global model to each selected client. Next, each client trains the model independently using its local data. Finally, the central server aggregates the model updates from participating clients to form the new global model.
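The server-side aggregation step can be sketched as a data-size-weighted average of client parameters (a minimal illustration with hypothetical clients and flat parameter vectors; real implementations average full model state dicts):

```python
import numpy as np

def fedavg_aggregate(client_params, client_sizes):
    """FedAvg server step: average client parameter vectors,
    weighted by each client's number of local training samples."""
    weights = np.asarray(client_sizes, dtype=float)
    weights /= weights.sum()                  # n_k / n
    stacked = np.stack(client_params)         # (num_clients, dim)
    return (weights[:, None] * stacked).sum(axis=0)

# Hypothetical example: two clients with unequal data sizes.
global_w = fedavg_aggregate(
    [np.array([1.0, 0.0]), np.array([0.0, 1.0])],
    client_sizes=[300, 100],
)
print(global_w)  # [0.75 0.25]
```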
For federated learning to converge to an optimal solution, it has traditionally relied on the assumption that data are independently and identically distributed (IID) across clients. However, real-world data distributions often deviate from this ideal, leading to suboptimal performance of federated learning models, as demonstrated by the experiments of Zhao et al. [17], which highlight the challenge posed by non-IID data. As a result, numerous studies have focused on the statistical challenges of federated learning. To mitigate the limitations of FedAvg, Li et al. [18] proposed the FedProx algorithm, which introduces a proximal term to account for system and statistical heterogeneity. Similarly, Karimireddy et al. [19] introduced the SCAFFOLD algorithm, which accounts for data heterogeneity through the use of control variates. Furthermore, Gao et al. [20] presented the FedDC algorithm, which incorporates an auxiliary term to correct weight discrepancies between local and global models, effectively addressing statistical heterogeneity. In addition, Wang et al. [21] proposed the FedMA algorithm, which uses Bayesian nonparametric methods to deal with data heterogeneity. While these approaches mitigate the shortcomings of FedAvg in heterogeneous data settings using different techniques, they still adhere to FedAvg's global aggregation scheme. In practical scenarios, differing client requirements and data distributions make it difficult for a single global model to meet the diverse needs of all clients. Moreover, differing data distributions further hinder the convergence of a single global model to an optimal solution.
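The proximal term introduced by FedProx can be illustrated with a minimal sketch: each client minimizes its task loss plus a quadratic penalty that keeps the local weights close to the current global weights (the quadratic task loss and all names here are hypothetical stand-ins):

```python
import numpy as np

def fedprox_local_loss(w, w_global, task_loss, mu=0.1):
    """Local FedProx objective: task loss plus the proximal penalty
    (mu / 2) * ||w - w_global||^2 that limits drift from the global model."""
    return task_loss(w) + 0.5 * mu * np.sum((w - w_global) ** 2)

# Hypothetical quadratic task loss with minimum at w = (2, 2).
task_loss = lambda w: np.sum((w - 2.0) ** 2)
w = np.array([1.0, 1.0])
w_global = np.zeros(2)
print(fedprox_local_loss(w, w_global, task_loss, mu=0.1))  # 2.1
```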
Personalized federated learning techniques have emerged as a promising direction to address the diverse needs of clients and the challenges posed by non-IID data. Tan et al. [22] thoroughly analyzed the motivations behind personalized federated learning and identified potential avenues for future research. Fallah et al. [23] proposed a local fine-tuning approach for personalized models: after training a global model, they used local data to fine-tune it, resulting in a personalized model tailored to the specific needs of each client. Ma et al. [24] investigated the impact of layer-wise importance in the aggregation process. Their approach assesses the importance of each model layer for different clients, enabling personalized model aggregation for clients with statistically heterogeneous data. Huang et al. [25] introduced a personalized cross-silo federated learning technique called FedAMP. Unlike traditional global model aggregation, FedAMP maintains a cloud model for each client on the central server. By using an attentive message-passing mechanism, the method encourages collaboration between similar clients, ultimately generating personalized models tailored to individual client needs. These personalized federated learning approaches offer valuable insights and techniques for improving model adaptation, accounting for data heterogeneity, and satisfying diverse client requirements. They represent significant contributions to the field and pave the way for further advances in personalized federated learning research.
However, the aforementioned methods all require clients to upload their model parameters and ensure model consistency for centralized aggregation, potentially leading to client model leakage. While security techniques such as differential privacy [26,27,28], secure multi-party computation [29,30,31], homomorphic encryption [32,33], and trusted execution environments [34,35] can provide some protection, they often incur significant communication and computational costs or rely on specialized hardware. In light of these challenges, Cao et al. [16] proposed PerFed-GAN, a personalized federated learning approach based on generative adversarial networks. This method allows each client to share samples generated by its GAN. The central server collects all the generated samples and randomly selects a portion to send back to the local clients for further training. However, the need to harmonize client labeling remains a key issue: if participants have different classification tasks or label orders, the local model may receive incorrectly labeled training samples, degrading model performance.
Unlike existing methods, our work enables collaborative learning between clients while allowing label personalization. Even when different clients apply different labels to the same type of data, our approach can still improve model performance.
Table 1 presents information on related work, including methods, publication dates, central ideas and results.
3. Statement of Label Personalization Problem
In this section, we focus on describing the personalized federated learning problem, which aims to facilitate collaboration among a group of clients to collectively train personalized models while ensuring privacy protection.
Consider the existence of $N$ clients $\{C_1, C_2, \ldots, C_N\}$ in our proposed personalization scenario. Assume that different clients have different classification criteria for the same dataset. As a result, for the same type of image, different clients may assign different labels; that is, for clients $i \neq j$,
$$x_k^i = x_k^j, \qquad y_k^i \neq y_k^j,$$
where $x_k^i$ represents the data of the $k$-th type from the $i$-th client and $y_k^i$ represents the label of $x_k^i$.
Different model structures and different labeling of images of the same type will lead to different parameters being obtained from training. Therefore, the simple parameter aggregation methods in federated learning are not suitable for our work. Sharing samples generated by generative adversarial networks and then randomly distributing them to local clients will produce images belonging to the same category as the local images but with different labels, which will have a negative impact on the model update.
For the classification tasks described in this article, different clients label each type of image according to their own task criteria. The labeled images are then used to train the model $f_i$ and obtain the parameters $w_i$. Here, $f_i$ represents the local model of the $i$-th client, and $w_i$ represents the parameters of the model $f_i$. A function $F(f_i(x; w_i), y)$ is defined to evaluate the performance of the model $f_i$, where $x$ represents a sample from the local dataset $D_i$ of the $i$-th client and $y$ represents the label of sample $x$. The goal of the client's classification task is to optimize the parameters $w_i$ of the given model $f_i$ by minimizing the expectation of function $F$ on $D_i$:
$$\min_{w_i} \ \mathbb{E}_{(x, y) \sim D_i}\left[F(f_i(x; w_i), y)\right].$$
Here, $\mathbb{E}$ denotes the expectation. Suppose client $i$ obtains the optimal model parameters $w_i^*$ through local training, that is:
$$w_i^* = \arg\min_{w_i} \ \mathbb{E}_{(x, y) \sim D_i}\left[F(f_i(x; w_i), y)\right].$$
For personalized federated learning, the goal is to facilitate collaboration among the clients and jointly train a personalized model $f_i$ for each client while not exposing $D_i$ to the other clients $j \neq i$. Suppose client $i$ obtains the optimal model parameters $\hat{w}_i^*$ through federated training (the same local objective, now optimized with the aid of the other clients' shared information), that is:
$$\hat{w}_i^* = \arg\min_{w_i} \ \mathbb{E}_{(x, y) \sim D_i}\left[F(f_i(x; w_i), y)\right].$$
The performance of models trained through federated learning should not be worse than that of models trained locally; that is, for each client $i$:
$$\mathbb{E}_{(x, y) \sim D_i}\left[F(f_i(x; \hat{w}_i^*), y)\right] \leq \mathbb{E}_{(x, y) \sim D_i}\left[F(f_i(x; w_i^*), y)\right].$$
For classification tasks, the model trained through federated learning should achieve higher classification accuracy compared with the locally trained model.
4. LPFL-GD
In this section, we introduce the main ideas of LPFL-GD, describe the choice of clustering method, and analyze the convergence of the algorithm.
Table 2 summarizes the meanings of the important parameters.
4.1. Core Idea and Algorithm Flow
The method proposed in this paper addresses the problem that personalized federated learning cannot adapt to a heterogeneous label environment. The central idea is to replace random-sampling-based sharing with sample screening and label correction. First, each client $i$ uses its local data to train the local model $f_i$ and the generator $G_i$, obtaining new parameters $w_i$ and $\theta_i$. After local training is completed, the generator generates a sample set $S_i$ containing local features, which is uploaded to the central server instead of the local dataset $D_i$. This completes the client part of the federated training round. After all clients have uploaded their samples, the central server merges them into a new collection $S = S_1 \cup S_2 \cup \cdots \cup S_N$. Then, in the most critical part of this paper, the server clusters the collection $S$ using the DBSCAN algorithm, divides $S$ into several clusters, screens the clusters corresponding to the local dataset of each client, and performs label correction on each whole cluster according to the labels of that client's local dataset, finally obtaining the sample set $\hat{S}_i$ with which the client expands its dataset. The $\hat{S}_i$ is delivered to the corresponding client and merged with the local dataset $D_i$. At this point, the first round of federated training is complete, and subsequent rounds repeat this procedure until the entire federated training is finished.
As shown in
Figure 1, we outline the overall steps and process of the proposed approach.
- (1)
Train the generator and classifier using the local dataset to generate a new sample set and then upload the sample set to the central server;
- (2)
After collecting the sample sets uploaded by all clients, the central server performs clustering on all samples;
- (3)
After clustering is completed, shared samples are selected for each client and the sample labels are corrected;
- (4)
The local client receives the sample set sent by the central server, merges it with its own dataset, and continues training the local model. Then, steps 1 to 4 are repeated.
LPFL-GD is a label-personalized federated learning method and its algorithmic process is shown in Algorithm 1.
Algorithm 1: LPFL-GD
Input: local datasets $D_i$, local models $f_i$, generators $G_i$
Output: personalized model parameters $w_i$
1: for each round do
2:  for each client $i$ in parallel do
3:   Update the parameters $w_i$ and $\theta_i$ by training $f_i$ and $G_i$ on $D_i$.
4:   Generate sample set $S_i$ by $G_i$.
5:   Upload $S_i$ to the central server.
6:  end for
7:  for each client $i$ in parallel do
8:   Merge the sample set $S_i$ into $S$.
9:  end for
10: Cluster the sample set $S$ using the DBSCAN algorithm.
11: for each client $i$ in parallel do
12:  Randomly select a subset of samples $\hat{S}_i$ from the clusters of $S$ matching client $i$'s local data.
13:  Correct the labels in $\hat{S}_i$.
14: end for
15: for each client $i$ in parallel do
16:  Merge the sample set $\hat{S}_i$, sent by the central server, into $D_i$.
17:  Update the parameters $w_i$ by training $f_i$ and $G_i$ on the new $D_i$.
18: end for
19: end for
4.2. Implementation of the Clustering Method
The purpose of introducing the clustering algorithm in this paper is to replace the central server's random sampling with selecting shared samples and correcting their labels, and then merging the corrected samples with the local training set. This ensures that the local model learns correct label information and also extends the local dataset, improving the performance of the local model.
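As an illustration of this screening-and-correction step, the following sketch relabels each DBSCAN cluster with the label of the client's nearest local sample; the nearest-centroid matching rule and all names are our assumptions, not the paper's exact procedure:

```python
import numpy as np

def correct_labels(shared_x, cluster_ids, local_x, local_y):
    """Relabel each cluster with the label of the client's nearest
    local sample; DBSCAN noise points (cluster -1) are dropped.

    shared_x:    (m, d) generated samples pooled on the server.
    cluster_ids: (m,) cluster index per sample from DBSCAN.
    local_x:     (n, d) one client's local samples.
    local_y:     (n,) that client's personalized labels.
    """
    kept_x, kept_y = [], []
    for c in sorted(set(cluster_ids) - {-1}):
        members = shared_x[cluster_ids == c]
        centroid = members.mean(axis=0)
        # The nearest local sample decides the cluster's corrected label.
        nearest = np.argmin(((local_x - centroid) ** 2).sum(axis=1))
        kept_x.append(members)
        kept_y.append(np.full(len(members), local_y[nearest]))
    return np.concatenate(kept_x), np.concatenate(kept_y)

# Tiny 1-D example: two well-separated clusters, one client whose
# personalized labels for the two categories are 7 and 3.
shared = np.array([[0.1], [0.2], [5.0], [5.1]])
cids = np.array([0, 0, 1, 1])
x, y = correct_labels(shared, cids,
                      local_x=np.array([[0.0], [5.0]]),
                      local_y=np.array([7, 3]))
print(y.tolist())  # [7, 7, 3, 3]
```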
The DBSCAN algorithm is a density-based clustering algorithm that does not require the number of clusters to be specified in advance; clustering is controlled by the parameters Eps and MinPts. Unlike other applications of clustering, this paper clusters samples generated by generative adversarial networks rather than original data. Therefore, existing improvements to the DBSCAN algorithm do not yield significant gains in our experiments. In order to achieve better results in collaborative learning among clients, we tuned the parameters of the DBSCAN algorithm during the experiments.
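A minimal sketch of the clustering step with scikit-learn's `DBSCAN` (synthetic stand-ins for flattened generated images; Eps = 0.1 and MinPts = 5 mirror the values selected in Section 5.7, though suitable values depend on how the images are scaled):

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical stand-ins for flattened 28x28 generated images in [0, 1]:
# two tight groups play the role of two image categories.
rng = np.random.default_rng(0)
group_a = rng.normal(0.2, 0.001, size=(50, 784))
group_b = rng.normal(0.8, 0.001, size=(50, 784))
samples = np.vstack([group_a, group_b])

labels = DBSCAN(eps=0.1, min_samples=5).fit_predict(samples)
n_clusters = len(set(labels) - {-1})   # noise points are labeled -1
print(n_clusters)  # 2
```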
4.3. Convergence Analysis
In this context, we define $\mathcal{X}$ as the input space and $\mathcal{Y}$ as the output space. The input follows the distribution $\mathcal{D}$. $\mathcal{H}$ represents the hypothesis space, $h \in \mathcal{H}$, and the manually labeled ground truth is represented by $h^*$, which means the generalization error of $h^*$ is 0.
Definition 1. Assume that there are two classifiers $h_1, h_2 \in \mathcal{H}$. In this paper, the generalization disagreement between them is defined as:
$$d(h_1, h_2) = \Pr_{x \sim \mathcal{D}}\left[h_1(x) \neq h_2(x)\right].$$
Consequently, for any $h \in \mathcal{H}$, the generalization error of the classifier $h$ can be expressed as $d(h, h^*)$.
Assuming $\epsilon, \delta \in (0, 1)$, if:
$$\Pr\left[d(h, h^*) > \epsilon\right] < \delta,$$
this means that the probability of the disagreement between classifiers $h$ and $h^*$ exceeding $\epsilon$ is less than $\delta$. When the value of $\delta$ is determined, optimizing the classifier's performance is equivalent to minimizing $\epsilon$.
Theorem 1. Given two initialized datasets $D_1$ and $D_2$, their sizes are, respectively, $|D_1|$ and $|D_2|$. The classifiers obtained through training on $D_1$ and $D_2$, denoted as $h_1$ and $h_2$, have generalization errors $a_0$ and $b_0$, respectively, with confidence parameter $\delta$. Existing methods [36,37] can achieve a classification generalization error within 0.5. The generative adversarial network (GAN) composed of classifier $h_2$ and a generator is utilized to generate a dataset $S$ of size $|S|$. After clustering and label correction of $S$, the dataset $\hat{S}$ is obtained, with a generalization error of $\eta$. The combination of $D_1$ and $\hat{S}$ results in $D_1'$. Training on $D_1'$ results in $h_1'$, with a generalization error of $a_1$. If the condition given in Formula (12) holds, then the generalization error bound on $a_1$ in Formula (13) follows.
Proof. We use $d(h_1, h^*)$ to denote the disagreement rate of $h_1$ with $h^*$ on $D_1'$. Similarly, we use $d(h_1', h^*)$ to denote the disagreement rate of $h_1'$ with $h^*$ on $D_1'$.
The purpose of further training is to obtain a classifier with a low disagreement rate on the dataset. In order to obtain a classifier with better performance, the dataset should ensure that the probability of a classifier whose generalization error is greater than $a_1$ having a lower disagreement rate than $h_1'$ on $D_1'$ is smaller than $\delta$.
Due to the upper bound of the generalization error of classifier $h_2$ being $b_0$, and the generalization error of the clustering step being $\eta$, we can bound the overall label noise in $D_1'$. So, the probability that such a classifier observes a disagreement rate smaller than that of $h_1'$ on $D_1'$ can be bounded accordingly.
Further, expanding this probability and applying the condition of Theorem 1, when the relevant quantities are fixed, the derivative of Formula (16) leads to the conclusion that Formula (16) monotonically increases with a decreasing error term. Therefore, according to the Poisson theorem, Formula (19) can be approximated by its limiting form. In conclusion, when the generalization error of a classifier is not less than $a_1$ and its disagreement rate with $h^*$ on $D_1'$ is lower than that of $h_1'$, the probability of such an event is less than $\delta$. Theorem 1 is proved. □
It is not difficult to deduce, based on Formulas (17) and (12), that when $|S|$ is sufficiently large, $a_1 \leq a_0$, meaning that at the same confidence level, the upper bound of the generalization error of classifier $h_1'$ is smaller than that of classifier $h_1$. Therefore, it can be proven that the training further converges. Although $\hat{S}$ is obtained from $S$ by label correction through clustering, the samples in $\hat{S}$ are still generated by a GAN composed of classifier $h_2$ and a generator. Then, $h_1'$ is obtained by further training on the combination of $D_1$ and $\hat{S}$. Therefore, $\hat{S}$ reflects the differences between $D_1$ and $D_2$, and further reflects the differences between the training datasets of $h_1$ and $h_1'$. For personalized federated learning, it is common for there to be significant differences between the datasets of different clients, so the condition for performance improvement is usually met.
5. Experiments
In this section, we verify the effectiveness of the proposed method on different datasets and compare it with the main methods. In addition, we adjust the parameters of the DBSCAN algorithm and compare it with the main clustering algorithms.
5.1. Datasets
The datasets commonly used in personalized federated learning experiments include the benchmark datasets MNIST [20,25,38,39] and Fashion-MNIST [20,24,40,41].
MNIST: The MNIST dataset consists of a training set and a test set. The training set contains 60,000 images, while the test set contains 10,000 images. The images in the MNIST dataset are grayscale handwritten digit images with a size of 28 × 28 pixels.
Fashion-MNIST: The Fashion-MNIST dataset is a collection of fashion-related image data consisting of a training set with 60,000 samples and a test set with 10,000 samples. All samples have a size of 28 × 28 pixels and are grayscale images.
5.2. Performance Evaluation
Federated learning aims to further optimize local models, so this paper adopts relative test accuracy (RTA) as the evaluation criterion. For example, if the accuracy of the local model is 70% and the accuracy after $k$ rounds of federated learning is 90%, the relative test accuracy would be approximately 90%/70% ≈ 1.29. In the experiments, we use the mean of the relative test accuracies of all local models for evaluation and observe how this mean relative test accuracy (MRTA) changes across iteration rounds:
$$\mathrm{MRTA} = \frac{1}{N} \sum_{i=1}^{N} \frac{acc_i^{\mathrm{fed}}}{acc_i^{\mathrm{local}}},$$
where $acc$ represents model accuracy.
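Under this definition, MRTA is just the mean of per-client accuracy ratios; a minimal sketch:

```python
def mean_relative_test_accuracy(fed_acc, local_acc):
    """MRTA: mean over clients of federated accuracy / local accuracy."""
    ratios = [f / l for f, l in zip(fed_acc, local_acc)]
    return sum(ratios) / len(ratios)

# Example from the text: 70% locally vs. 90% after federated training.
print(round(mean_relative_test_accuracy([0.90], [0.70]), 2))  # 1.29
```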
5.3. Experimental Settings
IID Data Setting: For the IID setting, it is usually assumed that the local datasets of each client are independently and identically distributed in terms of data distribution. Taking MNIST as an example, the experimental training set is formed by uniformly and randomly selecting images from each class of the MNIST training set without repetition. The allocation of test data among classes follows the same proportion as that of training data. Each client uniformly and randomly selects images from each class of the MNIST test set without repetition to form the experimental test set.
Non-IID Data Setting: The IID setting is considered ideal, and current federated learning methods excel in such environments. However, in reality, achieving the IID environment is difficult, especially in personalized federated learning scenarios. Therefore, experiments in a non-IID environment become necessary. In this setting, the local datasets of each client have different data distributions and may exhibit significant variations. Taking MNIST as an example, for the 10 classes of MNIST samples, a local dataset might only have a subset of these classes, rather than the complete set. Furthermore, even within the classes it possesses, the quantities of samples for each class may vary. As a result, experiments in a non-IID environment are more challenging compared with those in the IID setting. This is an unavoidable issue in the context of personalized federated learning.
In both the IID and non-IID settings, the amount of local data per client is 500. In the IID setup, each client contains all 10 categories, with 50 samples per category; in the non-IID setup, the maximum sample size per category is 120, decreasing sequentially until the sample size of some categories reaches 0. The purpose is to mimic realistic local data, in which some categories form the majority while others have little or no data. The test and training sets have the same data distribution in both the IID and non-IID settings.
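The per-class allocation described above can be sketched as follows; the paper fixes only the total (500) and the maximum per-class count (120), so the decrement schedule here is our assumption:

```python
import numpy as np

def noniid_counts(num_classes=10, total=500, max_count=120, step=10):
    """Per-class sample counts for one client: start at max_count,
    decrease by `step` per class, then trim the smallest nonzero
    classes until the counts sum to `total`."""
    counts = np.maximum(max_count - step * np.arange(num_classes), 0)
    while counts.sum() > total:
        last = np.nonzero(counts)[0][-1]   # smallest remaining class
        counts[last] -= 1
    return counts

counts = noniid_counts()
print(counts.tolist())  # [120, 110, 100, 90, 80, 0, 0, 0, 0, 0]
```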
The experiments in this paper were performed on an NVIDIA RTX 2080 Ti GPU.
5.4. Performance on the MNIST Dataset
In the experiments, it was necessary to create scenarios where different clients label the same type of image differently. For the IID setting, all categories in the dataset were evenly distributed. When labeling the images, each client labels its first category as 0, and so on, until the 10th category is labeled 9. For the non-IID setting, we randomly selected 6 of the 10 categories, assigning the first selected category the label 0, the second the label 1, and so on, until the sixth was labeled 5.
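The non-IID relabeling scheme can be sketched as follows (whether "first category" refers to sorted or sampled order is not specified, so the sorted order here is an assumption):

```python
import random

def personalized_label_map(num_classes=10, kept=6, seed=0):
    """Draw `kept` of `num_classes` categories for one client and
    relabel them 0..kept-1 in sorted order."""
    rng = random.Random(seed)
    chosen = sorted(rng.sample(range(num_classes), kept))
    return {orig: new for new, orig in enumerate(chosen)}

# Two clients end up with different label maps for the same classes.
print(personalized_label_map(seed=1))
print(personalized_label_map(seed=2))
```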
In this paper, experiments were conducted under IID and non-IID conditions, respectively. As shown in
Figure 2, under the IID condition, this method improved the testing accuracy of client models by 1.5–5.2%, with an average improvement of 3.24% in testing accuracy. In the non-IID experiment, the improvement brought by this method was relatively greater due to the significant variations in local data distribution and the differing benefits for each client. The testing accuracy of each client’s model increased by 2.9–13.4%, with an average improvement of 5.74% in testing accuracy. The results show that even when there are large differences in the local data distribution and different clients label the same type of data differently, our method can still substantially improve the test accuracy of the model.
5.5. Performance on the Fashion-MNIST Dataset
Compared with the MNIST dataset, the Fashion-MNIST dataset poses greater difficulty in training. The data pre-processing method is similar to the MNIST dataset, where each class is randomly assigned non-repeated labels to create an environment of label personalization. By adjusting the local training parameters, we conducted experiments in both IID and non-IID settings.
As shown in
Figure 3, the experiment obtained effective results. In the IID setting, the proposed method in this paper improved the test accuracy of the client’s models by 0.2–3.9%, with an average test accuracy improvement of 2.34%. In the non-IID setting, the method improved the test accuracy of the client’s models by 0.1–6.7%, with an average test accuracy improvement of 2.46%. The experimental results show that our method improves the performance of the model even if different clients label the same type of data inconsistently.
Analysis of the experimental results for
Figure 2 and
Figure 3: When the number of clients participating in federated learning is 1, the performance improvement in the local model is minimal because the generated samples’ features are derived from the local dataset. However, when the number of clients is greater than 1, there are differences among the local datasets, enabling mutual learning between clients. As a result, there is a relatively significant improvement in the performance of the local models. Therefore, in terms of improving the performance of the local models, the impact of differences between datasets is more significant compared with the number of clients participating in the federation.
5.6. Compare with Other Federated Learning Methods
This paper compares the experimental results with existing federated learning methods, including PerFED-GAN [16], RHFL [42], and FedMD [43].
The main contribution of this paper is the clustering of shared samples by the central server: the clustered shared samples are filtered and their labels corrected according to the categories of each local client's samples, in order to avoid the model performance degradation caused by different clients labeling the same type of samples with different labels. Therefore, comparing the mean relative test accuracy of the non-clustered PerFED-GAN method with that of our method reflects the importance of our clustering step.
The different numbers of classes in different classification tasks lead to variations in local model structures, and relatively few federated learning methods consider such model heterogeneity. In this study, we compare our method with the RHFL and FedMD methods. Due to its requirement of aligning average class scores, the FedMD method cannot achieve collaborative federated learning with different numbers of classes. In order to demonstrate the effectiveness of our method under the label personalization condition, we keep the number of classes consistent among all local models but differentiate the label arrangements.
As shown in
Figure 4 and
Figure 5, our proposed method significantly improves the model performance under both IID and non-IID experimental conditions. In contrast, using the PerFED-GAN method leads to a rapid decline in model performance by 14–35% in terms of average accuracy.
As shown in
Figure 6 and
Figure 7, the experimental results demonstrate that our proposed method outperforms both the RHFL method and the Fedmd method, regardless of whether under IID or non-IID conditions. During the federated training process, it can even happen that the model accuracy of the RHFL method and Fedmd method is lower than the local model accuracy.
In conclusion, based on the experimental results, our method is more suitable for federated learning scenarios with label personalization compared with other methods, and it outperforms the comparison methods in terms of model performance. Particularly, the performance improvement is more significant in non-IID data settings.
Analysis of the experimental results for
Figure 4 and
Figure 5: The classifier is a part of the generative adversarial network. Generative adversarial networks are adversarial training, and in this experiment, the evaluation criterion used is the mean relative test accuracy (23), which is the average of the relative test accuracies of all clients. Therefore, the curve of mean relative test accuracy fluctuates as the number of federated learning rounds increases. In this experiment, the clients retain the model from each round of federated learning. After the completion of federated training, the model with the highest performance is selected as the final trained model.
Analysis of the sharp performance degradation of PerFED-GAN as the number of federated learning rounds increases: in a heterogeneous label environment, different clients label the same class of images differently. During federated learning, the central server aggregates the samples uploaded by different clients and then randomly distributes them to the clients. As a result, the labels of the received images are inconsistent with the client's own labeling, degrading the performance of the local model during training. In addition, the labels of the images received in each round differ not only from the locally assigned ones but also from those of previous rounds, which leads to even more pronounced performance degradation.
5.7. Parameter Tuning of Clustering Algorithm
The objects of clustering are the pseudo-samples generated by the generative adversarial network, not the original data. Therefore, existing algorithm parameters cannot be directly applied to this experiment and need further adaptation. Since the clustering target is not the original data and the generated samples are random, conventional evaluation indices cannot be used to evaluate the clustering algorithm in this experiment. Therefore, parameter tuning focuses on two aspects: 1. the number of clusters produced by clustering; and 2. whether the samples contained in each cluster belong to the same category.
Using the Fashion-MNIST dataset as an example, a generator is used to produce a set of samples for tuning the parameters of the DBSCAN algorithm. The sample set consists of 500 images covering all 10 categories. Since the images are generated by the adversarial network, their quality is necessarily lower than that of the original images, so the common DBSCAN parameter settings cannot cluster this group of samples. The default value of Eps is 0.5, and we found that when Eps exceeds 0.3, all samples collapse into a single cluster. We therefore varied Eps from 0.05 to 0.3 in steps of 0.05. The MinPts parameter cannot be set too high or too low; its default value is 5, and in addition to the default, we also tested 10 and 15. The classification results corresponding to each choice of Eps and MinPts are shown in Table 3. Referring to the number of categories in Table 3 and observing the corresponding classification results, we finally selected Eps = 0.1 and MinPts = 5 as the experimental parameters.
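The tuning procedure amounts to sweeping Eps over 0.05–0.3 (step 0.05) and MinPts over {5, 10, 15} and recording the number of clusters each combination produces; a sketch on synthetic stand-in data (real tuning would use GAN-generated samples and compare cluster counts to the known number of categories, as in Table 3):

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Synthetic stand-in for 500 flattened generated images from 10 categories.
rng = np.random.default_rng(0)
centers = rng.uniform(0.0, 1.0, size=(10, 784))
data = np.vstack([c + rng.normal(0.0, 0.001, size=(50, 784)) for c in centers])

for eps in [0.05, 0.10, 0.15, 0.20, 0.25, 0.30]:
    for min_pts in [5, 10, 15]:
        labels = DBSCAN(eps=eps, min_samples=min_pts).fit_predict(data)
        n_clusters = len(set(labels) - {-1})
        print(f"Eps={eps:.2f} MinPts={min_pts:2d} clusters={n_clusters}")
```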
5.8. Comparison with Other Clustering Algorithms
Figure 8 shows the comparison results of the DBSCAN algorithm used in this paper with the TSCM algorithm [44], the K-DBSCAN algorithm [45], and the K-means++ algorithm [46]. The experimental results show that our parameter tuning of the DBSCAN algorithm improves the mean relative test accuracy of the model more significantly than the other algorithms.
6. Conclusions
In this paper, we propose the federated pseudo-sample clustering algorithm LPFL-GD. Based on samples generated by shared generative adversarial networks, the DBSCAN algorithm is introduced on the central server to divide the shared samples into clusters of different categories through image clustering and to adjust the labels within each cluster for each client, in order to adapt to the heterogeneous label environment. Since the clustering objects are samples generated by generative adversarial networks rather than original data, existing DBSCAN parameter settings cannot be directly applied to our experiments; to obtain better results, we adjusted the parameters of the DBSCAN algorithm and described our tuning method. LPFL-GD improves the performance of local models on local datasets by up to 13.4%. The experimental results show that, on the one hand, our clustering parameter adjustment is better suited to our setting than existing settings and obtains good results; on the other hand, compared with existing federated learning methods, LPFL-GD is better suited to personalized federated learning under label-personalized conditions. When clients with different classification criteria cooperate, our method improves model performance, extending federated learning beyond scenarios in which all clients must adhere to a single classification standard.
The method proposed in this paper has the following limitations: 1. Samples generated by the GAN reflect characteristics of the original data, so sharing them may leak local data information to a certain extent, resulting in relatively weak data security. 2. Generating and sharing the samples produced by the generative adversarial network imposes a heavy communication burden, resulting in relatively poor communication efficiency.