Article
Peer-Review Record

PerFreezeClip: Personalized Federated Learning Based on Adaptive Clipping

Electronics 2024, 13(14), 2739; https://doi.org/10.3390/electronics13142739
by Jianfei Zhang * and Zhilin Liu
Reviewer 1: Anonymous
Reviewer 2:
Reviewer 3: Anonymous
Reviewer 4: Anonymous
Submission received: 31 May 2024 / Revised: 9 July 2024 / Accepted: 10 July 2024 / Published: 12 July 2024
(This article belongs to the Special Issue Deep Learning for Data Mining: Theory, Methods, and Applications)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

1. Objectives should be written more comprehensively with the support of existing work.
2. The conclusion should be more concise; it should include only major findings.
3. Algorithms 1 and 2 should be explained in more detail.

4. Explain the comparison in the related work section. A table is required that clearly states the differences between your work and other published works.

Comments on the Quality of English Language

English style fine

Author Response

Dear Reviewer 1,

Thank you very much for your constructive feedback and suggestions, which have been very helpful in improving our manuscript. Based on them, we have made a number of changes to the previous manuscript. Revised portions are highlighted in the paper. The changes are as follows.

Comment 1: Objectives should be written more comprehensively with the support of existing work.

Response 1: Thank you very much for this comment; we agree that our objectives were not written comprehensively enough. We have revised the description of the objectives in Part III of the manuscript (Page 5, lines 192-198), developing a more comprehensive narrative that builds on existing work.

 

Comment 2: The conclusion should be more concise; it should include only major findings

Response 2: Thank you very much for this valuable suggestion; we agree that the conclusion section of our manuscript was not concise enough. We have revised it in the revised manuscript to include only the main findings and future perspectives (Page 17, line 647).

 

Comment 3: Algorithms 1 and 2 should be explained in more detail.

Response 3: We have carefully considered your comments and agree that a more detailed description of Algorithms 1 and 2 would make our approach clearer. We have therefore expanded Section 3 to describe the freezing method in Algorithm 1 and the overall flow in Algorithm 2 more clearly (Page 6, lines 219-223; Page 8, lines 297-303).

 

Comment 4: Explain the comparison in the related work section. A table is required that clearly states the differences between your work and other published works.

Response 4: Thank you very much for pointing out this problem; we did neglect to compare our work with related personalization methods, and this suggestion is very helpful in improving the quality of our manuscript. We have added a comparison in the last paragraph of Section 2.2, describing our advantages over other methods (Page 3, lines 142-147). In addition, we have added a table at the end of Section 2 describing the strategic and technical differences between our approach and other personalization methods (Page 4, line 149, Table 1. Related work).

 

Once again, we sincerely appreciate your valuable comments, which will undoubtedly help improve the quality of our manuscript.

 

Yours sincerely,

Corresponding author:

Name: Jianfei Zhang

E-mail: [email protected]

 

Reviewer 2 Report

Comments and Suggestions for Authors
  • Main question addressed: The main question addressed by this research is how to address the challenge of data heterogeneity in federated learning, specifically the bias introduced by non-IID data.
  • Originality and relevance: The topic is highly relevant in the field of federated learning, as data heterogeneity is a significant challenge. The proposed algorithm, PerFreezeClip, appears to be an original approach to addressing this challenge.
  • Contribution to the field: PerFreezeClip adds to the field by providing a personalized federated learning algorithm that adapts to non-IID data, improving test accuracy and convergence speed compared to non-personalized federated learning algorithms.
  • Methodology improvements: Consider adding more diverse datasets to the experiments and exploring other types of gradient clipping techniques. Additionally, it would be beneficial to analyze the computational overhead of PerFreezeClip compared to other algorithms.
  • Consistency of conclusions: The conclusions are consistent with the evidence presented, demonstrating the effectiveness of PerFreezeClip in addressing the main question.
  • Appropriateness of references: The references appear appropriate, but it would be beneficial to include more recent studies on federated learning and data heterogeneity.
  • 1. Dong, C.; Zhou, J.; An, Q.; Jiang, F.; Chen, S.; Pan, L.; Liu, X. (2023). Optimizing performance in federated person re-identification through benchmark evaluation for blockchain-integrated smart UAV delivery systems.
  • Additional comments on tables and figures: The tables and figures are clear and well-organized, effectively illustrating the performance of PerFreezeClip compared to other algorithms.

 

Comments on the Quality of English Language

Good

Author Response

Dear Reviewer 2,

Thank you very much for your constructive comments and suggestions, and for your clear and lucid assessment of our manuscript from different perspectives. Based on your suggestions, we have noticed several issues that needed to be addressed and have made changes to our previous manuscript. Revised portions are highlighted in the paper. The changes are as follows.


Comment 1: Methodology improvements: Consider adding more diverse datasets to the experiments and exploring other types of gradient clipping techniques. Additionally, it would be beneficial to analyze the computational overhead of PerFreezeClip compared to other algorithms.

Response 1:

  1. Thank you very much for these constructive comments on the improvement of our method; we agree with you. Exploring other types of gradient clipping techniques on more diverse datasets would allow the applicability of the clipping technique employed in our method to be evaluated more broadly, and your suggestion provides a direction for our next research. Unfortunately, due to time and financial constraints, we are unable to run experiments on more diverse datasets for the time being; however, we plan to explore different types of gradient clipping techniques on more complex and diverse datasets in future work.
  2. In addition, your suggestion to analyze the computational overhead of PerFreezeClip is correct: the gradient clipping technique does impose additional computational overhead, and we agree with this. Since our main goal is to mitigate the effects of heterogeneous data, we did not run experiments from the perspective of communication efficiency. We have, however, made a change in the first paragraph of Section 3, where we cite the following four papers to briefly analyze the computational overhead (Page 4, lines 157-163). As the literature [1][2] shows, PerFreezeClip's adoption of gradient clipping incurs additional computational overhead, which agrees with your analysis; related work [3][4] shows that freezing methods can reduce the computational and communication resources required to train models in federated learning. Although our analysis in the manuscript is relatively brief, your suggestion provides a good research direction for our future work on communication efficiency. These four papers correspond to references 24, 25, 26, and 27 in the manuscript.

 

 

  • [1] Babakniya, Sara, et al. "Federated sparse training: Lottery aware model compression for resource constrained edge." Workshop on Federated Learning: Recent Advances and New Challenges (in conjunction with NeurIPS 2022), 2022.
  • [2] Bibikar, Sameer, et al. "Federated dynamic sparse training: Computing less, communicating less, yet learning better." Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 36, No. 6, 2022.
  • [3] Sidahmed, Hakim, et al. "Efficient and private federated learning with partially trainable networks." arXiv preprint arXiv:2110.03450, 2021.
  • [4] Pfeiffer, Kilian, et al. "CoCoFL: Communication- and computation-aware federated learning via partial NN freezing and quantization." arXiv preprint arXiv:2203.05468, 2022.
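As an aside for readers, the basic clipping operation referred to throughout this response can be sketched in a few lines of plain Python. This is a generic norm-based illustration only (the function name is ours), not the exact adaptive rule used in PerFreezeClip:

```python
import math

def clip_gradient(grad, threshold):
    """Rescale a gradient vector so its L2 norm is at most `threshold`.

    If the norm already satisfies the bound, the gradient is returned
    unchanged; otherwise it is scaled down, preserving its direction.
    """
    norm = math.sqrt(sum(g * g for g in grad))
    if norm > threshold:
        scale = threshold / norm
        return [g * scale for g in grad]
    return list(grad)

# A gradient of norm 5.0 is scaled down to norm 1.0; its direction is kept.
clipped = clip_gradient([3.0, 4.0], 1.0)
```

Scaling rather than truncating preserves the gradient's direction, which is why clipping bounds the update magnitude without redirecting the optimization step.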

 

Comment 2: Appropriateness of references: The references appear appropriate, but it would be beneficial to include more recent studies on federated learning and data heterogeneity.

Dong, C., Zhou, J., An, Q., Jiang, F., Chen, S., Pan, L., & Liu, X. (2023). Optimizing performance in federated person re-identification through benchmark evaluation for blockchain-integrated smart UAV delivery systems.

Response 2: Thank you very much for your positive comments on the appropriateness of the references in our manuscript; we agree that referring to the latest research on federated learning can improve its quality. We are also grateful for the paper you recommended, which we have read carefully. Its edge-based smart drone delivery system makes good use of federated learning techniques and points out some shortcomings of traditional federated learning, such as single points of failure, blockchain forks, and computational overhead, arguing that federated learning, IoT, and UAV networks therefore need further optimization. This idea is contextually similar to the argument in our manuscript that traditional federated learning does not allow for fine-grained personalization and thus needs to be optimized. We have added this reference and a description of the related existing techniques to the introduction of the revised manuscript, which we believe enhances the comprehensiveness of the article (Page 1, lines 39-41).

 

Once again, we sincerely appreciate your valuable comments, which will undoubtedly help improve the quality of our manuscript.

 

Yours sincerely,

Corresponding author:

Name: Jianfei Zhang

E-mail: [email protected]

Reviewer 3 Report

Comments and Suggestions for Authors

(1.) Page 1 Line 10: "Non-IID" is not spelled out until line 141, where it appears as "non-identically distributed", which I believe should be "non-identically independently distributed" because there are two "I"s instead of one, i.e., "IID" instead of "ID".

(2.) Page 3 lines 109-138 is 30 lines of text that is too much for a single paragraph, and should be subdivided into smaller paragraphs to enhance readability and comprehension.

(3.) Page 10 line 372: "RGB" should be spelled out as "RedGreenBlue (RGB)"

(4.) Page 10 line 376: I could not find where "CIFAR" is spelled out. "CIFAR" first appears in line 17 on page 1 where it appears as "CIFAR-10" and "CIFAR-100" as dataset names.

(5.) Page 19: The titles in the list of References need to be carefully edited because I noticed inconsistent formats for capitalization of words in the titles.

For example Reference 3 on lines 632 and 633 uses capital letters for title of "IEEE Consumer Electronics Magazine" when titles in previous references  on line 628 to 632 only use capital letter for first word in title. Similarly line 636 has Capital "M" in "Machine" but not capitals in "learning" and "systems". Likewise reference 21 has capital letters in each of the major words of title "IEEE 30th Workshop on Machine Learning for Signal Processing". The authors of the manuscript submission need to look at the format requirements for this "Electronics" journal and make appropriate corrections because use of both formats cannot be correct.

 

Author Response

Dear Reviewer 3,

We appreciate your professional comments on our article and sincerely thank you for your careful reading. We apologize for our careless mistakes and thank you for pointing them out. In the resubmitted manuscript, the misspellings and bibliographic errors have been corrected. Revised portions are highlighted in the paper. The specific corrections are listed below.

 

Comment 1: Page 1 Line 10: "Non-IID" is not spelled out until line 141, where it appears as "non-identically distributed", which I believe should be "non-identically independently distributed" because there are two "I"s instead of one, i.e., "IID" instead of "ID".

Response 1: Thank you for double-checking; your advice is correct. We have corrected "non-identically distributed" to "non-identically independently distributed" (Page 4, line 153), which is the correct expansion of "Non-IID".

 

Comment 2: Page 3 lines 109-138 is 30 lines of text that is too much for a single paragraph, and should be subdivided into smaller paragraphs to enhance readability and comprehension.

Response 2: Thank you very much for this suggestion; we agree that the paragraph on page 3, lines 109 to 138 was too long. We have divided it into four subparagraphs to enhance readability and comprehension (Page 3, lines 111-147).

 

Comment 3: Page 10 line 372: "RGB" should be spelled out as "RedGreenBlue (RGB)"

Response 3: Thank you very much for pointing out this problem. We have corrected "RGB" to "RedGreenBlue (RGB)" on the line you indicated (Page 11, line 417).

 

Comment 4: Page 10 line 376: I could not find where "CIFAR" is spelled out. "CIFAR" first appears in line 17 on page 1 where it appears as "CIFAR-10" and "CIFAR-100" as dataset names.

Response 4: Thank you very much for your feedback; we have noted this issue. We have changed "CIFAR" to the full expression "CIFAR-10 or CIFAR-100" as the dataset name (Page 11, line 421).

 

Comment 5: Page 19: The titles in the list of References need to be carefully edited because I noticed inconsistent formats for capitalization of words in the titles.

For example, Reference 3 on lines 632 and 633 uses capital letters for the title "IEEE Consumer Electronics Magazine", while titles in previous references on lines 628 to 632 only use a capital letter for the first word. Similarly, line 636 has a capital "M" in "Machine" but no capitals in "learning" and "systems". Likewise, reference 21 has capital letters in each of the major words of the title "IEEE 30th Workshop on Machine Learning for Signal Processing". The authors of the manuscript submission need to look at the format requirements of this "Electronics" journal and make the appropriate corrections, because the use of both formats cannot be correct.

Response 5: Thank you again for your careful reading. We double-checked the reference list on page 19 and edited it for uniform formatting, capitalizing only the first word of each title, to ensure consistent capitalization across all entries (Pages 18-19).

 

Once again, we sincerely appreciate your valuable comments, which will undoubtedly help improve the quality of our manuscript.

 

Yours sincerely,

Corresponding author:

Name: Jianfei Zhang

E-mail: [email protected]

Reviewer 4 Report

Comments and Suggestions for Authors

The paper introduces PerFreezeClip, a personalized federated learning algorithm that utilizes adaptive clipping and freezing methods to address data heterogeneity issues and enhance model performance. The paper is interesting and well-written. However, the quality of the English language needs to be improved; e.g., on line 169, when ending a sentence with an equation, a period should be used.

Comments on the Quality of English Language

see above.

Author Response

Dear Reviewer 4,

Thank you very much for your constructive feedback and suggestions, and for your positive comments on our paper. As you noted, there were some problems with the English in our manuscript; we have therefore checked the entire manuscript and revised it to improve the quality of the English. Revised portions are highlighted in the paper. The corrections are listed below.

 

Comment: The paper introduces PerFreezeClip, a personalized federated learning algorithm that utilizes adaptive clipping and freezing methods to address data heterogeneity issues and enhance model performance. The paper is interesting and well-written. However, the quality of the English language needs to be improved; e.g., on line 169, when ending a sentence with an equation, a period should be used.

 

Response: We sincerely thank the reviewer for their careful reading. Regarding your suggestion on line 187, we recognize the importance of maintaining clarity and consistency in the use of punctuation, especially when ending sentences with equations.

We've changed the sentence

“Where  represents the loss function of the  client, where .”

to

“Where  represents the loss function of the  client, .” That is, we removed the second "where" (Page 5, line 187).

We apologize for our carelessness. In the resubmitted manuscript we have ensured that all such cases are corrected, and we have also checked and revised the grammar and presentation of the entire manuscript to improve its overall English quality.

 

Once again, we sincerely appreciate your valuable comments, which will undoubtedly help improve the quality of our manuscript.

 

Yours sincerely,

Corresponding author:

Name: Jianfei Zhang

E-mail: [email protected]

 

Reviewer 5 Report

Comments and Suggestions for Authors

1. Can the authors be more specific when talking about updating head layer weights, custom parameters? In order for the proposed method to be credible, I think they need to provide more details about this process, as well as about the concrete way in which the improvement of the optimization space exploration for the layer parameters, remaining during the subsequent training, takes place.

2. What should the users of the method proposed by the authors rely on (what should they analyze), so that they "wisely" choose the threshold for cutting the gradient. In my opinion, the recommendations made by the authors in section 4 are far too brief and leave room for arbitrariness.

3. Do the authors anticipate or have they already performed experiments regarding what can happen if the number of local training rounds, on each device, will be greater than 10?

Author Response

Dear Reviewer 5,

Thank you very much for your constructive comments and suggestions. We also appreciate your clear and lucid comments on our manuscript from different perspectives. Your suggestions have been very helpful in improving our manuscript. Based on your suggestions, we have noticed that several issues need to be addressed and we have made some changes to our previous manuscript. Revised portions are highlighted on the paper. The changes are as follows.

 

Comment 1: Can the authors be more specific when talking about updating head layer weights, custom parameters? In order for the proposed method to be credible, I think they need to provide more details about this process, as well as about the concrete way in which the improvement of the optimization space exploration for the layer parameters, remaining during the subsequent training, takes place.

Response 1:

  1. When it comes to updating the head layer weights and customizing the parameters, we agree that the description was not specific enough; your suggestion is correct. To be more specific, we have made changes in Section 3.3, providing a more detailed description of the head weight update from the perspective of the local training process (Page 8, lines 310-316).
  2. Thank you for the reminder; the exploration of the optimization space for the parameters of the remaining layers during subsequent training was indeed described too briefly, and no specific mechanism was given in the previous manuscript. In our approach, the optimization space of different layer parameters is explored using the freezing method: by freezing the backbone layers, we explore the optimal parameters of the head layer so that it can quickly adapt to the specific requirements of a new task or dataset; conversely, by freezing the head layer, we explore the parameters of the backbone layers to ensure that they learn a good representation of generic features under federated training. We have revised the manuscript accordingly (Page 6, lines 229-231).
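The alternating freezing described above can be sketched as a single local update step that simply skips frozen layers. This is an illustrative simplification (the layer names and plain-list parameters are ours), not the manuscript's actual implementation:

```python
def local_update(params, grads, lr, frozen):
    """One gradient step over named layers, skipping those in `frozen`.

    `params` and `grads` map layer names to lists of scalars; frozen
    layers keep their current values, mimicking backbone/head freezing.
    """
    new_params = {}
    for name, values in params.items():
        if name in frozen:
            new_params[name] = list(values)  # frozen layer: copied, not updated
        else:
            new_params[name] = [p - lr * g for p, g in zip(values, grads[name])]
    return new_params

params = {"backbone": [1.0, 2.0], "head": [0.5]}
grads = {"backbone": [0.1, 0.1], "head": [0.2]}

# Personalization phase: the backbone is frozen while the head is trained.
updated = local_update(params, grads, lr=0.5, frozen={"backbone"})
```

Swapping which layer names appear in `frozen` switches between personalizing the head and training the shared backbone, which is the alternation the response describes.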

 

 

Comment 2: What should the users of the method proposed by the authors rely on (what should they analyze), so that they "wisely" choose the threshold for cutting the gradient. In my opinion, the recommendations made by the authors in section 4 are far too brief and leave room for arbitrariness.

Response 2:

Thank you very much for your reminder about choosing the gradient clipping threshold wisely; we realize that the advice given in Section 4 was indeed too brief, and your suggestion goes a long way toward improving the quality of our manuscript. We now provide more detailed guidance on how to choose the clipping threshold in Section 4, based on a combination of four main considerations (Page 10, lines 387-395).

  1. Initial model- and task-based estimation. The typical range of gradients can be estimated through preliminary experiments or prior experience with similar tasks, for example by observing the magnitude and trend of the gradients during the initial training phase.
  2. Statistical analysis of the gradients. The statistical properties of the gradients, including the mean and standard deviation, are monitored during training; based on this information, an appropriate clipping threshold can be selected to ensure that the clipped gradient norm is effectively bounded.
  3. The model architecture and optimizer. Different architectures and optimizers may have different sensitivities to gradient clipping; for example, an optimizer that uses sparse gradients may require a smaller clipping threshold to avoid over-clipping.
  4. Experimental validation. After setting the threshold, a series of experiments verifies the convergence and training stability of the model: observe the learning curve and the performance on the validation set to ensure that the selected threshold suits the specific task.
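Consideration 2 above can be made concrete with a small sketch. The mean-plus-k-standard-deviations rule and the function name here are our own illustration of a statistics-based choice, not a rule stated in the manuscript:

```python
import statistics

def suggest_threshold(grad_norms, k=1.0):
    """Heuristic clipping threshold from observed gradient norms.

    Returns mean + k * (population) standard deviation of the recorded
    norms, so that only unusually large gradients end up clipped.
    """
    mean = statistics.fmean(grad_norms)
    std = statistics.pstdev(grad_norms)
    return mean + k * std

# Gradient norms recorded over a few warm-up steps of training.
history = [2.0, 2.5, 3.0, 2.5, 2.0]
threshold = suggest_threshold(history, k=1.0)
```

Tuning `k` then trades off how aggressively outlier gradients are clipped, which ties this heuristic back to the experimental validation step in consideration 4.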

 

 

 

Comment 3: Do the authors anticipate or have they already performed experiments regarding what can happen if the number of local training rounds, on each device, will be greater than 10?

Response 3: Thank you very much for your comment; we have run experiments on exactly this question, reported in Figure 7 on page 17 of the manuscript (Page 17, Figure 7). We fixed the freezing-ratio parameter and clipping threshold that work well when the number of local epochs is 10. When the number of local epochs is set to 20 or 30, the training accuracy improves, but only slightly; when it is 40, the training accuracy decreases, indicating that the freezing-ratio parameter and clipping threshold are no longer appropriate. From this experiment, we conclude that the test accuracy of PerFreezeClip depends on the freezing-ratio parameter, the clipping threshold, and the number of local training rounds.

 

Once again, we sincerely appreciate your valuable comments, which will undoubtedly help improve the quality of our manuscript.

 

Yours sincerely,

Corresponding author:

Name: Jianfei Zhang

E-mail: [email protected]
