Article
Peer-Review Record

Generalized Federated Learning via Gradient Norm-Aware Minimization and Control Variables

Mathematics 2024, 12(17), 2644; https://doi.org/10.3390/math12172644
by Yicheng Xu, Wubin Ma *, Chaofan Dai, Yahui Wu and Haohao Zhou
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Reviewer 4: Anonymous
Submission received: 24 July 2024 / Revised: 22 August 2024 / Accepted: 23 August 2024 / Published: 26 August 2024

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

The paper is very well-written and the results are very clear. However, it is hard to follow the proposed methodology of the authors. Please make a simple methodology diagram using block diagrams which will help the readers a great deal in better understanding of your work.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

The authors present a federated learning approach based on first-order flatness and the error function, which is expected to generalize better than traditional methods that use only the error function. However, the following comments need to be addressed to improve the manuscript.

For non-expert readers, please include mathematical formulations and definitions of:

i) Local Minima

ii) Global Minima

iii) Flat Minima

iv) Sharp Minima

v) Discuss the above concepts and their differences.

vi) The principle of empirical risk

The neighborhood in Equations 4, 5, and 6 (centered at w with radius rho) is continuous. How is it possible to evaluate the infinitely many neighbors? If a discretization is used, how is it done?
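For context, a common answer in the sharpness-aware minimization literature (e.g. SAM-style methods) is that the continuous neighborhood is never enumerated: the supremum over the rho-ball is approximated by a single first-order ascent step along the normalized gradient. A minimal sketch, assuming the paper follows this standard approximation:

```python
import numpy as np

def worst_case_perturbation(grad, rho):
    """One-step first-order approximation of the worst neighbor in the
    rho-ball: instead of evaluating infinitely many neighbors, take a
    single ascent step of length rho along the normalized gradient."""
    norm = np.linalg.norm(grad)
    if norm == 0.0:
        return np.zeros_like(grad)
    return rho * grad / norm

g = np.array([3.0, 4.0])                 # toy gradient, ||g|| = 5
eps = worst_case_perturbation(g, rho=0.1)
# eps = 0.1 * g / 5 = [0.06, 0.08]
```

The loss is then evaluated once at w + eps, turning the inner maximization into a single extra gradient computation per step.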

In mathematics, the zeroth-order, first-order, and second-order terms are terms coming from power series. Please specify which terms are raised to the zeroth, first, and second powers in zeroth-order flatness, first-order flatness, etc.

In the comparison with zeroth-order flatness, it is not clear whether epsilon is a neighbor or a perturbation value. Please clarify this and discuss how its value is found (perhaps via linear programming or other techniques). Moreover, the authors emphasize that first-order flatness is superior to zeroth-order flatness. Does that mean second-order flatness is superior to both first- and zeroth-order flatness?

Regarding the name GAM (Gradient Norm-Aware Minimization): are you minimizing the norm (Euclidean length) of the gradient vector? This is unclear in the proposal. Please clarify this or change the name.
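For context, in GAM-style methods the objective is typically the empirical loss plus a penalty on the (maximal) Euclidean norm of the gradient in a neighborhood of w, so the quantity being minimized does involve the gradient vector's norm. A toy sketch on a quadratic loss; the weight lam and the one-point (rather than neighborhood-maximal) penalty are illustrative simplifications, not the paper's exact formulation:

```python
import numpy as np

def loss(w):
    # toy quadratic loss L(w) = 0.5 * ||w||^2
    return 0.5 * np.dot(w, w)

def grad(w):
    # gradient of the toy loss: dL/dw = w
    return w

def gam_style_objective(w, lam=0.1):
    # empirical loss plus a first-order flatness proxy:
    # the Euclidean norm of the gradient vector, weighted by lam
    return loss(w) + lam * np.linalg.norm(grad(w))

w = np.array([3.0, 4.0])
val = gam_style_objective(w)   # 0.5*25 + 0.1*5 = 13.0
```

Minimizing this drives both the loss and the local gradient norm toward zero, favoring flat minima.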

Why is it necessary to transform Equation 12 into Equation 17? Please add a numerical method to compute it.

The rationale behind the parameters c and c_i in Equations 18 and 19 needs to be clarified. Does the parameter c_i decay the local gradient at the client according to the previous weights? The parameter c seems arbitrary, because its computation (Equation 20) uses Delta c_i. Please include a table with all the necessary parameters, such as Delta, Eta, etc., and specify whether there is a separate parameter for each weight or a single parameter for all weight updates.
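For context, Equations 18-20 appear to follow a SCAFFOLD-style control-variate mechanism, in which c_i estimates the client's gradient direction and c the server-side average, and their difference corrects client drift. A hedged sketch of that mechanism (the scalar example and the names eta and K are illustrative, not taken from the paper):

```python
def local_update(w, g, c_i, c, eta):
    # drift-corrected local step: the client's gradient g is shifted by the
    # difference between the server (c) and client (c_i) control variables
    return w - eta * (g - c_i + c)

def update_client_control(c_i, c, w_global, w_local, K, eta):
    # re-estimate the client's control variable from the total progress made
    # over K local steps of size eta (SCAFFOLD's "option II"-style update)
    return c_i - c + (w_global - w_local) / (K * eta)

# toy scalar example
w_new = local_update(w=1.0, g=0.5, c_i=0.2, c=0.1, eta=0.1)   # 0.96
c_i_new = update_client_control(c_i=0.2, c=0.1,
                                w_global=1.0, w_local=0.9,
                                K=5, eta=0.1)                  # 0.3
```

The server would then average the clients' Delta c_i to refresh c, which is why Equation 20 depends on Delta c_i.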

Section 4.2.2 seems to break the federated learning approach, because each client incorporates gradient information from all clients, which is used to approximate the parameters c_i and c. How is this possible, even when communication links fail?

Comments on the Quality of English Language

Please improve the English language when possible to make it clearer.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

1. The title does not adequately reflect the research content of the paper.
2. To what extent does the non-independent and identically distributed (Non-IID) nature of data across clients in federated learning environments exacerbate the challenge of client drift?
3. In the context of the proposed FedGAM algorithm, how effectively do the control variables correct local updates and steer model training towards globally flat minima, and what quantifiable improvements in model performance can be identified compared with conventional federated learning techniques?

Comments on the Quality of English Language

Minor editing of English language required.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 4 Report

Comments and Suggestions for Authors
  • The authors combine FL (Federated Learning) and GAM (Gradient Norm-Aware Minimization) to propose FedGAM, which enhances the generalization ability of the global model.
  • The experiments are well executed, individually addressing the three main causes of client drift.
  • I suggest that the authors provide a more detailed explanation of the symbols used, so that readers unfamiliar with the previous literature can still understand the content of the paper.
  • In the experiments, the alpha value is used to control data heterogeneity. The authors should explain this more clearly or cite relevant papers to facilitate the reproducibility of the experiments.
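For context, in many federated learning papers such an alpha value is the concentration parameter of a Dirichlet distribution used to split each class's samples across clients: smaller alpha yields more skewed (heterogeneous) partitions, larger alpha approaches IID. A minimal sketch, assuming this is the scheme the paper uses:

```python
import numpy as np

def dirichlet_partition(labels, n_clients, alpha, rng):
    """Split sample indices across clients, drawing each class's
    client proportions from Dirichlet(alpha, ..., alpha)."""
    clients = [[] for _ in range(n_clients)]
    for cls in np.unique(labels):
        idx = np.where(labels == cls)[0]
        rng.shuffle(idx)
        props = rng.dirichlet(alpha * np.ones(n_clients))
        # convert proportions to split points within this class
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for client, part in zip(clients, np.split(idx, cuts)):
            client.extend(part.tolist())
    return clients

labels = np.array([0] * 10 + [1] * 10)
parts = dirichlet_partition(labels, n_clients=3, alpha=0.5,
                            rng=np.random.default_rng(0))
```

With a citation to this scheme (or an equivalent description), the heterogeneity setting would be fully reproducible.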

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 3 Report

Comments and Suggestions for Authors

I agree to publishing it.
