Peer-Review Record

Pruning Adapters with Lottery Ticket

Algorithms 2022, 15(2), 63; https://doi.org/10.3390/a15020063
by Jiarun Wu † and Qingliang Chen *,†
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 20 January 2022 / Revised: 5 February 2022 / Accepted: 9 February 2022 / Published: 14 February 2022

Round 1

Reviewer 1 Report

The authors' paper on adapter pruning is a well-written piece of research. The literature review is comprehensive and takes the latest work into account. In addition, the research conducted fits the scope of the journal. I suggest accepting the paper in its current form.

There are a couple of typos, highlighted in the attached PDF.

Comments for author File: Comments.pdf

Author Response

Thank you for finding the typos for us. We have corrected them in the paper.

Reviewer 2 Report

The paper proposes novel ways of pruning redundant parameters in adapters, relying on the Lottery Ticket Hypothesis.
Pruning is done on three levels: weights, neurons, and adapter layers. Adapters are pruned iteratively, and after each iteration the surviving weights are reset to their initial values. Evaluation is performed on the GLUE datasets, and subnetworks are found successfully. The results show a significant decrease in size with no performance drop; the original adapters are even outperformed on some datasets.
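
For context, the iterative procedure the summary describes follows the standard Lottery Ticket recipe: train, prune the lowest-magnitude weights, rewind the survivors to their initial values, and repeat. Below is a minimal PyTorch sketch of that loop; the function name, the `train_fn` callback, and the `prune_rate`/`rounds` defaults are illustrative assumptions, not the authors' exact implementation.

```python
import copy

import torch


def iterative_lottery_pruning(model, train_fn, prune_rate=0.2, rounds=5):
    # Save the initial weights so we can "rewind" to them after pruning.
    init_state = copy.deepcopy(model.state_dict())
    # Start with all-ones masks; prune only weight matrices, not biases.
    masks = {
        name: torch.ones_like(p)
        for name, p in model.named_parameters()
        if p.dim() > 1
    }
    for _ in range(rounds):
        # Train the current subnetwork; train_fn (an assumed callback) is
        # expected to keep masked weights at zero, e.g. by multiplying
        # gradients by the mask at every step.
        train_fn(model, masks)
        for name, p in model.named_parameters():
            if name not in masks:
                continue
            # Magnitudes of the weights that are still alive.
            alive = p.data.abs()[masks[name].bool()]
            if alive.numel() == 0:
                continue
            # Prune the smallest prune_rate fraction of surviving weights.
            k = max(1, int(prune_rate * alive.numel()))
            threshold = alive.kthvalue(k).values
            masks[name][p.data.abs() <= threshold] = 0.0
        # Rewind: restore the initial weights, then re-apply the mask.
        model.load_state_dict(init_state)
        for name, p in model.named_parameters():
            if name in masks:
                p.data.mul_(masks[name])
    return masks
```

Note that this sketch covers only weight-level magnitude pruning; the paper additionally prunes at the neuron and adapter-layer granularity, and would apply the masks to the adapter modules rather than the full model.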

Suggestions for improvements:

- the paper should be reorganized, as some sections are missing
- rework the Introduction section (give a general introduction, discuss open questions, state which issues need to be solved, what your motivation is, etc.)
- add the missing Related Work section (provide an overview of existing research in comparison with your approach)
- state your motivation for choosing Adam (over, e.g., SGD)
- how were the hyperparameters chosen: arbitrarily, or motivated by some underlying theory?
- how long did it take to train the models with the proposed architecture?
- explain how the values in the pruning strategies were chosen
- add the missing Conclusions section (rework the Discussion section so that the final conclusions are separated from it)
- state the possible limitations of your work (e.g., the generalization power of your findings to iterative pruning on deeper networks, or the effects of model sparseness on different types of datasets)

Author Response

Please see the attachment.

Author Response File: Author Response.docx
