Article
Peer-Review Record

Block-Wisely Supervised Network Pruning with Knowledge Distillation and Markov Chain Monte Carlo

Appl. Sci. 2022, 12(21), 10952; https://doi.org/10.3390/app122110952
by Huidong Liu 1, Fang Du 1,2, Lijuan Song 1,2 and Zhenhua Yu 1,2,*
Reviewer 4: Anonymous
Submission received: 11 August 2022 / Revised: 14 October 2022 / Accepted: 26 October 2022 / Published: 28 October 2022

Round 1

Reviewer 1 Report

Congratulations to the authors on this great work; I wish you success and good luck.

Author Response

We are thankful to the reviewer for the positive comments.

Reviewer 2 Report

The authors propose a novel approach to network pruning based on imitation learning from a baseline teacher NN to a student network with reduced FLOPs. The main contribution is the MCMC proposal that models the pruning transitions within a block.

The methodological component is clearly described so that it can be replicated. The attained results on relevant datasets and known baseline DNNs evidence the best trade-off between FLOP reduction and accuracy loss.

The figures and tables are of high quality, and the writing is adequate for a scientific manuscript.

Author Response

We are thankful to the reviewer for the positive comments.

Reviewer 3 Report

1) The authors should study the problem of over- and under-fitting.

2) How can you detect these problems?

3) What is the proposed idea for solving these problems? I think some simulations are needed at this step.

Author Response

We are thankful to the referee for proposing these questions. Our baseline and pruned models are stable, converged models trained on publicly available datasets, and they show no over- or under-fitting. In addition, MCMC converges to a stable solution of our target distribution after a finite number of state transitions, so no over- or under-fitting occurs there either.
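A common practical check for the over- and under-fitting concerns raised above is to compare training and validation metrics; the following minimal sketch illustrates the idea (the function name, thresholds, and values are illustrative assumptions, not from the paper):

```python
def fitting_diagnosis(train_acc, val_acc, gap_threshold=0.05, low_threshold=0.7):
    """Heuristic check: a large train/validation gap suggests overfitting,
    while low accuracy on both sets suggests underfitting."""
    if train_acc - val_acc > gap_threshold:
        return "possible overfitting"
    if train_acc < low_threshold and val_acc < low_threshold:
        return "possible underfitting"
    return "no obvious fitting problem"

print(fitting_diagnosis(0.98, 0.80))  # large gap -> possible overfitting
print(fitting_diagnosis(0.92, 0.90))  # small gap, high accuracy -> no obvious fitting problem
```

In practice one would watch these curves over the course of training rather than at a single point.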

Reviewer 4 Report

- The paper is well written and organized.

- The authors might want to discuss how LASSO can be incorporated or compared with the proposed method.

- More information on the MCMC sampling scheme is needed, e.g., the number of iterations, burn-in period, thinning interval, and autocorrelation of the chain.

- Page 3, line 133: there might be an error; this should be the subsection on "MCMC-based sub-structure search".

Author Response

We are thankful to the referee for proposing these questions. For the first question, in line 257 of the paper our approach is compared with the channel pruning (CP) method based on LASSO. As shown in Table 5, our method prunes more FLOPs with less accuracy loss than the LASSO-based channel pruning method. For MCMC sampling, we set the number of restarts, the burn-in period, the thinning interval, and the length of each MCMC chain to 10, 3000, 1, and 6000, respectively. As suggested by the referee, we have corrected the error in line 134 of the paper.
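The sampling settings quoted above (10 restarts, burn-in of 3000, thinning of 1, chain length of 6000) can be sketched as a generic Metropolis-Hastings loop; the target and proposal below are stand-in placeholders for illustration, not the paper's actual pruning distribution:

```python
import math
import random

def metropolis_hastings(log_target, propose, init, length=6000, burn_in=3000, thin=1):
    """Generic Metropolis-Hastings sampler with burn-in and thinning,
    assuming a symmetric proposal (acceptance ratio uses the target only)."""
    samples = []
    state = init
    log_p = log_target(state)
    for t in range(length):
        cand = propose(state)            # draw from q(. | state)
        log_p_cand = log_target(cand)
        if math.log(random.random()) < log_p_cand - log_p:
            state, log_p = cand, log_p_cand  # accept the candidate
        if t >= burn_in and (t - burn_in) % thin == 0:
            samples.append(state)
    return samples

# Illustrative run: 10 restarts, each targeting a standard normal.
random.seed(0)
chains = [
    metropolis_hastings(
        log_target=lambda x: -0.5 * x * x,
        propose=lambda x: x + random.uniform(-1, 1),
        init=random.uniform(-3, 3),
    )
    for _ in range(10)
]
```

With these settings each restart keeps 3000 post-burn-in samples; restarts and autocorrelation diagnostics help assess whether the chains have mixed.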

Reviewer 5 Report

I congratulate the authors on this new paper.

The authors present a new structural pruning methodology based on knowledge distillation and Markov chain Monte Carlo. The paper is well written and shows significant results. The comparison with other methods seems fair. The idea is simple, but it has shown promising results.

I have the following suggestions:

1. The paper is well written, but some typos can be found; please make a minor revision pass.

2. Some acronyms are not defined, for example AMC in line 31; all acronyms have to be defined before they are used.

3. Line 51, the following text is not clear: "Moreover, the training of architecture-related hyper-parameters often suffers from low stability." Do you mean numerical stability, or perhaps oscillations in the loss curves? From a control-theory perspective this can be hard to understand. Please add a citation to support this claim.

4. In equation (1), you present an accuracy metric; it seems to be a Gaussian over the reconstruction error. I understand that to use it in eq. (3) you need the range [0, 1]. Another metric that could be used is the R2-score. Have you considered this metric?

5. In line 144, you use the L1-norm to measure the importance of the kernel weights and you add a citation, but it would be nice to elaborate on why you are using it.

6. Algorithm 1, line 4: what is the meaning of the dot in Z <- q(.|Z_{t-1})?
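The L1-norm importance criterion raised in suggestion 5 can be illustrated with a small sketch; the array shape and toy values here are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def l1_filter_importance(conv_weights):
    """Score each output filter by the L1 norm of its weights.
    conv_weights: array of shape (out_channels, in_channels, kH, kW).
    Returns filter indices sorted from most to least important."""
    scores = np.abs(conv_weights).reshape(conv_weights.shape[0], -1).sum(axis=1)
    return np.argsort(-scores)

# Toy layer with three 2x2x2 filters of differing magnitudes.
w = np.zeros((3, 2, 2, 2))
w[0], w[1], w[2] = 1.0, 2.0, 0.5
print(l1_filter_importance(w))  # -> [1 0 2]: the largest-norm filter ranks first
```

The intuition behind this widely used criterion is that filters whose weights are uniformly small contribute little to the layer's output and are therefore cheaper to prune.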

Author Response

Please see the attachment.

Author Response File: Author Response.docx
