Article
Peer-Review Record

A Novel Channel Pruning Compression Algorithm Combined with an Attention Mechanism

Electronics 2023, 12(7), 1683; https://doi.org/10.3390/electronics12071683
by Ming Zhao 1, Tie Luo 1,*, Sheng-Lung Peng 2 and Junbo Tan 1
Reviewer 1:
Reviewer 2: Anonymous
Submission received: 21 February 2023 / Revised: 28 March 2023 / Accepted: 30 March 2023 / Published: 3 April 2023
(This article belongs to the Section Computer Science & Engineering)

Round 1

Reviewer 1 Report

In this paper, the authors integrate the existing pruning framework CHIP with the BAM attention mechanism to enhance pruning performance. Although the authors claim the work to be novel, a significant portion of the paper under the algorithm section is directly based on the BAM: Bottleneck Attention Module and CHIP: CHannel Independence-based Pruning for Compact Neural Networks papers. Moreover, the reviewer found that a significant portion of text in the algorithm section is copied from the original papers, including Figure 3. Furthermore, the authors do not cite the CHIP paper in the manuscript. As the reviewer did not find significant novelty in this paper, the reviewer does not recommend this paper for publication.

Author Response

Dear Reviewer:

Thank you for your letter and for the reviewers’ comments concerning our manuscript entitled “A Novel Channel Pruning Compression Algorithm Combining Attention Scheme” (ID: electronics-2265844). Those comments are all valuable and very helpful for revising and improving our paper, and they provide important guidance for our research. We have studied the comments carefully and have made corrections that we hope will meet with approval. The main corrections in the paper and our responses to the reviewer’s comments are as follows:

Responses to the reviewer’s comments:

 

Point #1 In this paper, the authors integrate the existing pruning framework CHIP with the BAM attention mechanism to enhance pruning performance. Although the authors claim the work to be novel, a significant portion of the paper under the algorithm section is directly based on the BAM: Bottleneck Attention Module and CHIP: Channel Independence-based Pruning for Compact Neural Networks papers. Moreover, the reviewer found that a significant portion of text in the algorithm section is copied from the original papers, including Figure 3. Furthermore, the authors do not cite the CHIP paper in the manuscript. As the reviewer did not find significant novelty in this paper, the reviewer does not recommend this paper for publication.

 

Answer: Thank you for the valuable comments; we reviewed our article carefully. Due to our negligence, the CHIP paper was omitted when we compiled the references; we have corrected this and rechecked all references. The algorithm in this paper is based on the CHIP pruning framework. On the one hand, to improve the performance of the network model and increase accuracy, we integrate the BAM attention mechanism; on the other hand, to improve compression performance, we change the pruning strategy by introducing iteration into the pruning process and pruning multiple times. The model’s performance is improved from these two perspectives, and the results show that it remains quite good.

Author Response File: Author Response.pdf

Reviewer 2 Report

The authors present a new approach for pruning of neural networks. Their idea is interesting and worthy of investigation, but the way their new approach is presented needs to be improved.

Section 3 does not provide all the necessary details for understanding the approach:

-you did not explain all the variables that appear in the formula. What is the meaning of C, H, W (line 157)?

-what is the meaning of r (line 174)? Some explanation exists, but it appears only on line 185. Is it the same r?

-in Equation 3 you use AvgPool and Avgpool. Is it the same function? 

-In line 227, you use h and w. Are they the same as H and W?

-the algorithm that you have provided is not clear. In line 3, how do you get M’, from where? Line 4->11 are not properly indented and are difficult to follow and understand. Line 13, what do you understand by “Fine-tuning M’ ”? Line 14, how did you choose 0.5% for the difference in accuracy?

 

You should explain properly all the variables used in your definition, and include examples. It will make it much easier for the reader to understand your approach. The explanations given in this version are difficult to follow.

 

In Section 4, you should also explain what FlOPS(M) and GFLOPS mean. Also, based on what did you conclude that your method is “consistently superior to other advanced pruning methods”? Using which results? By looking at the values in the table, we can see there are experiments in which it does not perform the best.

You should also detail  your conclusion “It is worth  mentioning that CCPCSA may have higher accuracy than the original model when the  pruning ratio is smaller.” Based on what results presented in the paper, did you draw this conclusion?

 

 

 

 

Some minor English errors/typos that should be corrected:

-line 14, “this model obtained” should be “this obtained model”

-line 17, “the results showed that the algorithm showed strong adaptability” should be rephrased

-line 18, “ResNet50 pruned, on the CIFAR-100 dataset, the amount of model parameters 18 was reduced by 80.3%” something is missing (maybe for/on)

-line 55, “ Based on this, this paper proposes” should be rephrased.

-lines 58-62, should be complete sentences

-line 81, what is DNNs?

-line 104, “Pioneered by [30, 31] The first structured pruning ...“ should be “Pioneered by [30, 31] the first structured pruning method …”

-line 126, “the work [37] in proposes” should be “the work in [37]  proposes”

-lines 128-129, “the limitations of SENet itself  make the scalar values generated by this model cannot fully reflect the channel importance…”, something is missing (the sentence is not clear)

-line 135, “both spatial and channel attention modules are utilized, which is based on an efficient structural design ...“ should be “both spatial and channel attention modules are utilized, which are based on an  efficient structural design”

-line 160, “calculated as Eq.(1) is shown  follows:” should be “is calculated as shown in Equation (1)” 

-line 167,  “function. Attention is mapped to” should be “function, and  attention is mapped to”

-line 173, “is used. to  save” should be “is used. To save” or “is used, to save”

-line 175, “as Eq.(3) shown in” should be “as shown in Eq.(3)”

-line 180, “Use inflated convolution “ should be “We use inflated convolution “

-line 189, “is calculated as (4) :” should be “is calculated as shown in Equation (4):”

-line 243, “calculated as:” should be “calculated as shown in Equation (5):”

-line 254, “selected nuclear norm ion “, something is missing?

-line 272, “approximated as” should be “approximated as shown in Equation (6)”

-line 291, Figure 4 caption is incomplete

-line 295, “will be based on the pytorch…” should be “used  the Pytorch..”

Author Response

Dear Reviewer:

Thank you for your letter and for the reviewers’ comments concerning our manuscript entitled “A Novel Channel Pruning Compression Algorithm Combining Attention Scheme” (ID: electronics-2265844). Those comments are all valuable and very helpful for revising and improving our paper, and they provide important guidance for our research. We have studied the comments carefully and have made corrections that we hope will meet with approval. The main corrections in the paper and our responses to the reviewer’s comments are as follows:

Responses to the reviewer’s comments:

 

Point #1 Section 3 does not provide all the necessary details for understanding the approach:

 

-you did not explain all the variables that appear in the formula. What is the meaning of C,H, W (line 157)?

Answer: Thank you for the valuable comments. C, H, and W denote the number of channels, the height of the feature map, and the width of the feature map, respectively.

They describe how many channels the current feature map has and its spatial size.

We have added this explanation to the article.

 

 

-what is the meaning of r (line 174). Some explanation exists, but it appears only on line 185. Is it the same r?

Answer: Thank you for the valuable comments. r is the compression (reduction) ratio, and the r on line 185 is the same r. For ease of description, we use the compression ratio instead of stating the compressed dimension directly.
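
To make the roles of C, H, W and r concrete, the following minimal PyTorch sketch of a BAM-style channel-attention branch may help; it uses our own illustrative naming and is a simplified outline, not the exact module from the paper.

```python
# Minimal sketch of a BAM-style channel-attention branch, illustrating
# the roles of C, H, W and the reduction (compression) ratio r.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels: int, r: int = 16):
        super().__init__()
        # Global average pooling collapses the H x W spatial dimensions to 1 x 1.
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        # Bottleneck MLP: C -> C/r -> C, where r controls how strongly the
        # channel dimension is compressed inside the attention branch.
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // r),
            nn.ReLU(inplace=True),
            nn.Linear(channels // r, channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x has shape (N, C, H, W): batch, channels, feature-map height and width.
        n, c, _, _ = x.shape
        y = self.avg_pool(x).view(n, c)    # (N, C)
        y = self.mlp(y).view(n, c, 1, 1)   # per-channel attention logits
        return y                           # broadcast over H x W when applied

# Usage: a 64-channel, 32x32 feature map with r = 16
attn = ChannelAttention(channels=64, r=16)
logits = attn(torch.randn(2, 64, 32, 32))  # -> shape (2, 64, 1, 1)
```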

 

 

 

-in Equation 3 you use AvgPool and Avgpool. Is it the same function?

Answer: Thank you for your correction. Due to our oversight, the spelling was inconsistent, and we have corrected it; AvgPool denotes average pooling.

 

 

 

-In line 227, you use h and w. Are they the same as H and W?

Answer: Thank you for the valuable comments. Since that section describes a different method, we used different notation; h and w have the same meaning as H and W, namely the height and width of the feature map.

 

 

 

-the algorithm that you have provided is not clear. In line 3, how do you get M’, from where? Line 4->11 are not properly indented and are difficult to follow and understand. Line 13, what do you understand by “Fine-tuning M’ ”? Line 14, how did you choose 0.5% for the difference in accuracy?

Answer: Thank you for your valuable advice.

  • The first few lines of the algorithm describe the input and output variables that we have defined. M′ is the output model; we define the symbols in advance to simplify the description of the algorithm flow.
  • Lines 4–11 were incorrectly indented due to a formatting error, which we have corrected.
  • Since the model may lose accuracy after pruning, the fine-tuning operation allows the model to continue training for a few epochs in order to restore accuracy.
  • To compress the model as much as possible without excessive accuracy loss, we set an accuracy-loss threshold. The 0.5% difference in line 14 is our choice; it can of course be set to other values, but to balance accuracy and pruning rate it should not be set too large. A sketch of this prune-and-fine-tune loop is given below.
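
For clarity, the following is a minimal sketch of the cyclic prune-and-fine-tune loop with the 0.5% accuracy-drop budget; it is an illustrative outline rather than our exact implementation, and prune_step, fine_tune, and evaluate are placeholder names for the corresponding routines.

```python
# Illustrative sketch of the iterative pruning loop with a 0.5% accuracy-drop budget.
# prune_step, fine_tune, and evaluate are placeholders, not functions from the paper.
import copy

def cyclic_prune(model, prune_step, fine_tune, evaluate, max_drop=0.005):
    baseline_acc = evaluate(model)        # accuracy of the unpruned model M
    pruned = copy.deepcopy(model)         # M' starts as a copy of M
    while True:
        candidate = prune_step(copy.deepcopy(pruned))  # remove one batch of channels
        fine_tune(candidate)                           # a few epochs to recover accuracy
        if baseline_acc - evaluate(candidate) > max_drop:
            return pruned                 # stop: accuracy drop would exceed 0.5%
        pruned = candidate                # accept the candidate and keep pruning
```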

 

 

-In Section 4, you should also explain what FlOPS(M) and GFLOPS mean. Also, based on what did you conclude that your method is “consistently superior to other advanced pruning methods”? Using which results? By looking at the values in the table, we can see there are experiments in which it does not perform the best.
You should also detail  your conclusion “It is worth  mentioning that CCPCSA may have higher accuracy than the original model when the  pruning ratio is smaller.” Based on what results presented in the paper, did you draw this conclusion?

Answer: Thanks for the valuable comments.

(1) We corrected the careless writing. FLOPs is short for floating-point operations; FLOPs(M) is measured in millions of FLOPs, and GFLOPs in billions (giga-FLOPs). A short illustrative calculation is given after this list.

(2) We revised the overly broad statement. Our method may fall short on a single metric, but, judging from the values in the table, it is superior to the other methods overall.

(3) We added experimental data to illustrate that, at a low pruning rate, the pruned model can outperform the original model. Since few experimental data were recorded for such cases, they did not appear in the previous version of the experimental tables.
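
As an illustration of these units (our own hypothetical example, not a result from the paper), the FLOPs of a single 3×3 convolution with 64 input and 64 output channels on a 32×32 feature map can be counted as follows.

```python
# Hypothetical example showing the FLOPs(M) vs. GFLOPs units; numbers are illustrative only.
k, c_in, c_out, h, w = 3, 64, 64, 32, 32
macs = k * k * c_in * c_out * h * w   # multiply-accumulate operations of the convolution
flops = 2 * macs                      # counting each MAC as 2 FLOPs (one common convention)
print(f"{flops / 1e6:.1f} M FLOPs")   # ~75.5 M FLOPs -> reported under FLOPs(M)
print(f"{flops / 1e9:.3f} GFLOPs")    # ~0.075 GFLOPs -> reported under GFLOPs
```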

 

 

 

Point #2 Some minor English errors/typos that should be corrected:

Answer: Thank you for the advice. We have checked and corrected each item in the text using tracked changes, and some statements have been rewritten. Thank you again for your patience in providing every suggestion.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

The reviewer observed that the authors had not tried to address the concerns raised in the previous review response other than adding the missing citations. Still, there is a significant overlap of the text content from the CHIP and BAM papers, and the authors have not put effort into addressing these concerns. Furthermore, as the reviewer feels that the proposed method is just a combination of two existing methods, BAM and CHIP, there is no significant novelty, as claimed in the title. For this reason, the reviewer doesn't recommend the article for publication.

Author Response

Dear Reviewer:

Thank you for your letter and for the reviewers’ comments concerning our manuscript entitled “A Novel Channel Pruning Compression Algorithm Combining Attention Scheme” (ID: electronics-2265844). Those comments are all valuable and very helpful for revising and improving our paper, and they provide important guidance for our research. We have studied the comments carefully and have made corrections that we hope will meet with approval. The main corrections in the paper and our responses to the reviewer’s comments are as follows:

Responses to the reviewer’s comments:

 

Point #1 The reviewer observed that the authors had not tried to address the concerns raised in the previous review response other than adding the missing citations. Still, there is a significant overlap of the text content from the CHIP and BAM papers, and the authors have not put effort into addressing these concerns. Furthermore, as the reviewer feels that the proposed method is just a combination of two existing methods, BAM and CHIP, there is no significant novelty, as claimed in the title. For this reason, the reviewer doesn't recommend the article for publication.

 

 

Answer: Thank you for the valuable comments. Besides adding the new citations, we made extensive revisions to the content in our own wording in order to improve its clarity. During our research, we found no other studies combining the BAM and CHIP methods, and we use iterative pruning to compact the model as much as possible. This modification can be seen in Section 1: “The aforementioned methods have made progress to a certain extent for neural network model streamlining and other related dimensions, but the degree of model compression and the degree of accelerated computation are not sufficient and not necessarily suitable for deployment to mobile terminal devices. Based on this, a cyclic pruning compression algorithm combining attention is proposed.” In Section 3, the algorithm in Figure 4 shows this cyclic pruning procedure; we have also redrawn the schematic diagram of clustering, which differs from the original framework diagram. Our experimental results support the feasibility of this approach, leading us to believe that our work has some degree of novelty.

Author Response File: Author Response.pdf

Reviewer 2 Report

The authors have addressed most of my concerns.

However, there are a few important details that must be included in the paper (it is not enough to explain them to me, a reviewer, you have to explain them to the reader):

-what is the meaning of r (line 174). Some explanation exists, but it appears only on line 185. Is it the same r?

Answer: Thank you for the valuable comments. r is the compression (reduction) ratio, and the r on line 185 is the same r. For ease of description, we use the compression ratio instead of stating the compressed dimension directly.

R: You should mention that in the article

-In line 227, you use h and w. Are they the same as H and W?

Answer: Thank you for the valuable comments. Since that section describes a different method, we used different notation; h and w have the same meaning as H and W, namely the height and width of the feature map.

R: You should mention that in the article

 

 

-the algorithm that you have provided is not clear. In line 3, how do you get M’, from where? Line 4->11 are not properly indented and are difficult to follow and understand. Line 13, what do you understand by “Fine-tuning M’ ”? Line 14, how did you choose 0.5% for the difference in accuracy?

 

  • The first few lines of the algorithm describe the input and output variables that we have defined. M′ is the output model; we define the symbols in advance to simplify the description of the algorithm flow.
  • R: You should explain what “Get model M’ “ in line 3 means (is it a copy of M, is it an empty model). Seeing as it is an output parameter, from where do you get it?
  • Since the model may lose accuracy after pruning, the fine-tuning operation allows the model to continue training for a few epochs in order to restore accuracy.
  • To compress the model as much as possible without excessive accuracy loss, we set an accuracy-loss threshold. The 0.5% difference in line 14 is our choice; it can of course be set to other values, but to balance accuracy and pruning rate it should not be set too large.

R: You should mention that in the article

 

-In Section 4, you should also explain what FlOPS(M) and GFLOPS mean. Also, based on what did you conclude that your method is “consistently superior to other advanced pruning methods”? Using which results? By looking at the values in the table, we can see there are experiments in which it does not perform the best.
You should also detail your conclusion “It is worth mentioning that CCPCSA may have higher accuracy than the original model when the pruning ratio is smaller.” Based on what results presented in the paper, did you draw this conclusion?

Answer: Thanks for the valuable comments.

(1) We corrected the careless writing. FLOPs is short for floating-point operations; FLOPs(M) is measured in millions of FLOPs, and GFLOPs in billions (giga-FLOPs).

R: You should mention that in the article

(3) We added experimental data to illustrate that, at a low pruning rate, the pruned model can outperform the original model. Since few experimental data were recorded for such cases, they did not appear in the previous version of the experimental tables.

R: You should mention that in the article

Some new typos that must be corrected:

-In lines 58-59, " Then a detailed description of the steps and details of the algorithms in this paper is presented in Section 3." should be " Then a detailed description of the steps and details of the algorithms are presented in Section 3."

- line 128, the sentences "However, the limitations of SENet itself make the scalar values generated by this model cannot fully reflect the channel importance and are not enough to improve the pruning performance. " needs to be rephrased or completed. It is not understandable.

-line 168, "Eq.(2): The" should be just "Eq.(2):"

 

 

Author Response

Dear Reviewer:

Thank you for your letter and for the reviewers’ comments concerning our manuscript entitled “A Novel Channel Pruning Compression Algorithm Combining Attention Scheme” (ID: electronics-2265844). Those comments are all valuable and very helpful for revising and improving our paper, and they provide important guidance for our research. We have studied the comments carefully and have made corrections that we hope will meet with approval. The main corrections in the paper and our responses to the reviewer’s comments are as follows:

Responses to the reviewer’s comments:

 

Point #1 -what is the meaning of r (line 174). Some explanation exists, but it appears only on line 185. Is it the same r?

Answer: Thank you for the valuable comments. r is the compression (reduction) ratio, and the r on line 185 is the same r. For ease of description, we use the compression ratio instead of stating the compressed dimension directly.

R: You should mention that in the article

Answer: Thank you for the valuable comments. We have done this; it can be seen in lines 186–187.

 

 

-In line 227, you use h and w. Are they the same as H and W?

Answer: Thank you for the valuable comments. Since that section describes a different method, we used different notation; h and w have the same meaning as H and W, namely the height and width of the feature map.

R: You should mention that in the article

Answer: Thank you for the valuable comments. We have done this.

 

 

 

You should explain what “Get model M’ “ in line 3 means (is it a copy of M, is it an empty model). Seeing as it is an output parameter, from where do you get it? 

Answer: Thank you for your correction. We have rearranged the algorithm flow to make it easier to understand.

 

 

 

Some new typos that must be corrected:

-In lines 58-59, " Then a detailed description of the steps and details of the algorithms in this paper is presented in Section 3." should be " Then a detailed description of the steps and details of the algorithms are presented in Section 3."

- line 128, the sentences "However, the limitations of SENet itself make the scalar values generated by this model cannot fully reflect the channel importance and are not enough to improve the pruning performance. " needs to be rephrased or completed. It is not understandable.

-line 168, "Eq.(2): The" should be just "Eq.(2):"

Answer: Thank you for the valuable comments. We have made the corrections.

Author Response File: Author Response.pdf

Round 3

Reviewer 1 Report

As the authors have addressed the concerns raised by the reviewer to some extent in the revised version, the reviewer would like to recommend the article for publication. Some minor grammatical issues still need to be fixed in the final submission.

Author Response

Dear Editors and Reviewers:

Thank you for your letter and for the reviewers’ comments concerning our manuscript entitled “A Novel Channel Pruning Compression Algorithm Combining Attention Scheme” (ID: electronics-2265844). We have checked the manuscript carefully and revised some expressions and wording; revised portions are marked in yellow in the paper. The main corrections in the paper and our responses to the reviewers’ comments are as follows:

 

Point #1: This paper is not well written. Its English presentation is not good enough. The authors should consult an English technical writing expert for help.

 

Answer: We are very sorry for our poor writing. The manuscript has been checked carefully from beginning to end. Thank you for the valuable comments.

 

In the introduction, lines 43–44, we rewrote the sentence “Pruning as a method to accelerate pre-trained larger models is a common way to compress networks.” as

“Pruning as a method to accelerated pre-training of larger models is a common way to compress networks.”

In Section 2.2, lines 108–110, we replaced the sentence “Such methods can obtain theoretically high compression ratios, but such theoretical compression ratios are difficult to achieve in practical compression.” with “Such methods can obtain high compression ratios in theory, but are difficult to achieve it in practical compression.”

In lines 117–120, we replaced the sentence “The primary focus of this paper is on channel pruning, a technique aimed at model compression achieved by eliminating a specific number of channels and their associated filters.” with “The main focus of this paper is channel-based pruning technique, which aims to achieve model compression by eliminating a specific number of channels and their associated filters.”

In Section 2.2, lines 123–125, we modified “Both approaches only take into account the local statistics of two adjacent layers, meaning that they prune one layer to minimize reconstruction error in the next layer” to “Both approaches only consider the local statistics of two adjacent layers, meaning that they prune one layer to minimize reconstruction error in the next layer”.

Lastly, the missing spaces in line 289 and line 361 have been added.

Some other minor wording errors have been corrected in the manuscript as well.

Author Response File: Author Response.pdf
