Article
Peer-Review Record

Towards Super Compressed Neural Networks for Object Identification: Quantized Low-Rank Tensor Decomposition with Self-Attention

Electronics 2024, 13(7), 1330; https://doi.org/10.3390/electronics13071330
by Baichen Liu 1,2,3, Dongwei Wang 1,2,3, Qi Lv 4, Zhi Han 1,2,* and Yandong Tang 1,2
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 29 February 2024 / Revised: 26 March 2024 / Accepted: 29 March 2024 / Published: 2 April 2024
(This article belongs to the Special Issue Deep Learning-Based Image Restoration and Object Identification)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

The article "Towards Super Compressing Neural Networks for Object Identification: Quantized Low-rank Tensor Decomposition with Self Attention" focuses on addressing the challenge of managing the increasing complexity and storage space requirements of deep convolutional neural networks (DCNNs) while striving for improved performance. The authors propose a method called Quantized Low-rank Tensor Decomposition (QLTD) to achieve significant compression of DCNNs while maintaining high identification accuracy. The QLTD method involves low-rank Tucker decomposition to compress pre-trained weights, vector quantization to further exploit redundancies, and the introduction of a self-attention module to enhance training responsiveness in critical regions. The results of experiments on the CIFAR-10 dataset demonstrate that QLTD achieves a compression ratio of 35.43× with less than 1% loss in accuracy and a compression ratio of 90.61× with less than 2% loss in accuracy. The article aims to provide a more compact representation for neural networks to achieve a significant compression ratio in terms of parameter count and maintain a good balance between compressing parameters and maintaining identification accuracy.

 

I will propose several points that should be included to enhance the paper's comprehensiveness.

1.     The document does not explicitly outline any disadvantages of the proposed method. While the Quantized Low-rank Tensor Decomposition (QLTD) method achieves significant compression of deep convolutional neural networks (DCNNs) while maintaining high identification accuracy, there may be limitations or trade-offs that are not addressed in the document.

2.     The conclusion or discussion section in Chapter 4 of this paper lacks a comprehensive analysis of potential drawbacks or limitations of the QLTD method.

3.     Please provide the training time and inference time for the four networks (Maestro, TDNR, PQF, and QLTD in Table 1).

 

4.     If researchers wish to replicate the experiments and verify their accuracy, we encourage the authors to share the source code on GitHub, as this would significantly enhance the paper's value and potentially increase its citations. However, upon checking the provided link (https://github.com/liubc17/QLTD), I discovered that the repository is empty.

Comments on the Quality of English Language

This paper appears to be well-researched, well-documented, and makes a significant contribution to the field of neural network compression.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

This work explores the compression of neural networks using a variety of techniques, including Tucker decomposition, quantization, permutation, and self-attention modules. Since the focus isn't specifically on object detection, we defer the final decision on its relevance for the Special Issue to the editors. The examples provided towards the end of the work are centered on image classification tasks using the CIFAR-10 and ImageNet datasets. We think that the authors should consider submitting their work to a more specialized venue.

Regarding the content of the work, there are several important editing issues that need to be addressed:

1. Figure 1: The complexity of Figure 1 makes it difficult to understand. Consider redrawing it and adding a detailed and informative caption.

2. Introduction to Tucker decomposition: An introduction to Tucker decomposition along with a reference should be added for clarity and to ensure the paper is as self-contained as possible. 

3. Related Works Section: Merely listing previous works is insufficient; explain how they relate to your work. Clarify the novelty of your approach compared to other low-rank decomposition methods, particularly those utilizing Tucker decomposition.

4. Algorithm 1: The pseudocode you use in the algorithm is not sufficiently formal. Be more explicit, especially regarding the following points:

- Clarify the concept of a "codebook." If it refers to centroids, explain why it's termed a codebook (see the illustrative sketch after this item).

- Elaborate on how initial weights are obtained in lines 6, 7, and 8.
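
For reference, a minimal sketch of the standard vector-quantization sense of "codebook" follows: the set of k-means centroids (code words) used to index weight sub-vectors. The block size and codebook size are illustrative assumptions, not the authors' configuration.

```python
# Illustrative sketch: a "codebook" as the set of k-means centroids used to
# quantize weight sub-vectors. Sizes below are assumptions for illustration.
import numpy as np
from sklearn.cluster import KMeans

subvectors = np.random.randn(4096, 8)       # a weight matrix split into 8-dim sub-vectors
kmeans = KMeans(n_clusters=256, n_init=10).fit(subvectors)

codebook = kmeans.cluster_centers_          # 256 code words (the centroids)
codes = kmeans.labels_                      # each sub-vector stored as a single index

reconstructed = codebook[codes]             # decode: look each index up in the codebook
print(codebook.shape, codes.shape, reconstructed.shape)
```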

5. As noted in line 190, the kernel size of DCNNs is often too small for compression. If I understand correctly, you compress input channels with a 1x1 convolution, apply the kernel, and then expand it back to the expected number of output channels with another 1x1 convolution. This resembles depth-wise convolutions. Please comment on this.
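
For reference, a sketch of the structure described in point 5 (1×1 reduce, k×k convolution on the reduced channels, 1×1 expand), often called a bottleneck, is given below; the channel counts are illustrative assumptions, not values from the paper.

```python
# Illustrative sketch of the 1x1-reduce / kxk-conv / 1x1-expand structure
# discussed above. Channel counts are assumptions, not the authors' values.
import torch
import torch.nn as nn

class BottleneckConv(nn.Module):
    def __init__(self, in_ch=128, mid_ch=16, out_ch=256, k=3):
        super().__init__()
        self.reduce = nn.Conv2d(in_ch, mid_ch, kernel_size=1, bias=False)   # compress channels
        self.core = nn.Conv2d(mid_ch, mid_ch, kernel_size=k, padding=k // 2, bias=False)
        self.expand = nn.Conv2d(mid_ch, out_ch, kernel_size=1, bias=False)  # restore channels

    def forward(self, x):
        return self.expand(self.core(self.reduce(x)))

x = torch.randn(1, 128, 32, 32)
print(BottleneckConv()(x).shape)   # torch.Size([1, 256, 32, 32])
```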

6. Section 3.6: This section is challenging to follow. In the typical formulation of a self-attention layer (dot-product), there are no weights. However, your layer appears to incorporate weights. Please provide a precise formulation of the attention layer used and explain how weights are initialized.
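
For reference, the standard parameter-free scaled dot-product attention that point 6 alludes to is sketched below; in common practice, learnable weights enter only through the query/key/value projections, which may be where the authors' module differs.

```python
# Standard scaled dot-product attention, written without any learnable weights
# (the formulation referred to above). Shapes are illustrative.
import torch
import torch.nn.functional as F

def dot_product_attention(q, k, v):
    scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)   # pairwise similarities
    return F.softmax(scores, dim=-1) @ v                     # attention-weighted sum of values

x = torch.randn(2, 10, 64)             # (batch, tokens, dim)
out = dot_product_attention(x, x, x)   # q = k = v = x: no parameters involved
print(out.shape)                       # torch.Size([2, 10, 64])
```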

7. Fine-Tuning: It's unclear whether part of the network is frozen during fine-tuning or if the entire network is re-trained.

8. References: Avoid lists in the introduction; instead, reference a couple of recent surveys.  Additionally, address any typographical errors in the bibliography and throughout the article.

9. Please make the code open access.

 

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

This is an interesting paper on a relevant topic. However, it needs some work when it comes to clarity and correctness. I have added some detailed comments below.

In the abstract and introduction, you mention that this method is relevant to situations when storage space and computational resources are limited. It would be good to mention examples of such situations. When and where are these high compression rates necessary?

Row 57: You write "extremely large compression rates". What is extremely large in this context? This wording is used in several places in the manuscript.

Row 91: There is a strange word at the end of this row.

Row 118: You write: “most of the methods…”. Does this mean that some of the methods can achieve large compression ratios? Please clarify.

Row 120: You write: “often result…” But not always? Please clarify.

Row 138: You write: “achieve very large compression ratio”. What is very large? Please clarify.

Row 151: You describe a method that achieves “superior network compression and higher final accuracy”. Superior to what? Higher final accuracy than what? Please clarify.

Section 3: Consider moving subsection 3.2 to the end, since several definitions in the overall framework are explained in the following subsections. Now it is hard to follow section 3.2.

Section 3: Please make sure that everything in the equations is defined properly (e.g., H×W×S in row 182, etc.).

Row 304: You write: "We learn an encoding β". It is unclear what that means here.

Please review the language used in the text, e.g., row 206 "a bunch of…"; please do not use this kind of expression in the paper.

Row 211: c(t) – what is t here?

Section 3.6: This subsection is hard to follow. Since this is the key contribution to this paper it needs to be explained more comprehensively and clearly.

Row 252: Please define “The error” in a clearer way.

Figure 2: Unnecessary figure since this is a well-known image data set, a reference would be enough.

Row 273: please clarify the two different percentage values.

Figure 3: Unnecessary figure since this is a well-known image data set, a reference would be enough.

Table 1: It is hard to compare since different compression ratios have been used.

Row 281-282: the expression “extremely high compression ratio” is not a suitable description.

Row 287: "we conduct more detailed compared experiment with PQF". I cannot understand this sentence. Please clarify.

Row 290: You state that you reach a 10× higher compression ratio than PQF. However, you reach 90.31 and PQF 79.34. Please explain how it can be 10 times higher.

Row 296: Please clarify this sentence, I cannot understand the meaning of it.

Row 297: “a much better trade-off between…” This should be expressed in a more scientifically correct way.

Row 320: “under close compression ratio of parameter count”. What is meant by this? Please clarify.

Comments on the Quality of English Language

Please review the language used in the paper and avoid expressions like “extremely large", “a bunch of…”, etc. 

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

The authors have answered my questions very carefully, and I suggest that the current version should be accepted for publication.

Reviewer 2 Report

Comments and Suggestions for Authors

I am happy with the revision done by the authors and believe the article can be accepted in its current version.
