Article
Peer-Review Record

Two-Branch Attention Learning for Fine-Grained Class Incremental Learning

Electronics 2021, 10(23), 2987; https://doi.org/10.3390/electronics10232987
by Jiaqi Guo 1, Guanqiu Qi 2, Shuiqing Xie 1 and Xiangyuan Li 1,*
Reviewer 1: Anonymous
Reviewer 2:
Reviewer 3: Anonymous
Submission received: 27 October 2021 / Revised: 21 November 2021 / Accepted: 25 November 2021 / Published: 1 December 2021
(This article belongs to the Special Issue Advancements in Cross-Disciplinary AI: Theory and Application)

Round 1

Reviewer 1 Report

The paper presents the problem statement very poorly. The keywords are not satisfactory. The figures carry insufficient information. The equations are not presented proportionately. The authors have considered three fine-grained object datasets and have used a traditional CNN. What is the benefit of it in this application?

The experimental results are most unsatisfactory, and the conclusion is weak. The work needs a major revision. Please complete it as soon as possible.

Author Response

Thank you for your time. We have made major revisions to this manuscript. A revision summary is provided below:

  1. We have rewritten most of the introduction to better describe the motivation and technical contribution of this work.
  2. We have greatly enhanced captions of Figures 1 and 3 to make the descriptions self-contained.
  3. We have revised the descriptions of baseline models (see section 4.2) with the addition of model configurations for each evaluated network.
  4. We have added text to describe the hyperparameter selection and a parameter-size comparison with other models in section 4.3.
  5. We have greatly updated the ablation study section (4.4) to justify the design choices.
  6. We have greatly updated the results section (4.5) to provide a detailed discussion on results.
  7. We have also updated the conclusion section to summarize the highlights of this paper.

Author Response File: Author Response.pdf

Reviewer 2 Report

This paper designs a two-level deep learning network architecture (TBAL-Net) to localize critical regions and learn fine-grained feature representations. The model incorporates a lightweight attention module and is trained with an incremental learning approach.

Overall, the proposed network is interesting and the reported performance is comparable to state-of-the-art networks. However, there are some concerns that shall be addressed.

First, in addition to recognition performance, the model size of each model shall be compared (as the paper mentions that the proposed model is lightweight).

Secondly, in the experiment part, the authors defined incremental step = 10. It would be interesting to validate other choices as well to highlight the effectiveness and the optimal choice of the incremental training protocol.

Lastly, the model configuration of each network shall be illustrated.

 

Also, there are minor issues, as follows.
- The number at the header of each column (50,60,...) should be labeled with its definition in the table to make it self-contained.
- The definition of coarse-grained and fine-grained shall be discussed for better understanding for readers outside the specific field.
- All the descriptions of each figure shall be discussed in detail. A description of Figure 1 shall be provided (what is FC?). Similarly, a detailed description of Figure 2 is needed.
- Figure 3 is not mentioned in the paper and its discussion is not provided.

 

Author Response

  1. First, in addition to recognition performance, the model size of each model shall be compared (as the paper mentions that the proposed model is lightweight).

Response: Thanks for the comment. We have added text to discuss the model size in terms of learnable parameters in comparison with its peers. In fact, the addition of attention modules incurs slightly more parameters than the MMAL model. However, these additional parameters provide consistent performance gains during class incremental learning. The return on investment is high, which is the reason we call it a lightweight attention module. The word "lightweight" describes the attention module, not the entire model (see lines 290-295).

 

  2. Secondly, in the experiment part, the authors defined incremental step = 10. It would be interesting to validate other choices as well to highlight the effectiveness.

Response: Thank you for your valuable comments. Following your suggestion, we have conducted a series of additional experiments to investigate the effect of the number of incremental phases; the results are added as an ablation study in section 4.4 (see lines 329-336, Figures 4, 5, and 6).
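For context, the class-incremental learning protocol splits the label set into a base session followed by a sequence of incremental phases, and the incremental step controls how many new classes each phase introduces. The short Python sketch below illustrates this splitting; the specific numbers (100 classes, a 50-class base session, a step of 10) are illustrative placeholders rather than the exact configuration used in the paper.

```python
# Illustrative class-incremental protocol: split the class set into a base session
# followed by incremental phases; the model is trained phase by phase and evaluated
# on all classes seen so far. Numbers below are placeholders, not the paper's setup.
def incremental_phases(num_classes=100, base_classes=50, step=10):
    """Return the list of class indices introduced at each phase."""
    phases = [list(range(base_classes))]
    for start in range(base_classes, num_classes, step):
        phases.append(list(range(start, min(start + step, num_classes))))
    return phases

print([len(p) for p in incremental_phases()])  # [50, 10, 10, 10, 10, 10]
```

Varying the step (and hence the number of incremental phases) is exactly the factor examined in the added ablation study.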

 

  3. Lastly, the model configuration of each network shall be illustrated.

Response: Thank you for your valuable comments. We have added the network configurations of the baselines used in the experiments in section 4.2 (see lines 247-270). Also, a detailed description of the network configuration of the proposed model is given in section 4.3 (see lines 274-295).

 

 

Also, there are minor issues, as follows.

  1. The number at the header of each column (50,60,...) should be labeled with its definition in the table to make it self-contained.

Response: We fixed this formatting issue of the table.

 

  2. The definition of coarse-grained and fine-grained shall be discussed for better understanding for readers outside the specific field.

Response: Thanks for the comment. We have added text to explain the difference between coarse-grained and fine-grained visual tasks, which also motivates this study. Please see lines 52-55 for the update.

 

  3. All the descriptions of each figure shall be discussed in detail. A description of Figure 1 shall be provided (what is FC?). Similarly, a detailed description of Figure 2 is needed.

Figure 3 is not mentioned in the paper and the discussion is not provided.

Response: We have 1) enhanced the captions of Figures 1 and 3 and 2) added text to describe Figures 1, 2, and 3 (see lines 135-141, 158-159, 193-199, 201-207). We also adjusted the locations of these figures so that they are closer to the corresponding text descriptions.

Author Response File: Author Response.pdf

Reviewer 3 Report

This paper proposes two-branch attention learning for fine-grained class incremental learning. The description of the proposed TBAL-Net framework is not clear. The paper claims that the attention module similar to [36] is effective in fine-grained CIL; however, the difference between the proposed approach and the existing model frameworks is not well clarified. Furthermore, the paper claims that the localization modules in MMAL only increase a few parameters; however, there is no experiment to justify this.

1. Figure 1. Please add more descriptions (e.g., Module 1, Module 2, APLM, etc) into this Figure so that it is self-contained. 
2. Figure 2 is very similar to that used in [36]. Is there anything new in Figure 2?
3. Figure 3 is not described in the main text.
4. Line 80: The experimental evaluation between the proposed approach and the existing approaches is NOT considered a technical contribution of the paper.
5. Line 11: 'a unified classified' -> 'a unified classifier'.
6. Line 166: Please add 'space' between the symbol and texts.
7. Line 189: How is the hyperparameter \lambda learned?
8. Line 243: The spacing looks very weird.
9. Line 284: What parameter is evaluated in this experiment?
10. Line 287: Table 3 crosses two pages.
11. There are no discussions provided in Section 4 experimental results. Only Tables and Figures are provided without discussions.

Author Response

Report 1

The paper claims that the localization modules in MMAL only increase a few parameters; however, there is no experiment to justify this.

Response: Thanks for the great comment. We have added several sentences to compare the number of parameters between the proposed model and two baseline models (see lines 290-295).

 

  1. Figure 1. Please add more descriptions (e.g., Module 1, Module 2, APLM, etc) into this Figure so that it is self-contained.

Response: Thanks for the comment. We have enhanced the caption of this figure to make it self-contained and more reader friendly (see lines 135-141).

 

  2. Figure 2 is very similar to that used in [36]. Is there anything new in Figure 2?
Response: Thanks for the comment. There is no new design in Figure 2. We should have pointed out that the internal design of the channel and spatial attention modules is the same as in [36]. Several parameters were empirically chosen to fit the learning task of our study. Although the design of these two attention modules is from existing work, their effect on the FGVC CIL setting has not been extensively validated, and our study is the first attempt. We revised the manuscript accordingly (see line 158).
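For readers unfamiliar with the cited design, the sketch below gives a minimal PyTorch rendition of such a channel-plus-spatial attention block. It illustrates the general structure only; the reduction ratio (16) and the 7x7 spatial kernel are common defaults, not necessarily the empirically chosen values mentioned above.

```python
# Minimal sketch of a channel + spatial attention block in the style of [36].
# Hyperparameters here are common defaults, not the paper's exact choices.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))   # global average-pooled descriptor
        mx = self.mlp(x.amax(dim=(2, 3)))    # global max-pooled descriptor
        return x * torch.sigmoid(avg + mx).view(b, c, 1, 1)

class SpatialAttention(nn.Module):
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)    # channel-wise average map
        mx = x.amax(dim=1, keepdim=True)     # channel-wise max map
        return x * torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))

class AttentionBlock(nn.Module):
    """Channel attention followed by spatial attention on a feature map."""
    def __init__(self, channels):
        super().__init__()
        self.ca = ChannelAttention(channels)
        self.sa = SpatialAttention()

    def forward(self, x):
        return self.sa(self.ca(x))
```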

 

  3. Figure 3 is not described in the main text.

Response: Thanks for pointing this out. We should have examined the manuscript more rigorously. We revised the manuscript by 1) adding an enhanced caption for Figure 3 and 2) highlighting the description of the figure (see lines 193-208).

 

  4. Line 80: The experimental evaluation between the proposed approach and the existing approaches is NOT considered a technical contribution of the paper.

Response: Thanks for the comment. We have revised the presentation of the technical contribution of this manuscript (see lines 79 - 86 for an update).

 

  5. Line 11: 'a unified classified' -> 'a unified classifier'.

Response: Thanks for the catch. We have fixed the typo.

 

  6. Line 166: Please add 'space' between the symbol and texts.

Response: Thanks for the comment. We have thoroughly examined the manuscript and added spaces between symbols and text.

 

  7. Line 189: How is the hyperparameter \lambda learned?

Response: Thanks for the comment. We have revised the manuscript at line 212 to clarify that \lambda is a learnable parameter in cosine normalization and can be updated through backpropagation.
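To make this concrete, the following is a minimal sketch of the standard cosine-normalization formulation (an illustration only, not necessarily the paper's exact implementation): class scores are the cosine similarities between the L2-normalized feature and the L2-normalized class weights, scaled by a learnable factor \lambda that is updated by backpropagation together with the other parameters.

```python
# Minimal sketch of a cosine-normalized classifier with a learnable scale (lambda).
import torch
import torch.nn as nn
import torch.nn.functional as F

class CosineClassifier(nn.Module):
    def __init__(self, in_features, num_classes, init_scale=10.0):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(num_classes, in_features))
        nn.init.kaiming_uniform_(self.weight)
        # lambda: learnable scaling factor on the cosine similarity,
        # trained by backpropagation like any other parameter
        self.scale = nn.Parameter(torch.tensor(init_scale))

    def forward(self, features):
        # logits = lambda * cos(theta) between features and class weight vectors
        return self.scale * F.linear(F.normalize(features, dim=1),
                                     F.normalize(self.weight, dim=1))
```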

 

  8. Line 243: The spacing looks very weird.

Response: Thanks for the catch. We thoroughly checked the formatting of the entire manuscript and fixed several places that were misformatted.

 

  9. Line 284: What parameter is evaluated in this experiment?

Response: We discussed the evaluated hyperparameter at lines 283-290. In addition, the number of incremental phases is also evaluated as an ablation study (see lines 329-336, Figures 4, 5, and 6).

 

  10. Line 287: Table 3 crosses two pages.

Response: Thanks for your comment. We have edited the format of the tables in this manuscript so that each table takes less vertical space and can be easily placed within a single page.

 

  11. There are no discussions provided in Section 4 experimental results. Only Tables and Figures are provided without discussions.

Response: Thanks for the comment. We have revised section 4.4 (ablation study, see lines 321-336) and section 4.5 (results, lines 372-396) with more detailed result interpretation and analysis.

 

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

Now it can be considered for publication. 

Reviewer 3 Report

The revision is fine.
